I am trying to normalize my virus-metagenomics raw counts based on Roux et al., 2017:

The authors first normalize raw counts by contig size. Afterwards, they transform them to RPKM (edgeR) as a correction for different library sizes (I sketch my reading of this below the quoted passage):

Before calculating any index, the read counts were first normalized by
the contig length, since viral genome lengths can be highly variable
(∼2 orders of magnitude, Angly et al., 2009).

Then, to account for potential differences in library sizes, we
compared five different methods: (i) a simple normalization in which
counts are divided by the library size, “Normalized” (ii) a method
specifically designed to account for under-sampling of metagenomes,
from the metagenomeSeq R package, “MGSeq” (iii and iv) two methods
designed to minimize log-fold changes between samples for most of the
populations, from the edgeR R package, “edgeR”, and the DESeq R
package, “DESeq”, and (v) a rarefaction approach whereby all
libraries get randomly down-sampled without replacement to the size of
the smallest library, “Rarefied” (Fig. S2).
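To make sure I am reading the two steps correctly, here is a minimal NumPy sketch of my understanding (all names and numbers are made up; the paper itself uses the R packages metagenomeSeq, edgeR and DESeq for methods (ii)-(iv), while this sketch only uses the simple method (i) for the library-size step):

```python
import numpy as np

# Toy data: rows = samples (libraries), columns = viral contigs (vOTUs).
# All names and numbers below are made up for illustration only.
raw_counts = np.array([
    [120, 30,  0],
    [ 45, 10,  5],
], dtype=float)
contig_lengths_bp = np.array([35_000, 5_000, 150_000], dtype=float)  # highly variable

# Step 1 (Roux et al., 2017): normalize by contig length, since viral
# genome lengths span ~2 orders of magnitude.
length_normalized = raw_counts / contig_lengths_bp

# Step 2, method (i) "Normalized": divide by library size to account for
# differing sequencing depths. Methods (ii)-(v) (MGSeq, edgeR, DESeq,
# rarefaction) would replace this step with their own scaling/resampling.
library_sizes = raw_counts.sum(axis=1, keepdims=True)
normalized = length_normalized / library_sizes

print(normalized)
```

So in my reading, contig length and library size are handled in two separate steps.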

Problem: Elsewhere, Rasmussen et al., 2019 follow Roux et al., although they state that RPKM normalization is done to account for contig size, not library size (they even cite Roux et al., 2017):

Prior any analysis the raw read counts in the vOTU-tables were
normalized by reads per kilobase per million mapped reads (RPKM) [48],
since the size of the viral contigs is highly variable [49]
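For reference, this is the standard RPKM formula as I understand it (a minimal NumPy sketch with made-up numbers); it divides by both a contig-length term and a library-size term, which may be why the two papers describe it differently:

```python
import numpy as np

def rpkm(counts, contig_lengths_bp):
    """Reads Per Kilobase per Million mapped reads, per sample and contig.

    counts: samples x contigs matrix of raw mapped-read counts.
    contig_lengths_bp: length of each contig in base pairs.
    """
    per_million = counts.sum(axis=1, keepdims=True) / 1e6  # library-size term
    per_kilobase = contig_lengths_bp / 1e3                 # contig-length term
    return counts / per_million / per_kilobase

# Illustrative toy numbers only.
counts = np.array([[120, 30, 0], [45, 10, 5]], dtype=float)
lengths = np.array([35_000, 5_000, 150_000], dtype=float)
print(rpkm(counts, lengths))
```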

Please help me: which one is correct? Am I missing any "between-the-lines" info?

Thanks!


