I have a couple of hundred metagenomes from multiple environments, and I am interested in comparing the fragment counts of a single (bacterial) gene to see which metagenomes have higher vs. lower abundances. I do not care about comparing the fragment counts of different genes, either within the same sample or between samples. My comparisons will be one gene at a time.
That being said, I am trying to figure out the best way to normalize the metagenomes. Since I will always compare one gene at a time, I feel that normalizing by gene length (e.g. FPKM) is irrelevant: the length is the same constant in every sample. Instead, I would imagine that my main focus should be (fragment counts of gene of interest)/(total bacterial counts) or something similar.
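To make the ratio concrete, here is a minimal Python sketch of what I have in mind. The function name `relative_abundance`, the sample names, and all counts are made up for illustration, and scaling to "per million bacterial fragments" is just my assumption to keep the numbers readable.

```python
def relative_abundance(gene_counts, bacterial_counts, scale=1_000_000):
    """Gene fragments per `scale` bacterial fragments, for each sample.

    gene_counts: dict of sample -> fragments mapped to the gene of interest
    bacterial_counts: dict of sample -> total fragments classified as bacterial
    """
    result = {}
    for sample, gene in gene_counts.items():
        total = bacterial_counts[sample]
        if total <= 0:
            raise ValueError(f"no bacterial fragments for sample {sample!r}")
        # Multiply before dividing to limit floating-point rounding.
        result[sample] = gene * scale / total
    return result


# Hypothetical counts, for illustration only.
gene_counts = {"soil_1": 120, "gut_1": 300}
bacterial_counts = {"soil_1": 2_000_000, "gut_1": 10_000_000}

normalized = relative_abundance(gene_counts, bacterial_counts)
for sample, value in normalized.items():
    print(sample, value)
```

With these made-up numbers, `gut_1` has more raw fragments for the gene but a lower normalized abundance than `soil_1`, which is exactly the kind of difference I want the normalization to expose.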
I have seen in some papers that people use the 16S counts as total bacterial counts, but I feel this is wrong: the number of 16S copies per genome varies between species, and therefore the bacterial composition of each sample can significantly affect this value. As an alternative, I was thinking of something like mapping the FASTQ reads to NR in order to figure out the number of bacterial reads in each sample.
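Here is a toy Python calculation of my concern about 16S. The taxa, cell numbers, and copy numbers (1 and 7 copies per genome) are hypothetical values chosen only to show the effect, not measurements from any real sample.

```python
def total_16s_copies(cells_per_taxon, copies_per_genome):
    """Sum of 16S gene copies contributed by all taxa in one sample."""
    return sum(
        cells * copies_per_genome[taxon]
        for taxon, cells in cells_per_taxon.items()
    )


# Hypothetical 16S copy numbers per genome.
copies_per_genome = {"taxon_A": 1, "taxon_B": 7}

# Two hypothetical samples with the same number of cells (1000 each)
# but opposite community composition.
sample_1 = {"taxon_A": 900, "taxon_B": 100}
sample_2 = {"taxon_A": 100, "taxon_B": 900}

total_1 = total_16s_copies(sample_1, copies_per_genome)  # 900*1 + 100*7 = 1600
total_2 = total_16s_copies(sample_2, copies_per_genome)  # 100*1 + 900*7 = 6400

print(total_1, total_2, total_2 / total_1)
```

Both toy samples contain the same number of bacterial cells, yet the 16S signal differs four-fold, so dividing my gene counts by 16S counts would make the second sample look four times poorer in the gene for reasons that have nothing to do with the gene itself.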
I have never done that before, and any advice would be very welcome. I apologize for any ignorant questions or misconceptions; I am just trying to figure out the best strategy. I also see in many forums that people are very angry at FPKM... which I think could work just fine for me, especially if I cared about length...
Thanks in advance