High duplication levels in shotgun metagenomics


Hi all,

I am working on a project that aims to recover metagenome-assembled genomes (MAGs) from shotgun metagenomic data. During quality control with FastQC I found high duplication levels (only 27% of sequences would remain after deduplication). However, from previous amplicon sequencing, I know that I am working with a low-diversity sample in which one taxon has a high relative abundance (>70%).
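
For context, the FastQC figure quoted above is roughly the number of distinct read sequences divided by the total number of reads. Here is a minimal Python sketch of that idea, assuming exact-match duplicates on a toy list of reads. The function name and the example reads are hypothetical, and FastQC itself estimates the value from a subset of the reads, so its number will not match this calculation exactly.

```python
from collections import Counter


def percent_remaining_if_deduplicated(sequences):
    """Return the percentage of reads left after collapsing exact duplicates.

    Simplified illustration only: this is not FastQC's actual estimation
    procedure, which works on a subset of the reads.
    """
    counts = Counter(sequences)    # distinct sequence -> number of copies
    total = sum(counts.values())   # total number of reads
    if total == 0:
        return 0.0
    return 100.0 * len(counts) / total


# Toy reads (made up for illustration): 4 reads, 2 distinct sequences.
reads = ["ACGT", "ACGT", "ACGT", "TTGA"]
print(percent_remaining_if_deduplicated(reads))  # 50.0
```

By this measure, a value of 27% means that about 73% of the reads would be removed as exact copies of another read.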

I've read in some papers that duplication can harm assembler performance (both in computational cost and assembly quality). I know that everyone has a different take on deduplication, but I still wanted to ask for some recommendations on this.

Also, since I am relatively new to bioinformatics, I would find it really helpful if you could share any workflows or tools that you prefer for deduplication.




