Removing paralogs from multiple sequence alignments
I will be creating a multiple sequence alignment for each single-copy ortholog across a group of closely related species whose genomes have been sequenced, with reads aligned to a common reference genome. What is the best way to identify genes/exons in the reference assembly that may not be single-copy orthologs in all species (i.e., paralogs resulting from duplication)? I want to be sure to exclude these from my multiple sequence alignments. Is there a rule of thumb for filtering based on expected coverage depth? Is there any software that automates this process?
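To make the coverage-depth idea concrete, here is a minimal sketch of one common heuristic, not an established tool or a recommended standard. When two paralogous copies collapse onto a single reference locus, reads from both map there, so that locus tends to show roughly 2x the typical depth for that sample. The sketch normalises each gene's mean depth by the sample's median gene depth and flags genes that exceed a cutoff in any species. The input layout (`{sample: {gene: mean_depth}}`), the function name `flag_possible_paralogs`, and the 1.5x cutoff are all hypothetical choices. Per-gene mean depths could come from, for example, mosdepth run with `--by` on an exon/gene BED file.

```python
from statistics import median


def flag_possible_paralogs(depths, max_ratio=1.5):
    """Return gene IDs whose depth, relative to that sample's median gene
    depth, exceeds max_ratio in at least one sample.

    depths: {sample: {gene: mean_depth}}  (hypothetical input layout)
    max_ratio: hypothetical cutoff; ~2.0 is what two collapsed copies
               would give in the ideal case, so 1.5 leaves some slack.
    """
    flagged = set()
    for sample, gene_depths in depths.items():
        med = median(gene_depths.values())
        if med == 0:
            continue  # no usable coverage in this sample; cannot normalise
        for gene, depth in gene_depths.items():
            if depth / med > max_ratio:
                flagged.add(gene)  # candidate collapsed duplicate
    return flagged


# Toy example with made-up depths for two species.
depths = {
    "sp1": {"geneA": 20.0, "geneB": 21.0, "geneC": 41.0, "geneD": 19.0},
    "sp2": {"geneA": 30.0, "geneB": 62.0, "geneC": 29.0, "geneD": 31.0},
}
flagged = flag_possible_paralogs(depths)
print(sorted(flagged))  # ['geneB', 'geneC']
```

Normalising per sample rather than using an absolute depth keeps the filter comparable across species sequenced to different depths. Depth alone will miss diverged paralogs that map elsewhere, so an elevated rate of heterozygous calls in an otherwise inbred or haploid sample is another signal worth checking.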