Problems with how bcftools consensus and vcf-consensus handle missing characters


I intend to construct a species-level phylogeny using an exome dataset in which multiple individuals were sampled for each species. I have generated a VCF file for each species after the mpileup step. However, when I use the script, it seems to split the sequences at the individual level yet again: the phylogeny has multiple tips per species, one for each individual sampled.

If the sequencing has missed positions relative to the reference, these tools (bcftools consensus and vcf-consensus) replace the missing character "N" from the VCF file with the corresponding bases from the reference when I create a FASTA file. This alters the distance matrix and by default makes the sample appear more closely related to the reference than it actually is. How do I fix this?
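To make the problem concrete, here is a minimal Python sketch of the behaviour I would expect instead. It is a toy illustration, not part of either tool: the function names `mask_uncovered` and `p_distance`, the sequences, and the uncovered interval are all made up for this example, and I am assuming the uncovered positions are known as 0-based, half-open intervals.

```python
def mask_uncovered(consensus, uncovered):
    """Return `consensus` with every uncovered interval replaced by 'N'.

    `uncovered` is an iterable of (start, end) pairs, 0-based and half-open
    (an assumption made for this sketch).
    """
    seq = list(consensus)
    for start, end in uncovered:
        # Clamp the interval to the sequence so bad coordinates cannot crash.
        for i in range(max(start, 0), min(end, len(seq))):
            seq[i] = "N"
    return "".join(seq)


def p_distance(a, b):
    """Proportion of differing sites, skipping sites where either is 'N'."""
    compared = 0
    diffs = 0
    for x, y in zip(a, b):
        if x == "N" or y == "N":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else float("nan")


# Toy data: positions 4-7 had no reads and were filled in from the reference.
reference = "ACGTACGTAC"
filled = "ACCTACGTAT"
masked = mask_uncovered(filled, [(4, 8)])

print(masked)
print(p_distance(reference, filled))  # reference-filled sites count as matches
print(p_distance(reference, masked))  # masked sites are excluded
```

In this toy case the reference-filled sequence gives a smaller distance to the reference than the masked one, because every filled-in site counts as a match. That is the bias I am trying to avoid.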




