The effect of heterozygosity on genome alignment
Hi, my question may be fairly simple, yet I haven't found a clear answer.
Let's say I have short-read genome sequencing data from a highly heterozygous diploid organism and I align it against a haploid reference. How will the heterozygosity affect the alignment, and what strategies exist to deal with it?
Will the heterozygosity just give an alignment with lower quality and coverage?
How can I distinguish a bad alignment, or an artifact of the aligner, from something caused by the heterozygosity of the sample?
Also, I understand that bwa can handle chimeric reads, but I'm not sure how it handles them. This matters to me because I'm sure the organisms I'm working with have major structural polymorphisms, and I'm interested in studying them.
The SAM format specification says that, for a chimeric read, one linear alignment is treated as the representative and the others are flagged as supplementary alignments, but it also says "The decision regarding which linear alignment is representative is arbitrary". How might the choice of representative alignment affect downstream analysis?
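To make the last question concrete, here is a minimal sketch of how I understand the representative/supplementary split to appear in a SAM file. It relies only on the flag bits (0x100 secondary, 0x800 supplementary) and the `SA:Z` tag layout (`rname,pos,strand,CIGAR,mapQ,NM;`) from the SAM specification. The function names and the example record are hypothetical and made up for illustration, not real bwa output.

```python
# Minimal sketch: classify SAM records by flag bits and read the SA tag,
# which lists the other linear alignments of a chimeric read.

SECONDARY = 0x100       # secondary alignment bit (SAM spec)
SUPPLEMENTARY = 0x800   # supplementary alignment bit (SAM spec)


def classify(flag):
    """Return 'supplementary', 'secondary' or 'primary' for a SAM flag."""
    if flag & SUPPLEMENTARY:
        return "supplementary"
    if flag & SECONDARY:
        return "secondary"
    return "primary"


def parse_sa_tag(value):
    """Parse the value of an SA:Z tag into a list of dicts, one per segment."""
    segments = []
    for entry in value.rstrip(";").split(";"):
        if not entry:
            continue
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        segments.append({
            "rname": rname,
            "pos": int(pos),
            "strand": strand,
            "cigar": cigar,
            "mapq": int(mapq),
            "nm": int(nm),
        })
    return segments


def summarize(sam_line):
    """Summarize one tab-separated SAM alignment line (header lines excluded)."""
    fields = sam_line.rstrip("\n").split("\t")
    other_segments = []
    for tag in fields[11:]:  # optional fields start at column 12
        if tag.startswith("SA:Z:"):
            other_segments = parse_sa_tag(tag[len("SA:Z:"):])
    return {
        "qname": fields[0],
        "kind": classify(int(fields[1])),
        "rname": fields[2],
        "pos": int(fields[3]),
        "other_segments": other_segments,
    }


# Hypothetical supplementary record for a read split between chr1 and chr2.
example = "read1\t2048\tchr2\t5000\t60\t4H4M\t*\t0\t0\tACGT\t*\tSA:Z:chr1,1000,+,4M4S,60,0;"
print(summarize(example))
```

If this reading is right, every segment of a chimeric read carries an `SA` tag pointing at the other segments, so a breakpoint can be reconstructed from whichever record is encountered. My concern is with tools that discard supplementary records and keep only the representative one.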