Thanks for your answer 🙂

This makes sense, and I understand there can be duplicated regions of sequence. My understanding, though, was that these tools don't look at sequence identity at all: they simply flag up a read if its 5' mapped coordinate is the same as that of another read. Even duplicated regions on different chromosomes would have different coordinates, since they would be millions of bases apart in the concatenated genomic reference.
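
To make the coordinate-only logic I'm describing concrete, here's a toy sketch in Python. It is a simplification under my own assumptions (single-end reads, no soft-clipping, no mates or library information), and the `Read` class and `mark_duplicates` function are hypothetical names, not any real tool's API. Reads are grouped by (chromosome, 5' coordinate, strand) and the sequence is never compared.

```python
from dataclasses import dataclass

@dataclass
class Read:
    # Hypothetical minimal read record (single-end, no clipping).
    name: str
    chrom: str
    start: int   # leftmost aligned base, 1-based
    end: int     # rightmost aligned base, 1-based, inclusive
    strand: str  # "+" or "-"
    score: int   # stand-in for the quality score used to pick which copy to keep
    seq: str     # carried along but never compared

def five_prime(read: Read) -> int:
    """5' mapped coordinate: start on the forward strand, end on the reverse."""
    return read.start if read.strand == "+" else read.end

def mark_duplicates(reads: list[Read]) -> set[str]:
    """Return names of reads flagged as duplicates, using coordinates only."""
    groups: dict[tuple[str, int, str], list[Read]] = {}
    for r in reads:
        key = (r.chrom, five_prime(r), r.strand)
        groups.setdefault(key, []).append(r)
    duplicates: set[str] = set()
    for group in groups.values():
        best = max(group, key=lambda r: r.score)  # keep one representative
        duplicates.update(r.name for r in group if r is not best)
    return duplicates

SEQ = "ACGTTGCA" * 12  # identical 96-base sequence for every read

example_reads = [
    Read("a", "chr1", 1000, 1095, "+", 30, SEQ),
    Read("b", "chr1", 1000, 1095, "+", 25, SEQ),    # same 5' coord as "a"
    Read("c", "chr7", 52000, 52095, "+", 30, SEQ),  # duplicated region elsewhere
    Read("d", "chr1", 1000, 1095, "-", 30, SEQ),    # same span, other strand
]

print(sorted(mark_duplicates(example_reads)))  # ['b']
```

All four reads have the same sequence, but only "b" is flagged, because it's the only one sharing a 5' coordinate and strand with another read. The copy on chr7 is never compared with "a" at all.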

But maybe the tools are sorting by both coordinates and sequence?