BWA-MEM - Improperly Paired Reads


After aligning paired-end 100 bp reads to a reference genome, I am getting a very low properly paired percentage:

369208441    0    total (QC-passed reads + QC-failed reads)
8985531      0    secondary
289733341    0    mapped
78.47%       N/A  mapped %
360222910    0    paired in sequencing
180111455    0    read1
180111455    0    read2
1393338      0    properly paired
0.39%        N/A  properly paired %
280747810    0    with itself and mate mapped
0            0    singletons
0.00%        N/A  singletons %
39590468     0    with mate mapped to a different chr
0            0    with mate mapped to a different chr (mapQ>=5)
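For reference, the two percentages in the report can be recomputed from the raw counts. The sketch below is a minimal Python example; the helper names `parse_flagstat` and `pct` are hypothetical, and it assumes the three-column layout shown above (QC-passed count, QC-failed count, label). Only the rows needed for the arithmetic are pasted in.

```python
# Recompute the flagstat percentages from the raw counts (illustrative sketch).
REPORT = """\
369208441 0 total (QC-passed reads + QC-failed reads)
8985531 0 secondary
289733341 0 mapped
78.47% N/A mapped %
360222910 0 paired in sequencing
1393338 0 properly paired
0.39% N/A properly paired %
"""

TOTAL = "total (QC-passed reads + QC-failed reads)"


def parse_flagstat(text):
    """Return {label: QC-passed count} for the count rows of a flagstat report."""
    counts = {}
    for line in text.splitlines():
        fields = line.split(None, 2)  # QC-passed, QC-failed, label
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip blank lines and the percentage rows
        counts[fields[2].strip()] = int(fields[0])
    return counts


def pct(numerator, denominator):
    """Percentage rounded to two decimals, as printed by flagstat."""
    return round(100.0 * numerator / denominator, 2)


counts = parse_flagstat(REPORT)
# Mapped % is taken over all records, secondary alignments included.
print("mapped %:", pct(counts["mapped"], counts[TOTAL]))
# Properly paired % is taken over the reads that were paired in sequencing.
print("properly paired %:",
      pct(counts["properly paired"], counts["paired in sequencing"]))
# Sanity check: total minus secondary should equal the paired read count.
print("primary records:", counts[TOTAL] - counts["secondary"])
```

Note that the properly paired percentage uses "paired in sequencing" as its denominator, not the number of mapped reads.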

I followed the GATK Best Practices to align paired-end short-read data to a reference genome. I downloaded the reads from NCBI SRA as FASTQ files using the SRA Toolkit's fastq-dump, converted the FASTQ files to unmapped BAM using Picard FastqToSam, and marked adapters using Picard MarkIlluminaAdapters. I then piped Picard SamToFastq into bwa mem and its output into Picard MergeBamAlignment. To get statistics on the alignment, I used samtools flagstat.

For several of my samples, the alignment went well (90% mapped, 80% properly paired). For a couple of my samples, however, the properly paired percentage was well below 1%. How can a normal fraction of reads map (~78%) while only 0.39% of the reads are properly paired?
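To see which categories the improperly paired reads fall into, the FLAG column (column 2) of the alignments can be tallied directly. The following is a minimal Python sketch, not a replacement for flagstat: the function names `classify`, `flags_from_sam`, and `tally` are hypothetical, the input is assumed to be SAM text lines (for example, the output of `samtools view`), and the category rules reflect my understanding of how flagstat counts records.

```python
from collections import Counter

# SAM FLAG bits, as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def classify(flag):
    """Assign one record to a pairing category based on its FLAG value."""
    if flag & SECONDARY:
        return "secondary"
    if flag & SUPPLEMENTARY:
        return "supplementary"
    if not (flag & PAIRED):
        return "unpaired"
    if (flag & PROPER_PAIR) and not (flag & UNMAPPED):
        return "properly paired"
    if flag & UNMAPPED:
        return "unmapped"
    if flag & MATE_UNMAPPED:
        return "singleton"
    return "both mapped, not proper"


def flags_from_sam(lines):
    """Yield the integer FLAG of each alignment line, skipping @ header lines."""
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        yield int(line.split("\t")[1])


def tally(flags):
    """Count records per category."""
    return Counter(classify(flag) for flag in flags)


# Example with a few common FLAG values (99/147 are a typical proper pair).
for category, n in tally([99, 147, 65, 129, 73, 77, 321]).items():
    print(category, n)
```

On real data, `tally(flags_from_sam(sys.stdin))` could be fed from `samtools view` on the merged BAM. A large "both mapped, not proper" count would point at insert size or orientation rather than at unmapped mates.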

I have double-checked that my FASTQ files from fastq-dump have identical read counts and that the reads are properly interleaved after Picard FastqToSam. I also ran Picard ValidateSamFile on the file output by MergeBamAlignment and found no errors.
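A stricter version of the read-count check is to confirm that the two FASTQ files list the same read names in the same order. The sketch below is one way this could be scripted, under stated assumptions: records are exactly four lines each, the read name is the first whitespace-delimited token of the header, and a trailing `/1` or `/2` is ignored. The function names `read_names` and `first_mismatch` are hypothetical.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name of each 4-line FASTQ record (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:
                continue  # only header lines carry the read name
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if name.endswith(("/1", "/2")):
                name = name[:-2]  # drop the mate suffix, if present
            yield name


def first_mismatch(fastq1, fastq2):
    """Return (record index, name1, name2) for the first disagreement, else None.

    A missing record in the shorter file shows up as None in the tuple.
    """
    pairs = zip_longest(read_names(fastq1), read_names(fastq2))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index, name1, name2
    return None
```

A return value of `None` means both files have the same number of records and the names agree record by record.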






