I use deepTools regularly in my pipelines. For ATAC-seq data I noticed a potential downside to the
bamCoverage function. My sequence read length is 2x42 and alignment paired end. With the assumption that deepTools bamCoverage would align read fragments according to mated pairs, I found a small detail that may conflict with ATAC-seq datasets. Read length distribution follows a sub-, mono-, di-nucleosomal pattern (below).
The problem occurs in that deepTools default setting for paired-end is to align read mates > 4X read length which would cut my library in half and these reads will be considered single end. I think this results in a "choppy" feature on the genome browser and ultimately is incorrect (below). Moving forward I will use the
-e option to explicitly inform the command to extend all reads. Below is a genome browser track showing the differences in this option.
Here is some information about the commands:
- Insert Size Distribution:
samtools view .bam | awk '$9>0' | cut -f 9 | sort | uniq -c | sort -b -k2,2n | sed -e 's/^[ t]*//' > out.txt
bamCoverage -b .bam -o .bw --normalizeUsing RPGC --effectiveGenomeSize 2913022398 -bs 10 -p max
bamCoverage -b .bam -o .bw --normalizeUsing RPGC --effectiveGenomeSize 2913022398 -bs 10 -e -p max