gravatar for gable_works

2 hours ago by


I use deepTools regularly in my pipelines. For ATAC-seq data I noticed a potential downside to the bamCoverage function. My sequence read length is 2x42 and alignment paired end. With the assumption that deepTools bamCoverage would align read fragments according to mated pairs, I found a small detail that may conflict with ATAC-seq datasets. Read length distribution follows a sub-, mono-, di-nucleosomal pattern (below).


The problem occurs in that deepTools default setting for paired-end is to align read mates > 4X read length which would cut my library in half and these reads will be considered single end. I think this results in a "choppy" feature on the genome browser and ultimately is incorrect (below). Moving forward I will use the -e option to explicitly inform the command to extend all reads. Below is a genome browser track showing the differences in this option.

enter image description here

Here is some information about the commands:

  • Insert Size Distribution:
    samtools view [].bam | awk '$9>0' | cut -f 9 | sort | uniq -c | sort -b -k2,2n | sed -e 's/^[ t]*//' > out.txt
  • default-setting:
    bamCoverage -b [].bam -o [].bw --normalizeUsing RPGC --effectiveGenomeSize 2913022398 -bs 10 -p max
  • extendReads:
    bamCoverage -b [].bam -o [].bw --normalizeUsing RPGC --effectiveGenomeSize 2913022398 -bs 10 -e -p max

Source link