A TopHat alignment of 150 bp paired-end data took more than a week to run. According to the log file, the stage that took most of the time was "reporting output tracks", so I tried limiting TopHat's output with "-g 1" (--max-multihits), assuming that what takes the time is reporting all combinations of all paired reads.

However, the alignment results differ between the two settings.

For -g 20

Left reads:
Input   :  19149286
Mapped  :  18242457 (95.3% of input)
of these:  2389313 (13.1%) have multiple alignments (602054 have >20)

While for -g 1

Left reads:
Input    :  19149286
Mapped   :  18210924 (95.1% of input)
of these :  1422182 ( 7.8%) have multiple alignments (1434114 have >1)

The number of uniquely mapped reads is larger for -g 1 than for -g 20.
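To make that comparison explicit, here is a small sketch that parses the two "Left reads" blocks quoted above. It assumes that "uniquely mapped" means the mapped count minus the "have multiple alignments" count; that definition and the helper name `unique_reads` are my own, not something TopHat reports directly.

```python
import re

# "Left reads" blocks copied verbatim from the two runs quoted above.
SUMMARY_G20 = """\
Left reads:
Input   :  19149286
Mapped  :  18242457 (95.3% of input)
of these:  2389313 (13.1%) have multiple alignments (602054 have >20)
"""

SUMMARY_G1 = """\
Left reads:
Input    :  19149286
Mapped   :  18210924 (95.1% of input)
of these :  1422182 ( 7.8%) have multiple alignments (1434114 have >1)
"""


def unique_reads(summary):
    """Return mapped minus multi-mapped reads for one summary block.

    Assumption: 'unique' = Mapped - 'of these ... have multiple alignments'.
    """
    mapped = int(re.search(r"Mapped\s*:\s*(\d+)", summary).group(1))
    multi = int(re.search(r"of these\s*:\s*(\d+)", summary).group(1))
    return mapped - multi


unique_g20 = unique_reads(SUMMARY_G20)
unique_g1 = unique_reads(SUMMARY_G1)

print("-g 20 unique:", unique_g20)
print("-g 1  unique:", unique_g1)
print("difference  :", unique_g1 - unique_g20)
```

Under that assumption, -g 1 gives 16788742 unique left reads against 15853144 for -g 20, even though slightly fewer reads are mapped overall.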

If -g influences the alignment stage and not only the reporting stage, in what way does it do so?

This was asked a long time ago on SEQanswers, in the thread "Tophat multiple alignment and mapping rates", but it was never resolved:

seqanswers.com/forums/showthread.php?t=44608


written 2 hours ago by Sam (modified 39 minutes ago)


