An alignment I was running on 150 bp paired-end data took more than a week to finish. According to the log file, the stage that took most of that time was "reporting output tracks", so I tried limiting TopHat's output with "-g 1" (--max-multihits), assuming that reporting all combinations of all paired reads is what takes the time.
However, the alignment results differ between the two settings.
For -g 20
Left reads:
Input : 19149286
Mapped : 18242457 (95.3% of input)
of these: 2389313 (13.1%) have multiple alignments (602054 have >20)
While for -g 1
Left reads:
Input : 19149286
Mapped : 18210924 (95.1% of input)
of these : 1422182 ( 7.8%) have multiple alignments (1434114 have >1)
The number of uniquely mapped reads is larger for -g 1 than for -g 20.
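To make that comparison explicit, here is a small Python sketch of the arithmetic. It assumes that the uniquely mapped count can be taken as "Mapped" minus "have multiple alignments" from the summaries above; that subtraction is my reading of the summary, not a number TopHat reports directly.

```python
# Left-read counts copied from the two summaries above.
summaries = {
    "-g 20": {"mapped": 18242457, "multi": 2389313},
    "-g 1": {"mapped": 18210924, "multi": 1422182},
}


def unique_reads(counts):
    # Assumption: unique = mapped reads - reads with multiple alignments.
    return counts["mapped"] - counts["multi"]


unique = {opt: unique_reads(c) for opt, c in summaries.items()}
for opt, n in unique.items():
    print(f"{opt}: {n} uniquely mapped left reads")

# How many more reads count as unique under -g 1 than under -g 20.
print("difference:", unique["-g 1"] - unique["-g 20"])
```

Under that assumption, -g 20 gives 15853144 unique left reads and -g 1 gives 16788742, a difference of 935598.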
If -g influences the alignment stage and not only the reporting stage, in what way does it do so?
This was asked a long time ago in the thread "Tophat multiple alignment and mapping rates":
seqanswers.com/forums/showthread.php?t=44608
But it was never resolved.