I want to use kallisto to quantify the abundance of reads mapping downstream of genes. My RNA-seq data contains reads arising from read-through transcription (transcription continuing downstream of transcript 3' ends).
I use two different transcriptome files:
- One reference transcriptome (containing only the real, genic transcript sequences)
- One modified transcriptome, containing the exact same sequences + sequences of downstream regions
This means that the second, modified transcriptome has the same genic target sequences plus a number of additional intergenic target sequences.
My problem is:
For some samples, the number of pseudoaligned reads is higher when I use the unmodified transcriptome, even though the modified transcriptome contains the EXACT same sequences plus a few additional target sequences. I wonder how this is possible: since both de Bruijn graphs contain the same genic target sequences, the number of pseudoaligned reads with the modified transcriptome should be equal or higher, not lower. I expected that some of the reads that map to genic target sequences with the unmodified transcriptome would instead be assigned to intergenic regions, because the equivalence class of transcripts for such a read might be extended with intergenic target sequences.
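For reference, the per-sample comparison described above can be sketched roughly as follows. This is a minimal sketch, assuming each `kallisto quant` output directory contains a `run_info.json` with the `n_processed` and `n_pseudoaligned` fields (as written by recent kallisto versions); the function names and directory arguments are hypothetical.

```python
import json
from pathlib import Path


def pseudoaligned_counts(run_dir):
    """Return (n_processed, n_pseudoaligned) from a kallisto output directory.

    Assumes run_info.json exists and contains both fields.
    """
    with open(Path(run_dir) / "run_info.json") as handle:
        info = json.load(handle)
    return info["n_processed"], info["n_pseudoaligned"]


def compare_runs(reference_dir, modified_dir):
    """Return pseudoaligned reads with the modified index minus the reference index.

    A negative value means fewer reads were pseudoaligned with the
    modified transcriptome than with the reference transcriptome.
    """
    n_processed_ref, n_aligned_ref = pseudoaligned_counts(reference_dir)
    n_processed_mod, n_aligned_mod = pseudoaligned_counts(modified_dir)
    # Both runs must have seen the same reads for the comparison to be meaningful.
    if n_processed_ref != n_processed_mod:
        raise ValueError("the two runs processed different numbers of reads")
    return n_aligned_mod - n_aligned_ref
```

Calling `compare_runs` with the two output directories of one sample gives a negative number for exactly the samples I am asking about.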
I double-checked that my transcriptome files really contain the same sequences. I would be glad if someone could explain to me how it is possible that some reads cannot be aligned with my modified transcriptome, even though it contains the same target sequences.
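The check that the files contain the same sequences can be done along these lines. This is only a sketch, assuming plain (uncompressed) FASTA files in which the first word of each header is the target ID; the function names are hypothetical and not part of kallisto.

```python
def read_fasta(path):
    """Parse a FASTA file into a dict {target_id: sequence}.

    The target ID is taken as the first whitespace-separated word of the
    header line; sequences are upper-cased and joined across lines.
    """
    records = {}
    name = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                words = line[1:].split()
                name = words[0] if words else ""
                chunks = []
            else:
                chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def missing_or_changed(reference_path, modified_path):
    """List reference target IDs that are absent from, or differ in, the modified file."""
    reference = read_fasta(reference_path)
    modified = read_fasta(modified_path)
    return sorted(
        name for name, sequence in reference.items()
        if modified.get(name) != sequence
    )
```

An empty list from `missing_or_changed` means every reference target is present in the modified transcriptome with an identical sequence, which is the result I get for my files.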