To filter out DNA contamination from an RNA-seq experiment, I am looking for a way to extract ONLY the aligned reads that span annotated splice junctions.
This will, of course, discard much of the data. The idea is that the only aligned reads we can be absolutely sure originated from mRNA rather than genomic DNA are the ones that contain an annotated splice.
I understand that STAR writes detected splice junctions to 'SJ.out.tab'. This is somewhat useful, but it only reports read counts per junction, so I need a way to find the IDs of the reads behind each entry.
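One possible route, sketched in Python under a few assumptions: in SJ.out.tab each row is one intron (column 1 chromosome, columns 2-3 first and last intron base, 1-based), and column 6 is 1 when the junction is annotated. Because the file holds only counts, the read IDs can be recovered by re-deriving each alignment's introns from the N operations in its CIGAR string and checking them against the annotated rows. The function names are hypothetical, and the input is assumed to be plain-text SAM lines (e.g. piped from `samtools view` on the STAR BAM).

```python
import re

# CIGAR operations; M, D, N, =, X consume reference bases (N = intron gap in STAR output)
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")


def annotated_junctions(sj_lines):
    """Return {(chrom, intron_start, intron_end)} for SJ.out.tab rows whose column 6 == 1."""
    junctions = set()
    for line in sj_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 6 and fields[5] == "1":
            junctions.add((fields[0], int(fields[1]), int(fields[2])))
    return junctions


def read_introns(chrom, pos, cigar):
    """Yield (chrom, start, end) for every N operation, 1-based and inclusive like SJ.out.tab."""
    ref = pos  # SAM column 4: 1-based leftmost mapped reference position
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            yield (chrom, ref, ref + length - 1)
        if op in REF_CONSUMING:
            ref += length


def reads_with_annotated_splice(sam_lines, junctions):
    """Return read IDs (QNAME, first-seen order) of alignments spanning an annotated junction."""
    hits = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, chrom = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 4 or cigar == "*":
            continue  # unmapped read
        if any(j in junctions for j in read_introns(chrom, pos, cigar)):
            hits.append(qname)
    return list(dict.fromkeys(hits))  # de-duplicate mates / repeated alignments
```

Note that this matches secondary and multi-mapping alignments too; if only uniquely mapped reads are wanted, an extra check on the FLAG or the NH tag would be needed.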
In short, I would like the ID of each read that contributes to the 'Number of splices: Annotated (sjdb)' field in 'Log.final.out'.
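An alternative sketch, assuming the alignments were produced with STAR's `--outSAMattributes` including `jM` (and optionally `jI`): according to the STAR manual, the `jM` tag lists one intron-motif code per junction in the read, with 20 added when the junction is in the splice-junction database, and `jM:B:c,-1` for unspliced reads. Filtering on that flag is closer to how STAR itself classifies splices as annotated. The "Number of splices" lines in Log.final.out sit under the unique-reads section, so the hypothetical `unique_only` switch below keeps only `NH:i:1` alignments; keep in mind the log counts splices, not reads, so a read with two annotated junctions counts twice there but appears once here.

```python
def has_annotated_junction(jm_tag):
    """True if a STAR jM tag (e.g. 'jM:B:c,21,1') flags any junction as annotated.

    Assumes STAR's encoding: motif code 0-6, plus 20 if the junction is annotated;
    unspliced reads carry 'jM:B:c,-1'.
    """
    values = jm_tag.split(",")[1:]  # drop the 'jM:B:c' type prefix
    return any(int(v) >= 20 for v in values)


def reads_with_annotated_jm(sam_lines, unique_only=True):
    """Return read IDs (QNAME, first-seen order) whose jM tag marks an annotated junction.

    unique_only=True keeps only NH:i:1 alignments, roughly mirroring the
    unique-reads section of Log.final.out.
    """
    hits = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        tags = fields[11:]  # optional SAM tags start at column 12
        if unique_only and "NH:i:1" not in tags:
            continue
        if any(t.startswith("jM:B:") and has_annotated_junction(t) for t in tags):
            hits.append(fields[0])
    return list(dict.fromkeys(hits))  # de-duplicate mates / repeated alignments
```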
Thanks in advance!!