I'm afraid I have found myself in a compromising situation. I started my PhD without much knowledge of RNA-seq and set about carrying out the study design I was assigned, so at that early stage I did not know enough to appreciate the importance of the strandedness option.
I've sequenced a good number of patient samples following the best protocol for assessing splicing and differential gene expression (DGE) and, as I was advised, I am using the GTEx data as controls. I'm now noticing that gene expression is not comparable between the two protocols: genes on the reverse strand aren't picked up in the analysis, and it's my understanding that their reads are instead assigned to any forward-strand genes they overlap.
From what I have gathered so far, STAR cannot simply be re-run on the patient samples to discard the strandedness information at the alignment step, nor can I go back and re-sequence the GTEx data.
According to this post ("TruSeq strand-specificity in rsem-calculate-expression"), I can set the --forward-prob parameter to 0.5 for a non-strand-specific protocol (0.5 is also the default).
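For concreteness, this is roughly the call I have in mind, written as a small Python helper that only assembles the argument list and does not run anything. The BAM file, reference prefix, and sample name are placeholders of mine, and I am assuming paired-end reads and a transcriptome-coordinate BAM from STAR; only the --alignments, --paired-end, and --forward-prob flags come from the RSEM documentation.

```python
import shlex


def rsem_command(bam, reference, sample, forward_prob=0.5, paired_end=True):
    """Build (but do not run) an rsem-calculate-expression command line.

    forward_prob is passed straight to RSEM's --forward-prob option.
    All file names supplied by the caller are placeholders in this sketch.
    """
    if not 0.0 <= forward_prob <= 1.0:
        raise ValueError("forward_prob must be a probability between 0 and 1")

    cmd = ["rsem-calculate-expression", "--alignments"]
    if paired_end:  # assumption: the libraries are paired-end
        cmd.append("--paired-end")
    cmd += ["--forward-prob", str(forward_prob)]
    # Positional arguments: input alignments, reference prefix, output prefix.
    cmd += [bam, reference, sample]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = rsem_command(
        "patient01.Aligned.toTranscriptome.out.bam",
        "rsem_ref/human",
        "patient01_unstranded",
    )
    print(shlex.join(cmd))
```

The idea would be to quantify the stranded patient samples with 0.5, as if they were unstranded, so that they are treated the same way as the GTEx data.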
With this setting, RSEM seems to do one of two things:
A) force all reads onto the forward strand, which corrects the problem but unfortunately loses strand specificity; or
B) count only the reads aligned to the forward strand.
Can anyone tell me which, if either, of these statements is true, or suggest an alternative approach?