Hi everyone,

I'm trying to work out which RNA-seq parameters are best for my samples. I have 9 samples that I sequenced at 20M reads, 150bp, paired-end, and I want to see which settings give more accurate results (paired-end vs single-end, 150bp vs 100bp vs 50bp, and 20M vs 10M reads).

To do that, I hard-trimmed the reads (to obtain 100bp and 50bp), took only the first 10M reads instead of the full 20M, and took only the first fastq file of each pair (to generate the single-end data). I then ran the same programs on every subset: trim_galore for the trimming, STAR for the mapping and RSEM for the quantification, all with the same parameters (changing only the parameters related to paired-end input).
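
As a minimal sketch of the subsetting described above, assuming plain uncompressed 4-line FASTQ input. The function names (`read_fastq`, `hard_trim`, `downsample`) are hypothetical and only illustrate the idea; this is not necessarily how the files were actually produced:

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

# One FASTQ record: header, sequence, separator ("+"), quality string.
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an iterable of lines."""
    stripped = (line.rstrip("\n") for line in lines)
    while True:
        record = tuple(islice(stripped, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def hard_trim(record: FastqRecord, length: int) -> FastqRecord:
    """Keep only the first `length` bases and their quality scores."""
    header, seq, sep, qual = record
    return header, seq[:length], sep, qual[:length]


def downsample(lines: Iterable[str], read_length: int,
               max_reads: int) -> Iterator[FastqRecord]:
    """Take the first `max_reads` records and hard-trim each one.

    Note: taking the first N reads is not a random subsample.
    """
    for record in islice(read_fastq(lines), max_reads):
        yield hard_trim(record, read_length)


# Tiny in-memory example (made-up reads, for illustration only).
demo = [
    "@r1", "ACGTACGT", "+", "IIIIIIII",
    "@r2", "TTTTGGGG", "+", "IIIIHHHH",
    "@r3", "CCCCAAAA", "+", "HHHHIIII",
]

for rec in downsample(demo, read_length=4, max_reads=2):
    print("\n".join(rec))
```

For the paired-end subsets, the same `read_length` and `max_reads` would have to be applied to both the R1 and R2 files so that the mates stay in the same order. For the single-end subsets, only the R1 file would be processed.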

The results show that starting from 50bp reads gives more transcript counts than starting from 150bp reads, and that single-end gives more transcript counts than paired-end. I'm a bit concerned, since I don't understand how single-end reads could generate more transcript counts than paired-end reads, and I suspect I'm analyzing something wrong. Do you know how I could do this type of analysis properly?

Thank you all very much,
