Quantification of stranded RNA-seq data



I downloaded a dataset for my species of interest (link). I checked the publication, but it gives no information about the library preparation used. To infer the strandedness of my samples, I ran salmon on the data. Salmon inferred the most likely library type to be ISR (inward, stranded, reverse).

After trimming, I aligned the samples with hisat2 and quantified them with featureCounts using the following commands:

hisat2 --dta --rf -x ../indexes/cro_index -1 fastq_trimmed/${SRA}_1_paired.trimmed.fastq -2 fastq_trimmed/${SRA}_2_paired.trimmed.fastq | samtools view -C -T ../cro_v2_asm.fasta - | samtools sort > cram_files_trimmed/${SRA}.cram;

featureCounts -p -s 2 -a transcriptome_assembly.mRNA.gtf -t exon -g gene_id -o featureCounts/${SRA}.tsv cram_files_trimmed/${SRA}.cram

However, only around 30-40% of the aligned reads were assigned to a feature. Rerunning featureCounts without the -s 2 option increased the assignment rate to 60-70%, which is more in line with other samples from the same species.
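
To make the comparison between strand settings reproducible, the assignment rate can be read from the .summary file that featureCounts writes next to its output table. The following is only a minimal sketch: the helper name assigned_pct is hypothetical, and it defines the rate as Assigned divided by the sum of all status rows in the summary (which may include unmapped reads, so it can differ slightly from a rate computed over aligned reads only).

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of reads/fragments with status "Assigned"
# in a featureCounts .summary file (tab-separated: status, count).
# Assumption: the rate is Assigned / sum of all status rows.
assigned_pct() {
  awk -F'\t' '
    NR > 1 { total += $2; if ($1 == "Assigned") assigned = $2 }  # skip header row
    END    { if (total > 0) printf "%.1f\n", 100 * assigned / total }
  ' "$1"
}

# Example usage (commented out; paths follow the commands above):
# for s in 0 1 2; do
#   featureCounts -p -s "$s" -a transcriptome_assembly.mRNA.gtf -t exon -g gene_id \
#     -o "featureCounts/${SRA}.s${s}.tsv" "cram_files_trimmed/${SRA}.cram"
#   echo "-s $s: $(assigned_pct "featureCounts/${SRA}.s${s}.tsv.summary")% assigned"
# done
```

Running all three settings on one sample shows whether -s 1 and -s 2 differ strongly from each other or whether both sit at roughly half of the -s 0 rate.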

I'm not sure whether I'm misinterpreting the -s option in featureCounts: does -s 2 only count reads mapped to the reverse strand? Should featureCounts be run with -s 0 in this case? Am I missing something?
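
For reference, this is how I understand salmon's library-type codes to correspond to the strandness settings of the other two tools, written as a small lookup. It is a sketch under that assumption, and the function name strand_flags is hypothetical. Note that the hisat2 setting listed here is --rna-strandness, which is a separate option from the --fr/--rf/--ff mate-orientation flags.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a salmon library type into the
# strandness settings commonly used for hisat2 and featureCounts.
# Assumed mapping: ISR/SR -> reverse stranded, ISF/SF -> forward stranded,
# IU/U -> unstranded.
strand_flags() {
  case "$1" in
    ISR)  echo "hisat2: --rna-strandness RF; featureCounts: -s 2" ;;
    ISF)  echo "hisat2: --rna-strandness FR; featureCounts: -s 1" ;;
    SR)   echo "hisat2: --rna-strandness R; featureCounts: -s 2" ;;
    SF)   echo "hisat2: --rna-strandness F; featureCounts: -s 1" ;;
    IU|U) echo "hisat2: (default, unstranded); featureCounts: -s 0" ;;
    *)    echo "unknown library type: $1" >&2; return 1 ;;
  esac
}

# Library type reported by salmon for this dataset
strand_flags ISR
```
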

Thanks in advance





