In my PhD thesis, I am working with differential expression analysis, trying to verify genes related to thermoregulation in beef cattle.
I have 66 sequenced, paired-end RNA samples, divided into 2 lanes per sample.
The first steps were to check the quality of these samples by FASTQC, then do the trimming (using the Trimmomatic software with the parameters: LEADING: 30 TRAILING: 30 SLIDINGWINDOW: 5: 20 AVGQUAL: 20 MINLEN: 50) and then checking the quality again of these samples by FASTQC.
The FASTQC reports did not show anything out of the ordinary, so I proceeded to align the samples using the Hisat2 software.
All of these steps were done on the Linux terminal in bash.
Post alignment I went to the R software, to count reads with FeatureCounts, the command I used was as follows:
fc <- featureCounts(files = fls, annot.ext ='/home/.../Bos_taurus.ARS-UCD1.2.102.gtf' , isGTFAnnotationFile = TRUE, GTF.featureType = "exon", GTF.attrType = "gene_id", useMetaFeatures = TRUE, strandSpecific = 0, isPairedEnd = TRUE, nthreads = 30)
As my samples were in two lanes, I added the read count by samples with the following command:
cont_fc2=(cont_fc[1:(ncol(cont_fc)-1)] + cont_fc[2:ncol(cont_fc)])[c(T,F)]
That done, I started to analyze the differential expression with Deseq. The results I got were few genes differentially expressed as in the example below:
out of 20328 with nonzero total read count adjusted p-value <0.1 LFC> 0 (up): 1, 0.0049% LFC <0 (down): 4, 0.02% outliers : 22, 0.11% low counts : 0, 0% (mean count <0)
Most of the analyzes were made using the “Beginner's guide to using the DESeq2 package”, by Love, M .; Anders, S. and Huber, W.
I did the test with EdgeR, but the results were similar.
To check if I did something wrong, I used the data that was used in the mentioned guide, data from the "parathyroidSE" package and also the raw RNA sequencing data mentioned in this guide so that I could do the same steps that I did with my data. I used the same parameters that I used in my data. This was done to check if something I did before the differential expression analysis, had an influence on the low results I found.
The results were as follows:
results with the ready-made data from the “parathyroidSE” package:
out of 32082 with nonzero total read count adjusted p-value <0.1 LFC> 0 (up): 38, 0.12% LFC <0 (down): 44, 0.14% outliers : 0, 0% low counts : 24200, 75% (mean count <210)
results with the parameters I used in my alignment:
out of 25980 with nonzero total read count adjusted p-value <0.1 LFC> 0 (up): 14, 0.054% LFC <0 (down): 14, 0.054% outliers : 0, 0% low counts : 22,946, 88% (mean count <116)
As you can see, there is a smaller amount of genes when I used the steps I did in my alignment. Can anyone help me with suggestions of what may have happened?