Hello everyone!
In my PhD thesis, I am working with differential expression analysis, trying to verify genes related to thermoregulation in beef cattle.
I have 66 sequenced, paired-end RNA samples, divided into 2 lanes per sample.
The first steps were to check the quality of these samples by FASTQC, then do the trimming (using the Trimmomatic software with the parameters: LEADING: 30 TRAILING: 30 SLIDINGWINDOW: 5: 20 AVGQUAL: 20 MINLEN: 50) and then checking the quality again of these samples by FASTQC.
The FASTQC reports did not show anything out of the ordinary, so I proceeded to align the samples using the Hisat2 software.
All of these steps were done on the Linux terminal in bash.
Post alignment I went to the R software, to count reads with FeatureCounts, the command I used was as follows:
fc <- featureCounts(files = fls,
annot.ext ='/home/.../Bos_taurus.ARS-UCD1.2.102.gtf' ,
isGTFAnnotationFile = TRUE,
GTF.featureType = "exon",
GTF.attrType = "gene_id",
useMetaFeatures = TRUE,
strandSpecific = 0,
isPairedEnd = TRUE,
nthreads = 30)
As my samples were in two lanes, I added the read count by samples with the following command:
cont_fc2=(cont_fc[1:(ncol(cont_fc)-1)] + cont_fc[2:ncol(cont_fc)])[c(T,F)]
That done, I started to analyze the differential expression with Deseq. The results I got were few genes differentially expressed as in the example below:
out of 20328 with nonzero total read count
adjusted p-value <0.1
LFC> 0 (up): 1, 0.0049%
LFC <0 (down): 4, 0.02%
outliers [1]: 22, 0.11%
low counts [2]: 0, 0%
(mean count <0)
Most of the analyzes were made using the “Beginner's guide to using the DESeq2 package”, by Love, M .; Anders, S. and Huber, W.
I did the test with EdgeR, but the results were similar.
To check if I did something wrong, I used the data that was used in the mentioned guide, data from the "parathyroidSE" package and also the raw RNA sequencing data mentioned in this guide so that I could do the same steps that I did with my data. I used the same parameters that I used in my data. This was done to check if something I did before the differential expression analysis, had an influence on the low results I found.
The results were as follows:
results with the ready-made data from the “parathyroidSE” package:
out of 32082 with nonzero total read count
adjusted p-value <0.1
LFC> 0 (up): 38, 0.12%
LFC <0 (down): 44, 0.14%
outliers [1]: 0, 0%
low counts [2]: 24200, 75%
(mean count <210)
results with the parameters I used in my alignment:
out of 25980 with nonzero total read count
adjusted p-value <0.1
LFC> 0 (up): 14, 0.054%
LFC <0 (down): 14, 0.054%
outliers [1]: 0, 0%
low counts [2]: 22,946, 88%
(mean count <116)
As you can see, there is a smaller amount of genes when I used the steps I did in my alignment. Can anyone help me with suggestions of what may have happened?
Thanks!