Hello everyone!

In my PhD thesis, I am working with differential expression analysis, trying to verify genes related to thermoregulation in beef cattle.
I have 66 sequenced, paired-end RNA samples, divided into 2 lanes per sample.
The first steps were to check the quality of these samples by FASTQC, then do the trimming (using the Trimmomatic software with the parameters: LEADING: 30 TRAILING: 30 SLIDINGWINDOW: 5: 20 AVGQUAL: 20 MINLEN: 50) and then checking the quality again of these samples by FASTQC.
The FASTQC reports did not show anything out of the ordinary, so I proceeded to align the samples using the Hisat2 software.

All of these steps were done on the Linux terminal in bash.
Post alignment I went to the R software, to count reads with FeatureCounts, the command I used was as follows:

fc <- featureCounts(files = fls, 
                    annot.ext ='/home/.../Bos_taurus.ARS-UCD1.2.102.gtf' , 
                    isGTFAnnotationFile = TRUE,
                    GTF.featureType = "exon", 
                    GTF.attrType = "gene_id", 
                    useMetaFeatures = TRUE, 
                    strandSpecific = 0, 
                    isPairedEnd = TRUE, 
                    nthreads = 30)

As my samples were in two lanes, I added the read count by samples with the following command:

cont_fc2=(cont_fc[1:(ncol(cont_fc)-1)] + cont_fc[2:ncol(cont_fc)])[c(T,F)]

That done, I started to analyze the differential expression with Deseq. The results I got were few genes differentially expressed as in the example below:

out of 20328 with nonzero total read count
adjusted p-value <0.1
LFC> 0 (up): 1, 0.0049%
LFC <0 (down): 4, 0.02%
outliers [1]: 22, 0.11%
low counts [2]: 0, 0%
 (mean count <0)

Most of the analyzes were made using the “Beginner's guide to using the DESeq2 package”, by Love, M .; Anders, S. and Huber, W.
I did the test with EdgeR, but the results were similar.
To check if I did something wrong, I used the data that was used in the mentioned guide, data from the "parathyroidSE" package and also the raw RNA sequencing data mentioned in this guide so that I could do the same steps that I did with my data. I used the same parameters that I used in my data. This was done to check if something I did before the differential expression analysis, had an influence on the low results I found.
The results were as follows:

results with the ready-made data from the “parathyroidSE” package:

out of 32082 with nonzero total read count
adjusted p-value <0.1
LFC> 0 (up): 38, 0.12%
LFC <0 (down): 44, 0.14%
outliers [1]: 0, 0%
low counts [2]: 24200, 75%
(mean count <210)

results with the parameters I used in my alignment:

out of 25980 with nonzero total read count
adjusted p-value <0.1
LFC> 0 (up): 14, 0.054%
LFC <0 (down): 14, 0.054%
outliers [1]: 0, 0%
low counts [2]: 22,946, 88%
(mean count <116)

As you can see, there is a smaller amount of genes when I used the steps I did in my alignment. Can anyone help me with suggestions of what may have happened?


Source link