DESeq2: impact of the number of expressed genes on differential expression analysis
I have 10 RNA-seq samples in two groups (GroupA and GroupB), and 3 of the GroupB samples have only about half as many expressed genes (i.e., genes with count > 0) as the other samples.
I also found that, after normalization in DESeq2, the expression levels of all the lowly expressed genes in those samples were inflated (>10 times higher than in the other samples).
I think discarding those samples is the best solution, but if I really have to keep them in the analysis, what would be the best way to handle this situation?
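One plausible explanation, offered as an assumption rather than a diagnosis: samples with only half the genes detected are often much less deeply sequenced. DESeq2's default normalization (median-of-ratios, used by `estimateSizeFactors`) then gives them small size factors, and dividing a count of 1 or 2 by a small size factor yields a large normalized value. The sketch below re-implements that size-factor calculation in NumPy for illustration only (DESeq2 itself is an R package); the function name and the toy genes × samples matrix, whose last column stands in for a shallow GroupB sample, are hypothetical.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Per-sample size factors for a genes x samples count matrix (DESeq2-style "ratio" method)."""
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)  # log(0) -> -inf
    # Log of each gene's geometric mean across samples; -inf if any count is zero
    log_geo_means = log_counts.mean(axis=1)
    # Only genes that are non-zero in every sample serve as the reference
    usable = np.isfinite(log_geo_means)
    log_ratios = log_counts[usable] - log_geo_means[usable, np.newaxis]
    # Size factor = median ratio of each sample to the per-gene reference
    return np.exp(np.median(log_ratios, axis=0))


# Hypothetical toy data: 5 genes x 4 samples; the last sample is ~10x shallower
counts = np.array([
    [1000, 1100, 900, 100],  # highly expressed
    [500, 480, 520, 50],
    [200, 210, 190, 20],
    [5, 4, 6, 0],            # lowly expressed, dropped out in the shallow sample
    [3, 2, 4, 1],            # lowly expressed, a single read in the shallow sample
])

detected = (counts > 0).sum(axis=0)   # genes with count > 0, per sample
size_factors = median_of_ratios_size_factors(counts)
normalized = counts / size_factors    # divides each column (sample) by its size factor

print("detected genes per sample:", detected)
print("size factors:", np.round(size_factors, 3))
print("normalized counts of the last gene:", np.round(normalized[4], 2))
```

In this toy example the shallow sample's size factor is roughly a tenth of the others, so its single read for the last gene ends up with a higher normalized value than the 2 to 4 reads observed in the deeper samples.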
Prior to running anything in DESeq2, you should pre-filter the raw counts, e.g., by mean raw count > 10 (i.e., include only those genes whose mean raw count across all samples is greater than 10).
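A minimal sketch of that mean-count pre-filter, written in Python/NumPy for illustration; the helper name, gene IDs, and counts are hypothetical, and the threshold of 10 is the example value from this answer rather than a fixed rule. In R, the equivalent step on a `DESeqDataSet` is commonly written as `dds <- dds[rowMeans(counts(dds)) > 10, ]`.

```python
import numpy as np


def prefilter_by_mean(counts, gene_ids, min_mean=10.0):
    """Keep genes (rows) whose mean raw count across all samples exceeds min_mean."""
    counts = np.asarray(counts)
    keep = counts.mean(axis=1) > min_mean  # boolean mask, one entry per gene
    kept_ids = [gene for gene, k in zip(gene_ids, keep) if k]
    return counts[keep], kept_ids


# Hypothetical raw counts: 4 genes x 4 samples
raw = np.array([
    [120, 95, 110, 80],  # mean 101.25 -> kept
    [3, 0, 12, 1],       # mean 4.0    -> removed
    [15, 9, 11, 8],      # mean 10.75  -> kept
    [0, 0, 0, 2],        # mean 0.5    -> removed
])
filtered, ids = prefilter_by_mean(raw, ["geneA", "geneB", "geneC", "geneD"])
print(ids)  # ['geneA', 'geneC']
```

Because the filter uses the mean across all samples, a gene with zero counts in the shallow samples can still pass if it is well expressed elsewhere, so filtering alone does not fix a low-complexity sample.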
Even before you get to that stage, you should check standard quality control (QC) metrics relating to the FASTQ files and the alignment that was performed.
I am unsure why you mention discarding these samples without mentioning any QC metrics that they may have failed.