Over-representation Gene Ontology Analysis on Subset of DE Genes


I ran a likelihood ratio test on a three-condition comparison and got a very large number of significantly differentially expressed (DE) genes (>8,000). When I performed over-representation analysis on all DE genes, no GO terms were significantly over-represented. I assume this is because the significant list makes up about half of the background (all genes tested for differential expression). Would it be statistically incorrect to take only the top results (say, the top 1,000 DE genes by adjusted p-value) and perform over-representation analysis on that subset? It seems wrong to use only a portion of the significant results, but as a student with limited statistical knowledge I wanted to check.
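To make the concern about list size concrete, here is a minimal sketch of the one-sided hypergeometric test that over-representation analysis is typically based on. The numbers are assumptions for illustration only: a background of 16,000 tested genes (roughly twice the 8,000 DE genes mentioned above) and a hypothetical GO term annotating 100 of them. The function names are made up for this example and do not come from any enrichment package.

```python
from math import comb


def hypergeom_sf(k, background, term_size, list_size):
    """P(X >= k), where X is the number of term genes found in a gene list
    of `list_size` drawn without replacement from `background` genes, of
    which `term_size` are annotated to the term."""
    upper = min(term_size, list_size)
    total = comb(background, list_size)
    favourable = sum(
        comb(term_size, i) * comb(background - term_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def max_fold_enrichment(background, term_size, list_size):
    """Largest possible observed/expected ratio for a term, reached when as
    many term genes as possible fall inside the gene list."""
    best_overlap = min(term_size, list_size)
    return (best_overlap * background) / (list_size * term_size)


# Assumed sizes (not taken from real data).
BACKGROUND = 16_000
TERM_SIZE = 100

for list_size in (8_000, 1_000):
    cap = max_fold_enrichment(BACKGROUND, TERM_SIZE, list_size)
    # Overlap that would correspond to a 1.2-fold enrichment of the term.
    expected = TERM_SIZE * list_size / BACKGROUND
    overlap = round(1.2 * expected)
    p = hypergeom_sf(overlap, BACKGROUND, TERM_SIZE, list_size)
    print(f"list={list_size}: max fold enrichment={cap}, "
          f"overlap={overlap}, p={p:.3g}")
```

Under these assumptions, a list covering half of the background caps any term's fold enrichment at 2, while a 1,000-gene list allows up to 16. This only illustrates how the test depends on list size; it does not settle whether subsetting by adjusted p-value is a valid analysis choice.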



