You have a list of genes from DEG analysis, with p-values, FDRs, logFCs, etc. Previously, what I did for GSEA was to keep only genes with FDR < 0.25 or 0.05, rank them by logFC (in other words, pre-rank the genes by logFC), and then run GSEA. Now I am wondering whether this is a good approach:
There might be too many genes left after filtering (typically ~50%). Assuming there are usually 4-5 pathways involved and each pathway has about 500 genes,
then the top 2,000 genes might be enough to include in the GSEA analysis.
I am not sure logFC is the best way to rank genes. Maybe use -log(PValue) as the magnitude of the rank score and the sign of
logFC as the sign of the score, i.e., use sign(logFC) * (-log(PValue))
as the rank score?
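
To make the proposed score concrete, here is a minimal Python sketch of sign(logFC) * (-log(PValue)). The function names, the toy gene names, the choice of log base 10, and the floor on p-values (so that p = 0 does not give an infinite score) are all my assumptions for illustration, not something prescribed by GSEA.

```python
import math


def signed_log_p(log_fc, p_value, min_p=1e-300):
    """Return sign(logFC) * -log10(p).

    min_p is an assumed floor so that a reported p-value of 0 stays finite.
    """
    p = max(p_value, min_p)
    sign = (log_fc > 0) - (log_fc < 0)  # +1, 0, or -1
    return sign * -math.log10(p)


def rank_genes(de_rows):
    """Score (gene, logFC, p-value) rows and sort from most up to most down."""
    scored = [(gene, signed_log_p(log_fc, p)) for gene, log_fc, p in de_rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical differential expression results: (gene, logFC, p-value).
de_rows = [
    ("GENE_A", 2.1, 1e-8),
    ("GENE_B", -1.5, 1e-4),
    ("GENE_C", 0.2, 0.5),
    ("GENE_D", -3.0, 1e-12),
]

ranked = rank_genes(de_rows)
for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

With this score, strongly significant up-regulated genes land at the top of the list, strongly significant down-regulated genes at the bottom, and genes with weak evidence in either direction end up near zero in the middle.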
The GSEA algorithm does not filter the expression dataset and does not
benefit from your filtering of the expression dataset. During the
analysis, genes that are poorly expressed or that have low variance
across the dataset populate the middle of the ranked gene list and the
use of a weighted statistic ensures that they do not contribute to a
positive enrichment score. By removing such genes from your dataset,
you may actually reduce the power of the statistic.
We hopefully will be able to devote some time to investigating this,
but in the meantime, we are recommending use of the GSEAPreranked
tool for conducting gene set enrichment analysis of data derived from
RNA-seq experiments. In particular: Prior to conducting gene set
enrichment analysis, conduct your differential expression analysis
using any of the tools developed by the bioinformatics community
(e.g., cuffdiff, edgeR, DESeq, etc.). Based on your differential
expression analysis, rank your features and capture your ranking in an
RNK-formatted file. The ranking metric can be whatever measure of
differential expression you choose from the output of your selected DE
tool. For example, cuffdiff provides the (base 2) log of the fold