I have a list of differentially expressed lncRNAs (from RNA-seq) and I would like to run GO and KEGG pathway analysis on them to find out which GO terms and pathways are most upregulated or downregulated in my condition vs. control.
The only identifier present for all the lncRNAs is their ENSEMBL gene ID. Other identifiers such as SYMBOL or ENTREZID cannot be mapped to all of them: only 3724 of the 13868 lncRNAs in the full list could be mapped (1685 of the lncRNAs are differentially expressed).
I have tried clusterProfiler, which works with the log2FC of all the lncRNA genes and uses the ENSEMBL ID as the identifier. It worked well with mRNA genes, and I got a result table with the top upregulated and downregulated GO terms.
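# Suppose I have a data frame 'df' with two columns, ENSEMBLID and log2fc, for all lncRNAs
library(clusterProfiler)
library(org.Hs.eg.db)
gene_list <- df$log2fc
names(gene_list) <- df$ENSEMBLID
gene_list <- sort(gene_list, decreasing = TRUE)  # sort() keeps the names; order() would only return indices
# keyType is set because gseGO() defaults to ENTREZID; everything else is left at defaults
gse <- gseGO(geneList = gene_list, ont = "ALL", OrgDb = org.Hs.eg.db, keyType = "ENSEMBL")
gse@result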
For mRNA genes, this gives me a result table with GO terms, p-values, etc., but for lncRNA genes it gives me a table with 0 rows.
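One check that might help explain the empty table is counting how many of the IDs in the ranked vector carry any GO annotation at all. The following is only a rough sketch: `annotation_coverage` is a hypothetical helper (not part of clusterProfiler), the log2FC values are made up, and the three Ensembl IDs (TP53, MALAT1, HOTAIR) are just a toy stand-in for the real lncRNA list. The annotation lookup runs only if AnnotationDbi and org.Hs.eg.db are installed.

```r
# Hypothetical helper (not part of clusterProfiler): how many IDs in a named,
# ranked vector appear in a given set of annotated IDs.
annotation_coverage <- function(ranked, annotated_ids) {
  hits <- names(ranked) %in% annotated_ids
  c(total = length(hits), annotated = sum(hits), fraction = mean(hits))
}

# Toy input: real Ensembl gene IDs (TP53, MALAT1, HOTAIR) with made-up log2FC values.
toy_ranked <- sort(
  c(ENSG00000141510 = 1.8, ENSG00000251562 = -0.4, ENSG00000228630 = -2.1),
  decreasing = TRUE
)

# Optional lookup: only runs if the Bioconductor packages are available.
if (requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  # One row per ENSEMBL ID / GO term pair; GO is NA for IDs without annotation.
  go_map <- suppressMessages(AnnotationDbi::select(
    org.Hs.eg.db::org.Hs.eg.db,
    keys = names(toy_ranked), keytype = "ENSEMBL", columns = "GO"
  ))
  go_annotated <- unique(go_map$ENSEMBL[!is.na(go_map$GO)])
  print(annotation_coverage(toy_ranked, go_annotated))
}
```

With the real data, the toy vector would be replaced by the named log2FC vector passed to gseGO. If only a small fraction of the lncRNA IDs is annotated, an empty enrichment table would not be surprising, since few gene sets would contain enough of the input genes to be tested.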
Similarly, I tried the gage package, which uses ENTREZID as the identifier. This worked well with mRNA genes, and once again I got the top enriched GO terms and KEGG pathways. For lncRNA genes, I did get separate result tables for GO terms (BP, MF and CC), but the lowest p-value among the BP terms is 0.02 and only 12 BP terms have p < 0.05. Similarly, MF and CC have very few terms with p < 0.05. The KEGG pathway results table is all NA. Also, since only 3724 of the 13868 lncRNAs could be mapped to an ENTREZID, I am not sure that using ENTREZID as the identifier is a good idea.
So, I am looking for tools (R-based or web-based) that can be used for GO term and KEGG pathway analysis of lncRNA genes with the ENSEMBL ID as the identifier. Any help will be very much appreciated.