Posted 2 hours ago by Ridha

Hello everyone!
I have a question regarding the biomaRt package. I have an RNA-seq dataset where the gene identifiers are gene names (I think), and I want to retrieve the Ensembl ID and the description of each gene. However, the number of rows returned by the annotation is different from the number of rows I want to annotate. Is something wrong with my filter? I am using hgnc_symbol as the filter. I am not sure whether what I have are actually gene names, as I am suspicious about the "genes" that start with RP1. Are these perhaps transcript IDs rather than gene names? Additionally, the annotation I got contains redundant Ensembl IDs for the same gene ID. What should I do?
Thank you very much in advance for your help!

#filtered_resdf$gene_id
[1] "WASH7P"         "RP11-34P13.15"  "RP11-34P13.16"  "FO538757.1"     "U6"             "RP5-857K21.4"   "MTND1P23"      
[8] "MTND2P28"       "MTCO1P12"       "MTCO2P12"       "MTATP6P1"       "MTCO3P12"       "RP11-206L10.2"  "RP11-206L10.9" 
[15] "RP11-206L10.8"  "FAM87B"         "LINC01128"      "LINC00115"      "RP11-54O7.3"    "SAMD11"         "NOC2L"         
[22] "KLHL17"         "HES4"           "ISG15"          "AGRN"           "C1orf159"       "SDF4"           "B3GALT6"       
[29] "FAM132A"        "UBE2J2"         "SCNN1D"         "ACAP3"          "PUSL1"          "CPSF3L"         "RP5-890O3.9"   
[36] "CPTP"           "TAS1R3"         "DVL1"           "MXRA8"          "AURKAIP1"       "CCNL2"          "RP4-758J18.2"  
[43] "MRPL20"         "RP4-758J18.13"  "VWA1"           "ATAD3B"         "ATAD3A"         "TMEM240"        "SSU72"         
[50] "RP5-832C2.5"    "FNDC10"         "RP11-345P4.9"   "MIB2"           "MMP23B"         "CDK11B"         "RP11-345P4.10"
library(biomaRt)

# Connect to the Ensembl genes mart and select the human dataset
ensembl <- useEnsembl("ensembl", verbose = TRUE)
ensembl <- useDataset("hsapiens_gene_ensembl", mart = ensembl)

annotation <- getBM(attributes = c("ensembl_gene_id", "description", "external_gene_name"),
  filters = "hgnc_symbol",
  values = filtered_resdf$gene_id,  # values are what you want to look up
  mart = ensembl)
nrow(annotation)      # gives 15694
nrow(filtered_resdf)  # gives 18748
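
A minimal sketch of how the mismatch could be diagnosed, using base R on toy data rather than a live biomaRt query. getBM() only returns rows for values that match the filter, so symbols without a match drop out silently, while a symbol that maps to several Ensembl genes comes back as several rows. The sketch assumes "hgnc_symbol" has been added to the attributes so that each returned row can be traced back to its input; the Ensembl IDs and the small `annotation` table below are made-up placeholders, not real query results.

```r
# Toy stand-ins (hypothetical values) for filtered_resdf$gene_id and the getBM() result
query_symbols <- c("SAMD11", "NOC2L", "RP11-34P13.15", "U6")
annotation <- data.frame(
  ensembl_gene_id  = c("ENSG_PLACEHOLDER_1", "ENSG_PLACEHOLDER_2", "ENSG_PLACEHOLDER_3"),
  hgnc_symbol      = c("SAMD11", "NOC2L", "NOC2L"),
  stringsAsFactors = FALSE
)

# Input names that returned no row at all (not recognised by the hgnc_symbol filter)
unmatched <- setdiff(query_symbols, annotation$hgnc_symbol)

# Symbols that came back with more than one Ensembl gene ID
dup_symbols <- unique(annotation$hgnc_symbol[duplicated(annotation$hgnc_symbol)])

# Keep the first row per symbol, then left-join so every input keeps a row
# (unmatched inputs get NA in the annotation columns)
annotation_unique <- annotation[!duplicated(annotation$hgnc_symbol), ]
annotated <- merge(
  data.frame(gene_id = query_symbols, stringsAsFactors = FALSE),
  annotation_unique,
  by.x = "gene_id", by.y = "hgnc_symbol",
  all.x = TRUE
)

print(unmatched)
print(dup_symbols)
print(nrow(annotated) == length(query_symbols))
```

Keeping the first row per symbol is only one possible choice; inspecting `dup_symbols` first shows whether the duplicates matter for the analysis. Note that merge() sorts the result by the key column, so row order differs from the input.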
