Why does BioMart drop and duplicate Ensembl IDs when retrieving the corresponding gene symbols?


I'm trying to convert Ensembl gene IDs to HGNC gene symbols for the rows of a SummarizedExperiment object (more or less an expression matrix) using the biomaRt package.

library(biomaRt)
# Connect to the Ensembl human gene dataset
mart <- useDataset("hsapiens_gene_ensembl", useMart("ENSEMBL_MART_ENSEMBL"))
genes <- rownames(gse_cellgenefiltered_cohort1)
# Query the HGNC symbol for each Ensembl gene ID
G_list <- getBM(filters = "ensembl_gene_id", attributes = c("ensembl_gene_id", "hgnc_symbol"), values = genes, mart = mart)

BioMart returns fewer Ensembl IDs than I supply it with.

length(rownames(gse_cellgenefiltered_cohort1))

[1] 23395

length(G_list$ensembl_gene_id)

[1] 23316
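One way to list exactly which IDs did not come back is `setdiff()`. The snippet below is only a minimal sketch: `supplied` and `returned` are made-up stand-ins for `rownames(gse_cellgenefiltered_cohort1)` and `G_list$ensembl_gene_id`, and the IDs in them are hypothetical, not taken from the real data.

```r
# Hypothetical IDs standing in for the supplied row names
# and for the ensembl_gene_id column returned by getBM().
supplied <- c("ENSG00000000001", "ENSG00000000002",
              "ENSG00000000003", "ENSG00000000004")
returned <- c("ENSG00000000001", "ENSG00000000003", "ENSG00000000003")

# IDs that were supplied but are absent from the returned table.
missing_ids <- setdiff(supplied, returned)
print(missing_ids)
```

With the real objects, the same call would be `setdiff(genes, G_list$ensembl_gene_id)`, and the resulting IDs could then be looked up individually.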

I also noticed that BioMart returns some Ensembl IDs more than once: the result has 23316 rows but only 23314 unique IDs.

length(unique(G_list$ensembl_gene_id))

[1] 23314
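To inspect the repeated rows, and to see whether they carry different symbols, `duplicated()` can be used to pull them out. Again, this is just a sketch on a toy table: `G_toy` is a hypothetical stand-in for `G_list`, and its IDs and symbols are invented for illustration.

```r
# Hypothetical stand-in for the data frame returned by getBM().
G_toy <- data.frame(
  ensembl_gene_id = c("ENSG00000000001", "ENSG00000000003", "ENSG00000000003"),
  hgnc_symbol     = c("SYM1", "SYM3A", "SYM3B"),
  stringsAsFactors = FALSE
)

# IDs that occur more than once in the returned table.
dup_ids <- unique(G_toy$ensembl_gene_id[duplicated(G_toy$ensembl_gene_id)])

# All rows belonging to those IDs, so the symbols can be compared side by side.
dup_rows <- G_toy[G_toy$ensembl_gene_id %in% dup_ids, ]
print(dup_rows)
```

Applied to the real result, replacing `G_toy` with `G_list` would show every row for the IDs that appear more than once.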

As far as I can tell, there are no duplicated Ensembl IDs in the expression matrix itself.

length(unique(rownames(gse_cellgenefiltered_cohort1)))

[1] 23395

Would anyone know why this might be happening?


