How to search dbSNP using a list of SNPs and retrieve gene names (HGNC symbol if available, otherwise whatever name is recorded)

I have a list of 500,000 SNPs for which I want to obtain gene names. I tried searching with biomaRt:

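library(data.table)
library(biomaRt)

rs <- fread("SNPs.txt")
ensembl_version <- "https://dec2016.archive.ensembl.org"
# Pass the archive URL as host so the pinned Ensembl release is actually used
ensembl <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp", host = ensembl_version)

# values should be a plain vector of rsIDs, so take the first column of the data.table
getBM(attributes = c("refsnp_id", "associated_gene"), filters = "snp_filter", values = rs[[1]], mart = ensembl, uniqueRows = TRUE)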

However, many of the SNPs return NA or simply nothing, as shown here:

      refsnp_id associated_gene
1      rs425277           PRKCZ
2     rs1571149                
3     rs1240707                
4     rs1240708                
5      rs873927                
6      rs880051           SSU72
7      rs904589                
8      rs908742                
9      rs909823                
10     rs925905                
11       rs7290                
12       rs7407                
13    rs1878745                
14    rs2296716           SSU72
15    rs2298217                
16    rs2459994

When I look up on dbSNP some of the rsIDs that returned no gene name, they are in fact associated with a gene in that database. My question is: how can I connect biomaRt to dbSNP and retrieve the correct gene names for all the SNPs in the list 'SNPs.txt'?
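
For reference, here is a minimal sketch of one possible way to get a gene for every SNP, not a confirmed solution. It rests on the assumption that "associated_gene" only holds genes reported in phenotype annotations, whereas "ensembl_gene_stable_id" lists the Ensembl genes a variant overlaps. Those gene IDs are then mapped to "hgnc_symbol" (falling back to "external_gene_name") through the gene mart. The helper names split_batches, pick_gene_label, batched_getBM and snp_to_gene are hypothetical, as is the batch size of 500. The sketch queries the current Ensembl release, so an archive host would have to be added to both useMart() calls to pin a version.

```r
# Split a vector of IDs into batches of at most `size` elements,
# so that a very large list is not sent to the server in one request.
split_batches <- function(ids, size = 500) {
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# Prefer the HGNC symbol; fall back to the external gene name,
# and finally to the Ensembl gene ID itself.
pick_gene_label <- function(hgnc, external, ensembl_id) {
  out <- ifelse(!is.na(hgnc) & hgnc != "", hgnc, external)
  ifelse(!is.na(out) & out != "", out, ensembl_id)
}

# Run getBM() once per batch and stack the results (needs network access).
batched_getBM <- function(values, batch_size, ...) {
  do.call(rbind, lapply(split_batches(values, batch_size), function(batch) {
    biomaRt::getBM(values = batch, ...)
  }))
}

# Map rsIDs to a gene label via the overlapping Ensembl gene.
snp_to_gene <- function(rs_ids, batch_size = 500) {
  snp_mart  <- biomaRt::useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
  gene_mart <- biomaRt::useMart("ENSEMBL_MART_ENSEMBL",
                                dataset = "hsapiens_gene_ensembl")

  # Step 1: rsID -> Ensembl gene ID (empty for intergenic variants)
  snp_hits <- batched_getBM(unique(rs_ids), batch_size,
                            attributes = c("refsnp_id", "ensembl_gene_stable_id"),
                            filters = "snp_filter", mart = snp_mart)

  # Step 2: Ensembl gene ID -> HGNC symbol / external gene name
  gene_ids <- setdiff(unique(snp_hits$ensembl_gene_stable_id), c("", NA))
  genes <- batched_getBM(gene_ids, batch_size,
                         attributes = c("ensembl_gene_id", "hgnc_symbol",
                                        "external_gene_name"),
                         filters = "ensembl_gene_id", mart = gene_mart)

  # Step 3: join both tables and pick the best available label per row
  merged <- merge(snp_hits, genes, by.x = "ensembl_gene_stable_id",
                  by.y = "ensembl_gene_id", all.x = TRUE)
  merged$gene <- pick_gene_label(merged$hgnc_symbol,
                                 merged$external_gene_name,
                                 merged$ensembl_gene_stable_id)
  merged[, c("refsnp_id", "gene")]
}

# Example call (not run here, requires network access):
# result <- snp_to_gene(rs[[1]])
```

SNPs that do not overlap any annotated gene would still come back with an empty gene column under this approach, and a SNP overlapping several genes would appear on several rows. dbSNP may additionally report nearby genes, so the two sources will not necessarily agree for every rsID.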


biomaRt


R
