2 hours ago by vkkodali (United States)

You can download and parse the gene2refseq file from the NCBI FTP site, which contains these mappings: ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz.
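A minimal sketch of the parsing step, assuming the file is tab-delimited with the GeneID in column 2 and the protein accession.version in column 6, and with "-" marking a missing value (worth confirming against the header line of your copy). The function name gene2refseq_proteins is hypothetical, and gene_id_list.txt is assumed to hold one gene ID per line:

```shell
# Hypothetical helper: reads gene2refseq-formatted text on stdin and prints
# unique "GeneID<TAB>protein accession" pairs for the gene IDs listed in "$1".
# Assumption: column 2 is GeneID and column 6 is protein_accession.version.
gene2refseq_proteins() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { wanted[$1]; next }   # first input: remember requested gene IDs
        /^#/      { next }               # skip the header line
        ($2 in wanted) && $6 != "-" && !seen[$2 OFS $6]++ { print $2, $6 }
    ' "$1" -
}

# Usage, assuming gene2refseq.gz has already been downloaded:
# gzip -dc gene2refseq.gz | gene2refseq_proteins gene_id_list.txt > gene2proteins_long.tsv
```

Because the same gene/protein pair can appear on several rows of the file (for example, once per genomic placement), the sketch prints each pair only once.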

If you just have a few gene IDs to work with, you can use Entrez Direct as follows:

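# For each gene ID, print the ID, a tab, then its RefSeq proteins as a comma-separated list
cat gene_id_list.txt | while read -r gid ; do
    echo -ne "$gid\t" ;
    elink -db gene -id "$gid" -target protein -name gene_protein_refseq \
        | efetch -format acc \
        | paste -s -d ',' - ;
done > gene2proteins.tsv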
396320  NP_990694.1,XP_015144082.1
395771  NP_990262.1,XP_025000385.1,XP_015133186.1,XP_015133180.1,XP_015133175.1

awk 'BEGIN{FS="\t";OFS="\t"}{a=split($2,x,","); for (i=1;i<=a;++i) {print $1,x[i]}}' gene2proteins.tsv
396320  NP_990694.1
396320  XP_015144082.1
395771  NP_990262.1
395771  XP_025000385.1
395771  XP_015133186.1
395771  XP_015133180.1
395771  XP_015133175.1


