vkkodali (United States), 3 hours ago

You can use Entrez Direct for this as shown below:

$ cat accs.txt
NC_008803.1
NCVQ01000001.1
NC_039364.1
NC_005101.4

$ epost -db nuccore -input accs.txt -format acc \
  | elink -target protein \
  | efetch -format fasta
>NP_001008767.1 thioredoxin-interacting protein [Rattus norvegicus]
MVMFKKIKSFEVVFNDPEKVYGSGEKVAGRVTVEVCEVTRVKAVRILACGVAKVLWMQGSQQCKQTLDYL

However, the four accessions in your list return nearly 24,000 proteins, and downloading that many sequences with efetch can be very slow. If you are doing this for entire chromosomes, you may be better off with the following three-step approach:

  1. use efetch with -format acc instead of -format fasta to download only the list of protein accessions
  2. download the complete protein datasets for your organisms of interest from the NCBI FTP site
  3. use a separate program such as seqkit to extract the protein accessions of interest from those datasets
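
The three steps above could be sketched roughly as follows. This is only an outline under some assumptions: the function names and the file names (protein_accs.txt, proteins.faa, selected_proteins.faa) are placeholders I made up, and step 2 is left as a manual download because the right FTP path depends on your organisms.

```shell
# Step 1: list the protein accessions linked to the nucleotide accessions
# in the given file, without downloading any sequences.
list_protein_accessions() {
    epost -db nuccore -input "$1" -format acc \
        | elink -target protein \
        | efetch -format acc
}

# Step 3: extract the listed accessions from a locally downloaded protein
# FASTA file. By default, seqkit grep matches each pattern in the list
# against the sequence ID (the first word of the FASTA header).
# $1 = file of protein accessions, $2 = protein FASTA file (placeholder names)
extract_proteins() {
    seqkit grep -f "$1" "$2"
}

# Example usage (step 2, the FTP download, happens in between):
#   list_protein_accessions accs.txt > protein_accs.txt
#   extract_proteins protein_accs.txt proteins.faa > selected_proteins.faa
```

Because seqkit matches on the full ID, the accessions in the list need to carry the same version suffix (for example NP_001008767.1) as the headers in the downloaded FASTA file.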


