Right, so there may be some manual work. If you have access to a Unix machine, how many entries do you get with this?

grep '>' my_fasta.fa | sed 's#.[[:digit:]].*##' | uniq
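If the aim is to see which kinds of accession are in the file, a tally per prefix may be easier to read than the raw list. This is a minimal sketch, assuming headers look like `>WP_051684486.1` (prefix, underscore, number); `count_id_prefixes` is a hypothetical helper name, not part of any tool.

```shell
count_id_prefixes() {
  # $1: path to a FASTA file
  # Assumption: the accession is the first word of the header and the
  # prefix is everything before the first "_". IDs without "_" are
  # counted whole.
  awk '/^>/ {
         id = $1                 # first whitespace-delimited token of the header
         sub(/^>/, "", id)       # drop the leading ">"
         sub(/_.*/, "", id)      # keep only the text before the first "_"
         n[id]++
       }
       END { for (p in n) print p, n[p] }' "$1" | sort
}
```

Calling `count_id_prefixes my_fasta.fa` prints one line per prefix followed by its count.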

edit:
I was able to use EDirect (Entrez Direct), the command-line E-utilities tools from NCBI (www.ncbi.nlm.nih.gov/books/NBK179288/). For the command I use here, you need all your protein IDs in a single comma-separated string, which we can build.

Just get the installer, run the shell file to install, and move into that directory.

./install-edirect.sh 
cd edirect
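
The later commands call `esearch` and `efetch` by bare name, so the shell has to be able to find both. A small sanity check, as a sketch; `check_edirect` is a made-up helper name:

```shell
check_edirect() {
  # Report any EDirect tool used below that the shell cannot find.
  for tool in esearch efetch; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not found on PATH: $tool" >&2
      return 1
    fi
  done
  echo "esearch and efetch found"
}
```

If it reports a missing tool, one way to fix that for the current session is to add the install directory to `PATH`, for example `export PATH="$PWD:$PATH"` from inside `edirect`.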

Then we need your protein sequence IDs.

Here's my fasta:

cat tmp.fa
>WP_051684486.1
MSIFGEQFLARRNRDQIDLDNALQDVYEAVTGRESIRYSINSDEQVRKELERICFYLGVRADQDVPEYND
LEDMLDYITRPFAIMRRHILLTHHWWKNGDGPLLVSKKDSDELLALIPGRLGGYYYTDFRSNKKIKLDRH
NAGEFEKEAICFYKPLPLSSLSANELTGLLFKNMAAADLAMLVLSGIGIVGVSLLIPFATKMVFEYVIPT
GAMTLVGSFSFLLISSAMVAYIIAVIKQGYADRVKVRMEVYLTHGVMGRMINFPTSFFASKSTGELYRVF
DNLREIPQILIDSVIVPIIDISLAMLFIIQIAVIVPELLVPAVITVLLQFVCMAIGTFQAYGLLNIELQQ
DRKIQGLAISVYEGIQRIKLSGSESRIMAKWAGLYSKKAKVAYPAVFPVRFQTEMIAFISMMGMLAAFYK
GFTDNISISQFVAFVAAFGMLTGSITAFSNKSKDVIKLKPVLKMSDEILKECPEVSKEKLIVDHLSGKIE
VKDLTFRYGRDLPLILDGVSFTVHPGEYVAIVGKSGCGKSTLVRIFMGFEKAVSGSVSYDDIDVERIDPR
SLRRSIGVVMQSGNLFYDSIYRNIAISAPGLSMEEAWEAAEKAGIAEDIRNMPMKMKTLIPQGGGGISGG
QRQRIMIARALAAKPNILIFDEATSALDNITQKVVQDSLDQLNCTRIVIAHRLSTIQNCDRILVLDKGRI
IEEGNYQELLKKGGFFANLIKRQQL
>WP_013276004.1
MEVLKVSAKSNPNAVAGALAGVIREKGGAEIQIIGAGALNQAVKAIAIARGYVAPSGIDLICIPAFTDIE
IDGQQRTAIKLIVEPR

We can pull the protein sequence IDs out of the headers and join them with commas.

grep '>' tmp.fa | sed 's#>##' | tr '\n' ',' | sed 's#,$##' > protein_ids.csv
cat protein_ids.csv
WP_051684486.1,WP_013276004.1
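
The same list can be built with `paste`, which joins lines without leaving a trailing comma to clean up. This is a sketch, assuming the ID is the first whitespace-delimited word of each header; `fasta_ids_csv` is a hypothetical helper name.

```shell
fasta_ids_csv() {
  # $1: path to a FASTA file; prints the IDs joined by commas on one line
  grep '^>' "$1" |
    sed 's/^>//; s/[[:space:]].*$//' |   # drop ">" and any description text
    paste -sd, -
}
```

`fasta_ids_csv tmp.fa > protein_ids.csv` would then write the same ID string, followed by a final newline.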

From here you have a few options; the simplest is to copy that string into the next command:

esearch -db protein -query WP_051684486.1,WP_013276004.1 | efetch -format fasta
>WP_051684486.1 ATP-binding cassette domain-containing protein [[Clostridium] aminophilum]
MSIFGEQFLARRNRDQIDLDNALQDVYEAVTGRESIRYSINSDEQVRKELERICFYLGVRADQDVPEYND
LEDMLDYITRPFAIMRRHILLTHHWWKNGDGPLLVSKKDSDELLALIPGRLGGYYYTDFRSNKKIKLDRH
NAGEFEKEAICFYKPLPLSSLSANELTGLLFKNMAAADLAMLVLSGIGIVGVSLLIPFATKMVFEYVIPT
GAMTLVGSFSFLLISSAMVAYIIAVIKQGYADRVKVRMEVYLTHGVMGRMINFPTSFFASKSTGELYRVF
DNLREIPQILIDSVIVPIIDISLAMLFIIQIAVIVPELLVPAVITVLLQFVCMAIGTFQAYGLLNIELQQ
DRKIQGLAISVYEGIQRIKLSGSESRIMAKWAGLYSKKAKVAYPAVFPVRFQTEMIAFISMMGMLAAFYK
GFTDNISISQFVAFVAAFGMLTGSITAFSNKSKDVIKLKPVLKMSDEILKECPEVSKEKLIVDHLSGKIE
VKDLTFRYGRDLPLILDGVSFTVHPGEYVAIVGKSGCGKSTLVRIFMGFEKAVSGSVSYDDIDVERIDPR
SLRRSIGVVMQSGNLFYDSIYRNIAISAPGLSMEEAWEAAEKAGIAEDIRNMPMKMKTLIPQGGGGISGG
QRQRIMIARALAAKPNILIFDEATSALDNITQKVVQDSLDQLNCTRIVIAHRLSTIQNCDRILVLDKGRI
IEEGNYQELLKKGGFFANLIKRQQL
>WP_013276004.1 MULTISPECIES: stage V sporulation protein S [Thermosediminibacter]
MEVLKVSAKSNPNAVAGALAGVIREKGGAEIQIIGAGALNQAVKAIAIARGYVAPSGIDLICIPAFTDIE
IDGQQRTAIKLIVEPR
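
To avoid the copy and paste, the ID string can be read from the file with command substitution. This sketch passes the IDs straight to `efetch` with `-db` and `-id` rather than going through `esearch`; that is a swapped-in variant, not the command used above, and `fetch_fasta_by_ids` is a hypothetical wrapper name.

```shell
fetch_fasta_by_ids() {
  # $1: file holding one comma-separated line of accessions,
  #     e.g. the protein_ids.csv written earlier
  ids=$(cat "$1")
  efetch -db protein -id "$ids" -format fasta
}
```

`fetch_fasta_by_ids protein_ids.csv > annotated.fa` would save the annotated records; `annotated.fa` is only an example file name.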


