genomax, United States, 11 minutes ago

Each WP_* record represents a single, non-redundant protein sequence, which may be annotated on many different RefSeq genomes from the same or different species.

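You could use Entrez Direct (EDirect), but be ready to get a large number of GCF_* assembly records for each WP accession, possibly from different species. The example below looks up the organism name attached to the protein record and lists the RefSeq assembly accessions found for it; for this randomly chosen accession that comes to about 16.5K assemblies.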

$ esearch -db protein -query "WP_000134546" | efetch -format xml | xtract -pattern Seq-entry_seq -element Org-ref_taxname | xargs -n 1 sh -c 'esearch -db assembly -query "$0" | esummary | xtract -pattern DocumentSummary -element RefSeq' | wc -l
   16516

$ esearch -db protein -query "WP_000134546" | efetch -format xml | xtract -pattern Seq-entry_seq -element Org-ref_taxname | xargs -n 1 sh -c 'esearch -db assembly -query "$0" | esummary | xtract -pattern DocumentSummary -element RefSeq' | head -5
GCF_011754615.1
GCF_011751615.1
GCF_011751495.1
GCF_011751485.1
GCF_011466875.1
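
A possibly more direct route, offered as a sketch and not as part of the pipeline above, is the Identical Protein Groups (IPG) report for the WP accession, which should list only the records that carry this exact protein. The helper below is hypothetical (the name gcf_from_ipg is made up for illustration). It assumes the IPG report is tab-separated with a header row that includes a column named "Assembly", and it keeps only the unique GCF_* values from that column.

```shell
#!/bin/sh
# Sketch: extract unique RefSeq assembly accessions (GCF_*) from an
# Identical Protein Groups (IPG) table read on stdin.
# Assumption: the table is tab-separated and its header row has a column
# named "Assembly"; the column is located by name, not by position.
gcf_from_ipg() {
  awk -F '\t' '
    # Header row: remember which column is called "Assembly".
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Assembly") col = i; next }
    # Data rows: keep RefSeq (GCF_) assemblies, skip GenBank (GCA_) ones.
    col && $col ~ /^GCF_/ { print $col }
  ' | sort -u
}

# Hypothetical usage (needs EDirect installed and network access):
#   efetch -db protein -id WP_000134546 -format ipg | gcf_from_ipg | wc -l
```

Because this filters on the protein itself and not on the organism name, the count may differ from the 16516 shown above.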
