Hi all,
I've been using entrez_fetch in the R package rentrez v1.2.2 to extract nucleotide sequences in FASTA format for a large number of GIDs. For a small minority I've found that entrez_fetch simply returns an empty string with a newline character - example below.
> entrez_fetch(db = "nuccore", id = "108597802", rettype="fasta_cds_na")
[1] "\n"
I get the same result using the accession rather than the GID.
> entrez_fetch(db = "nuccore", id = "DQ640652.1", rettype="fasta_cds_na")
[1] "\n"
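In case it helps anyone reproduce this over a batch, here is a minimal sketch of how the affected IDs could be flagged. The helper names is_empty_record and fetch_cds_fasta are made up for illustration (they are not part of rentrez), and the fetch argument exists only so the function can be tried without hitting NCBI.

```r
# Hypothetical helper (not part of rentrez): TRUE where a fetched record
# contains nothing but whitespace, e.g. the bare "\n" shown above.
is_empty_record <- function(x) !nzchar(trimws(x))

# Hypothetical wrapper: fetch CDS FASTA for each ID separately so that an
# empty response can be traced back to the ID that produced it.
# `fetch` defaults to rentrez::entrez_fetch and is only looked up when called.
fetch_cds_fasta <- function(ids, fetch = rentrez::entrez_fetch) {
  vapply(
    ids,
    function(id) fetch(db = "nuccore", id = id, rettype = "fasta_cds_na"),
    character(1)
  )
}

# Usage (needs network access and rentrez installed):
# res <- fetch_cds_fasta(c("108597802", "DQ640652.1"))
# names(res)[is_empty_record(res)]   # IDs that came back empty
```

Fetching one ID per request is slow for large batches, so this is only meant for narrowing down which records misbehave, not for bulk downloads.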
The exact same call works for most other GIDs/accessions I feed it, and it also works for these IDs if I request alternative rettypes, e.g.
> entrez_fetch(db = "nuccore", id = "108597802", rettype="gb")
[1] "LOCUS DQ640652 29746 bp RNA linear VRL 12-JUN-2006\nDEFINITION SARS coronavirus GDH-BJH01, complete genome.\nACCESSION DQ640652\nVERSION DQ640652.1\nKEYWORDS .\nSOURCE SARS coronavirus GDH-BJH01\n ORGANISM SARS coronavirus GDH-BJH01\n Viruses; Riboviria; Nidovirales; Cornidovirineae; Coronaviridae;\n Orthocoronavirinae; Betacoronavirus; Sarbecovirus.\nREFERENCE 1 (bases 1 to 29746)\n AUTHORS Cai,J.-P., Hei,A.-L., Hu,J.-H., Wang,S.-K., Zhang,C.-B., Dai,D.-P.,\n Shen,Z.-Y., Guo,J., Li,M., Wu,Y.-S., Cheng,G., He,Y.-S. and Hou,M.\n TITLE Direct Submission\n JOURNAL Submitted (14-MAY-2006) National Center for Clinical Laboratory,\n Beijing Hospital, 1 Da Hua Road, Dong Dan, Beijing 100730, China\nFEATURES Location/Qualifiers\n source 1..29746\n /organism=\"SARS coronavirus GDH-BJH01\"\n /mol_type=\"genomic RNA\"\n /strain=\"GDH-BJH01\"\n /isolation_source=\"Homo sapiens lung\"\n /host=\"Homo sapiens\"\n /db_xref=\"taxon:388737\"\n /country=\"China\"\nORIGIN \n 1 ggcttccagg aaaagccaac
Curiously, though, calling the API directly through a browser also returns a blank file: example.
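For anyone who wants to try the same request outside rentrez, this is roughly how such a URL can be put together. It is only a sketch: efetch_url is a made-up helper, retmode = "text" is my assumption, and the URL it builds is not necessarily identical to the one I linked above.

```r
# Hypothetical helper: assemble an NCBI E-utilities efetch URL by hand.
# retmode = "text" is an assumption; adjust it to match your own request.
efetch_url <- function(db, id, rettype, retmode = "text") {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
  query <- c(db = db, id = id, rettype = rettype, retmode = retmode)
  # Percent-encode each value, then join the pairs as key=value&key=value.
  encoded <- vapply(query, utils::URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste0(names(query), "=", encoded, collapse = "&"))
}

u <- efetch_url("nuccore", "108597802", "fasta_cds_na")
print(u)

# To download the response (needs network access):
# readLines(u)
```

Pasting the printed URL into a browser, or passing it to readLines, should show whether the blank response comes from NCBI itself rather than from the R package.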
If anyone is able to shed some light on why these sequences aren't being returned in FASTA format properly, I'd be very grateful!