Answered 2 hours ago by vkkodali (United States)

You can use the NCBI Assembly portal for this. If you know the organism name or, better yet, its NCBI Taxonomy identifier, query the Assembly database with it and click the 'Representative genomes' filter on the left-hand side. Then use the 'Download Assemblies' button to download the data. I recommend exploring the downloadable files, because the data you are after are spread across several of them. For example, gene and protein IDs are in the GFF3 files, while RefSeq identifiers and chromosome identifiers are in the assembly_report.txt file, and so on.
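As a rough illustration of pulling RefSeq and chromosome identifiers out of assembly_report.txt, here is a minimal Python sketch. It assumes the usual layout of that file: '#'-prefixed comment lines, the last of which holds the tab-separated column names (such as Sequence-Name, Assigned-Molecule and RefSeq-Accn). The function names and the sample accessions are made up for the example, so check the column names against the file you actually download.

```python
def parse_assembly_report(text):
    """Return one dict per sequence row of an assembly_report.txt file.

    Assumes the last '#'-prefixed line before the data is the column header.
    """
    header = None
    rows = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Keep overwriting: the final comment line is the column header.
            header = line.lstrip("# ").split("\t")
            continue
        if not line.strip() or header is None:
            continue
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


def refseq_to_molecule(rows):
    """Map RefSeq accession -> assigned molecule (e.g. chromosome name)."""
    return {
        row["RefSeq-Accn"]: row["Assigned-Molecule"]
        for row in rows
        if row.get("RefSeq-Accn", "na") != "na"
    }


# Tiny made-up sample in the assumed layout (accessions are not real).
SAMPLE = (
    "# Assembly name:  ExampleAsm\n"
    "# Sequence-Name\tSequence-Role\tAssigned-Molecule\t"
    "Assigned-Molecule-Location/Type\tGenBank-Accn\tRelationship\t"
    "RefSeq-Accn\tAssembly-Unit\tSequence-Length\tUCSC-style-name\n"
    "1\tassembled-molecule\t1\tChromosome\tCM999999.1\t=\t"
    "NC_999999.1\tPrimary Assembly\t1000\tchr1\n"
    "scaf1\tunplaced-scaffold\tna\tna\tXX999999.1\t<>\t"
    "na\tPrimary Assembly\t200\tna\n"
)

rows = parse_assembly_report(SAMPLE)
mapping = refseq_to_molecule(rows)
```

For a real file, read it with `open(path).read()` and pass the text to `parse_assembly_report`. Rows without a RefSeq accession ("na") are skipped when building the mapping.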

If you would rather do this from the command line, you can use esearch and esummary (from NCBI's Entrez Direct) to get the FTP paths, and then download the files you are interested in with wget or curl. See www.biostars.org/p/428504/ for some ideas.
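If you prefer a script to the esearch/esummary command-line tools, the same two steps can be sketched in Python against the E-utilities web endpoints. This is only a sketch under some assumptions: that the JSON esummary record for the assembly database exposes the FTP directory under a lowercase `ftppath_refseq` key, that files in that directory are named after the directory itself (for example `<dir>_assembly_report.txt`), and that the example query term suits your organism. The helper names are mine, not part of any NCBI library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(tool, **params):
    """Build an E-utilities URL for the assembly database, JSON output."""
    params.setdefault("db", "assembly")
    params.setdefault("retmode", "json")
    return f"{EUTILS}/{tool}.fcgi?{urlencode(params)}"


def fetch_json(url):
    """Fetch a URL and decode the JSON response (needs network access)."""
    with urlopen(url) as resp:
        return json.load(resp)


def ftp_paths(summary, key="ftppath_refseq"):
    """Collect non-empty FTP paths from an esummary JSON document.

    The key name is an assumption about the assembly docsum layout.
    """
    result = summary["result"]
    return [result[uid][key] for uid in result["uids"] if result[uid].get(key)]


def file_url(ftp_path, suffix):
    """Turn an assembly FTP directory into an https URL for one file."""
    base = ftp_path.rstrip("/")
    name = base.rsplit("/", 1)[-1]  # directory name doubles as file prefix
    return f"{base}/{name}_{suffix}".replace("ftp://", "https://", 1)


def report_urls(term):
    """esearch -> esummary -> assembly_report.txt URLs (needs network)."""
    search = fetch_json(eutils_url("esearch", term=term))
    ids = search["esearchresult"]["idlist"]
    if not ids:
        return []
    summary = fetch_json(eutils_url("esummary", id=",".join(ids)))
    return [file_url(p, "assembly_report.txt") for p in ftp_paths(summary)]
```

Calling `report_urls("txid9606[Organism:exp]")` would, under the assumptions above, return a list of URLs that you can hand to wget or curl. Swapping the suffix for `genomic.gff.gz` should point at the GFF3 file instead. The search term here does not apply the 'Representative genomes' filter, so narrow it as needed.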


