I have a couple of metagenomes that I have mapped against the IMG/VR viral database. Most of my reads hit Uncultivated Viral Genome (UViG) IDs (list), so my results file contains a long list of IDs that look like this:
IMGVR_UViG_3300008486_000101
IMGVR_UViG_2519103086_000001
IMGVR_UViG_2519103159_000003
IMGVR_UViG_2526164598_000001
IMGVR_UViG_2534681965_000001
IMGVR_UViG_3300018494_000062
IMGVR_UViG_3300018878_000007
IMGVR_UViG_3300018878_000008
IMGVR_UViG_3300018878_000079
IMGVR_UViG_3300019376_000005
IMGVR_UViG_3300019378_000008
IMGVR_UViG_3300021255_000014
IMGVR_UViG_3300021255_000002
IMGVR_UViG_3300021255_000040
where I believe that the first part is the organism (e.g. "IMGVR_UViG_3300008486") and the second part the gene (e.g. "_000101").
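To make that assumed structure concrete, here is a minimal Python sketch that splits an ID at its last underscore. The function name `split_uvig_id` is hypothetical, and reading the two parts as organism and gene is only my interpretation of the IDs, not something confirmed by the IMG/VR documentation.

```python
def split_uvig_id(uvig_id):
    """Split an ID such as 'IMGVR_UViG_3300008486_000101' at its last underscore.

    Returns (prefix, suffix). Treating these as organism and gene
    is an assumption, not a documented rule.
    """
    prefix, sep, suffix = uvig_id.rpartition("_")
    # Reject anything that does not look like the IDs in my results file.
    if not sep or not prefix.startswith("IMGVR_UViG_") or not suffix.isdigit():
        raise ValueError(f"unexpected ID format: {uvig_id!r}")
    return prefix, suffix


print(split_uvig_id("IMGVR_UViG_3300008486_000101"))
```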
Using such an ID, I can retrieve all the relevant information from the IMG website: img.jgi.doe.gov/cgi-bin/vr/main.cgi?section=ViralSearch&option=uvig
In my case, however, I have a hundred files with tens of thousands of IDs (and that is just this time; in the future I will have even more), so looking them up by hand on the website is not an option.
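A minimal sketch of what a first step could look like: collecting the unique IDs across all result files, so that each ID only has to be looked up once. The `results` directory, the `*.txt` pattern and the function name `count_uvig_ids` are assumptions for illustration, and the lookup against IMG/VR itself is not covered here.

```python
import re
from collections import Counter

# Matches IDs shaped like IMGVR_UViG_3300008486_000101.
UVIG_PATTERN = re.compile(r"IMGVR_UViG_\d+_\d+")


def count_uvig_ids(paths):
    """Return a Counter mapping each UViG ID to how often it occurs in the given files."""
    counts = Counter()
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                counts.update(UVIG_PATTERN.findall(line))
    return counts


# Example usage (hypothetical directory and file extension):
# from pathlib import Path
# counts = count_uvig_ids(sorted(Path("results").glob("*.txt")))
# print(len(counts), "unique IDs")
```

Counting occurrences rather than only collecting a set keeps the number of hits per ID available for later.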
I know that people have used Python scripts to retrieve information for IDs automatically from NCBI, but I have not seen such scripts for JGI. On top of that, I do not have the skills to do this on my own, so any help would be very much appreciated, because otherwise these results are completely unusable to me.
Thanks in advance