Asked 2 hours ago by sapuizait

Dear all,

I have a couple of metagenomes that I have mapped against the IMG/VR viral database. Most of my reads hit Uncultivated Viral Genome (UViG) IDs, so my results file contains a long list of IDs that look like this:

IMGVR_UViG_3300008486_000101
IMGVR_UViG_2519103086_000001
IMGVR_UViG_2519103159_000003
IMGVR_UViG_2526164598_000001
IMGVR_UViG_2534681965_000001
IMGVR_UViG_3300018494_000062
IMGVR_UViG_3300018878_000007
IMGVR_UViG_3300018878_000008
IMGVR_UViG_3300018878_000079
IMGVR_UViG_3300019376_000005
IMGVR_UViG_3300019378_000008
IMGVR_UViG_3300021255_000014
IMGVR_UViG_3300021255_000002
IMGVR_UViG_3300021255_000040

where I believe that the first part identifies the organism (e.g. "IMGVR_UViG_3300008486") and the second part the gene (e.g. "_000101").
Using such an ID I can retrieve all the relevant information from the IMG website: img.jgi.doe.gov/cgi-bin/vr/main.cgi?section=ViralSearch&option=uvig
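To make the ID structure above concrete, here is a minimal sketch of how such an ID could be split into the two parts. The function name `split_uvig_id` and the regular expression are only illustrative, and what each part actually means (organism, gene) is my guess rather than something I have confirmed.

```python
import re

# Assumed layout, based only on the example IDs above:
# "IMGVR_UViG_" + digits + "_" + digits
UVIG_PATTERN = re.compile(r"^(IMGVR_UViG_\d+)_(\d+)$")


def split_uvig_id(uvig_id):
    """Split a UViG ID into (first part, second part).

    Hypothetical helper: raises ValueError if the ID does not
    match the assumed layout.
    """
    match = UVIG_PATTERN.match(uvig_id.strip())
    if match is None:
        raise ValueError(f"Unexpected ID format: {uvig_id!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    first, second = split_uvig_id("IMGVR_UViG_3300008486_000101")
    print(first, second)
```

Grouping the IDs by the first part would at least show how many distinct entries each results file touches.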

But in my case I have a hundred files with tens of thousands of IDs (and that is just this time; in the future I will have even more), so looking them up on the website one by one is not an option.

I know that people have used Python scripts to retrieve records automatically from NCBI, but I have not seen such scripts for JGI. On top of that, I do not have the skills to write one myself, so any help would be very much appreciated, because otherwise these results are completely unusable to me...
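For what it is worth, this is roughly the kind of script I have in mind, written as a sketch under assumptions and not as a working JGI client. It assumes that a metadata table for the UViGs can be obtained as a local tab-separated file; the column name `UVIG`, the `*.txt` file pattern, and the function names are all hypothetical. It also assumes one ID per line in each results file.

```python
import csv
from pathlib import Path


def collect_ids(result_dir, pattern="*.txt"):
    """Gather the unique UViG IDs from every results file in a folder.

    Assumes one ID per line; lines that do not start with
    "IMGVR_UViG_" are ignored.
    """
    ids = set()
    for path in sorted(Path(result_dir).glob(pattern)):
        with open(path) as handle:
            for line in handle:
                line = line.strip()
                if line.startswith("IMGVR_UViG_"):
                    ids.add(line)
    return ids


def lookup(ids, metadata_tsv, id_column="UVIG"):
    """Return {id: row} for each wanted ID found in the metadata table.

    The table is streamed row by row, so it does not have to fit
    in memory. `id_column` is a hypothetical column name.
    """
    found = {}
    with open(metadata_tsv, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if row[id_column] in ids:
                found[row[id_column]] = row
    return found
```

Something like `lookup(collect_ids("results"), "metadata.tsv")` would then give one metadata row per ID, with both paths being placeholders. Whether such a table exists in a suitable form for IMG/VR is exactly what I am unsure about.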

Thanks in advance
P


