2 hours ago by Kumar (India)

I need to extract accessory gene sequences (both DNA and amino acid) from Roary pangenome output. I have the list of locus_tags and the corresponding gbk and gff files, which were generated with the Prokka pipeline. Is there a way, or an existing tool, to extract both the DNA and the amino acid sequences from the gbk or gff files?
A sample of the Roary accessory gene locus_tag list and excerpts from the corresponding strain's gbk and gff files are shown below.

locus_tag list.csv

             locus_tag/Pcissicola19
    xynB_1   BGDHLHFA_02833
    smpB     BGDHLHFA_01427

Pcissicola19.gbk

     gene            complement(39965..40852)
                     /gene="xynB_3"
                     /locus_tag="BGDHLHFA_02833"
     CDS             complement(39965..40852)
                     /gene="xynB_3"
                     /locus_tag="BGDHLHFA_02833"
                     /EC_number="3.2.1.37"
                     /inference="ab initio prediction:Prodigal:002006"
                     /inference="similar to AA sequence:UniProtKB:P36906"
                     /codon_start=1
                     /transl_table=11
                     /product="Beta-xylosidase"
                     /protein_id="Prokka:BGDHLHFA_02833"
                     /translation="MPELLAFVAKHKLPIDFVTTHTYGVDGGFLDENGKQDTKLSASL
                     DAIVGDVRRVRAQIQASPFPNLPLYFTQWSSSYTPRDFVHDSYISAPYILTKLKQVQG
                     LVQGMSYWTYTDLFEEPGPPPTPFHGGFGLMNREGIRKPAWFAYKYLHALKGRDVPLS
                     DAHSLAAVDGTRVAALVWNWQQPMQAVSNTPFYTKQVPATDSAPLRMRMTHVPAGTYQ
                     LQVRKTGYRRNDPLSLYIDMGMPKDLAPRQLTQLRQATHDAPEQDRRVRVGADGVVEI
                     NVPMRSNDVVLLTLEPAAR"
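Since each CDS feature in the gbk excerpt above already carries a /translation qualifier, the amino acid sequences can in principle be read straight from the file. The following is a minimal sketch using only the Python standard library; the function name `translations_by_locus_tag` is hypothetical, and it assumes the usual GenBank flat-file layout (feature keys indented by five spaces, qualifiers by 21) and that every CDS of interest has both a /locus_tag and a /translation.

```python
import re

def translations_by_locus_tag(gbk_text):
    """Map locus_tag -> protein sequence for CDS features in GenBank text.

    Assumption: standard flat-file layout, where a feature key (gene, CDS, ...)
    starts after exactly five spaces and section headers start in column 1.
    """
    proteins = {}
    # Start a new block at every feature key or column-1 header line.
    for block in re.split(r"\n(?= {5}\S|\S)", gbk_text):
        if not block.lstrip().startswith("CDS"):
            continue
        tag = re.search(r'/locus_tag="([^"]+)"', block)
        aa = re.search(r'/translation="([^"]+)"', block)
        if tag and aa:
            # The translation is wrapped over several lines; drop the whitespace.
            proteins[tag.group(1)] = re.sub(r"\s+", "", aa.group(1))
    return proteins
```

For example, `translations_by_locus_tag(open("Pcissicola19.gbk").read())` would return a dictionary that can then be filtered with the locus_tags from the list. CDS features without a /translation (pseudogenes, for instance) are skipped by this sketch.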

Pcissicola19.gff

ID=BGDHLHFA_02833_gene;Name=xynB_3;gene=xynB_3;locus_tag=BGDHLHFA_02833
gnl|Prokka|BGDHLHFA_249 Prodigal:002006 CDS 39965   40852   .   -   0   ID=BGDHLHFA_02833;Parent=BGDHLHFA_02833_gene;eC_number=3.2.1.37;Name=xynB_3;gene=xynB_3;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P36906;locus_tag=BGDHLHFA_02833;product=Beta-xylosidase;protein_id=gnl|Prokka|BGDHLHFA_02833

For reference, my dataset contains both draft and complete genomes.

The expected DNA and amino acid sequence outputs are given below, respectively:

>BGDHLHFA_02833
tcagcgcgccgccggctccagcgtcagcagcaccacatcgttgctgcgcatcggcacgttgatctcgaccacgccatcggcgcccacacgcacacgccgatcctgttcgggcatcgtgcgtggcctgtcgcagctgcgtcaactggcgcggcgccaggtccttgggcatgcccatgtcgatgtacagcgacaacgggtcgttacgccgatagccggtcttgcgcacctgcagctggtacgtgccggcaggcacatgggtcatgcgcatgcgcagcggcgcgctgtcggtggcgggcacctgtttggtgtagaacggcgtattgctcaccgcctgcatgggctgctgccaattccacaccagtgcggcgacgcgcgtgccgtccactgcggcgagggaatgtgcgtcgctcagcggcacatcgcggcccttgagcgcatgcaagtacttgtaagcgaaccaggccggtttgcgaatgccttcgcgattcatcagcccaaacccgccgtggaagggcgtgggcggtgggccgggttcttcgaacagatcggtatagtccagtaactcatgccctgcaccaggccctgcacctgcttgagcttggtcaggatgtacggcgcgctgatgtaactgtcgtggacgaaatcgcgcggcgtatagctgctgctccactgggtgaagtacagcggcaggttgggaaatggcgaggcctggatctgcgcgcgcacgcgtcgcacatcgccgacgatggcatccagagatgcggacagcttggtgtcctgcttgccgttctcatcgagaaacccgccatccacgccataggtatgcgtggtgacgaagtcgatcggcagtttgtgcttggcaacgaaggccagcagttccggcac

>BGDHLHFA_02833
MPELLAFVAKHKLPIDFVTTHTYGVDGGFLDENGKQDTKLSASLDAIVGDVRRVRAQIQASPFPNLPLYFTQWSSSYTPRDFVHDSYISAPYILTKLKQVQGLVQGMSYWTYTDLFEEPGPPPTPFHGGFGLMNREGIRKPAWFAYKYLHALKGRDVPLSDAHSLAAVDGTRVAALVWNWQQPMQAVSNTPFYTKQVPATDSAPLRMRMTHVPAGTYQLQVRKTGYRRNDPLSLYIDMGMPKDLAPRQLTQLRQATHDAPEQDRRVRVGADGVVEINVPMRSNDVVLLTLEPAAR
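One possible way to get both outputs from the gff alone is sketched below in plain Python. It assumes the Prokka gff embeds the contig sequences after a `##FASTA` line, that each CDS is a single complete interval, and that the locus_tags appear in the attribute column as in the excerpt above. All function names are hypothetical. Translation uses the amino acid assignments of the standard code (which translation table 11 shares) and reads the first codon as Met, which is only appropriate for complete CDSs.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codon -> amino acid lookup built in NCBI order (first, second, third base).
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def translate(cds):
    """Translate a complete CDS; unknown codons become X, a final stop is dropped."""
    cds = cds.upper()
    aa = [CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)]
    if aa:
        aa[0] = "M"  # assumption: complete CDS, alternative starts (GTG, TTG) read as Met
    protein = "".join(aa)
    return protein[:-1] if protein.endswith("*") else protein

def parse_prokka_gff(lines):
    """Return ({locus_tag: (contig, start, end, strand)}, {contig: sequence})."""
    features, contigs, name, in_fasta = {}, {}, None, False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            in_fasta = True
        elif in_fasta:
            if line.startswith(">"):
                name = line[1:].split()[0]
                contigs[name] = []
            elif name is not None:
                contigs[name].append(line.strip())
        elif line and not line.startswith("#"):
            cols = line.split("\t")
            if len(cols) == 9 and cols[2] == "CDS":
                attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
                if "locus_tag" in attrs:
                    features[attrs["locus_tag"]] = (cols[0], int(cols[3]), int(cols[4]), cols[6])
    return features, {k: "".join(v) for k, v in contigs.items()}

def extract_sequences(features, contigs, wanted):
    """Return {locus_tag: (coding_dna, protein)} for the tags that were found."""
    out = {}
    for tag in wanted:
        if tag not in features:
            continue
        contig, start, end, strand = features[tag]
        dna = contigs[contig][start - 1:end]  # GFF coordinates are 1-based, inclusive
        if strand == "-":
            dna = reverse_complement(dna)
        out[tag] = (dna, translate(dna))
    return out
```

A call such as `extract_sequences(*parse_prokka_gff(open("Pcissicola19.gff")), ["BGDHLHFA_02833"])` would then give the DNA and protein for that locus_tag. Note that this sketch returns the coding strand, so a minus-strand gene comes out reverse-complemented relative to the contig; drop the `reverse_complement` step if the contig-orientation slice is wanted instead.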


modified 39 minutes ago, written 2 hours ago by Kumar40


