Extract specific gene id from an annotation file


Below is an excerpt from an annotation file. I want to write a small script that takes a gene ID (the gene_id value, e.g. TEA_012962) as input and returns the transcript ID found on the same line (e.g. TEA014503). Any blank lines between records here are only there to separate them for readability; the original file has no blank lines. The expected input and output are shown below the excerpt.

SDRB02000004.1 Genbank gene 6018 10396 . + . gene_id "TEA_012962"; transcript_id ""; gbkey "Gene"; gene_biotype "protein_coding"; locus_tag "TEA_012962";
SDRB02000004.1 Genbank transcript 6018 10396 . + . gene_id "TEA_012962"; transcript_id "gnl|WGS:SDRB|TEA014503.1"; gbkey "mRNA"; locus_tag "TEA_012962";orig_protein_id "gnl|WGS:SDRB|TEA014503.1:cds_7"; orig_transcript_id "gnl|WGS:SDRB|TEA014503.1"; product "hypothetical protein"; transcript_biotype "mRNA";
SDRB02000004.1 Genbank exon 6018 6864 . + . gene_id "TEA_012963"; transcript_id "gnl|WGS:SDRB|TEA014504.1"; locus_tag "TEA_012963"; orig_protein_id "gnl|WGS:SDRB|TEA014504.1:cds_7"; orig_transcript_id "gnl|WGS:SDRB|TEA014504.1"; product "hypothetical protein"; transcript_biotype "mRNA"; exon_number "1";
SDRB02000004.1 Genbank exon 7548 7685 . + . gene_id "TEA_012962"; transcript_id "gnl|WGS:SDRB|TEA014503.1"; locus_tag "TEA_012962"; orig_protein_id "gnl|WGS:SDRB|TEA014503.1:cds_7"; orig_transcript_id "gnl|WGS:SDRB|TEA014503.1"; product "hypothetical protein"; transcript_biotype "mRNA"; exon_number "2";
SDRB02000004.1 Genbank exon 7802 7923 . + . gene_id "TEA_012962"; transcript_id "gnl|WGS:SDRB|TEA014503.1"; locus_tag "TEA_012962"; orig_protein_id "gnl|WGS:SDRB|TEA014503.1:cds_7"; orig_transcript_id "gnl|WGS:SDRB|TEA014503.1"; product "hypothetical protein"; transcript_biotype "mRNA"; exon_number "3";
Input:  TEA_012962 TEA_012963 ...
Output: TEA014503 TEA014504 ...
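
One possible approach, sketched in Python rather than R: read the attribute column of each line, pair every gene_id with its non-empty transcript_id, and strip the "gnl|WGS:SDRB|" prefix and the ".1" version suffix so the result matches the output wanted above. The function names (build_gene_to_transcripts, lookup), the "NA" placeholder for unknown IDs, and the shortened SAMPLE records are assumptions made for illustration, not part of the original file or any library.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the attribute column, e.g. gene_id "TEA_012962"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')

# Abbreviated records based on the excerpt in the question (attributes shortened).
SAMPLE = '''SDRB02000004.1 Genbank gene 6018 10396 . + . gene_id "TEA_012962"; transcript_id ""; gbkey "Gene";
SDRB02000004.1 Genbank transcript 6018 10396 . + . gene_id "TEA_012962"; transcript_id "gnl|WGS:SDRB|TEA014503.1"; gbkey "mRNA";
SDRB02000004.1 Genbank exon 6018 6864 . + . gene_id "TEA_012963"; transcript_id "gnl|WGS:SDRB|TEA014504.1"; exon_number "1";
SDRB02000004.1 Genbank exon 7548 7685 . + . gene_id "TEA_012962"; transcript_id "gnl|WGS:SDRB|TEA014503.1"; exon_number "2";'''


def build_gene_to_transcripts(lines):
    """Map each gene_id to the list of short transcript IDs seen with it."""
    mapping = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        attrs = dict(ATTR_RE.findall(line))
        gene_id = attrs.get("gene_id")
        transcript_id = attrs.get("transcript_id")
        if not gene_id or not transcript_id:
            continue  # "gene" rows carry an empty transcript_id
        # "gnl|WGS:SDRB|TEA014503.1" -> "TEA014503" (assumed naming pattern)
        short = transcript_id.rsplit("|", 1)[-1].split(".", 1)[0]
        if short not in mapping[gene_id]:
            mapping[gene_id].append(short)
    return mapping


def lookup(mapping, gene_ids):
    """Return one entry per queried gene; unknown genes give "NA"."""
    return [",".join(mapping.get(g, [])) or "NA" for g in gene_ids]


if __name__ == "__main__":
    gene_map = build_gene_to_transcripts(SAMPLE.splitlines())
    print(" ".join(lookup(gene_map, ["TEA_012962", "TEA_012963"])))
```

For a real file, the lines could be read with open(path) and passed to build_gene_to_transcripts in place of SAMPLE.splitlines(). A gene with several transcripts would be reported as a comma-separated list.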


R


RNA-seq


Linux



