Asked 3 hours ago by adityabandla

I have a FASTA file of genes like this:

>CS1RZP_12510_67 # 64357 # 65316 # -1 # ID=1311_67;partial=00;start_type=ATG;rbs_motif=3Base/5BMM;rbs_spacer=13-15bp;gc_cont=0.675
ATGGCCGTCGTCACCATGAAGCAGATGCTTGATTCCGGCGTGCATTTCGG
GCATCAGACCCGCCGCTGGAACCCGAAGATGAAGCGCTACATCCTGACCG
AGCGCAACGGGATCTACATCATCGACCTGCGGCAGACGCTCACCTACGTC

I would like to extract the lines for these specific genes from a GFF3 file, which also contains entries for other genes that are not in the FASTA above.

CS1RZP_12510    Prodigal_v2.6.3 CDS 3   617 134.5   -   0   ID=1311_67;partial=00;start_type=ATG;rbs_motif=3Base/5BMM;rbs_spacer=13-15bp;gc_cont=0.675;conf=100.00;score=133.83;cscore=137.59;sscore=-3.75;rscore=-3.31;uscore=-4.34;tscore=4.54;
CS1RZPR4D_68    Prodigal_v2.6.3 CDS 1000    1218    56.4    +   0   ID=1_2;partial=01;start_type=ATG;rbs_motif=AGGA;rbs_spacer=5-10bp;gc_cont=0.612;conf=100.00;score=56.40;cscore=44.46;sscore=11.94;rscore=7.30;uscore=0.10;tscore=4.54;

I am looking for an AWK solution; grep with -wf is extremely slow.
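
For reference, here is a rough sketch of the kind of two-file AWK approach I have in mind. It assumes the `ID=...` token in each FASTA header (e.g. `ID=1311_67`) is identical to the `ID=...` attribute in column 9 of the GFF3, as it appears to be in the sample above, and that these IDs are unique across contigs. The function name `filter_gff_by_fasta` and the file names are placeholders of mine, not anything from Prodigal.

```shell
# Hypothetical helper: filter_gff_by_fasta FASTA GFF3
# Prints GFF3 rows whose ID=... attribute also appears in a FASTA header.
filter_gff_by_fasta() {
    awk '
        # First file (FASTA): remember the ID=... token of every header line.
        NR == FNR {
            if (/^>/ && match($0, /ID=[^;]+/))
                wanted[substr($0, RSTART, RLENGTH)] = 1
            next
        }
        # Second file (GFF3): skip comment lines.
        /^#/ { next }
        # Print the row if its ID=... token was seen in the FASTA.
        match($0, /ID=[^;]+/) && (substr($0, RSTART, RLENGTH) in wanted)
    ' "$1" "$2"
}

# Usage (file names are placeholders):
# filter_gff_by_fasta genes.fasta annotations.gff3 > subset.gff3
```

Since the IDs are held in an associative array, each GFF3 line needs only one lookup and both files are read once, which should avoid the per-pattern scanning that seems to make grep -wf slow. If the IDs turn out not to be unique, the key would need to include the contig name as well.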


