Posted 2 hours ago by mdgn

Hello,

I am trying to partition my plastid alignment using the Arabidopsis .gff file as a reference. As a first step, I want to make a position file that assigns a key and a value to each position in the AT (Arabidopsis thaliana) reference sequence. Here are some lines from the GFF file:

    NC_000932.1 RefSeq  exon    527 551 .   +   .   ID=exon-ArthCt098-1;Parent=rna-ArthCt098;Dbxref=GeneID:1466263;gbkey=tRNA;gene=trnT
    NC_000932.1 RefSeq  CDS 554 570 .   +   0   ID=cds-NP_051054.1;Parent=gene-ArthCp017;Dbxref=Genbank:NP_051054.1,GeneID:844775;Name=NP_051054.1;gbkey=CDS;gene=psbD
    NC_000932.1 RefSeq  CDS 610 612 .   +   0   ID=cds-NP_051055.1;Parent=gene-ArthCp018;Dbxref=Genbank:NP_051055.1,GeneID:844773;Name=NP_051055.1;Note=CP43;gbkey=CDS;gene=psbC
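
Before assigning positions, each feature line has to be split into its columns. Below is a minimal sketch, assuming the file is tab-delimited as the GFF3 format specifies (the sample lines appear to have lost their tabs when pasted). The function name `parse_gff_line` is hypothetical:

```python
def parse_gff_line(line):
    """Split one GFF3 feature line into (seqid, feature_type, start, end).

    Returns None for comment/directive lines (starting with '#') and blank lines.
    Assumes the 9 columns are tab-separated, as the GFF3 format requires.
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}: {line!r}")
    seqid, _source, feature_type, start, end = cols[:5]
    # Columns 4 and 5 are 1-based, inclusive coordinates.
    return seqid, feature_type, int(start), int(end)


example = "\t".join([
    "NC_000932.1", "RefSeq", "CDS", "554", "570", ".", "+", "0",
    "ID=cds-NP_051054.1;gene=psbD",
])
print(parse_gff_line(example))
# ('NC_000932.1', 'CDS', 554, 570)
```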

My idea is to define a range of positions (or count through them with a while loop) and give each position in the sequence a value taken from the reference. The script should read the start and end coordinates in the 4th and 5th columns and, for every position in that interval, store the position as the key and the feature type from the 3rd column as the value.
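
The interval-to-position step could look like the following sketch, which uses a plain Python dictionary and a `for` loop over `range` (a `range` always terminates, so there is no counter to forget to advance). It assumes the features have already been reduced to (type, start, end) tuples; `build_position_map` is a hypothetical name. Because GFF coordinates are 1-based and inclusive, both the start and the end position of each feature receive a label, which is why the loop runs to `end + 1`.

```python
def build_position_map(features):
    """Map every position covered by a feature to that feature's type.

    `features` is an iterable of (feature_type, start, end) tuples with
    1-based, inclusive coordinates (GFF columns 3, 4 and 5).
    If features overlap, the one listed later overwrites the earlier label.
    """
    positions = {}
    for feature_type, start, end in features:
        # range() excludes its stop value, so use end + 1 to include `end`.
        for pos in range(start, end + 1):
            positions[pos] = feature_type
    return positions


features = [("exon", 527, 551), ("CDS", 554, 570), ("CDS", 610, 612)]
pos_map = build_position_map(features)
print(pos_map[527], pos_map[551], pos_map.get(552, "na"), pos_map[610])
# exon exon na CDS
```

RefSeq GFF files typically also contain `region` and `gene` lines that span the same coordinates as the `exon`/`CDS` lines, so you may want to keep only the feature types you care about before building the map; otherwise the last overlapping line decides the label.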

My desired output is:

    "1": "na"
    "2": "na"
    "3": "na"
    .
    .
    "526": "na"
    "527": "exon"
    "528": "exon"
    "529": "exon"
    .
    .
    .
    "550": "exon"
    "551": "exon"
    "552": "na"
    "553": "na"
    "554": "CDS"
    "555": "CDS"
    .
    .
    .
    "570": "CDS"
    "571": "na"
    .
    .
    .
    "609": "na"
    "610": "CDS"
    "611": "CDS"
    "612": "CDS"
    .
    .
    .
    until the range ends.
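
To get the complete listing, including the `"na"` positions and running until the end of the sequence, every position from 1 to the sequence length can be looked up with a default value. A minimal sketch, assuming the position-to-type dictionary is already built and the sequence length is known (for example from the reference FASTA, or from a `##sequence-region` line if your GFF has one). `positions_to_json` is a hypothetical name:

```python
import json


def positions_to_json(pos_map, seq_length):
    """Return a JSON object labelling every position 1..seq_length.

    Positions not present in `pos_map` get the value "na". Keys are
    converted to strings because JSON object keys must be strings.
    """
    full = {str(pos): pos_map.get(pos, "na") for pos in range(1, seq_length + 1)}
    return json.dumps(full, indent=4)


# Toy example: a 12-bp "sequence" with one feature at positions 3-5.
toy_map = {3: "exon", 4: "exon", 5: "exon"}
text = positions_to_json(toy_map, 12)
print(text)
```

To save it as a position file, write `text` out with `open("positions.json", "w")` (the file name is only an example).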

I believe a while loop with a counter is the right approach, but I can't get it to stop correctly and it turns into an endless loop. I tried assigning the values directly as strings, but I realized a dictionary is what I need because the data is too big to handle that way. I would be glad if you could show me a way to solve this.
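
A `while` loop does work if the counter is advanced on every pass and the condition is guaranteed to become false; an endless loop usually means one of those two is missing. The sketch below (hypothetical function `label_positions`) resets the counter for each feature and, instead of a dictionary, fills a list pre-set to `"na"` whose index is the position. That is a more compact alternative to one dictionary entry per base, though for a plastid genome (on the order of 150 kb) either structure should fit comfortably in memory. It assumes every feature ends within `seq_length`.

```python
def label_positions(features, seq_length):
    """Return a list where labels[pos] is the feature type at 1-based `pos`.

    Index 0 is unused so that list indices match GFF coordinates.
    `features` holds (feature_type, start, end) tuples, 1-based and inclusive.
    """
    labels = ["na"] * (seq_length + 1)
    for feature_type, start, end in features:
        pos = start
        # The loop stops once pos passes `end`; forgetting `pos += 1`
        # (or using a condition that can never become false) loops forever.
        while pos <= end:
            labels[pos] = feature_type
            pos += 1
    return labels


features = [("exon", 527, 551), ("CDS", 554, 570), ("CDS", 610, 612)]
labels = label_positions(features, 620)
print(labels[526], labels[527], labels[570], labels[571], labels[610])
# na exon CDS na CDS
```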


