2 hours ago by projetoic

with open('Denv4-X-gb_AY947539.txt', 'r') as f:
    z = f.read()
# Each sum visits every character of the file, so both count all '-' in it
count_inicio = sum(map(lambda x: 1 if '-' in x else 0, z))
count_fim = sum(map(lambda x: 1 if '-' in x else 0, reversed(z)))
print(count_inicio, count_fim)
Output:
479 479

file contents:

lcl|NC_002640.1_cds_NP_073286.1_1 [gene=POLY] [locus_tag=DV4_gp1]
     [db_xref=GeneID:5075729] [protein=polyprotein]
     [protein_id=NP_073286.1] [location=102..10265] [gbkey=CDS]
     ------------------------------------------------------------
     ---------------------------------atgaaccaacgaaaaaaggtggttaga
     ccacctttcaatatgctgaaacgcgagagaaaccgcgtatcaacccctcaagggttggtg
     aagagattctcaaccggacttttttctgggaaaggacccttacggatggtgctagcattc
     atcacgtttttgcgagtcctttccatcccaccaacagcagggattctgaagagatgggga
     cagttgaagaaaaataaggccatcaagatactgattggattcaggaaggagataggccgc
     ------------------------------------------------------------ 

gb:AY947539|Organism:Dengue virus 4|Strain
     Name:H241|Segment:null|Subtype:4|Host:Human
     ggtcgtgtggaccgacaaggacagttccaaatcggaagcttgcttaacacagttctaaca
     gtttgtttagatagagagcagatctctggaaaaatgaaccaacgaaaaaaggtggttaga
     ccacctttcaatatgctgaaacgcgagagaaaccgcgtatcaacccctcaagggttggtg
     aagagattctcaaccggacttttttccgggaaaggacccttacggatggtgctagcattc
     atcacgtttttgcgagtcctttccatcccaccaacagcagggattctgaaaagatgggga
     cagttgaagaaaaacaaggccatcaaaatactgactggattcaggaaggagataggccgc
     atgctgaacatcttgaatggaagaaaaaggtcaacaatgacattgctgtgcttgattccc

For example, I need to take the first sequence, lcl|NC_002640.1_cds_NP_073286.1_1 (say it is ---AATG-GG----), and count the number of "-" characters at its beginning and at its end.

Then I need to cut that many characters from the start and end of my second sequence, gb:AY947539|Organism:Dengue virus 4 (say it is GGGAATG-GGAAAA).

Here the first sequence has 3 "-" at the start and 4 at the end, so the output I want is AATG-GG. But first I need to get this "-" count at the beginning and at the end working.

How do I count symbols at the ends of a given string/text and, based on that count, remove characters from another string/text in the same file?
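
One possible approach, sketched on the toy strings from the example rather than on the real file: strip the "-" characters from one end at a time and compare lengths to get the two counts, then slice the second string by those counts. The function names `count_edge_gaps` and `trim_like` are made up for illustration, and the sketch assumes each sequence has already been joined into a single string with no spaces or newlines.

```python
def count_edge_gaps(aligned, gap="-"):
    """Return (leading, trailing) counts of `gap` at the two ends of `aligned`."""
    # Gaps in the middle of the string are untouched by lstrip/rstrip,
    # so only the runs at the very start and very end are counted.
    leading = len(aligned) - len(aligned.lstrip(gap))
    trailing = len(aligned) - len(aligned.rstrip(gap))
    return leading, trailing


def trim_like(reference, target, gap="-"):
    """Cut from `target` as many characters as `reference` has edge gaps."""
    leading, trailing = count_edge_gaps(reference, gap)
    # Use an explicit end index: target[leading:-0] would be an empty string.
    return target[leading:len(target) - trailing]


# Toy stand-ins for the two records (assumed values taken from the example)
reference = "---AATG-GG----"
target = "GGGAATG-GGAAAA"

print(count_edge_gaps(reference))    # (3, 4)
print(trim_like(reference, target))  # AATG-GG
```

For the real file, the sequence lines of each record would first have to be separated from the header lines and concatenated; that parsing step is not shown here because it depends on how the file is laid out.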


