4 hours ago by projetoic

I have a reference sequence (CDS) and an aligned sequence in the same file, in FASTA alignment format (.fasta.aln or .aln). The alignment was done with MAFFT.

Input:

RefSeq - - - - - - AAGCTGC

Seq1     AAAAAAGGGGGG

Output I would like:

Seq1 GGGGGG

I want to remove from Seq1 every position where RefSeq has the gap symbol "-", so that only the CDS remains after the alignment. Is there a way to do this from the command line or in some programming language? I tried to do it with Biopython but was not successful!

f = open('Denv4-X-gb_AY947539.txt', 'r')
con = f.readlines()
con = [i.strip() for i in con]
length = len(con[0].split("-")[0])
result = f'{con[0].split("-")[0]} {con[0].split("-")[0][length:]}'
print(result)
f.close()
f = open('Denv4cds.txt', 'a')
f.write(f'\n{result}')
f.close()

Do you have a script or module or library that can do this?

I'm using WSL on Windows.

I'm a beginner in bioinformatics.

The generated file contained only the first line ...
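
For illustration, here is a minimal sketch of the column-filtering idea described in the question: find the alignment columns where the reference is not "-", then keep only those columns in every sequence. It uses plain Python rather than Biopython. The function names `read_fasta` and `trim_to_reference` and the toy alignment are hypothetical, made up for this example, and it assumes every sequence in the file has the same aligned length, as MAFFT output should.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into an ordered list of (name, sequence)."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].split()[0], []
        else:
            # Sequences may wrap over several lines, so collect the pieces
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def trim_to_reference(records, ref_name):
    """Keep only the alignment columns where the reference is not a gap."""
    ref_seq = dict(records)[ref_name]
    keep = [i for i, base in enumerate(ref_seq) if base != "-"]
    trimmed = []
    for name, seq in records:
        if len(seq) != len(ref_seq):
            raise ValueError(f"{name} is not the same length as {ref_name}")
        trimmed.append((name, "".join(seq[i] for i in keep)))
    return trimmed


# Hypothetical toy alignment (made up for illustration, not real data)
example = """>RefSeq
------AAGCTG
>Seq1
AAAAAAGGGGGG
""".splitlines()

for name, seq in trim_to_reference(read_fasta(example), "RefSeq"):
    print(f">{name}\n{seq}")
```

To run it on a real file, pass an open file handle instead of `example`, for instance `read_fasta(open("alignment.fasta"))`, where the file name is a placeholder and the reference name must match the header in the file. Gaps inside the other sequences at the kept columns are left as "-", so they would still need to be stripped if an ungapped CDS is wanted.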


