Hey, I'm a new student in bioinformatics and I'm working on a project where I want to replace some nucleotides with a gap character ("-"). For example, I want to replace a few positions at the beginning of each sequence and a few at the end. How should I go about this in a way that scales to many sequences?
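If each sequence is a plain Python string, slicing is one minimal way to do this. You keep the middle of the sequence and pad the trimmed ends with "-", so the length (and any alignment column positions) stays the same. The function name `mask_ends` and its parameters are illustrative only, not part of any library:

```python
def mask_ends(seq: str, n_start: int, n_end: int, gap: str = "-") -> str:
    """Replace the first n_start and last n_end characters of seq with gap.

    The result has the same length as seq. If n_start + n_end covers the
    whole sequence, every position becomes a gap.
    """
    if n_start < 0 or n_end < 0:
        raise ValueError("n_start and n_end must be non-negative")
    length = len(seq)
    n_start = min(n_start, length)
    n_end = min(n_end, length - n_start)
    # Use an explicit end index (length - n_end) rather than seq[n_start:-n_end],
    # because seq[n_start:-0] would return an empty string when n_end == 0.
    middle = seq[n_start:length - n_end]
    return gap * n_start + middle + gap * n_end


print(mask_ends("AAATATATATAT", 3, 2))  # ---TATATAT--
```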

This is the code I have so far. I'm not sure how to edit these sequences. Is it better to use a NumPy array? What do I use to write

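import numpy as np

fasta = {}

# Parse the FASTA file into {name: [sequence lines]}.
with open('example.fasta') as file_one:
    for line in file_one:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: everything after ">" is the sequence name.
            active_sequence_name = line[1:]
            if active_sequence_name not in fasta:
                fasta[active_sequence_name] = []
            continue
        # Sequence line: append it to the current record.
        fasta[active_sequence_name].append(line)

# np.array(fasta) on a dict only gives a 0-d object array; build a 2D character
# array instead (one row per sequence, so all sequences must be the same length).
seqMat = np.array([list("".join(lines)) for lines in fasta.values()])

Output (the `fasta` dict):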
{'seq1': ['AAATATATATATATATATTATATATTATATATATTATATATATAT'],
'seq2': ['GCGCGAGATAGGGCGCGCGCGCGCGATTAGCGAGGCGCGCGCGGC'],
'seq3': ['TCTCTCTCTCTCTCTTCTCTCTCTCTCTCTCTCTCTCTCTCTCTC']}
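To scale to large files, one option is to avoid holding every sequence in a dict. Instead, stream records one at a time, mask them, and write them straight back out as FASTA. The sketch below uses only the standard library. `read_fasta`, `write_fasta`, `mask_ends`, and the trim values 3 and 2 are hypothetical names and numbers for illustration. Unlike the dict-of-lists above, it joins wrapped sequence lines into one string per record, and it assumes the input starts with a `>` header line:

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs one record at a time, joining wrapped lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def mask_ends(seq: str, n_start: int, n_end: int, gap: str = "-") -> str:
    """Replace the first n_start and last n_end characters with gap."""
    length = len(seq)
    n_start = min(n_start, length)
    n_end = min(n_end, length - n_start)
    return gap * n_start + seq[n_start:length - n_end] + gap * n_end


def write_fasta(records: Iterable[Tuple[str, str]], handle: TextIO, width: int = 60) -> None:
    """Write (name, sequence) pairs, wrapping sequences at `width` characters."""
    for name, seq in records:
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# In-memory handles for the example; with real files you would pass
# open("example.fasta") and open("masked.fasta", "w") instead.
src = io.StringIO(">seq1\nAAATATATAT\nATATAT\n>seq2\nGCGCGAGATA\n")
dst = io.StringIO()
masked = ((name, mask_ends(seq, 3, 2)) for name, seq in read_fasta(src))
write_fasta(masked, dst)
print(dst.getvalue())
```

Because `masked` is a generator, only one record is in memory at a time, so memory use does not grow with the number of sequences.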

And this is what I have as an array, using Biopython. What is the best way to replace nucleotides?

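from Bio import SeqIO
import numpy as np

allSeqs = []
# SeqIO.parse accepts a filename directly, so the file is opened and closed for us.
for seq_record in SeqIO.parse("example.fasta", "fasta"):
    # Convert each Seq object into a list of single characters.
    allSeqs.append(list(str(seq_record.seq)))
# One row per sequence; this requires all sequences to be the same length.
seqMat = np.array(allSeqs)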

Output:
array([['A', 'A', 'A', 'T', 'A', 'T', 'A', 'T', 'A', 'T', 'A', 'T', 'A',
        'T', 'A', 'T', 'A', 'T', 'T', 'A', 'T', 'A', 'T', 'A', 'T', 'T',
        'A', 'T', 'A', 'T', 'A', 'T', 'A', 'T', 'T', 'A', 'T', 'A', 'T',
        'A', 'T', 'A', 'T', 'A', 'T'],
       ['G', 'C', 'G', 'C', 'G', 'A', 'G', 'A', 'T', 'A', 'G', 'G', 'G',
        'C', 'G', 'C', 'G', 'C', 'G', 'C', 'G', 'C', 'G', 'C', 'G', 'A',
        'T', 'T', 'A', 'G', 'C', 'G', 'A', 'G', 'G', 'C', 'G', 'C', 'G',
        'C', 'G', 'C', 'G', 'G', 'C'],
       ['T', 'C', 'T', 'C', 'T', 'C', 'T', 'C', 'T', 'C', 'T', 'C', 'T',
        'C', 'T', 'T', 'C', 'T', 'C', 'T', 'C', 'T', 'C', 'T', 'C', 'T',
        'C', 'T', 'C', 'T', 'C', 'T', 'C', 'T', 'C', 'T', 'C', 'T', 'C',
        'T', 'C', 'T', 'C', 'T', 'C']], dtype='<U1')
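If you do want the 2D character array (one row per sequence, one column per position, which only works when all sequences have the same length, e.g. an alignment), NumPy slice assignment masks the same columns in every row at once. This is a minimal sketch assuming NumPy is installed; the short sequences and the `n_start`/`n_end` values are made up for illustration. Watch for the `-0` pitfall: `seqMat[:, -0:]` selects every column, so the end slice below uses an explicit start index instead:

```python
import numpy as np

seqs = [
    "AAATATATATAT",
    "GCGCGAGATAGG",
    "TCTCTCTCTCTC",
]
# One row per sequence, one single-character column per position (dtype <U1).
seqMat = np.array([list(s) for s in seqs])

n_start, n_end = 3, 2
n_cols = seqMat.shape[1]
# Mask the first n_start and last n_end columns of every row, in place.
seqMat[:, :n_start] = "-"
seqMat[:, n_cols - n_end:] = "-"  # empty slice when n_end == 0

# Join each row back into a string, e.g. to write it out as FASTA.
masked = ["".join(row) for row in seqMat]
print(masked)  # ['---TATATAT--', '---CGAGATA--', '---CTCTCTC--']
```

A `<U1` array stores 4 bytes per position. For very large alignments, a bytes dtype such as `S1` (1 byte per position) is more memory-efficient. For simply trimming sequence ends, plain string slicing per record is usually enough and does not need equal lengths.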


