How to loop on two fasta files for hetero-dimerization analysis
I am taking an approach but have not been able to get my expected output. I have two fasta files as below - what I do is I make reverse-complement of 5mers of each record in
file1 and then I look for any record in
file2 if has or does not have any of reverse-complement 5mers.
>seq1 ACCGGT >seq2 AATTTC
>seq3 CCGGT >seq9 GGGGGGCCC
So based on the dummy examples above - the reverse-complement of
exists in the
seq3 of file2.fa
(the CCGGT) - So then the
seq1 should be marked against the
seq3in the output (like having a counter - I am not sure how to do it).
At the end I would like to have a table as below showing every
seq in file1 against every
seq in file2 with a mark representing if they hetero-dimerized or not. Based on my example only
seq1 from file1 hetero-dimerizes with the
seq3 in the file2.
file2 seq3 seq9 file1 seq1 1 0 seq2 0 0
This is the code that I have tried so far - it does not give expected output - I also could not set the script in a way to write output in table format - any help to fix the code is appreciated.
with open('file1.fa', 'r') as fw, open('file2.fa', 'r') as rv: for fw_record in SeqIO.parse(fw, 'fasta'): fcnt = 0 lst =  for i in range(len(fw_record.seq)): kmer = str(fw_record.seq[i:i + 5]) if len(kmer) == 5: C_kmer = Seq(kmer).complement() lst.append(C_kmer[::-1]) #print(lst) for rv_record in SeqIO.parse(rv, 'fasta'): rcnt = 0 if any(items in rv_record.seq for items in lst): fcnt+=1 rcnt+=1 if fcnt == 0 and rcnt == 0: print('>>' + fw_record.id) print('>>' + fw_record.seq) if fcnt > 0 and rcnt > 0: print('>>>' + fw_record.id) print('>>>' + fw_record.seq)
• 88 views