How to loop on two fasta files for hetero-dimerization analysis

0

I am taking an approach but have not been able to get my expected output. I have two fasta files as below - what I do is I make reverse-complement of 5mers of each record in file1 and then I look for any record in file2 if has or does not have any of reverse-complement 5mers.

file1.fa

>seq1
ACCGGT
>seq2
AATTTC

file2.fa

>seq3
CCGGT
>seq9
GGGGGGCCC

So based on the dummy examples above - the reverse-complement of ACCGG in seq1 (file1)
exists in the seq3 of file2.fa (the CCGGT) - So then the seq1 should be marked against the seq3in the output (like having a counter - I am not sure how to do it).

At the end I would like to have a table as below showing every seq in file1 against every seq in file2 with a mark representing if they hetero-dimerized or not. Based on my example only seq1 from file1 hetero-dimerizes with the seq3 in the file2.

     file2   seq3   seq9
file1 

seq1          1         0

seq2          0         0

This is the code that I have tried so far - it does not give expected output - I also could not set the script in a way to write output in table format - any help to fix the code is appreciated.

Thanks

with open('file1.fa', 'r') as fw, open('file2.fa', 'r') as rv:
    for fw_record in SeqIO.parse(fw, 'fasta'):
        fcnt = 0
        lst = []
        for i in range(len(fw_record.seq)):
            kmer = str(fw_record.seq[i:i + 5])
            if len(kmer) == 5:
                C_kmer = Seq(kmer).complement()
                lst.append(C_kmer[::-1])
        #print(lst)

        for rv_record in SeqIO.parse(rv, 'fasta'):
            rcnt = 0
            if any(items in rv_record.seq for items in lst):
                fcnt+=1
                rcnt+=1

        if fcnt == 0 and rcnt == 0:
            print('>>' + fw_record.id)
            print('>>' + fw_record.seq)

        if fcnt > 0 and rcnt > 0:
            print('>>>' + fw_record.id)
            print('>>>' + fw_record.seq)


fasta


script


python


sequence

• 88 views



Source link