Posted 2 hours ago by waqaskhokhar999

I have a multi-FASTA nucleotide file, and I would like to replace the ambiguous nucleotides (R, Y, S, K, etc.) with the nucleotides they can stand for, saving both options. For example, take the following nucleotide sequence (a made-up example):

>Seq1
ATGGKCRCCGCSGT

It contains the ambiguous nucleotides K, R, and S, where K = G or T, R = A or G, and S = G or C.

So the Seq1_a variant would look like this:

>Seq1_a
ATGGGCACCGCGGT

And the Seq1_b variant would look like this:

>Seq1_b
ATGGTCGCCGCCGT

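One option is to run sed with the following command (restricted to non-header lines, since a plain s/S/G/g would also change the "S" in ">Seq1"):

sed '/^>/ s/$/_a/; /^>/! { s/K/G/g; s/R/A/g; s/S/G/g; }' input.txt > output_1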

to generate:

>Seq1_a
ATGGGCACCGCGGT

And then run sed again with the other choice for each code:

sed '/^>/ s/$/_b/; /^>/! { s/K/T/g; s/R/G/g; s/S/C/g; }' input.txt > output_2

to generate:

>Seq1_b
ATGGTCGCCGCCGT
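
The two passes can also be written more compactly with sed's `y` (transliterate) command, which maps one set of characters to another in a single step. The following is only a sketch: `make_variant` is a made-up helper name, and it assumes that only the two-base IUPAC codes (R, Y, S, W, K, M) need resolving and that the sequence is uppercase.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): make_variant SUFFIX REPLACEMENTS
# Reads FASTA on stdin and writes one variant on stdout.
# REPLACEMENTS lists, in order, the bases substituted for R Y S W K M.
make_variant() {
    # First expression: append _SUFFIX to header lines.
    # Second expression: transliterate ambiguity codes on all other lines.
    sed -e "/^>/ s/\$/_$1/" -e "/^>/! y/RYSWKM/$2/"
}
```

With this, `make_variant a ACGAGA < input.txt > output_1` and `make_variant b GTCTTC < input.txt > output_2` would reproduce the two files, using the same first and second choice for K, R, and S as in the example.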

And then combine the output of both files to get this:

>Seq1_a
ATGGGCACCGCGGT
>Seq1_b
ATGGTCGCCGCCGT

There must be a more elegant way to do this. Any help would be highly appreciated.
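
One possible single-pass alternative, offered as a sketch rather than a definitive answer, is an awk program that collects each record and prints both variants back to back, so no separate combine step is needed. The names `expand_ambiguous`, `translate`, and `flush_record` are made up for illustration. It assumes that only the two-base IUPAC codes (R, Y, S, W, K, M) need resolving, that the sequence is uppercase, and that the suffix can simply be appended to the end of the header line. It produces exactly two variants per record, as in the example, not every possible combination.

```shell
#!/bin/sh
# Hypothetical helper: reads FASTA on stdin, writes an _a and a _b
# variant of every record on stdout.
expand_ambiguous() {
    awk '
    # Replace each character of s that occurs in "from" with the
    # character at the same position in "to"; keep all others.
    function translate(s, from, to,    i, c, p, out) {
        out = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            p = index(from, c)
            out = out (p ? substr(to, p, 1) : c)
        }
        return out
    }
    # Print the record collected so far as two variants.
    function flush_record() {
        if (header == "") return
        print header "_a"; print translate(seq, codes, first)
        print header "_b"; print translate(seq, codes, second)
    }
    # Assumed mapping: R=A/G, Y=C/T, S=G/C, W=A/T, K=G/T, M=A/C
    BEGIN { codes = "RYSWKM"; first = "ACGAGA"; second = "GTCTTC" }
    /^>/  { flush_record(); header = $0; seq = ""; next }
          { seq = seq $0 }      # join wrapped sequence lines
    END   { flush_record() }
    '
}
```

It could be run as `expand_ambiguous < input.txt > output.txt`. Wrapped sequences are written out on a single line, and three-base codes (B, D, H, V) and N are left untouched.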


