Removing a specified range of bases from the middle of contigs and creating new sequences


Hi,
I am trying to remove a span of nucleotides from the middle of my contigs. I have the contig IDs and the start..end positions, as shown below, and I would like to remove all the bases from start to end.

Example:

ID       length  start..end
Contig1  100     20..35
Contig2  30      3..12
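If the coordinates live in a plain text file like the table above, a minimal sketch for reading them into Python might look like this. It assumes each line is whitespace-separated `ID length start..end`, optionally starting with `>`, and that header or blank lines can simply be skipped; `parse_regions` and the regex are hypothetical names for illustration, not part of any existing tool.

```python
import re

# Hypothetical line format, inferred from the example table:
#   [>] ID  length  start..end
REGION_LINE = re.compile(r"^>?\s*(\S+)\s+(\d+)\s+(\d+)\.\.(\d+)\s*$")


def parse_regions(text):
    """Map contig ID -> (length, start, end), skipping lines that don't match."""
    regions = {}
    for line in text.splitlines():
        match = REGION_LINE.match(line)
        if match is None:
            continue  # header such as "ID length start..end", or a blank line
        contig_id, length, start, end = match.groups()
        regions[contig_id] = (int(length), int(start), int(end))
    return regions


example = """\
ID       length  start..end
Contig1  100     20..35
> Contig2 30    3..12
"""
print(parse_regions(example))
# {'Contig1': (100, 20, 35), 'Contig2': (30, 3, 12)}
```

The optional `>` and flexible whitespace are there only because the example lines are not formatted consistently; a stricter format (for example tab-separated, or BED) would make the parsing simpler.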

If Contig1 is the sequence shown below, I want to exclude the bases in bold (between the *** markers) by splitting the sequence, and then create a new sequence from the bases on either side of the excluded region.

Input:

>Contig1
TTGTTCAACGGATCCACCT***GTTGCCAAGAGTGCTTCAGTACATTGCTCACGGCTGAA***TCCCATATCCATCAAAGCACAAGATTTGAATTCACTCGAGGATCTGCTTCGTCGACCATTGGAAATGAAAAAATTACAATTACACATTGAATTTGTAAAGCTTGAAATTAATGA

Output:

>Newcontig
TTGTTCAACGGATCCACCTTCCCATATCCATCAAAGCACAAGATTTGAATTCACTCGAGGATCTGCTTCGTCGACCATTGGAAATGAAAAAATTACAATTACACATTGAATTTGTAAAGCTTGAAATTAATGA
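Because the coordinates are already known, the cut itself comes down to string slicing: keep everything before `start` and everything after `end`. The following minimal Python sketch assumes start..end is 1-based and inclusive at both ends (the usual meaning of this notation). `excise` is a hypothetical helper name. Note that in the example above the marked bases actually span positions 20..57 (38 bases), so cutting 20..35 would remove only part of them. It may be worth double-checking the coordinates.

```python
def excise(seq, start, end):
    """Return seq with positions start..end removed.

    Assumes 1-based coordinates, inclusive at both ends.
    """
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"region {start}..{end} is outside 1..{len(seq)}")
    # Python slices are 0-based and end-exclusive:
    # bases 1..start-1 are seq[:start - 1], bases end+1.. are seq[end:].
    return seq[:start - 1] + seq[end:]


# The three parts of the example Contig1 (left, marked region, right).
left = "TTGTTCAACGGATCCACCT"
middle = "GTTGCCAAGAGTGCTTCAGTACATTGCTCACGGCTGAA"
right = (
    "TCCCATATCCATCAAAGCACAAGATTTGAATTCACTCGAGGATCTGCTTCGTCGACCATTGG"
    "AAATGAAAAAATTACAATTACACATTGAATTTGTAAAGCTTGAAATTAATGA"
)

new_contig = excise(left + middle + right, 20, 57)
print(new_contig == left + right)  # True
```

The bounds check is there so that a wrong coordinate fails loudly instead of silently producing a truncated sequence.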

Most of the tools I have found only trim bases from the ends of a sequence. Do you have any suggestions for a tool or approach that lets me specify start..end and creates a new sequence excluding that region?
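Without a dedicated tool, a short script can read the FASTA file, cut one region per contig, and write the result. The following is a minimal sketch in plain Python (no third-party libraries). It assumes one region per contig, 1-based inclusive coordinates, and that the contig ID is the first word of the FASTA header. `read_fasta`, `write_fasta` and `remove_regions` are hypothetical names. The demo uses in-memory toy data instead of real files.

```python
import io


def read_fasta(handle):
    """Yield (id, sequence) pairs from a FASTA file-like object."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            fields = line[1:].split()
            # Use the first word of the header as the contig ID.
            seq_id, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def write_fasta(records, handle, width=60):
    """Write (id, sequence) pairs, wrapping sequence lines at `width`."""
    for seq_id, seq in records:
        handle.write(f">{seq_id}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


def remove_regions(records, regions):
    """Cut regions[id] = (start, end), 1-based inclusive, from each record.

    Contigs without an entry in `regions` are passed through unchanged.
    """
    for seq_id, seq in records:
        if seq_id in regions:
            start, end = regions[seq_id]
            seq = seq[:start - 1] + seq[end:]
        yield seq_id, seq


# Toy input: Contig1 is split over two lines, Contig2 has no region.
fasta_in = io.StringIO(">Contig1\nAAAACCCC\nGGGGTTTT\n>Contig2\nACGTACGT\n")
fasta_out = io.StringIO()
write_fasta(remove_regions(read_fasta(fasta_in), {"Contig1": (5, 12)}), fasta_out)
print(fasta_out.getvalue(), end="")
# >Contig1
# AAAATTTT
# >Contig2
# ACGTACGT
```

For real data, the same functions work with file handles, e.g. `with open("contigs.fa") as fin, open("cleaned.fa", "w") as fout:` (hypothetical file names). The sketch keeps the original contig IDs. If you want names like `Newcontig`, change the ID in `remove_regions`.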

I have about 100 contigs to clean this way.
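With around 100 contigs, it may pay off to validate everything before cutting, since one wrong coordinate is easy to miss by eye. The table already includes a length column, which gives a cheap consistency check against the actual sequences. This sketch assumes the sequences are in a dict keyed by contig ID and the regions are in a dict mapping ID to `(stated_length, start, end)` with 1-based inclusive coordinates. `check_regions` is a hypothetical name, and the demo sequences are made-up toy data.

```python
def check_regions(sequences, regions):
    """Return a list of problems found before any cutting is done.

    sequences: dict of contig ID -> sequence string
    regions:   dict of contig ID -> (stated_length, start, end),
               1-based inclusive coordinates
    """
    problems = []
    for contig_id, (stated_length, start, end) in regions.items():
        seq = sequences.get(contig_id)
        if seq is None:
            problems.append(f"{contig_id}: not found in FASTA")
            continue
        if len(seq) != stated_length:
            problems.append(
                f"{contig_id}: stated length {stated_length}, actual {len(seq)}"
            )
        if not 1 <= start <= end <= len(seq):
            problems.append(f"{contig_id}: region {start}..{end} out of range")
    return problems


# Toy data: Contig2 is shorter than stated, Contig3 is missing.
sequences = {"Contig1": "A" * 100, "Contig2": "C" * 25}
regions = {
    "Contig1": (100, 20, 35),
    "Contig2": (30, 3, 12),
    "Contig3": (50, 1, 5),
}
for problem in check_regions(sequences, regions):
    print(problem)
# Contig2: stated length 30, actual 25
# Contig3: not found in FASTA
```

An empty result means every region fits inside its contig and every stated length matches, which makes it safer to run the cutting step on the whole file in one go.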
Any help would be highly appreciated. Thank you so much!


