Trimming an alignment around a given sequence


Hello, I have a simple question, but I am having trouble figuring out how to solve it. I have many alignment files, and for each of them I want to remove the overhanging bases from the alignment. The twist is that I want this done relative to a specific sequence, which is always the first in the alignment and is always named "Scaffoldxxxx" (where the x's are numbers and other characters). Here is a picture:

[Figure: example alignment; what needs to be cropped is in the red squares]

As you can see, I want to trim everything upstream of the start and downstream of the end of the first sequence. That's easy in a sequence editor such as Jalview, but since I have thousands of alignments, I need to automate it. Surprisingly, I don't even know where to start.
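Here is a minimal sketch of the core step. It assumes each alignment is already loaded as a list of (name, aligned sequence) pairs with the Scaffold sequence first, and that gaps are written as `-` or `.`. The idea is to find the first and last non-gap columns of that reference row, then cut every row at those same two columns. The function names `reference_span` and `trim_to_reference` are hypothetical and do not come from any library.

```python
from typing import List, Tuple

# Characters treated as alignment gaps (an assumption; adjust if your files use others).
GAP_CHARS = set("-.")


def reference_span(ref: str) -> Tuple[int, int]:
    """Return (start, end) columns, end exclusive, spanning the non-gap part of ref."""
    cols = [i for i, c in enumerate(ref) if c not in GAP_CHARS]
    if not cols:
        raise ValueError("reference sequence contains only gaps")
    return cols[0], cols[-1] + 1


def trim_to_reference(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Crop every aligned sequence to the columns spanned by the first (reference) record."""
    if not records:
        raise ValueError("empty alignment")
    ref_seq = records[0][1]
    if any(len(seq) != len(ref_seq) for _, seq in records):
        raise ValueError("sequences differ in length; input is not an alignment")
    start, end = reference_span(ref_seq)
    # The same column window is applied to every row, so the rows stay aligned.
    return [(name, seq[start:end]) for name, seq in records]


if __name__ == "__main__":
    toy = [
        ("Scaffold_1:1-5", "---acgta--"),
        ("read_1", "ggtacgtatt"),
    ]
    print(trim_to_reference(toy))
    # → [('Scaffold_1:1-5', 'acgta'), ('read_1', 'acgta')]
```

Because every row is cut with the same column window, the output is still a valid alignment. Bases in other sequences that fall inside the window are kept, even where the reference has an internal gap.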

Many thanks for any help or insight.

As requested, here is a sample alignment:

>Scaffold_2:57492774-57492872
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
-------------------------------cttgagctggggtctggccatggggtaaa
gaagcagcagcagagacagaccaatgccaatgaggattccatactgcacacagtcacaag
catgggtta---------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
-----------
>_R_TRINITY_DN28760_c0_g2_i7
tgtttgtgtgtcttgatttacaaaaatgatgcacagtaaatgttgataatttttacgact
gctgaggagatacaaggaacatggtaattgtgtaatgaagacaatgccagcttactaaat
gtattactttctgctgtgtgacaatgatacttacacgggtgcggcataaaactatctcaa
ctcctttccgttccccttccaacaccatgcttcgtataccatgtagactggagaagtcga
ggcccgatatgtggaccatgtccaggatgacggccctggggacgtcctcctccagggcca
ggcccatgaccttgtcctccaggtactctatggctggaaagaacaggccctggtcaggct
ggatcagcaccagccccggctgctccttcaccttgagttgtggccgggccatggggtaga
gcaggagcaacaaggacagcccgataccgatcaggatcccatactcaatgcccacacata
gtgaacccaaaaaggtggccacatgcacaaacagatcccatttattggtgcgccacaaaa
cggggatgattttgtagtcgaccatctgcatgacggccatgatgatgaccgcggccagcg
ctgacttggggatgtagtaacagtagggcaccaggaaggccagtactaacaggatcaggg
accctgtgaaaagaccattcgccggtgttcttacaccgctctgtgagttgacagcagttc
tggaaaaactgccggtgacaggataggaatgaacaaaggaactgagaatgttggcagtac
ctatagctatcaactcttgtgtaggatcaatcttatagttattcacacgagctgcaacat
aaacaaaagtgtcatgaatttcatatcagcgacaaaaacttttacctaataaataaagtt
ttaaaaaggag

I need to trim this alignment so that its length equals the length of the first sequence. In the second sequence, I don't want anything upstream or downstream of the bases that align to the first sequence.
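Below is a fuller sketch for batch use. It makes several assumptions: the alignments are aligned FASTA files with a `.fasta` extension in a single input directory, the reference is the first record and its name starts with `Scaffold`, and gaps are `-` or `.`. The script name `trim_to_scaffold.py` and all function names are hypothetical. The script reads each file with a small standard-library FASTA parser, keeps only the columns covered by the reference, and writes the result under the same file name in an output directory.

```python
import sys
from pathlib import Path
from typing import List, Optional, Tuple

Record = Tuple[str, str]
GAPS = "-."  # assumed gap characters


def read_fasta(path: Path) -> List[Record]:
    """Parse a (possibly line-wrapped) aligned FASTA file into (name, sequence) pairs."""
    records: List[Record] = []
    name: Optional[str] = None
    chunks: List[str] = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records.append((name, "".join(chunks)))
                name, chunks = line[1:], []
            else:
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def write_fasta(records: List[Record], path: Path, width: int = 60) -> None:
    """Write records as FASTA, wrapping sequences at `width` characters."""
    with open(path, "w") as handle:
        for name, seq in records:
            handle.write(f">{name}\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")


def trim_alignment(records: List[Record], ref_prefix: str = "Scaffold",
                   drop_ref_gaps: bool = False) -> List[Record]:
    """Keep only the alignment columns covered by the first (reference) record."""
    if not records:
        raise ValueError("empty alignment")
    ref_name, ref_seq = records[0]
    if not ref_name.startswith(ref_prefix):
        raise ValueError(f"first record {ref_name!r} does not start with {ref_prefix!r}")
    if any(len(seq) != len(ref_seq) for _, seq in records):
        raise ValueError("sequences differ in length; input is not an alignment")
    base_cols = [i for i, c in enumerate(ref_seq) if c not in GAPS]
    if not base_cols:
        raise ValueError(f"reference {ref_name!r} contains only gaps")
    if drop_ref_gaps:
        keep = base_cols  # only columns where the reference has a base
    else:
        keep = list(range(base_cols[0], base_cols[-1] + 1))  # first..last reference base
    return [(name, "".join(seq[i] for i in keep)) for name, seq in records]


def main(in_dir, out_dir, drop_ref_gaps: bool = False) -> None:
    """Trim every *.fasta file in in_dir and write the result to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(in_dir).glob("*.fasta")):
        trimmed = trim_alignment(read_fasta(path), drop_ref_gaps=drop_ref_gaps)
        write_fasta(trimmed, out / path.name)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python trim_to_scaffold.py INPUT_DIR OUTPUT_DIR", file=sys.stderr)
```

Run it as `python trim_to_scaffold.py alignments/ trimmed/`. Cropping from the first to the last reference base gives exactly the reference length only when the reference has no internal gaps, as in the sample above. If a reference does contain internal gaps, `drop_ref_gaps=True` also removes every column where the reference has a gap. This forces the length to match the ungapped reference but discards any insertions in the other sequences relative to it.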

