grep sequence


Hi,

I have a FASTA file with sequences like the following. The sequences come in pairs with similar headers. I want to generate a file containing only the sequences whose header does not contain "shuffled". How can I do that in bash?

>AABR03119176.1/72910-72785
UCCCCCAGAGUCUGGGCUUGGUGCUUUGCAGUGCUGGCGACCUAUUCCCUUUGACGAUCCCUAGGUGGAGAUGGGGCAUGAGGAUCCUCCAGGGGAAUAGCUCACCGCCACUGGGCAACAGGCCUA
>AABR03119176.1/72910-72785-shuffled
CCGCUAGCGUGAUUGGGGACGGGAUCGACCGGUGGCCCGCCGACGCCUCACCUCAUACUCGUAUGUGAUGCCGAGGGCUAGGUAAGAUGGUUGAACGCUCUAGAGUGCCCUCUGAACUUAGCCUCU
>AANN01820944.1/1549-1423
UUUCCCUCAGAAUAGGCUUGUUGCUUUACAGUACUGGUGAUCCAUUCUCUUUGAUGAUCCCcUAGGUGGAGAUGGGGCAUGAGGAUCCUCCAAGGGAAAGACUCAUCAUCACUGGGCAACAGCCUUA
>AANN01820944.1/1549-1423-shuffled
AGGCUCUGACAUAGACUCUUCUUUAGUGGGCGCGCCGACACAUACCUGUcUGAGGAGAUCGAAAUGUGUAGUCCGACAGAACUAAACAAGACUCGUCGGUGCUUAGACUUCUUUCCUGUUUGCGAUU



cat <yourFile> | paste - - | grep -v 'shuffled' | sed 's/\t/\n/g' > new_file

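This reads your file, joins each header and its sequence onto one tab-separated line (paste - -), keeps only the lines that do not match 'shuffled' (grep -v), and then turns each tab back into a newline so that header and sequence are on two lines again (sed). It assumes every sequence sits on a single line, as in your example.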
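If some sequences are wrapped over several lines, pairing lines with paste no longer works. A minimal awk sketch that may help in that case is below; the function name `filter_shuffled` and the file names are placeholders I made up, and it assumes every record starts with a `>` header line.

```shell
# Hypothetical helper: drop FASTA records whose header contains "shuffled".
# On each header line, decide whether to keep the record; every following
# line (the sequence, wrapped or not) is printed only while keep is true.
filter_shuffled() {
    awk '/^>/ { keep = ($0 !~ /shuffled/) } keep' "$@"
}

# Usage (file names are placeholders):
# filter_shuffled input.fasta > new_file
```

Because the keep/drop decision is made per header rather than per line pair, this should behave the same as the paste pipeline on single-line records.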

