gravatar for k.kathirvel93

2 hours ago by

India

Hi EveryOne,

I have a multifasta file which is converted from BWA bam file. I want to extract only sequences contains specific primers on the start, middile or end. How can i do it with awk or sed or grep. Thanks in advance.

The Input file looks like this :

>M01015:63:000000000-D2M18:1:1101:17027:1479
TTCTCTCTTCTCTCTTCTTCCTCTTTTCTTTTCTCTCTCTTTTTTTTCTTCTTTTTCTTCTTTTTTTCTT
TTCCTTTTTTTTTCTTTTTTTTTTTTTTTTTTTTTTTTTTCTCCTTTTTCCTTCTTTCTTTTTCTTTTTT
CTCCTCTTCTTTTTTCTTTTTTTTCTTTCTTTTTTTCTCCTTTTTTTTTTTTCTTTTTTTCTTCCCTTTT
TTTTTTTCCCTTTTTTCTTTTTTTTTTTCTTCCTTTTTTT

>M01015:63:000000000-D2M18:1:1101:17027:1479
TCCTCTCTCTCTCTTCTCCCTCCTCCCTTTCTCTCTTCTCTCTTTTCTCTTTCTTTTCTTTTTTCTCTTT
TCCCTTTTTCCCTTCTTTCTTTTCTTTTTTTTTTTCTTTTTTCTTTTCTTTTTTTTTTCTCTTTTTTTCT
TCTTTTTTTTTCTCCTCTTTTTTTTTTTCTTTTTTTTTTTTTTTCTCTTTTTCTTCTTTTTTTTTTTTTT
CTTTTTTTCTTTTTTTTTTTTTTTTTTCTCTCTCTTTTTT

>M01015:63:000000000-D2M18:1:1101:15901:1612
GGCACTCGTATCGATGCGGCCGCGTTCGTTTGTTTATACACCTGCTCGTGCTTGTTTATGCATCTGCCAT
CTCCCTTCTGCTTATTTCTGTCTCCGATGCCTCTGTACTCCTTAGCCTTTCAGCTCCTGCCGCCTGTTTC
CCTGTGATGCAACAAGCTTACTCTGCACCAATGATGCAGCAGCCAGCTCAATCTAACGCAGCCAGTGATT
AGTTAGACGCGTGCCTGTGATTAGTTAGACGCGTGCCAGT

>M01015:63:000000000-D2M18:1:1101:15901:1612
GCCTCTGTCCCTCTTCTACCTATTCCTTGCCCCCCTCTTCCTTATTCCTTCCCCGCCTCTTCCTTATCTC
TGCCTTCTTTCTTTTGACCTCTCTCCTTCCTCATTGGTGCAGCGTTAGCTTGTTGCTTCACTGGGAAACT
TGCGGCAGGAGCTGCCCTGCTTCTGCGTCCTGACTCTTCGCCTTCCGTAATTTCCCGTTCGGTGTTGCCT
GTTTCTTCTACCAGCTCGCTCAGTTTTTTATTCTTTCGA

>M01015:63:000000000-D2M18:1:1101:16395:1620
GGCACTCGTATCGATGCGGCCGCGGTTATCTCTTCCCGCTGCACTGCCTTTTAGGCGTTCTTTTGTTCCG
GCCCCCTCTCCCCCCGGGTTCCCTGCTTTCCCCTGTGCGCTATTCCTGTTCTAGATGCTTTACTGTCCCC
CTCCGCTCCCGGCTTCTCGGTCAGTTTCCCCGTGCTTAGTTAGACGCGTGCTTCTGGC

>M01015:63:000000000-D2M18:1:1101:16395:1620
GCCTCTAGCACGCGTCTAACTAATCACTTTCCCCCTCCCCGTTAATCCGGGTTCTGTCTTGTTCAGTCAT
TCCTCTCGCCCCGCCCTCGCTCACTGGCTCTTGCTGCCTACCCGGGTTCAGTACTCGCCGTCCCTTATGA
ACCCCTCTTTGGCCTTGCTCCGGGTGGTGTTTCCCGCGGCCGCATCGATACGAGTGCCCTGTTTCTTATA
CACTTCTGACGCTGCCGCCGAATATAGCGGTGTCGTTCTT

>M01015:63:000000000-D2M18:1:1101:15366:1643
GGCACTCGTATCGATGCGGCCGCGGTAAACTCCACCCGGACCAACGCCAAATAGTGTTTCATAAGGTACT
TCCCTTACTCCCCCCGTGTAGGCTGCTTTTGCCCCTCTTCTCTTGCTGGCCTAGATGAATTACTGTCCTC
TACCTAACCCTTCTTATCTGTCAGTTTCACCGTTTTTTGTTAGTCGCGTGCTCTTTGCCTTTTTCTTCTA
CCTCTCTCCTCTCTCACTATACTTCTGTCCATCTTTTTTT

>M01015:63:000000000-D2M18:1:1101:15366:1643
GCCTCTAGCACGCGTCTCACTAATCCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCTT
TCCTCTTTTCTTTCCTTTTCTCTCTTTTTTTTTCTCCTTTCCCTTTTTCTGTTCCTTCCGTCCCTTTTGT
TCCCCTTTTTTTCCTTTCTCCTTTTTTTTTTTTCCCTTGCTCTTTCTTTTCTCTTCTCCTTTTTCTTTTA
CTTATCTTCCGTTTCCTTCGTCTTTCTCTTTTTATATTTT

>M01015:63:000000000-D2M18:1:1101:17421:1643
GGCACTCGTATCGATGCGGCCGCGGTGATGTTAGTCGCGTGCCGTGTTTTGTTACACGCGTGCCAGTGAT
TAGTTAGACGCGTGCTAGAGGC

>M01015:63:000000000-D2M18:1:1101:17421:1643
GCCTCTAGCACGCGTCTAACTAATCACTGGCCCGCGTCTCTCTAATCTCGGCTCGCGTCTCACTTCCCCG
CGGCCGCATCGATACGAGTGCC

>M01015:63:000000000-D2M18:1:1101:16505:1648
GGCACTCGTATCGATGCGGCCGCGTGTGATTTCTTCGACTTGTCCTAGCGTCCTCTCTCTTATCTACTTC
TTCGACCCCTCTCGACTCCTTTTCATCTCCTATTCCCTTTTCTGCTTCCCTATATTCTCTTCTTTTTTCT
TTTTTTTTTTTTGCTTATTCTTCCTTATCACTTTTTTTTTTCTACTCTATGCTTCCTGTCTGTCTCGTTT
CTGCCTCGTTGGTTTATTTTTCCTGCCTCTTTCTTTTTTT

>M01015:63:000000000-D2M18:1:1101:16505:1648
GCCTCTAGCACGCGTCTAACTAATCACTCTCTTCCTTTTCTTTTCTTTTGCCTTGTCTCTTCTTCCCCTC
TCTTGCTTCCCTCTACTTCTTTTTTTTTTTTTTCTTCCGTCTCCTTCTTTTTTTCTTCTCTACTTTTTTT
TCTTCTTTTTTTTTCTTTCTCTTTTTTTCTTTCTTTTTTCTTTCTTTTTCTTCTTCTTTTTTTCTATTTT
CTTCTCTTCTACTCTCTTTTCTTTTTTCTTCTTTTTCTTT

>M01015:63:000000000-D2M18:1:1101:17397:1654
GGCACTCGTATCGATGCGGCCGCGGGTGATGTGATTAGTTATACGCGTGCTAGTGGC

>M01015:63:000000000-D2M18:1:1101:17397:1654
TCCTCTAGCACGCGTCTAACTAATCACATCACCCGCGGCCGCATCGATACGAGTGCC

I want to extract only sequences(With headers) contains "ggcactcgtatcgatgcggccgcg" sequnces on it.
like this

>M01015:63:000000000-D2M18:1:1101:16505:1648
GGCACTCGTATCGATGCGGCCGCGTGTGATTTCTTCGACTTGTCCTAGCGTCCTCTCTCTTATCTACTTC
TTCGACCCCTCTCGACTCCTTTTCATCTCCTATTCCCTTTTCTGCTTCCCTATATTCTCTTCTTTTTTCT
TTTTTTTTTTTTGCTTATTCTTCCTTATCACTTTTTTTTTTCTACTCTATGCTTCCTGTCTGTCTCGTTT
CTGCCTCGTTGGTTTATTTTTCCTGCCTCTTTCTTTTTTT



Source link