Trying to use Python regular expressions to filter FASTA file sequence headers
Apologies if I do not follow the correct question formatting; this is my first time posting. My question is about using Python regular expressions. I have a FASTA file of sequences in the following format:
>NODE_143195_length_100_cov_16076.000000
TTGTGTTGGTTGTTGTGTTGCCTGTCTTGGTGGCGGTTGTGTTGGCTGCTTTCGTGTCAG
TCTCTTCACCGATGTTATGTTGCTCTGTTGTGGCTCCGGC
>NODE_143196_length_100_cov_15891.000000
CTTGTGTTGGTTGTTGTGTTGCCTGTCTTGGTGGCGGTTGTGTTGGCTGCTTTCGTGTCA
GTCTCTTCACCGATGTTATGTTGCTCTGTTGTGGCTCCGG
>NODE_143197_length_100_cov_15696.000000
GCTTGTGTTGGTTGTTGTGTTGCCTGTCTTGGTGGCGGTTGTGTTGGCTGCTTTCGTGTC
AGTCTCTTCACCGATGTTATGTTGCTCTGTTGTGGCTCCG
I am trying to filter by both length and coverage: I want to filter out sequences shorter than 5000 bp and with coverage below 100. I have been trying different variations of the following line:
But I cannot seem to make it work. Any help would be greatly appreciated.
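For reference, here is a minimal sketch of the kind of filter being described, not a definitive solution. It assumes every header follows the `>NODE_<id>_length_<n>_cov_<x>` pattern shown in the example, and it assumes "filter" means keeping only records whose length is at least 5000 and whose coverage is at least 100. The names `HEADER_RE`, `parse_header`, and `filter_fasta` are hypothetical and chosen only for illustration.

```python
import re

# Assumed header layout, taken from the example records:
#   >NODE_143195_length_100_cov_16076.000000
# Group 1 captures the length, group 2 the coverage.
HEADER_RE = re.compile(r"^>NODE_\d+_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def parse_header(header):
    """Return (length, coverage) from a header line, or None if it does not match."""
    match = HEADER_RE.match(header)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def filter_fasta(lines, min_length=5000, min_cov=100.0):
    """Yield header and sequence lines of records that pass both thresholds.

    Assumption: a record is kept only if length >= min_length AND
    coverage >= min_cov. Records with unparseable headers are dropped.
    """
    keep = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parsed = parse_header(line)
            keep = (
                parsed is not None
                and parsed[0] >= min_length
                and parsed[1] >= min_cov
            )
        if keep:
            yield line
```

One way to use it would be to pass an open file handle to `filter_fasta` and write each yielded line, followed by a newline, to an output file. If the intent is instead to drop only records that fail both thresholds at once, the `and` between the two comparisons would become `or`.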