I have a FASTA file with some long runs of Ns. I would like to split each sequence at every run of 100 or more Ns, replacing the run with a new header line: >some_string
The header strings don't need to be unique; I can renumber the headers afterwards.
One issue: the FASTA is wrapped at 70 nt per line, so a run of 100 Ns will always span more than one line.
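To make the goal concrete, here is a rough Python sketch of the behaviour I am after. It is only an illustration under some assumptions: the function name `split_on_n_runs` and its parameters are made up for this post, it treats lowercase `n` the same as `N`, it reads the whole file into memory, and it drops any piece that is empty (for example a sequence made up entirely of Ns). It first joins the wrapped lines of each record so that a run of Ns crossing line breaks is seen as one stretch, then splits and re-wraps at 70 nt.

```python
import re


def split_on_n_runs(fasta_text, min_run=100, width=70, new_header=">some_string"):
    """Split FASTA records at runs of >= min_run Ns and re-wrap to `width`."""
    # Assumption: soft-masked 'n' counts as a gap character as well.
    gap = re.compile(r"[Nn]{%d,}" % min_run)

    # Collect (header, [sequence lines]) pairs, one per record.
    records = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            records.append((line, []))
        elif line and records:
            records[-1][1].append(line)

    out = []
    for header, chunks in records:
        # Join wrapped lines so a run of Ns spanning lines is contiguous.
        seq = "".join(chunks)
        # Drop empty pieces (leading/trailing gaps, or all-N sequences).
        pieces = [p for p in gap.split(seq) if p]
        for i, piece in enumerate(pieces):
            # The first piece keeps the original header; later ones get the new one.
            out.append(header if i == 0 else new_header)
            out.extend(piece[j:j + width] for j in range(0, len(piece), width))
    return "".join(line + "\n" for line in out)
```

Called on the contents of a file, it returns the split FASTA as a single string. Runs shorter than `min_run` are left in place, and headers are not made unique, so renumbering would still be a separate step.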
Can anyone suggest a Unix script or software package that does this?
Thanks for any advice.