I created a fasta file from a ChIP-seq_summit.bed where each sequence is about 500bp.
I want to remove the sequences that contain a specific motif, so I thought of using this command:
grep -E ".[GC][AT]T[GC]AAA[GA]" ChIP-seq_summit.fa -v >ChIP-seq_summit_no_cons.fa
It's not clear to me whether this removes only the motif from the sequences or the entire line that contains it.
I'm asking because MEME works fine when I use the original file for motif finding, but with this modified file it reports that there are sequences shorter than the minimum allowed.
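For reference, here is a minimal sketch on a toy FASTA of what grep -v does with this pattern, plus one possible way to drop whole records instead. The sequences and the file names (toy.fa, toy_grep.fa, toy_no_motif.fa) are made up for illustration, and the awk part is only an assumption about what the intended result is, not a tested pipeline for the real data:

```shell
# Toy FASTA with hypothetical records; only seq2 contains a match (ACTTGAAAG).
printf '>seq1\nACGTACGTACGT\n>seq2\nACTTGAAAGCCC\n>seq3\nTTTTCCCCGGGG\n' > toy.fa

# grep -v drops every LINE that matches the pattern. The sequence line of seq2
# is removed, but its header ">seq2" stays behind with no sequence under it.
grep -v -E ".[GC][AT]T[GC]AAA[GA]" toy.fa > toy_grep.fa

# Sketch: treat header + sequence as one record and drop the whole record.
# Sequence lines are joined before matching, so wrapped FASTA is handled too
# (the output sequences are written on a single line).
awk -v motif=".[GC][AT]T[GC]AAA[GA]" '
    function emit() { if (header != "" && seq !~ motif) print header "\n" seq }
    /^>/ { emit(); header = $0; seq = ""; next }   # new record: write out the previous one
    { seq = seq $0 }                               # accumulate sequence lines
    END { emit() }                                 # write out the last record
' toy.fa > toy_no_motif.fa
```

In this toy case toy_grep.fa still contains the header >seq2 followed directly by >seq3, i.e. an empty record, while toy_no_motif.fa contains only seq1 and seq3. An empty record like that might be what MEME reports as a sequence shorter than the minimum, and if the real FASTA is wrapped over several lines, grep -v would instead leave shortened sequences.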
Thank you 🙂