I have downloaded a genome assembly from GenBank (RefSeq), and it apparently contains some nucleotide characters other than A, C, T, G or N (according to the error file from the radinitio program).
I would like to try and find out what these are prior to fixing the file. I've tried various combinations of grep...
grep -i -v [ACTGN]+ sequence.fas
etc., but they either find everything in the file, or nothing.
I would like to do a "simple" grep that finds lines that contain any characters IN ADDITION to [ACTGN] (either case). I can get rid of the FASTA headers by piping through grep -v '>'.
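For reference, here is a minimal sketch of the kind of search described above. It rests on a few assumptions: the function names are hypothetical (they are not part of radinitio or grep), header lines are taken to be those starting with '>', and the second function needs a grep that supports -o (GNU and BSD grep both do). Note also that in grep's default (basic) regex syntax the + in [ACTGN]+ is a literal plus sign, and the pattern was unquoted, so the attempt above was probably not testing what it was meant to.

```shell
# Hypothetical helper names, for illustration only.

# Print, with line numbers from the original file, every non-header line
# that contains at least one character other than A, C, G, T or N
# (upper or lower case). With -v and several -e patterns, grep keeps
# only the lines that match NONE of the patterns.
find_non_actgn() {
    grep -n -v -e '^>' -e '^[ACGTNacgtn]*$' "$1"
}

# Tally the offending characters themselves: drop headers, emit each
# unexpected character on its own line, then count the distinct ones.
# Assumes grep supports -o.
count_non_actgn() {
    grep -v '^>' "$1" | grep -o '[^ACGTNacgtn]' | sort | uniq -c
}
```

Usage would be something like find_non_actgn sequence.fas followed by count_non_actgn sequence.fas. If every sequence line is reported, one possible cause is Windows line endings: the trailing carriage return is not in the allowed set, so it counts as an extra character.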