Posted 2 hours ago by dpearton (South Africa)


I have downloaded a genome assembly from GenBank (RefSeq). According to the error file from the RADinitio program, it contains some characters other than A, C, G, T or N.

I would like to find out what these characters are before fixing the file. I've tried various combinations of grep, such as:

grep -i -v [ACTGN]+ sequence.fas

and similar variants, but they match either every line in the file or none.
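A likely explanation for the "everything or nothing" behaviour: in grep's default basic regex syntax, `+` is a literal plus sign, so `[ACTGN]+` only matches a base followed by `+`. No line contains that, so `-v` prints every line. With `-E`, `[ACTGN]+` matches any line containing at least one base, so `-v` prints almost nothing. What is needed instead is a negated bracket expression, `[^...]`, which matches any single character outside the set. A minimal sketch, wrapped in a hypothetical helper function `find_bad_lines` (the name is just for illustration):

```shell
# Print every sequence line that contains at least one character
# other than A, C, G, T or N (upper or lower case).
#   grep -v '^>'              drops FASTA header lines (they start with '>')
#   grep '[^ACGTNacgtn]'      keeps lines with any character outside the set;
#                             the single quotes stop the shell from
#                             interpreting the brackets
find_bad_lines() {
  grep -v '^>' "$1" | grep '[^ACGTNacgtn]'
}

# Usage: find_bad_lines sequence.fas
```

If this returns every line, check whether the file has Windows (CRLF) line endings: the trailing carriage return is itself a character outside the set.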

I would like a "simple" grep that finds lines containing any character other than A, C, G, T or N (in either case). I can already drop the FASTA header lines by piping through grep -v '>'.
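To see which characters are actually present (rather than which lines contain them), `grep -o` can print each matching character on its own line, and `sort | uniq -c` then tallies them. This is a sketch under the same assumptions as above; `count_bad_chars` is a hypothetical helper name:

```shell
# Count each distinct character other than A/C/G/T/N (either case)
# in the sequence lines, most frequent first.
#   grep -v '^>'              drops FASTA header lines
#   grep -o '[^ACGTNacgtn]'   prints every offending character on its own line
#   sort | uniq -c            counts identical characters
#   sort -rn                  orders by count, highest first
count_bad_chars() {
  grep -v '^>' "$1" | grep -o '[^ACGTNacgtn]' | sort | uniq -c | sort -rn
}

# Usage: count_bad_chars sequence.fas
```

In an assembly, the characters reported are often IUPAC ambiguity codes such as R, Y, K or M, which is worth knowing before deciding whether to replace them with N.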


