I successfully removed adapters from my paired-end (2 x 150 bp) reads using bbduk.sh, but I kept seeing a long string of Gs in my R2 sequences (~0.1% of my R2 reads).
I reran the original fastq files through bbduk to remove the string of Gs, and this worked for all of my PE files except one pair.
When I ran this trimmed set of PE reads through FastQC (after bbduk), I received an error that said:
Failed to process file EA_Pool-POW_1-1a_S28_L001_R1_CLEANEST.fastq
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Ran out of data in the middle of a fastq entry. Your file is probably truncated
    at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:179)
    at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
    at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:77)
    at java.lang.Thread.run(Thread.java:722)
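One way to confirm whether the file really is truncated, independent of FastQC, is to walk through it four lines at a time and stop at the first incomplete record. The following is only a minimal sketch: it assumes an uncompressed FASTQ with exactly four lines per record (no wrapped sequences), and the function name check_fastq is made up for illustration.

```python
def check_fastq(path):
    """Return (complete_records, problem); problem is None if no issue was found.

    Assumes an uncompressed FASTQ with exactly four lines per record.
    """
    records = 0
    with open(path) as handle:
        while True:
            header = handle.readline()
            if not header:
                # Clean end of file right after a complete record.
                return records, None
            if not header.strip():
                # Skip stray blank lines (for example at the end of the file).
                continue
            seq = handle.readline()
            plus = handle.readline()
            qual = handle.readline()
            n = records + 1
            if not header.startswith("@"):
                return records, f"record {n}: header does not start with '@'"
            if not plus or not qual:
                # readline() returns '' only at end of file.
                return records, f"record {n}: file ends mid-record"
            if not plus.startswith("+"):
                return records, f"record {n}: separator line does not start with '+'"
            if len(seq.rstrip("\r\n")) != len(qual.rstrip("\r\n")):
                return records, f"record {n}: sequence and quality lengths differ"
            records += 1
```

Running it on both the R1 and R2 outputs and comparing the record counts would also show whether the pair is still in sync.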
Has anyone encountered this issue before with output fastqs from bbduk?
Should I just not worry about the ~0.1% of GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG kmers in my R2 reads?
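For reference, the ~0.1% figure can be re-measured on any file, before or after trimming, with a short script. This is a rough sketch under assumptions: the file is uncompressed with four lines per record, poly_g_fraction is a hypothetical helper name, and the min_run default of 20 is an arbitrary threshold chosen for illustration, not a recommended value.

```python
def poly_g_fraction(path, min_run=20):
    """Fraction of reads whose sequence contains at least min_run consecutive Gs.

    Assumes an uncompressed FASTQ with exactly four lines per record.
    """
    run = "G" * min_run
    total = 0
    hits = 0
    with open(path) as handle:
        for i, line in enumerate(handle):
            # The sequence is the second line of every four-line record.
            if i % 4 == 1:
                total += 1
                if run in line:
                    hits += 1
    return hits / total if total else 0.0
```

Multiplying the result by 100 gives the percentage of affected reads, which makes it easy to compare the R2 file before and after the extra bbduk pass.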