Paired-end FASTQ corruption cleanup: intersection of record headers in SeqIO.parse() iterators
My paired-end .bcl files are slightly corrupted, so that after aligning, R2.fastq is slightly smaller than R1.fastq. Using grep, I've found that none of the R2 records are truncated; instead, R2 is only missing full records. Unfortunately, these records are missing at random throughout the file, rather than in one large chunk. I would like to use SeqIO to remove the records that are missing in R2.fastq from R1.fastq and I.fastq.
Is there a way to find the intersection of record headers across the SeqIO iterators for R1, R2, and I?
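To make the question concrete, here is a rough two-pass sketch of what I have in mind, not tested on real data. It assumes the three files share the same read name in `record.id` (the first whitespace-delimited word of the header), optionally followed by a legacy `/1` or `/2` mate suffix. The helper names `read_key`, `shared_keys`, `keep_shared`, and `clean_fastq_files`, and the `.cleaned.fastq` output suffix, are made up for illustration; only `SeqIO.parse` and `SeqIO.write` come from Biopython.

```python
def read_key(record_id):
    """Strip a legacy '/1' or '/2' mate suffix so mates share one key (assumed naming)."""
    if record_id.endswith(("/1", "/2")):
        return record_id[:-2]
    return record_id


def shared_keys(*record_iterables):
    """Pass 1: return the set of read keys present in every iterable of records."""
    keys = None
    for records in record_iterables:
        current = {read_key(rec.id) for rec in records}
        keys = current if keys is None else keys & current
    return keys if keys is not None else set()


def keep_shared(records, keys):
    """Pass 2: lazily yield only the records whose key is in `keys`, in file order."""
    return (rec for rec in records if read_key(rec.id) in keys)


def clean_fastq_files(paths, out_suffix=".cleaned.fastq"):
    """Write a filtered copy of each FASTQ file, keeping reads found in all files."""
    from Bio import SeqIO  # requires Biopython; imported here so the helpers above stay generic

    # Iterators are exhausted after one pass, so each file is parsed twice.
    keys = shared_keys(*(SeqIO.parse(path, "fastq") for path in paths))
    for path in paths:
        SeqIO.write(keep_shared(SeqIO.parse(path, "fastq"), keys), path + out_suffix, "fastq")
```

The intended call would be `clean_fastq_files(["R1.fastq", "R2.fastq", "I.fastq"])`. This holds one set of read names in memory per file during the first pass, which I assume is acceptable for a single run, and it preserves the original record order in each output file.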