I am having an unexplained problem with some code I wrote using Biopython's SeqIO module.
I apply several filtering steps to a FASTQ file with this code:
```python
from Bio import SeqIO

# filter_by_quality, filter_by_single_nucleotide_appearance and
# filter_by_long_stretches_repeats are defined earlier (not shown here).


def extract_from_fastq(fq, output_fq):
    """
    Takes a fastq file, examines each read using all the above functions,
    and writes the non-ambiguous reads to a new file.

    :param fq: the fastq file
    :param output_fq: the output fastq file after filtering
    """
    input_iterator = SeqIO.parse(fq, "fastq")
    # goes over each record and tests if the read meets the requirements
    short_iterator = (
        rec
        for rec in input_iterator
        if filter_by_quality(rec.letter_annotations["phred_quality"])
        and filter_by_single_nucleotide_appearance(rec.seq)
        and filter_by_long_stretches_repeats(rec.seq)
    )
    # writes the passing records to a new file in fastq format again
    SeqIO.write(short_iterator, output_fq, "fastq")
```
The problem is that the output file sometimes contains only the last record (the last 4 lines of the input FASTQ), so I assume the file is being overwritten on each iteration. However, sometimes it does work and I get all the records in one file!
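To check the overwrite idea, here is a minimal stdlib-only sketch. It assumes that `SeqIO.write`, when given a filename, opens that path in `"w"` mode and therefore truncates it. `write_fastq` and `count_records` are hypothetical stand-ins, not Biopython functions. A single call fed a generator keeps every record, whereas calling the writer once per record on the same path leaves only the last one.

```python
import os
import tempfile


def write_fastq(records, path):
    """Hypothetical stand-in for SeqIO.write(records, path, "fastq").

    Like SeqIO.write given a filename, it opens the path in "w" mode,
    so any existing content is truncated before writing.
    """
    count = 0
    with open(path, "w") as handle:
        for rec_id, seq, qual in records:
            handle.write(f"@{rec_id}\n{seq}\n+\n{qual}\n")
            count += 1
    return count


def count_records(path):
    """Count records, assuming the simple 4-lines-per-record FASTQ layout."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


reads = [("r1", "ACGT", "IIII"), ("r2", "GGCC", "IIII"), ("r3", "TTAA", "IIII")]
out = os.path.join(tempfile.mkdtemp(), "filtered.fastq")

# One call with a generator: every record ends up in the file.
write_fastq((r for r in reads), out)
one_call = count_records(out)

# One call per record on the same path: each call truncates the file,
# so only the last record survives.
for r in reads:
    write_fastq([r], out)
per_record = count_records(out)

print(one_call, per_record)  # 3 1
```

If the output matches the second pattern, the writer is probably being called more than once on the same output path (for example, `extract_from_fastq` invoked inside a loop), rather than a single `SeqIO.write` call overwriting itself.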
Any idea why this happens and how to avoid it?