When to merge sequencing data from multiple lanes (FastqToSam, SamToFastq, BWA, MergeBamFiles, or an additional step)?
I am following GATK's Best Practices workflow for germline short variant discovery in single samples. The pipeline consists of the following steps:
1. FastqToSam
2. MarkIlluminaAdapters
3. SamToFastq
4. bwa mem
5. MergeBamAlignment
6. MarkDuplicates
7. BaseRecalibrator
8. ApplyBQSR
9. ValidateSamFile
10. HaplotypeCaller
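For reference, the per-pair part of the step list above (up to MergeBamAlignment) can be sketched as command lines assembled in Python. This is a minimal sketch that only builds argument lists and runs nothing. It assumes Picard is invoked as `java -jar picard.jar` with KEY=VALUE arguments; the helper `lane_commands`, the file names and the `read_group` value are hypothetical, and extra options that GATK tutorials add to some of these steps are omitted.

```python
def lane_commands(fq1: str, fq2: str, sample: str, read_group: str, ref: str) -> list[list[str]]:
    """Build (but do not run) the commands for one forward/reverse FASTQ pair.

    All output file names are hypothetical, derived from sample and read group.
    """
    prefix = f"{sample}.{read_group}"
    ubam = f"{prefix}.unmapped.bam"
    marked = f"{prefix}.markadapters.bam"
    metrics = f"{prefix}.markadapters_metrics.txt"
    interleaved = f"{prefix}.interleaved.fq"
    aligned = f"{prefix}.aligned.sam"
    merged = f"{prefix}.merged.bam"
    picard = ["java", "-jar", "picard.jar"]  # assumed Picard invocation
    return [
        # 1. FASTQ pair -> unmapped BAM carrying sample and read-group tags
        picard + ["FastqToSam", f"FASTQ={fq1}", f"FASTQ2={fq2}", f"OUTPUT={ubam}",
                  f"SAMPLE_NAME={sample}", f"READ_GROUP_NAME={read_group}"],
        # 2. Tag adapter sequence in the unmapped BAM
        picard + ["MarkIlluminaAdapters", f"I={ubam}", f"O={marked}", f"M={metrics}"],
        # 3. Back to interleaved FASTQ, masking adapter bases for alignment
        picard + ["SamToFastq", f"I={marked}", f"FASTQ={interleaved}",
                  "CLIPPING_ATTRIBUTE=XT", "CLIPPING_ACTION=2", "INTERLEAVE=true"],
        # 4. Align interleaved pairs (-p) and mark shorter split hits secondary (-M)
        ["bwa", "mem", "-M", "-p", "-o", aligned, ref, interleaved],
        # 5. Combine alignments with the metadata kept in the unmapped BAM
        picard + ["MergeBamAlignment", f"ALIGNED_BAM={aligned}", f"UNMAPPED_BAM={ubam}",
                  f"OUTPUT={merged}", f"R={ref}"],
    ]


if __name__ == "__main__":
    for cmd in lane_commands("s1_R1.fastq.gz", "s1_R2.fastq.gz", "S1", "L001", "ref.fa"):
        print(" ".join(cmd))
```

Returning argument lists rather than shell strings keeps the sketch testable and lets the lists be passed to `subprocess.run` unchanged if one later decides to execute them.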
I work with paired-end sequencing data, and most samples have one forward-read FASTQ file and one reverse-read FASTQ file. However, for a couple of samples the sequencing data is split across multiple lanes. Most commands in the pipeline do not seem to accept multi-lane input, only one forward and one reverse file (or, for MergeBamAlignment, one unmapped BAM and one aligned BAM). Should I merge all forward and all reverse FASTQ files before starting the pipeline (according to FastQC/MultiQC, the quality of each lane's data seems comparable), or only later, and if so, at which step?
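Whichever merge point is chosen, it helps to first pair the forward and reverse files of each lane. A minimal sketch, assuming bcl2fastq-style file names such as `Sample1_S1_L001_R1_001.fastq.gz`; the helper `pair_by_lane` and the pattern `LANE_RE` are hypothetical, and files named differently will need a different pattern.

```python
import re
from collections import defaultdict

# Assumed bcl2fastq-style naming: <sample>_S<n>_L<lane>_R<1|2>_<chunk>.fastq[.gz]
LANE_RE = re.compile(
    r"^(?P<sample>.+)_S\d+_(?P<lane>L\d{3})_(?P<read>R[12])_\d{3}\.f(ast)?q(\.gz)?$"
)


def pair_by_lane(filenames):
    """Return {(sample, lane): (R1, R2)}; raise if a name or a mate is missing."""
    found = defaultdict(dict)
    for name in filenames:
        m = LANE_RE.match(name)
        if m is None:
            raise ValueError(f"unrecognised FASTQ name: {name}")
        found[(m["sample"], m["lane"])][m["read"]] = name
    pairs = {}
    for key, reads in sorted(found.items()):
        if set(reads) != {"R1", "R2"}:
            raise ValueError(f"missing mate file for {key}: {reads}")
        pairs[key] = (reads["R1"], reads["R2"])
    return pairs


if __name__ == "__main__":
    files = [
        "Sample1_S1_L001_R1_001.fastq.gz", "Sample1_S1_L001_R2_001.fastq.gz",
        "Sample1_S1_L002_R1_001.fastq.gz", "Sample1_S1_L002_R2_001.fastq.gz",
    ]
    for (sample, lane), (r1, r2) in pair_by_lane(files).items():
        print(sample, lane, r1, r2)
```

The result keeps the lane identity of every pair, so the files can still be handled per lane or combined later.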
Thanks for your input.