When to merge sequencing data from multiple lanes (FastqToSam, SamToFastq, BWA, MergeBamAlignment, or an additional step)?


Hi,

I am following the GATK Best Practices workflow for germline short variant discovery in single samples. The pipeline consists of the following steps:

FastqToSam
MarkIlluminaAdapters
SamToFastq
bwa mem
MergeBamAlignment
MarkDuplicates
BaseRecalibrator
ApplyBQSR
ValidateSamFile
HaplotypeCaller

I work with paired-end sequencing data, and most samples have one forward-read FASTQ file and one reverse-read FASTQ file. However, for a couple of samples the sequencing data is split across multiple lanes. As far as I can see, most commands in the pipeline do not accept multi-lane input, only one forward and one reverse file (or, for MergeBamAlignment, one unmapped BAM and one aligned BAM). Should I merge all forward and all reverse FASTQ files before starting the pipeline (the quality of each lane's data seems comparable according to FastQC/MultiQC), or only later? If later, which step would be the best place to merge?
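To make the "merge later" option concrete, here is a minimal Python sketch of what running the first step separately for each lane might look like. It is only an illustration of the question, not a recommendation. The file names are hypothetical and assume the bcl2fastq naming pattern (`<sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz`), and the read-group and platform-unit values are placeholders. The script only builds and prints the FastqToSam command lines; it does not run anything.

```python
import re
from collections import defaultdict

# Assumed bcl2fastq-style name: <sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq(\.gz)?$"
)


def group_by_lane(filenames):
    """Map (sample, lane) -> {"R1": forward file, "R2": reverse file}."""
    pairs = defaultdict(dict)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"unexpected FASTQ name: {name}")
        key = (match["sample"], match["lane"])
        pairs[key]["R" + match["read"]] = name
    return dict(pairs)


def fastq_to_sam_command(sample, lane, pair):
    """Build a FastqToSam argument list for one lane (one read group per lane)."""
    return [
        "gatk", "FastqToSam",
        "--FASTQ", pair["R1"],
        "--FASTQ2", pair["R2"],
        "--OUTPUT", f"{sample}_L{lane}.unmapped.bam",
        "--SAMPLE_NAME", sample,
        "--READ_GROUP_NAME", f"{sample}.L{lane}",  # placeholder read-group ID
        "--PLATFORM_UNIT", f"L{lane}",             # placeholder; flowcell unknown
        "--PLATFORM", "ILLUMINA",
    ]


# Hypothetical sample split across two lanes.
files = [
    "sampleA_S1_L001_R1_001.fastq.gz",
    "sampleA_S1_L001_R2_001.fastq.gz",
    "sampleA_S1_L002_R1_001.fastq.gz",
    "sampleA_S1_L002_R2_001.fastq.gz",
]

for (sample, lane), pair in sorted(group_by_lane(files).items()):
    print(" ".join(fastq_to_sam_command(sample, lane, pair)))
```

With this layout each lane would yield its own unmapped BAM, so the open question is at which later step those per-lane files should be combined.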

Thanks for your input.


