GenomicsDBImport issue


Hello everyone,

I need some help. I have around 50,000 WES gVCFs that I am trying to merge with the GATK GenomicsDBImport tool so that I can do multi-sample calling later. Even though I am using the biggest node on our cluster, the merging is extremely slow: it takes almost 5-6 days to merge a batch of only 5,000 samples. Does anyone have a solution to this problem? I have tried different things, but with no success so far.

gatk --java-options "-Xmx250g -Xms250g" GenomicsDBImport --genomicsdb-workspace-path $SCRATCH/database/UKB_Database --batch-size 5000 -L $SCRATCH/interval.list --sample-name-map $SCRATCH/cohort.sample_map.txt --tmp-dir $SCRATCH/temp/ --reader-threads 6

gatk --java-options "-Xmx250g -Xms250g" GenomicsDBImport --genomicsdb-update-workspace-path $SCRATCH/database/Batch1 --batch-size 5000 --sample-name-map $SCRATCH/batch0.txt --tmp-dir $SCRATCH/temp/ --reader-threads 6

These are the commands I am using. Since I only need selected regions, in the first command I pass an interval list in .bed format.

Looking forward to hearing from you. Thanks in advance.

Kind Regards,



