I am using GATK 126.96.36.199 to filter variants following the VQSR steps listed in "How to Filter variants with VQSR". I am working with WGS samples in cohorts of about 3-5 samples. According to the GATK documentation, I should be fine, since "the procedure tends to work well enough with at least one whole genome or 30 exomes. Anything smaller than that scale is likely to run into difficulties, especially for the indel recalibration."
I am continually getting the following error on the indel recalibration step (and then subsequently on the snp recalibration step):
A USER ERROR has occurred: Bad input: Values for ReadPosRankSum annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.
Here is my call:
java -Xmx30g -Xms30g -jar gatk.jar VariantRecalibrator \
  -V Kindred_sites_only.variant_filtered.vcf.gz \
  -O Kindred_indels.recal \
  --tranches-file Kindred_indels.tranches \
  --trust-all-polymorphic \
  -tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.5 -tranche 99.0 -tranche 97.0 -tranche 96.0 -tranche 95.0 -tranche 94.0 \
  -tranche 93.5 -tranche 93.0 -tranche 92.0 -tranche 91.0 -tranche 90.0 \
  -an FS -an ReadPosRankSum -an MQRankSum -an QD -an SOR -an DP \
  --mode INDEL \
  --max-gaussians 4 \
  --resource:mills,known=false,training=true,truth=true,prior=12 Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
  --resource:axiomPoly,known=false,training=true,truth=false,prior=10 Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz \
  --resource:dbsnp,known=true,training=false,truth=false,prior=2 dbsnp_146.hg38.vcf.gz \
  --rscript-file src/scripts/Kindred_indels.plots.R
Does anyone have any insight into why this error is occurring? I have tried different dbSNP versions (138, 144, 146 -- all downloaded from the Broad), and I have validated the input VCF file with ValidateVariants. As the error suggests, I could use VariantAnnotator -- but a) it is still in beta and b) I am already passing dbsnp_146.hg38.vcf.gz to the VariantRecalibrator call, so it seems redundant.
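For anyone who wants to reproduce the check, here is a minimal sketch of how the number of records carrying a given INFO annotation can be counted in the input callset. The helper name count_annotation is hypothetical (it is not a GATK tool), and it assumes a gzip- or bgzip-compressed VCF with the standard eight fixed columns. It counts across the whole callset rather than only at training sites, so a non-zero count does not by itself rule out the error above.

```shell
#!/bin/sh
# Hypothetical helper (not part of GATK): count VCF records whose INFO
# column (column 8) carries the given annotation key as KEY=value.
# Usage: count_annotation <file.vcf.gz> <KEY>
count_annotation() {
    vcf="$1"
    key="$2"
    # gzip -dc also reads bgzip-compressed files
    gzip -dc "$vcf" \
        | awk -F '\t' -v key="$key" '
            /^#/ { next }                      # skip header lines
            {
                n = split($8, info, ";")       # INFO entries are ";"-separated
                for (i = 1; i <= n; i++) {
                    # match only entries that start with KEY=
                    if (index(info[i], key "=") == 1) { hits++; break }
                }
            }
            END { print hits + 0 }'            # print 0 when nothing matched
}
```

With the function defined, `count_annotation Kindred_sites_only.variant_filtered.vcf.gz ReadPosRankSum` prints the number of records that have the annotation.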
Any suggestions are greatly appreciated.