Thanks very much for providing the VCF that you're using. For others, it's
This VCF is corrupt and does not conform to the VCF specification. It has the following issues:
- whitespace in 'INFO' column
- contig '2' not defined in header
- 'FORMAT/GL' should be declared as Number
- 'FORMAT/PP' not defined in header
- 'FORMAT/BD' not defined in header
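If you want to confirm the undefined 'FORMAT' tags yourself without relying on a parser's warnings, here is a minimal sketch using plain awk. The function name undefined_format_tags is my own (hypothetical), and it assumes an uncompressed, tab-delimited VCF, so decompress to a temporary file first.

```shell
# Hypothetical helper: list FORMAT keys that appear in records
# but have no ##FORMAT definition in the header.
# Usage: undefined_format_tags file.vcf   (uncompressed VCF)
undefined_format_tags() {
  awk -F'\t' '
    /^##FORMAT=<ID=/ {
      # Extract the ID from a header line such as ##FORMAT=<ID=GT,Number=1,...>
      id = $0
      sub(/^##FORMAT=<ID=/, "", id)
      sub(/[,>].*$/, "", id)
      defined[id] = 1
      next
    }
    /^#/ { next }                       # skip all other header lines
    NF >= 9 {
      # Column 9 is the FORMAT column, e.g. GT:GL:PP:BD
      n = split($9, keys, ":")
      for (i = 1; i <= n; i++)
        if (!(keys[i] in defined)) missing[keys[i]] = 1
    }
    END { for (k in missing) print k }
  ' "$1" | sort
}
```

On a file with the problems listed above, this should print the tags that lack a header definition, one per line. It only checks for missing definitions; it does not validate the Number or Type of tags that are defined.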
I was able to fix the VCF with the commands below. Unfortunately, the 'FORMAT' field is a complete mess, so I made an 'executive' decision to strip it down to just 'FORMAT/GT'. This loses some information, but leaves you with a valid VCF for anything else that you may want to do.
zcat GEUVADIS.chr2.PH1PH2_465.IMPFRQFILT_BIALLELIC_PH.annotv2.genotypes.vcf.gz | sed 's/ damaging/_damaging/g' | bgzip > test.vcf.gz ;
tabix -p vcf test.vcf.gz ;
bcftools annotate -x 'FORMAT' --force test.vcf.gz -Oz > test.fixed.vcf.gz ;
This will initially show warnings relating to 'FORMAT', but --force lets bcftools carry on past them. Removing the problematic 'FORMAT' tags also avoids the segmentation fault that otherwise occurs later on.
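The sed step only targets the string ' damaging', so it may be worth checking that no other whitespace is left in the 'INFO' column. A rough sketch follows; count_info_whitespace is a hypothetical name of mine, and it assumes tab-delimited VCF text on stdin.

```shell
# Hypothetical helper: count data lines whose INFO column (column 8)
# still contains a space. Reads VCF text from stdin.
count_info_whitespace() {
  awk -F'\t' '!/^#/ && $8 ~ /[ ]/ { n++ } END { print n + 0 }'
}
```

For example, zcat test.fixed.vcf.gz | count_info_whitespace should print 0 if the fix caught everything. Any other number means further values in 'INFO' need the same treatment.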
Finally, strip the existing IDs and annotate against dbSNP:
bcftools annotate -x ID test.fixed.vcf.gz -Oz > test.fixed.noID.vcf.gz ;
java -jar SnpSift.jar annotate dbSnp144.vcf test.fixed.noID.vcf.gz