bcftools: filter VEP-annotated variants by protein_coding biotype


I have VEP-annotated VCF files with the following content:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  file
chr1    183937  .       G       A       58.9    PASS    CSQ=||||||||||||MODIFIER|FO538757.1|ENSG00000279928|ENST00000624431|unprocessed_pseudogene||4/4|||||;AC=1;AN=2  GT:GQ:DP:AD:VAF:PL      0/1:51:26:15,11:0.423077:58,0,51
chr1    601436  .       C       T       4.9     PASS    CSQ=||||||||||||MODIFIER|AL669831.3|ENSG00000230021|ENST00000634337|processed_transcript|4/5||404||||,||||||||||||MODIFIER|AL669831.3|ENSG00000230021|ENST00000634833|processed_transcript|3/6||317||||;AC=1;AN=2    GT:GQ:DP:AD:VAF:PL      0/1:5:26:19,7:0.269231:3,0,17 

I would like to keep only the protein-coding variants, but I get the following errors:

bcftools view -f "protein_coding" file > out
[E::bcf_write] Broken VCF record, the number of columns at chrX:152737049 does not match the number of samples (0 vs 1)
[main_vcfview] Error: cannot write to (null)
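For context: `-f` in `bcftools view` is `--apply-filters`, which selects records by the value of the FILTER column (`PASS` in the records above), not by text inside INFO/CSQ. A quick way to see which FILTER values a file actually contains is the sketch below. It is plain awk, assumes an uncompressed tab-delimited VCF, and `filter_values` is a hypothetical helper name.

```shell
# Print each distinct FILTER value (VCF column 7) with its record count.
# bcftools view -f/--apply-filters matches against this column, so only
# values listed here are meaningful arguments to -f.
filter_values() {
    awk -F'\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' "$1"
}
```

Usage: `filter_values file`. For a file like the one above, only `PASS` would be listed, so `-f "protein_coding"` has nothing to match.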

bcftools filter -i 'BIOTYPE="protein_coding"' file > aaa 
[filter.c:2491 filters_init1] Error: the tag "BIOTYPE" is not defined in the VCF header
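The second error arises because BIOTYPE is not an INFO tag of its own: VEP packs it, together with the other annotation fields, into the single CSQ tag, and the field order is documented only in the `Format:` part of the `##INFO=<ID=CSQ,...>` header line. The sketch below lists those subfields with their 1-based positions. It assumes an uncompressed VCF carrying the usual VEP header, and `csq_fields` is a hypothetical helper name.

```shell
# List the subfields of the VEP CSQ annotation with their 1-based positions,
# parsed from the "Format: A|B|C" part of the ##INFO=<ID=CSQ,...> header line.
csq_fields() {
    awk '/^##INFO=<ID=CSQ,/ {
        s = $0
        sub(/.*Format: /, "", s)   # keep only the text after "Format: "
        sub(/".*/, "", s)          # drop the closing quote and ">"
        n = split(s, f, "[|]")     # subfields are pipe-separated
        for (i = 1; i <= n; i++) print i "\t" f[i]
        exit
    }
    /^#CHROM/ { exit }' "$1"
}
```

Running `csq_fields file` should show where BIOTYPE sits among the pipe-separated slots of each CSQ entry.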

How should I filter such variants when the biotype is one of the pipe-delimited subfields inside the CSQ field?
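bcftools ships a `+split-vep` plugin aimed at exactly this kind of CSQ filtering (roughly `bcftools +split-vep -i 'BIOTYPE="protein_coding"' file`, assuming the plugin is present in your build; check `bcftools +split-vep -h` for the exact options). To illustrate what such filtering does, here is a dependency-free awk sketch. It assumes an uncompressed tab-delimited VCF with the standard VEP `Format:` description in the CSQ header, and keeps a record if at least one of its comma-separated CSQ entries has the requested BIOTYPE. `filter_csq_biotype` is a hypothetical helper name.

```shell
# Keep header lines plus records where at least one CSQ entry has
# BIOTYPE equal to the requested value (default: protein_coding).
# Assumes an uncompressed, tab-delimited VCF with a VEP-style
# ##INFO=<ID=CSQ,...,Description="... Format: A|B|..."> header line.
filter_csq_biotype() {
    awk -F'\t' -v want="${2:-protein_coding}" '
    /^##INFO=<ID=CSQ,/ {
        s = $0
        sub(/.*Format: /, "", s)
        sub(/".*/, "", s)
        n = split(s, names, "[|]")
        for (i = 1; i <= n; i++) if (names[i] == "BIOTYPE") col = i
    }
    /^#/ { print; next }
    {
        if (!col) { print "BIOTYPE not found in CSQ header" > "/dev/stderr"; exit 1 }
        m = split($8, info, ";")                        # INFO is column 8
        for (i = 1; i <= m; i++) {
            if (substr(info[i], 1, 4) != "CSQ=") continue
            k = split(substr(info[i], 5), entries, ",") # one entry per transcript
            for (j = 1; j <= k; j++) {
                split(entries[j], f, "[|]")
                if (f[col] == want) { print; next }
            }
        }
    }' "$1"
}
```

Usage: `filter_csq_biotype file > out.vcf`, or pass a second argument to select another biotype. Matching any transcript, rather than all of them, is a design choice; records with mixed biotypes are kept.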

Thank you!


