If your error message looks something like this:
java.lang.IllegalArgumentException: ERROR: inconsistent number of alleles for sample LH05 at marker [1 1088185 . A G]
then we've run into the same problem. I realized that the vcftools filtering is omitting genotype information, which is why beagle can't recognize the alleles. To be more specific, I extracted the line at chr1, position 223216, where my diploid sample LH05 had only a single allele in its genotype field,
whereas the others (the normal ones) had something like:
./.:1,2:3:.:.:.:0,0,0:. (the first item, separated by : from the rest, is the genotype info; there should be two alleles in it because I have diploid samples)
I checked my files and found that it also happened as a single 1.
The reason is that I used the vcftools filter (maf) to process the results from the GATK VariantFiltration step. This is actually not the first time I have run into this problem with vcftools (last time I used --min-alleles and --max-alleles). That's why your VCF that was not filtered with vcftools runs smoothly with beagle. I don't understand why other software never caught this error, probably because they regard such genotypes as 0/0 and continue anyway. This could be a problem if your statistic is sensitive to missing alleles.
Anyway, if people are using vcftools for filtering, PLEASE CHECK your results.
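For what it's worth, here is a minimal sketch of how one could do that check in Python. It assumes a standard tab-separated VCF with GT as the first FORMAT key and diploid samples. The function names (count_alleles, find_bad_genotypes, open_vcf) are just names I made up for illustration; they are not part of vcftools or beagle.

```python
import gzip
import re
import sys


def count_alleles(sample_field):
    """Count alleles in the GT entry (the first ':'-separated item)."""
    gt = sample_field.split(":", 1)[0]
    # Alleles are separated by '/' (unphased) or '|' (phased).
    return len(re.split(r"[/|]", gt))


def find_bad_genotypes(lines, expected_ploidy=2):
    """Yield (chrom, pos, sample, field) where the allele count is unexpected."""
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names start at column 10
            continue
        # Skip records without sample columns or without GT as first key.
        if len(cols) < 10 or cols[8].split(":")[0] != "GT":
            continue
        for name, field in zip(samples, cols[9:]):
            if count_alleles(field) != expected_ploidy:
                yield cols[0], cols[1], name, field


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_vcf(sys.argv[1]) as handle:
        for chrom, pos, sample, field in find_bad_genotypes(handle):
            print(chrom, pos, sample, field, sep="\t")
```

Run it on the filtered VCF (the path is the only argument). If it prints anything, those are the sample fields whose genotype does not have two alleles, and beagle will most likely complain about them.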