If your error message looks something like this:

java.lang.IllegalArgumentException: ERROR: inconsistent number of alleles for sample LH05 at marker [1 1088185 . A G]

then we've run into the same problem. I realized that the vcftools filtering is dropping genotype information, which is why Beagle can't recognize the alleles. To be more specific, I extracted the line at chr1, position 223216, and my diploid sample LH05 had a genotype field with only one allele,


where the others (the normal ones) had something like:

./.:1,2:3:.:.:.:0,0,0:.

(The first item, separated by ":", is the genotype. It should contain two alleles because my samples are diploid.)

I checked my files and found that the same thing also happens with a single 1 as the genotype.

The reason is that I used a vcftools filter (maf) to process the results from the GATK VariantFiltration step. This is actually not the first time I have run into this problem with vcftools (last time I used --min-alleles and --max-alleles). That's why your VCF that was not filtered with vcftools runs smoothly with Beagle. I don't understand why other software never caught this error; probably it treats . as ./. or 0/0 and carries on anyway. This could be a problem if your statistic is sensitive to missing alleles.

Anyway, if people are using vcftools for filtering, PLEASE CHECK your results.
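
If you want to check your own file, something like the rough Python sketch below might help. It is not part of vcftools or Beagle. The function names allele_count and find_ploidy_mismatches are made up for this example, and it assumes an uncompressed, tab-separated VCF with diploid samples and GT as the first FORMAT key. It simply reports every sample field whose genotype does not have exactly two alleles.

```python
import re


def allele_count(sample_field):
    """Count alleles in the genotype, i.e. the first ':'-separated item."""
    genotype = sample_field.split(":", 1)[0]
    # Alleles are separated by '/' (unphased) or '|' (phased).
    return len(re.split(r"[/|]", genotype))


def find_ploidy_mismatches(lines, expected=2):
    """Yield (chrom, pos, sample, field) where the genotype does not
    have `expected` alleles. Assumption: all samples are diploid."""
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names start at column 10
            continue
        if len(cols) < 10:
            continue  # no genotype columns on this line
        if cols[8].split(":")[0] != "GT":
            continue  # assumption: GT is the first FORMAT key
        for name, field in zip(samples, cols[9:]):
            if allele_count(field) != expected:
                yield cols[0], cols[1], name, field


def scan_vcf(path, expected=2):
    """Return all mismatches found in the VCF file at `path`."""
    with open(path, "rt") as handle:
        return list(find_ploidy_mismatches(handle, expected))
```

Calling scan_vcf on the filtered file (the path is whatever your own output is called) returns a list of chromosome, position, sample name and the offending sample field. An empty list means every genotype had two alleles.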
