If your error message looks something like this:

java.lang.IllegalArgumentException: ERROR: inconsistent number of alleles for sample LH05 at marker [1 1088185 . A G]

then we've run into the same problem. I found that vcftools filtering was dropping part of the genotype information, which is why Beagle can't recognize the alleles. To be more specific, I extracted the line of chr1, 223216, and my diploid sample LH05 had

.:0,0:.:.:0|1:1088185_A_G:.:1088185

where the others (the normal ones) had something like:

0|1:2,4:6:72:0|1:1088185_A_G:162,0,72:1088185

0/0:14,0:14:42:.:.:0,42,390:.

./.:1,2:3:.:.:.:0,0,0:.

(The first colon-separated item is the genotype. It should contain two alleles because my samples are diploid.)

I checked my files and found that the genotype sometimes appears as a single 1 as well.

The reason is that I used a vcftools filter (maf) to process the results of the GATK VariantFiltration step. This is actually not the first time I have run into this problem with vcftools (last time I used --min-alleles and --max-alleles). That's why your VCF that was not filtered with vcftools runs smoothly with Beagle. I don't understand why other software never caught this error; probably it treats . as ./. or 0/0 and continues anyway. This could be a problem if your statistic is sensitive to missing alleles.

Anyway, if people are using vcftools for filtering, PLEASE CHECK your results.
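
One way to do that check is a small script that scans the VCF and reports every sample whose genotype does not have the expected number of alleles. This is only a rough sketch, not something from vcftools or Beagle: the function names are made up for illustration, and it assumes a tab-delimited VCF (plain or gzipped) with a GT entry in the FORMAT column and diploid samples.

```python
import gzip
import re


def find_inconsistent_genotypes(lines, expected_ploidy=2):
    """Yield (chrom, pos, sample, gt) for genotypes with the wrong allele count.

    `lines` is any iterable of VCF text lines. The function name and the
    `expected_ploidy` parameter are illustrative assumptions.
    """
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue  # no genotype information at this site
        gt_index = format_keys.index("GT")
        for name, sample_field in zip(samples, fields[9:]):
            gt = sample_field.split(":")[gt_index]
            # Alleles are separated by "/" (unphased) or "|" (phased),
            # so "./." has two alleles while "." or "1" has only one.
            if len(re.split(r"[/|]", gt)) != expected_ploidy:
                yield fields[0], int(fields[1]), name, gt


def scan_vcf(path, expected_ploidy=2):
    """Print every inconsistent genotype found in the file at `path`."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for chrom, pos, sample, gt in find_inconsistent_genotypes(
            handle, expected_ploidy
        ):
            print(f"{chrom}\t{pos}\t{sample}\t{gt}")
```

Calling scan_vcf on the filtered file (the path is whatever your own output is called) prints one line per problem genotype. On a clean diploid file it prints nothing, so any output at all means Beagle is likely to complain.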


