I'm phasing SNP data from an Illumina microarray with Beagle 5.2.
I start from an unphased VCF with around 600,000 SNPs.
I also trio-phased the same VCF, so I have a phased control VCF.
I run a simple pipeline:
java -Xmx4g -jar beagle.28Jun21.220.jar impute=false gt=source.vcf map=./hapmap/plink.chr1.GRCh37.map out=out iterations=40 ref=./chr1.1kg.phase3.v5a.b37.bref3 chrom=1
The genetic map is from HapMap and the reference panel is from the 1000 Genomes Project.
The resulting "phased" VCF from Beagle differs greatly from the one I got from trio phasing. Does anyone know of any parameter tuning that would give properly phased output? I tried a larger window (up to 100 cM), more iterations (up to 120), and a larger overlap (up to 5 cM), with no improvement.
I also tried to reduce the reference panel by extracting only the positions that are present in the source VCF, using bedtools.
First, I uncompress the bref3:
java -jar unbref3.28Jun21.220.jar chr1.1kg.phase3.v5a.b37.bref3 > chr1.1kg.phase3.v5a.b37.vcf
Then I extract the intersection between this VCF and the source file:
bedtools intersect -b source.vcf -a chr1.1kg.phase3.v5a.b37.vcf > reduced.chr1.1kg.phase3.v5a.b37.vcf
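For comparison, the same position-based reduction can be sketched in a few lines of Python. This is only an illustration under assumptions: both files are uncompressed, tab-delimited VCFs, and the function name `restrict_to_shared_sites` is made up for this post. Unlike `bedtools intersect` without its `-header` option, this version copies the header lines of the reference VCF through to the output. It matches on chromosome and position only and does not check that REF/ALT alleles agree.

```python
def restrict_to_shared_sites(ref_lines, target_lines):
    """Yield header lines of the reference VCF plus the records whose
    (CHROM, POS) also occurs in the target VCF."""
    wanted = set()
    for line in target_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _rest = line.split("\t", 2)
        wanted.add((chrom, pos))

    for line in ref_lines:
        if not line.strip():
            continue
        if line.startswith("#"):
            yield line  # keep the VCF header intact
            continue
        chrom, pos, _rest = line.split("\t", 2)
        if (chrom, pos) in wanted:
            yield line


if __name__ == "__main__":
    # File names taken from the commands above.
    with open("source.vcf") as target, \
         open("chr1.1kg.phase3.v5a.b37.vcf") as ref, \
         open("reduced.chr1.1kg.phase3.v5a.b37.vcf", "w") as out:
        out.writelines(restrict_to_shared_sites(ref, list(target)))
```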
As a last step, I ran Beagle again:
java -Xmx4g -jar beagle.28Jun21.220.jar impute=false gt=source.vcf map=./hapmap/plink.chr1.GRCh37.map out=out iterations=40 ref=./reduced.chr1.1kg.phase3.v5a.b37.bref3 chrom=1
However, the "phased" output still differs from the phased data confirmed by the trio.
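One caveat when comparing the two files: a phased genotype only has meaning relative to the neighbouring heterozygous sites, so a file in which every 0|1 is written as 1|0 encodes the same pair of haplotypes. Counting switch errors rather than per-site mismatches accounts for this. Below is a minimal Python sketch of such a comparison. It assumes single-sample, tab-delimited VCFs covering one chromosome, and the function names are illustrative and not part of Beagle.

```python
def phased_hets(vcf_lines, sample_index=0):
    """Return {(chrom, pos): (allele1, allele2)} for phased heterozygous calls."""
    calls = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        gt = fields[9 + sample_index].split(":")[0]  # GT is the first FORMAT key
        if "|" not in gt:
            continue  # unphased call, carries no phase information
        a, b = gt.split("|")
        if a != b and "." not in (a, b):
            calls[(fields[0], int(fields[1]))] = (a, b)
    return calls


def switch_errors(test_lines, truth_lines):
    """Return (switches, comparable_pairs) between two phased VCFs.

    Only sites that are phased heterozygous with the same two alleles
    in both files are compared."""
    test = phased_hets(test_lines)
    truth = phased_hets(truth_lines)
    shared = sorted(k for k in test
                    if k in truth and set(test[k]) == set(truth[k]))
    # True where both files list the alleles in the same order.
    same = [test[k] == truth[k] for k in shared]
    # A switch is a change of orientation between consecutive het sites.
    switches = sum(1 for prev, cur in zip(same, same[1:]) if prev != cur)
    return switches, max(len(same) - 1, 0)
```

With this measure, two files whose haplotypes are identical but written in the opposite order give zero switches, while a per-site comparison of the GT strings would report every heterozygous site as different.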
Any clues or suggestions?
Thank you in advance.
PS: This is an extract from the source VCF:
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	arivcf
1	82154	rs4477212	A	.	.	.	.	GT	0/0
1	752566	rs3094315	G	.	.	.	.	GT	./.
1	752721	rs3131972	A	.	.	.	.	GT	0/0
1	768448	rs12562034	G	.	.	.	.	GT	./.
1	776546	rs12124819	A	.	.	.	.	GT	./.
1	798959	rs11240777	G	A	.	.	.	GT	1/0
1	800007	rs6681049	T	.	.	.	.	GT	./.
1	838555	rs4970383	C	.	.	.	.	GT	0/0
1	846808	rs4475691	C	T	.	.	.	GT	0/1
1	854250	rs7537756	A	.	.	.	.	GT	0/0
1	861808	rs13302982	A	G	.	.	.	GT	0/1
1	873558	rs1110052	G	T	.	.	.	GT	0/1
1	882033	rs2272756	G	A	.	.	.	GT	1/0
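Since only heterozygous sites carry phase information, it can help to see how the calls in an extract like this one split into homozygous, heterozygous and missing. A small Python sketch, assuming one record per line with whitespace-separated fields; the helper names are made up for illustration:

```python
from collections import Counter


def classify_gt(gt):
    """Classify a diploid GT string as 'missing', 'hom' or 'het'."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    return "hom" if len(set(alleles)) == 1 else "het"


def tally_genotypes(vcf_lines, sample_index=0):
    """Count genotype classes for one sample across VCF records."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.split()
        gt = fields[9 + sample_index].split(":")[0]
        counts[classify_gt(gt)] += 1
    return counts


# Four of the records from the extract above.
records = [
    "1 82154 rs4477212 A . . . . GT 0/0",
    "1 752566 rs3094315 G . . . . GT ./.",
    "1 798959 rs11240777 G A . . . GT 1/0",
    "1 846808 rs4475691 C T . . . GT 0/1",
]
print(dict(tally_genotypes(records)))
```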