I'm phasing SNP data from an Illumina microarray with Beagle 5.2.

I start from an unphased VCF with around 600,000 SNPs.

I also trio-phased the same VCF, so I have a phased VCF to use as a control.

I run a simple pipeline:

java -Xmx4g -jar beagle.28Jun21.220.jar impute=false gt=source.vcf map=./hapmap/plink.chr1.GRCh37.map out=out iterations=40 ref=./chr1.1kg.phase3.v5a.b37.bref3 chrom=1

The genetic map is from HapMap, and the reference panel is from 1000 Genomes.

The resulting "phased" VCF from Beagle differs greatly from the one I got from trio phasing. Does anyone know of any parameter tuning I could apply to get properly phased output? I tried a larger window (up to 100 cM), more iterations (up to 120), and a larger overlap (up to 5 cM), with no good results.
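One way to put a number on "differs greatly" is the switch error rate at sites that are heterozygous and phased in both files. The Python sketch below is only an illustration and not part of the pipeline above. The function names (`read_gts`, `switch_error_rate`) are hypothetical, and it assumes a single-sample, single-chromosome VCF with tab-separated fields. It counts changes in relative orientation between consecutive heterozygous sites, so a pair of haplotypes that is simply swapped as a whole relative to the trio phasing is not counted as an error.

```python
def read_gts(vcf_lines):
    """Map position -> GT string of the first sample (single chromosome assumed)."""
    gts = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and column header lines
        fields = line.rstrip("\n").split("\t")
        fmt_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        gts[int(fields[1])] = sample_values[fmt_keys.index("GT")]
    return gts


def parse_phased_het(gt):
    """Return (a, b) for a phased heterozygous genotype such as '0|1', else None."""
    if "|" not in gt:
        return None  # unphased ('0/1') or otherwise not usable
    a, b = gt.split("|", 1)
    if a == "." or b == "." or a == b:
        return None  # missing allele or homozygous: carries no phase information
    return a, b


def switch_error_rate(test_gts, truth_gts):
    """Return (switches, comparisons) over consecutive shared phased het sites."""
    switches = 0
    comparisons = 0
    previous_same = None
    for pos in sorted(set(test_gts) & set(truth_gts)):
        test = parse_phased_het(test_gts[pos])
        truth = parse_phased_het(truth_gts[pos])
        if test is None or truth is None:
            continue
        if set(test) != set(truth):
            continue  # the two files disagree on the genotype itself
        same = test == truth  # True if both files list the alleles in the same order
        if previous_same is not None:
            comparisons += 1
            if same != previous_same:
                switches += 1  # orientation flipped relative to the previous site
        previous_same = same
    return switches, comparisons
```

With real files, the line iterables could come from `gzip.open(path, "rt")`, since Beagle writes its output as `out.vcf.gz`. The switch error rate is then `switches / comparisons` when `comparisons` is greater than zero.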

I also tried to reduce the reference panel, extracting only the positions that are present in the source VCF, using bedtools.

First, I uncompress the bref3:

java -jar unbref3.28Jun21.220.jar chr1.1kg.phase3.v5a.b37.bref3 > chr1.1kg.phase3.v5a.b37.vcf

Then I extract the intersection between this VCF and the source file:

bedtools intersect -b source.vcf -a chr1.1kg.phase3.v5a.b37.vcf > reduced.chr1.1kg.phase3.v5a.b37.vcf
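As an illustration of what this step is meant to do, here is a minimal Python sketch that keeps only the reference records whose chromosome and position also occur in the source VCF. It is not the command that was run. The function names (`load_positions`, `filter_by_positions`) are hypothetical, and it assumes tab-separated VCF lines and matching chromosome naming in both files. Unlike `bedtools intersect`, which only writes the header of the `-a` file when `-header` is passed, the sketch always copies header lines through so the result remains a valid VCF.

```python
def load_positions(vcf_lines):
    """Collect the (chrom, pos) pairs of all data records in a VCF."""
    positions = set()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header lines carry no positions
        chrom, pos = line.split("\t", 2)[:2]
        positions.add((chrom, int(pos)))
    return positions


def filter_by_positions(ref_lines, positions):
    """Yield header lines plus the records whose (chrom, pos) is in positions."""
    for line in ref_lines:
        if line.startswith("#"):
            yield line  # keep the full header so the output stays a valid VCF
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if (chrom, int(pos)) in positions:
            yield line
```

This matches on position only. It does not check that REF and ALT agree between the two files, which would need an extra comparison.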

As a last step, I ran Beagle again:

java -Xmx4g -jar beagle.28Jun21.220.jar impute=false gt=source.vcf map=./hapmap/plink.chr1.GRCh37.map out=out iterations=40 ref=./reduced.chr1.1kg.phase3.v5a.b37.bref3 chrom=1

However, the "phased" output still differs from the phased data confirmed by the trio.

Any clues or suggestions?
Thank you in advance.

jp

PS: This is an extract from the source VCF:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  arivcf
1       82154   rs4477212       A       .       .       .       .       GT      0/0
1       752566  rs3094315       G       .       .       .       .       GT      ./.
1       752721  rs3131972       A       .       .       .       .       GT      0/0
1       768448  rs12562034      G       .       .       .       .       GT      ./.
1       776546  rs12124819      A       .       .       .       .       GT      ./.
1       798959  rs11240777      G       A       .       .       .       GT      1/0
1       800007  rs6681049       T       .       .       .       .       GT      ./.
1       838555  rs4970383       C       .       .       .       .       GT      0/0
1       846808  rs4475691       C       T       .       .       .       GT      0/1
1       854250  rs7537756       A       .       .       .       .       GT      0/0
1       861808  rs13302982      A       G       .       .       .       GT      0/1
1       873558  rs1110052       G       T       .       .       .       GT      0/1
1       882033  rs2272756       G       A       .       .       .       GT      1/0


