Beagle: Skip intervals with no common markers


Hi
I am using Beagle to perform genotype imputation. I first used conform-gt to adjust the genomic positions, allele order, and strand of the markers in my vcf.gz data to match the reference panel. Then I ran this Beagle command to impute each chromosome separately:

 java -Xmx50g -jar beagle.25Nov19.28d.jar gt=chr1.vcf.gz out=imputed_b37_imputed ref=chr1.1kg.phase3.v5a.b37.bref3 map=plink.chr1.GRCh37.map chrom=1 impute=true

After several hours of running, I get the following error:

ERROR: Reference and target files have no markers in common in interval: 
       1:165113264-205459274

Common markers must have identical CHROM, POS, REF, and ALT fields.
Exiting program.
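To see whether the failing interval really has no shared markers, you can count them yourself. The function below is a minimal sketch, not part of Beagle: it applies the rule quoted in the error (identical CHROM, POS, REF, and ALT) to the target VCF and a reference VCF inside one region. It assumes the reference is available as a VCF (a .bref3 file is binary and cannot be read this way), and the function name `count_common_markers` is hypothetical.

```shell
#!/usr/bin/env bash
# Count markers present in both a target VCF and a reference VCF inside one
# region. A marker counts as "common" only if CHROM, POS, REF and ALT are all
# identical, which is the rule quoted in Beagle's error message.
# Assumptions: both inputs are VCF text (plain or gzip/bgzip-compressed);
# a .bref3 reference must first be available as a VCF.
count_common_markers() {
  local target=$1 ref=$2 chrom=$3 start=$4 end=$5
  # gzip -cdf decompresses .gz input and passes plain text through unchanged
  awk -v c="$chrom" -v s="$start" -v e="$end" '
    BEGIN { FS = "\t" }
    /^#/ { next }                                # skip VCF header lines
    $1 != c || $2 < s || $2 > e { next }         # keep only the region
    {
      key = $1 ":" $2 ":" $4 ":" $5              # CHROM:POS:REF:ALT
      if (FILENAME == ARGV[1]) {
        target_keys[key] = 1                     # first input: target markers
      } else if (key in target_keys) {
        n++                                      # second input: reference hit
        delete target_keys[key]                  # count each marker once
      }
    }
    END { print n + 0 }
  ' <(gzip -cdf "$target") <(gzip -cdf "$ref")
}
```

For the interval above, a call would look like `count_common_markers chr1.vcf.gz chr1.1kg.phase3.v5a.b37.vcf.gz 1 165113264 205459274`, where the reference VCF name is a placeholder. A result of 0 reproduces what Beagle is complaining about.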

How can I skip the intervals with no common markers and proceed with the imputation, without the program exiting?



Hi, I had the same issue. The problem is that your window size is too small compared to the spacing between the markers in your input genotype data. Beagle estimates haplotypes across windows (intervals) of the genome, and if the window size is too small, some windows created at runtime contain no common markers at all (common markers being markers present in both your dataset and the reference panel you are using). The fix is simple: re-run Beagle with a larger window by setting its window parameter, window=[positive float]. The default window size is 40.0, and it is measured in centiMorgans (cM), not base pairs. An example call is:

        beagle \
            ref=chr20.referencePanel.vcf.gz \
            map=plink.GRCh37.map \
            gt=chr20.inputGenotypes.vcf.gz \
            out=chr20.imputed \
            chrom=20 \
            nthreads=20 \
            window=100.0
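As a rough consistency check on this explanation, the interval in the error message is about as long as one default window. This assumes the human genome-wide average of roughly 1 cM per Mb, which varies considerably along a chromosome:

$$
205{,}459{,}274 - 165{,}113{,}264 = 40{,}346{,}010\ \text{bp} \approx 40.3\ \text{Mb} \approx 40\ \text{cM}
$$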

Be careful, though: as you increase the window size, the memory Beagle needs at runtime also increases. This makes sense, since with bigger windows each window includes more SNVs from the reference panel, so haplotype estimation across that window becomes more computationally expensive. You need to strike a balance between window size and memory allocation (the Java -Xmx setting).
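To avoid raising window (and memory) more than necessary, one option is to measure the largest genetic-map gap between consecutive markers and choose a window comfortably above it. The function below is a hypothetical helper, not a Beagle feature, and the "window larger than the largest gap" rule is a rough heuristic rather than Beagle's exact windowing logic. It assumes a single-chromosome PLINK map (chrom, id, cM, bp) and a sorted list of base-pair positions, and converts positions to cM by linear interpolation.

```shell
#!/usr/bin/env bash
# Report the largest genetic-map gap (in cM) between consecutive positions.
# Assumptions: the map is a PLINK-format map for a single chromosome
# (columns: chrom, id, cM, bp), sorted by bp, with at least two rows; the
# positions file lists base-pair positions, one per line, in ascending order.
# Positions are converted to cM by linear interpolation between map rows,
# extrapolating from the first/last map interval outside the map's range.
max_cm_gap() {
  local map=$1 positions=$2
  awk '
    function to_cm(p,   i, j) {
      while (k < n - 1 && bp[k + 1] < p) k++     # positions are sorted
      i = k; j = k + 1
      if (bp[j] == bp[i]) return cm[i]
      return cm[i] + (p - bp[i]) * (cm[j] - cm[i]) / (bp[j] - bp[i])
    }
    BEGIN { k = 1 }
    FILENAME == ARGV[1] { n++; cm[n] = $3; bp[n] = $4; next }   # load map
    {
      x = to_cm($1)
      if (have_prev && x - prev > gap) { gap = x - prev; from = prev_bp; to = $1 }
      prev = x; prev_bp = $1; have_prev = 1
    }
    END { printf "%.3f cM between %d and %d\n", gap, from, to }
  ' "$map" "$positions"
}
```

For example, `zcat chr1.vcf.gz | grep -v '^#' | cut -f2 > chr1.positions` followed by `max_cm_gap plink.chr1.GRCh37.map chr1.positions` (chr1.positions is a placeholder name). These are all target positions; markers missing from the reference are not "common", so gaps between common markers can be larger than what this reports.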

Yes... please, I have the same problem 🙁

