Hi.
I am running a GWAS pipeline and am now imputing my genotypes. I used SHAPEIT for pre-phasing and am now running IMPUTE2 for imputation. I split each chromosome into chunks and ran the command on each chunk with no errors or warnings. However, for some chunks the .imputed file is empty, and I don't know why. The info and summary files for those chunks still contain information. Has something gone wrong, or is this normal output for some genomic regions? I haven't been able to find out.
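To list which chunks are affected, a small script can scan the output directory for zero-byte .imputed files and report how many lines each matching _info file has. This is a minimal sketch, not part of IMPUTE2: it assumes the outputs sit in one directory (chr22_chunks/ in the summary below) and follow the <prefix>.imputed / <prefix>.imputed_info naming that the summary shows. The function name find_empty_chunks is hypothetical.

```python
from pathlib import Path


def find_empty_chunks(chunk_dir: str) -> list[tuple[str, int]]:
    """Return (file name, line count of its _info file) for every empty .imputed file.

    The _info line count is -1 when no matching _info file exists.
    """
    results = []
    for imputed in sorted(Path(chunk_dir).glob("*.imputed")):
        if imputed.stat().st_size > 0:
            continue  # this chunk wrote genotype rows; skip it
        # Naming taken from the run summary: <prefix>.imputed -> <prefix>.imputed_info
        info = imputed.with_name(imputed.name + "_info")
        if info.exists():
            with info.open() as fh:
                n_info = sum(1 for _ in fh)
        else:
            n_info = -1
        results.append((imputed.name, n_info))
    return results
```

For example, `for name, n in find_empty_chunks("chr22_chunks"): print(name, n)` prints one line per empty chunk, which makes it easy to see whether the empty chunks share a region or a pattern.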

Below is the summary file produced for one chunk whose .imputed file is empty.

Thanks very much for your help.
Kitty

======================

Copyright 2008 Bryan Howie, Peter Donnelly, and Jonathan Marchini
Please see the LICENCE file included with this program for conditions of use.

The seed for the random number generator is 578320895.

Command-line input: impute2 -use_prephased_g -known_haps_g Swiss_shapeit_chr22_map_1000G.phased.haps -h 1000GP_Phase3_chr22.hap.gz -l 1000GP_Phase3_chr22.legend.gz -m genetic_map_chr22_combined_b37.txt -Ne 20000 -int 15000001 20000000 -o chr22_chunks/chr22_impute2_chunk1.imputed
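The -int window in this command spans 5 Mb (15000001 to 20000000). If the other chunks were set up the same way, their commands can be generated rather than typed by hand. The sketch below assumes a fixed 5 Mb chunk size and chunk<N> output names numbered from the first position passed in; the helper impute2_chunk_commands is hypothetical, and it only reuses the flags already present in the command above.

```python
def impute2_chunk_commands(first_bp: int, last_bp: int,
                           chunk_bp: int = 5_000_000) -> list[list[str]]:
    """Build one IMPUTE2 argument list per chunk of chunk_bp base pairs.

    Flags and file names are copied from the echoed command line; the
    chunking scheme and the chunk<N> output naming are assumptions.
    """
    commands = []
    start, index = first_bp, 1
    while start <= last_bp:
        end = min(start + chunk_bp - 1, last_bp)  # last chunk may be shorter
        commands.append([
            "impute2", "-use_prephased_g",
            "-known_haps_g", "Swiss_shapeit_chr22_map_1000G.phased.haps",
            "-h", "1000GP_Phase3_chr22.hap.gz",
            "-l", "1000GP_Phase3_chr22.legend.gz",
            "-m", "genetic_map_chr22_combined_b37.txt",
            "-Ne", "20000",
            "-int", str(start), str(end),
            "-o", f"chr22_chunks/chr22_impute2_chunk{index}.imputed",
        ])
        start, index = end + 1, index + 1
    return commands
```

Calling impute2_chunk_commands(15_000_001, 30_000_000) yields three argument lists, the first matching the command above; each list could be passed to subprocess.run(cmd, check=True).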


Nomenclature and data structure

 Panel 0: phased reference haplotypes
 Panel 2: phased study haplotypes

For optimal results, each successive panel (0,1,2) should contain a subset of the SNPs in the previous panel. When the data structure deviates from this ideal configuration, IMPUTE2 tries to use as much of the available information as possible; see documentation for details.


Input files

     Panel 0 haplotypes: 1000GP_Phase3_chr22.hap.gz
     Panel 0 hap legend: 1000GP_Phase3_chr22.legend.gz
     Panel 2 known haps: Swiss_shapeit_chr22_map_1000G.phased.haps
            genetic map: genetic_map_chr22_combined_b37.txt

Output files

            main output: chr22_chunks/chr22_impute2_chunk1.imputed
            SNP QC info: chr22_chunks/chr22_impute2_chunk1.imputed_info
         sample QC info: chr22_chunks/chr22_impute2_chunk1.imputed_info_by_sample
            run summary: chr22_chunks/chr22_impute2_chunk1.imputed_summary
            warning log: chr22_chunks/chr22_impute2_chunk1.imputed_warnings

Data processing

-reading genetic map from -m file
--filename=[genetic_map_chr22_combined_b37.txt]
--read 3753 SNPs in the analysis interval+buffer region

-reading Panel 2 haplotypes from -known_haps_g file
--filename=[Swiss_shapeit_chr22_map_1000G.phased.haps]
--detected 1065 individuals
--read 782 SNPs in the analysis interval+buffer region
--added 782 new SNPs based on known haplotypes

-reading Panel 0 haplotypes from -h and -l files
--filename=[1000GP_Phase3_chr22.hap.gz]
--filename=[1000GP_Phase3_chr22.legend.gz]
--detected 5008 haplotypes
--read 117035 SNPs in the analysis interval+buffer region

-removing SNPs that violate the hierarchical data requirements
--no SNPs removed

-removing reference-only SNPs from buffer region
--removed 8307 SNPs

-checking strand alignment between Panel 2 and Panel 0 by allele labels
--flipped strand due to allele mismatch at 367 out of 782 SNPs in Panel 2

-aligning allele labels between panels

-removing non-aligned genotyped SNPs
--removed 0 out of 774 SNPs with data in multiple panels
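The allele-label strand check above flipped 367 of the 782 study SNPs because their labels did not match the reference. A generic version of that kind of check is sketched below; it illustrates the technique and is not IMPUTE2's actual code, and the function align_alleles is hypothetical. It also shows why A/T and C/G SNPs cannot be checked by labels alone: complementing them gives back the same allele pair.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def align_alleles(study: tuple[str, str], ref: tuple[str, str]) -> str:
    """Classify a study SNP against the reference panel by allele labels.

    Returns "match" (same alleles, in either order), "flip" (complementing the
    study alleles gives the reference alleles), "ambiguous" (an A/T or C/G SNP,
    whose complement is the same allele pair), or "mismatch".
    """
    s, r = set(study), set(ref)
    if not all(a in COMPLEMENT for a in (*study, *ref)):
        # Multi-base or non-ACGT alleles: compare labels only, no strand logic.
        return "match" if s == r else "mismatch"
    flipped = {COMPLEMENT[a] for a in study}
    if flipped == s:
        return "ambiguous"  # labels cannot reveal a strand flip here
    if s == r:
        return "match"
    if flipped == r:
        return "flip"
    return "mismatch"
```

Under this scheme, a SNP classed as "mismatch" is the kind a "removing non-aligned genotyped SNPs" step would drop; in this run that step removed 0 of 774 SNPs.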


Data summary

[type 0 = SNP in Panel 0 only]
[type 1 = SNP in Panel 1]
[type 2 = SNP in Panel 2 and all ref panels]
[type 3 = SNP in Panel 2 only]

-Upstream buffer region
--0 type 0 SNPs
--0 type 1 SNPs
--0 type 2 SNPs
--0 type 3 SNPs
--0 total SNPs

-Downstream buffer region
--0 type 0 SNPs
--0 type 1 SNPs
--55 type 2 SNPs
--0 type 3 SNPs
--55 total SNPs

-Analysis region (as defined by -int argument)
--107954 type 0 SNPs
--0 type 1 SNPs
--719 type 2 SNPs
--8 type 3 SNPs
--108681 total SNPs

-Output file
--107954 type 0 SNPs
--0 type 1 SNPs
--719 type 2 SNPs
--8 type 3 SNPs
--108681 total SNPs

-In total, 108736 SNPs will be used in the analysis, including 774 Panel 2 SNPs

-setting storage space
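The SNP counts in this summary are internally consistent, and the Output file section lists 108681 SNPs to be written, which suggests IMPUTE2 expected to produce non-empty output for this chunk. The arithmetic, with every number copied from the log above:

```python
# Counts copied from the Data processing and Data summary sections.
ref_read, ref_removed_buffer = 117035, 8307
study_read = 782
type0_analysis = 107954
type2_analysis, type2_downstream = 719, 55
type3_analysis = 8

type2_total = type2_analysis + type2_downstream   # study SNPs also in the reference
ref_kept = ref_read - ref_removed_buffer           # reference SNPs left after buffer pruning
analysis_total = type0_analysis + type2_analysis + type3_analysis
grand_total = analysis_total + type2_downstream

assert type2_total == 774                         # "removed 0 out of 774 SNPs with data in multiple panels"
assert type2_total + type3_analysis == study_read  # all 782 study SNPs accounted for
assert type0_analysis + type2_total == ref_kept    # 108728 reference SNPs kept
assert analysis_total == 108681                   # "-Output file ... 108681 total SNPs"
assert grand_total == 108736                      # "108736 SNPs will be used in the analysis"
```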


Run parameters

    reference haplotypes: 5008 [Panel 0]
       study individuals: 1065 [Panel 2]
       sequence interval: [15000001,20000000]
                  buffer: 250 kb
                      Ne: 20000
       input call thresh: 0.900
 burn-in MCMC iterations: 0
   total MCMC iterations: 1 (1 used for inference)

HMM states for imputation: 500 [Panel 0->2]
active flags: <-use_prephased_g>


Run log

RESETTING PARAMETERS FOR "SURROGATE FAMILY" MODELING
-setting mutation matrices
-setting switch rates

diploid sampling success rate: (no diploid sampling performed)

haploid sampling success rate: (no haploid sampling performed)


Imputation accuracy assessment

The table below is based on an internal cross-validation that is performed during each IMPUTE2 run. For this analysis, the program masks the genotypes of one variant at a time in the study data (Panel 2) and imputes the masked genotypes by using the remaining study and reference data. The imputed genotypes are then compared with the original genotypes to produce the concordance statistics shown in the table. You can learn more about this procedure and the contents of the table at mathgen.stats.ox.ac.uk/impute/concordance_table_description.html.

In the current analysis, IMPUTE2 masked, imputed, and evaluated 765735 genotypes that were called with high confidence (maximum probability >= 0.90) in the Panel 2 input file (-g or -known_haps_g).

When the masked study genotypes were imputed with reference data from Panel 0, the concordance between original and imputed genotypes was as follows:

Interval   #Genotypes  %Concordance  Interval   %Called  %Concordance
[0.0-0.1]           0           0.0  [ >= 0.0]    100.0          94.5
[0.1-0.2]           0           0.0  [ >= 0.1]    100.0          94.5
[0.2-0.3]           0           0.0  [ >= 0.2]    100.0          94.5
[0.3-0.4]           0           0.0  [ >= 0.3]    100.0          94.5
[0.4-0.5]        1902          43.9  [ >= 0.4]    100.0          94.5
[0.5-0.6]       14508          54.4  [ >= 0.5]     99.8          94.6
[0.6-0.7]       14318          61.3  [ >= 0.6]     97.9          95.4
[0.7-0.8]       17094          70.0  [ >= 0.7]     96.0          96.1
[0.8-0.9]       25654          77.7  [ >= 0.8]     93.8          96.7
[0.9-1.0]      692259          97.4  [ >= 0.9]     90.4          97.4
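The two halves of this table are related: the right-hand columns are cumulative versions of the left-hand bins. The sketch below rebuilds %Called and %Concordance from the per-bin counts; because it works from the rounded per-bin percentages, agreement is only to one decimal place. It also shows that the 765735 masked genotypes equal 1065 individuals times the 719 type 2 SNPs in the analysis region. The name cumulative is illustrative.

```python
# Per-bin rows copied from the table: (lower bound, #Genotypes, %Concordance).
BINS = [
    (0.0, 0, 0.0), (0.1, 0, 0.0), (0.2, 0, 0.0), (0.3, 0, 0.0),
    (0.4, 1902, 43.9), (0.5, 14508, 54.4), (0.6, 14318, 61.3),
    (0.7, 17094, 70.0), (0.8, 25654, 77.7), (0.9, 692259, 97.4),
]
N_INDIVIDUALS = 1065     # "detected 1065 individuals"
N_TYPE2_ANALYSIS = 719   # type 2 SNPs in the analysis region

TOTAL = sum(n for _, n, _ in BINS)


def cumulative(threshold: float) -> tuple[float, float]:
    """Return (%Called, %Concordance) for genotypes whose maximum
    probability falls in bins at or above `threshold`."""
    kept = [(n, pct) for lower, n, pct in BINS if lower >= threshold - 1e-9]
    called = sum(n for n, _ in kept)
    # Rebuild the concordant count from each bin's rounded percentage.
    concordant = sum(n * pct / 100 for n, pct in kept)
    return 100 * called / TOTAL, 100 * concordant / called


for t in (0.0, 0.5, 0.8, 0.9):
    called, conc = cumulative(t)
    print(f">= {t}: {called:.1f}% called, {conc:.1f}% concordant")
```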


