I am trying to impute SNP array data on the Michigan Imputation Server. My overlap with the reference panel is about 60%, and some chunks are being dropped because of poor overlap. I plotted the MAF of the sites that do not overlap with the reference and found that they are the rarest of the rare among all my input, so it makes sense that they would be less likely to be in the reference panel. One thought I had to boost my overlap and prevent dropped chunks was to:

  1. take out SNPs with low MAF before imputation, which should boost my overlap
  2. do the imputation
  3. put the low-MAF SNPs back in after imputation.
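
To make the three steps above concrete, here is a rough sketch on toy data. Everything in it is an assumption for illustration: the variants and MAF values are made up, the 0.01 threshold is arbitrary, the helper names `split_by_maf` and `overlap_fraction` are hypothetical, and the "imputed" set is just a stand-in for whatever the server would return. In practice this would be done on the VCF files themselves rather than on Python sets.

```python
from typing import Dict, Set, Tuple

# A variant is identified here by (chrom, pos, ref, alt).
Variant = Tuple[str, int, str, str]


def split_by_maf(maf: Dict[Variant, float], threshold: float) -> Tuple[Set[Variant], Set[Variant]]:
    """Split variants into (common, rare) by a MAF threshold."""
    common = {v for v, freq in maf.items() if freq >= threshold}
    rare = set(maf) - common
    return common, rare


def overlap_fraction(variants: Set[Variant], reference: Set[Variant]) -> float:
    """Fraction of input variants that are also present in the reference panel."""
    if not variants:
        return 0.0
    return len(variants & reference) / len(variants)


# Toy array data: variant -> MAF (made-up values).
maf = {
    ("1", 100, "A", "G"): 0.30,
    ("1", 200, "C", "T"): 0.12,
    ("1", 300, "G", "A"): 0.004,
    ("1", 400, "T", "C"): 0.002,
    ("1", 500, "A", "C"): 0.25,
}

# Toy reference panel sites (made-up).
reference = {
    ("1", 100, "A", "G"),
    ("1", 200, "C", "T"),
    ("1", 500, "A", "C"),
    ("1", 600, "G", "T"),
}

# Step 1: set aside low-MAF SNPs (threshold is an arbitrary choice).
common, rare = split_by_maf(maf, threshold=0.01)
overlap_before = overlap_fraction(set(maf), reference)
overlap_after = overlap_fraction(common, reference)

# Step 2: imputation. Stand-in only: pretend the output covers every reference site.
imputed_sites = set(reference)

# Step 3: add the directly genotyped rare SNPs back to the imputed site list.
final_sites = imputed_sites | rare

print(f"overlap before filtering: {overlap_before:.2f}")
print(f"overlap after filtering:  {overlap_after:.2f}")
print(f"sites after merging rare SNPs back: {len(final_sites)}")
```

Note that the sketch only tracks site lists. It says nothing about whether removing the rare SNPs changes the quality of phasing or imputation at nearby sites, which is really what I am asking about.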

Is this a thing that people do? Or am I just stripping important information from my input?


