After providing answers to over 100 questions here, I now have one of my own. Actually, this is a two-part question. What tool(s) do you use to calculate genetic heterogeneity from SNP genotype data collected across an entire chromosome or genome? If the measure of heterogeneity is at or near zero, then the individual (human, animal, plant) is a product of inbreeding. This number will rise as the parents come from increasingly divergent genetic backgrounds.

That then brings up the second part of the question. For those of you who have looked into such measures of genetic diversity or heterogeneity, how useful is this and what kinds of values can I expect from the human genome-wide SNP genotypes I have? A preliminary and crude analysis gave me 69% of SNPs across chromosome 7 as homozygous, but that value rises to 92+% across two small HLA loci. That seems interesting but I don't know where to go with this.

For each SNP, you will have to correct the observed homozygosity for the homozygosity expected within the population of interest. When I have few samples I use the HapMap allele frequencies for that, but with many samples it is probably better to calculate the population frequency of each allele from your own data.
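One common way to express this correction is a method-of-moments inbreeding coefficient, F = (O - E) / (N - E), where O is the observed number of homozygous SNPs, E is the number expected from the population allele frequencies under Hardy-Weinberg proportions, and N is the number of SNPs called. This is similar in spirit to what PLINK's `--het` option reports. The sketch below is only an illustration: the function name, the 0/1/2 genotype coding and the input layout are my assumptions, not something from the original posts.

```python
from typing import Optional, Sequence


def inbreeding_f(genotypes: Sequence[Optional[int]],
                 alt_freqs: Sequence[float]) -> float:
    """Method-of-moments F for one individual.

    genotypes: alternate-allele count per SNP (0, 1 or 2), None if missing.
    alt_freqs: population frequency of the alternate allele at each SNP
               (e.g. from a reference panel or from your own samples).
    """
    observed_hom = 0      # O: SNPs where the individual is homozygous
    expected_hom = 0.0    # E: sum of per-SNP expected homozygosity
    n_called = 0          # N: SNPs that actually contribute

    for g, p in zip(genotypes, alt_freqs):
        # Skip missing calls and SNPs that are monomorphic in the population.
        if g is None or p <= 0.0 or p >= 1.0:
            continue
        n_called += 1
        if g != 1:
            observed_hom += 1
        # Hardy-Weinberg: P(homozygous) = p^2 + q^2 = 1 - 2pq
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)

    if n_called == 0:
        raise ValueError("no informative SNPs")
    return (observed_hom - expected_hom) / (n_called - expected_hom)
```

Values near 0 mean the individual is about as homozygous as expected for that population, positive values indicate excess homozygosity (consistent with inbreeding), and negative values indicate excess heterozygosity.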

For several chips I've noticed that the minor allele frequencies of the SNPs in the HLA region are quite low. If you do not correct for that, you will see a high rate of homozygous SNPs there.
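To see why a low minor allele frequency (MAF) inflates the raw homozygous fraction, here is a small sketch assuming Hardy-Weinberg proportions at a biallelic SNP, where the expected homozygous fraction is p^2 + q^2 = 1 - 2pq. The function name and the example MAF values are purely illustrative.

```python
def expected_homozygosity(maf: float) -> float:
    """Expected fraction of homozygous genotypes at a biallelic SNP
    under Hardy-Weinberg proportions, given its minor allele frequency."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must be between 0 and 0.5")
    return 1.0 - 2.0 * maf * (1.0 - maf)


# Illustrative MAF values only, not taken from any real chip.
for maf in (0.5, 0.3, 0.1, 0.05):
    print(f"MAF {maf:.2f}: expected homozygous fraction "
          f"{expected_homozygosity(maf):.3f}")
```

Under these assumptions, a set of SNPs with MAF around 0.05 would show roughly 90% homozygous calls even in a fully outbred individual, so a high raw percentage on its own says little about inbreeding.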

As I understand it, genetic heterogeneity is a population-level measure. For haplotype imputation, I favor BEAGLE. I think that getting good and sufficient data is the hard part of the business.

Honestly, I don't know of a tool that can really calculate genetic diversity/heterogeneity in a population genetics sense. Only R has useful packages/tools (DEMEtics, popgen, genetics, pegas), but even those must be hacked most of the time to accept SNP data. So I normally develop my own approach based on the ideas in this paper. Nevertheless, there are a lot of problems with such an analysis. The effective number of genes per locus is highly variable across a chromosome/genome, and the discrepancy is even larger between regions with very different recombination rates. Low diversity could simply reflect insufficient population sampling or biased haplotype reconstruction.
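As a rough illustration of what a per-locus diversity measure can look like (this is not the method of the paper mentioned above, just standard textbook quantities), the sketch below computes Nei's unbiased gene diversity and the effective number of alleles, 1 / sum(p_i^2), for a single biallelic SNP. I am assuming that "effective number of genes per locus" corresponds to the effective number of alleles; the function names and the genotype-count inputs are hypothetical.

```python
def allele_freq(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Alternate-allele frequency from genotype counts at one SNP."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotyped individuals")
    return (2 * n_hom_alt + n_het) / (2 * n)


def gene_diversity(p: float, n_individuals: int) -> float:
    """Nei's unbiased gene diversity (expected heterozygosity) for a
    biallelic SNP sampled in n_individuals diploid individuals."""
    two_n = 2 * n_individuals
    if two_n < 2:
        raise ValueError("need at least one diploid individual")
    h = 1.0 - (p * p + (1.0 - p) ** 2)
    return h * two_n / (two_n - 1)   # small-sample correction


def effective_alleles(p: float) -> float:
    """Effective number of alleles, 1 / sum(p_i^2), for a biallelic SNP."""
    return 1.0 / (p * p + (1.0 - p) ** 2)


# Example: 25 hom-ref, 50 het, 25 hom-alt individuals at one SNP.
p = allele_freq(25, 50, 25)
print(p, gene_diversity(p, 100), effective_alleles(p))
```

Averaging such per-SNP values in windows along a chromosome is one way to look at how diversity varies by region, subject to the sampling and haplotype caveats above.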

In addition, biased gene conversion and/or genetic hitchhiking could give you the same impression. Hence, low diversity could be the result of excess recombination in the presence of homology, selection at linked loci, or a low effective population size at that locus. You cannot distinguish between these without a linkage map or similar.


At present, is there any software or tool that can calculate genetic heterogeneity from genotype or SNP data?

As a follow-up question: are there ways to also quantify heterogeneity from RNA-seq (or transcriptomics) data?

