IBD (Identity By Descent) and the Chosen Value of pi_hat


I am trying to understand how to choose an appropriate pi_hat threshold for a dataset. Many articles use 0.2 as the pi_hat threshold, and every pair above it is considered to show cryptic relatedness or to be duplicates.

I've tested IBD estimation on HapMap; the files I use can be found here: ftp.ncbi.nlm.nih.gov/hapmap/genotypes/2009-01_phaseIII/plink_format/. I first remove all annotated offspring from HapMap. Then I perform the IBD estimation to see whether it still finds samples that are cryptically related to each other. The steps I perform are the following (in PLINK):

(1) LD-prune:

plink --file hapmap --indep-pairwise 50 5 0.2
plink --file hapmap --extract plink.prune.in --recode --out hapmap_pruned

(2) IBD:

plink --file hapmap_pruned --genome --min 0.2

The results show that many cryptically related sample pairs are still found with a pi_hat threshold of 0.2, even though all annotated offspring were removed beforehand. My question is: is this normal behavior, or should one increase the pi_hat threshold? How does one find a "good" pi_hat threshold for a custom dataset?
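
To see how the reported pairs are distributed rather than only counting them, here is a minimal Python sketch that bins the PI_HAT column of the PLINK output file (plink.genome by default) into relationship classes. PLINK reports PI_HAT = P(IBD=2) + 0.5*P(IBD=1), whose theoretical values are 1.0 for duplicates or monozygotic twins, 0.5 for first-degree, 0.25 for second-degree and 0.125 for third-degree relatives. The bin edges below are simply the midpoints between those values; they are an assumption for illustration, not a PLINK default or a recommended cut-off, and the function names are hypothetical.

```python
from collections import Counter

# Lower bin edges are midpoints between the theoretical PI_HAT values
# (1.0, 0.5, 0.25, 0.125). ASSUMPTION: these edges are illustrative only,
# not a PLINK default and not a validated threshold.
BINS = [
    (0.75, "duplicate/MZ twin"),
    (0.375, "first degree"),
    (0.1875, "second degree"),
    (0.0625, "third degree"),
]


def classify(pi_hat):
    """Return the label of the first bin whose lower edge pi_hat reaches."""
    for lower_edge, label in BINS:
        if pi_hat >= lower_edge:
            return label
    return "unrelated/background"


def summarise_genome(lines):
    """Count sample pairs per class from the lines of a PLINK .genome file.

    The file is whitespace-delimited with a header row; the PI_HAT column
    is located by name, so the column order does not need to be hard-coded.
    """
    rows = iter(lines)
    header = next(rows).split()
    pi_hat_col = header.index("PI_HAT")
    counts = Counter()
    for line in rows:
        fields = line.split()
        if fields:  # skip blank lines
            counts[classify(float(fields[pi_hat_col]))] += 1
    return counts
```

It could be called as summarise_genome(open("plink.genome")). Because --min 0.2 only writes pairs with PI_HAT of at least 0.2, the "third degree" and "unrelated/background" classes stay empty unless --genome is run without --min.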


