I am working with the recently released HGDP-CEPH data set of 929 samples. I selected biallelic SNPs, filtered them by genotyping rate (95%) and minor allele frequency (0.01), and used PLINK2 --king to estimate relatedness. The recent HGDP paper states that the 929 samples are unrelated, but I found about 72 pairs with kinship coefficients above the KING threshold for second-degree relatives (0.0884). I repeated the analysis several times and got the same result each time. Can anyone familiar with this dataset or with PLINK's KING relatedness estimation help explain the discrepancy?
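
A minimal sketch of how the pairs above the threshold could be counted, in case it helps others reproduce the number. It assumes the kinship estimates are in a tab-separated pairwise table whose header contains IID1, IID2 and KINSHIP columns (the layout I understand PLINK2's KING table output to use); the function name `related_pairs`, the sample IDs and the kinship values below are made up for illustration and are not from the HGDP data.

```python
import csv
import io

# Second-degree kinship threshold quoted in the post.
SECOND_DEGREE_CUTOFF = 0.0884


def related_pairs(table_text, cutoff=SECOND_DEGREE_CUTOFF):
    """Return (id1, id2, kinship) for pairs whose kinship exceeds cutoff.

    Assumes a tab-separated table with a header row containing
    IID1, IID2 and KINSHIP (a leading '#' on the header is tolerated).
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    id1_col = header.index("IID1")
    id2_col = header.index("IID2")
    kin_col = header.index("KINSHIP")

    pairs = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        kinship = float(row[kin_col])
        if kinship > cutoff:
            pairs.append((row[id1_col], row[id2_col], kinship))
    # Most closely related pairs first.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Made-up example table; IDs and values are illustrative only.
example = (
    "#IID1\tIID2\tNSNP\tHETHET\tIBS0\tKINSHIP\n"
    "S1\tS2\t1000\t0.10\t0.02\t0.2500\n"
    "S1\tS3\t1000\t0.10\t0.03\t0.1000\n"
    "S2\tS3\t1000\t0.10\t0.05\t0.0500\n"
    "S3\tS4\t1000\t0.10\t0.07\t-0.0100\n"
)

flagged = related_pairs(example)
print(len(flagged), "pairs above", SECOND_DEGREE_CUTOFF)
for id1, id2, kinship in flagged:
    print(id1, id2, kinship)
```

For a real run, the contents of the kinship table file would be read into a string and passed to `related_pairs` in place of `example`.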


