I have downloaded genomic data for the 1000 Genomes Phase I samples from www.ncbi.nlm.nih.gov/projects/faspftp/1000genomes/.
I checked the resulting .FAM file (1092 rows, one per sample in the 1000 Genomes Phase I release) and noticed a column named
"member"; its first 20 entries are:
HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103 HG00104
HG00106 HG00108 HG00109 HG00110 HG00111 HG00112 HG00113 HG00114
HG00116 HG00117 HG00118 HG00119
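To pull these IDs out programmatically, here is a minimal Python sketch. It assumes the standard PLINK .fam layout (no header row, six whitespace-separated fields per line), where field 2 is the individual ID, presumably the column that R's snpStats read.plink labels "member". The function names are hypothetical.

```python
def parse_fam_lines(lines):
    """Return the individual ("member") IDs from the lines of a PLINK .fam file.

    Each line is assumed to hold six whitespace-separated fields:
    family ID, individual ID, father ID, mother ID, sex, phenotype.
    """
    ids = []
    for line in lines:
        fields = line.split()  # splits on spaces, tabs and stray \r characters
        if len(fields) >= 2:   # skip blank or malformed lines
            ids.append(fields[1])
    return ids


def read_fam_member_ids(fam_path):
    """Open a .fam file (no header row) and return its individual IDs in file order."""
    with open(fam_path) as fh:
        return parse_fam_lines(fh)
```

For example, read_fam_member_ids("phase1.fam")[:20] (file name hypothetical) should reproduce the list above if the assumed layout holds.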
I want to determine the population (e.g., CHB, JPT, CEU) and super population (e.g., EAS, EUR, AFR) from the member IDs. To do so, I downloaded the pedigree file from www.internationalgenome.org/faq/can-i-get-phenotype-gender-and-family-relationship-information-samples/.
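Once a sample's population code is known, the super population is a fixed lookup. A sketch, assuming the five super-population codes used in later 1000 Genomes releases; the dictionary and function name are illustrative, not an official file.

```python
# Population code -> super-population code, following the five groups used in
# later 1000 Genomes releases. Phase I (1092 samples) grouped East Asian
# samples as ASN rather than EAS and contained no SAS populations.
SUPER_POPULATION = {
    "CHB": "EAS", "JPT": "EAS", "CHS": "EAS", "CDX": "EAS", "KHV": "EAS",
    "CEU": "EUR", "TSI": "EUR", "FIN": "EUR", "GBR": "EUR", "IBS": "EUR",
    "YRI": "AFR", "LWK": "AFR", "GWD": "AFR", "MSL": "AFR", "ESN": "AFR",
    "ASW": "AFR", "ACB": "AFR",
    "MXL": "AMR", "PUR": "AMR", "CLM": "AMR", "PEL": "AMR",
    "GIH": "SAS", "PJL": "SAS", "BEB": "SAS", "STU": "SAS", "ITU": "SAS",
}


def super_population(pop_code):
    """Return the super-population code for a population code, or None if unknown."""
    return SUPER_POPULATION.get(pop_code.strip().upper())
```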
The pedigree file has 3501 rows rather than 1092. It has a column named
"Individual ID" whose entries are HG01879, HG01880, HG01881, and so on. However, none of the
member IDs in my .FAM file can be found among the 3501 rows of the pedigree file, so the two files seem completely unrelated.
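One way to double-check the "no overlap" result is to load the pedigree file and match IDs after stripping whitespace, since stray spaces or Windows line endings can make otherwise identical IDs compare unequal. A minimal sketch, assuming the pedigree file is tab-separated with a header row containing "Individual ID" and "Population" columns (the "Population" column name is an assumption); load_pedigree and match_members are hypothetical helpers.

```python
import csv


def load_pedigree(lines):
    """Map Individual ID -> Population from a tab-separated pedigree file with a header row."""
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None:  # empty input
        return {}
    # Normalise header names so trailing spaces or \r do not break column lookup.
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    mapping = {}
    for row in reader:
        iid = (row.get("Individual ID") or "").strip()
        pop = (row.get("Population") or "").strip()
        if iid:
            mapping[iid] = pop
    return mapping


def match_members(member_ids, pedigree):
    """Split member IDs into those found in the pedigree map and those missing."""
    found, missing = {}, []
    for mid in member_ids:
        key = mid.strip()
        if key in pedigree:
            found[key] = pedigree[key]
        else:
            missing.append(key)
    return found, missing
```

Opening the pedigree file with open(path, newline="") and passing the handle to load_pedigree, then printing len(found) and len(missing), gives a quick count of how many .FAM members actually overlap.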
Is it possible to determine the source population of the 1092 1000 Genomes samples from their
member IDs? If so, where can I find the metadata that maps each ID to its population?