Posted by lxiao63

I have downloaded genomic data for the 1000 Genomes Phase I samples from www.ncbi.nlm.nih.gov/projects/faspftp/1000genomes/.

I checked the resulting .fam file (1092 rows, each corresponding to one sample in the 1000 Genomes Phase I release) and noticed a column named member, whose first 20 entries are:

HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103 HG00104
HG00106 HG00108 HG00109 HG00110 HG00111 HG00112 HG00113 HG00114
HG00116 HG00117 HG00118 HG00119
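For reference, a PLINK .fam file has six whitespace-separated columns per line (family ID, individual ID, father ID, mother ID, sex, phenotype), so the "member" IDs listed above are most likely the individual ID column. A minimal sketch for pulling those IDs out, assuming a standard six-column .fam; the function name `read_fam_iids` and the two example rows are made up for illustration:

```python
def read_fam_iids(fam_text: str) -> list[str]:
    """Return the individual IDs (column 2) from PLINK .fam content.

    Assumes the standard six whitespace-separated columns:
    family ID, individual ID, father ID, mother ID, sex, phenotype.
    """
    iids = []
    for line in fam_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        iids.append(fields[1])
    return iids


if __name__ == "__main__":
    # Hypothetical rows in .fam layout; in practice pass open("file.fam").read().
    example = "HG00096 HG00096 0 0 1 -9\nHG00097 HG00097 0 0 2 -9\n"
    print(read_fam_iids(example))  # ['HG00096', 'HG00097']
```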

I want to determine the population (e.g., CHB, JPT, CEU) and super population (e.g., EAS, EUR, AFR) of each sample from its member ID. To do so, I downloaded the pedigree file from www.internationalgenome.org/faq/can-i-get-phenotype-gender-and-family-relationship-information-samples/.
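If the pedigree file carries a population code per sample, the lookup itself is a simple dictionary build. A minimal sketch, assuming a tab-separated file with a header row containing columns named "Individual ID" and "Population" (check the actual header of the downloaded file and adjust `id_col`/`pop_col`). The super-population table below uses the Phase 3 labels; Phase I grouped East Asian samples as ASN instead of EAS. The function name `population_map` and the example rows are hypothetical:

```python
import csv
import io

# Population -> super population, using 1000 Genomes Phase 3 labels.
SUPERPOP = {
    "CHB": "EAS", "JPT": "EAS", "CHS": "EAS", "CDX": "EAS", "KHV": "EAS",
    "CEU": "EUR", "TSI": "EUR", "FIN": "EUR", "GBR": "EUR", "IBS": "EUR",
    "YRI": "AFR", "LWK": "AFR", "GWD": "AFR", "MSL": "AFR", "ESN": "AFR",
    "ASW": "AFR", "ACB": "AFR",
    "MXL": "AMR", "PUR": "AMR", "CLM": "AMR", "PEL": "AMR",
    "GIH": "SAS", "PJL": "SAS", "BEB": "SAS", "STU": "SAS", "ITU": "SAS",
}


def population_map(ped_text: str, id_col: str = "Individual ID",
                   pop_col: str = "Population") -> dict[str, tuple[str, str]]:
    """Map each sample ID to (population, super population).

    Assumes tab-separated text with a header row; column names are assumptions.
    """
    reader = csv.DictReader(io.StringIO(ped_text), delimiter="\t")
    result = {}
    for row in reader:
        sample = row[id_col].strip()
        pop = row[pop_col].strip()
        result[sample] = (pop, SUPERPOP.get(pop, "unknown"))
    return result


if __name__ == "__main__":
    # Hypothetical two-row excerpt in the assumed layout.
    ped = ("Family ID\tIndividual ID\tPopulation\n"
           "HG00096\tHG00096\tGBR\n"
           "NA18525\tNA18525\tCHB\n")
    print(population_map(ped))
```

Codes missing from the table come back as "unknown" rather than raising, so unexpected labels are easy to spot afterwards.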

The pedigree file has 3501 rows rather than 1092. It has a column named Individual ID whose entries are HG01879, HG01880, HG01881, etc. However, none of the member IDs in my .fam file can be found among the 3501 rows of the pedigree file, so the two files appear to be completely unrelated.
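Before concluding that the two files share no samples, it may be worth checking the overlap programmatically, since a zero match can also come from comparing the wrong column or from stray whitespace, carriage returns, or case differences in the IDs (these causes are assumptions, not something the post confirms). A minimal sketch; `compare_ids` is a hypothetical helper that takes two plain lists of IDs:

```python
def compare_ids(fam_ids: list[str], ped_ids: list[str]) -> dict[str, list[str]]:
    """Normalize two ID collections and report shared and unmatched IDs."""
    def norm(s: str) -> str:
        # Drop surrounding whitespace (including a trailing "\r") and ignore case.
        return s.strip().upper()

    fam = {norm(s) for s in fam_ids}
    ped = {norm(s) for s in ped_ids}
    return {
        "shared": sorted(fam & ped),
        "only_in_fam": sorted(fam - ped),
        "only_in_ped": sorted(ped - fam),
    }


if __name__ == "__main__":
    report = compare_ids(["HG00096", "HG00097\r"], ["hg00096", "HG01879"])
    print(report)
    # {'shared': ['HG00096'], 'only_in_fam': ['HG00097'], 'only_in_ped': ['HG01879']}
```

If "shared" is still empty after normalization, the files genuinely cover different sample sets.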

Is it possible to determine the source population of the 1092 1000 Genomes samples from their member IDs? If so, where can I find metadata that maps each ID to its population?

Thank you.
