2 hours ago by lxiao63

Dear all,

I am very much a beginner in genetic data analysis. I have recently been learning to perform GWAS in R by following the article "A guide to genome-wide association analysis and post-analytic interrogation". In the SNP imputation step, the authors use SNP data on Chr16 for demonstration: they load the "chr16_1000g_CEU.ped" and "chr16_1000g_CEU.info" files into R with the read.pedfile function from the snpStats package (the files are publicly available from www.mtholyoke.edu/courses/afoulkes/Data/GWAStutorial/).

I would like to find 1000 Genomes SNP data for the other chromosomes. Under ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/integrated_call_sets/, I found a vcf.gz file and a vcf.gz.tbi file for each chromosome. For example, for chromosome 16, I found "ALL.chr16.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.vcf.gz" and "ALL.chr16.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.vcf.gz.tbi".

My questions are:

  1. Are the vcf.gz and vcf.gz.tbi files I found for Chr16 equivalent to the "chr16_1000g_CEU.ped" and "chr16_1000g_CEU.info" files the authors provided? If so, I could simply download the corresponding files for the other chromosomes for my own GWAS.

  2. I understand that the vcf.gz file contains genotype information and the vcf.gz.tbi file contains position information. I tried to load the two files I downloaded from the 1000 Genomes site into R, but failed. I also tried the approach in an 8-year-old Biostars post (Loading 1000 Genomes Vcf Files In R), but it did not work. My guess is that the vcf.gz file is analogous to the "chr16_1000g_CEU.ped" file in the paper and the vcf.gz.tbi file is analogous to the "chr16_1000g_CEU.info" file. However, I have not found a way to convert vcf.gz to .ped and vcf.gz.tbi to .info before loading them into R, nor a method that loads vcf.gz and vcf.gz.tbi directly into R. Any solution is welcome.
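To make the conversion in question 2 concrete, here is a minimal Python sketch of how VCF records could map onto PED-style rows. Two points it relies on: a .tbi file is a tabix index used for fast region lookup in the compressed VCF, and the SNP identifiers and positions live in the ID and POS columns of the VCF itself. The function name `vcf_to_ped_and_info`, the sample names S1/S2, and the toy records are hypothetical, and the layout assumed for the .info file (one SNP ID and position per genotype column) is a guess rather than the tutorial's documented format. The sketch keeps only biallelic SNPs and does not subset to CEU samples.

```python
def vcf_to_ped_and_info(vcf_text):
    """Turn biallelic SNP records in VCF text into PED rows and (id, pos) rows."""
    samples = []
    info_rows = []   # one (snp_id, position) tuple per kept variant
    genotypes = {}   # sample name -> list of "A1 A2" allele pairs
    for line in vcf_text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            genotypes = {s: [] for s in samples}
            continue
        chrom, pos, snp_id, ref, alt = fields[:5]
        # Keep only biallelic SNPs; skip indels, SVs, multi-allelic sites.
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue
        gt_index = fields[8].split(":").index("GT")  # assumes GT is present
        alleles = [ref, alt]
        info_rows.append((snp_id, int(pos)))
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index].replace("|", "/").split("/")
            if len(gt) != 2 or "." in gt:
                pair = "0 0"  # PED convention for a missing genotype
            else:
                pair = " ".join(alleles[int(a)] for a in gt)
            genotypes[sample].append(pair)
    ped_rows = []
    for s in samples:
        # family ID, individual ID, father, mother, sex, phenotype (all unknown)
        ped_rows.append(" ".join([s, s, "0", "0", "0", "-9"] + genotypes[s]))
    return ped_rows, info_rows


# Toy input with two hypothetical samples; the third record is an indel.
example = "\n".join([
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "16\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1",
    "16\t200\trs2\tC\tT\t.\tPASS\t.\tGT\t0|0\t.|.",
    "16\t300\tindel1\tCA\tC\t.\tPASS\t.\tGT\t0|1\t0|0",
])
ped_rows, info_rows = vcf_to_ped_and_info(example)
for row in ped_rows:
    print(row)
```

On this reading, the .ped and .info files together would correspond to the vcf.gz file alone (restricted to the samples and variants of interest), with the .tbi file playing no part in the conversion. A full-chromosome file would need to be read line by line from the gzip stream instead of from a string.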

Thanks,
Patrick Lv


