The contents of those files are explained in the README; however, I'm not sure how well supported is that format specification (if it is even defined as a format specification).

I have an imputed 1000 Genomes Phase III dataset on my computer that I utilise for various projects —including LD analyses— when needed. If you follow this tutorial, you'll also have this imputed dataset: Produce PCA bi-plot for 1000 Genomes Phase III - Version 2

Once you have all of those files, generating a LD heatmap for any region is as simple as this, taken from my notes:

1, extract variants overlapping the region of interest +/- 1Kbp from the 1000 Genomes Phase III data

bcftools view chr3.1kg.phase3.v5.vcf.gz -R Gene_hg19.bed | 
  bcftools annotate -Ob -I +'%CHROM:%POS:%REF:%ALT' > Region.bcf ;
bcftools index Region.bcf ;

Here, the co-ordinates of the region of interest are stored in the BED file. I also set the VCF ID field to CHROM:POS:REF:ALT if it's not already set

2, extract different populations

bcftools view -Ob --samples-file EAS.list Region.bcf > EAS.bcf ;
bcftools index EAS.bcf ;

Here, EAS.list comprises a list of sample IDs representing the East Asian (EAS) super population. I identified these from 20130606_g1k.ped

3, convert the BCF file to PLINK format (requires PLINK >= 1.9)

plink --noweb --bcf EAS.bcf 
  --vcf-idspace-to _ 
  --allow-extra-chr 0 
  --split-x b37 no-fail 
  --out EAS ;

4, export data to HaploView format

plink --noweb --bfile EAS 
  --chr 3 --from-bp 135364584 --to-bp 135399507 
  --snps-only no-DI 
  --out EAS_Haploview ;

5, determine haplotype blocks in HaploView

I have written the following notes:

Start HaploView. In the ‘Welcome to HaploView’ window, select the
‘LINKAGE Format’ tab. Then, click on the ‘browse’ button and select your PED file as ‘Data
File’ - this is the PED file that you generated from PLINK.

Click ‘OK’.

Select the ‘LD Plot’ tab to generate a LD heatmap. To save the plot as a Portable Network Graphics (PNG) file, go to File > Export current tab to PNG

6, LD heatmap in R

Then, you can read that HaploView-exported file back into R to mess around with it:


par(mar=c(0,0,0,0), mfrow=c(4,1))

haploblocks <- readPNG("EAS_Haploblocks.png")
plot(1:2, type="n", main="", xlab="", ylab="", xaxt="n", yaxt="n", axes=FALSE)
lim <- par()
rasterImage(haploblocks, lim$usr[1], lim$usr[3], lim$usr[2], lim$usr[4]-0.2)

points <- cbind(x=c(1.925, 1.96, 1.96, 1.925), y=c(1.5, 1.5, 1.1, 1.1))
legend.gradient(points, cols=brewer.pal(n=9, name="Greys"), title=bquote(R^2), limits=c(0,1))


NB - the gene track I added also via R, and also via readPNG(). I previously downloaded it from UCSC and saved it as a separate PNG


Source link