Posted 2 hours ago by Volka

Hi all, I am working with VCF files of imputed data from the Sanger Imputation Server, and have run into a problem: some entries share the same SNP ID but have different positions and alleles. An example is below:

1 5265430 rs71574343 G C . PASS RefPanelAF=0.321086;AN=210;AC=105;INFO=0.940381 GT:ADS:DS:GP 0|0:0.05,0:0.05:0.95,0.05,0 ...

1 5265438 rs71574343 C T . PASS RefPanelAF=0.302516;AN=210;AC=41;INFO=0.940706 GT:ADS:DS:GP 1|1:0.75,1:1.75:0,0.25,0.75 0|0:0,0:0:1,0,0 0|0:0,0:0:1,0,0 0|1:0,0.95:0.95:0.05,0.95,0 1|0:1,0.05:1.05:0,0.95,0.05 ...

How should I handle these? I want to carry out QC with PLINK, but the duplicated IDs cause errors. I have already run the following commands to remove duplicate records and records whose ID is '.':

bcftools norm -d all in.dose.vcf.gz -o out.vcf

bcftools view -e 'ID=="."' in.vcf -o out.vcf
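
To show which IDs are still affected after those two commands, here is a minimal sketch that lists every ID occurring on more than one record. It assumes an uncompressed, tab-delimited VCF; the function name list_duplicate_ids and the file names are only placeholders for illustration. As far as I understand, bcftools norm -d compares records by position, which would explain why two records at different positions with the same ID both survive.

```shell
# Print each ID (VCF column 3) that appears on more than one data line.
# Assumes an uncompressed, tab-delimited VCF passed as the first argument.
# Header lines (starting with '#') and missing IDs ('.') are ignored.
list_duplicate_ids() {
    awk -F'\t' '
        !/^#/ && $3 != "." { n[$3]++ }                        # count each ID
        END { for (id in n) if (n[id] > 1) print id }         # keep repeated IDs
    ' "$1" | sort
}
```

For example, list_duplicate_ids out.vcf > dup_ids.txt would write the repeated IDs to a file. One option might then be to pass that file to PLINK's --exclude, with the caveat that this drops every variant carrying a repeated ID rather than keeping one of them.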
