Posted 2 hours ago by yy237 (Canada)

Hi,

I used the TOPMed Imputation Server to impute SNPs for my genotype data. After converting the output VCF files to plink binary files, I ran --list-duplicate-vars in plink to check for duplicate SNP IDs in my .bim file. I found that each pair of "duplicate SNP IDs" is actually the same variant (same position) with the reference and alternate alleles swapped (please see below for the first few rows of my plink.dupvar file).

In this case, should I still exclude all the duplicate IDs from the dataset going forward? Would that mean losing the information for both variants in each duplicate pair?

(So far I have only checked the .bim file for chromosome 1, where 1472 pairs of duplicate IDs were identified.)

CHR  POS      ALLELES   IDS
1    869598   T,TA      chr1:869598:T:TA chr1:869598:TA:T
1    1177060  C,CTG     chr1:1177060:C:CTG chr1:1177060:CTG:C
1    1293960  T,TCGGGG  chr1:1293960:T:TCGGGG chr1:1293960:TCGGGG:T
1    1693590  A,AC      chr1:1693590:A:AC chr1:1693590:AC:A
1    2299253  A,AG      chr1:2299253:A:AG chr1:2299253:AG:A
1    2423169  T,TTTTG   chr1:2423169:T:TTTTG chr1:2423169:TTTTG:T
1    2808716  G,GC      chr1:2808716:G:GC chr1:2808716:GC:G
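
To confirm that every flagged pair follows this swapped-allele pattern (and not some other kind of duplication), something like the following Python sketch could be run over the plink.dupvar rows. It assumes whitespace-separated rows and IDs in chr:pos:ref:alt form, as in the rows above; the function names are hypothetical helpers, not part of plink.

```python
def is_allele_swap(id_a, id_b):
    """Return True if two chr:pos:ref:alt IDs differ only by swapped alleles."""
    a = id_a.split(":")
    b = id_b.split(":")
    if len(a) != 4 or len(b) != 4:
        return False  # ID is not in the assumed chr:pos:ref:alt form
    same_site = a[:2] == b[:2]
    swapped = a[2] == b[3] and a[3] == b[2]
    return same_site and swapped


def classify_dupvar(lines):
    """Split dupvar rows into swapped-allele pairs and everything else."""
    swapped, other = [], []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "CHR":
            continue  # skip blank lines and the header row
        chrom, pos, ids = fields[0], int(fields[1]), fields[3:]
        if len(ids) == 2 and is_allele_swap(ids[0], ids[1]):
            swapped.append((chrom, pos, ids))
        else:
            other.append((chrom, pos, ids))
    return swapped, other


# Two rows copied from the output above, plus the header line.
sample = [
    "CHR POS ALLELES IDS",
    "1 869598 T,TA chr1:869598:T:TA chr1:869598:TA:T",
    "1 1177060 C,CTG chr1:1177060:C:CTG chr1:1177060:CTG:C",
]
swapped, other = classify_dupvar(sample)
print(len(swapped), len(other))
```

To process the real file, the lines could be read with open("plink.dupvar") and passed to classify_dupvar; any rows landing in the second list would be duplicates of some other kind and worth inspecting separately.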

Thank you!
