written 3 hours ago by abyousaf

I am trying to merge VCF files across chromosomes 1-22 using bcftools v1.9. The command I am running is:

    bcftools merge myfile1.vcf.gz myfile2.vcf.gz ... myfile22.vcf.gz -o myfile1_22.vcf.gz

However I get the following error: "Error: Duplicate sample names (1310229_1310229), use --force-samples to proceed anyway."

I'm reluctant to use --force-samples because I don't understand how it will affect the merged VCF file, and I don't know how many duplicates there are. The data is from the UK Biobank and the VCF files are massive (1.3 TB in total across chromosomes).

Any suggestions to actually solve the error rather than use --force-samples?

NOTE: I am VERY new to biostatistical analysis, so I would greatly appreciate advice that is structured for a beginner.
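
For anyone wanting to see how many sample names are actually shared before deciding what to do, here is a minimal sketch. It assumes the files are gzip/bgzip-compressed VCFs with a standard #CHROM header line. The function names list_samples and shared_samples are made up for illustration. `bcftools query -l FILE` prints the same sample list; plain gzip tools are used here only so the sketch runs without bcftools.

```shell
# Hypothetical helper: print the sample names of one VCF, one per line.
# Only the header is read; grep stops at the first #CHROM line, so the
# bulk of a very large file is never decompressed.
list_samples() {
    gzip -dc "$1" | grep -m 1 '^#CHROM' | cut -f 10- | tr '\t' '\n'
}

# Hypothetical helper: print every sample name that occurs more than once
# across all of the files given as arguments.
shared_samples() {
    for f in "$@"; do
        list_samples "$f"
    done | sort | uniq -d
}
```

Running something like `shared_samples myfile1.vcf.gz myfile2.vcf.gz | wc -l` would give the number of names that appear in both files. If every sample turns up in every file, the files are probably slices of the same cohort split by chromosome. In that case `bcftools concat`, which expects the same sample columns in every input, may be a better fit than `bcftools merge`, which is meant for combining different samples.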

modified 2 hours ago by Medhat