Changing sample name in multiple VCF files


Hi everyone,

This is more of a scripting question but hopefully someone can help.

I have used GATK's HaplotypeCaller to call SNPs for 150 samples, and Picard's GatherVcfs to merge each sample into a single GVCF file. I now want to import the 150 merged GVCFs into GenomicsDB to perform joint genotyping.

Somewhere along the way every sample has been renamed 'Sample1', so GenomicsDBImport is throwing a duplicate-sample error.

Does anyone know an efficient way of replacing the sample names in all 150 files? I thought about doing a nested loop in bash, something like:

for F in $(cat $fileList)  
do
    for G in $(cat $newNames)  
    do
        bcftools reheader ${F}.g.vcf.gz -s $G 
    done
done

But

a) I'm not sure if I have that loop set up correctly

b) I'm getting more confused by bcftools reheader requiring a file rather than a string as input to -s. Would I need to create 150 files, each containing a single name, and then provide a list of those files within newNames?
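For reference, here is a rough sketch of what a paired (non-nested) loop might look like. A nested loop would apply all 150 names to every file, whereas this reads line N of the file list together with line N of the name list. It assumes both lists are in the same order, that each entry in fileList is a basename without the .g.vcf.gz suffix, and that a single temporary file can be reused to hand the name to -s. The function name rename_samples and the .renamed.g.vcf.gz output suffix are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: rename the single sample in each GVCF, pairing line N of the
# file list with line N of the name list (one pass, no nested loop).
# Assumptions: one entry per line, lists in matching order, and file-list
# entries are basenames without the .g.vcf.gz suffix.

rename_samples() {
    local file_list=$1
    local new_names=$2
    local tmp
    tmp=$(mktemp) || return 1

    # paste joins the two lists line by line, separated by a tab
    paste "$file_list" "$new_names" |
    while IFS="$(printf '\t')" read -r prefix name; do
        # bcftools reheader -s expects a file of names, so write the
        # single new name into the reusable temporary file
        printf '%s\n' "$name" > "$tmp"
        bcftools reheader -s "$tmp" \
            -o "${prefix}.renamed.g.vcf.gz" "${prefix}.g.vcf.gz"
        # the reheadered file is new, so build a fresh tabix index for it
        bcftools index -t "${prefix}.renamed.g.vcf.gz"
    done

    rm -f "$tmp"
}

# Usage (hypothetical): rename_samples "$fileList" "$newNames"
```

Writing to a new output file rather than overwriting the input means the original GVCFs stay untouched until the renamed ones have been checked, for example with bcftools query -l.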

Any help would be much appreciated!


gatk


bcftools


linux


vcf
