Asked 2 hours ago by nitinra

Hello all,

I am trying to calculate nucleotide diversity (pi) for 192 samples and have used both vcftools and pixy to do so. However, the two tools give very different results. Is there a way to evaluate which one gives the accurate estimate of nucleotide diversity?

Here is the pipeline I used:

vcftools --vcf input.vcf --max-missing 0.1 --minQ 30 --maf 0.1 --remove lowdepthindividuals --recode --recode-INFO-all --out output_filtered
bcftools +prune -l 0.2 -w 50kb output_filtered.recode.vcf -Ov -o output_filtered_ldpruned.vcf

Pi calculations:

vcftools --vcf output_filtered_ldpruned.vcf --window-pi 10000 --out pi


pixy --stats pi --vcf output_filtered_ldpruned.vcf --zarr_path ./zarr \
    --window_size 10000 --populations allpop.list --bypass_filtration yes \
    --bypass-invariant-sites yes --outfile_prefix results/combined

VCFtools gives pi estimates between 0 and 0.020, whereas pixy gives estimates between 0.1 and 0.4. What could be causing the discrepancy between the two methods?
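
To make the question concrete, here is a minimal Python sketch of one possible source of such a gap: the denominator used when averaging pi over a window. It assumes (not confirmed in this post) that one tool divides the summed per-site pi by the window length in base pairs, treating every site absent from the VCF as invariant, while the other, with its invariant-site check bypassed, averages over only the sites present in a variants-only VCF. The function names and the toy data are hypothetical and are not taken from either tool.

```python
from itertools import combinations


def site_differences(alleles):
    """Return (pairwise differences, pairwise comparisons) for one site.

    `alleles` is a list of allele codes, one per sampled chromosome,
    with missing calls already removed.
    """
    pairs = list(combinations(alleles, 2))
    diffs = sum(a != b for a, b in pairs)
    return diffs, len(pairs)


def window_pi_per_bp(sites, window_size):
    """Sum per-site pi and divide by the window length in bp.

    Assumption: every position not listed in `sites` is invariant.
    """
    total = 0.0
    for alleles in sites:
        diffs, comps = site_differences(alleles)
        total += diffs / comps
    return total / window_size


def window_pi_listed_sites_only(sites):
    """Total differences over total comparisons, using only listed sites.

    Assumption: nothing is known about positions not listed in `sites`.
    """
    diffs = comps = 0
    for alleles in sites:
        d, c = site_differences(alleles)
        diffs += d
        comps += c
    return diffs / comps


# Hypothetical 100 bp window, 10 chromosomes, two variant sites in the VCF.
sites = [
    [0] * 9 + [1] * 1,  # minor allele frequency 0.1
    [0] * 5 + [1] * 5,  # minor allele frequency 0.5
]

print(round(window_pi_per_bp(sites, 100), 4))
print(round(window_pi_listed_sites_only(sites), 4))
```

On this toy window the per-bp average is about 0.0076 and the listed-sites-only average is about 0.378, so the same genotypes can give values that differ by roughly the ratio of window length to number of variant sites. Note also that after a `--maf 0.1` filter every remaining biallelic site has a per-site pi of at least about 2 × 0.1 × 0.9 = 0.18, which would be in line with a 0.1 - 0.4 range if only variant sites enter the average.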

