Posted 2 hours ago by kstafford32

Hello,

I imputed my data with TOPMed and received the output as vcf.gz files, one per chromosome (chromosomes 1 to 22). I would now like to run a polygenic risk score (PRS) analysis with PRSice-2.

How can I properly convert my vcf.gz files into a single concatenated PLINK binary fileset or .bgen file while retaining the SNP ID, p-value, and allele columns?

According to the PRSice tutorial and other forum posts, PRSice does not accept vcf.gz files, only PLINK binary (.bed) or .bgen files.

So I first tried to make a concatenated .bgen file like this:

vcf-concat *.vcf.gz | gzip -c > imputedtopmedresults.concat.ALLchrs.vcf.gz
ml qctool
qctool -g imputedtopmedresults.concat.ALLchrs.vcf.gz -vcf-genotype-field GP -og imputedtopmedresults.concat.ALLchrs.converted.bgen
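A note on the concatenation step: a glob such as `*.vcf.gz` expands in lexical rather than numeric order (for example chr10 sorts before chr2), so vcf-concat receives the chromosomes out of order. That may not matter for every downstream tool, but building the file list explicitly avoids the question. This is a minimal sketch, assuming the `chrN.dose.vcf.gz` naming that appears later in this post; `ordered_chr_files` is a hypothetical helper name.

```shell
# Print per-chromosome file names in numeric chromosome order (1..22).
# $1: optional printf pattern containing one %d for the chromosome number.
#     The default assumes the chrN.dose.vcf.gz naming of the TOPMed output.
ordered_chr_files() {
  local pattern=${1:-chr%d.dose.vcf.gz}
  local i
  for i in {1..22}; do
    printf "${pattern}\n" "$i"
  done
}
```

For example, `bcftools concat -Oz -o imputedtopmedresults.concat.ALLchrs.vcf.gz $(ordered_chr_files)` would concatenate the files in chromosome order; bcftools concat expects every input to have the same samples in the same order, which should hold for per-chromosome output from a single imputation job.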

I then passed the resulting imputedtopmedresults.concat.ALLchrs.converted.bgen file to PRSice as the base data:

Rscript PRSice.R \
    --prsice ./PRSice_linux \
    --base imputedtopmedresults.concat.ALLchrs.converted.bgen \
    --target MDD.QC.gz \
    --thread 1 \
    --stat BETA \
    --beta \
    --binary-target F

This returned the following errors:

Error: Column for the effective allele must be provided!
Error: Column for the SNP ID must be provided!
Error: Column for the P-value must be provided!
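For reference, all three errors name columns that PRSice-2 looks for in the `--base` file. As I understand the PRSice-2 documentation, `--base` is meant to be the GWAS summary statistics (a text table with SNP ID, effect allele, p-value and effect-size columns), while the genotypes in PLINK or BGEN format belong in `--target`; the command above passes them the other way round, which would explain why a different VCF conversion did not change the errors. One way to check whether a file can serve as a base file is to look for the required names in its header. `check_base_columns` is a hypothetical helper, and the column names are assumptions: SNP, A1 and P are what I believe PRSice-2 expects by default (overridable with `--snp`, `--A1`, `--pvalue`), and BETA matches `--stat BETA` above.

```shell
# Hypothetical helper: report which required column names are missing from the
# first (header) line of a plain or gzip-compressed text file.
# Returns 0 if all columns are present, 1 otherwise.
check_base_columns() {
  local file=$1
  shift
  local header col missing=0
  # gzip -dcf decompresses gzip input and passes plain text through unchanged
  header=$(gzip -dcf -- "$file" | head -n 1)
  for col in "$@"; do
    # split the header on spaces/tabs and look for an exact, literal match
    if ! tr -s ' \t' '\n\n' <<<"$header" | grep -qxF -- "$col"; then
      echo "missing column: $col"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_base_columns MDD.QC.gz SNP A1 P BETA` prints nothing and succeeds if the summary statistics have those headers, whereas a genotype-style header such as `#CHROM POS ID REF ALT` reports SNP, A1 and P as missing.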

It seemed clear that the SNP IDs, p-values, and alleles were not retained during the vcf.gz-to-.bgen conversion, so I tried a different route and converted my vcf.gz files to PLINK binary files instead:

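# Split multiallelic sites into biallelic records and write BCF
for i in {1..22}; do
  bcftools norm -Ob -m-any chr$i.dose.vcf.gz > chr$i.dose.bcf
done

# Index each BCF
for i in {1..22}; do
  bcftools index chr$i.dose.bcf
done

ml plink
# Convert each chromosome to a PLINK binary fileset (chr1_ped, chr2_ped, ...)
for i in {1..22}; do
  plink --bcf chr$i.dose.bcf --const-fid 0 --make-bed --out chr${i}_ped
done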
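On the `--out` prefix in the PLINK loop: in bash, `chr$i_ped` and `chr${i}_ped` expand differently. Because underscores are valid in variable names, `$i_ped` is read as a reference to a variable named `i_ped`, not as `$i` followed by `_ped`. With `i_ped` unset, `--out chr$i_ped` becomes `--out chr` on every iteration, so each chromosome would overwrite chr.bed/.bim/.fam and only chromosome 22 would remain. The braces delimit the name. A small demonstration (`demo_prefixes` is just an illustrative name):

```shell
# Show how bash expands an unbraced vs. braced variable followed by "_ped".
demo_prefixes() {
  local i
  for i in 1 2; do
    # "$i_ped" refers to the (unset) variable i_ped, so it expands to nothing
    printf 'unbraced: %s  braced: %s\n' "chr$i_ped" "chr${i}_ped"
  done
}
```

Running `demo_prefixes` prints `unbraced: chr  braced: chr1_ped` and then `unbraced: chr  braced: chr2_ped`.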

I then passed the PLINK binary files to PRSice in the same way, and got the same errors.
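For comparison, here is a sketch of a PRSice-2 call with the summary statistics as `--base` and the per-chromosome PLINK filesets as `--target`, assuming MDD.QC.gz is a QC'd GWAS summary statistics file. As I read the PRSice-2 documentation, a `#` in the `--target` name is replaced with chromosomes 1-22, so the 22 filesets would not need to be merged first. `build_prsice_cmd`, `PRSICE_CMD` and `pheno.txt` are hypothetical names, and the base column names are assumptions. Since PLINK sets the phenotype to missing (-9) when importing from VCF/BCF, a separate `--pheno` file is probably needed.

```shell
# Hypothetical helper: fill the global array PRSICE_CMD with a PRSice-2 call.
#   $1  base file: GWAS summary statistics (text table)
#   $2  target genotypes; PRSice-2 expands "#" to chromosomes 1-22
#   $3  optional phenotype file, passed via --pheno
build_prsice_cmd() {
  local base=$1 target=$2 pheno=${3-}
  # --snp/--A1/--pvalue values are assumed; match them to the base file header
  PRSICE_CMD=(
    Rscript PRSice.R
    --prsice ./PRSice_linux
    --base "$base"
    --target "$target"
    --snp SNP --A1 A1 --pvalue P
    --stat BETA --beta
    --binary-target F
    --thread 1
  )
  if [[ -n $pheno ]]; then
    PRSICE_CMD+=(--pheno "$pheno")
  fi
}
```

Building the command as an array keeps arguments such as `chr#_ped` intact: call `build_prsice_cmd MDD.QC.gz 'chr#_ped' pheno.txt`, preview with `printf '%q ' "${PRSICE_CMD[@]}"`, and run with `"${PRSICE_CMD[@]}"`.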

I went back to check the vcf.gz files, and the header line has these columns:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT

So my question remains: how can I convert the vcf.gz files into a concatenated PLINK binary fileset or .bgen file so that the SNP IDs, p-values, and alleles are kept?

Or perhaps TOPMed doesn't provide p-values and similar columns at all, and I am missing something here?
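On that last point: imputation output carries genotypes, dosages and imputation-quality annotations rather than association results, so p-values and effect sizes would normally come from GWAS summary statistics, not from the imputed VCFs. Listing the INFO and FORMAT fields declared in a VCF header is one way to confirm what a file actually contains. `list_vcf_fields` is a hypothetical helper that needs only gzip and awk; `gzip -dc` also reads bgzip-compressed files. The exact fields in TOPMed output are not assumed here.

```shell
# Hypothetical helper: print "INFO<TAB>ID" / "FORMAT<TAB>ID" for every
# INFO and FORMAT field declared in the header of a (b)gzipped VCF.
list_vcf_fields() {
  gzip -dc -- "$1" | awk '
    /^#CHROM/ { exit }                          # column header ends the meta lines
    /^##(INFO|FORMAT)=<ID=/ {
      kind = substr($0, 3, index($0, "=") - 3)  # "INFO" or "FORMAT"
      match($0, /ID=[^,>]+/)
      print kind "\t" substr($0, RSTART + 3, RLENGTH - 3)
    }'
}
```

For example, `list_vcf_fields chr1.dose.vcf.gz` lists the declared fields; none of them would be expected to hold association p-values.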

Thank you

Modified 1 hour ago by Sam
