Asked 2 hours ago by serpalma.v

We are analyzing WGS data from 60 samples (6 groups, 10 samples/group) sequenced on a HiSeq4000. The mean coverage per sample is 25x (the lowest sample is at 15x).

We have now realized that we need to sequence more samples to better estimate allele frequencies. Due to budget and technical constraints, we settled on sequencing 90 additional samples (6 groups, 15 samples/group) at a target coverage of 5x, this time on a NovaSeq platform.

Each group now has 25 samples (10 from the HiSeq4000 and 15 from the NovaSeq).

Our aim is to do population analysis using SNP allele frequencies after combining the HiSeq4000 (25x coverage) data with the NovaSeq (5x coverage) data.

My plan for the new batch (NovaSeq, 5x) is to run it through the steps of GATK's best practices up to HaplotypeCaller, then combine it with the original batch (HiSeq4000, 25x) using CombineGVCFs and do joint calling with GenotypeGVCFs.
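
As a rough sketch of the plan above, the three GATK4 steps can be written as command lines assembled in Python. The file names (`ref.fasta`, the `.bam` and `.g.vcf.gz` paths) and the helper function names are hypothetical placeholders, and the commands are only built here, not run.

```python
from typing import List


def haplotypecaller_cmd(ref: str, bam: str, out_gvcf: str) -> List[str]:
    """Per-sample calling in GVCF mode (one GVCF per sample)."""
    return ["gatk", "HaplotypeCaller",
            "-R", ref, "-I", bam, "-O", out_gvcf, "-ERC", "GVCF"]


def combine_gvcfs_cmd(ref: str, gvcfs: List[str], out_gvcf: str) -> List[str]:
    """Merge per-sample GVCFs from both batches into one cohort GVCF."""
    cmd = ["gatk", "CombineGVCFs", "-R", ref]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    cmd += ["-O", out_gvcf]
    return cmd


def genotype_gvcfs_cmd(ref: str, combined_gvcf: str, out_vcf: str) -> List[str]:
    """Joint genotyping across all samples in the combined GVCF."""
    return ["gatk", "GenotypeGVCFs",
            "-R", ref, "-V", combined_gvcf, "-O", out_vcf]


# Hypothetical inputs: one sample from each batch.
ref = "ref.fasta"
per_sample = [
    haplotypecaller_cmd(ref, "hiseq_s01.bam", "hiseq_s01.g.vcf.gz"),
    haplotypecaller_cmd(ref, "novaseq_s01.bam", "novaseq_s01.g.vcf.gz"),
]
combine = combine_gvcfs_cmd(
    ref, ["hiseq_s01.g.vcf.gz", "novaseq_s01.g.vcf.gz"], "cohort.g.vcf.gz")
genotype = genotype_gvcfs_cmd(ref, "cohort.g.vcf.gz", "cohort.vcf.gz")

for cmd in per_sample + [combine, genotype]:
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.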

I am working with mouse samples, so I will do VQSR afterwards.

Is there an issue with doing joint variant calling and VQSR using data from different technologies?

A similar thread is found here, but there the data was produced with the same technology. Nonetheless, it mentions that different coverage patterns could confuse model building during VQSR.
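
One check I am considering after joint calling, sketched here with made-up genotypes, is to compute the alternate allele frequency per platform within a group and look at sites where the two differ strongly. The genotype encoding (0/1/2 alternate allele counts per diploid sample, `None` for a missing call) and the function names are my own assumptions, and with only 10 and 15 samples per batch the per-site difference will be noisy.

```python
from typing import List, Optional


def alt_allele_freq(genotypes: List[Optional[int]]) -> Optional[float]:
    """Alternate allele frequency from diploid alt-allele counts (0, 1 or 2).

    Missing calls (None) are skipped; returns None if nothing was called.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def batch_af_difference(hiseq: List[Optional[int]],
                        novaseq: List[Optional[int]]) -> Optional[float]:
    """Absolute difference in alt allele frequency between the two batches."""
    af_hiseq = alt_allele_freq(hiseq)
    af_novaseq = alt_allele_freq(novaseq)
    if af_hiseq is None or af_novaseq is None:
        return None
    return abs(af_hiseq - af_novaseq)


# Made-up genotypes at one site for one group:
# 10 HiSeq4000 samples and 15 NovaSeq samples (two with missing calls).
hiseq_gt = [0, 1, 1, 2, 0, 0, 1, 0, 1, 0]
novaseq_gt = [0, 0, None, 0, 1, 0, 0, None, 0, 0, 0, 1, 0, 0, 0]

print(alt_allele_freq(hiseq_gt))
print(alt_allele_freq(novaseq_gt))
print(batch_af_difference(hiseq_gt, novaseq_gt))
```

A genome-wide excess of sites with large differences between platforms within the same group would point to a batch effect rather than biology.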

I know this question does not have a "do this, do that" answer. I would appreciate comments and suggestions.

DISCLAIMER: I posted this question on the GATK forum a while ago (~2 months), but they haven't had time to address my concerns.
