Dear developers,

I am measuring the SV-calling performance of vg in rice using nanopore data (80× coverage) and Illumina Platinum Genome data (50× coverage). Insertion recall with vg is very low on my data, although it is high on the sample test data provided with vg. I would be grateful if you could tell me why insertion recall is so low.

I used the rice reference file and PAV information (from sample1) to construct the graph for vg.

Is there any way to improve it?

Could you look over the commands below?

Versions used:

vg (variation graph tool): v1.25.0 "Apice"

toil-vg: 1.6.2a1

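# Build one graph per chromosome from the reference and the sample1 PAV VCF
for i in $(seq 1 12); do vg construct -r ref.fa -v Chr$i.vcf.gz -S -R Chr$i -C -p -f -a -t 48 > Chr$i.vg; done
# Make node IDs unique across the 12 chromosome graphs
vg ids -j $(for i in $(seq 1 12); do echo Chr$i.vg; done)
# Build the xg index over all chromosomes
vg index -t 48 -x all.xg $(for i in $(seq 1 12); do echo Chr$i.vg; done)
# Prune each graph, then build the GCSA index from the pruned graphs
for i in $(seq 1 12); do vg prune -r Chr$i.vg -t 48 > Chr$i.pruned.vg; done
vg index -g all.gcsa $(for i in $(seq 1 12); do echo Chr$i.pruned.vg; done)
# Map the paired-end reads to the graph
vg map -x all.xg -g all.gcsa -f sample1_1.fastq.gz -f sample1_2.fastq.gz > sample1.aln.gam
# Compute read support (MAPQ >= 20), then call variants
vg pack -x all.xg -g sample1.aln.gam -Q 20 -t 48 -o sample1.pack
vg call all.xg -k sample1.pack -s sample1 -t 48 > sample1.vcf
# Evaluate SV calls against the truth set (sample1.vcf is compressed to sample1.vcf.gz beforehand)
toil-vg vcfeval ./jobStore . --vcfeval_baseline truth.vcf.gz --call_vcf sample1.vcf.gz --sveval --vcfeval_sample sample1 --realTimeLogging --realTimeStderr --min_sv_len 50 --ins_max_gap 1000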

 
Result

Recall_INS=0.5085
Recall_DEL=0.9496
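
As a sanity check on these numbers, the insertions and deletions of at least 50 bp in the truth and called VCFs can be tallied directly. The sketch below is not part of the pipeline above: `count_svs` is a hypothetical helper name, and it assumes the VCFs carry explicit REF/ALT sequences (symbolic alleles such as `<INS>` are skipped), which may not hold for every truth set.

```shell
# Hypothetical helper: count insertions and deletions of at least MIN_LEN bp
# in an uncompressed VCF read from stdin (default MIN_LEN is 50).
# Assumption: alleles are explicit sequences; symbolic ALTs are ignored.
count_svs() {
    min_len="${1:-50}"
    awk -v min="$min_len" -F'\t' '
        /^#/ { next }                        # skip header lines
        {
            n = split($5, alts, ",")         # handle multi-allelic records
            for (k = 1; k <= n; k++) {
                if (alts[k] ~ /^</ || alts[k] == "*" || alts[k] == ".") continue
                d = length(alts[k]) - length($4)   # ALT length minus REF length
                if (d >= min) ins++
                else if (-d >= min) del++
            }
        }
        END { printf "INS=%d DEL=%d\n", ins, del }
    '
}

# Usage with the file names from the commands above:
#   zcat truth.vcf.gz | count_svs 50
#   count_svs 50 < sample1.vcf
```

If the called VCF contains far fewer long insertions than the truth set, the insertions are being lost before evaluation (at graph construction, mapping, or calling) rather than being mismatched by the evaluation step.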

Thanks for your help.
Best wishes,

modified 1 hour ago by _r_am31k, written 2 hours ago by 2752373700