2 hours ago by lima80in

Hi, I'm learning variant calling and working through an exercise on E. coli.
I'm an amateur and need some help.
I started with 6 FASTQ files for E. coli from a study in ENA.
I also downloaded the genome file (FASTA format) for that particular strain.
I did the following, in this order:
fastq runs,
qc on trimmed data,
BWA index on the fasta file,
samtools faidx,
samtools dict,
align trimmed data to the reference using bwa mem (I get a SAM file),
convert to BAM, sort, and index using samtools,
GATK MarkDuplicates to mark duplicates,
GATK AddOrReplaceReadGroups to add or replace read groups.
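
For reference, the steps from indexing onward could be sketched roughly as below. This is only a sketch under assumptions: the file names, the sample name, and the read group values (`lib1`, `ILLUMINA`, `unit1`) are placeholders rather than the real ones, and the helper `build_commands` is hypothetical. It only builds the command lines and does not run anything.

```python
import shlex


def build_commands(ref, reads, sample):
    """Return the pipeline steps as argument lists (nothing is executed).

    ref, reads and sample are placeholders; read group values are assumed.
    """
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    rg_bam = f"{sample}.rg.bam"
    ref_dict = ref.rsplit(".", 1)[0] + ".dict"
    return [
        # Index the reference for bwa, samtools and GATK
        ["bwa", "index", ref],
        ["samtools", "faidx", ref],
        ["samtools", "dict", "-o", ref_dict, ref],
        # Align the trimmed reads, writing a SAM file
        ["bwa", "mem", "-o", sam, ref, *reads],
        # Convert to BAM while sorting, then index
        ["samtools", "sort", "-o", sorted_bam, sam],
        ["samtools", "index", sorted_bam],
        # Mark duplicates, then add read groups (same order as in the post)
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam,
         "-M", f"{sample}.dup_metrics.txt"],
        ["gatk", "AddOrReplaceReadGroups", "-I", dedup_bam, "-O", rg_bam,
         "--RGID", sample, "--RGLB", "lib1", "--RGPL", "ILLUMINA",
         "--RGPU", "unit1", "--RGSM", sample],
    ]


# Print the commands for one hypothetical sample
for cmd in build_commands("rel606.fasta", ["s1_trimmed.fastq"], "s1"):
    print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` to actually execute the step, assuming bwa, samtools and gatk are on the PATH.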

Next I'm supposed to use GATK BaseRecalibrator, for which I need a known-sites reference of polymorphisms in E. coli.
This is a VCF file.
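
To show where that file is needed, here is a minimal sketch of the BaseRecalibrator call. All file names are placeholders, and `known_sites.vcf` in particular is hypothetical, since it is exactly the file that is missing. The helper `base_recalibrator_command` is made up for illustration and only builds the command line.

```python
import shlex


def base_recalibrator_command(ref, bam, known_sites_vcf, out_table):
    """Build the GATK BaseRecalibrator command line (not executed)."""
    return [
        "gatk", "BaseRecalibrator",
        "-R", ref,                         # reference FASTA
        "-I", bam,                         # BAM with duplicates marked and read groups set
        "--known-sites", known_sites_vcf,  # VCF of known polymorphic sites
        "-O", out_table,                   # output recalibration table
    ]


# Hypothetical file names, for illustration only
print(shlex.join(base_recalibrator_command(
    "rel606.fasta", "s1.rg.bam", "known_sites.vcf", "s1.recal.table")))
```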

How am I supposed to get this file, or otherwise get through this step?

The E. coli strain I'm looking at is E. coli REL606.
