I have targeted panel sequencing data from C57 mice and want to call somatic variants using GATK. Because of cost, only 4 tumour samples have a matched normal; the remaining 16 tumour samples have no matching normal.
Q1) Should I use the latest assembly, GRCm39, as the reference for bwa, or GRCm38? My concern is that files needed in later steps might not be available for GRCm39; e.g. dbSNP is available for GRCm38, but I cannot find any for GRCm39.
Q2) Should I process the two groups of samples differently: tumour with matched normal mode for the 4 T-N matched samples, and tumour-only mode for the remaining 16 samples?
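To make Q2 concrete, this is roughly how I picture the two modes. It is only a minimal sketch: the file names, sample names and the helper `mutect2_cmd` are placeholders I made up, and the script just prints the `gatk Mutect2` command lines instead of running them. My understanding is that in matched mode the normal BAM is added with a second `-I` and its sample name is given with `-normal`, while in tumour-only mode both are simply left out.

```python
import shlex


def mutect2_cmd(reference, tumour_bam, out_vcf,
                normal_bam=None, normal_sample=None,
                germline_resource=None):
    """Build (but do not run) a gatk Mutect2 command as a list of arguments.

    All paths and sample names passed in are hypothetical placeholders.
    """
    cmd = ["gatk", "Mutect2", "-R", reference, "-I", tumour_bam]
    if normal_bam is not None:
        # Matched mode: the normal BAM is a second input, and -normal
        # takes the normal's sample name (the SM tag in the BAM header).
        if normal_sample is None:
            raise ValueError("normal_sample is required with normal_bam")
        cmd += ["-I", normal_bam, "-normal", normal_sample]
    if germline_resource is not None:
        # Optional: only added once a suitable mouse resource is found.
        cmd += ["--germline-resource", germline_resource]
    cmd += ["-O", out_vcf]
    return cmd


# One of the 4 T-N pairs (placeholder names).
matched = mutect2_cmd("ref.fa", "T01.bam", "T01.vcf.gz",
                      normal_bam="N01.bam", normal_sample="N01")
# One of the 16 tumours without a normal (placeholder names).
tumour_only = mutect2_cmd("ref.fa", "T05.bam", "T05.vcf.gz")

print(shlex.join(matched))
print(shlex.join(tumour_only))
```

The only difference between the two calls is whether a normal BAM and sample name are supplied, which is really what Q2 is asking about.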
Q3) I have trouble finding these two files:
--known-sites sites_of_variation.vcf for BaseRecalibrator
--germline-resource af-only-gnomad.vcf.gz for Mutect2
I found 2 links for --known-sites sites_of_variation.vcf.
Do I need to prepare the files as per genomics/gatk-mouse-mm10.md at master · igordot/genomics · GitHub? It is taking hours to download one file and the NCBI connection keeps dropping...
I have also found the following VCF files. Are the two below suitable to use as --known-sites sites_of_variation.vcf? What's the difference, and which one should be used?
Sanger REL-1505 mouse strain specific vcf:
Should I be using both C57...indels.vcf as input for --known-sites sites_of_variation.vcf?
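If using both files is the right approach, I assume each one simply gets its own `--known-sites` argument, since as far as I know BaseRecalibrator accepts the option more than once. A sketch of the command I would build; the VCF names below are placeholders rather than the real Sanger file names, `bqsr_cmd` is a made-up helper, and nothing is actually executed.

```python
import shlex


def bqsr_cmd(reference, bam, known_sites, out_table):
    """Build (but do not run) a gatk BaseRecalibrator command.

    known_sites is a list of VCF paths; each gets its own --known-sites flag.
    """
    if not known_sites:
        raise ValueError("at least one known-sites VCF is needed")
    cmd = ["gatk", "BaseRecalibrator", "-R", reference, "-I", bam]
    for vcf in known_sites:
        cmd += ["--known-sites", vcf]
    cmd += ["-O", out_table]
    return cmd


# Placeholder names standing in for the strain-specific SNP and indel VCFs.
cmd = bqsr_cmd("ref.fa", "T01.bam",
               ["strain_snps.vcf.gz", "strain_indels.vcf.gz"],
               "T01.recal.table")
print(shlex.join(cmd))
```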
Lastly, I cannot find anything on the --germline-resource af-only-gnomad.vcf.gz required by Mutect2. Could you help please?
Sorry for the million questions and thank you in advance!