Hi,

I am building an NGS pipeline from scratch. The FASTQ files were aligned to the hg19 reference with BWA-MEM, samtools was used for sorting and indexing, and Picard was used to mark duplicates and estimate library complexity.

At this point, I want to run GATK BaseRecalibrator. However, I get this error message:

A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.
reference contigs = [NC_000001.10, NT_113878.1, NT_167207.1, NC_000002.11, NC_000003.11, NC_000004.11, NT_113885.1, NT_113888.1, NC_000005.9, NC_000006.11, NC_000007.13, NT_113901.1, NC_000008.10, NT_113909.1, NT_113907.1, NC_000009.11, NT_113914.1, NT_113916.2, NT_113915.1, NT_113911.1, NC_000010.10, NC_000011.9, NT_113921.2, NC_000012.11, NC_000013.10, NC_000014.8, NC_000015.9, NC_000016.9, NC_000017.10, NT_113941.1, NT_113943.1, NT_113930.1, NT_113945.1, NC_000018.9, NT_113947.1, NC_000019.9, NT_113948.1, NT_113949.1, NC_000020.10, NC_000021.8, NT_113950.2, NC_000022.10, NC_000023.10, NC_000024.9, NT_113961.1, NT_113923.1, NT_167208.1, NT_167209.1, NT_167210.1, NT_167211.1, NT_167212.1, NT_113889.1, NT_167213.1, NT_167214.1, NT_167215.1, NT_167216.1, NT_167217.1, NT_167218.1, NT_167219.1, NT_167220.1, NT_167221.1, NT_167222.1, NT_167223.1, NT_167224.1, NT_167225.1, NT_167226.1, NT_167227.1, NT_167228.1, NT_167229.1, NT_167230.1, NT_167231.1, NT_167232.1, NT_167233.1, NT_167234.1, NT_167235.1, NT_167236.1, NT_167237.1, NT_167238.1, NT_167239.1, NT_167240.1, NT_167241.1, NT_167242.1, NT_167243.1, NW_004070864.2, NW_003571030.1, NW_003871056.3, NW_003871055.3, NW_003315905.1, NW_003315906.1, NW_003315907.1, NW_004070863.1, NW_003871057.1, NW_004070865.1, NW_003315903.1, NW_003315904.1, NW_003315908.1, NW_004504299.1, NW_003571032.1, NW_003571033.2, NW_003315909.1, NW_003571031.1, NW_003871060.1, NW_003871059.1, NW_003315910.1, NW_004775426.1, NW_003315911.1, NW_003871058.1, NW_003315912.1, NW_003315913.1, NW_004775427.1, NW_003315915.1, NW_003315916.1, NW_003571035.1, NW_003315914.1, NW_003571034.1, NW_003315920.1, NW_003571036.1, NW_003315917.2, NW_003315918.1, NW_003871061.1, NW_004775428.1, NW_003315919.1, NW_004070866.1, NW_003871063.1, NW_003315921.1, NW_004504300.1, NW_003871062.1, NW_004775429.1, NW_004166862.1, NW_003571039.1, NW_003571038.1, NW_004775430.1, NW_003871064.1, NW_003571041.1, NW_003571037.1, NW_003871065.1, NW_003315922.2, NW_003571040.1, 
NW_003571042.1, NW_004775431.1, NW_003871066.2, NW_003315923.1, NW_003315924.1, NW_003315928.1, NW_003871067.1, NW_003315929.1, NW_003315930.1, NW_003315931.1, NW_004504301.1, NW_004070869.1, NW_003315925.1, NW_004070867.1, NW_004070868.1, NW_003315926.1, NW_003315927.1, NW_003571043.1, NW_003871071.1, NW_003315932.1, NW_003315934.1, NW_003315935.1, NW_003871068.1, NW_004504302.1, NW_003871070.1, NW_004775432.1, NW_003871069.1, NW_003315933.1, NW_004070870.1, NW_003871075.1, NW_003871082.1, NW_003315936.1, NW_003571045.1, NW_003871073.1, NW_003871074.1, NW_003571046.1, NW_004070871.1, NW_003871081.1, NW_003871079.1, NW_003871077.1, NW_003871080.1, NW_003871078.1, NW_003871072.2, NW_003871076.1, NW_003571048.1, NW_003571049.1, NW_003871083.2, NW_003571047.1, NW_003571050.1, NW_003315938.1, NW_003315939.1, NW_003315941.1, NW_003315942.2, NW_004504303.2, NW_003315940.1, NW_003315937.1, NW_003571051.1, NW_004166863.1, NW_003315943.1, NW_003315944.1, NW_003871084.1, NW_003315945.1, NW_003871085.1, NW_003315946.1, NW_004070872.2, NW_003315952.2, NW_003315951.1, NW_003315950.2, NW_004775433.1, NW_003871090.1, NW_004166864.2, NW_003315949.1, NW_003315948.2, NW_003871091.1, NW_003871093.1, NW_003871092.1, NW_003315953.1, NW_003571052.1, NW_003871086.1, NW_003315947.1, NW_003871088.1, NW_003315954.1, NW_003315955.1, NW_003871089.1, NW_003871087.1, NW_003315956.1, NW_003315959.1, NW_003315960.1, NW_003315957.1, NW_003315958.1, NW_003315961.1, NW_003871094.1, NW_003571053.2, NW_003315962.1, NW_003315964.2, NW_003315965.1, NW_003315963.1, NW_004775434.1, NW_004166865.1, NW_003571054.1, NW_003571055.1, NW_003571056.1, NW_003571057.1, NW_003571058.1, NW_003571059.1, NW_003571060.1, NW_003571061.1, NW_003315966.1, NW_003871095.1, NW_004504304.1, NW_003571063.2, NW_003315967.1, NW_003315968.1, NW_003315969.1, NW_003315970.1, NW_004775435.1, NW_004070874.1, NW_004070873.1, NW_004070875.1, NW_003871096.1, NW_003315972.1, NW_003315971.2, NW_004504305.1, NW_004070876.1, NW_003571064.2, 
NW_003871098.1, NW_003871099.1, NW_004070879.1, NW_004166866.1, NW_004070880.2, NW_004070877.1, NW_004070881.1, NW_004070882.1, NW_003871100.1, NW_003871101.3, NW_004070883.1, NW_004070884.1, NW_004070885.1, NW_003871102.1, NW_004070878.1, NW_004070891.1, NW_004070892.1, NW_004070893.1, NW_004070886.1, NW_004070887.1, NW_004070888.1, NW_004070889.1, NW_004070890.2, NW_003871103.3, NT_167244.1, NT_113891.2, NT_167245.1, NT_167246.1, NT_167247.1, NT_167248.1, NT_167249.1, NT_167250.1, NT_167251.1, NC_012920.1]
  features contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y]

After running the GATK command for the first time, I saw that it needed an additional sequence dictionary file, reference.dict. To create it, I ran gatk CreateSequenceDictionary -R reference.fasta (as recommended on this page: gatk.broadinstitute.org/hc/en-us/articles/360035531652-FASTA-Reference-genome-format) on the same reference file that was used for all previous analysis steps.
Before that, the reference file had only been processed with bwa index reference.fasta. I used the same reference.fasta file for the entire pipeline.

The reference files look fine to me; I assume the error arises due to the chromosome labels (features contigs) in the gnomAD.vcf file used as --known-sites in the command:

gatk BaseRecalibrator -I sample.sorted.bam -R reference.fasta --known-sites gnomad.genomes.r2.1.1.sites.vcf --known-sites gnomad.exomes.r2.1.1.sites.vcf -O recal_data.table
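
To check that assumption, the contig names on both sides can be listed and compared directly. The following is only a rough sketch: the function names are made up for illustration, and it assumes that reference.dict is a tab-separated text file with @SQ lines and that the gnomAD VCF is uncompressed and declares its contigs in ##contig header lines.

```shell
#!/bin/sh
# Hypothetical helpers (names are my own) for comparing contig labels.

# Print the contig names (SN: field of each @SQ line) from a sequence dictionary.
dict_contigs() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
    }' "$1"
}

# Print the contig IDs declared in the header of an uncompressed VCF.
# Stops at the first data record so a large file is not read in full.
vcf_contigs() {
    awk '/^##contig=<ID=/ {
             sub(/^##contig=<ID=/, ""); sub(/[,>].*/, ""); print; next
         }
         !/^#/ { exit }' "$1"
}

# Print the names present in both files; no output means no overlap.
shared_contigs() {
    tmp=$(mktemp)
    dict_contigs "$1" > "$tmp"
    vcf_contigs "$2" | grep -Fx -f "$tmp"
    rm -f "$tmp"
}
```

For my files this would be called as shared_contigs reference.dict gnomad.genomes.r2.1.1.sites.vcf. An empty result would match the "No overlapping contigs found" message above.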

Am I supposed to edit these input files so that the contig labels match? Would you recommend using other population VCF files? Any other ideas on how to fix this issue?
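
If editing the VCFs is the way to go, this is roughly what I had in mind, as an untested sketch. It writes a two-column map from the gnomAD labels to the accession names listed in the error message above (I am assuming NC_000023 is X and NC_000024 is Y), then uses bcftools annotate --rename-chrs to rewrite the VCF. The helper name rename_vcf, the map file name and the output file name are my own choices, and I have not verified that GATK accepts the result.

```shell
#!/bin/sh
# Map file for bcftools: one "old_name new_name" pair per line.
# Accession names are copied from the "reference contigs" list in the error.
cat > rename_chrs.txt <<'EOF'
1 NC_000001.10
2 NC_000002.11
3 NC_000003.11
4 NC_000004.11
5 NC_000005.9
6 NC_000006.11
7 NC_000007.13
8 NC_000008.10
9 NC_000009.11
10 NC_000010.10
11 NC_000011.9
12 NC_000012.11
13 NC_000013.10
14 NC_000014.8
15 NC_000015.9
16 NC_000016.9
17 NC_000017.10
18 NC_000018.9
19 NC_000019.9
20 NC_000020.10
21 NC_000021.8
22 NC_000022.10
X NC_000023.10
Y NC_000024.9
EOF

# Hypothetical helper: rename contigs in VCF $1, write bgzipped output to $2,
# then build a tabix index for it. Requires bcftools on the PATH.
rename_vcf() {
    bcftools annotate --rename-chrs rename_chrs.txt -O z -o "$2" "$1" &&
    bcftools index -t "$2"
}
```

It would be called as, for example, rename_vcf gnomad.genomes.r2.1.1.sites.vcf gnomad.genomes.r2.1.1.sites.renamed.vcf.gz, and the renamed file would then be passed to --known-sites.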

Any help would be appreciated.


