CWL: GATK GenotypeGVCFs error on Linux due to quotes in filenames


Hi, I noticed that every time I run .cwl scripts that include GATK GenotypeGVCFs, the runner hits an error related to how the previous step (GATK GenomicsDBImport, also called through .cwl) names the directories it creates in the GenomicsDB workspace:

Invalid filename: '8$1$146364022' contains illegal characters

Looking inside the GenomicsDB directory (the GenomicsDBImport output) shows that each chromosome directory is listed with its name in quotes, which then apparently raises the error above:

[email protected]:/media/kong/enrico/MCD/cwl-run-DIR$ ls -thal MCD_n15/
total 136K
drwxrwsr-x  3 enrico lab 4.0K Dec  5 10:47  ..
drwx------  4 enrico lab 4.0K Dec  5 10:35 'X$1$155270560'
drwx------ 25 enrico lab 4.0K Dec  5 10:29  .
drwx------  4 enrico lab 4.0K Dec  5 10:29 '22$1$51304566'
drwx------  4 enrico lab 4.0K Dec  5 10:25 '21$1$48129895'
drwx------  4 enrico lab 4.0K Dec  5 10:22 '20$1$63025520'
drwx------  4 enrico lab 4.0K Dec  5 10:17 '19$1$59128983'
drwx------  4 enrico lab 4.0K Dec  5 10:10 '18$1$78077248'
drwx------  4 enrico lab 4.0K Dec  5 10:04 '17$1$81195210'
drwx------  4 enrico lab 4.0K Dec  5 09:56 '16$1$90354753'
drwx------  4 enrico lab 4.0K Dec  5 09:49 '15$1$102531392'
drwx------  4 enrico lab 4.0K Dec  5 09:42 '14$1$107349540'
drwx------  4 enrico lab 4.0K Dec  5 09:34 '13$1$115169878'
drwx------  4 enrico lab 4.0K Dec  5 09:28 '12$1$133851895'
drwx------  4 enrico lab 4.0K Dec  5 09:17 '11$1$135006516'
drwx------  4 enrico lab 4.0K Dec  5 09:06 '10$1$135534747'
drwx------  4 enrico lab 4.0K Dec  5 08:55 '9$1$141213431'
drwx------  4 enrico lab 4.0K Dec  5 08:45 '8$1$146364022'
drwx------  4 enrico lab 4.0K Dec  5 08:35 '7$1$159138663'
drwx------  4 enrico lab 4.0K Dec  5 08:22 '6$1$171115067'
drwx------  4 enrico lab 4.0K Dec  5 08:09 '5$1$180915260'
drwx------  4 enrico lab 4.0K Dec  5 07:56 '4$1$191154276'
drwx------  4 enrico lab 4.0K Dec  5 07:43 '3$1$198022430'
drwx------  4 enrico lab 4.0K Dec  5 07:28 '2$1$243199373'
drwx------  4 enrico lab 4.0K Dec  5 07:09 '1$1$249250621'
-rwx------  1 enrico lab 8.4K Dec  5 06:49  vidmap.json
-rwx------  1 enrico lab  18K Dec  5 06:49  vcfheader.vcf
-rwx------  1 enrico lab 1.4K Dec  5 06:49  callset.json
-rwx------  1 enrico lab    0 Dec  5 06:49  __tiledb_workspace.tdb
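
The quotes in the listing may or may not be part of the directory names: as far as I know, recent GNU `ls` versions wrap names containing shell-special characters such as `$` in single quotes when printing to a terminal. A minimal sketch in Python that prints the raw names and flags which ones would fail a strict allow-list is below. The regular expression is an assumption for illustration only, not the pattern the CWL runner actually applies, and the demo runs on a throwaway directory rather than the real workspace.

```python
import os
import re
import tempfile

# Assumed allow-list, for illustration only; the pattern the CWL runner
# really applies may differ.
ASSUMED_STRICT_NAME = re.compile(r"^[A-Za-z0-9._+-]+$")


def inspect_names(directory):
    """Return (name, has_quote, passes_assumed_check) for each entry."""
    rows = []
    for name in sorted(os.listdir(directory)):
        has_quote = "'" in name or '"' in name
        passes = bool(ASSUMED_STRICT_NAME.match(name))
        rows.append((name, has_quote, passes))
    return rows


if __name__ == "__main__":
    # Demo on a temporary directory that mimics the workspace layout.
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "8$1$146364022"))
        open(os.path.join(tmp, "callset.json"), "w").close()
        for row in inspect_names(tmp):
            print(row)
```

Pointing `inspect_names` at the real workspace (here `MCD_n15/`) would show whether any name truly contains a quote character, or whether only the `$` separators fall outside the assumed allow-list.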

This happens every single time I produce a GenomicsDBImport output on my Ubuntu 18.04.5 Linux machine.

Has anybody worked around this? I know I can call GATK outside .cwl, but for pipeline purposes I'd like to be able to pass this DB through .cwl too.

Thank you very much in advance for any help! Below is my CWL script:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
label: gatk GenomicsDBImport on GATK docker images

hints:
  DockerRequirement:
    dockerPull: broadinstitute/gatk:latest
  ResourceRequirement:
    coresMin: $(inputs.GenomicsDBImport_coresMin)
    ramMin: $(inputs.GenomicsDBImport_ramMin)

requirements:
  InlineJavascriptRequirement: {}

baseCommand: gatk
arguments: [ "GenomicsDBImport" ]

inputs:
  - id: interval_list
    type: File
    inputBinding:
      position: 1
      prefix: '-L'
  - id: cohort_name
    type: string
    inputBinding:
      position: 2
      prefix: '--genomicsdb-workspace-path'
  - id: gvcf_files
    type:
      - type: array
        items: File
        inputBinding:
          position: 0
          prefix: '-V'
          separate: true
    secondaryFiles:
      - .tbi

outputs:
  GenomicsDBImport_directory:
    type: Directory
    outputBinding:
      glob: $(inputs.cohort_name)
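
For reference, the command line this tool description is expected to produce can be sketched in Python. This is only an approximation of how the bindings above are ordered (`-V` at position 0, `-L` at position 1, `--genomicsdb-workspace-path` at position 2), not the runner's actual implementation, and the file names in the usage note are hypothetical.

```python
def build_command(gvcf_files, interval_list, cohort_name):
    """Assemble the argument list in the order the input bindings suggest."""
    cmd = ["gatk", "GenomicsDBImport"]
    # Position 0: the binding sits on the array items, so each file
    # gets its own -V prefix.
    for gvcf in gvcf_files:
        cmd += ["-V", gvcf]
    # Position 1: the interval list.
    cmd += ["-L", interval_list]
    # Position 2: the workspace path, which is also the output glob.
    cmd += ["--genomicsdb-workspace-path", cohort_name]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_command(
        ["s1.g.vcf.gz", "s2.g.vcf.gz"], "intervals.list", "MCD_n15")))
```

The workspace directory is named after `cohort_name`, and the `$`-separated subdirectories inside it are created by GenomicsDBImport itself, so nothing in the tool description controls those names.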

