Hi all,I wrote this script to loop over two sets of fastq files I have (sample1_R1 and sample1_R2 sample2_R1 sample2_R2) I want cutadapt to do its thing on sample1 as paired end and sample2 as paired end.

Originally I had issues with this loop iterating over the trimmed output but I changed "${name}_trimmed.R2.fastq.gz" for -p, and -o, to "${name}_trimmed.fastq.gz" and this took care of that issue. My for loop can be found in the code below under COMMAND EXECUTED.

Now, the problem is this:

  1. This change I made to the output file names, writes the correct names for the output files in my folder. so I have sample1_R1_trimmed.fastq.gz and the same sample1_R2_trimmed.fastq.gz. Good so far, although there is something I noticed in the output summary file( when the script is executed I have job_ID.o and job_ID.e files, the summary report I am referring to is the Job_ID.o file ) , which if not redirected to a text file would spit out as "JobID.o" file.

Below is the first part of my Job_ID.o report:

Notice the command line parameters. It is taking in all R1 at a time and similarly taking all R2s at a time( not posted the R2 part). So basically the script is scmehowtelling Cutadapt to treat the files as single end, even though I have the -o and -p option which is issue #1.

Second issue is, despite this, the output files (sample1_R1_trimmed.fastq.gz and sample1_R2_trimmed.fastq.gz) that are written in my folder have the correct R1 and R2 names! Although I don't believe the the read trimmed is complteley good.

Why is my script treating this as SE? It could be I gave ${name}.fastq.gz twice and hence treating it as SE? I need to give it two files and I can't take delete one of these. So is there a way to indicate to use R1 and R2 at this point in code? I tried putting ${name}_R1.fastq.gz and that did not process any files at all! Am I losing my mind here? or not getting this Cutadapt thing right? After all this might be a Cutadapt thing and not a scripting thing?

So can I do to see if it is indeed Cutadapt problem or my code! you guessed it right! It is my code 🙂 If you scroll down below my code here, you can see I ran Cutadapt directly on command line without a for loop and compared the summary of trimming to the output from the cold below on both sample 1 and sample2.

 ---- COMMAND EXECUTED: ---------------------------------------------------------
    (   for i in *.fastq.gz;  do name=(`basename "$i" .fastq.gz`); cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o ${name}_trimmed.fastq.gz -p ${name}_trimmed.fastq.gz ${name}.fastq.gz ${name}.fastq.gz; done )
    --------------------------------------------------------------------------------
    This is cutadapt 2.9 with Python 3.7.7
    Command line parameters: -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o sample1_R1_trimmed.fastq.gz -p sample1_R1_trimmed.fastq.gz sample1_R1.fastq.gz sample1_R1.fastq.gz
    Processing reads on 1 core in paired-end mode ...
    Finished in 82.19 s (115 us/read; 0.52 M reads/minute).

Here is the Cutadapt directly on commadline for both sample1 and sample2 as PE, to see if the numbers made sense .

for sample1

total read pairs processed:            716,612
  Read 1 with adapter:                  **33,407** (4.7%)
  Read 2 with adapter:                  **33,450** (4.7%)

for sample 1: Below is summary for Cutadapt launched through my script ( command in the script given above).

Total read pairs processed:            716,612
  Read 1 with adapter:                  **33,407** (4.7%)
  Read 2 with adapter:                  29,236 (4.1%)

for sample 1: For R2:

Total read pairs processed:            716,612
  Read 1 with adapter:                  29,435 (4.1%)
  Read 2 with adapter:                  **33,450** (4.7%)

Similar pattern for sample2. (not attached here)
See? Too many funky things going on! I am not writing this script correctly and can't figure out what the problem is? I tried many modification to this code later to see if I can fix something. But honestly it is all a blind hunt! I want to understand my problem in my code here. Any help from anyone to correct this is greatly appreciated!

ADD REPLY • link • edit • moderate written 1 hour ago by geneart$$ • 10



Source link