How to cat two paired-end read files into one file, using a loop over multiple libraries?


Hello, let's say I have a directory that contains multiple paired-end read files, and I want to create one file per library containing both files of the pair, something like this:

cat R1.fastq.gz R2.fastq.gz > R.fastq.gz

but since I have multiple libraries, doing this manually would take a lot of time. How can I automate this task for all my libraries using a for loop?

I tried the code below, but it stores all of the input files in each output file:

for i in *_1.fastq.gz
base=$(basename $i "_1.fastq.gz")
cat $R1 $R2 > ${base}.fastq.gz

Thanks for reading 🙂

PS: IMPORTANT: this way of concatenating reads is needed for the MASH program. For other programs (e.g. some assembly programs), the best way to store these files is in interleaved format.






for i in *_1.fastq.gz; do
  # strip the "_1.fastq.gz" suffix to get the library name
  base=$(basename "$i" _1.fastq.gz)
  cat "${base}_1.fastq.gz" "${base}_2.fastq.gz" > "${base}.fastq.gz"
done

Does that make sense to you?
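
If you want something a bit more defensive, here is a minimal sketch of the same loop wrapped in a function. The names concat_pairs, indir and outdir are made up for this example and are not from the thread. It assumes the files follow the *_1.fastq.gz / *_2.fastq.gz naming from the question, skips any library whose mate file is missing, and writes the results to a separate directory:

```bash
#!/usr/bin/env bash
# Sketch only: concatenate each *_1.fastq.gz with its *_2.fastq.gz mate.
# concat_pairs, indir and outdir are hypothetical names for illustration.
concat_pairs() {
  local indir="$1" outdir="$2" r1 r2 base
  mkdir -p "$outdir"
  for r1 in "$indir"/*_1.fastq.gz; do
    [ -e "$r1" ] || continue                 # the glob matched nothing
    base=$(basename "$r1" _1.fastq.gz)       # library name without the suffix
    r2="$indir/${base}_2.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "skipping $base: mate file $r2 not found" >&2
      continue
    fi
    # gzip files can be concatenated as-is; the result is still a valid gzip file
    cat "$r1" "$r2" > "$outdir/${base}.fastq.gz"
  done
}
```

You would call it as `concat_pairs . merged` to read from the current directory and write into `merged/`.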

You should not concatenate paired-end files end to end like this. Tools will not be able to understand these files, and you will likely end up with erroneous results.

If you want a single file per sample, you could interleave the reads instead. That can be done with the BBMap suite, using arguments of this form:

in1=sample_R1.fq.gz in2=sample_R2.fq.gz out=sample.fq.gz

Keep in mind that not all tools understand interleaved data files.
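
To show what interleaving means in practice, here is a rough sketch in plain shell and awk. This is not the BBMap route mentioned above, just an alternative for illustration. The function name interleave_pair is made up. It assumes standard 4-line FASTQ records, both files listing the reads in the same order, and file names without double quotes or backslashes:

```bash
#!/usr/bin/env bash
# Sketch only: interleave two gzipped FASTQ files record by record.
# interleave_pair is a hypothetical helper name, not part of any tool.
interleave_pair() {
  local r1="$1" r2="$2" out="$3"
  gzip -dc "$r1" | awk -v mate="$r2" '
    BEGIN { cmd = "gzip -dc \"" mate "\"" }   # command that streams the R2 file
    { print }                                 # pass each R1 line through
    NR % 4 == 0 {                             # an R1 record has just been completed
      for (i = 0; i < 4; i++)                 # emit the matching 4-line R2 record
        if ((cmd | getline line) > 0) print line
    }
  ' | gzip > "$out"
}
```

For example, `interleave_pair sample_R1.fq.gz sample_R2.fq.gz sample.fq.gz` would write read 1 of pair 1, then read 2 of pair 1, then read 1 of pair 2, and so on. It does not check that the two files contain the same number of reads, so a dedicated tool is the safer choice for real data.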
