Hi Everyone,

I have some fastq data downloaded from the Illumina sequence hub. In these data, the sequences from each sample were split into four .gz files (I don't know why Illumina does that). All the files together are about 18 GB. I first tried to zcat each set of four files into one, but the file size inflated significantly, from 18 GB to 78 GB:

# strip the 21-character suffix "_L001_R1_001.fastq.gz" to get each sample prefix
for i in $(ls *.fastq.gz | rev | cut -c 22- | rev | uniq);
do zcat ${i}_L001_R1_001.fastq.gz ${i}_L002_R1_001.fastq.gz ${i}_L003_R1_001.fastq.gz ${i}_L004_R1_001.fastq.gz > ./zcat_fastq/${i}.fastq.gz ;
done
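For context on what zcat actually produces, here is a minimal sketch (with hypothetical filenames) of the relevant gzip behavior: a .gz file may contain several concatenated compressed members, so .gz files can be joined with plain cat and the result is still a valid, compressed .gz file, whereas zcat writes *decompressed* data to its output regardless of the output filename.

```shell
# Make two tiny gzip files standing in for per-lane fastq.gz parts
# (hypothetical names, just for illustration).
printf 'part1\n' | gzip > part1.fastq.gz
printf 'part2\n' | gzip > part2.fastq.gz

# Plain cat keeps the data compressed: gzip accepts multiple
# compressed members in one file.
cat part1.fastq.gz part2.fastq.gz > combined.fastq.gz

# Decompressing the combined file yields both parts in order.
zcat combined.fastq.gz
```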

I then did it the slow way: gunzip all the files, cat them together, and gzip the result. Now the combined file size is about 15 GB:

gunzip *.gz

# strip the 18-character suffix "_L001_R1_001.fastq" to get each sample prefix
for i in $(ls *.fastq | rev | cut -c 19- | rev | uniq);
do cat ${i}_L001_R1_001.fastq ${i}_L002_R1_001.fastq ${i}_L003_R1_001.fastq ${i}_L004_R1_001.fastq > ./cat_fastq/${i}.fastq ;
done

gzip *.fastq
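For comparison, a streaming variant of this decompress-then-recompress approach is also possible; it avoids ever writing the full-size uncompressed files to disk. This is a sketch with hypothetical filenames, not the exact commands from the post:

```shell
# Make two tiny stand-ins for per-lane fastq.gz files
# (hypothetical names, one 4-line fastq record each).
printf '@r1\nACGT\n+\nIIII\n' | gzip > S1_L001_R1_001.fastq.gz
printf '@r2\nTGCA\n+\nIIII\n' | gzip > S1_L002_R1_001.fastq.gz

# Decompress both lanes as a stream and re-compress in one pipeline:
# no intermediate uncompressed .fastq files are created on disk.
zcat S1_L001_R1_001.fastq.gz S1_L002_R1_001.fastq.gz | gzip > S1.fastq.gz

# Both records survive the round trip: 2 reads x 4 lines = 8 lines.
zcat S1.fastq.gz | wc -l
```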

Can anyone please advise what is going on and which method is correct? Thank you!

Best,

Wenhan
