Posted 2 hours ago by Rogerio Ribeiro

Hi biostars.

I'm trying to estimate the insert size of a paired-end Illumina library. I have limited information about the library preparation and sequencing protocol (2x125). To estimate the insert size, I mapped the reads to a genome using BBMap.
The reads were processed with Trimmomatic (to remove adapters and keep only reads of length 125).

java -jar trimmomatic-0.39.jar PE -threads 10 -phred33 R1_001.fastq.gz R2_001.fastq.gz R1.adaptor.cut.fastq.gz R1.unpaired.fastq.gzip R2.adaptor.cut.fastq.gz R2.unpaired.fastq.gzip ILLUMINACLIP:NexteraPE-PE.fa:2:30:10 MINLEN:125

bbmap.sh ref=cro_genome.fasta in1=R1.pair.fastq.gz in2=R2.pair.fastq.gz -ihist=ihist.txt reads=5000000

From the result file:

Mean 427.278
Median 295
Mode 234
STDev 536.271

The mean value seems to correspond to expected values. However, the standard deviation seems rather high to me.
Is this a typical standard deviation for a paired-end library?
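
To make the relationship between these four numbers easier to inspect, here is a minimal Python sketch that recomputes mean, median, mode and standard deviation from an insert-size histogram. It assumes the histogram file has header lines starting with '#' followed by whitespace-separated rows of insert size and pair count; that layout is an assumption and should be checked against the actual ihist.txt. The names parse_ihist, summarize and max_size are illustrative and not part of BBMap, the statistics definitions (population standard deviation, lower median) may differ slightly from what BBMap reports, and the toy histogram is made up for illustration, not taken from this library.

```python
import math
from typing import Dict, Iterable, Optional


def parse_ihist(lines: Iterable[str]) -> Dict[int, int]:
    """Read 'insert_size count' rows, skipping blank and '#' header lines.

    The file layout is an assumption; verify it against the real ihist.txt.
    """
    hist: Dict[int, int] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        size, count = int(fields[0]), int(fields[1])
        hist[size] = hist.get(size, 0) + count
    return hist


def summarize(hist: Dict[int, int], max_size: Optional[int] = None) -> Dict[str, float]:
    """Count-weighted statistics, optionally ignoring inserts above max_size."""
    bins = sorted(
        (size, count)
        for size, count in hist.items()
        if count > 0 and (max_size is None or size <= max_size)
    )
    total = sum(count for _, count in bins)
    if total == 0:
        raise ValueError("histogram has no pairs in the requested range")

    mean = sum(size * count for size, count in bins) / total
    # Population variance: every pair in the histogram is treated as observed.
    variance = sum(count * (size - mean) ** 2 for size, count in bins) / total

    # Median: smallest size at which the cumulative count reaches half the pairs.
    running = 0
    median = bins[-1][0]
    for size, count in bins:
        running += count
        if 2 * running >= total:
            median = size
            break

    # Mode: size with the highest count (smallest size wins on ties).
    mode = max(bins, key=lambda b: b[1])[0]

    return {
        "pairs": total,
        "mean": mean,
        "median": median,
        "mode": mode,
        "stdev": math.sqrt(variance),
    }


# Made-up toy histogram: a tight peak near 300 plus a few very long inserts.
TOY_IHIST = """#InsertSize\tCount
200\t10
250\t30
300\t40
350\t15
5000\t5
"""

hist = parse_ihist(TOY_IHIST.splitlines())
print("all pairs:     ", summarize(hist))
print("inserts <= 1000:", summarize(hist, max_size=1000))
```

On a real file this could be called as summarize(parse_ihist(open("ihist.txt"))). Comparing the full result with a capped one (the max_size cutoff here is arbitrary) shows how much of the standard deviation comes from a small number of very long apparent inserts, which is one situation where the mean sits well above the median and mode.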
