Hi Biostars,
I'm trying to estimate the insert size of a paired-end Illumina library. I have limited information about the library preparation and sequencing protocol (2x125). To estimate the insert size, I mapped the reads to a genome using BBMap.
The reads were first processed with Trimmomatic (to remove adaptors and keep only reads of length 125).
java -jar trimmomatic-0.39.jar PE -threads 10 -phred33 R1_001.fastq.gz R2_001.fastq.gz R1.adaptor.cut.fastq.gz R1.unpaired.fastq.gzip R2.adaptor.cut.fastq.gz R2.unpaired.fastq.gzip ILLUMINACLIP:NexteraPE-PE.fa:2:30:10 MINLEN:125
bbmap.sh ref=cro_genome.fasta in1=R1.pair.fastq.gz in2=R2.pair.fastq.gz -ihist=ihist.txt reads=5000000
From the result file:
Mean 427.278
Median 295
Mode 234
STDev 536.271
The mean seems to correspond to the expected value. However, the standard deviation (536.271) seems too high to me, since it is larger than the mean itself.
Is this a typical standard deviation for a paired-end library?
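
Since the mean (427) is well above the median (295) and the mode (234), the distribution looks right-skewed, so a small number of very distant pairs might be driving the standard deviation. One way to check this would be to recompute the statistics from the histogram with and without the long tail. Below is a minimal Python sketch, not a definitive method. It assumes that ihist.txt has summary/header lines starting with `#` followed by whitespace-separated `insert_size count` rows (I have not verified this for every BBMap version); the function names and the `max_size` cutoff are hypothetical and only for illustration.

```python
import math


def read_ihist(path):
    """Read an insert-size histogram into {insert_size: count}.

    Assumption: header/summary lines start with '#', and data lines are
    whitespace-separated 'insert_size count' pairs with integer values.
    """
    hist = {}
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith("#"):
                continue
            size, count = line.split()[:2]
            hist[int(size)] = int(count)
    return hist


def hist_stats(hist, max_size=None):
    """Return (mean, median, stdev) of a {insert_size: count} histogram.

    If max_size is given, insert sizes above it are ignored, which shows
    how much a long tail contributes to the mean and standard deviation.
    """
    if max_size is not None:
        hist = {s: c for s, c in hist.items() if s <= max_size}
    total = sum(hist.values())
    if total == 0:
        raise ValueError("histogram is empty")

    mean = sum(s * c for s, c in hist.items()) / total
    # Population variance, weighted by the number of pairs per insert size.
    var = sum(c * (s - mean) ** 2 for s, c in hist.items()) / total

    # Median: first insert size at which the cumulative count reaches half.
    running = 0
    median = None
    for s in sorted(hist):
        running += hist[s]
        if running >= total / 2:
            median = s
            break
    return mean, median, math.sqrt(var)


# Toy data (made up, not from my library): one distant pair among 100 others.
toy = {200: 10, 300: 80, 400: 10, 10000: 1}
print("all pairs:      ", hist_stats(toy))
print("pairs <= 1000bp:", hist_stats(toy, max_size=1000))
```

On the real data this would be called as `hist_stats(read_ihist("ihist.txt"), max_size=1000)`, where 1000 is an arbitrary cutoff chosen only for the example. If the trimmed standard deviation drops sharply while the median barely moves, the high value is probably coming from a small fraction of outlier pairs rather than from the bulk of the library.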