Posted 3 hours ago by domelevo

Dear all,

I received some Illumina data from an external provider. It seems to come from a NovaSeq (the instrument ID starts with @A00, and according to that post...)

It's paired-end RNA sequencing data (2x151bp), prepared with the TruSeq Stranded mRNA LT Sample Prep Kit following the protocol "TruSeq Stranded mRNA Sample Preparation Guide, Part #15031047 Rev. E". I'm surprised to see quality strings that look like this:
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFF:FFFF:FFFFFFFFFFFF

or like this:
FFFF:,FFFFFFFFFF:FFFFFF:FFFFF,:FF::F,:FFFFFFFFFFFF,FFF:F:FF:FF,FFFFFFFFF,FFFFF:F,F:FF,,FFFFFF,FF:F:F,FF:FFFF:F:F
FFFFF,F,FFF:FFFF:FFF:FFFFFFFFFFF:FF::FF

or like this:
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF::F:FFFF:FF:FFF,F:FFF:::F,:F:F:FFFFF,FFFFFF:FF,FFFFFFF:FF,FFFFFF,::F,:F:::F,FFF,FFF:FFF,FFF::F,:FFFFFFFF,,:F,FFF

Of course, 'F' is a very good score (Q37), but ':' is only Q25 and ',' is even worse (Q11). Has anyone observed such very short dips in base-calling quality in otherwise good-looking RNA-seq data?
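
For anyone who wants to check the decoding, here is a minimal Python sketch. It assumes the standard Phred+33 encoding (Q = ASCII code minus 33) and the usual relation between Q and error probability, p = 10^(-Q/10). The function names are my own and purely illustrative; they are not taken from any particular tool.

```python
from collections import Counter


def phred33_to_q(ch: str) -> int:
    """Convert one FASTQ quality character to its Phred score (Phred+33 assumed)."""
    return ord(ch) - 33


def error_probability(q: int) -> float:
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def summarize(qual: str) -> dict:
    """Map each quality character in a string to (Phred score, number of occurrences)."""
    counts = Counter(qual)
    return {ch: (phred33_to_q(ch), n) for ch, n in counts.items()}


if __name__ == "__main__":
    # The three characters seen in the quality strings above.
    for ch in "F:,":
        q = phred33_to_q(ch)
        print(f"{ch!r}: Q{q}, error probability {error_probability(q):.2e}")
```

Running `summarize` on one of the quality strings shows how many positions fall into each score, which makes it easier to tell whether the dips are isolated or clustered toward one end of the read.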


