For my pipeline that pre-processes RNA-Seq reads before mapping them to a reference genome, I have assessed contamination levels against various reference sequences using FastQ_Screen (from Babraham Bioinformatics, the group that also brought us the widely used FastQC).
I have pasted below FastQ_Screen results for
RAW READS (before any of the pre-processing steps)
and
FINAL PROCESSED READS (after all of my pre-processing steps have been completed)
Based on the two data tables below, could you please comment on whether:
- contaminant levels in the final processed reads are low enough to proceed with mapping to the reference genome?
- it is safe to assume that the persistent low-level hits to cat, dog, mouse, and human will simply end up in the unmapped category when mapping to my plant target genome, rather than result in inaccurate mapping?
- contaminant levels in the raw reads were low enough to begin with, indicating that this library was a decent sample to start from?
- the differences between the raw reads and the fully processed reads suggest over-processing?
- there is anything else in these results that I have not thought to ask about?
Thank you!
FASTQ_Screen for RAW READS
Genome / Reference #Reads_processed #Unmapped %Unmapped #One_hit_one_genome %One_hit_one_genome #Multiple_hits_one_genome %Multiple_hits_one_genome #One_hit_multiple_genomes %One_hit_multiple_genomes #Multiple_hits_multiple_genomes %Multiple_hits_multiple_genomes
adapters 14198683 14189047 99.94 18 0 1 0 1833 0.01 7784 0.05
PhiX 14198683 14198683 100 0 0 0 0 0 0 0 0
lambda 14198683 14198683 100 0 0 0 0 0 0 0 0
UniVec 14198683 14183620 99.89 27 0 38 0 1496 0.01 13502 0.1
Bacterial_masked 14198683 14101866 99.32 200 0 62423 0.44 541 0 33653 0.24
Bact_Symbiont 14198683 14175648 99.84 2 0 110 0 18 0 22905 0.16
Mitoch 14198683 14127750 99.5 0 0 0 0 68693 0.48 2240 0.02
rRNA 14198683 12192293 85.87 0 0 0 0 380549 2.68 1625841 11.45
Target_Ref_genome 14198683 277861 1.96 8511350 59.94 3272369 23.05 50369 0.35 2086734 14.7
Cat_masked 14198683 14071938 99.12 484 0 126 0 74413 0.52 51722 0.36
Dog_masked 14198683 14085865 99.21 697 0 209 0 76317 0.54 35595 0.25
Mouse_masked 14198683 13967382 98.38 450 0 155 0 90013 0.63 140683 0.99
Human_masked 14198683 14121230 99.46 377 0 75 0 48239 0.34 28762 0.2
FASTQ_Screen for FINAL PROCESSED READS
Genome / Reference #Reads_processed #Unmapped %Unmapped #One_hit_one_genome %One_hit_one_genome #Multiple_hits_one_genome %Multiple_hits_one_genome #One_hit_multiple_genomes %One_hit_multiple_genomes #Multiple_hits_multiple_genomes %Multiple_hits_multiple_genomes
adapters 11269161 11269161 100 0 0 0 0 0 0 0 0
PhiX 11269161 11269161 100 0 0 0 0 0 0 0 0
lambda 11269161 11269161 100 0 0 0 0 0 0 0 0
UniVec 11269161 11268923 100 0 0 0 0 139 0 99 0
Bacterial_masked 11269161 11252080 99.85 58 0 13305 0.12 262 0 3456 0.03
Bact_Symbiont 11269161 11266803 99.98 1 0 0 0 23 0 2334 0.02
Mitoch 11269161 11230197 99.65 0 0 0 0 38149 0.34 815 0.01
rRNA 11269161 11263548 99.95 0 0 0 0 1047 0.01 4566 0.04
Target_Ref_genome 11269161 101482 0.9 7978575 70.8 3115013 27.64 23426 0.21 50665 0.45
Cat_masked 11269161 11253212 99.86 8 0 9 0 4986 0.04 10946 0.1
Dog_masked 11269161 11251633 99.85 23 0 20 0 6081 0.05 11404 0.1
Mouse_masked 11269161 11251045 99.84 20 0 11 0 5271 0.05 12814 0.11
Human_masked 11269161 11256012 99.88 14 0 3 0 4471 0.04 8661 0.08
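
To make the raw vs. processed comparison easier to eyeball, here is a rough Python sketch that computes, for a few of the references, the percentage of reads with at least one hit (i.e. 100 minus %Unmapped) before and after processing, plus the overall fraction of reads retained. The numbers are copied by hand from the #Reads_processed and #Unmapped columns of the two tables above; the function names (`pct_hit`, `compare`, `pct_retained`) are my own and are not part of FastQ_Screen.

```python
# Sketch only: compares FastQ_Screen hit rates before and after pre-processing.
# Each entry is (reads_processed, unmapped), copied from the tables above.
RAW = {
    "rRNA": (14198683, 12192293),
    "Target_Ref_genome": (14198683, 277861),
    "Mouse_masked": (14198683, 13967382),
    "Human_masked": (14198683, 14121230),
}
FINAL = {
    "rRNA": (11269161, 11263548),
    "Target_Ref_genome": (11269161, 101482),
    "Mouse_masked": (11269161, 11251045),
    "Human_masked": (11269161, 11256012),
}


def pct_hit(processed, unmapped):
    """Percentage of reads with at least one hit to the reference."""
    return 100.0 * (processed - unmapped) / processed


def compare(raw, final):
    """Map each reference name to (raw % hit, final % hit), rounded to 2 dp."""
    return {
        name: (round(pct_hit(*raw[name]), 2), round(pct_hit(*final[name]), 2))
        for name in raw
    }


def pct_retained(raw_total, final_total):
    """Percentage of raw reads that survive pre-processing."""
    return 100.0 * final_total / raw_total


if __name__ == "__main__":
    for name, (before, after) in compare(RAW, FINAL).items():
        print(f"{name:20s} raw {before:6.2f}%  final {after:6.2f}%")
    print(f"reads retained: {pct_retained(14198683, 11269161):.2f}%")
```

Run as a script, this prints one line per reference and the overall retention. With the numbers above, roughly 79% of the raw reads are retained, and the rRNA hit rate drops from about 14% to about 0.05%.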