Posted 2 hours ago by Anand Rao, United States

As part of my pipeline for pre-processing RNA-Seq reads before mapping them to a reference genome, I assessed contamination from various sources using FastQ_Screen (from Babraham Bioinformatics, the group behind the widely used FastQC).

I have pasted below FastQ_Screen results for
RAW READS (before any of the pre-processing steps)
and
FINAL PROCESSED READS (after all of my pre-processing steps have been completed)

Based on the two tables below, could you please comment on the following:

  1. Are contaminant levels in the final processed reads low enough to proceed with mapping to the reference genome?
  2. Is it safe to assume that the persistent low-level hits to cat, dog, mouse and human will simply end up in the unmapped category when I map to my plant target genome, rather than causing inaccurate mapping?
  3. Were contaminant levels in the raw reads already low enough to indicate that this library was a decent sample to start with?
  4. Do the differences between the raw reads and the fully processed reads suggest over-processing?
  5. Is there anything else in these results that I have not thought to ask about?

Thank you!

FASTQ_Screen for RAW READS

Genome / Reference  #Reads_processed    #Unmapped   %Unmapped   #One_hit_one_genome %One_hit_one_genome #Multiple_hits_one_genome   %Multiple_hits_one_genome   #One_hit_multiple_genomes   %One_hit_multiple_genomes   Multiple_hits_multiple_genomes  %Multiple_hits_multiple_genomes
adapters    14198683    14189047    99.94   18  0   1   0   1833    0.01    7784    0.05
PhiX    14198683    14198683    100 0   0   0   0   0   0   0   0
lambda  14198683    14198683    100 0   0   0   0   0   0   0   0
UniVec  14198683    14183620    99.89   27  0   38  0   1496    0.01    13502   0.1
Bacterial_masked    14198683    14101866    99.32   200 0   62423   0.44    541 0   33653   0.24
Bact_Symbiont   14198683    14175648    99.84   2   0   110 0   18  0   22905   0.16
Mitoch  14198683    14127750    99.5    0   0   0   0   68693   0.48    2240    0.02
rRNA    14198683    12192293    85.87   0   0   0   0   380549  2.68    1625841 11.45
Target_Ref_genome   14198683    277861  1.96    8511350 59.94   3272369 23.05   50369   0.35    2086734 14.7
Cat_masked  14198683    14071938    99.12   484 0   126 0   74413   0.52    51722   0.36
Dog_masked  14198683    14085865    99.21   697 0   209 0   76317   0.54    35595   0.25
Mouse_masked    14198683    13967382    98.38   450 0   155 0   90013   0.63    140683  0.99
Human_masked    14198683    14121230    99.46   377 0   75  0   48239   0.34    28762   0.2
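
To make the raw-read table easier to sanity-check, here is a minimal Python sketch that confirms the five count columns of a row add up to #Reads_processed and derives two summary percentages from them. The function names `parse_counts` and `summarize` are my own and hypothetical, not part of FastQ_Screen. Only three rows are copied in, and the percentage columns are left out because they can be recomputed from the counts.

```python
# Rows copied from the raw-read table above, counts only:
# name, reads processed, unmapped, one-hit/one-genome,
# multi-hit/one-genome, one-hit/multi-genome, multi-hit/multi-genome.
RAW_ROWS = """\
rRNA 14198683 12192293 0 0 380549 1625841
Target_Ref_genome 14198683 277861 8511350 3272369 50369 2086734
Mouse_masked 14198683 13967382 450 155 90013 140683
"""


def parse_counts(text):
    """Turn whitespace-separated rows into {reference: [int counts]}."""
    table = {}
    for line in text.strip().splitlines():
        name, *nums = line.split()
        table[name] = [int(n) for n in nums]
    return table


def summarize(counts):
    """Check internal consistency and derive percentages for one row."""
    total, unmapped, one_one, multi_one, one_multi, multi_multi = counts
    categories_sum = unmapped + one_one + multi_one + one_multi + multi_multi
    return {
        # True when the five categories account for every processed read.
        "consistent": categories_sum == total,
        # Reads with at least one hit to this reference.
        "pct_any_hit": round(100 * (total - unmapped) / total, 2),
        # Reads that hit this reference and no other reference.
        "pct_only_this_genome": round(100 * (one_one + multi_one) / total, 2),
    }


raw = parse_counts(RAW_ROWS)
for name, counts in raw.items():
    print(name, summarize(counts))
```

The "only this genome" percentage is the figure most relevant to question 2: for the animal references, almost all hits also match another reference, so very few reads are specific to those genomes.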

FASTQ_Screen for FINAL PROCESSED READS

Genome / Reference  #Reads_processed    #Unmapped   %Unmapped   #One_hit_one_genome %One_hit_one_genome #Multiple_hits_one_genome   %Multiple_hits_one_genome   #One_hit_multiple_genomes   %One_hit_multiple_genomes   Multiple_hits_multiple_genomes  %Multiple_hits_multiple_genomes
adapters    11269161    11269161    100 0   0   0   0   0   0   0   0
PhiX    11269161    11269161    100 0   0   0   0   0   0   0   0
lambda  11269161    11269161    100 0   0   0   0   0   0   0   0
UniVec  11269161    11268923    100 0   0   0   0   139 0   99  0
Bacterial_masked    11269161    11252080    99.85   58  0   13305   0.12    262 0   3456    0.03
Bact_Symbiont   11269161    11266803    99.98   1   0   0   0   23  0   2334    0.02
Mitoch  11269161    11230197    99.65   0   0   0   0   38149   0.34    815 0.01
rRNA    11269161    11263548    99.95   0   0   0   0   1047    0.01    4566    0.04
Target_Ref_genome   11269161    101482  0.9 7978575 70.8    3115013 27.64   23426   0.21    50665   0.45
Cat_masked  11269161    11253212    99.86   8   0   9   0   4986    0.04    10946   0.1
Dog_masked  11269161    11251633    99.85   23  0   20  0   6081    0.05    11404   0.1
Mouse_masked    11269161    11251045    99.84   20  0   11  0   5271    0.05    12814   0.11
Human_masked    11269161    11256012    99.88   14  0   3   0   4471    0.04    8661    0.08
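
For questions 3 and 4, the comparison I have in mind can be sketched as follows. This is a rough Python illustration rather than anything FastQ_Screen produces itself: the names `pct_hit` and `compare` are hypothetical, and the numbers are the #Reads_processed and #Unmapped values copied from the two tables above for a subset of references.

```python
# Total reads screened in each table (the #Reads_processed column).
RAW_TOTAL = 14198683
FINAL_TOTAL = 11269161

# reference: (#Unmapped in raw reads, #Unmapped in final processed reads)
UNMAPPED = {
    "rRNA": (12192293, 11263548),
    "Target_Ref_genome": (277861, 101482),
    "Cat_masked": (14071938, 11253212),
    "Dog_masked": (14085865, 11251633),
    "Mouse_masked": (13967382, 11251045),
    "Human_masked": (14121230, 11256012),
}


def pct_hit(total, unmapped):
    """Percentage of reads with at least one hit to a reference."""
    return 100.0 * (total - unmapped) / total


def compare(unmapped=UNMAPPED):
    """Return {reference: (% with a hit in raw, % with a hit in final)}."""
    return {
        name: (
            round(pct_hit(RAW_TOTAL, raw_un), 2),
            round(pct_hit(FINAL_TOTAL, final_un), 2),
        )
        for name, (raw_un, final_un) in unmapped.items()
    }


# Share of the raw reads that survived all pre-processing steps.
retained_pct = round(100.0 * FINAL_TOTAL / RAW_TOTAL, 2)

print(f"reads retained after processing: {retained_pct}%")
for name, (before, after) in compare().items():
    print(f"{name}: {before}% -> {after}% of reads with a hit")
```

Putting the read loss next to the per-reference change shows how much of the loss lines up with the drop in rRNA hits, which is the kind of evidence I would like to use when judging whether the pipeline is over-processing.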


