My experiment is the following:

  • Temperature 26C vs. Temperature 30C
  • Treatment Saline vs. Treatment BMC
  • Timepoints 0, 2, and 3.

My intention is to fit a GLM to test for differential abundance in the following comparisons:

  1. 26C vs. 30C for Saline treatment
  2. 26C vs. 30C for BMC treatment
  3. Saline vs. BMC at 26C
  4. Saline vs. BMC at 30C

In doing so, I would like to use as many samples as possible when estimating the dispersion, which is why I'm using a GLM instead of a Fisher's exact test on subsets of samples. I would also like to incorporate the ordered time information.

```r
# Functions
# Read a tab-separated table whose first column holds the row names
read_dataframe = function(path, sep="\t") {
        df = read.table(path, sep=sep, row.names=1, header = TRUE, check.names=FALSE)
        return(df)
}

# Counts
X = read_dataframe("https://pastebin.com/raw/J7kmL8Ly")
# OG0000000 OG0000001   OG0000002   OG0000003   OG0000004
# T2_10_SALINE_TEMP-PE-D710-D505-1_S10  16909   55  5382    5894    1964
# T2_11_BMC_CONTROL-PE-D711-D505-1_S11  24296   60  2772    3962    1374
# T2_12_BMC_CONTROL-PE-D712-D505-1_S12  24619   60  7351    5389    560
# T2_13_BMC_CONTROL-PE-D701-D506-1_S13  22420   15  2172    2778    930
# T2_14_BMC_CONTROL-PE-D702-D506-1_S14  20049   82  4655    6211    553

# Metadata
df_metadata = read_dataframe("https://pastebin.com/raw/PANaC3r5")
#   temperature treatment   collection_time_numeric
# 1_T0_RNA-PE-D711D501-1_S143   26C NaN 0
# 2_T0_RNA-PE-D709D506-1_S134   26C NaN 0
# 4_T0_RNA-PE-D709D505-1_S133   26C NaN 0
# 5_T0_RNA-PE-D709D504-1_S132   26C NaN 
```
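In the printed output, `X` has samples in rows and orthogroups in columns, whereas edgeR's `DGEList` expects features in rows and samples in columns, with the metadata rows in the same sample order. A minimal base-R sketch of that preparation step is below. It uses small made-up stand-ins (`X_toy`, `meta_toy` and the sample names `s1` to `s3` are hypothetical, not the pastebin data) and assumes `read.table` keeps the T0 treatment value as the string `"NaN"`; if it comes through as `NA`, test with `is.na()` instead.

```r
# Hypothetical stand-ins shaped like X (samples x orthogroups) and df_metadata
X_toy <- data.frame(
  OG0000000 = c(16909, 24296, 24619),
  OG0000001 = c(55, 60, 60),
  row.names = c("s1", "s2", "s3")
)
meta_toy <- data.frame(
  temperature = c("30C", "26C", "26C"),
  treatment = c("BMC", "Saline", "NaN"),
  collection_time_numeric = c(2, 2, 0),
  row.names = c("s3", "s1", "s2")  # deliberately in a different order
)

# edgeR wants features in rows and samples in columns, so transpose
counts <- t(as.matrix(X_toy))

# Reorder the metadata rows to match the count columns
meta_toy <- meta_toy[colnames(counts), ]
stopifnot(identical(rownames(meta_toy), colnames(counts)))

# Optionally drop the T0 samples, whose treatment is the string "NaN"
keep <- meta_toy$treatment != "NaN"
counts <- counts[, keep, drop = FALSE]
meta_toy <- meta_toy[keep, ]
print(dim(counts))
```

With the real objects, the same transpose, reorder and filter steps would be applied to `X` and `df_metadata` before building the `DGEList`.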

If the T0 timepoint (where treatment is NaN) is throwing everything off, it can be removed.

I'm trying to work out how to do this from the edgeR User's Guide: www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

However, my situation isn't described there, so please no RTFM responses. If I make a design matrix, it will basically consist of a binary vector for Treatment_BMC, a binary vector for Temperature_30C, and a numeric vector for the collection time.
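For reference, the design described above can be written with `model.matrix()`. The sketch below uses a small made-up metadata table; the factor levels `"Saline"`/`"BMC"` and `"26C"`/`"30C"` are assumptions about how the real columns are coded. It also shows a limitation of the additive version: it has a single `temperature30C` coefficient shared by both treatments and a single `treatmentBMC` coefficient shared by both temperatures, so comparisons 1 and 2 (and likewise 3 and 4) cannot differ. Adding the treatment-by-temperature interaction is one way to free them.

```r
# Made-up balanced layout: every temperature x treatment cell at times 2 and 3
meta <- data.frame(
  temperature = factor(rep(c("26C", "30C"), each = 2, times = 2), levels = c("26C", "30C")),
  treatment = factor(rep(c("Saline", "BMC"), times = 4), levels = c("Saline", "BMC")),
  collection_time_numeric = rep(c(2, 3), each = 4)
)

# Additive design: one binary column per factor plus numeric time
design_additive <- model.matrix(~ treatment + temperature + collection_time_numeric, data = meta)
colnames(design_additive)
# "(Intercept)" "treatmentBMC" "temperature30C" "collection_time_numeric"

# With an interaction, the temperature effect is allowed to differ by treatment
design_interaction <- model.matrix(~ treatment * temperature + collection_time_numeric, data = meta)
colnames(design_interaction)
# same columns, plus "treatmentBMC:temperature30C" at the end
```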

If I use this as the design matrix when estimating the dispersion, how would I then do, for example, comparison #1 above (26C vs. 30C for the Saline treatment only)? Or does it not make sense to estimate the dispersion from all samples?
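One approach the edgeR User's Guide describes for experiments with all combinations of several factors is to collapse them into a single group factor with one level per combination, fit a design with one coefficient per group, and express each comparison as a contrast. The dispersion is then estimated once from all samples using that design; only the tests differ. Below is a base-R sketch of such a design (here with numeric time added as a covariate, which is an assumption about how time should enter) and the four contrasts, on made-up data. The group labels, contrast names and values are hypothetical; the noiseless check at the end only confirms that each contrast picks out the intended difference of group means.

```r
# Made-up balanced layout: every temperature x treatment cell at times 2 and 3
meta <- data.frame(
  temperature = rep(c("26C", "30C"), each = 2, times = 2),
  treatment = rep(c("Saline", "BMC"), times = 4),
  collection_time_numeric = rep(c(2, 3), each = 4)
)

# One factor level per temperature x treatment combination
meta$group <- factor(paste(meta$temperature, meta$treatment, sep = "_"),
                     levels = c("26C_Saline", "30C_Saline", "26C_BMC", "30C_BMC"))

# No intercept: one coefficient per group, plus a linear time effect
design <- model.matrix(~ 0 + group + collection_time_numeric, data = meta)

# Each column is one comparison, weighting the design coefficients
con <- cbind(
  Temp_in_Saline = c(-1,  1,  0, 0, 0),  # 1. 30C - 26C, Saline
  Temp_in_BMC    = c( 0,  0, -1, 1, 0),  # 2. 30C - 26C, BMC
  Trt_at_26C     = c(-1,  0,  1, 0, 0),  # 3. BMC - Saline, 26C
  Trt_at_30C     = c( 0, -1,  0, 1, 0)   # 4. BMC - Saline, 30C
)
rownames(con) <- colnames(design)

# Sanity check on noiseless toy data: group means plus 0.5 per time unit
group_means <- c("26C_Saline" = 10, "30C_Saline" = 12, "26C_BMC" = 11, "30C_BMC" = 15)
y <- group_means[as.character(meta$group)] + 0.5 * meta$collection_time_numeric
fit <- lm.fit(design, y)
round(drop(t(con) %*% fit$coefficients), 6)
# expected: Temp_in_Saline = 2, Temp_in_BMC = 4, Trt_at_26C = 1, Trt_at_30C = 3
```

With edgeR, the same `design` and `con` would be used roughly as `y <- DGEList(counts)`, `y <- estimateDisp(y, design)`, `fit <- glmQLFit(y, design)`, then `glmQLFTest(fit, contrast = con[, "Temp_in_Saline"])` for comparison #1 and likewise for the other columns. If the T0 samples are kept, they would need their own group level, which is a modelling choice given that treatment is undefined at T0.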

I could use some guidance here.