Hi Everyone,

I'm a complete novice to DEG analysis and linear models, and I have some questions about setting up the design matrix. I have read several posts on this forum with similar experimental designs, but they don't really have the answers I'm looking for. My experiment was designed as follows:

1) Two genotype groups (Genotype: WT vs. KO)

2) Two treatment conditions for each genotype group (Condition: Ctrl vs. Trt)

3) Equal numbers of both sexes in each genotype group under each treatment condition (Sex: F vs. M)

When we first designed this experiment, sex was not a factor we considered. The main purpose was just to see whether the expression profiles of the two genotypes differ at steady state (Ctrl) and after stimulation (Trt). We included equal numbers of both sexes in each group only to guard against sex bias. However, when we ran PCA, we saw differences between the sexes within each genotype, and these differences became larger after treatment.

Now, the questions we would like to answer are:

1) If we just want to see how genotype and treatment interact (e.g. Ctrl WT vs Trt WT || Ctrl KO vs Trt KO || Ctrl WT vs Ctrl KO || Trt WT vs Trt KO), should I use design=~Genotype+Condition+Genotype:Condition and follow the comparison setups here

or, now that we know there is variation between the sexes, should I use design=~Sex+Genotype+Condition+Genotype:Condition (to account for sex differences) and still follow the same comparison setups as in the link above?
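
To make question 1 concrete, here is a minimal sketch in plain Python (not the analysis package itself) of how I understand the coefficients of ~Genotype+Condition+Genotype:Condition to combine into the four comparisons. It assumes treatment (dummy) coding with WT and Ctrl as the reference levels. The column names and the helpers `design_row` and `contrast` are my own, made up for illustration, and writing each comparison as numerator vs denominator is my choice of direction.

```python
# Columns of the design matrix for ~Genotype + Condition + Genotype:Condition,
# ASSUMING treatment (dummy) coding with WT and Ctrl as reference levels.
# Column names are illustrative, not taken from any package.
COLUMNS = ["Intercept", "Genotype_KO", "Condition_Trt", "GenotypeKO.ConditionTrt"]


def design_row(genotype, condition):
    """Design-matrix row (one weight per column) for one genotype/condition group."""
    ko = 1 if genotype == "KO" else 0
    trt = 1 if condition == "Trt" else 0
    return [1, ko, trt, ko * trt]


def contrast(numerator, denominator):
    """Coefficient weights for 'numerator group minus denominator group'."""
    num = design_row(*numerator)
    den = design_row(*denominator)
    return [a - b for a, b in zip(num, den)]


# The four comparisons from question 1, written as numerator vs denominator.
comparisons = {
    "Trt WT vs Ctrl WT": contrast(("WT", "Trt"), ("WT", "Ctrl")),
    "Trt KO vs Ctrl KO": contrast(("KO", "Trt"), ("KO", "Ctrl")),
    "Ctrl KO vs Ctrl WT": contrast(("KO", "Ctrl"), ("WT", "Ctrl")),
    "Trt KO vs Trt WT": contrast(("KO", "Trt"), ("WT", "Trt")),
}

for name, weights in comparisons.items():
    # Show only the coefficients that actually contribute to this comparison.
    terms = [f"{w:+d}*{c}" for w, c in zip(weights, COLUMNS) if w != 0]
    print(name, "=", " ".join(terms))
```

If this is right, the comparisons made within the reference levels (WT or Ctrl) use a single main-effect coefficient, while the comparisons made within KO or within Trt also need the interaction coefficient. Adding Sex as a main effect would add one more column whose weight is 0 in all four contrasts.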

2) If we also want to see how gene expression differs between the sexes within a genotype group, and between the two genotype groups within each sex, under each treatment condition (e.g. F vs M in Ctrl WT || F vs M in Ctrl KO || F vs M in Trt WT || F vs M in Trt KO || F Ctrl WT vs F Ctrl KO || M Ctrl WT vs M Ctrl KO || F Trt WT vs F Trt KO || M Trt WT vs M Trt KO), how should I set up the design matrix? I have very limited knowledge of how interaction terms work, and I'm not sure what I should do to get all of those comparisons. I would really appreciate it if someone could provide some advice.

Also, I've read in some other posts that for a complex design such as this, it may be better to label each sample with a single group that combines all three factors (e.g. F_Ctrl_WT, F_Trt_WT etc.) and just use the "contrast" command to pick out the groups I'm interested in comparing. Will this work? How is this different from using the "~A+B+C+A:C+B:C" type of setup?
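
To show what I mean by the combined-group approach, here is a minimal sketch in plain Python (again, not the analysis package itself) that builds the eight group labels and lists the pairwise comparisons from question 2. The `label` helper and the variable names are hypothetical, made up for illustration; the label format follows the F_Ctrl_WT example above.

```python
from itertools import product

SEXES = ["F", "M"]
CONDITIONS = ["Ctrl", "Trt"]
GENOTYPES = ["WT", "KO"]


def label(sex, condition, genotype):
    """Combined group label, e.g. F_Ctrl_WT (format taken from the example above)."""
    return f"{sex}_{condition}_{genotype}"


# One group per Sex x Condition x Genotype combination: 2 * 2 * 2 = 8 groups.
groups = [label(s, c, g) for s, c, g in product(SEXES, CONDITIONS, GENOTYPES)]

# F vs M within each condition/genotype combination.
sex_contrasts = [
    (label("F", c, g), label("M", c, g)) for c, g in product(CONDITIONS, GENOTYPES)
]

# WT vs KO within each sex, under each condition.
genotype_contrasts = [
    (label(s, c, "WT"), label(s, c, "KO")) for c, s in product(CONDITIONS, SEXES)
]

print(len(groups), "groups:", ", ".join(groups))
for first, second in sex_contrasts + genotype_contrasts:
    print(f"{first} vs {second}")
```

As far as I understand it, with this setup every comparison I listed is just one pair of group labels, so I would not need to work out which interaction coefficients to combine.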

Thanks so much for your help!


