Dear all,
I need some advice on controlling for the variable "days since disease onset" for differential expression analysis using limma. I have the following sample groups which I want to compare:
- Group 0 - control patients, disease-free
- Group 1 - disease patients with mild symptoms
- Group 2 - disease patients with moderate symptoms
- Group 3 - disease patients with severe symptoms
I plan to do a general "disease vs healthy" analysis and then look at individual comparisons, i.e. 1 vs 0, 2 vs 0, 2 vs 1, etc.
I have age and gender for all samples, which are simple to add to the limma model design as covariates. However, I also have "days since disease onset", which is the number of days between the onset of disease and the day the sample was taken for analysis. Unfortunately, this differs significantly between group 1 and the other disease groups (groups 2 and 3). Here is a summary of the "days since disease onset" variable:
- Group 0 - NA
- Group 1 - samples were taken on average 20 (95% CI 17-24) days after the first symptoms
- Group 2 - samples were taken on average 12 (95% CI 10-13) days after the first symptoms
- Group 3 - samples were taken on average 11 (95% CI 9-13) days after the first symptoms
There is also a correlation between "days since disease onset" and gene expression within group 1, group 2 and group 3.
When comparing the control group to the disease groups, both individually and as a whole (case vs control, where groups 1, 2 and 3 are merged and treated as one group), how do I control for the variation in "days since disease onset"? Is it simply a case of setting "days since disease onset" to 0 for all the controls and then including that variable in the model design? Does the same apply when comparing an individual disease group to control, e.g. group 1 vs group 0?
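To make the proposed coding concrete, here is a minimal sketch in plain Python of the design matrix being described: one indicator column per group plus a "days" column that is 0 for every control. The sample values are hypothetical and only loosely follow the group means above. In limma the analogous design would presumably be built with something like `model.matrix(~ 0 + group + days)`, which is an assumption about the intended setup rather than a tested call.

```python
# Hypothetical samples: (group, days since disease onset); controls have no onset date.
samples = [
    (0, None), (0, None),   # Group 0: controls
    (1, 22), (1, 18),       # Group 1: mild
    (2, 12), (2, 11),       # Group 2: moderate
    (3, 10), (3, 13),       # Group 3: severe
]

def design_row(group, days, n_groups=4):
    """Return one design-matrix row: group indicators followed by the days covariate."""
    indicators = [1.0 if group == g else 0.0 for g in range(n_groups)]
    # The coding asked about above: controls get days = 0.
    days_value = 0.0 if days is None else float(days)
    return indicators + [days_value]

design = [design_row(group, days) for group, days in samples]

print("columns: group0, group1, group2, group3, days")
for row in design:
    print(row)
```

Under this coding each group column would describe expected expression at day 0, so a disease vs control contrast would amount to an extrapolation back to the day of onset, which may or may not be the comparison of interest.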
When comparing group 1 to group 2, I can include "days since disease onset" as a covariate in the limma model design to account for variation within groups. However, since it differs significantly between the groups, does this mean the DE results may reflect gene expression changes over the course of the disease rather than symptom severity?
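The confounding concern can be put in numbers with a small sketch. The per-sample days below are hypothetical, chosen only so that the group means match the reported 20 and 12 days. It computes the correlation between the group 2 indicator and "days", and from that the variance inflation factor for the group coefficient when both terms sit in the same model.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical days since onset (group means 20 and 12, as in the summary above).
group1_days = [24, 18, 21, 17, 20]
group2_days = [12, 10, 13, 11, 14]

days = group1_days + group2_days
is_group2 = [0] * len(group1_days) + [1] * len(group2_days)

r = pearson(is_group2, days)
# With two predictors plus an intercept, the variance inflation factor is 1 / (1 - r^2).
vif = 1.0 / (1.0 - r ** 2)

print(f"correlation between group and days: {r:.3f}")
print(f"variance inflation factor for the group effect: {vif:.2f}")
```

The stronger this correlation, the less the data can separate a severity effect from a time-since-onset effect, and the wider the uncertainty on the adjusted group 1 vs group 2 comparison.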
Thanks in advance!