I am looking for suggestions on how to analyze my bulk RNA-seq data.
Briefly, I have 87 samples of sorted CD4 and CD8 T cells from two timepoints (pre-chemotherapy and post-chemotherapy), two sites (bone marrow (BM) and peripheral blood (PB)), and three groups (non-responders to therapy (NR), complete responders to therapy (CR), and healthy donors (HD)). So for patient 001 (assume this patient is a complete responder) I have: 001_CD8_pre_BM_CR, 001_CD8_post_BM_CR, 001_CD4_pre_BM_CR, 001_CD4_post_BM_CR, 001_CD8_pre_PB_CR, 001_CD8_post_PB_CR, 001_CD4_pre_PB_CR, 001_CD4_post_PB_CR (8 samples per patient, 4 samples per HD, since healthy donors have only one timepoint).
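To make the sample layout concrete, here is a rough sketch of how the labels above could be turned into a metadata table. It is plain Python for illustration only; the function name `parse_sample` and the field order (patient, cell type, timepoint, site, group) are my assumptions based on the example labels, not part of any existing pipeline.

```python
from collections import Counter

# Assumed field order, inferred from labels such as 001_CD8_pre_BM_CR
FIELDS = ("patient", "cell_type", "timepoint", "site", "group")

def parse_sample(label):
    """Split an underscore-separated sample label into named fields."""
    parts = label.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"unexpected label format: {label!r}")
    return dict(zip(FIELDS, parts))

# The 8 samples listed above for patient 001
labels = [
    "001_CD8_pre_BM_CR", "001_CD8_post_BM_CR",
    "001_CD4_pre_BM_CR", "001_CD4_post_BM_CR",
    "001_CD8_pre_PB_CR", "001_CD8_post_PB_CR",
    "001_CD4_pre_PB_CR", "001_CD4_post_PB_CR",
]
meta = [parse_sample(label) for label in labels]

# Count samples per cell type x timepoint x site combination
per_combination = Counter(
    (m["cell_type"], m["timepoint"], m["site"]) for m in meta
)
```

For a single patient, each cell type x timepoint x site combination holds exactly one sample.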
The main question we want to answer is which genes are differentially expressed (DEGs) between the groups (NR, CR, HD), and then whether these differences are present before and/or after chemotherapy.
My approach would be to split my dataset into 8 subsets (CD8_pre_BM, CD8_post_BM, CD4_pre_BM, CD4_post_BM, CD8_pre_PB, CD8_post_PB, CD4_pre_PB, CD4_post_PB) and then use a Wald test to compare CR, NR and HD within each of them. Another approach would be not to split by timepoint and instead use a design such as ~ group + timepoint.
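To illustrate what I mean by the two options, here is a minimal sketch in plain Python. Patient 001 comes from my example above; patient 002 (an NR) is hypothetical and only there to have two groups in the toy table. The dummy coding with HD and "pre" as reference levels is also an assumption on my side, and the real analysis would of course be done in a proper differential expression package rather than by hand.

```python
from collections import defaultdict

FIELDS = ("patient", "cell_type", "timepoint", "site", "group")

def parse_sample(label):
    """Split a label such as 001_CD8_pre_BM_CR into named fields."""
    return dict(zip(FIELDS, label.split("_")))

# Toy labels: 001 (CR) is from the post, 002 (NR) is hypothetical
labels = [
    f"{patient}_{cell}_{tp}_{site}_{group}"
    for patient, group in (("001", "CR"), ("002", "NR"))
    for cell in ("CD8", "CD4")
    for tp in ("pre", "post")
    for site in ("BM", "PB")
]
meta = [parse_sample(label) for label in labels]

# Option 1: split into the 8 cell type x timepoint x site subsets,
# then compare groups separately within each subset
subsets = defaultdict(list)
for m in meta:
    subsets[(m["cell_type"], m["timepoint"], m["site"])].append(m)

# Option 2: keep both timepoints together and dummy-code ~ group + timepoint
GROUP_LEVELS = ("HD", "CR", "NR")  # first level is the reference (assumed)
TIME_LEVELS = ("pre", "post")      # first level is the reference (assumed)

def design_row(m):
    """One row of the additive design matrix for a single sample."""
    row = {"Intercept": 1}
    for g in GROUP_LEVELS[1:]:
        row[f"group_{g}"] = int(m["group"] == g)
    for t in TIME_LEVELS[1:]:
        row[f"timepoint_{t}"] = int(m["timepoint"] == t)
    return row

design = [design_row(m) for m in meta]
```

In this additive coding the group columns are the same whether a sample is pre or post, so each group effect is estimated as a single value shared across both timepoints.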
Do you think either of these is appropriate? Do you have any other ideas?
Thank you very much