Hi, I am new to RNA-seq data analysis.
Would you mind giving me any advice?
I am planning to build a cell_evaluation model using 3'-end RNA-seq (QuantSeq) instead of RT-qPCR, as described below:
Step 1 (preparation of the cell_type prediction model)
- RNAseq (quantSeq) library preparation of clone A and clone B (six biological replicates vs six biological replicates)
- Obtain around 10M reads per sample by single-end sequencing in rapid mode on a HiSeq 2500
- Map to hg38 with STAR and obtain raw count data with HTSeq
- Eliminate the batch effect (between A and B) with ComBat-seq if necessary
- Normalise raw count data by the TMM method of edgeR (exact)
- Identify differentially expressed genes (DEGs, FDR < 0.05) with the edgeR exact test
- Remove DEGs with low raw counts (raw count < 20)
- Prepare a data frame in R with the CPM of each DEG and the cell_type label (A = 1, B = 0)
e.g. Xy = data.frame(list[geneA:geneX], list[cell_type])
- Obtain the best model for predicting the clone (A or B) with the bestglm function in R: bestglm(Xy, family=binomial(link="logit"), IC="AIC") ...(*)
--> A multiple logistic regression equation is obtained
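For illustration, the CPM conversion, low-count filter, and Xy layout described in Step 1 could be sketched as below. This is only a minimal Python sketch with made-up counts, not edgeR's implementation: it uses plain library-size CPM without TMM scaling, and it assumes "raw count < 20" means the gene never reaches 20 in any sample. The names `counts_per_million`, `passes_count_filter`, and `xy_rows` are hypothetical.

```python
# Toy raw counts (made-up numbers): gene -> one count per sample.
# Samples 0-2 stand in for clone A, samples 3-5 for clone B.
raw_counts = {
    "geneA": [120, 150, 130, 30, 25, 40],
    "geneB": [5, 3, 8, 2, 4, 6],
    "geneC": [800, 760, 900, 820, 790, 850],
}
cell_type = [1, 1, 1, 0, 0, 0]  # A = 1, B = 0


def counts_per_million(counts):
    """Plain CPM: count / library size * 1e6 (assumption: no TMM scaling)."""
    n_samples = len(next(iter(counts.values())))
    # Library size of a sample = sum of its counts over all genes.
    lib_sizes = [sum(vals[i] for vals in counts.values()) for i in range(n_samples)]
    return {
        gene: [c / size * 1e6 for c, size in zip(vals, lib_sizes)]
        for gene, vals in counts.items()
    }


def passes_count_filter(vals, min_count=20):
    """Assumed reading of the filter: keep a gene if any sample reaches min_count."""
    return max(vals) >= min_count


# CPM is computed on all genes first, so library sizes are not affected by the filter.
cpm = counts_per_million(raw_counts)
kept_genes = [g for g, vals in raw_counts.items() if passes_count_filter(vals)]

# One row per sample: CPM of each kept gene, then the label in the last column,
# mirroring the Xy layout where the response is the final column.
xy_rows = [
    [cpm[g][i] for g in kept_genes] + [cell_type[i]]
    for i in range(len(cell_type))
]
```

In the real workflow the kept genes would be the DEGs from the edgeR step rather than every gene that passes the count filter.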
Step 2 (evaluation of the cell_type prediction model)
- Prepare randomized samples from another cohort
- Obtain raw count data and convert it to CPM
- Plug each DEG's CPM into the equation from Step 1 and obtain the predicted result
- ROC analysis
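As a rough illustration of Step 2, the sketch below plugs CPM values into a logistic equation and computes the area under the ROC curve. It is a minimal Python sketch under stated assumptions: the intercept, coefficient, CPM values, and labels are invented for the example (a real model would take them from the fitted bestglm result), and `predict_prob` and `roc_auc` are hypothetical helper names.

```python
import math


def predict_prob(cpm_values, intercept, coefs):
    """Logistic equation: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(coefs[g] * cpm_values[g] for g in coefs)
    # Numerically stable sigmoid, avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical fitted model and validation samples (all numbers are made up).
intercept = -2.0
coefs = {"geneA": 0.00005}
validation_cpm = [
    {"geneA": 130000.0},
    {"geneA": 140000.0},
    {"geneA": 35000.0},
    {"geneA": 30000.0},
]
validation_labels = [1, 1, 0, 0]  # A = 1, B = 0

scores = [predict_prob(sample, intercept, coefs) for sample in validation_cpm]
auc = roc_auc(validation_labels, scores)
```

The pairwise AUC used here is equivalent to the Mann-Whitney U statistic divided by the number of positive-negative pairs, which is enough for a small validation cohort.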
Do I need technical replicates?
Are there any confounders or batch effects that I should take into account?