gravatar for RNAseqer

2 hours ago by

Hello everyone, I have a question as to EdgeR's differential expression analysis and the use of log2 transformation as part of the normalization process. I want to test for DE using TMM normalized log2 transformed counts data (log2 transformation serving as a means to reduce skew and the effects of outliers). I know it is possible to output normalized, log2 transformed values by using cpm() on the output of EdgeR's cpm() function:

#Use counts and phenotype data to create a dgelist object:

group <- phenotype_data_matrix$test_group
df.counts <- as.data.frame(counts_matrix)
dgelist <- DGEList(counts=df.counts, group=group)

# conduct normalization. The normalization factors will be stored in the object with an unaltered copy of the counts matrix:

calc.normfact.out <- calcNormFactors(dgelist, method = "TMM")

#extract normalized, log2 transformed  expression data using cpm() note: a default prior count of .25, scaled to effective library size will be added by EdgeR to prevent taking log of 0 counts:

cpm.tmm.log2 <- cpm(calc.normfact.out, log=TRUE, normalized.lib.sizes = TRUE)

My concern is this: does EdgeR use log2 transformation as part of the normalization prior to estimating differential expression? If it is not a default, is it possible to secify this as part of the normalization process, or would such a transformation of counts into log2 transformed values actually violate the assumptions EdgeR makes about the RNA-seq data's distribution? Below is a copy of the code I'm using to test for DE between group A and group B,

#additional covariates you want in your glm:

batch <- phenotype_data_matrix$Batch
sex <- phenotype_data_matrix$sex

#set up model with additional relevant covariates:

design <- model.matrix(~group + batch + sex)

# carry out DE analysis to identify DE genes between the two categories in 'group' (this is specified by 'coef=2') adjusting for Batch and Sex using glm():

dge.estDisp.out <- estimateDisp(calc.normfact.out, design)
fit <- glmFit(dge.estDisp.out, design)
lrt <- glmLRT(fit, coef = 2)

I have included my code here so that A )in case there is there is a simple way to modify it to use log2 transformed-normalized values provided that is not the default and B) Allow you to point out any gross mistakes or misconceptions on my part. Many thanks for any help you can provide!

link

modified 2 hours ago

written
2 hours ago
by

RNAseqer 140



Source link