I was doing a differential gene expression analysis on a microarray dataset with three controls and 7 case samples. I used the limma package in RStudio to do the analysis. Wrote the following code based on Bioconductor vignettes and online tutorials:

library(limma)

edata<-exprs(group) #EXPRESSION MATRIX

mod<-model.matrix(~group_pheno$status)  #group_pheno contains data on the case/control phenotype of the samples

fit<-lmFit(edata,mod)

ebay<-eBayes(fit)

res<-topTable(ebay,sort.by = "none",number = dim(edata)[1])

I got around 91 differentially expressed genes from the above analysis (adj.P.val < 0.05).

However, when I tried to do the same analysis in the GEO2R option in the GEO database, I got no DEGs at the significance level adj.P.Val < 0.05. The lowest reported adj.P.value was around 0.079. I checked the Rscript given in the webpage for this analysis (given below)

# load series and platform data from GEO

gset <- getGEO("GSE34526", GSEMatrix =TRUE, AnnotGPL=TRUE)
if (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

# make proper column names to match toptable 
fvarLabels(gset) <- make.names(fvarLabels(gset))

# group names for all samples
gsms <- "0001111111"
sml <- c()
for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }

# log2 transform
ex <- exprs(gset)
qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T))
LogC <- (qx[5] > 100) ||
          (qx[6]-qx[1] > 50 && qx[2] > 0) ||
          (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
if (LogC) { ex[which(ex <= 0)] <- NaN
  exprs(gset) <- log2(ex) }

# set up the data and proceed with analysis
sml <- paste("G", sml, sep="")    # set group names
fl <- as.factor(sml)
gset$description <- fl
design <- model.matrix(~ description + 0, gset)
colnames(design) <- levels(fl)
fit <- lmFit(gset, design)
cont.matrix <- makeContrasts(G1-G0, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, 0.01)
tT <- topTable(fit2, adjust="fdr", sort.by="B", number=250)

tT <- subset(tT, select=c("ID","adj.P.Val","P.Value","t","B","logFC","Gene.symbol","Gene.title"))
write.table(tT, file=stdout(), row.names=F, sep="t")

It seems that here they have used contrast matrix designs to calculate the DEGs. Can you please explain the correct approach to this problem and Where the mistake is in my code?

Thanks in Advance!



Source link