I have a dataset of 5 subjects whose DNA methylation was measured at time 0 and time 1 using the Illumina 850k chip.
I wanted to find the differentially methylated sites. For that I used the R limma library: I fitted a linear model and computed moderated t- and F-statistics. I also applied a multiple-testing correction to limit false positives.
However, even with all this, I end up with 750,000 differentially methylated sites, which seems too many to be true. I think I used the functions correctly, but I want to be sure, especially about whether the design matrix is right.
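To make the design-matrix question concrete, here is a minimal base R sketch (no limma needed) comparing three candidate designs for this layout. It assumes the columns are ordered as five t0 samples followed by five t1 samples, with subjects in the same order in both blocks; the `subject`, `time`, and `y` objects are hypothetical and the values in `y` are simulated, not real data. As far as I understand, a bare 0/1 vector is treated as a one-column matrix without an intercept, so its coefficient would be the t1 mean rather than the t1 minus t0 difference.

```r
# Hypothetical sample layout: 5 subjects, each measured at t0 and t1,
# with columns ordered as all t0 samples followed by all t1 samples.
subject <- factor(rep(paste0("S", 1:5), times = 2))
time    <- factor(rep(c("t0", "t1"), each = 5), levels = c("t0", "t1"))

# A bare 0/1 vector becomes a one-column matrix with no intercept,
# so its single coefficient is the mean of the t1 samples.
design_vector <- as.matrix(c(rep(0, 5), rep(1, 5)))

# Intercept plus time: the "timet1" coefficient is the t1 - t0 difference.
design_unpaired <- model.matrix(~ time)

# Subject blocking plus time: "timet1" is the within-subject t1 - t0 change.
design_paired <- model.matrix(~ subject + time)

# Simulated M-values for one site: a non-zero baseline, no real change over time.
set.seed(1)
y <- c(rnorm(5, mean = 3, sd = 0.1), rnorm(5, mean = 3, sd = 0.1))

# Least-squares coefficients under the first two designs.
coef_vector   <- unname(lm.fit(design_vector, y)$coefficients[1])
coef_unpaired <- lm.fit(design_unpaired, y)$coefficients[["timet1"]]

print(c(vector_design = coef_vector, unpaired_design = coef_unpaired))
```

With the one-column design the estimate sits near the baseline M-value, so almost any site whose M-value is not close to zero would look "significant", whereas the designs with an intercept estimate the actual change between time points.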
Here's the script I've been using:
"Uses the limma library to find Diferentially Methylated Sites" library(limma) library(multtest) src = "http://www.biostars.org/./Data/Methylation M Values" t0File = "t0_M.csv" t1File = "t1_M.csv" t0File = paste(src, t0File, sep = "/") t1File = paste(src, t1File, sep = "/") t0 = read.csv(t0File) t1 = read.csv(t1File) Data = merge(t0, t1, by = "row.names", all = TRUE) rownames(Data) = Data$Row.names Data = Data[, -1] design = c(rep(0, 5), rep(1, 5)) #design = data.frame( t0 = c(rep(1, 5), rep(0, 5)), t1 = c(rep(0, 5), rep(1, 5))) linearFit = lmFit(Data, design) #contrasts = makeContrasts(contrasts = "t0/t1", levels = design) #linearFit = contrasts.fit(linearFit, contrasts) BayesFit = eBayes(linearFit, proportion = 0.05) result = decideTests(BayesFit, p.value = 0.05) type = c("BH") multTestResult = mt.rawp2adjp(BayesFit$p.value, type)
Here's the distribution of both groups: