I am running regressions using limma on 450k array DNA methylation data.
I'm interested in the relationship between circulating levels of a hormone and DNAm in ~144,000 CpG sites I've preselected based on variability (non variable sites likely won't tell us anything about our variable of interest).
I run regressions using limma, and plot the p-values, but they don't look uniform at all. The odd thing is that this is exactly the same for all 3 hormone measurements. I tried to remove covariates in case there is collinearity going on, many different versions of these models look exactly the same.
What would explain this kind of distribution - sparse at high and low p-values, but uniform across the rest?
Is there some way I can 'fix' this? More robust tests I can run etc. to see if there is a relationship between my hormones and DNAm?