I have MBD-seq datasets from Dnmt1 knockout (KO) and control cells.
As I expected, the genome browser showed that Dnmt1 KO lowers global DNA methylation levels (i.e., far fewer genomic regions were captured by MBD).
What I'm curious about is how I should normalize data like this, where the samples are expected to differ in their overall signal.
For example, if KO and control have 100 and 10,000 detected bins (> 0 reads), respectively, and each sample has one million total reads, each KO bin will have on average 100 times more reads than a control bin, which biases the quantification of MBD-seq enrichment.
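To make the arithmetic in this example concrete, here is a minimal sketch. It assumes the reads are spread evenly over the detected bins, which is a simplification, and the variable names are only illustrative.

```python
# Numbers from the example: equal sequencing depth, very different numbers of detected bins.
# Assumption: reads are spread evenly across the detected bins.
total_reads = 1_000_000
ko_bins = 100
control_bins = 10_000

ko_reads_per_bin = total_reads / ko_bins            # 10000.0
control_reads_per_bin = total_reads / control_bins  # 100.0

# How much higher the average KO bin looks purely because of the read budget.
fold_inflation = ko_reads_per_bin / control_reads_per_bin

print(ko_reads_per_bin, control_reads_per_bin, fold_inflation)
# prints: 10000.0 100.0 100.0
```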
Would it be OK to subsample 1/100 of the reads from the KO sample to compensate for the difference?
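As a rough illustration of what that subsampling would do, here is a toy simulation. It is only a sketch: each read is represented by the index of the bin it falls in, reads are assumed to land uniformly on the 100 KO bins, and all names are hypothetical.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is reproducible

KO_BINS = 100
TOTAL_READS = 1_000_000
SUBSAMPLE_DENOMINATOR = 100  # keep 1/100 of the reads

# Toy KO sample: one bin index per read (uniform placement is an assumption).
ko_reads = random.choices(range(KO_BINS), k=TOTAL_READS)

# Randomly keep 1/100 of the reads, without replacement.
n_keep = TOTAL_READS // SUBSAMPLE_DENOMINATOR
subsampled = random.sample(ko_reads, k=n_keep)

# Count reads per bin and average over the bins that still have reads.
counts = Counter(subsampled)
mean_per_bin = n_keep / len(counts)

print(len(counts), mean_per_bin)
```

Under these assumptions the subsampled KO sample ends up with about the same average depth per detected bin as the control in the example (one million reads over 10,000 bins). Whether that makes the two samples comparable is exactly what I'm unsure about.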
What does everyone think?