Hi Everyone:

It seems that most peak callers return only significant peaks by default (for example, I run MACS2 with the parameter -q 0.05). The narrowPeak (or BED) files of these significant peaks can then be loaded into software such as DiffBind or MAnorm2 for normalisation, consensus counting, etc. Finally, differential analysis is done on them.

However, I asked MACS2 to return all peaks (-q 1), and their distribution differs from that of the significant peaks (figure below). This means that if we filter peaks at the initial stage, then no matter what analysis we do later, we are not working with the original distribution of peaks. This may not matter much if I only want to find differential regions, draw a correlation heatmap, etc. However, I think some situations would be affected, for example:

First, after differential calling, we may adjust the p-values based on the total number of peaks compared. I think the adjusted values would differ depending on whether we include all peaks or only the significant ones.
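To make the concern about p-value adjustment concrete, here is a minimal sketch. It assumes the Benjamini-Hochberg procedure (I have not checked which correction each differential tool actually applies), and all p-values are made up for illustration. The same five peaks are adjusted once on their own and once together with fifteen hypothetical weak peaks.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical differential-test p-values for five peaks that passed filtering.
sig_peaks = [0.001, 0.004, 0.01, 0.02, 0.04]
# The same five peaks plus fifteen hypothetical weak peaks with large p-values.
all_peaks = sig_peaks + [0.5 + 0.03 * k for k in range(15)]

adj_sig = bh_adjust(sig_peaks)
adj_all = bh_adjust(all_peaks)

# Compare the adjusted value of each of the five peaks in both settings.
for p, a_sig, a_all in zip(sig_peaks, adj_sig, adj_all):
    print(f"p={p:<6} filtered set: {a_sig:.4f}   all peaks: {a_all:.4f}")
```

In this toy setting every one of the five peaks gets a larger adjusted p-value when the weak peaks are included, because the number of tests grows from 5 to 20.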

Second, I may want a null distribution for the peaks (exactly the density plot below), or for the statistics compared after differential analysis. If I filter peaks at the start, I can never get the null distribution, right? It is like selecting only the top 100 students from school A and from school B and then comparing their scores: no matter how we compare them, we can never know the score distribution of all students in either school, right?

Third, if I want to check how many peaks (significant or not) fall within a certain gene, I cannot know that without the information for all peaks. I can only know how many significant peaks fall within each gene.
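As a rough illustration of this third case, the sketch below counts peaks overlapping one gene with and without a q-value cutoff. The peak tuples, the gene coordinates, and the helper names are all hypothetical; the q-value is stored as -log10(q), which is how narrowPeak files report it.

```python
import math


def overlaps(peak_start, peak_end, gene_start, gene_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return peak_start < gene_end and gene_start < peak_end


def count_peaks_in_gene(peaks, gene, q_cutoff=None):
    """Count peaks overlapping `gene`; keep only q <= q_cutoff if it is given."""
    chrom, g_start, g_end = gene
    min_neglog10_q = None if q_cutoff is None else -math.log10(q_cutoff)
    n = 0
    for p_chrom, p_start, p_end, neglog10_q in peaks:
        if p_chrom != chrom or not overlaps(p_start, p_end, g_start, g_end):
            continue
        if min_neglog10_q is not None and neglog10_q < min_neglog10_q:
            continue
        n += 1
    return n


# Hypothetical peaks: (chrom, start, end, -log10 q-value).
peaks = [
    ("chr1", 100, 300, 5.2),
    ("chr1", 250, 400, 0.4),
    ("chr1", 900, 1000, 3.0),
    ("chr2", 150, 250, 2.0),
]
gene = ("chr1", 200, 500)  # hypothetical gene interval

n_all = count_peaks_in_gene(peaks, gene)
n_sig = count_peaks_in_gene(peaks, gene, q_cutoff=0.05)
print(f"all peaks in gene: {n_all}, significant peaks in gene: {n_sig}")
```

The count without a cutoff is only available if the unfiltered peak list was kept.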

[Figure: density plot of the peak distributions, all peaks (-q 1) versus significant peaks]

I am new to ChIP-seq, and I want to know whether it is possible to skip filtering entirely and just feed all returned peaks (-q 1) into nearly all downstream analyses. The significance filtering could then be done manually at any stage after the calculation. Would that work?
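What I mean by filtering manually afterwards is something like the minimal sketch below. It assumes the standard 10-column narrowPeak layout, in which column 9 holds -log10(q-value); the three peak lines and the function names are made up for illustration, and a real file would be opened with `open(path)` instead of `io.StringIO`.

```python
import io
import math

# Three hypothetical narrowPeak records (tab-separated, 10 columns):
# chrom, start, end, name, score, strand, signalValue, -log10(p), -log10(q), summit
NARROWPEAK_TEXT = "\n".join([
    "chr1\t100\t300\tpeak_1\t520\t.\t8.1\t55.0\t52.0\t90",
    "chr1\t900\t1100\tpeak_2\t21\t.\t2.3\t3.1\t2.1\t70",
    "chr2\t400\t650\tpeak_3\t4\t.\t1.4\t0.9\t0.4\t120",
])


def read_narrowpeak(handle):
    """Parse narrowPeak lines into dicts, skipping blank and header lines."""
    peaks = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        peaks.append({
            "chrom": fields[0],
            "start": int(fields[1]),
            "end": int(fields[2]),
            "name": fields[3],
            "signal": float(fields[6]),
            "neglog10_q": float(fields[8]),
        })
    return peaks


def filter_by_q(peaks, q_cutoff):
    """Keep peaks whose q-value is at most q_cutoff."""
    threshold = -math.log10(q_cutoff)
    return [p for p in peaks if p["neglog10_q"] >= threshold]


all_peaks = read_narrowpeak(io.StringIO(NARROWPEAK_TEXT))
# The cutoff can be applied (or changed) at any later stage.
for cutoff in (0.05, 0.01, 0.001):
    kept = filter_by_q(all_peaks, cutoff)
    print(cutoff, [p["name"] for p in kept])
```

Because the full list is kept, the cutoff becomes a choice made at analysis time rather than at peak-calling time.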

In other words, I want to know whether peaks with higher q-values simply have lower intensity (like an unexpressed gene or a lowly methylated CpG), or whether they are low-quality, unreliable results that should be discarded.
