I am working with DNA methylation in salmon and have recently acquired data from an RRBS experiment. FastQC flags around 40% of my reads as duplicates, which I take to be PCR duplicates and which seems quite high. However, I have read that with RRBS data I should not remove duplicates by, for example, simply discarding reads that have the exact same start and stop position in the genome, although this advice did not come with a proper explanation. It sort of makes sense to me because of the way the library prep is performed: MspI cleaves only at CCGG sites and the fragments are then size-selected, so you will probably end up with many fragments that are identical or very similar, and FastQC might therefore call them duplicates of each other even when they come from different original molecules. This is of course based on my non-exhaustive understanding of these processes.
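To make my reasoning concrete, here is a minimal Python sketch of what I think is happening. Everything in it is an assumption for illustration: the toy sequence, the size-selection limits (`min_len`, `max_len`), and the function names `mspi_fragments` and `dedup_by_position` are made up and are not taken from any real tool or pipeline.

```python
import re

def mspi_fragments(seq, min_len=40, max_len=220):
    """Return (start, end) coordinates of size-selected MspI fragments.

    MspI cuts C^CGG, i.e. one base into each CCGG site.
    The size limits are illustrative values, not a protocol recommendation.
    """
    # Cut position for every CCGG occurrence in the sequence.
    cuts = [m.start() + 1 for m in re.finditer("CCGG", seq)]
    # Keep only fragments flanked by two cut sites (sequence ends are ignored).
    frags = list(zip(cuts[:-1], cuts[1:]))
    return [(s, e) for s, e in frags if min_len <= e - s <= max_len]

def dedup_by_position(reads):
    """Naive duplicate removal: keep one read per (start, end) pair."""
    return set(reads)

# Toy "genome" with three CCGG sites separated by AT filler.
genome = "AT" * 5 + "CCGG" + "AT" * 30 + "CCGG" + "AT" * 10 + "CCGG" + "AT" * 5

frags = mspi_fragments(genome)

# Five independent molecules (e.g. from five different cells) are all cut
# at the same sites, so every one of them yields the same coordinates.
n_molecules = 5
reads = [frag for frag in frags for _ in range(n_molecules)]

kept = dedup_by_position(reads)
print("fragments after size selection:", frags)
print("reads before dedup:", len(reads))
print("reads after position-based dedup:", len(kept))
```

In this toy case the five reads come from five separate molecules, yet position-based removal keeps only one of them, which is the behaviour I suspect the advice against deduplicating RRBS data is about.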
I can't seem to find any good explanation of how to perform proper PCR duplicate removal for RRBS data, if that is indeed called for (which I suspect it is).
Does anyone know how to do this or can anyone point me to where I might find this information?
Thanks in advance!