Is there a model (or a reason) why reads from a low-abundance genome cover only a few portions of the reference genome?
I am doing WGS on viral sequences; the mean coverage was 25×, and for endogenous retroviruses I get this kind of coverage:
But for others, such as HHV7, I only have sparse reads with low coverage:
The low coverage I can understand: there is only a small amount of viral DNA, so the probability of sequencing it is low.
What I am less comfortable with is the fact that only a portion of the genome is covered.
Is this normal? Is there a model or a paper describing this phenomenon? Are there factors that favor sequencing of certain portions of a low-abundance genome over others?
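For reference, the only baseline I know of is the uniform-random (Lander-Waterman, Poisson) model: if reads land uniformly at random, the expected fraction of the genome covered at least once is 1 - exp(-c), where c is the mean depth (reads × read length / genome length). Below is a minimal sketch of that calculation. The function name and the read count, read length, and genome length are illustrative placeholders, not values from my data, and edge effects are ignored.

```python
import math


def expected_covered_fraction(n_reads: int, read_len: int, genome_len: int) -> float:
    """Expected fraction of positions covered by at least one read,
    assuming reads are placed uniformly at random (Poisson approximation)."""
    # Mean depth c = total sequenced bases / genome length.
    depth = n_reads * read_len / genome_len
    # P(a given position has zero reads) = exp(-c) under the Poisson model.
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    # Hypothetical numbers: 50 reads of 150 bp on a ~150 kb genome (assumed size).
    frac = expected_covered_fraction(n_reads=50, read_len=150, genome_len=150_000)
    print(f"expected covered fraction: {frac:.3f}")
```

Under this model, patchy coverage is expected at very low depth, but the covered patches should fall at random positions; it does not by itself explain why specific regions would be favored.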