Mensur Dlakic, 1 hour ago


K-mers can be in low abundance because they genuinely occur rarely in the genome and, on top of that, happened not to be sequenced many times. A more likely explanation is that rare k-mers come from sequencing errors. You can probably find a formal statistical argument for that by Googling, but the intuition is simple: a sequencing error at a given position is unlikely to be repeated in other reads, so the k-mers overlapping it tend to show up only once or twice, whereas real k-mers are seen roughly in proportion to the sequencing coverage.
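To make the intuition concrete, here is a minimal sketch on made-up data (the reads, the value of k, and the helper name `count_kmers` are all assumptions for illustration, not taken from any real dataset or tool). It counts k-mers in 20 identical error-free reads plus one read carrying a single substitution, and shows that the one error produces k new k-mers, each seen exactly once.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer (forward strand only) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

# Toy data (hypothetical): 20 error-free copies of one read,
# plus one copy with a single substitution (A -> T at 0-based position 7).
k = 5
true_read = "ACGTTGCAAGGCTA"
error_read = "ACGTTGCTAGGCTA"
reads = [true_read] * 20 + [error_read]

counts = count_kmers(reads, k)

# K-mers seen exactly once: here they all overlap the erroneous base.
singletons = sorted(kmer for kmer, n in counts.items() if n == 1)
print(len(singletons), singletons)
```

In this toy case every singleton k-mer comes from the error, while every real k-mer is seen 20 or 21 times. Real data are messier (uneven coverage, repeats, both strands), but the same separation is what a cutoff tries to exploit.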

Cutoffs are chosen to exclude as many error-derived k-mers as possible without throwing away reads that contain truly rare k-mers. The exact number is determined from the k-mer abundance distribution and the overall sequencing coverage: typically it sits in the dip between the spike of low-abundance (mostly erroneous) k-mers and the main peak near the expected coverage.
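One simple way to pick such a cutoff from the distribution is to look for the first dip in the k-mer abundance histogram. The sketch below is only an illustration under stated assumptions: the function names `abundance_histogram` and `valley_cutoff` and the example numbers are hypothetical, and real histograms are noisy enough that actual tools may smooth the data or fit a model instead.

```python
from collections import Counter

def abundance_histogram(kmer_counts):
    """Map each abundance (times a k-mer was seen) to the number of distinct k-mers with it."""
    return Counter(kmer_counts.values())

def valley_cutoff(histogram):
    """Return the abundance at the first local minimum of the histogram.

    Returns None if the histogram is empty or never rises again,
    in which case no cutoff can be inferred this way.
    """
    if not histogram:
        return None
    for a in range(2, max(histogram)):
        here = histogram.get(a, 0)
        if here <= histogram.get(a - 1, 0) and here < histogram.get(a + 1, 0):
            return a
    return None

# Hypothetical histogram: a spike of rare k-mers, a dip, then the rise
# toward the main coverage peak.
example_hist = {1: 5000, 2: 900, 3: 150, 4: 60, 5: 90, 6: 200, 7: 400, 8: 350}
cutoff = valley_cutoff(example_hist)
print(cutoff)
```

With these made-up numbers the dip is at abundance 4, so k-mers seen fewer than about 4 times would be treated as likely errors. Whether to trust the reads containing them depends on coverage: at low coverage the error spike and the real peak overlap and no clean dip may exist.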

This paper may help:
