PHRED scores and sequence errors

Using my very basic probability skills, I will try to explain dsull's comments in a little more detail.

You can easily calculate the probability of the sequence being correct, i.e., no base is wrong, as (probability the base is correct)^(number of bases), assuming errors at different bases are independent. A Q20 base has an error probability of 0.01, so the probability that it is correct is 0.99. So, for example, for a 100bp sequence with all bases Q20, the probability of the sequence being correct is (0.99)^100 = 0.366, or a 36.6% chance of having no errors.

The probability of a sequence "being wrong" is one minus the probability of the sequence being correct. So, for the same example above of a 100bp sequence with all bases Q20, the probability of the sequence being wrong (i.e., containing one or more errors) is 1-0.366 = 0.634, or 63.4% chance of containing at least one error.
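
If it helps, here is a minimal Python sketch of the two calculations above. The function names are my own (hypothetical), and it assumes the usual Phred definition Q = -10*log10(p_error) and that errors at different bases are independent.

```python
import math

def phred_to_error_prob(q):
    """Error probability for a Phred score, assuming Q = -10 * log10(p_error)."""
    return 10 ** (-q / 10)

def prob_no_errors(quals):
    """Probability that every base is correct, assuming independent errors."""
    return math.prod(1 - phred_to_error_prob(q) for q in quals)

quals = [20] * 100               # 100bp sequence, all bases Q20
p_correct = prob_no_errors(quals)
p_wrong = 1 - p_correct          # one or more errors

print(f"P(no errors)          = {p_correct:.3f}")
print(f"P(at least one error) = {p_wrong:.3f}")
```

Because it takes a list of per-base qualities, the same function should also work for sequences where the bases do not all share one Q score.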

Note that there is only one way a sequence can be correct (all bases must be correct), but there are many ways a sequence can be wrong: one base can be wrong, two bases, and so on. The estimate from your question, (0.01)^100, is actually the probability of all bases of the sequence being wrong.
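
To make the "many ways of being wrong" point concrete, here is a rough sketch that splits the probability by the exact number of wrong bases. It assumes every base has the same error probability and that errors are independent (so the error count is binomial). The function name is hypothetical.

```python
from math import comb

def prob_exactly_k_errors(n, p_error, k):
    """Binomial probability of exactly k wrong bases out of n,
    assuming independent errors with the same probability per base."""
    return comb(n, k) * p_error**k * (1 - p_error)**(n - k)

n, p_error = 100, 0.01  # 100bp sequence, all bases Q20

p_zero = prob_exactly_k_errors(n, p_error, 0)  # no base wrong
p_all = prob_exactly_k_errors(n, p_error, n)   # every base wrong, i.e. (0.01)^100
# "Wrong" covers every case from 1 to n errors, so sum them all.
p_at_least_one = sum(prob_exactly_k_errors(n, p_error, k) for k in range(1, n + 1))

print(f"P(0 errors)   = {p_zero:.3f}")
print(f"P(all wrong)  = {p_all:.3e}")
print(f"P(>= 1 error) = {p_at_least_one:.3f}")
```

Summing the cases from 1 to 100 errors gives the same number as one minus the probability of zero errors, while the "all wrong" case is only one vanishingly small term in that sum.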