This article introduces two kinds of probability: variability and uncertainty.

Variability comes from inherent randomness in a system. Flipping a coin or rolling dice involves variability from one throw to the next: each individual outcome is effectively unpredictable, even if you've collected good statistics from many previous throws.

Uncertainty comes from incomplete knowledge about the system. If it weren't for the unfortunate accident that bent this coin, you would expect it to have a 50% chance of landing heads. But now you have some uncertainty about the actual probability. It might be 40%, it might be 60%, it might even be 5%.

Let's say that we have flipped the coin seven times, and we got five heads and two tails.

This graph shows the probability distribution (our uncertainty) over the coin's actual probability (its variability) of coming up heads. You can change the number of heads and tails to see how that changes the graph.

Go on, try it. Put in zero heads and zero tails - the distribution is flat because there is no evidence: the probability could be anything. Put in three heads and zero tails - the most likely probability is 100%, but the distribution is quite spread out because there's not much evidence yet. With 50 heads and 50 tails the most likely probability is 50%, and the peak is narrow because with that much evidence on both sides it is very unlikely that the coin's probability is really 5% or 95%. (But not impossible - a higher resolution graph would show the curve only touches zero at 0% and 100%.)
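If you'd like to play with the same curve in code, here is a small sketch of the distribution the graph is plotting. With a flat prior, observing `heads` and `tails` gives a Beta(heads + 1, tails + 1) distribution over the coin's probability of heads (the function name `beta_pdf` is just for this example):

```python
from math import gamma

def beta_pdf(p, heads, tails):
    """Density at p of Beta(heads + 1, tails + 1): the distribution over the
    coin's probability of heads, starting from a flat (uniform) prior."""
    a, b = heads + 1, tails + 1
    norm = gamma(a + b) / (gamma(a) * gamma(b))
    return norm * p ** (a - 1) * (1 - p) ** (b - 1)

# Zero heads, zero tails: flat - every probability is equally likely.
print(beta_pdf(0.2, 0, 0), beta_pdf(0.8, 0, 0))  # both 1.0

# Five heads, two tails: the density peaks at the observed frequency, 5/7.
peak = max((i / 100 for i in range(1, 100)),
           key=lambda p: beta_pdf(p, 5, 2))
print(peak)  # close to 5/7 ≈ 0.714

# 50 heads and 50 tails: a much taller, narrower peak at 0.5,
# and the density at 0.95 is vanishingly small - but not zero.
print(beta_pdf(0.5, 50, 50), beta_pdf(0.95, 50, 50))
```

The counts go in, and the whole shape of the curve - peak position and spread - falls out of those two numbers.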

We always have incomplete knowledge - and so there is always both variability and uncertainty. In order to make a rational prediction, you must take into account the amount of evidence you have for each outcome. Every prediction involves some uncertainty. Every time you throw the coin, you collect a little more evidence.

The Beta distribution, illustrated above, is what the Blerpl learning algorithm uses to model the binary independent causes it discovers as it builds its model of the world.
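To make that concrete, here is a hypothetical sketch (not Blerpl's actual code) of how a learner might track one binary cause with a Beta distribution: keep two counts, bump them as evidence arrives, and predict with the mean of Beta(on + 1, off + 1). The class and method names are invented for this example:

```python
class BinaryCause:
    """Tracks a single binary cause as a pair of evidence counts,
    which together define a Beta(on + 1, off + 1) distribution."""

    def __init__(self):
        self.on = 0   # times the cause was observed active
        self.off = 0  # times the cause was observed inactive

    def observe(self, active):
        """Record one more piece of evidence."""
        if active:
            self.on += 1
        else:
            self.off += 1

    def predict(self):
        """Expected probability the cause is active:
        the mean of Beta(on + 1, off + 1)."""
        return (self.on + 1) / (self.on + self.off + 2)

cause = BinaryCause()
print(cause.predict())  # 0.5 - no evidence, flat prior
for bit in [1, 1, 1, 0, 1]:
    cause.observe(bit)
print(cause.predict())  # (4 + 1) / (5 + 2) = 5/7 ≈ 0.714
```

Each observation nudges the prediction toward the observed frequency, and the more evidence accumulates, the less any single new observation can move it.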