NOTE: Shannon's entropy measures the information we receive when uncertainty is removed from situations described probabilistically. Intuitively, this is the "amount of surprise" we experience when uncovering the correct symbol. When there are no distinguishing characteristics among the alternatives, the uncertainty is best captured by Hartley's measure on the set of alternatives---which yields the same value as the Shannon entropy for a uniform probability distribution, in which all n alternatives have the same probability 1/n. Thus the Hartley measure is always an upper bound on the Shannon entropy. Intuitively, this means that when the underlying probability distribution is not uniform, uncovering the correct symbol yields less surprise than when all alternatives are equally likely (or have no distinguishing characteristics). For example, consider two symbols {a, b} with probability distribution (5/6, 1/6). Shannon's entropy for this distribution is about 0.65 bits, whereas the Hartley measure of two alternatives is 1 bit---the same as the Shannon entropy would be if the probability distribution were (1/2, 1/2). So Hartley's measure tells us that we receive 1 bit of information when we discover the correct symbol (we need one yes-no question to settle it). Shannon's entropy, in contrast, tells us that we receive only about 0.65 bits of information---we are less surprised, because we know that symbol a is much more likely than symbol b. This means that if we were to guess the correct symbol repeatedly, knowing the distribution, on average we would guess correctly more often than if we did not know the distribution. In other words, on average, we would need less than one yes-no question.
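The two quantities compared above can be checked with a short calculation. The sketch below (the function names are my own) computes the Shannon entropy H = -sum(p * log2(p)) and the Hartley measure log2(n), confirming the ~0.65-bit and 1-bit values for the example distribution:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over nonzero p."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def hartley_measure(n):
    """Hartley measure in bits for n indistinguishable alternatives: log2(n)."""
    return math.log2(n)

# Two symbols {a, b} with distribution (5/6, 1/6), as in the note.
print(round(shannon_entropy([5/6, 1/6]), 2))  # -> 0.65 bits
print(hartley_measure(2))                     # -> 1.0 bit

# Shannon entropy reaches its maximum---and matches the Hartley
# measure---exactly when the distribution is uniform:
print(shannon_entropy([1/2, 1/2]))            # -> 1.0 bit
```

Skewing the distribution further (say, (0.99, 0.01)) drives the entropy toward 0, illustrating that surprise vanishes as the outcome becomes certain.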