Given certain assumptions (and foreshadowing an important result mentioned below), entropy is the measure of uncertainty. By the way, when I use the term entropy, I’m referring to Shannon entropy. There are quite a few other entropies, but I think it’s safe to assume that Shannon entropy is the one that is used most frequently in natural language processing and machine learning.

So without further ado, here it is, the entropy formula for an event X with n possible outcomes and probabilities p_1, …, p_n:

H(X) = H(p_1, \dots, p_n) = -\sum_{i=1}^{n} p_i \log_2 p_i

(Shannon entropy. With the base-2 logarithm, entropy is measured in bits; another base only changes the unit.)

Basic properties

If you are anything like me when I first looked at this formula, you might be asking yourself questions such as: Why the logarithm? And (2) Are there any competing constructs that have all of these desirable properties?

In short, the answers for Shannon entropy as a measure of uncertainty are: (1) many and (2) no.

Let’s proceed with a wish list.

Basic property 1: Uniform distributions have maximum uncertainty

If your goal is to minimize uncertainty, stay away from uniform probability distributions.

Quick reminder: A probability distribution is a function that assigns a probability to every possible outcome such that the probabilities add up to 1. A distribution is uniform when all of the outcomes have the same probability. For example, fair coins (50% heads, 50% tails) and fair dice (1/6 probability for each of the six faces) follow uniform distributions.

Uniform distributions have maximum entropy for a given number of outcomes.

A good measure of uncertainty achieves its highest values for uniform distributions. Given n possible outcomes, entropy is maximized by equiprobable outcomes:

p_1 = p_2 = \dots = p_n = 1/n

(Equiprobable outcomes)

Here is the plot of the entropy function as applied to Bernoulli trials (events with two possible outcomes and probabilities p and 1-p):

[Figure: In the case of Bernoulli trials, entropy reaches its maximum value for p = 0.5.]

Basic property 2: Uncertainty is additive for independent events

Let A and B be independent events. The corresponding
probabilities are given by [ 0.48, 0.32, 0.12, 0.08 ]. These four values are the pairwise products of the marginal probabilities (0.8, 0.2) and (0.6, 0.4) of the two events, which is what independence requires.

[Figure: The joint entropy (green) for the two independent events is equal to the sum of the individual entropies (red and blue).]

Plugging the numbers into the entropy formula, we see that:

H(0.48, 0.32, 0.12, 0.08) = H(0.8, 0.2) + H(0.6, 0.4)

Just as promised.

Basic property 3: Adding an outcome with zero probability has no effect

Suppose (a) you win whenever outcome #1 occurs and (b) you can choose between two probability distributions, A and B. Distribution A has two outcomes with probabilities 80% and 20%. Distribution B has three outcomes with probabilities 80%, 20% and 0%.

Adding a third outcome with zero probability doesn’t make a difference.

Given the options A and B, which one would you choose? It doesn’t matter.

The entropy formula agrees with this assessment:

H(p_1, \dots, p_n) = H(p_1, \dots, p_n, 0)

(Adding a zero-probability outcome has no effect on entropy.)

In words, adding an outcome with zero probability has no effect on the measurement of uncertainty.

Basic property 4: The measure of uncertainty is continuous in all its arguments

The last of the basic properties is continuity. Famously, the intuitive explanation of a continuous function is that there are no “gaps” or “holes”.

A if you are a profit maximizer and B if you prefer more variety and uncertainty.

As the number of equiprobable outcomes increases, so should our measure of uncertainty. And this is exactly what entropy does: H(1/6, 1/6, 1/6, 1/6, 1/6, 1/6) > H(0.5, 0.5).

And, in general, if we let L(k) be the entropy of a uniform distribution with k possible outcomes, we have

L(m) > L(n)

for m > n.

Property 6: Events have non-negative uncertainty

Do you know what negative uncertainty is? Entropy is, thus, non-negative for every possible input.

Property 7: Events with a certain outcome have zero uncertainty

Suppose you are in possession of a magical coin. The result of the entropy function is always the same.

Summary

To recap, Shannon entropy is a measure of uncertainty. It is widely used because it satisfies certain criteria (and because life is full of uncertainty). Shannon entropy is the natural choice among this family. In addition to
other facts, entropy is maximal for uniform distributions (property #1), additive for independent events (#2), increasing in the number of outcomes with non-zero probabilities (#3 and #5), continuous (#4), non-negative (#6), zero for certain outcomes (#7) and permutation-invariant (#8).

Thank you for reading!
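
A postscript for readers who like to check things numerically: the properties recapped above can be spot-checked with a few lines of Python. This is only a minimal sketch under stated assumptions, not code from the article. The helper name `entropy`, its `base` parameter, the base-2 default (bits) and the variable names `event_a`, `event_b` and `joint` are all chosen here for illustration. The marginals 0.8/0.2 and 0.6/0.4 are the ones implied by the joint probabilities [ 0.48, 0.32, 0.12, 0.08 ].

```python
import math


def entropy(probs, base=2.0):
    """Shannon entropy of a discrete distribution (hypothetical helper).

    `base` sets the unit; base 2 (bits) is an assumption made for this sketch.
    """
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p == 0 are skipped, since p * log(p) tends to 0 as p -> 0.
    return sum(-p * math.log(p, base) for p in probs if p > 0)


event_a = [0.8, 0.2]
event_b = [0.6, 0.4]
# Joint distribution of two independent events: pairwise products.
joint = [a * b for a in event_a for b in event_b]

# 1: a uniform distribution beats a skewed one with the same number of outcomes.
assert entropy([0.5, 0.5]) > entropy(event_a)
# 2: entropy is additive for independent events.
assert math.isclose(entropy(joint), entropy(event_a) + entropy(event_b))
# 3: adding a zero-probability outcome changes nothing.
assert math.isclose(entropy([0.8, 0.2, 0.0]), entropy(event_a))
# 5: more equiprobable outcomes mean more uncertainty.
assert entropy([1 / 6] * 6) > entropy([0.5, 0.5])
# 6: entropy is non-negative.
assert entropy(event_b) >= 0
# 7: a certain outcome has zero uncertainty.
assert entropy([1.0]) == 0
# 8: reordering the outcomes does not matter.
assert math.isclose(entropy([0.2, 0.8]), entropy(event_a))
```

These assertions only test a handful of example distributions, so they illustrate the properties rather than prove them. Continuity (#4) is not checked here at all.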