Entropy

Entropy is a function that maps a discrete probability distribution to a non-negative real number.

A discrete probability distribution is a set of probabilities $p_1, \dots, p_K$, summing to 1, where $p_i$ is the probability of observing event $i$.

We define the entropy of the distribution as $H(p_1, \dots, p_K) = - \sum_{i = 1}^K p_i \log_2 (p_i)$.
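As a minimal sketch of this definition in Python (the helper name `entropy` is our own, not from any particular library):

```python
import math

def entropy(probs):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    # Terms with p_i = 0 contribute nothing, since p * log2(p) -> 0 as p -> 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0
print(entropy([1.0]))       # 0.0
```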

In the case of $K = 2$, the plot of entropy against $p_1$ looks like an inverted U: it is 0 at $p_1 = 0$ and $p_1 = 1$, and attains its maximum of 1 when $p_1 = 0.5$.

In general, entropy attains its maximum value of $\log_2(K)$ when we set all $p_i = \frac{1}{K}$.
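A quick numerical check of this claim, reusing the `entropy` sketch above:

```python
import math

for K in (2, 4, 8):
    print(entropy([1 / K] * K), math.log2(K))  # the pairs match: 1.0, 2.0, 3.0
```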

The closer entropy is to 0, the closer the distribution is to putting all of the probability mass on one event. The closer entropy is to $\log_2(K)$, the closer the distribution is to a uniform distribution over events, or a "flat prior".

Suppose we have just two events, A and B (i.e. $K = 2$), and we sample a sequence of $N$ events, where at each step we see A with probability $p$ and B with probability $1 - p$.
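For concreteness, one way to simulate this sampling process (a hypothetical helper, not from the text):

```python
import random

def sample_sequence(N, p):
    """Draw a length-N sequence: each position is 'A' with probability p, else 'B'."""
    return ''.join('A' if random.random() < p else 'B' for _ in range(N))

print(sample_sequence(20, 0.3))  # e.g. 'BBABBABBBBABBBABBBBB' (output is random)
```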

There are $2^N$ possible sequences we could observe. However, many of these sequences are rare. In particular, for very large $N$, the event A will occur about $N \cdot p$ times in a sequence. The number of sequences where A occurs exactly $N \cdot p$ times is the number of ways to choose $N \cdot p$ positions out of the $N$ total positions, i.e. $\binom{N}{Np}$, which by Stirling's formula is about $2^{N \cdot H(p, 1 - p)}$.

In other words, entropy determines the size of the set of "typical" sequences for a probability distribution. If entropy is 0, there is just 1 typical sequence, because the same event happens over and over again. If entropy is 1 (its maximum for $K = 2$), the number of typical sequences is equal to the total number $2^N$ of possible sequences.
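To see the approximation at work, the sketch below (reusing the `entropy` helper from earlier) compares the exact count $\binom{N}{Np}$ with the estimate $2^{N \cdot H(p, 1 - p)}$; the base-2 logarithms of the two agree up to an $O(\log N)$ correction:

```python
import math

N, p = 1000, 0.3
k = round(N * p)                           # A occurs about N*p times
exact_bits = math.log2(math.comb(N, k))    # log2 of the exact number of such sequences
approx_bits = N * entropy([p, 1 - p])      # Stirling-based estimate N * H(p, 1-p)
print(exact_bits, approx_bits)             # roughly 876.1 vs 881.3
```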
