Entropy is a function that maps a discrete probability distribution to a real number.
A discrete probability distribution over $K$ events is a set of probabilities $p_1, \dots, p_K$ that sum to 1, where $p_i$ is the probability of observing event $i$.
We define the entropy of the distribution as $H(p_1, \dots, p_K) = - \sum_{i = 1}^K p_i \log_2 (p_i)$.
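To make the definition concrete, here is a minimal Python sketch of this formula (the helper name `entropy` is just a choice for this example); terms with $p_i = 0$ are skipped, following the usual convention that $0 \cdot \log_2 0 = 0$.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution given as probabilities summing to 1."""
    # Skip zero-probability terms: p * log2(p) -> 0 as p -> 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))         # 1.0 bit
print(entropy([0.9, 0.05, 0.05]))  # about 0.569 bits
```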
In the case of $K = 2$, the distribution is determined by $p_1$ alone (since $p_2 = 1 - p_1$), and the plot of entropy against $p_1$ looks like an inverted U: it has a minimum of 0 at $p_1 = 0$ and $p_1 = 1$, and a maximum of 1 at $p_1 = 0.5$.
In general, entropy attains its maximum value of $\log_2(K)$ when we set all $p_i = \frac{1}{K}$.
The closer entropy is to 0, the closer the probability distribution is to putting all the probability mass on one of the events. The closer entropy is to $\log_2(K)$, the closer the distribution is to a uniform distribution over events, or a "flat prior".
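Both extremes can be checked numerically with the same kind of sketch: putting all the mass on one event gives entropy 0, while the uniform distribution over $K$ events gives $\log_2(K)$.

```python
import math

def entropy(probs):
    # Same sketch as above, repeated so this snippet runs on its own.
    return -sum(p * math.log2(p) for p in probs if p > 0)

K = 8
print(entropy([1.0] + [0.0] * (K - 1)))  # 0.0 (printed as -0.0): all mass on one event
print(entropy([1.0 / K] * K))            # 3.0: uniform over K = 8 events
print(math.log2(K))                      # 3.0, the maximum log2(K)
```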
Suppose we have just two events, A and B, i.e. $K = 2$, and we sample a sequence of $N$ events where at each step in the sequence we see A with probability $p$ and B with probability $1 - p$.
There are $2^N$ possible sequences we could observe. However, most of these sequences are extremely unlikely. In particular, for very large $N$, the event A will occur about $N \cdot p$ times in a sequence. The number of sequences in which A occurs $N \cdot p$ times is just the number of ways to choose $N \cdot p$ positions out of the $N$ total positions in a sequence, $\binom{N}{N \cdot p}$, which, by Stirling's approximation, is about $2^{N \cdot H(p, 1 - p)}$. In other words, entropy determines the size of the subset of "typical" sequences for a probability distribution. If entropy is 0, there is just 1 typical sequence, because the same event happens over and over again. If entropy is 1 (its maximum when $K = 2$), the number of typical sequences is $2^N$, the total number of possible sequences.
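To make the counting argument concrete, here is a small Python check (the particular values of $N$ and $p$ are arbitrary choices for illustration) that compares $\log_2 \binom{N}{N \cdot p}$ against the estimate $N \cdot H(p, 1 - p)$:

```python
import math

def binary_entropy(p):
    """H(p, 1 - p) in bits."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p)) if 0 < p < 1 else 0.0

N, p = 1000, 0.3
k = round(N * p)                    # A occurs about N*p = 300 times in a typical sequence
exact = math.log2(math.comb(N, k))  # log2 of the exact number of such sequences, about 876
approx = N * binary_entropy(p)      # Stirling-based estimate N * H(p, 1 - p), about 881
print(exact, approx)                # the gap is only a few bits out of nearly 900
```

The gap between the exact count and the estimate grows only like $\log_2 N$, so on the scale of $N$ bits the number of typical sequences is essentially $2^{N \cdot H(p, 1 - p)}$.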