
We all know the formula. However, none of the usual interpretations make sense to me. What does it mean that I get $0.81$ bits of information after knowing $X$? Or what does it mean that there is an uncertainty of $2.33$? To me it means nothing.

I realized that Shannon entropy is closely connected to the means of encoding the information, so I propose the following interpretation:

Entropy is the average frequency of receiving information that was encoded as densely as possible with a given encoding.

Here is how I would see the formula:

$$H(X) = \sum_x \underbrace{p_x}_{\text{probability of getting information } x}\ \underbrace{\log p_x^{-1}}_{\text{after } p_x^{-1} \text{ other bits in this encoding}}$$
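As a concrete check of this term-by-term reading, here is a minimal Python sketch (the distribution `p` is a made-up example, not from the question) that evaluates each $p_x \log_2 p_x^{-1}$ contribution separately and sums them:

```python
import math

# Made-up example distribution over four symbols.
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Per-symbol contribution: probability times code length log2(1/p_x) in bits.
terms = {x: px * math.log2(1.0 / px) for x, px in p.items()}

print(terms)                # {'a': 0.5, 'b': 0.5, 'c': 0.375, 'd': 0.375}
print(sum(terms.values()))  # 1.75 bits
```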

Suppose now that we want to use an encoding other than binary, e.g. a unary encoding. Then the entropy would simply be:

$$H(X) = \sum_xp_xp_x^{-1} = |X|$$

This is kind of obvious when you consider unary encoding. For any encoding, I would define a generalized entropy as $$H(X) = \sum_x p_x f(p_x^{-1}),$$ where $f$ is the encoding-length function.
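To illustrate this generalized entropy, here is a minimal Python sketch (the function name `generalized_entropy` and the example distribution are my own, purely for illustration): with $f = \log_2$ it recovers the usual binary entropy, and with $f(n) = n$ (unary code lengths) every distribution yields $|X|$, as claimed above.

```python
import math

def generalized_entropy(p, f):
    """Sum of p_x * f(1/p_x), where f maps 1/p_x to a code length."""
    return sum(px * f(1.0 / px) for px in p if px > 0)

p = [0.5, 0.25, 0.25]  # illustrative distribution over |X| = 3 symbols

print(generalized_entropy(p, math.log2))    # 1.5  (binary encoding, bits)
print(generalized_entropy(p, lambda n: n))  # 3.0  == |X| (unary encoding)
```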

Finally, my question: is my interpretation correct? Are there any flaws or major mistakes that I can't see?

1 Answer


This question asks more for an opinion than an actual answer. I have always thought of entropy as measuring the opposite of how much knowledge a model has: high knowledge = low entropy. For example,

Model_1 = a machine that can predict the outcome of a coin flip with 99% accuracy.

Model_2 = I randomly guess whether the coin comes up heads.

The two models then correspond to the probability distributions $\langle 0.99, 0.01 \rangle$ and $\langle 0.5, 0.5 \rangle$. The first has entropy $0.056$ nats (low entropy) and the second has entropy $0.693$ nats (high entropy). In this way, "very certain", "very knowledgeable", "very good at understanding", "highly compressible", and "requires few explanatory variables" all mean the same thing: "low entropy". Whereas "very uncertain", "not good at predicting what is happening", "performs no better than random guessing", "very hard to compress", and "requires a large number of variables to explain the data" all mean the same thing: "high entropy".
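As a quick check of those two numbers (they are natural-log entropies, i.e. in nats, which is why the fair coin gives $\ln 2 \approx 0.693$), a short Python sketch:

```python
import math

def entropy_nats(p):
    """Shannon entropy in nats: -sum(p_i * ln p_i), skipping zero probabilities."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

print(round(entropy_nats([0.99, 0.01]), 3))  # 0.056  (Model_1: low entropy)
print(round(entropy_nats([0.5, 0.5]), 3))    # 0.693  (Model_2: high entropy, ln 2)
```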

In physics the same sort of reasoning applies. If a room containing two gases has them perfectly separated into the left and right halves, then the state of the gases has lower entropy (with respect to the information presented). Likewise, when the two gases are mixed at random, the state has higher entropy.

Bobby Ocean