We all know the formula. However, none of the usual interpretations make sense to me. What does it mean that I get $0.81$ units of information after knowing $X$? Or what does it mean that there are $2.33$ units of uncertainty? To me, it means nothing.
I realized that Shannon entropy is closely tied to the means of encoding the information, therefore I propose the following interpretation:
Entropy is the average frequency of receiving a piece of information that has been encoded as densely as possible with a given encoding.
Here is how I read the formula:
$$H(X) = \sum_x \underbrace{p_x}_{\text{probability of getting information } x}\;\underbrace{\log p_x^{-1}}_{\text{after } p_x^{-1} \text{ other bits in this encoding}}$$
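To check this reading on a concrete case (the distribution and code below are just an example I picked): take $X$ with probabilities $\tfrac12, \tfrac14, \tfrac14$ and the binary prefix code $0, 10, 11$, whose codeword lengths are exactly $\log_2 p_x^{-1} = 1, 2, 2$. Then
$$H(X) = \tfrac12 \cdot 1 + \tfrac14 \cdot 2 + \tfrac14 \cdot 2 = 1.5 \text{ bits},$$
which is exactly the average codeword length of that code.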
Suppose now that we want to use an encoding other than binary, e.g. unary encoding, where distinguishing $p_x^{-1}$ possibilities takes about $p_x^{-1}$ symbols rather than $\log_2 p_x^{-1}$ bits. Then the entropy would simply become:
$$H(X) = \sum_x p_x\, p_x^{-1} = |X|$$
This is kind of obvious when considering unary encoding: each term $p_x\, p_x^{-1}$ equals $1$, so the sum just counts the outcomes. For an arbitrary encoding, I would define a generalized entropy as $$H(X) = \sum_x p_x\, f(p_x^{-1}),$$ where $f$ is the information encoding function, mapping $p_x^{-1}$ to the codeword length of $x$.
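To make this definition concrete, here is a minimal Python sketch of the generalized formula (the function name `generalized_entropy` and the example distribution are my own, just for illustration). With $f = \log_2$ it recovers the usual Shannon entropy, and with $f(t) = t$ it gives $|X|$ as in the unary case above:

```python
import math

def generalized_entropy(probs, f):
    """Proposed generalized entropy: sum_x p_x * f(1 / p_x),
    where f maps 1/p_x to the codeword length of x."""
    return sum(p * f(1.0 / p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.25]

# f = log2: the usual Shannon entropy, 1.5 bits
print(generalized_entropy(probs, math.log2))

# f = identity: sum_x p_x * p_x^{-1} = |X| = 3, the unary case
print(generalized_entropy(probs, lambda t: t))
```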
Finally, my question: is my interpretation correct? Are there any flaws or major mistakes that I can't see?