
I am reading the concept of entropy from Peter Walters' An Introduction to Ergodic Theory and I am having trouble understanding the notion of the entropy of a measure preserving transformation.

Definitions:

Let $(X, \mathcal{F}, \mu)$ be a probability space. For a partition $\xi=\{A_1 , \ldots, A_m\}$ of $X$ (where each $A_i$ is measurable) the entropy of $\xi$ is defined as:

$$ H(\xi) = -\sum_{i=1}^m \mu(A_i)\log(\mu(A_i)) $$

If $T:X\to X$ is a measure preserving transformation, we write $T^{-1}\xi$ to denote the partition $\{T^{-1}(A_i) :\ 1\leq i\leq m\}$. Since $T$ is measure preserving, $\mu(T^{-1}A_i)=\mu(A_i)$ for each $i$, and hence $H(T^{-1}\xi)=H(\xi)$.
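For concreteness, here is a small numerical sketch I wrote (not from Walters; the function name is my own) that evaluates $H(\xi)$ from the list of measures $\mu(A_i)$, with the usual convention $0\log 0 = 0$:

```python
import math

def partition_entropy(masses, base=math.e):
    """H(xi) = -sum mu(A_i) * log mu(A_i) for a finite partition, given the
    list of measures mu(A_i); terms with mu(A_i) = 0 contribute 0."""
    return -sum(p * math.log(p, base) for p in masses if p > 0)

# A partition of X into four pieces of equal measure has entropy log 4.
print(partition_entropy([0.25, 0.25, 0.25, 0.25]))  # ~1.386 = log 4
```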

Now the entropy of a measure preserving transformation $T:X\to X$ with respect to $\xi$ is defined as (see Def. 4.9 in the aforementioned text)

$$ h(T, \xi) = \lim_{n\to \infty} \frac{1}{n} H\left(\bigvee_{i=0}^{n-1} T^{-i}\xi\right), $$

where $\bigvee_{i=0}^{n-1} T^{-i}\xi$ is the coarsest common refinement of the partitions $\xi, T^{-1}\xi, \ldots, T^{-(n-1)}\xi$.
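As a sanity check on this definition (a toy case I worked out myself, not from the book): if $T$ is the identity map, then every $T^{-i}\xi$ equals $\xi$, so the join never grows and

$$ h(\mathrm{id}, \xi) = \lim_{n\to \infty} \frac{1}{n} H\left(\bigvee_{i=0}^{n-1} \xi\right) = \lim_{n\to \infty} \frac{H(\xi)}{n} = 0. $$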

The Problem:

Just after giving the definition, the author writes

This means that if we think of an application of $T$ as a passage of one day of time, then $\bigvee_{i=0}^{n-1}T^{-i}\xi$ represents the combined experiment of performing the original experiment, represented by $\xi$, on $n$ consecutive days. Then $h(T, \xi)$ is the average information per day that one gets from performing the original experiment.

I do not entirely follow this. If the application of $T$ is the passage of one day, that is, it takes us one day into the future, why is the expression $\bigvee_{i=0}^{n-1} T^{-i}\xi$ the combined experiment (and what is the intuitive meaning of 'combined experiment'?) for the next $n$ days? We are taking preimages under $T$ in this expression, not forward images.

At any rate, I do not have any intuition for the last definition presented above. Can someone please give some insight?

Thanks.

Alp Uzman

2 Answers


Ah, the use of backward images in ergodic theory, an unending source of confusion for learners...

By definition, the set $T^{-n} A$ is $\{x \in X: \ T^n (x) \in A\}$, so really, it is about the forward orbit of the system!

Now, fix a partition $\xi$. An element $[a]_n \in \bigvee_{k=0}^{n-1} T^{-k} \xi$ is a subset of the form $a_0 \cap T^{-1} a_1 \cap \ldots \cap T^{-(n-1)} a_{n-1}$, where each $a_i$ belongs to $\xi$. In other words, knowing that a point $x$ belongs to $[a]_n$ means knowing that $x \in a_0$, $T(x) \in a_1$, $\ldots$, $T^{n-1}(x) \in a_{n-1}$.

If the result of your experiment has finitely many possible values, let the partition $\xi$ be generated by these values; then knowing $\bigvee_{k=0}^{n-1} T^{-k} \xi$ means knowing the result of the experiment until day $n-1$.
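To make "knowing the result of the experiment until day $n-1$" concrete, here is a small sketch of mine (the doubling map and the two-interval partition are my own illustration, not taken from the answer) that records which element of the partition the orbit visits on each day; this list of labels is exactly the datum of which element of $\bigvee_{k=0}^{n-1} T^{-k}\xi$ contains $x$:

```python
def doubling_map(x):
    # T(x) = 2x mod 1 on [0, 1); Lebesgue measure is T-invariant.
    return (2 * x) % 1.0

def itinerary(x, n):
    """Labels a_0, ..., a_{n-1} with T^k(x) in A_{a_k}, for the partition
    A_0 = [0, 1/2), A_1 = [1/2, 1)."""
    labels = []
    for _ in range(n):
        labels.append(0 if x < 0.5 else 1)
        x = doubling_map(x)
    return labels

print(itinerary(0.3, 8))  # the first 8 "daily results" for the point x = 0.3
```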

The entropy $h(T, \xi)$ is then the exponential growth rate, in $n$, of the number of possible results up to time $n$ (weighted by their probabilities). As for the mention of "average information", you could look up Shannon entropy in a separate reference -- it explains this formulation, and I don't think I have seen this subject dealt with satisfactorily in an ergodic theory book.

D. Thomine

I first read Walters's book and was similarly confused. I think "Ergodic Theory and Dynamical Systems" by Yves Coudene explains everything much better. I will try to explain it below, but if what I say doesn't click with you, I'd definitely recommend Coudene's book.

You have some point $x$ in your space $X$ and a given partition $\xi$ of $X$. You want to know which element of the partition $\xi$ the point $x$ is in. I first tell you which element of $T^{-1}\xi$ contains $x$. Then I tell you which element of $T^{-2}\xi$ contains $x$, which is equivalent to telling you which element of $T^{-1}\xi \vee T^{-2}\xi$ contains $x$ (since we already know which part of $T^{-1}\xi$ contains $x$). Etc. I think this is the "experiment" that's going on: on each day we test where $x$ lands in the next preimage of $\xi$ under $T$, and we view the new knowledge of $T^{-1}\xi \vee \dots \vee T^{-n}\xi \vee T^{-(n+1)}\xi$, compared to $T^{-1}\xi \vee \dots \vee T^{-n}\xi$, as gaining some information on where $x$ is in $\xi$.

One can write

$$ \frac{1}{n}H\left(\bigvee_{i=0}^{n-1} T^{-i}\xi\right) = \frac{1}{n}\Big[H(\xi)+H(\xi \mid T^{-1}\xi) + \dots + H(\xi \mid T^{-1}\xi \vee \dots \vee T^{-(n-1)}\xi)\Big], $$

which shows the "average information gained" intuition more clearly. Moreover, the sequence $H(\xi \mid T^{-1}\xi \vee \dots \vee T^{-n}\xi)$ is decreasing, so the entropy is also $\lim_n H(\xi \mid T^{-1}\xi \vee \dots \vee T^{-n}\xi)$, which is the formula I like best, since it very simply expresses the idea that we want to see how easy it is to guess which element of $\xi$ contains $x$ from knowledge of $x$'s position in the preimages of $\xi$ under $T$.
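For completeness, the first step of that displayed identity is just the chain rule $H(\alpha \vee \beta) = H(\beta) + H(\alpha \mid \beta)$ together with $H(T^{-1}\eta) = H(\eta)$ (standard properties of conditional entropy); iterating it produces the sum above:

$$ H\left(\bigvee_{i=0}^{n-1} T^{-i}\xi\right) = H\left(\bigvee_{i=1}^{n-1} T^{-i}\xi\right) + H\left(\xi \,\Big|\, \bigvee_{i=1}^{n-1} T^{-i}\xi\right) = H\left(\bigvee_{i=0}^{n-2} T^{-i}\xi\right) + H\left(\xi \,\Big|\, \bigvee_{i=1}^{n-1} T^{-i}\xi\right). $$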

The example that best cemented this in my mind was a Bernoulli shift. Say $X = \{0,1\}^\mathbb{N}$ with the $(\frac{1}{2},\frac{1}{2})$ product measure and $T$ the left shift. If $\xi = \{\{(x_n)_n \in X : x_1 = 0\}, \{(x_n)_n \in X : x_1 = 1\}\}$, then learning where a given $x$ lies in $T^{-j}\xi$ is just learning the $(j+1)$-th coordinate of $x$. As this gives no information about the first coordinate of $x$, the entropy of $T$ w.r.t. $\xi$ is just $\log 2$, the entropy of $\xi$ itself. This intuition also explains why an invertible transformation with a one-sided generator has entropy $0$: knowing which elements of $T^{-1}\xi, T^{-2}\xi, \dots$ a given $x$ is in allows us to completely determine $x$, and in particular to know $x$'s position in $\xi$ (invertibility is used to show that $\mathcal{B} = T^{-1}\mathcal{B} = T^{-1}(\xi \vee T^{-1}\xi \vee \dots) = T^{-1}\xi \vee T^{-2}\xi \vee \dots$).
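As a numerical companion to this example (my own sketch; the helper name is hypothetical), one can check directly that $\frac{1}{n} H\big(\bigvee_{i=0}^{n-1}T^{-i}\xi\big) = \log 2$ for every $n$, since the join consists of the $2^n$ cylinders fixing the first $n$ coordinates, each of measure $2^{-n}$:

```python
import math
from itertools import product

def joint_entropy_fair_coin(n):
    """Entropy of the join of T^{-k} xi, k = 0..n-1, for the (1/2, 1/2)
    Bernoulli shift: the join's elements are the 2^n cylinder sets fixing
    the first n coordinates, each of measure 2^{-n}."""
    p = 0.5 ** n
    return -sum(p * math.log(p) for _ in product((0, 1), repeat=n))

for n in (1, 5, 10):
    print(n, joint_entropy_fair_coin(n) / n)  # each ratio is ~0.693 = log 2
```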

I know I went on a long rant, but here are some takeaways. First, know all the different expressions for the entropy; different expressions bring out different aspects of its meaning. Also, we use preimages of $\xi$ (not forward images) simply because measure preservation is stated in terms of preimages ($\mu(T^{-1}A) = \mu(A)$), so everything is easier; it's not too important.

mathworker21
  • Thank you for the detailed answer. Also, the book you have suggested seems quite useful. I will use it as a complement to Walters's book. – caffeinemachine Jul 14 '18 at 07:37