
If we define the characteristic function for a random variable X as

$\Phi(t)=\langle e^{itX}\rangle$

then it seems like we can think of it as essentially a spectral decomposition that measures the contributions of different frequencies to the probability distribution for $X$. I know how the moments are related to the derivatives at $t=0$, but I think I might be missing some deeper connection between the moments and the spectral decomposition. If anybody has some thoughts on this, I would love to hear them, but I'm particularly interested in the same sort of question applied to the cumulants.

We can then define the cumulant generating function in terms of $\Phi$ such that

$\Psi(t)=\ln\Phi(t)$

and

$\Psi^{\prime}(t)=\frac{\Phi^{\prime}(t)}{\Phi(t)}$

Now, what I'm really trying to ask is what these equations are telling us about the meaning of the cumulant generating function. Again, I understand how the cumulants are determined, how they relate to the moments, why the generating function was defined this way, etc. What I don't understand is whether there's a simple interpretation of either $\Psi(t)$ or $\Psi^{\prime}(t)$ at any given value of $t$. Is it valid to think of $\Psi(t)$ as a spectral decomposition of a second hypothetical probability distribution whose moments equal the cumulants of the original distribution? Thanks for any answers!
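To make the setup concrete, here is a small sympy sanity check (a sketch assuming $X$ is standard normal, so $\Phi(t)=e^{-t^2/2}$ in closed form) that differentiating $\Phi$ at $t=0$ recovers the moments and differentiating $\Psi=\ln\Phi$ recovers the cumulants:

```python
import sympy as sp

t = sp.symbols('t')
Phi = sp.exp(-t**2 / 2)   # characteristic function of N(0, 1), assumed for illustration
Psi = sp.log(Phi)         # cumulant generating function Psi(t) = ln Phi(t)

# n-th moment:   i^{-n} * (d^n Phi / dt^n) at t = 0
moments = [sp.simplify(sp.diff(Phi, t, n).subs(t, 0) / sp.I**n) for n in range(1, 5)]
# n-th cumulant: i^{-n} * (d^n Psi / dt^n) at t = 0
cumulants = [sp.simplify(sp.diff(Psi, t, n).subs(t, 0) / sp.I**n) for n in range(1, 5)]

print(moments)    # [0, 1, 0, 3]  (standard normal moments)
print(cumulants)  # [0, 1, 0, 0]  (all cumulants beyond the second vanish)
```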

Ivanna

1 Answer


For simplicity let us assume that $X$ has mean zero, so I don't accidentally say something obviously wrong by mixing up cumulant and moment.

A few basic comments:

You can look at $\Psi(z)=E[e^{zX}]$ for a complex parameter $z$ (note this $\Psi$ is not the question's $\ln\Phi$; here it denotes the full generating function). This unifies the characteristic function (which is the restriction of $\Psi$ to the imaginary axis) and the moment generating function (which is the restriction of $\Psi$ to the real axis, and whose logarithm is the cumulant generating function).

This unified object $\Psi$ is really the "Fourier transform" of the (formal) density of $X$. So they are really the same object; the only issue is that the domain of $\Psi$ often doesn't contain the real axis, while it is always guaranteed to contain the imaginary axis.
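As a minimal numerical sketch of this unification (assuming, purely for illustration, that $X$ is standard normal, so the closed form is $\Psi(z)=e^{z^2/2}$), one can evaluate $E[e^{zX}]$ by quadrature on the real axis, on the imaginary axis, and off both:

```python
import numpy as np
from scipy.integrate import quad

def std_normal_pdf(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def Psi(z):
    # E[e^{zX}] for complex z, integrating real and imaginary parts separately
    re = quad(lambda x: np.real(np.exp(z * x)) * std_normal_pdf(x), -np.inf, np.inf)[0]
    im = quad(lambda x: np.imag(np.exp(z * x)) * std_normal_pdf(x), -np.inf, np.inf)[0]
    return re + 1j * im

# real axis (MGF), imaginary axis (characteristic function), and off both axes
for z in [2.0, 2j, 1 + 1j]:
    print(z, Psi(z), np.exp(z**2 / 2))  # quadrature agrees with the closed form
```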

The term "generating function" should already be a hint that the cumulant generating function is a tool, not really an object of interest per se. In general, generating functions are used as methods for studying the coefficients of their (perhaps formal) power series, and are not of much interest in and of themselves.

With that said, the most direct interpretation of the cumulant generating function per se that I can think of comes from Cramér's theorem. This loosely says that if the $X_i$ are i.i.d. random variables whose moment generating function is finite near zero, and $n$ is large, then the probability that $|\sum_{i=1}^n X_i|>nx$ is approximately $e^{-nI(x)}$. Here $I(x)$ is called the rate function and is given explicitly by the Legendre transform of the cumulant generating function $\ln \Psi$:

$$I(x)=\sup_{t \in \mathbb{R}} tx-\ln \Psi(t).$$

Notice that this supremum, if it is finite, will be attained where $(\ln \Psi)'(t)=\Psi'(t)/\Psi(t)=x$. Thus in effect we can look at $\psi=(\Psi'/\Psi)^{-1}$, and then $I(x)=x\psi(x)-\ln \Psi(\psi(x))$ (on the domain of $\psi$, anyway). $\Psi'/\Psi$ is guaranteed to be injective (but not surjective) because $\Psi$ is log-convex, and strictly so whenever $X$ is non-degenerate.
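Here is a short sketch of this machinery in the Gaussian case (assumed purely for illustration: $\Psi(t)=e^{t^2/2}$, so $(\ln\Psi)'(t)=t$, $\psi(x)=x$, and the supremum should come out to $I(x)=x^2/2$), computing the Legendre transform as a one-dimensional optimization and reading off the maximizer as $\psi(x)$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def log_Psi(t):
    return t**2 / 2   # ln of the MGF of N(0, 1), assumed for illustration

def rate(x):
    # I(x) = sup_t [t x - ln Psi(t)], computed by minimizing the negative
    res = minimize_scalar(lambda t: log_Psi(t) - t * x)
    return -res.fun, res.x   # (I(x), maximizing t, which equals psi(x))

for x in [0.5, 1.0, 2.0]:
    I_x, t_star = rate(x)
    print(x, I_x, t_star)    # expect I(x) = x**2 / 2 and psi(x) = x
```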

But $I$ has a relatively concrete interpretation as measuring the decay rate of large deviations, so this gives us a way of thinking about $\Psi$ and $\psi$.

An instructive example comes when you consider $X_i$ equally likely to be $-1$ or $1$; in this case $\Psi=\cosh$ and $\psi=\tanh^{-1}$, so that $I(x)=x\tanh^{-1}(x)+\frac{1}{2}\log(1-x^2)$ on $(-1,1)$ (extended by continuity to $-1$ and $1$, where the value is easily seen by simple counting considerations to be $\log(2)$). This gives us the exponential decay of the tail behavior of the sum.
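This rate can be checked numerically against the exact binomial tail. A quick sketch (with an arbitrarily chosen $n$ and $x$), using the fact that $\sum_{i=1}^n X_i = 2B - n$ for $B \sim \mathrm{Binomial}(n, 1/2)$:

```python
import numpy as np
from scipy.stats import binom

def I(x):
    # rate function derived above: I(x) = x * atanh(x) + (1/2) * log(1 - x^2)
    return x * np.arctanh(x) + 0.5 * np.log(1 - x**2)

n, x = 2000, 0.3
# P(sum >= n x) = P(B >= n (1 + x) / 2) with B ~ Binomial(n, 1/2)
tail = binom.sf(np.ceil(n * (1 + x) / 2) - 1, n, 0.5)
print(-np.log(tail) / n, I(x))  # both near I(0.3) ≈ 0.0457; agreement improves with n
```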

But neither term really expresses it properly in isolation. For instance, notice that the two terms cancel out their respective singularities at $\pm 1$, so there is no hope of understanding the behavior there without both terms. To put it another way, quantitatively understanding the tail requires us to know not really how $\psi$ and $\Psi$ behave by themselves but how much $\mathrm{id} \cdot \psi$ differs from $\ln \circ \Psi \circ \psi$. That can't possibly be encapsulated in a single value of $\Psi$; at the very least you need to know $\Psi$ on some interval to get this information.

Ian
  • Thanks for the answer. Unfortunately, I was hoping for a more intuitive explanation behind the cumulant. I am aware of the results you mentioned, so I'm actually interested in questions like "if $\log \Psi(3) = 10$, it means this-and-that", sorry for being imprecise. – Zuza Nov 11 '17 at 20:57
  • @Zuza I doubt you will find any such thing because as you can see from Cramer's theorem, it somehow depends explicitly on the "dual variable" to the natural variable of $I$, which is really the meaningful quantity for measuring tails. That is, its argument is like a "temperature" whereas the natural space is like "energy". You need this additional function $\psi$ to really extract meaning from $\Psi$. You might be able to ask for intuitive meaning of $\Psi \circ \psi$, though. – Ian Nov 11 '17 at 21:16
  • I am sure some progress can be made on this question. For instance, if the first singularity of the cumulant generating function is at $S$, then the tail falls off as $f(t) \sim e^{-S \cdot t}$. Similarly, I didn't investigate what happens when the CDF has a sharp increase at some point. Any progress on such questions would be awesome. – Zuza Nov 12 '17 at 00:39
  • @Zuza Sure, the overall decay of the tail can be quantified by the singularity, but there's not much else you can do. Even something as simple as the variance is gated behind a second derivative operation that makes direct interpretation of $\Psi$ itself nontrivial. (In other words: knowing whether $\Psi(3)$ is finite or infinite tells me something easily interpreted. Knowing specifically that it is $10$ is rather opaque.) I think the best thing actually will be to try to understand how $\psi$ and its inverse allow you to switch between the "energy" and "temperature" variables. – Ian Nov 12 '17 at 00:41
  • Sorry, but I disagree with you on this one. I'm sure there must be some quantifiable difference between distributions that have a sharp increase at $S_1$ and ones that have a sharp increase at $S_2$, even if both are finite, similarly to how there is a difference between those with different Fourier series. – Zuza Nov 12 '17 at 00:48
  • A sharp increase in the CDF in a small region means a high probability of falling in that region, which is loosely reflected in the MGF through the first moment being pulled closer to that number and the second moment being smaller. But that's through the MGF as a generating function, not really as a function in and of itself. In terms of the MGF as a function in and of itself, these things come down to something like "the MGF of $X$ looks more like a line with slope equal to that number in a neighborhood of zero than the MGF of $Y$ does". – Ian Nov 12 '17 at 01:59
  • Note that all of that is about the MGF on a neighborhood of zero, not at one point. – Ian Nov 12 '17 at 02:02