
An $\mathbb{R}$-valued discrete-time stochastic process $\{X_n\}_{n \in \mathbb{Z}}$ is said to be strictly stationary if for all choices of times $t_1, \ldots , t_k \in \mathbb{Z}$ and lags $h \in \mathbb{Z}$ the following holds: $$(X_{t_1}, \ldots , X_{t_k}) \stackrel{D}{=} (X_{t_1+h}, \ldots , X_{t_k+h}).$$

In particular, no moment assumptions are imposed and all the processes are indexed over the integers. Now, consider the equation $$X_n = X_{n-1} + \epsilon_n \qquad (\star )$$ where $\epsilon_n$ is a strictly stationary white noise (i.e. an iid sequence). I am interested in showing that the only solution to this equation is $X_n = \epsilon_n = 0$. Here, a solution means expressing each $X_n$ as a function of $\{\epsilon_n\}_{n\in \mathbb{Z}}$.
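Iterating $(\star )$ forward from a fixed starting point already hints at the obstruction: the partial sums of iid noise spread out. A quick simulation (illustrative only; the Gaussian innovations and all parameters here are my choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 10_000, 1_000

# Iterate X_n = X_{n-1} + eps_n from X_0 = 0 with iid standard normal
# innovations: X_n is then the random walk W_n = eps_1 + ... + eps_n.
eps = rng.standard_normal((n_paths, n_steps))
X = np.cumsum(eps, axis=1)

# The empirical standard deviation of the marginal of X_n grows like
# sqrt(n), so these marginals cannot all share one fixed distribution.
for n in (10, 100, 1000):
    print(n, X[:, n - 1].std())
```

This only illustrates the causal iteration from a fixed starting point; it is not a proof for arbitrary solutions in the sense defined above.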

The case where we assume that $X_n$ and $\epsilon_n$ are also weakly stationary (i.e. constant mean and variance, with covariance only depending on the lag, $\mathrm{Cov}(X_t, X_s) = \gamma (|t-s|) = \gamma (h)$) is a trivial consequence of the Cauchy–Schwarz inequality, but of course this relies on the existence of a second moment for each of the processes.
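For the record, here is one way that second-moment argument can go (my reconstruction; the details were omitted above). Telescoping $(\star )$ and using the white-noise property,

```latex
n\,\sigma_\epsilon^2
  = \operatorname{Var}\Big(\sum_{i=1}^{n}\epsilon_i\Big)
  = \operatorname{Var}(X_n - X_0)
  = 2\gamma(0) - 2\gamma(n)
  \le 4\gamma(0) \qquad \text{for all } n \ge 1,
```

where $|\gamma(n)| \le \gamma(0)$ is exactly Cauchy–Schwarz. Letting $n \to \infty$ forces $\sigma_\epsilon^2 = 0$, and stationarity of the mean of $X_n$ forces $E[\epsilon_n] = 0$, so $\epsilon_n = 0$ almost surely.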

It is also tempting to assume that solutions of $(\star )$ must be causal or non-anticipative, and hence to argue via the independence of $X_{n-1}$ and $\epsilon_n$. That argument is incorrect: a stationary solution to the equation $X_n = \phi X_{n-1} + \epsilon_n$ for $|\phi| > 1$ is given by $X_n = - \sum_{j=1}^\infty \phi^{-j} \epsilon_{n+j}$, which is future-dependent. The situation is even worse: there are stochastic equations whose solutions are of the form $X_t = \sum_{j=-\infty}^\infty \psi_j \epsilon_{t-j}$, or some other (not necessarily linear) function of the entire history of $(\epsilon_n)_{n \in \mathbb{Z}}$. Indeed, the only restriction on the solution space is that $X_t$ be a measurable image of the entire history of the innovation sequence.
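To make this cautionary example concrete, here is a small numerical check (the truncation horizon, the value of $\phi$, and the Gaussian innovations are my choices) that the future-dependent series really does solve $X_n = \phi X_{n-1} + \epsilon_n$:

```python
import numpy as np

rng = np.random.default_rng(1)
phi = 2.0        # any |phi| > 1 works; 2.0 is an arbitrary choice
J = 50           # truncation of the infinite sum; phi**-J is ~1e-15

eps = rng.standard_normal(200)

def X(n):
    """Future-dependent solution X_n = -sum_{j=1}^J phi^{-j} eps_{n+j}."""
    j = np.arange(1, J + 1)
    return -np.sum(phi ** (-j) * eps[n + j])

# The recursion holds up to the truncation error of order phi**-J:
for n in (5, 6, 7):
    print(n, abs(X(n) - (phi * X(n - 1) + eps[n])))  # tiny
```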

An analytically-flavoured approach to this problem is to generalise it to stochastic processes taking values in a locally compact topological group $G$, treating the addition in $(\star )$ as the group operation. In this setting, the problem is solved by considering idempotent measures, which coincide with the Haar measures on the compact subgroups of $G$. For instance, the equation is trivially satisfied in the circle group (viewed as a subgroup of $\mathbb{C}^*$) when the $\epsilon_n$ are uniformly distributed on the unit circle. But again, this forces us to consider solutions where $X_{n-1}$ and $\epsilon_n$ are independent.

Jose Avilez

1 Answer


Since $(X_n)$ is strictly stationary, its marginal distributions are all equal, so they form a tight family: $$ \lim_{M\rightarrow\infty}\sup_{n\ge 1} P(|X_n|>M)=\lim_{M\rightarrow\infty}P(|X_0|>M)=0. $$ On the other hand, the marginal distributions of the random walk $W_n=\sum_{i=1}^n \epsilon_i$ are not tight unless $\epsilon_1$ is zero almost surely. Indeed, if $P(\epsilon_1=0)<1$, then for any $M>0$ we have $$ \lim_{n\rightarrow\infty} P(|W_n|>M)=1. $$ One can check this, following Chapter 9, Exercise 2 of Kallenberg's *Foundations of Modern Probability* (2nd edition), using the following inequality for characteristic functions: \begin{equation} P(|W_n|\le M)\le 2M \int_{-1/M}^{1/M} |\varphi(t)|^n \, dt, \end{equation} where $\varphi(t)= E[e^{it \epsilon_1}]$.
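As an illustration of that inequality (with Rademacher innovations, my choice, for which $\varphi(t)=\cos t$), one can compare a Monte Carlo estimate of the left side with a numerical quadrature of the right side:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 5.0

def lhs(n, n_paths=100_000):
    """Monte Carlo estimate of P(|W_n| <= M) for Rademacher steps."""
    W = rng.choice([-1.0, 1.0], size=(n_paths, n)).sum(axis=1)
    return np.mean(np.abs(W) <= M)

def rhs(n, grid=20_001):
    """Riemann-sum estimate of 2M * int_{-1/M}^{1/M} |cos t|^n dt."""
    t = np.linspace(-1 / M, 1 / M, grid)
    return 2 * M * np.sum(np.abs(np.cos(t)) ** n) * (t[1] - t[0])

for n in (10, 100, 400):
    print(n, lhs(n), rhs(n))  # lhs <= rhs; both tend to 0 as n grows
```

The bound is loose for small $n$, but since $|\cos t|<1$ away from $t=0$ the right side decays, which is all the argument needs.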

Lemma If $\epsilon_1$ is not almost surely a constant, then there exists an interval $[-\delta,\delta]$, $\delta>0$, such that $|\varphi(t)|<1$ for all $t\in [-\delta,\delta]\setminus\{0\}$.

Proof of the lemma: this follows from [this post][1] together with the continuity of characteristic functions.

So if $\epsilon_1$ is not almost surely a constant, choose $M \ge 1/\delta$ in the inequality above, so that $|\varphi(t)|<1$ for every $t\in[-1/M,1/M]\setminus\{0\}$. Then $|\varphi(t)|^n\rightarrow 0$ for almost every $t$ in the range of integration, and the integrand is bounded by $1$, so the Dominated Convergence Theorem gives $$ \lim_n P(|W_n|\le M)=0 $$ (and hence the same for every smaller $M$). If $\epsilon_1$ is almost surely a nonzero constant, the same conclusion holds trivially as well. So we have shown that the marginal distributions of a non-zero random walk (including one with deterministic drift) are not tight.

Lemma Suppose $|W_n|\rightarrow \infty$ in probability (in the sense of the displayed limit above, for arbitrary $M$) and $X$ is a fixed, almost surely finite random variable. Then $|W_n+X| \rightarrow \infty$ in probability as well.

Proof of the lemma: immediate via a sub-subsequence argument reducing to almost sure divergence: a sequence of variables $Y_n \rightarrow \infty$ in probability if and only if every subsequence of $(Y_n)$ has a further subsequence which diverges to $\infty$ almost surely.
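Alternatively (my reconstruction), the lemma has a direct proof that needs no independence between $W_n$ and $X$: on the event $\{|W_n+X|\le M\}\cap\{|X|\le K\}$ one has $|W_n|\le M+K$, hence for every $K>0$

```latex
P(|W_n + X| \le M)
  \;\le\; P(|X| > K) + P(|W_n| \le M + K)
  \;\xrightarrow[n\to\infty]{}\; P(|X| > K),
```

and letting $K\to\infty$ gives $P(|W_n+X|\le M)\rightarrow 0$ for every fixed $M$.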

Now apply the previous lemma with $X = X_0$: since $X_n=X_0+W_n$, we get $|X_n|\rightarrow\infty$ in probability, contradicting the tightness of the marginals of $(X_n)$. Hence $\epsilon_1=0$ almost surely.

Note that the conclusion is only that $\epsilon_n=0$ almost surely: we can have all $X_n$ identical to each other (not necessarily a zero random variable), which is a stationary solution of $(\star )$.

[1]: https://math.stackexchange.com/questions/1410331/characteristic-function-with-modulus-1-implies-degenerate-distribution

Uchiha
  • Good solution indeed. I checked it, and it doesn't have any error so far. – Paresseux Nguyen May 27 '21 at 21:34
  • However, it would be nice if you can explain all steps so that other MSE fellows can follow easily. – Paresseux Nguyen May 27 '21 at 21:35
  • Agreed! I certainly would benefit from the details. – Jose Avilez May 27 '21 at 23:34
  • @Uchiha Thank you for the details. Is there a typo in the last equation? Shouldn't it be $\sup_n P(|X_0| < M) + P(|W_n| > 2M)$? Otherwise, I don't see how the product slipped in... – Jose Avilez May 28 '21 at 15:14
  • The event ${|X_n|>M}$ contains the event ${|X_0|<M, |W_n|>2M }$ since $|X_n|\ge |W_n|-|X_0|$. Then apply independence. Does it make sense now? – Uchiha May 28 '21 at 19:17
  • @Uchiha But that assumes that all solutions to $(\star )$ must have that $X_n$ is independent of $W_m$ for $m > n$, which means that we can't find solutions in the class of non-anticipative (or causal) processes. Otherwise, one could simply take characteristic functions on both sides of $(\star )$ and conclude immediately. Can that step be done without assuming independence? – Jose Avilez May 28 '21 at 19:36
  • @JoseAvilez OK. I tried to remove independence. – Uchiha May 28 '21 at 20:01