
The title could be shortened to "prove that $AR(1)$ processes are strictly stationary when $|\alpha|<1$". This has been discussed many times on MSE and Cross Validated, but I have found no complete mathematical proof of why the process is strictly stationary.

For $t\in\mathbb{Z}$, consider the process recursively defined by $X_{t}:=\alpha X_{t-1}+\epsilon_{t}$, where the $\epsilon_{t}$ are i.i.d. $N(0,\sigma^{2})$. I want to show that the process $\{X_{t}:t\in\mathbb{Z}\}$ is strictly stationary when $|\alpha|<1$.


I have a reasonable attempt, but I got stuck at the end.

First, by iterating the recursive relation, it is easy to see that $$X_{t}=\alpha^{n}X_{t-n}+\sum_{k=0}^{n-1}\alpha^{k}\epsilon_{t-k}.$$ Define $Y_{n}:=\sum_{k=0}^{n-1}\alpha^{k}\epsilon_{t-k}$. Recall that any linear combination of independent Gaussian random variables is Gaussian. In particular, since the $\epsilon_{t}$ are i.i.d. $N(0,\sigma^{2})$, it follows that $Y_{n}\sim N(0,\sigma_{n}^{2})$, where $\sigma_{n}^{2}:=\sigma^{2}\sum_{k=0}^{n-1}\alpha^{2k}$. Since $|\alpha|<1$, the variance converges as $n\rightarrow\infty$: $$\sigma_{n}^{2}\longrightarrow\sigma^{2}\sum_{k=0}^{\infty}\alpha^{2k}=\dfrac{\sigma^{2}}{1-\alpha^{2}}.$$

Consider the characteristic function $\varphi_{n}(t)$ of $Y_{n}$. It is of the form $$\varphi_{n}(t):=\mathbb{E}(e^{itY_{n}})=e^{it\mu_{Y_{n}}-\frac{1}{2}\sigma^{2}_{Y_{n}}t^{2}}=e^{-\frac{1}{2}\sigma^{2}_{n}t^{2}}\longrightarrow e^{-\frac{1}{2}\frac{\sigma^{2}}{1-\alpha^{2}}t^{2}}=:\varphi(t).$$ We note that $\varphi(t)$ is the characteristic function of $Y\sim N(0,\frac{\sigma^{2}}{1-\alpha^{2}})$ and that $\varphi(t)$ is continuous at $t=0$. Hence, by Lévy's continuity theorem, $Y_{n}\longrightarrow Y$ in distribution as $n\rightarrow\infty$. Now, since $X_{t}=\alpha^{n}X_{t-n}+Y_{n}$ and $|\alpha|<1$, the term $\alpha^{n}X_{t-n}$ tends to $0$ in probability (provided the laws of the $X_{t-n}$ are tight), so Slutsky's theorem gives $$X_{t}=\alpha^{n}X_{t-n}+Y_{n}\longrightarrow Y,\ \text{in distribution, as}\ n\rightarrow\infty.$$
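As a quick numerical sanity check of this limit (not part of the proof; all parameter values below are illustrative choices):

```python
import numpy as np

# Sanity check (illustrative parameters): for |alpha| < 1, the partial sums
# Y_n = sum_{k=0}^{n-1} alpha^k * eps_{t-k} should have sample variance
# close to the limiting value sigma^2 / (1 - alpha^2).
rng = np.random.default_rng(0)
alpha, sigma, n, reps = 0.7, 1.0, 200, 100_000

eps = rng.normal(0.0, sigma, size=(reps, n))   # i.i.d. N(0, sigma^2) draws
weights = alpha ** np.arange(n)                # alpha^k for k = 0, ..., n-1
Y_n = eps @ weights                            # one realisation of Y_n per row

print(Y_n.var())                               # sample variance of Y_n
print(sigma**2 / (1 - alpha**2))               # theoretical limit, = 1.9607...
```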


This is where I got stuck. From here we can see that, since the argument works for every $t\in\mathbb{Z}$, it follows that $X_{t}\stackrel{d}{=}Y$ for all $t$. So every random variable in the process is Gaussian with the same parameters.

This does not directly imply strict stationarity, because it says nothing about the joint distributions. However, since $(X_{t})$ is a Gaussian process (any finite linear combination of the $X_{t}$ is Gaussian), weak stationarity and strict stationarity are equivalent.

I know that $\mathbb{E}X_{t}=\mathbb{E}Y=0$ and $\mathbb{E}X_{t}^{2}=\mathbb{E}Y^{2}=\frac{\sigma^{2}}{1-\alpha^{2}}.$ However, I still don't know how to compute $\text{Cov}(X_{t},X_{s})$. Don't I need the joint distribution of $(X_{t},X_{s})$ for this?
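For intuition on the covariance question, here is a numerical sketch (illustrative parameters, not a derivation): simulating the recursion started from the candidate stationary law $N(0,\sigma^{2}/(1-\alpha^{2}))$ suggests that $\text{Cov}(X_{t},X_{s})$ depends only on the lag $|t-s|$.

```python
import numpy as np

# Numerical sketch (illustrative parameters): start the AR(1) recursion from
# the candidate stationary law N(0, sigma^2/(1-alpha^2)), then estimate
# Cov(X_t, X_s) for two pairs of times with the same lag |t - s| = 3.
rng = np.random.default_rng(1)
alpha, sigma, T, reps = 0.7, 1.0, 30, 200_000

X = np.empty((reps, T))
X[:, 0] = rng.normal(0.0, sigma / np.sqrt(1 - alpha**2), size=reps)
for t in range(1, T):
    X[:, t] = alpha * X[:, t - 1] + rng.normal(0.0, sigma, size=reps)

def cov(t, s):
    return np.mean(X[:, t] * X[:, s])          # the means are zero

print(cov(5, 8), cov(20, 23))                  # approximately equal
```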


An extension from this is the following question:

  1. How can I show that $X_{t}$ is not strictly stationary when $\alpha=1$ (working on the real line)?

I know that when $\alpha=1$ the above argument does not work, and the variance blows up. But that alone says nothing about strict stationarity.

  2. What about $|\alpha|>1$?
  • As you mentioned, you only need weak stationarity for Gaussians. What's stopping you from showing that? The translation-invariance of the auto-correlation function should be easy to establish if you just expand everything. – dezdichado Jan 21 '24 at 18:21
  • See https://math.stackexchange.com/a/4226731/671710 – Jose Avilez Jan 21 '24 at 18:23
  • @JoseAvilez Nice reference. – JacobsonRadical Jan 21 '24 at 18:27
  • @dezdichado You are right. My problem comes down to that I don't know how to compute $\text{Cov}(X_{t},X_{s})=\mathbb{E}(X_{t}X_{s})-\mathbb{E}X_{t}\mathbb{E}X_{s}=\mathbb{E}(X_{t}X_{s})$, since for the most-right expression, I need the joint distribution between $X_{t}$ and $X_{s}$. – JacobsonRadical Jan 21 '24 at 18:27

1 Answer


Let's just prove the general result when $\epsilon_t$ is in $L^2$. Notice that, since $|\alpha|<1$, the series below converges a.s. and in $L^2$ (its partial sums are Cauchy in $L^2$ because $\sum_{k}\alpha^{2k}<\infty$), so we may write $$X_t = \sum_{k=0}^\infty \alpha^k \epsilon_{t- k} = f(\epsilon_t, \epsilon_{t-1}, \epsilon_{t-2}, \ldots)$$

where $f: \mathbb{R}^\infty \to \mathbb{R}$ is a measurable function. The sequence $(\epsilon_{t})_{t \in \mathbb{Z}}$, being i.i.d., satisfies $(\epsilon_{t})_{t \in \mathbb{Z}} \stackrel{D}{=}(\epsilon_{t+h})_{t \in \mathbb{Z}}$ for all $h\in \mathbb{Z}$. Let $t_1 < \ldots < t_l$ be times, $h \in \mathbb{Z}$, and $B$ a Borel set in $\mathbb{R}^l$. Then: \begin{align*} \mathbb{P} \left(\begin{bmatrix} X_{t_1} \\ X_{t_2} \\ \vdots \\ X_{t_l} \end{bmatrix} \in B \right) &= \mathbb{P} \left(\begin{bmatrix} f(\epsilon_{t_1}, \epsilon_{t_1 -1 }, \ldots ) \\ f(\epsilon_{t_2}, \epsilon_{t_2 - 1}, \ldots ) \\ \vdots \\ f(\epsilon_{t_l}, \epsilon_{t_l -1}, \ldots ) \end{bmatrix} \in B \right) \\ &= \mathbb{P} \left(\begin{bmatrix} f(\epsilon_{t_1 +h}, \epsilon_{t_1 -1 +h}, \ldots ) \\ f(\epsilon_{t_2+h}, \epsilon_{t_2 - 1 +h}, \ldots ) \\ \vdots \\ f(\epsilon_{t_l+h}, \epsilon_{t_l -1+h}, \ldots ) \end{bmatrix} \in B \right) \\ &= \mathbb{P} \left(\begin{bmatrix} X_{t_1+h} \\ X_{t_2+h} \\ \vdots \\ X_{t_l+h} \end{bmatrix} \in B \right) \end{align*} which means that $(X_{t_1} , \ldots, X_{t_l}) \stackrel{D}{=} (X_{t_1+h} , \ldots, X_{t_l+h})$, showing that $(X_t)$ is strictly stationary.
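The finite-dimensional shift-invariance can be illustrated numerically (a sketch with illustrative parameters; the $MA(\infty)$ sum is truncated at $K$ terms):

```python
import numpy as np

# Illustration of the shift-invariance argument (illustrative parameters):
# approximate X_t by a truncation of the MA(infinity) sum and compare the
# sample covariance matrix of (X_{t1}, X_{t2}, X_{t3}) with that of the
# vector shifted by h.  Both vectors are centred Gaussian, so (approximately)
# equal covariance matrices mean (approximately) equal distributions.
rng = np.random.default_rng(2)
alpha, sigma, K, reps = 0.6, 1.0, 100, 100_000
times, h = [0, 2, 5], 7

eps = rng.normal(0.0, sigma, size=(reps, K + max(times) + h))
w = alpha ** np.arange(K)[::-1]                # weights ..., alpha^1, alpha^0

def X(t):
    # truncated MA(infinity): sum_{k=0}^{K-1} alpha^k eps_{t-k}
    return eps[:, t:t + K] @ w

V = np.cov(np.stack([X(t) for t in times]))            # 3 x 3 covariance
V_shift = np.cov(np.stack([X(t + h) for t in times]))  # same, shifted by h
print(np.abs(V - V_shift).max())                       # small
```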

For your extensions:

  1. For $|\alpha| = 1$, you can show that the variance of the process must blow up, and hence it cannot be stationary.

  2. For $|\alpha| > 1$, the process still admits a strictly stationary solution, but it is now future-dependent. To find the $MA(\infty)$ form, iterate the recursion forward in time to express $X_t$ as a function of $\epsilon_{t+1}, \epsilon_{t+2}, \ldots$
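To make point 2 concrete, here is a sketch of the forward recursion: writing the recursion one step ahead as $X_{t+1}=\alpha X_{t}+\epsilon_{t+1}$ and solving for $X_{t}$ gives $$X_{t}=\alpha^{-1}X_{t+1}-\alpha^{-1}\epsilon_{t+1}=\alpha^{-n}X_{t+n}-\sum_{k=1}^{n}\alpha^{-k}\epsilon_{t+k}.$$ Since $|\alpha^{-1}|<1$, letting $n\to\infty$ yields the future-dependent $MA(\infty)$ representation $$X_{t}=-\sum_{k=1}^{\infty}\alpha^{-k}\epsilon_{t+k},$$ and strict stationarity then follows from the same measurable-function argument as above, applied to the shifted sequence $(\epsilon_{t+k})_{k\geq 1}$.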

Jose Avilez
  • Why is $\{f(\epsilon_{t},\epsilon_{t-1},\dots):t\in\mathbb{Z}\}$ strictly stationary or i.i.d., which was used in the second equality?

    For $|\alpha|=1$, why does the blown-up variance imply no stationarity? It only rules out $L^{2}$, right?

    – JacobsonRadical Jan 22 '24 at 12:44
  • Oh okay, for the first question, you used the theorem in the post you referred to. But I am also cautious about that theorem, because I don't see why $f$ being measurable in the product $\sigma$-algebra implies the equality in distribution. – JacobsonRadical Jan 22 '24 at 13:03
  • @JacobsonRadical If $X$ and $Y$ agree in distribution, then $f(X)$ and $f(Y)$ agree in distribution for measurable $f$. – Jose Avilez Jan 22 '24 at 14:33
  • @JacobsonRadical You are correct about the $|\alpha|=1$ case: it only proves the non-existence of an $L^2$ stationary solution. I asked that particular question a while back: https://math.stackexchange.com/questions/4151128/the-only-strictly-stationary-random-walk-in-mathbbr-is-degenerate – Jose Avilez Jan 22 '24 at 15:46
  • Thanks for the reference. I will read it at once. For the first question, I am sorry that I am still confused. I agree with the fact you referred to. Say we take $\mathbf{X}:=(\epsilon_{t_{1}})$ and $\mathbf{Y}:=(\epsilon_{t_{1}+h})$; then we can conclude that $$\mathbb{P}(f(\mathbf{X})\in B)=\mathbb{P}(f(\mathbf{Y})\in B)$$ for all $B\in\mathcal{B}(\mathbb{R})$.

    However, in the second equality, we need the joint distribution, right? In other words, we need the indexed $\mathbf{X}_{t}:=(\epsilon_{t})$ and

    – JacobsonRadical Jan 22 '24 at 16:51
  • and we need to show that $$\mathbb{P}[(f(\mathbf{X}_{t}),f(\mathbf{X}_{t+1}),\dots)\in B]=\mathbb{P}[(f(\mathbf{X}_{t+1}),f(\mathbf{X}_{t+2}),\dots)\in B]$$ for all $B\in\mathcal{B}(\mathbb{R}^{\infty})$, which is requiring a different thing, I guess. – JacobsonRadical Jan 22 '24 at 16:53
  • @JacobsonRadical The set $\{f((\epsilon_{t-k})_{k \in \mathbb{N}}) \in B\}$ for a Borel subset $B$ of $\mathbb{R}$ equals the set $\{(\epsilon_{t-k})_{k \in \mathbb{N}} \in C\}$ for some Borel set $C$ in $\mathbb{R}^{\mathbb{N}}$. You can use stationarity of $\epsilon$ to shift there, which is what I do in my second equality. – Jose Avilez Jan 22 '24 at 17:04
  • Let me accept the answer first. So my confusion here is, what you can show from the fact you mentioned is that $$\mathbb{P}(f(\mathbf{X}_{t})\in B)=\mathbb{P}(f(\mathbf{X}_{t+1})\in B),\quad\forall B\in\mathcal{B}(\mathbb{R}).$$

    But instead, we need the finite-dimensional distributions: $$\mathbb{P}[(f(\mathbf{X}_{t}),\dots, f(\mathbf{X}_{t+h}))\in B]=\mathbb{P}[(f(\mathbf{X}_{t+1}),\dots, f(\mathbf{X}_{t+1+h}))\in B],\quad\forall B\in\mathcal{B}(\mathbb{R}^{h+1}).$$

    How does the fact you mentioned imply this?

    – JacobsonRadical Jan 22 '24 at 17:33
  • @JacobsonRadical the exact same argument works there (notice how my answer considers the vectors $(X_{t_1} , \ldots , X_{t_n})$ and $(X_{t_1+h} , \ldots , X_{t_n+h})$). The set $C$ you obtain is now in $\mathbb{R}^{n \times \mathbb{N}}$. – Jose Avilez Jan 22 '24 at 19:42
  • Yeah I was being dumb. I agree. Thx for the nice answer. – JacobsonRadical Jan 22 '24 at 20:38
  • @JacobsonRadical Not at all! You are correct in stating that this proof is rarely seen in texts. I learnt it in my graduate time series class, and the $|\alpha| = 1$ case in full generality was left as a challenging exercise. +1 for the good question – Jose Avilez Jan 22 '24 at 20:41