
To solve a problem at work, I need to write an algorithm that will find the limit of a user-submitted Markov chain. The Wikipedia page on Markov chains says

"Because there are a number of different special cases to consider, the process of finding this limit if it exists can be a lengthy task."

Wikipedia then goes on to explain that some Markov chain limits can be found using the formula

$$Q=f(0_{n,n})[f(P-I_n)]^{-1}$$

where

$P \equiv$ the transition matrix

$Q = \lim_{k\to\infty} P^k$

$f(A) \equiv$ a function that returns the matrix $A$ with its right-most column replaced with all 1's.
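
For a concrete reading of the formula, here is a minimal NumPy sketch (the function and variable names are my own); it assumes the chain is irreducible and aperiodic, which is the case where the formula is valid:

```python
import numpy as np

def replace_last_column_with_ones(A):
    """The function f: return A with its rightmost column replaced by 1's."""
    B = A.copy()
    B[:, -1] = 1.0
    return B

def markov_limit(P):
    """Q = f(0_{n,n}) [f(P - I_n)]^{-1} for an irreducible, aperiodic chain."""
    n = P.shape[0]
    F0 = replace_last_column_with_ones(np.zeros((n, n)))
    return F0 @ np.linalg.inv(replace_last_column_with_ones(P - np.eye(n)))

# Irreducible, aperiodic 2-state chain
P = np.array([[1/2, 1/2],
              [1/3, 2/3]])
Q = markov_limit(P)
# Each row of Q is the stationary distribution (2/5, 3/5)
```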

However, this approach doesn't work if $[f(P-I_n)]^{-1}$ doesn't exist, and Wikipedia doesn't explain how to find the limit in that case.

From the research I've done, I think the $Q=f(0_{n,n})[f(P-I_n)]^{-1}$ approach fails only when the Markov chain is reducible, and the solution is to break the transition matrix into irreducible parts by identifying its communicating classes. However, I'm only guessing; I haven't yet found a reference that spells out a general procedure for finding the limit of a Markov chain. Can anyone point me in the right direction?

P.S. My work problem involves identifying the crude oils entering an oil refinery after they have passed through various tanks and been labeled and re-labeled by various accounting systems. I have verified that the $Q=f(0_{n,n})[f(P-I_n)]^{-1}$ approach fails for actual Markov chains my application must handle.

P.P.S. I am aware that my question may be regarded as a duplicate of this question, but the only answer to that question is wrong, and after seven years I don't think a better answer is going to be posted there.

Edit:

In the comments, saulspatz asked why I was interested in the limit ($\lim_{k\to\infty} P^k$), and not the stationary distribution. My answer is too long for a comment, so I'm posting it here.

My Markov chain is really just a mapping from one set of crude oil identifying codes to another set of crude oil identifying codes. I need to reapply the mapping until all codes are resolved to the most primitive level. Each successive multiplication does not represent a step in time; it represents a conceptual reversal of one of the mixing or re-categorizing steps the material went through on its way from the well to the plant. In the end, $\lim_{k\to\infty} P^k$ is a translation table from all possible codes to the primitive codes I'm interested in. It will remain static for perhaps a month, and during that period I will be multiplying it by each day's record of the refinery receipts to get the data set for that day. Under this scenario, I believe $\lim_{k\to\infty} P^k$ is relevant, not the stationary distribution.

bridget
  • With that clarification, all my previous comments are irrelevant, so I've deleted them. The answer you say is wrong is correct though; it's just the answer to a different question. How big are your matrices? – saulspatz Nov 21 '19 at 18:37
  • I expect the biggest one will be about 200×200, but very sparse. – bridget Nov 21 '19 at 18:47
  • It seems to me that this should be a simple question in linear algebra, for someone more conversant than I am, so I have added the linear-algebra tag to hopefully attract such a person. The eigenvalues have modulus at most $1$. If $P = AJA^{-1}$ then $P^n=AJ^nA^{-1}$ and all the Jordan blocks associated with eigenvalues of modulus $<1$ go to $0$. See https://math.stackexchange.com/questions/326688/why-does-the-n-th-power-of-a-jordan-matrix-involve-the-binomial-coefficient It should be reducible to finding the eigenvalues of modulus $1$, and the associated eigenspaces. – saulspatz Nov 21 '19 at 19:09
  • For a finite-state Markov chain, if the limiting distribution $\lim_{k\to\infty} P^k$ exists, then it is equal to the (unique) stationary distribution $\pi$ that satisfies $\pi = \pi P$. – Math1000 Nov 21 '19 at 19:35
  • @BrianMoehring Oh, I was implicitly assuming that the chain was irreducible. If it is not, then you are correct. In that case, the limiting distribution $\lim_{k\to\infty} P^k$ may exist, but its rows will differ (meaning that the limiting probability $\lim_{k\to\infty}\mathbb P(X_n=k)$ depends on the initial distribution of $X_0$). – Math1000 Nov 21 '19 at 19:48

1 Answer


A "brief" sketch of an algorithm:

  1. Determine the communicating classes.
  2. If a communicating class is not accessible from any other communicating class but accesses another one, remove it. Repeat until no remaining communicating class accesses another one (that is, until every remaining class is closed).
  3. For each remaining communicating class, choose a state and determine the period of that state. If any state you choose has period greater than $1,$ then STOP: the limit $\lim_{k\to\infty} P^k$ doesn't exist. Otherwise, continue.
  4. Let $P'$ denote the transition matrix for the Markov subchain given by deleting each row and column of $P$ corresponding to a deleted state. Then conjugate by a permutation matrix $\sigma$ so that $$\sigma P' \sigma^{-1} = \text{diag}(C_1, C_2, \ldots, C_m)$$ is a block diagonal matrix where $C_i$ is the transition matrix for the $i^\text{th}$ communicating class $\mathcal{C}_i$.
  5. Note that by definition, $C_i$ is the transition matrix of an irreducible Markov chain, and we checked in step 3 that it is aperiodic, so the formula you found on Wikipedia applies. Therefore, apply that formula to each transition matrix $C_i,$ giving $$\Pi_i := \lim_{k\to\infty} C_i^k$$ and then note that $Q' := \lim_{k\to\infty} (P')^k = \sigma^{-1}\text{diag}(\Pi_1, \Pi_2, \ldots, \Pi_m)\sigma$.
  6. At this point, the rows of $Q = \lim_{k\to\infty} P^k$ are naturally identified with the rows of $Q'$ if the corresponding state was in the Markov subchain we created in step 2. Now we need to determine, for any state we removed in step 2, the probability $p_i$ of eventually ending up in $\mathcal{C}_i$ when starting in the given state. This can be done by collapsing each $\mathcal{C}_i$ to a single absorbing "state" and then applying this formula to the resulting Markov chain. Once you do this, the row corresponding to this state will be $\sum_{i=1}^m p_i r_i$, where $r_i$ is a row of $Q$ corresponding to any fixed state in $\mathcal{C}_i$.
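
Steps 1–3 can be sketched in code. The following is a rough Python implementation (all names are mine): it finds the communicating classes by mutual reachability, keeps only the closed ones, and computes each kept class's period as the gcd of $d(u) + 1 - d(v)$ over edges $u \to v$ inside the class, where $d$ is the BFS distance from an arbitrary root.

```python
import numpy as np
from math import gcd

def closed_classes_and_periods(P, tol=1e-12):
    """Steps 1-3: find communicating classes, keep the closed ones,
    and compute each closed class's period."""
    n = P.shape[0]
    adj = [set(np.nonzero(P[i] > tol)[0]) for i in range(n)]

    # Reachability by depth-first search from each state (fine for n ~ 200).
    def reachable(i):
        seen, stack = {i}, [i]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    reach = [reachable(i) for i in range(n)]

    # States i and j communicate iff each reaches the other.
    classes, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        cls = {j for j in reach[i] if i in reach[j]}
        classes.append(cls)
        assigned |= cls

    # Step 2: a class is closed iff no edge leaves it.
    closed = [c for c in classes if all(adj[u] <= c for u in c)]

    # Step 3: period via BFS distances within the (closed, hence
    # strongly connected) class.
    def period(c):
        root = next(iter(c))
        d, queue = {root: 0}, [root]
        while queue:
            u = queue.pop(0)
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        g = 0
        for u in c:
            for v in adj[u]:
                g = gcd(g, d[u] + 1 - d[v])
        return g

    return closed, [period(c) for c in closed]

# The 5-state chain from the example below (0-indexed states):
P = np.array([[1/2, 0,   1/2, 0,   0],
              [0,   1,   0,   0,   0],
              [1/3, 0,   2/3, 0,   0],
              [1/5, 1/4, 1/3, 1/6, 1/20],
              [0,   1/2, 0,   1/2, 0]])
closed, periods = closed_classes_and_periods(P)
# closed -> [{0, 2}, {1}], i.e. classes {1,3} and {2} in the answer's
# 1-based numbering; both periods are 1.
```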

A simple example:

Suppose $$P = \left(\begin{array}{ccccc} \frac{1}{2} & 0 & \frac{1}{2} & 0 & 0\\ 0 & 1 & 0 & 0 & 0 \\ \frac{1}{3} & 0 & \frac{2}{3} & 0 & 0 \\ \frac{1}{5} & \frac{1}{4} & \frac{1}{3} & \frac{1}{6} & \frac{1}{20} \\ 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \end{array}\right)$$ where we number the states $1$ through $5.$ Then we can recognize the communicating classes as $\{1,3\}, \{2\}, \{4,5\}$ and since $4 \to 3$ but $x\not\to 4$ for any $x$ not in the communicating class of $4,$ we remove the whole class $\{4,5\}.$ By removing rows and columns $4,5,$ this gives the matrix $$P' = \left(\begin{array}{ccc} \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & 0 \\ \frac{1}{3} & 0 & \frac{2}{3}\end{array}\right)$$ corresponding to the subchain on states $1,2,3,$ and each state is obviously aperiodic since every entry on the diagonal is nonzero. Permuting the second and third columns/rows means we conjugate by $\sigma = \left(\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0\end{array}\right)$ to give $$\sigma P'\sigma^{-1} = \left(\begin{array}{ccc} \frac{1}{2} & \frac{1}{2} & 0 \\ \frac{1}{3} & \frac{2}{3} & 0 \\ 0 & 0 & 1\end{array}\right)$$

Applying the formula to each block and then putting it together, $$\lim_{k\to\infty} \left(\begin{array}{cc} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{3} & \frac{2}{3} \end{array}\right)^k = \left(\begin{array}{cc} \frac{2}{5} & \frac{3}{5} \\ \frac{2}{5} & \frac{3}{5} \end{array}\right) \\ \lim_{k\to\infty} (1)^k = (1) \\ \lim_{k\to\infty} (P')^k = \left(\begin{array}{ccc} \frac{2}{5} & 0 & \frac{3}{5} \\ 0 & 1 & 0 \\ \frac{2}{5} & 0 & \frac{3}{5}\end{array}\right)$$ so $$\lim_{k\to\infty} P^k = \left(\begin{array}{c}\begin{array}{ccccc} \frac{2}{5} & 0 & \frac{3}{5} & 0 & 0\\ 0 & 1 & 0 & 0 & 0 \\ \frac{2}{5} & 0 & \frac{3}{5} & 0 & 0\end{array} \\ \ldots r_4 \ldots \\ \ldots r_5 \ldots \end{array}\right)$$ where $$\begin{align*} r_i ={} &\mathbb{P}\left(X_k \in \{1,3\} \text{ eventually } | X_0 = i\right)\cdot\left(\begin{array}{ccccc}\frac{2}{5} & 0 & \frac{3}{5} & 0 & 0 \end{array}\right) \\ &+ \mathbb{P}\left(X_k \in \{2\} \text{ eventually } | X_0 = i\right)\cdot\left(\begin{array}{ccccc}0 & 1 & 0 & 0 & 0 \end{array}\right)\end{align*}$$

We may solve this for $r_4, r_5$ by first writing it as $$r_4 = \frac{1}{5}r_1 + \frac{1}{4}r_2 + \frac{1}{3}r_3 + \frac{1}{6}r_4 + \frac{1}{20}r_5 \\ r_5 = 0 r_1 + \frac{1}{2} r_2 + 0 r_3 + \frac{1}{2} r_4 + 0 r_5$$ and then solving this linear system of two equations in two unknowns by your favorite method to find $$r_4 = \frac{64}{97} r_1 + \frac{33}{97} r_2 = \left(\begin{array}{ccccc}\frac{128}{485} & \frac{33}{97} & \frac{192}{485} & 0 & 0 \end{array}\right)\\ r_5 = \frac{32}{97} r_1 + \frac{65}{97} r_2 = \left(\begin{array}{ccccc}\frac{64}{485} & \frac{65}{97} & \frac{96}{485} & 0 & 0 \end{array}\right)$$
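
This small linear solve is easy to do numerically: the absorption probabilities satisfy $X = TX + B$, i.e. $(I - T)X = B$, where $T$ is the transient-to-transient block and $B$ holds the one-step probabilities into each closed class. A NumPy sketch for this example (variable names are mine):

```python
import numpy as np

# Transient states 4 and 5 of the example.
T = np.array([[1/6, 1/20],
              [1/2, 0.0]])            # transient -> transient probabilities
B = np.array([[1/5 + 1/3, 1/4],       # one-step probability into class {1,3}, class {2}
              [0.0,       1/2]])

# Absorption probabilities: X = T X + B, i.e. (I - T) X = B.
X = np.linalg.solve(np.eye(2) - T, B)
# X[0] = (64/97, 33/97) for state 4, X[1] = (32/97, 65/97) for state 5
```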

This yields the final answer

$$\lim_{k\to\infty} P^k = \left(\begin{array}{ccccc} \frac{2}{5} & 0 & \frac{3}{5} & 0 & 0\\ 0 & 1 & 0 & 0 & 0 \\ \frac{2}{5} & 0 & \frac{3}{5} & 0 & 0 \\ \frac{128}{485} & \frac{33}{97} & \frac{192}{485} & 0 & 0 \\ \frac{64}{485} & \frac{65}{97} & \frac{96}{485} & 0 & 0\end{array}\right)$$
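
As a sanity check (my own addition, not part of the derivation), raising $P$ to a large power numerically reproduces this matrix:

```python
import numpy as np

P = np.array([[1/2, 0,   1/2, 0,   0],
              [0,   1,   0,   0,   0],
              [1/3, 0,   2/3, 0,   0],
              [1/5, 1/4, 1/3, 1/6, 1/20],
              [0,   1/2, 0,   1/2, 0]])

Q_claimed = np.array([[2/5,     0,     3/5,     0, 0],
                      [0,       1,     0,       0, 0],
                      [2/5,     0,     3/5,     0, 0],
                      [128/485, 33/97, 192/485, 0, 0],
                      [64/485,  65/97, 96/485,  0, 0]])

Q_numeric = np.linalg.matrix_power(P, 200)
# Q_numeric agrees with Q_claimed to floating-point precision
```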


A [very] simple non-example:

If $P = \left(\begin{array}{cc}0 & 1 \\ 1 & 0\end{array}\right),$ then the corresponding Markov chain is irreducible, so $f(P-I_2) = \left(\begin{array}{cc}-1 & 1 \\ 1 & 1\end{array}\right)$ is invertible, but $$\lim_{k\to\infty} P^k \neq \left(\begin{array}{cc}\frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{array}\right) = f(0_{2,2})[f(P-I_2)]^{-1}.$$ This fails because the Markov chain is not aperiodic, so we cannot remove step 3 from the algorithm I gave.
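
The failure is visible numerically (my own check): the powers of $P$ oscillate forever, while the formula still happily returns a matrix.

```python
import numpy as np

P = np.array([[0., 1.],
              [1., 0.]])

# Powers alternate between the identity and P, so P^k has no limit...
even = np.linalg.matrix_power(P, 100)   # identity matrix
odd  = np.linalg.matrix_power(P, 101)   # P itself

# ...yet the formula still returns the stationary matrix.
F0 = np.zeros((2, 2)); F0[:, -1] = 1.0          # f(0_{2,2})
FPI = P - np.eye(2);   FPI[:, -1] = 1.0         # f(P - I_2)
Q = F0 @ np.linalg.inv(FPI)
# Q == [[1/2, 1/2], [1/2, 1/2]] even though lim P^k does not exist
```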

  • I see that you've refined Step 6. While you were working on that, I was experimenting with a different approach: I think that if I restore the states removed in Step 2, I'll have a chain that resolves to its limit within a finite number of iterations. – bridget Nov 22 '19 at 18:37
  • @awlman Since my initial step 6 boiled down to "I'm pretty sure there is a method but I can't remember it," of course I wasn't happy to leave it that way. I just did one final edit to finish the middle section (the example of finding the limit) so I'm probably going to leave the whole answer as-is unless someone else gives feedback to update it. As for your different approach, I'm not sure what you mean by "resolves to its limit within a finite number of iterations" -- what are you iterating? It certainly won't give you the limit as one of $P^1, \ldots, P^N$ for a finite $N$ – Brian Moehring Nov 22 '19 at 20:34
  • As far as efficiency of the above algorithm, the largest time-sinks will probably be inverting the given matrices. You'll need to invert the $|\mathcal{C}_i|$-dimensional matrix $f(C_i-I)$ for each absorbing communicating class $\mathcal{C}_i$, and then you'll need to invert a matrix of the form $P_t - I$ where $P_t$ is the square sub-matrix of $P$ of all the transient states. Since $C_i$ and $P_t$ will be sparse in your application, there is probably an efficient numerical method to invert those matrices, but I don't have the background to even point you in the right direction for that – Brian Moehring Nov 22 '19 at 21:05
  • I'm still working on the implementation, but I have a question about steps 1 and 2. Some states are not members of any communicating class. Was it your intention that those states should be removed too? And if so, then can't steps 1 and 2 be summarized as "Identify the closed communicating classes"? – bridget Nov 25 '19 at 16:01
  • In reverse order: I wasn't aware of that notation, but yes, the only purpose of steps 1-2 is to identify the closed communicating classes. (I was just thinking of steps 1-3 in terms of digraph algorithms) As for your first question, we define the communication relation to be reflexive, so every state is a member of a communicating class even if the class is a singleton. If the state is absorbing (i.e. its singleton communicating class is closed), then you'd want to keep it. – Brian Moehring Nov 25 '19 at 16:58
  • I now have working code which implements the algorithm more or less as you described. I found it convenient to do Steps 1-3 simultaneously – it turns out to be efficient to calculate the period at the same time you figure out how the states fall into communicating classes. My code has passed all the tests I've tried so far, so I think you got the algorithm right. Thanks. – bridget Nov 30 '19 at 23:18