
I have written a program that calculates the probability of ending in a given absorbing state, given the initial state, using the following procedure:

  1. Given the transition matrix $P$, swap rows and columns until $P$ is in canonical form, with the identity matrix in the bottom-right corner:

$$ P = \begin{bmatrix} Q & R \\ 0 & I \end{bmatrix} $$

  2. Calculate the fundamental matrix $N$, and multiply it by $R$ to give the final probability matrix $B$:

$$N = (I-Q)^{-1}, \qquad B = NR$$

The entries of $B$ give the probability of moving from an initial non-absorbing state (rows) to a final absorbing state (columns).
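For concreteness, here is a minimal numpy sketch of this procedure. The $Q$ and $R$ blocks below are made up for illustration (two transient and two absorbing states):

```python
import numpy as np

# Illustrative canonical-form blocks: two transient states (Q) and
# two absorbing states (R); each full row of P sums to 1.
Q = np.array([[0.5, 0.2],
              [0.3, 0.4]])
R = np.array([[0.2, 0.1],
              [0.1, 0.2]])

N = np.linalg.inv(np.eye(Q.shape[0]) - Q)  # fundamental matrix N = (I - Q)^{-1}
B = N @ R                                  # absorption probabilities

print(B)              # B[i, j] = P(absorbed in state j | start in transient state i)
print(B.sum(axis=1))  # each row sums to 1 for an absorbing chain
```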

My question: is there a more efficient method for solving this problem when I know the initial state? It seems wasteful to compute the entire $B$ matrix, given that I will only ever use one of its rows.

I am writing a program to do this, so the matrix inversion step is particularly expensive. Can I avoid it altogether?

David Ferris

1 Answer


There are several possible techniques.

Let $P^t$ denote the matrix $P$ raised to the $t$th power. Then $(P^t)_{i,j}$ (the $i,j$-th entry of that matrix) is the probability that if you start in state $i$, you will be in state $j$ after $t$ steps. If $t$ is sufficiently large, this is a good approximation to the limiting value as $t \to \infty$.

So, one possible technique is to choose a sufficiently large value of $t$, compute $P^t$, and then evaluate $(P^t)_{i,j}$. Notice that you can compute $P^t$ with $O(\lg t)$ matrix multiplications via the square-and-multiply algorithm, so it is feasible to choose a very large value of $t$. Unfortunately, if $P$ is an $n\times n$ matrix, this will be slow: it takes $O(n^3 \log t)$ time, which is significant if $n$ is very large.
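A minimal sketch of this approach, assuming numpy, whose `np.linalg.matrix_power` performs exactly this square-and-multiply exponentiation; the small chain here is made up for illustration:

```python
import numpy as np

# Made-up 3-state chain: states 0 and 1 are transient, state 2 is absorbing.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.0, 0.0, 1.0]])

t = 1 << 20                        # a very large number of steps
Pt = np.linalg.matrix_power(P, t)  # O(lg t) matrix multiplications

i, j = 0, 2                        # start in state 0, absorb in state 2
print(Pt[i, j])                    # approximates the absorption probability
```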

Another option is to use the power iteration method, as used in the PageRank algorithm. Basically, you set $x^0$ to be the vector with a $1$ in its $i$th element and zeros elsewhere (a row vector, since the rows of $P$ index the current state). Then, you iteratively compute

$$x^{k+1} = x^k \cdot P.$$

After $t$ steps, you have computed $x^t$; now you read off its $j$th element, $(x^t)_j$, and that is the probability of being in state $j$ after $t$ steps. If $P$ is dense, each iteration takes $O(n^2)$ time, for a total of $O(n^2 t)$ time. That might be better than the previous method, but it is still not encouraging if $n$ is very large. However, in many real applications $P$ is sparse, so each iteration can be computed in $O(n)$ time; in this case the method takes $O(nt)$ time, which can be quite efficient.
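A minimal sketch of the power iteration, reusing the made-up chain above. For a large sparse $P$, storing it as a `scipy.sparse.csr_matrix` would make each product cost proportional to the number of nonzero entries instead of $n^2$:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],    # same made-up 3-state chain as above
              [0.2, 0.5, 0.3],
              [0.0, 0.0, 1.0]])

i, j, t = 0, 2, 200
x = np.zeros(P.shape[0])
x[i] = 1.0                        # all probability mass on the initial state i

for _ in range(t):                # x^{k+1} = x^k P, one product per step
    x = x @ P

print(x[j])                       # probability of being in state j after t steps
```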

How large does $t$ need to be? In practice, often not very large. You can analyze how large $t$ needs to be as a function of the gap between the largest and second-largest eigenvalues of $P$ (in absolute value). In particular, if $\lambda_1$ denotes the largest eigenvalue and $\lambda_2$ the second-largest, for a Markov process we have $\lambda_1 = 1$ and $|\lambda_2| \le 1$. Effectively, we need $t$ to be large enough that $|\lambda_2|^t$ is negligible compared to $1$. If $\lambda_2$ is not too close to $1$, then $t$ won't need to be very large: to drive the error below $\varepsilon$, $t = O\!\left(\frac{\log(1/\varepsilon)}{1-|\lambda_2|}\right)$ iterations suffice.
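A rough sketch of estimating $t$ from this bound. One caveat: with several absorbing states, $1$ is a repeated eigenvalue of $P$, so the relevant rate is the spectral radius of the transient block $Q$; the target accuracy `eps` below is an assumption:

```python
import numpy as np

Q = np.array([[0.5, 0.3],                     # transient block of the chain above
              [0.2, 0.5]])

lam2 = np.max(np.abs(np.linalg.eigvals(Q)))   # decay rate of the transient mass

eps = 1e-12                                   # assumed target accuracy
t_needed = int(np.ceil(np.log(1 / eps) / (1 - lam2)))
print(lam2, t_needed)                         # ~0.745 and ~109 here
```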

As a result, the power iteration method can often be very effective and very efficient.

D.W.