I'm currently reading some papers about Markov chain lumping and I'm failing to see the difference between a Markov chain and a plain directed weighted graph.

For example, in the article Optimal state-space lumping in Markov chains, they provide the following definition of a CTMC (continuous-time Markov chain):

We consider a finite CTMC $(\mathcal{S}, Q)$ with state space $\mathcal{S} = \{x_1, x_2, \ldots, x_n\}$ and a transition rate matrix $Q: \mathcal{S} \times \mathcal{S} \to \mathbb{R}^+$.

They don't mention the Markov property at all; in fact, if the weight on the edges represents a probability, I believe the Markov property trivially holds, since the probability depends only on the current state of the chain and not on the path that led to it.

In another article, On Relational Properties of Lumpability, Markov chains are defined similarly:

A Markov chain $M$ will be represented as a triplet $(S, P, \pi)$ where $S$ is the finite set of states of $M$, $P$ the transition probability matrix indicating the probability of getting from one state to another, and $\pi$ is the initial probability distribution representing the likelihood for the system to start in a certain state.

Again, no mention of past or future or independence.

There's a third paper, Simple O(m log n) Time Markov Chain Lumping, where they not only never state that the weights on the edges are probabilities, but even say:

In many applications, the values $W(s, s')$ are non-negative. We do not make this assumption, however, because there are also applications where $W(s, s)$ is deliberately chosen as $-W(s, S \setminus \{s\})$, making it usually negative.

Moreover, it's stated that lumping should be a way to reduce the number of states while maintaining the Markov property (by aggregating "equivalent" states into a bigger state). Yet, to me, it looks like lumping simply sums probabilities, which shouldn't even guarantee that the resulting probabilities of the transitions to/from the aggregated states are in the range $[0,1]$. What does lumping actually preserve, then?

So, there are two possibilities that I see:

  • I didn't understand what a Markov chain is, or
  • The use of the term Markov chain in those papers is bogus

Could someone clarify the situation?

It really looks like different communities use the term and mean widely different things by it. From the three articles I'm considering, it looks like the Markov property is either trivial or useless, while in other kinds of papers it looks fundamental.

Bakuriu

2 Answers


A Continuous-time Markov Chain can be represented as a directed graph with constant non-negative edge weights. An equivalent representation of the constant edge-weights of a directed graph with $N$ nodes is as an $N \times N$ matrix. The Markov property (that the future states depend only on the current state) is implicit in the constant edge weights (or constant entries in the matrix). Implicit means implied by. Mathematicians use it as a euphemism meaning, "you should prove it yourself."
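
As a minimal sketch (states and weights invented), the same chain can be stored either as a weighted edge list or as the $N \times N$ matrix:

```python
import numpy as np

# A hypothetical 3-state CTMC as a weighted edge list ...
edges = {(0, 1): 2.0, (1, 2): 0.5, (2, 0): 1.0}  # (source, dest) -> rate

# ... and the equivalent N x N matrix representation.
N = 3
Q = np.zeros((N, N))
for (i, j), w in edges.items():
    Q[i, j] = w

# The entries are constants: Q never changes as the process evolves,
# which is what makes the Markov property implicit in this representation.
```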

But the first paper defines a notation consistent with a Continuous-time Markov Chain (sometimes called a Markov Process), while the second paper defines a notation consistent with a Discrete-time Markov Chain. They say:

$P$ is the transition probability matrix indicating the probability of getting from one state to another, and $\pi$ is the initial probability distribution representing the likelihood for the system to start in a certain state. [emphasis added]

They are assuming that the matrix is constant over time (thus implying the Markov property). Implicit in the term probability is the fact that each constant is in the range $[0,1]$, that the entries in every column of $P$ sum to $1$, and that the entries of $\pi$ sum to $1$.
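
A quick sketch of those constraints (matrix values invented), using this answer's column convention, where entry $P_{ji}$ is the probability of moving from state $i$ to state $j$:

```python
import numpy as np

P = np.array([      # column i holds the out-probabilities of state i
    [0.9, 0.2, 0.0],
    [0.1, 0.5, 0.3],
    [0.0, 0.3, 0.7],
])
pi = np.array([0.5, 0.25, 0.25])  # initial distribution

assert ((0 <= P) & (P <= 1)).all()      # every entry is in [0, 1]
assert np.allclose(P.sum(axis=0), 1.0)  # every column sums to 1
assert np.isclose(pi.sum(), 1.0)        # pi sums to 1
```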

I can't read the third paper; it is paywalled. If the entries in every column of the matrix are required to sum to $1$, then they are probabilities and the authors are talking about Discrete-time Markov Chains. If the entries in every column can sum to an arbitrary number, then the entries represent rates, not probabilities, and the authors are talking about Continuous-time Markov Chains.

Continuous-time Markov Chains are not the same as Discrete-time Markov Chains. In a Continuous-time Markov Chain the edge weights do not represent probabilities, but rather transition rates. The edge weights must be non-negative, but can be arbitrarily large, and the weights of the out-edges can sum to any non-negative number. The sum is not required to be $1$.
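
For a concrete illustration (numbers invented), the convention quoted from the third paper, $W(s, s) = -W(s, S \setminus \{s\})$, turns the non-negative off-diagonal rates into a matrix whose rows sum to zero:

$$Q = \begin{pmatrix} -3 & 2 & 1 \\ 4 & -4 & 0 \\ 0 & 5 & -5 \end{pmatrix}$$

Only the diagonal entries are negative, which matches the paper's remark that $W(s, s)$ is "usually negative."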

With both Continuous-time and Discrete-time Markov Chains, the Markov property is implied by the constant edge weights (or, equivalently, the constant entries in the transition matrix).

Wandering Logic

Markov Chains come in two flavors: continuous time and discrete time.

Both continuous-time Markov chains (CTMCs) and discrete-time Markov chains (DTMCs) are represented as directed weighted graphs.

For DTMCs, the transitions always take one unit of "time." As a result, there is no choice for what the weight on an arc should be: you put the probability of going to $j$ given that you're at $i$.
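
As a minimal sketch (transition probabilities invented; the helper dtmc_step is hypothetical), with the row convention $P_{ij} = \Pr(\text{next} = j \mid \text{current} = i)$:

```python
import numpy as np

P = np.array([         # P[i, j] = weight on the arc i --> j, i.e. the
    [0.5, 0.5, 0.0],   # probability of going to j given that you're at i
    [0.0, 0.0, 1.0],
    [0.3, 0.3, 0.4],
])

def dtmc_step(P, i, rng):
    """Take one unit-"time" step from state i: sample j with probability P[i, j]."""
    return rng.choice(len(P), p=P[i])

rng = np.random.default_rng(0)
state = 0
for _ in range(5):
    state = dtmc_step(P, state, rng)  # each step costs exactly one time unit
```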

For CTMCs, the transition time between any two states is necessarily given by an exponential random variable. This is the key difference between CTMCs and DTMCs: DTMCs always have unit transition times, while CTMCs have random ones.

For a CTMC, the convention is generally to weight an arc with the rate of the exponential random variable governing the transition from the source to the destination. That is, the convention is to put rates on the arcs, not probabilities.
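
And a matching CTMC sketch (rates invented; the helper ctmc_step is hypothetical): the holding time in state $i$ is exponential with rate equal to the total exit rate of $i$, and the next state is chosen in proportion to the individual arc rates.

```python
import numpy as np

R = np.array([      # R[i, j] = rate on the arc i --> j (zero diagonal: no self-loops)
    [0.0, 2.0, 1.0],
    [4.0, 0.0, 0.0],
    [0.0, 5.0, 0.0],
])

def ctmc_step(R, i, rng):
    """One CTMC transition out of state i: an exponential holding time,
    then a jump chosen in proportion to the arc rates."""
    total = R[i].sum()                      # total exit rate of state i
    hold = rng.exponential(1.0 / total)     # numpy takes scale = 1/rate
    j = rng.choice(len(R), p=R[i] / total)  # next state
    return j, hold

rng = np.random.default_rng(0)
state, t = 0, 0.0
for _ in range(5):
    state, dt = ctmc_step(R, state, rng)
    t += dt  # time now advances by a random amount, not one unit
```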

Negative Rates

Although all the CTMCs that I recall were represented with positive rates on the edges, negative rates do come up in CTMC analysis.

Say we're standing at state A, which is connected to B, C, and D as below.

A --> B (the rate into A from B is negative)
A --> C (the rate into A from C is negative)
D --> A (the rate into A from D is positive)

This is likely not quite what your paper is referring to; I bring it up to show that negative weights aren't necessarily ridiculous if someone was working with a suitable convention.
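
One way to make those signs concrete, assuming the usual flow-balance (master equation) reading, is to write the time evolution of the probability of being at $A$:

$$\frac{d\pi_A(t)}{dt} = -\,q_{AB}\,\pi_A(t) - q_{AC}\,\pi_A(t) + q_{DA}\,\pi_D(t),$$

where $q_{XY}$ is the rate on the arc $X \to Y$; the out-edges of $A$ enter $A$'s balance with a minus sign.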

Markov Property

For DTMCs, you're right: the Markov property is satisfied trivially. For CTMCs, the Markov property is satisfied because the transitions are given by exponential random variables (which are "memoryless"). If the transitions were not given by exponential random variables (say, instead, they were uniform), then we would be talking about "semi-Markov chains" or "semi-Markov processes."
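
"Memoryless" here is the standard property of the exponential distribution: if $T \sim \operatorname{Exp}(\lambda)$, then

$$\Pr(T > s + t \mid T > s) = \Pr(T > t) \qquad \text{for all } s, t \ge 0,$$

so the time already spent waiting in a state tells you nothing about how much longer the chain will stay there, which is exactly what the Markov property requires in continuous time.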

Riley