
I'm trying to understand hidden Markov models (HMMs). Here is the material I have been studying.
It states that there are two assumptions in an HMM (page 3):

  1. $P( q_i | q_1, ..., q_{i-1} ) = P( q_i | q_{i-1} )$
  2. $P(o_i | q_1, ..., q_T, o_1, ..., o_T) = P(o_i | q_i)$

where $q_i$ denotes the $i$'th state, $o_i$ denotes the $i$'th observation, and $T$ is the length of the sequence.

And in the forward-backward algorithm, we need to define the backward probability (page 12, eq. A.15):

  • $\beta_t(i) = P(o_{t+1}, ..., o_{T} | q_t=i)$

The material explains that the backward probability is "the probability of emitting the remaining sequence from $t+1$ until the end of time after being at hidden state $i$ at time $t$".

My question is about assumption (2) and $\beta_t(i)$. Assumption (2) says that the $i$'th observation depends only on the $i$'th state. The backward probability considers only $o_{t+1}, ..., o_T$, so these should depend only on $q_{t+1}, ..., q_T$, right? I therefore don't see why conditioning on $q_t=i$ is needed in the backward probability.
In other words, why can't we state that:

  • $\beta_t(i)=P(o_{t+1}, ..., o_{T} | q_t=i) = P(o_{t+1}, ..., o_{T})$

Many thanks!

  • Assumption (2) doesn't make sense to me, since it seems to condition on $o_i$. That's like $P(X=x |X=x, Y=y)$. But as for why we cannot state what you have written, I think if the probability was conditioned on information that included those future states, then the sequence of outputs would be conditionally independent of that past state, but without conditioning on information that includes those future states, the probability of that sequence of outputs is not independent of that past state because the future states are not independent of that past state. – Joe Aug 02 '21 at 17:39
  • @Joe Did you mean that "$P(o_i|q_i, q_{i-1})=P(o_i|q_i)$, but $P(o_i|q_{i-1}) \neq P(o_i)$ because $q_{i-1}$ would affect $q_i$ and thereby affect $o_i$"? Sounds like a sufficient statistic: $o_i$ is totally determined by $q_i$, so that other states are not needed once $q_i$ is given. – Chun-Ye Lu Aug 02 '21 at 17:54
  • Yes. I'm not familiar with HMMs, but that's what assumption 2 means to me: conditionally independent, not independent. – Joe Aug 02 '21 at 20:55
  • @Joe OK thanks, I understand. I can't accept a comment, but if you write it up as an answer I can accept it. – Chun-Ye Lu Aug 03 '21 at 06:44
  • I think it would be better to have someone who knows about HMMs give a complete answer. – Joe Aug 03 '21 at 11:12

1 Answer


Assumption $1$ should be
$$ P\big(q_i\,\big|\,q_1,q_2,\dots,q_{i-1},o_1,o_2,\dots,o_{i-1}\big)=P\big(q_i\,\big|\,q_{i-1}\big)\ . $$
That is, the distribution of the hidden state at time $i$, given all the preceding hidden states and all the preceding observations, depends only on the preceding state. Assumption $2$ should be
$$ P\big(o_i\,\big|\,q_1,q_2,\dots,q_T,o_1,\dots,o_{i-1},o_{i+1},\dots,o_T\big)=P\big(o_i\,\big|\,q_i\big)\ . $$
That is, the distribution of the observation at time $i$, given all the hidden states and all the other observations, depends only on the hidden state at time $i$. In my experience, assumption $2$ is more commonly stated in the form
$$ P\big(o_i\,\big|\,q_1,q_2,\dots,q_i,o_1,\dots,o_{i-1}\big)=P\big(o_i\,\big|\,q_i\big)\ . $$
Although this may appear to be a weaker assumption at first sight, it is in fact equivalent, given assumption $1$.
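As a sanity check, the corrected assumption $2$ can be verified numerically on a toy example: for a two-step HMM, $P(o_2\mid q_1,q_2,o_1)$ should equal the emission probability $P(o_2\mid q_2)$ for every conditioning configuration. The 2-state, 2-symbol HMM below uses made-up parameters purely for illustration (none of them come from the linked material).

```python
import itertools
import numpy as np

# Hypothetical 2-state, 2-symbol HMM; all numbers are made up for illustration.
pi = np.array([0.6, 0.4])            # initial distribution P(q_1 = i)
A  = np.array([[0.9, 0.1],
               [0.2, 0.8]])          # transition probabilities P(q_{t+1} = j | q_t = i)
B  = np.array([[0.7, 0.3],
               [0.1, 0.9]])          # emission probabilities P(o_t = k | q_t = i)

def joint(q1, q2, o1, o2):
    """Joint probability of a full configuration (q_1, q_2, o_1, o_2) for T = 2."""
    return pi[q1] * A[q1, q2] * B[q1, o1] * B[q2, o2]

# Check that P(o_2 | q_1, q_2, o_1) = P(o_2 | q_2) = B[q2, o2]
# for every conditioning configuration (q_1, q_2, o_1).
for q1, q2, o1 in itertools.product(range(2), repeat=3):
    norm = sum(joint(q1, q2, o1, o2) for o2 in range(2))
    for o2 in range(2):
        cond = joint(q1, q2, o1, o2) / norm
        assert abs(cond - B[q2, o2]) < 1e-12
print("assumption 2 holds on this toy HMM")
```

The check succeeds because, once $q_2$ is fixed, every other factor of the joint probability cancels in the conditional.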

You're quite correct that given $q_1,q_2,\dots,q_T$, the distribution of $o_{t+1},o_{t+2},\dots,o_T$ depends only on $q_{t+1},q_{t+2},\dots,q_T$. The key to understanding the dependence of
$$ \beta_t(i)=P\big(o_{t+1},o_{t+2},\dots,o_T\,\big|\,q_t=i\big) $$
on $i$, however, is that you're not given any of $q_{t+1},q_{t+2},\dots,q_T$ in the conditioning event; you're only given $q_t$, and the distribution of the subsequent hidden states $q_{t+1},q_{t+2},\dots,q_T$ depends on the value of $q_t$. The dependence of $\beta_t(i)$ on $i$ comes from the fact that the distribution of $o_{t+1},o_{t+2},\dots,o_T$ depends on $q_{t+1},q_{t+2},\dots,q_T$, and these latter states pass their dependence on the value of $q_t$ through to that distribution.
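This dependence can also be seen numerically. The sketch below runs the standard backward recursion $\beta_T(i)=1$, $\beta_t(i)=\sum_j A_{ij}\,B_{j,o_{t+1}}\,\beta_{t+1}(j)$ on a toy 2-state, 2-symbol HMM (the parameters and the observation sequence are made up for illustration) and shows that $\beta_1(1)\neq\beta_1(2)$, i.e. $\beta_t(i)$ really does vary with $i$.

```python
import numpy as np

# Hypothetical 2-state, 2-symbol HMM; all numbers are made up for illustration.
A = np.array([[0.9, 0.1],   # transition probabilities P(q_{t+1} = j | q_t = i)
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],   # emission probabilities P(o_t = k | q_t = i)
              [0.1, 0.9]])

obs = [0, 1, 1, 0]          # an observation sequence o_1, ..., o_T (T = 4)
T, N = len(obs), A.shape[0]

# Backward recursion (0-based indexing): beta[T-1, i] = 1 for all i, then
# beta[t, i] = sum_j A[i, j] * B[j, obs[t+1]] * beta[t+1, j].
beta = np.zeros((T, N))
beta[T - 1] = 1.0
for t in range(T - 2, -1, -1):
    for i in range(N):
        beta[t, i] = sum(A[i, j] * B[j, obs[t + 1]] * beta[t + 1, j]
                         for j in range(N))

# The two entries of beta[0] differ, so beta_1(i) depends on the state i.
print(beta[0])  # -> [0.069714 0.153252]
```

If the rows of $A$ were identical (so that $q_{t+1}$ no longer depended on $q_t$), the two values would coincide, which is exactly the mechanism described above.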

  • Thanks for the clear explanation! btw, should it be $P(o_i | q_{i-1})$ (instead of $q_i$) in the right hand side of the third equation? – Chun-Ye Lu Aug 23 '21 at 06:08
  • No. Although the equation is incorrect as it stands, and I think your suggested correction would turn it into one that's correct, I'm not sure it would then be equivalent to the preceding one. The real error was a missing $\ q_i\ $ on the left side of the equation, which I've now corrected. Thanks for picking it up. – lonza leggiera Aug 23 '21 at 06:58