
I am reading "Bayesian Reasoning And Machine Learning" and I'm doing exercise 23.3 (a) on p.490.

Here's the exercise:

Consider an HMM with 3 states $(M=3)$ and $2$ output symbols, with a left-to-right state transition matrix

$A = \begin{pmatrix}0.5 & 0 & 0 \\ 0.3 & 0.6 & 0 \\ 0.2 & 0.4 & 1 \end{pmatrix}$

where $A_{i,j}=p(h_{t+1}=i|h_t=j)$, emission matrix $B_{i,j}=p(v_t=i|h_t=j)$

$B=\begin{pmatrix} 0.7 & 0.4 & 0.8 \\ 0.3 & 0.6 & 0.2 \end{pmatrix}$

and initial state probability vector $a=(0.9, 0.1, 0)^T$. Given that the observed symbol sequence is $v_{1:3}=(1,2,1)$, compute $p(v_{1:3})$.

1 Answer

Let $\ a_{ij}\ $ and $\ b_{jv}\ $ be the entries in row $\ i\ $ and column $\ j\ $ of $\ A^T\ $, and in row $\ j\ $ and column $\ v\ $ of $\ B^T\ $, respectively. For $\ v=1,2\ $, let $\ C(v)\ $ be the $\ 3\times3\ $ matrix whose entry in row $\ i\ $ and column $\ j\ $ is $\ a_{ij}b_{jv}\ $, and let $\ \alpha(v)\ $ be the $\ 3\times1\ $ column vector whose $\ i^\text{th}\ $ entry is $\ a_ib_{iv}\ $, where $\ a_i\ $ is the $\ i^\text{th}\ $ entry of the initial state probability vector. Then the probability that $\ v_1,v_2,\dots, v_m\ $ are the first $\ m\ $ observed outputs is \begin{align} p(v_1,v_2,\dots,v_m)&=\sum_{h_1=1}^3\sum_{h_2=1}^3\dots\sum_{h_m=1}^3a_{h_1}b_{h_1v_1}a_{h_1h_2}b_{h_2v_2}a_{h_2h_3}\dots a_{h_{m-1}h_m}b_{h_mv_m}\\ &=\alpha(v_1)^TC(v_2)C(v_3)\dots C(v_m)\mathbb{1}\ , \end{align} where $\ \mathbb{1}\ $ is the $\ 3\times1\ $ column vector all of whose entries are $\ 1\ $.
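As a sanity check (this sketch is mine, not part of the original answer), the matrix-product formula translates directly into NumPy; the function name and the zero-based symbol coding are illustrative choices:

```python
import numpy as np

def sequence_probability(A, B, a, v):
    """Compute p(v_1, ..., v_m) = alpha(v_1)^T C(v_2) ... C(v_m) 1.

    A is column-stochastic with A[i, j] = p(h_{t+1}=i | h_t=j),
    B[i, j] = p(v_t=i | h_t=j), a is the initial state distribution,
    and v lists the observed symbols coded 0, 1 (i.e. 1, 2 above).
    """
    alpha = a * B[v[0]]          # alpha(v_1)_i = a_i * p(v_1 | h_1 = i)
    for symbol in v[1:]:
        # C(v)_{ij} = p(h_{t+1}=j | h_t=i) * p(v | h_{t+1}=j) = (A.T * B[v])_{ij}
        alpha = alpha @ (A.T * B[symbol])
    return alpha.sum()           # dot product with the all-ones vector
```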

For $\ t=2,\dots,m\ $, the row vectors $\ \alpha_t=\alpha(v_1)^T\prod_{i=2}^tC(v_i)\ $ can be computed iteratively by \begin{align} \alpha_2&=\alpha(v_1)^TC(v_2)\\ \alpha_t&=\alpha_{t-1}C(v_t)\ , \end{align} each step of which multiplies a row vector by one of the matrices $\ C(v)\ $. The required probability is then the dot product $\ \alpha_m\mathbb{1}\ $.

For your matrices $\ A\ $ and $\ B\ $, you have \begin{align} C(1)&=\pmatrix{0.5\times0.7&0.3\times0.4&0.2\times0.8\\ 0&0.6\times0.4&0.4\times0.8\\ 0&0&1\times0.8}\\ &=\pmatrix{0.35&0.12&0.16\\ 0&0.24&0.32\\0&0&0.8}\ ,\\ C(2)&=\pmatrix{0.5\times0.3&0.3\times0.6&0.2\times0.2\\ 0&0.6\times0.6&0.4\times0.2\\ 0&0&1\times0.2}\\ &=\pmatrix{0.15&0.18&0.04\\ 0&0.36&0.08\\0&0&0.2}\ ,\\ \alpha(1)^T&=\pmatrix{0.9\times0.7&0.1\times0.4&0}\\ &=\pmatrix{0.63&0.04&0}\ ,\text{ and}\\ \alpha(2)^T&=\pmatrix{0.9\times0.3&0.1\times0.6&0}\\ &=\pmatrix{0.27&0.06&0}\ . \end{align}

The probability that the first $\ 3\ $ observed outputs are $\ v_1=1\ $, $\ v_2=2\ $ and $\ v_3=1\ $ is therefore \begin{align} &\alpha(1)^TC(2)C(1)\mathbb{1}\\ &=\pmatrix{0.63&0.04&0}\pmatrix{0.15&0.18&0.04\\ 0&0.36&0.08\\0&0&0.2}\pmatrix{0.35&0.12&0.16\\ 0&0.24&0.32\\0&0&0.8}\pmatrix{1\\1\\1}\\ &=\pmatrix{0.0945&0.1278&0.0284}\pmatrix{0.63\\0.56\\0.8}\\ &=0.153823\ . \end{align}
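The arithmetic above can be checked numerically; this short script (mine, not part of the original answer) reproduces $0.153823$:

```python
import numpy as np

A = np.array([[0.5, 0.0, 0.0],
              [0.3, 0.6, 0.0],
              [0.2, 0.4, 1.0]])   # A[i, j] = p(h_{t+1}=i+1 | h_t=j+1)
B = np.array([[0.7, 0.4, 0.8],
              [0.3, 0.6, 0.2]])   # B[v, j] = p(v_t=v+1 | h_t=j+1)
a = np.array([0.9, 0.1, 0.0])    # initial state probabilities

v = [0, 1, 0]                    # the sequence (1, 2, 1), zero-indexed

alpha = a * B[v[0]]              # alpha(1)^T = (0.63, 0.04, 0)
for symbol in v[1:]:
    alpha = alpha @ (A.T * B[symbol])  # multiply by C(v_t) on the right
print(alpha.sum())               # -> 0.153823 (up to floating-point rounding)
```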

  • Hi, so I calculated it using your method, and the answer is supposed to be $0.153823$ (I checked with the answer manual I found). You calculated your $G(i)$ matrices incorrectly. BTW, does this algorithm have a name? – Slim Shady Aug 12 '22 at 12:39
  • 1
    I see that I originally confused the row and column indices in calculating $\ G(i)\ $. I'm much more used to working with the convention where the transition matrix of the Markov chain is taken to be row stochastic rather than column stochastic. Apologies for the error, which I've now corrected, and thank you for picking it up. – lonza leggiera Aug 12 '22 at 14:22
  • Does this algorithm have a name? Seems like a neat way to calculate it. – Slim Shady Aug 12 '22 at 14:24
  • 1
    My $\ \gamma_t\ $ is just the $\ \alpha\big(h_t\big)\ $ appearing in your text's equation $(23.2.9)$, so my recursion for $\ \gamma_t\ $ is just your text's $\alpha$-recursion expressed in matrix form. This is commonly called the forward algorithm. Your text's $\beta$-recursion of equation $(23.2.18)$ can be expressed similarly in matrix form, and is commonly called the backward algorithm. – lonza leggiera Aug 12 '22 at 14:28
  • Do you mind showing the matrix form for $\beta$ in this question? (I made a separate question because it's too much for a follow-up; otherwise I'd ask here.) – Slim Shady Aug 12 '22 at 15:18
  • There's not much to the $\ \beta\ $ calculation. It's just doing the matrix multiplication from the other end. I'm rather pressed for time at the moment, so it will probably be a few days before I can get to your other question. In the meantime, I've updated my answer above. My original choice of notation was a little unfortunate in not being very compatible with some of the common conventions used in expositions of hidden Markov models. The notation I've used in the update is somewhat more so. – lonza leggiera Aug 14 '22 at 08:49
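For completeness, the "from the other end" computation mentioned in the comments can be sketched the same way (again my own NumPy code, not from the answer): accumulate $\beta_{t-1}=C(v_t)\beta_t$ from the right, starting from $\beta_m=\mathbb{1}$, and finish with the dot product $\alpha(v_1)^T\beta_1$.

```python
import numpy as np

A = np.array([[0.5, 0.0, 0.0],
              [0.3, 0.6, 0.0],
              [0.2, 0.4, 1.0]])
B = np.array([[0.7, 0.4, 0.8],
              [0.3, 0.6, 0.2]])
a = np.array([0.9, 0.1, 0.0])
v = [0, 1, 0]                    # the sequence (1, 2, 1), zero-indexed

beta = np.ones(3)                # beta_m = all-ones vector
for symbol in reversed(v[1:]):
    beta = (A.T * B[symbol]) @ beta   # beta_{t-1} = C(v_t) beta_t
print((a * B[v[0]]) @ beta)      # alpha(v_1)^T beta_1 -> 0.153823
```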