I am looking into proofs of the convergence of Q-learning, specifically the proof by Jaakkola, Jordan and Singh in their paper On the convergence of stochastic iterative dynamic programming algorithms.
The proof of a lemma (reproduced below) suggests that it is a trivial corollary of an extension of Dvoretzky's stochastic approximation theorem (from On Stochastic Approximation). However, I cannot see how it follows from Dvoretzky's theorem at all.
This question was previously asked here: Consequence of Dvoretzky Stochastic Approximation Theorem (with a full reproduction of Dvoretzky's theorem, too), but it received no answers. I hope that, by using different tags, there might be someone out there who can supply an answer, either to me or to the original questioner.
Jaakkola's Lemma
In the appendix, under the proof of Theorem 1 on page 11, the paper states the following lemma:
A random process $w_{n+1}(x)=(1-\alpha_n(x))w_n(x)+\beta_n(x)r_n(x)$ converges to 0 almost surely if the following conditions are satisfied:
$\sum_n \alpha_n(x)=\infty,\ \sum_n \alpha_n^2(x)<\infty,\ \sum_n \beta_n(x)=\infty,\ \sum_n \beta_n^2(x)< \infty$ uniformly, almost surely.
$\mathbb{E}[r_n(x)\mid P_n] = 0,\ \mathbb{E}[r_n^2(x)\mid P_n] < C$ almost surely, where $P_n=\{w_n,w_{n-1},\dots,r_n,r_{n-1},\dots,\alpha_n,\alpha_{n-1},\dots,\beta_n,\beta_{n-1},\dots\}$.
All random variables can depend on $P_n$.
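For intuition about what the lemma claims (this is only a sanity-check simulation of mine, not anything from the paper, and the step sizes $\alpha_n=\beta_n=1/n$ and Gaussian noise are my own illustrative choices satisfying the conditions above), the iterate does drift towards zero numerically:

```python
import numpy as np

# Minimal numerical sketch (not part of any proof): simulate
#   w_{n+1} = (1 - a_n) * w_n + b_n * r_n
# with step sizes and noise chosen to satisfy the lemma's conditions,
# and check that w_n shrinks towards 0.
rng = np.random.default_rng(0)

N = 200_000
w = 5.0                          # arbitrary starting value
for n in range(1, N + 1):
    a_n = 1.0 / n                # sum a_n = inf, sum a_n^2 < inf
    b_n = 1.0 / n                # same choice for b_n (illustration only)
    r_n = rng.normal(0.0, 1.0)   # E[r_n | past] = 0, E[r_n^2 | past] bounded
    w = (1.0 - a_n) * w + b_n * r_n

print(f"w after {N} steps: {w:.6f}")   # typically very close to 0
```

Of course, a simulation like this only illustrates the statement; my question is about why the lemma is supposed to follow from Dvoretzky's theorem.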