Here is a proof based on a slightly easier result (where $H=1$) that I learned from a book by Dickson IIRC.
We can prove this inductively, the case $b=a$ being trivial. For the inductive step, build an $m\times n$ matrix, where the rows are indexed by the $m$ subgroups of order $p^b$ containing $H$, and the columns are indexed by the $n$ subgroups of order $p^{b+1}$ containing $H$. Call this matrix $A$, and let $a_{ij}$ be equal to $1$ if the corresponding column group contains the corresponding row group, and $a_{ij}=0$ otherwise. By induction, we're assuming $m\equiv1\pmod{p}$.
For every row group $K$, there are $1\pmod{p}$ column groups containing it. This is because $K$ is normal in each of them, and so we're just counting subgroups of order $p$ in $N_G(K)/K$. For every column group, there are $1\pmod{p}$ row groups that it contains. This is equivalent to counting the maximal subgroups that contain $H$, and a proof that there are $1\pmod{p}$ of them is given below.
Now we just add up the entries of the matrix in two different ways:
\begin{equation}
1\equiv m\equiv \sum_i\sum_j a_{ij}\equiv \sum_j\sum_i a_{ij}\equiv n\pmod{p}
\end{equation}
We still need to prove that for a group $P$ of order $p^{b+1}$ containing $H$, it has $1\pmod{p}$ maximal subgroups containing $H$. Since $\Phi(P)$ is contained in every maximal subgroup, this is equivalent to counting the maximal subgroups that contain $H\Phi(P)$. The key difference is this latter subgroup is normal in $P$, so we can count maximal subgroups in $P/H\Phi(P)$, which is an $\mathbb{F}_p$ vector space. It is well-known (and I can provide proof if need be) that the number of codimension-1 subpsaces of a finite $\mathbb{F}_p$ vector space is $1\pmod{p}$.
EDIT: Here is a proof of the codimension-1 claim above.
Assume $V$ has dimension $d$ over the finite field $\mathbb{F}_p$. Define the set
\begin{equation*}
X_V=\{(v_1,\ldots,v_{d-1})\in V^{d-1}\mid\dim(\langle v_1,\ldots,v_{d-1}\rangle)=d-1\}
\end{equation*}
of linearly independent lists from $V$ of length $d-1$. If $\mathcal{M}=\{M\le V\mid\dim(M)=d-1\}$ is the set of maximal subgroups (subspaces), there is a surjective map $f:X_V\rightarrow\mathcal{M}$ given by
\begin{equation*}
(v_1,\ldots,v_{d-1})\mapsto\langle v_1,\ldots,v_{d-1}\rangle
\end{equation*}
For a fixed $M\in\mathcal{M}$, note that $f^{-1}(M)=X_M$, the set of bases of $M$. Since $|X_M|$ only depends on $\dim(M)$, we see that $|f^{-1}(M)|=|f^{-1}(N)|$ for all $M,N\in\mathcal{M}$, and hence
\begin{equation*}
|\mathcal{M}| = \frac{|X_V|}{|X_M|} = \frac{\prod\limits_{i=0}^{d-2}(p^d-p^i)}{\prod\limits_{i=0}^{d-2}(p^{d-1}-p^i)} = \frac{p^d-1}{p-1}\equiv1\pmod{p}
\end{equation*}
The above proof works similarly (with $k$ replacing $d-1$) to count the $k$-dimensional subspaces of $V$.