Here's an answer that is in principle the same as the previous answer, but phrased in the language of projections, which often turn out to be very useful.
Let $A$ and $B$ be operators on the Hilbert space $V$. Let $S$ be a subspace of $V$, and let $P$ be the orthoprojection onto $S$. Recall $P = P^*$. It is easy to show that the following are equivalent.
- $S^\perp$ is invariant under $A$.
- $S$ is invariant under $A^*$.
- $PA^*P = A^*P$.
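One way to check these equivalences, sketched here under the assumption that $V$ is finite-dimensional (so that $(S^\perp)^\perp = S$) and writing $\langle \cdot, \cdot \rangle$ for the inner product: if $S^\perp$ is invariant under $A$, then for $x \in S$ and $z \in S^\perp$,
$$\langle A^*x, z\rangle = \langle x, Az\rangle = 0,$$
so $A^*x \in (S^\perp)^\perp = S$. The converse follows from the same computation with the roles of $A$, $S^\perp$ and $A^*$, $S$ exchanged, since $(A^*)^* = A$. Finally, $S$ is invariant under $A^*$ exactly when $A^*Px \in S$ for every $x \in V$, and a vector $v$ lies in $S$ exactly when $Pv = v$, which gives
$$P A^* P x = A^* P x \quad \text{for all } x \in V.$$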
We will also use this definition of triangularization: $A$ has a triangularization if and only if $V$ has an orthonormal basis $y_1, \dots, y_n$ such that $Ay_j \in \text{span}[y_1, \dots, y_j]$ for $j = 1, \dots, n$.
The proof is by induction on the dimension $n$ of the underlying space; the case $n = 1$ is trivial. First, since $A$ and $B$ commute, they share a unit eigenvector, say $y_1$ (show this). Now let $S = \text{span}[y_1]^\perp$ and let $P$ be the orthoprojection onto $S$. Since $S^\perp = \text{span}[y_1]$ is invariant under both $A$ and $B$, the equivalence above gives $$P A^* P = A^* P \text{ and } P B^* P = B^* P.$$
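The shared eigenvector can be obtained as follows. This is a sketch that assumes the scalars are complex, so that every operator on a nonzero finite-dimensional space has an eigenvalue. Let $\lambda$ be an eigenvalue of $A$ and put $E = \ker(A - \lambda I) \neq \{0\}$. For $x \in E$,
$$A(Bx) = B(Ax) = \lambda Bx,$$
so $Bx \in E$, that is, $E$ is invariant under $B$. The restriction of $B$ to $E$ therefore has an eigenvector $y_1 \in E$, which can be scaled to unit length, and $y_1$ is then an eigenvector of both $A$ and $B$.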
Now look at the operators $PA$ and $PB$. Note that
$$
\begin{align*}
(PAPB - PB PA)^* &= B^*P A^*P - A^*PB^*P\\
&= B^* A^*P - A^* B^* P\\
&= (AB - BA)^* P \\
&= 0.
\end{align*}
$$
So $PA$ and $PB$ commute, and their restrictions to $S$ can be considered as operators from $S$ to $S$, which is an $(n - 1)$-dimensional space. By the induction hypothesis, $S$ has an orthonormal basis $y_2, \dots, y_n$ which simultaneously triangularizes $PA$ and $PB$, that is, for $j = 2, \dots, n$
$$PAy_j \in \text{span}[y_2, \dots, y_j] \enspace \text{and} \enspace PBy_j \in \text{span}[y_2, \dots, y_j] .$$
Now $y_1, \dots, y_n$ is an orthonormal basis of $V$ and for $j = 1, \dots, n$
$$
Ay_j = (I - P) Ay_j + PAy_j \in \text{span}[y_1, \dots, y_j]
$$
since $I - P$ is the orthoprojection onto $\text{span}[y_1]$ (for $j = 1$ this is just the eigenvector relation, as $PAy_1 = 0$). The same is true for $B$. This completes the proof.
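As a small illustration of the construction (the matrices here are chosen only for this example and are not part of the argument), take $V = \mathbb{C}^2$ and
$$A = \begin{pmatrix} 1 & 1 \\ -1 & 3 \end{pmatrix}, \qquad B = A^2 = \begin{pmatrix} 0 & 4 \\ -4 & 8 \end{pmatrix},$$
which commute. A common unit eigenvector is $y_1 = \tfrac{1}{\sqrt{2}}(1, 1)^T$, with $Ay_1 = 2y_1$ and $By_1 = 4y_1$. Here $S = \text{span}[y_1]^\perp$ is spanned by $y_2 = \tfrac{1}{\sqrt{2}}(1, -1)^T$, and a direct computation gives
$$Ay_2 = -2y_1 + 2y_2, \qquad By_2 = -8y_1 + 4y_2.$$
So in the orthonormal basis $y_1, y_2$ the two operators are represented by the upper triangular matrices
$$\begin{pmatrix} 2 & -2 \\ 0 & 2 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 4 & -8 \\ 0 & 4 \end{pmatrix}.$$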
The argument also works when you have a family $\{A_\alpha\}$ of commuting matrices, instead of a pair.