In the comments of Martin Brandenburg's answer to this old MO question, Victor Protsak offers the following "1-line proof" of the Cayley–Hamilton theorem. Here $p_A(\lambda)$ denotes the characteristic polynomial of $A$.
> Let $X = A - \lambda I_n$, then $p_A(\lambda) I_n = (\det X) I_n = X \operatorname{adj}(X)$ in the $n \times n$ matrix polynomials in $\lambda$, now specialize $\lambda \to A$, get $p_A(A) = 0$.
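(For concreteness, the adjugate identity used in the first step can be checked symbolically; here is a small SymPy sketch on a hypothetical $2 \times 2$ example.)

```python
import sympy as sp

lam = sp.symbols("lam")

# A hypothetical 2x2 example, chosen only for illustration.
A = sp.Matrix([[1, 2], [3, 4]])
X = A - lam * sp.eye(2)

# The adjugate identity det(X) I = X adj(X), here over K[lam].
lhs = (X.det() * sp.eye(2)).applyfunc(sp.expand)
rhs = (X * X.adjugate()).applyfunc(sp.expand)
assert lhs == rhs
```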
I think this proof is not quite complete as written and requires at least one more line. The "specialize $\lambda \to A$" step, as written, looks a lot like the standard "tempting but incorrect" proof of Cayley–Hamilton. The issue is that in this calculation we are working, equivalently, with either matrices with polynomial entries $M_n(K[\lambda])$ or polynomials with matrix coefficients $M_n(K)[\lambda]$. Naïvely, "specialize $\lambda \to A$" means applying some kind of evaluation map $M_n(K)[\lambda] \to M_n(K)$ sending $\lambda$ to $A$. But this map is not a ring homomorphism in general, and in particular is not multiplicative, because of the lack of commutativity. Very explicitly: if $f(\lambda) = \sum F_i \lambda^i \in M_n(K)[\lambda]$ and $g(\lambda) = \sum G_i \lambda^i \in M_n(K)[\lambda]$ are two matrix polynomials, and we interpret "specialize $\lambda \to A$" to mean $f(A) = \sum F_i A^i \in M_n(K)$ and $g(A) = \sum G_i A^i \in M_n(K)$, then $f(A) g(A) \neq (fg)(A)$ in general, where $fg$ denotes the product of matrix polynomials (which involves treating $\lambda$ as central).
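To see the failure of multiplicativity concretely, here is a minimal SymPy sketch; the matrices $A$ and $C$ below are hypothetical choices, picked only so that they don't commute. Take $f(\lambda) = I_2 \lambda$ and the constant polynomial $g(\lambda) = C$, so that $(fg)(\lambda) = C\lambda$, and compare $f(A)g(A) = AC$ with $(fg)(A) = CA$:

```python
import sympy as sp

# Hypothetical non-commuting matrices, for illustration only.
A = sp.Matrix([[0, 1], [0, 0]])
C = sp.Matrix([[1, 0], [0, 2]])

# f(lam) = I*lam, g(lam) = C (constant); their product in M_n(K)[lam],
# with lam treated as central, is (fg)(lam) = C*lam.
f_at_A = A       # naive substitution lam -> A in f
g_at_A = C       # g is constant
fg_at_A = C * A  # naive substitution lam -> A in fg

# The naive evaluation is not multiplicative: f(A) g(A) = A*C, but (fg)(A) = C*A.
assert f_at_A * g_at_A != fg_at_A
```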
A different way to interpret this specialization is to instead consider the (commutative) subalgebra $K[A] \subset M_n(K)$ generated by $A$, and interpret the specialization as applying the evaluation homomorphism $K[\lambda] \to K[A]$ entrywise to a matrix with polynomial entries, producing a matrix in $M_n(K[A])$. This specialization is a homomorphism, but it doesn't send $X$ to $0$! This becomes clear if we write $M_n(K[\lambda]) \cong M_n(K)[\lambda]$ explicitly as a tensor product $M_n(K) \otimes K[\lambda]$, in which case
$$X(\lambda) = A \otimes 1 - I_n \otimes \lambda \in M_n(K) \otimes K[\lambda]$$
is getting specialized to
$$X(A) = A \otimes 1 - I_n \otimes A \in M_n(K) \otimes K[A].$$
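As a quick sanity check that this specialization of $X$ is generally nonzero, one can realize the tensor product numerically via the Kronecker product; a NumPy sketch, on a hypothetical $2 \times 2$ matrix:

```python
import numpy as np

# A hypothetical example matrix.
A = np.array([[0, 1], [0, 0]])
I2 = np.eye(2, dtype=int)

# X(A) = A (x) 1 - I (x) A, realized via Kronecker products.
X_at_A = np.kron(A, I2) - np.kron(I2, A)

# The specialization does NOT send X to the zero matrix.
assert X_at_A.any()
```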
Q1: Am I correct that this proof is incomplete or at least ambiguous as written?
Wikipedia appears to explain a way to complete this proof, which I'd describe as follows:
The point is that we actually do have an evaluation homomorphism for the matrices which appear in this argument. Because $X = A - \lambda I_n$ commutes with its adjugate $\operatorname{adj}(X) = \operatorname{adj}(A - \lambda I_n)$, $A$ commutes with all of the coefficients of the matrix polynomial $\operatorname{adj}(X)$ when it is expanded out in powers of $\lambda$: writing $\operatorname{adj}(X) = \sum_i B_i \lambda^i$ and comparing coefficients of $\lambda^i$ in $X \operatorname{adj}(X) = \operatorname{adj}(X) X$ gives $A B_i = B_i A$ for every $i$. That means this computation isn't happening in the full matrix algebra $M_n(K)$ but in the polynomial ring over the smaller centralizer $Z_{M_n}(A) \subset M_n(K)$. So we can interpret the identity $p_A(\lambda) I_n = X \operatorname{adj}(X)$ as an identity in $Z_{M_n}(A)[\lambda]$, and now we really do have an evaluation homomorphism
$$Z_{M_n}(A)[\lambda] \ni f(\lambda) \mapsto f(A) \in Z_{M_n}(A)$$
because $A$ commutes with all the coefficients of the matrix polynomials involved. Applying this evaluation homomorphism gives us an identity
$$p_A(A) = (A - A) \operatorname{adj}(X(A)) = 0 \in Z_{M_n}(A)$$
as desired. (The notation $\operatorname{adj}(X(A))$ is a little unfortunate, but I couldn't think of anything better; it means taking the matrix polynomial $\operatorname{adj}(X) \in Z_{M_n}(A)[\lambda]$ and then evaluating it at $A$.) Note that the identity matrix on the LHS has disappeared: we evaluated the matrix polynomial $p_A(\lambda) I_n \in Z_{M_n}(A)[\lambda]$ at $\lambda = A$ and we get the ordinary product $p_A(A) I_n = p_A(A) \in Z_{M_n}(A)$, rather than the tensor product above. Similarly, this is why the identity matrix in $A - \lambda I_n$ has disappeared.
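The whole completed argument can be sanity-checked symbolically. Here is a SymPy sketch (the matrix $A$ is a hypothetical example) that extracts the coefficients $B_i$ of $\operatorname{adj}(X) = \sum_i B_i \lambda^i$, verifies that each $B_i$ commutes with $A$, and then evaluates both sides of $p_A(\lambda) I_n = X \operatorname{adj}(X)$ at $\lambda = A$:

```python
import sympy as sp

lam = sp.symbols("lam")

# A hypothetical concrete matrix to check the argument on.
A = sp.Matrix([[1, 2], [3, 4]])
n = A.shape[0]

X = A - lam * sp.eye(n)
adjX = X.adjugate().applyfunc(sp.expand)

# Matrix coefficients B_i of adj(X) = sum_i B_i * lam**i.
deg = max(sp.degree(e, lam) for e in adjX if e != 0)
B = [adjX.applyfunc(lambda e: e.coeff(lam, i)) for i in range(deg + 1)]

# Each B_i commutes with A, so lam -> A is a ring homomorphism on Z_{M_n}(A)[lam].
for Bi in B:
    assert A * Bi == Bi * A

# Evaluate both sides of p_A(lam) I = X adj(X) at lam = A.
pA = sp.expand(X.det())  # characteristic polynomial p_A(lam) = det(A - lam*I)
pA_at_A = sum((pA.coeff(lam, i) * A**i
               for i in range(sp.degree(pA, lam) + 1)), sp.zeros(n))
adjX_at_A = sum((Bi * A**i for i, Bi in enumerate(B)), sp.zeros(n))

assert pA_at_A == sp.zeros(n)              # Cayley-Hamilton: p_A(A) = 0
assert (A - A) * adjX_at_A == sp.zeros(n)  # and the RHS evaluates to 0 trivially
```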
Q2: Is this a correct completion of Victor Protsak's argument or have I misunderstood something? Have I overcomplicated the situation or is it really necessary to say all this?
To be clear, I think this completed proof is quite a nice proof of Cayley–Hamilton, probably my new favorite. It also seems to me unusually confusing and fraught with both notational and conceptual issues (I gather I'm not alone in this, based on the comments in that MO discussion), so I want to be sure I've understood what's going on carefully, and in particular I want to be clear on exactly where each of the expressions in the proof lives.