$E = \int_{\Omega}\left \| \nabla u(x)\right \|^2 dx$
So Dirichlet's energy is the integral of the squared norm of the gradient. Why the squared norm? What would we get if we used just the norm? The integral would still be non-negative.
If I computed a $u(x)$ that minimizes $E = \int_{\Omega}\left \| \nabla u(x)\right \| dx$ (not squared), would it be worse than the minimizer of the Dirichlet energy? My goal is exactly what I state: the gradient at every point should be as short as possible.
Is the reason for minimising the squared norm the same as the reason for squaring the residuals in a least-squares fit?
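To make the comparison concrete, here is a minimal numerical sketch. It assumes a 1D domain $\Omega = [0, 1]$ with boundary values $u(0) = 0$ and $u(1) = 1$, and it evaluates both energies with finite differences for two candidate functions. The domain, the boundary values, the two candidates and the helper name `energies` are all my own choices for illustration, not part of any standard setup.

```python
import numpy as np

def energies(u, x):
    """Return (unsquared, squared) gradient energies of samples u on grid x.

    Hypothetical helper: approximates the integrals of |u'| and |u'|^2
    with forward differences on each grid cell.
    """
    dx = np.diff(x)
    grad = np.diff(u) / dx               # finite-difference gradient per cell
    e_norm = np.sum(np.abs(grad) * dx)   # approximates the integral of |u'|
    e_sq = np.sum(grad**2 * dx)          # approximates the integral of |u'|^2
    return e_norm, e_sq

# Assumed setup: Omega = [0, 1], u(0) = 0, u(1) = 1.
x = np.linspace(0.0, 1.0, 1001)
candidates = {
    "linear": x,       # constant gradient
    "steep": x**5,     # flat near 0, steep near 1, same boundary values
}

results = {name: energies(u, x) for name, u in candidates.items()}
for name, (e_norm, e_sq) in results.items():
    print(f"{name:>6}: unsquared = {e_norm:.4f}, squared = {e_sq:.4f}")
```

In this 1D setup both candidates are monotone, so the unsquared energy only sees the total rise $u(1) - u(0)$ and cannot tell them apart, whereas the squared energy is smaller for the linear function.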