One can think of a set of $n$ observations as an $n$-dimensional vector. The residual vector $y - \hat y$ then has Euclidean norm $\sqrt {\sum (y_i-\hat y_i)^2}$. Since minimizing the square root of a nonnegative quantity is the same as minimizing the quantity itself, it's simpler to speak of finding the least squares rather than the least root mean square.
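In symbols (a small check; the $1/n$ factor, which turns the sum into a mean, is my addition to connect this to "root mean square"): minimizing over the line's parameters $\beta = (m, b)$,

$$\operatorname*{arg\,min}_\beta \sqrt{\tfrac1n \sum_i (y_i - \hat y_i)^2} \;=\; \operatorname*{arg\,min}_\beta \sum_i (y_i - \hat y_i)^2,$$

because $t \mapsto \sqrt{t/n}$ is strictly increasing on $[0,\infty)$, so both objectives are smallest at the same $\beta$.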
Using $\sum (y_i-\hat y_i)^2$ rather than $\sqrt {\sum (y_i-\hat y_i)^2}$ has further advantages. For instance, at the least-squares solution the total sum of squares $\sum y_i^2$ splits into an "explained" part $\sum \hat y_i^2$ (equivalently $\sum y_i^2-\sum (y_i-\hat y_i)^2$) and an "unexplained" residual part $\sum (y_i-\hat y_i)^2$.
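One way to see this split (it uses the fact, justified by the projection picture below, that at the least-squares solution the residual vector is orthogonal to the fitted vector):

$$\sum_i y_i^2 = \sum_i \big(\hat y_i + (y_i - \hat y_i)\big)^2 = \sum_i \hat y_i^2 + 2\sum_i \hat y_i (y_i - \hat y_i) + \sum_i (y_i - \hat y_i)^2 = \sum_i \hat y_i^2 + \sum_i (y_i - \hat y_i)^2,$$

where the cross term vanishes because $\hat y \perp (y - \hat y)$.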
Once we have the Euclidean norm, many questions can be answered by looking at the geometry of the space. For instance, the set of vectors of the form $\hat y = mx+b$ (as $m$ and $b$ vary) is a plane in that space, spanned by the vector $x = (x_1,\dots,x_n)$ and the all-ones vector. Finding the least squares fit means finding the point on this plane closest to the observation vector, which is its orthogonal projection onto the plane: the point at which the residual vector $y - \hat y$ is perpendicular to the plane. Writing out those perpendicularity conditions gives the normal equations, a simple linear algebra problem.
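As a concrete sketch (my own illustration with made-up toy data, not something from the argument above), the projection can be computed by solving the normal equations with NumPy and checked against a direct least-squares routine:

```python
import numpy as np

# Toy data: the observation vector y and the predictor x, each a point in R^n.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# The columns of A span the plane {m*x + b*1 : m, b real} inside R^n.
A = np.column_stack([x, np.ones_like(x)])

# Normal equations: A^T A beta = A^T y, i.e. require the residual y - A beta
# to be perpendicular to both spanning vectors of the plane.
beta = np.linalg.solve(A.T @ A, A.T @ y)
m, b = beta

y_hat = A @ beta        # orthogonal projection of y onto the plane
residual = y - y_hat

# The residual is (numerically) perpendicular to the plane...
print(A.T @ residual)   # ~ [0, 0]

# ...and the sum of squares splits into "explained" + "unexplained".
print(np.sum(y**2), np.sum(y_hat**2) + np.sum(residual**2))

# Cross-check against NumPy's least-squares solver.
print(beta, np.linalg.lstsq(A, y, rcond=None)[0])
```

The same picture carries over unchanged to more predictors: add a column to $A$ for each one, and the "plane" becomes a higher-dimensional subspace of $\mathbb{R}^n$.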