This idea is based on @PiotrHughes' answer, which unfortunately does not provide good results even for the simple case of a linear function.
Idea: The idea is following: We want to compute the gradient at $P_0 = (x_0,y_0)$ which has edges to $P_i = (x_i,y_i)$ for $i=1,2,3,...n$. Let us consider the unit vectors $d_i$ pointing from $P_0$ to $P_i$, so let $r_i = P_i - P_0$ and $d_i = \frac{1}{||r_i||}(r_i)$.
Then the scalar product $\nabla f(x_0,y_0) \cdot d_i$ is the directional derivative in the direction $d_i$. These directional derivatives can be approximated by $$\nabla f(x_0,y_0) \cdot d_i \approx \frac{f(x_i,y_i)-f(x_0,y_0)}{||r_i||}$$
This gives us an equation for each edge, and if we have at least two non-colinear edges we can solve this system for $\nabla f(x_0,y_0)$ using least squares.
Implementation:
We can simplify this system to
$$\begin{bmatrix} x_1-x_0 & y_1-y_0 \\ x_2-x_0 & y_2-y_0 \\ \vdots & \vdots \\ x_n - x_0 & y_n - y_0 \end{bmatrix} \nabla f(x_0,y_0) = \begin{bmatrix} f(x_1,y_1) - f(x_0,y_0) \\ f(x_2,y_2) - f(x_0,y_0) \\ \vdots \\ f(x_n,y_n) - f(x_0,y_0) \end{bmatrix}$$
Convergence: The only approximation we made is the use of the finite differences for the approximation of the directional derivatives. Assuming we have a regular and quasi uniform family of meshes these should converge with the known order. And it certainly provides exact results if $f$ is linear.
EDIT: Equivalently we can rewrite the equation from above as
$$\begin{bmatrix} x_1-x_0 & y_1-y_0 & 1\\
x_2-x_0 & y_2-y_0 & 1\\
\vdots & \vdots &\vdots \\
x_n - x_0 & y_n - y_0 & 1\end{bmatrix} \begin{bmatrix} \nabla f(x_0,y_0) \\ f(x_0,y_0)\end{bmatrix}= \begin{bmatrix} f(x_1,y_1) \\ f(x_2,y_2) \\ \vdots \\ f(x_n,y_n) \end{bmatrix}$$
Now it becomes very apparent that we're just fitting a plane through the surrounding points. Which was what I suggested as a comment here.