Your calculation is almost right, up to the point where you made the huge mistake of thinking that
\begin{align}
\dfrac{\partial}{\partial \theta} = \mathbf{e}_{\theta}
\end{align}
This is wrong, because the vector on the RHS is, by definition, the normalized version of the one on the left; as we will see below, the two differ by a factor of $r$.
Let's go through it step by step (even though you got it right for the most part). By definition we have
\begin{align}
\text{grad}(f) := g^{\sharp}(df)
\end{align}
Here, $g^{\sharp}: T^*M \to TM$ is the "index-raising" musical isomorphism induced by the metric; equivalently, $\text{grad}(f)$ is the unique vector field satisfying $g(\text{grad}(f), X) = df(X) = X(f)$ for every vector field $X$. If we work in a chart $(U,x)$, writing $g_{ij} := g\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\right)$ and letting $[g^{ij}]$ denote the inverse matrix of $[g_{ij}]$ (with sums over repeated indices implied), then
\begin{align}
\text{grad}(f) &:= g^{\sharp}(df) \\
&= g^{\sharp}\left( \dfrac{\partial f}{\partial x^i}\, dx^i\right) \\
&= \dfrac{\partial f}{\partial x^i} \cdot g^{\sharp}\left(dx^i\right) \\
&= \dfrac{\partial f}{\partial x^i} \cdot g^{ij}\dfrac{\partial }{\partial x^j} \tag{$*$}
\end{align}
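As a quick sanity check of $(*)$: in Cartesian coordinates on $\Bbb{R}^n$ with the standard metric, $[g_{ij}] = [g^{ij}]$ is the identity matrix, so $(*)$ reduces to the familiar
\begin{align}
\text{grad}(f) = \sum_{i=1}^n \dfrac{\partial f}{\partial x^i}\dfrac{\partial }{\partial x^i}
\end{align}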
For polar coordinates $(r,\theta)$ in the plane (more precisely, on a suitable open subset of $\Bbb{R}^2$), we have
\begin{align}
[g_{ij}] =
\begin{pmatrix}
g_{rr} & g_{r\theta}\\
g_{\theta r} & g_{\theta \theta}
\end{pmatrix} =
\begin{pmatrix}
1 & 0 \\
0 & r^2
\end{pmatrix}
\end{align}
where, for convenience, rather than writing $g_{11}, g_{12}$, etc., I have used the notation $g_{rr}, g_{r\theta}$, and so on.
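To see where these entries come from (and in particular why the norm of $\frac{\partial}{\partial \theta}$ turns out to be $r$), write the polar coordinate vector fields in terms of the Cartesian ones, using $x = r\cos\theta$, $y = r\sin\theta$ and the chain rule:
\begin{align}
\dfrac{\partial}{\partial r} &= \cos\theta\, \dfrac{\partial}{\partial x} + \sin\theta\, \dfrac{\partial}{\partial y} \\
\dfrac{\partial}{\partial \theta} &= -r\sin\theta\, \dfrac{\partial}{\partial x} + r\cos\theta\, \dfrac{\partial}{\partial y}
\end{align}
Hence $g_{\theta\theta} = g\left(\frac{\partial}{\partial \theta}, \frac{\partial}{\partial \theta}\right) = r^2\sin^2\theta + r^2\cos^2\theta = r^2$, and similarly $g_{rr} = 1$ and $g_{r\theta} = 0$. Now, the inverse matrix is easily calculated because it is diagonal: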
\begin{align}
[g^{ij}] =
\begin{pmatrix}
g^{rr} & g^{r\theta}\\
g^{\theta r} & g^{\theta \theta}
\end{pmatrix} =
\begin{pmatrix}
1 & 0 \\
0 & 1/r^2
\end{pmatrix}
\end{align}
Now, what you have to do is use the formula $(*)$ exactly as written. If we apply it directly, then we find
\begin{align}
\text{grad}(f) &= g^{rr}\dfrac{\partial f}{\partial r}\dfrac{\partial }{\partial r} + g^{\theta \theta}\dfrac{\partial f}{\partial \theta}\dfrac{\partial }{\partial \theta} \\
&= \dfrac{\partial f}{\partial r}\dfrac{\partial }{\partial r} + \dfrac{1}{r^2}\dfrac{\partial f}{\partial \theta}\dfrac{\partial }{\partial \theta} \tag{$**$}
\end{align}
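As a concrete check of $(**)$, take $f(r,\theta) = r\cos\theta$, which is nothing but the Cartesian coordinate function $x$, so its gradient should be $\frac{\partial}{\partial x}$. Indeed, using the expressions for $\frac{\partial}{\partial r}$ and $\frac{\partial}{\partial \theta}$ in terms of $\frac{\partial}{\partial x}$ and $\frac{\partial}{\partial y}$ from above,
\begin{align}
\text{grad}(f) &= \cos\theta\,\dfrac{\partial}{\partial r} + \dfrac{1}{r^2}(-r\sin\theta)\,\dfrac{\partial}{\partial \theta} \\
&= \cos\theta\left(\cos\theta\, \dfrac{\partial}{\partial x} + \sin\theta\, \dfrac{\partial}{\partial y}\right) - \dfrac{\sin\theta}{r}\left(-r\sin\theta\, \dfrac{\partial}{\partial x} + r\cos\theta\, \dfrac{\partial}{\partial y}\right) \\
&= \left(\cos^2\theta + \sin^2\theta\right)\dfrac{\partial}{\partial x} = \dfrac{\partial}{\partial x}
\end{align}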
This formula is $100\%$ correct, and it DOES NOT contradict what you may have seen in standard vector analysis texts. To get the "usual" formula, we have to see how $\mathbf{e}_r, \frac{\partial}{\partial r}, \mathbf{e}_{\theta}, \frac{\partial}{\partial \theta}$ are related to each other. By definition, the $\mathbf{e}$'s are the normalized versions, which means
\begin{align}
\mathbf{e}_r &:= \dfrac{\frac{\partial}{\partial r}}{\lVert \frac{\partial}{\partial r}\rVert} \quad \text{and} \quad \mathbf{e}_{\theta} := \dfrac{\frac{\partial}{\partial \theta}}{\lVert \frac{\partial}{\partial \theta}\rVert}
\end{align}
So, what is the norm of a vector? By definition, it is the square root of the inner product of the vector with itself; i.e., $\lVert v\rVert := \sqrt{\langle v,v \rangle} = \sqrt{g(v,v)}$, where the last equality is simply a notational change (recall that the metric tensor $g$ is precisely an inner product on each tangent space $T_pM$ of your manifold, which in this case is $M = \Bbb{R}^2$). So, we have
\begin{align}
\begin{cases}
\mathbf{e}_r &:= \dfrac{\frac{\partial}{\partial r}}{\sqrt{g\left( \frac{\partial}{\partial r},\frac{\partial}{\partial r}\right)}} = \dfrac{1}{\sqrt{g_{rr}}}\dfrac{\partial}{\partial r} = \dfrac{\partial}{\partial r}\\
\mathbf{e}_{\theta} &:= \dfrac{\frac{\partial}{\partial \theta}}{\sqrt{g\left(\frac{\partial}{\partial \theta},\frac{\partial}{\partial \theta}\right)}} = \dfrac{1}{\sqrt{g_{\theta\theta}}}\dfrac{\partial}{\partial \theta} = \dfrac{1}{r}\dfrac{\partial}{\partial \theta}
\end{cases}
\end{align}
If you now make these substitutions into $(**)$, you find exactly that
\begin{align}
\text{grad}(f) &= \dfrac{\partial f}{\partial r} \mathbf{e}_r + \dfrac{1}{r}\dfrac{\partial f}{\partial \theta} \mathbf{e}_{\theta} \tag{$***$}
\end{align}
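For instance, with $f(r,\theta) = r^2$ (i.e., $f(x,y) = x^2 + y^2$), formula $(***)$ gives
\begin{align}
\text{grad}(f) = 2r\,\mathbf{e}_r
\end{align}
which agrees with the Cartesian computation $2x\frac{\partial}{\partial x} + 2y\frac{\partial}{\partial y} = 2r\left(\cos\theta\frac{\partial}{\partial x} + \sin\theta\frac{\partial}{\partial y}\right) = 2r\,\frac{\partial}{\partial r} = 2r\,\mathbf{e}_r$.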
By the way, when you asked why the norm of $\frac{\partial}{\partial \theta}$ is $r$, it's not clear to me whether your confusion is about why $[g_{ij}] = \text{diag}(1,r^2)$ (see the computation above), or simply about the relationship between the norm and the inner product (i.e., the metric tensor field). If you need more clarification, let me know.
Finally, on a more general note, let's go back to $n$ dimensions. We once again define $\mathbf{e}_j$ to be the normalized vector field corresponding to $\frac{\partial}{\partial x^j}$; i.e.,
\begin{align}
\mathbf{e}_j &= \dfrac{1}{\sqrt{g\left(\frac{\partial}{\partial x^j}, \frac{\partial}{\partial x^j}\right)}} \frac{\partial}{\partial x^j} = \dfrac{1}{\sqrt{g_{jj}}}\frac{\partial}{\partial x^j}
\end{align}
If we now plug this into $(*)$, then we see that the gradient vector field, when written in terms of the normalized coordinate vector fields (i.e., the $\mathbf{e}_j$'s), is
\begin{align}
\text{grad}(f) &= \sum_{i,j=1}^n g^{ij}\sqrt{g_{jj}}\dfrac{\partial f}{\partial x^i} \, \mathbf{e}_j\tag{$*'$}
\end{align}
The formula above is entirely equivalent to $(*)$. Now let's specialize slightly, just for fun. Suppose the coordinate vector fields are orthogonal (i.e., $g_{ij} = 0$ if $i\neq j$). Then the inverse matrix $[g^{ij}]$ is easily calculated to be $\text{diag}(1/g_{11}, \dots, 1/g_{nn})$, and in this special case the gradient reduces to:
\begin{align}
\text{grad}(f) &= \sum_{i=1}^n \dfrac{1}{\sqrt{g_{ii}}}\dfrac{\partial f}{\partial x^i} \, \mathbf{e}_i
\end{align}
Now, once again, as a sanity check, try applying this to the polar coordinate case, and you should recover $(***)$.
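For example, in spherical coordinates $(r,\theta,\phi)$ on (an open subset of) $\Bbb{R}^3$, with $\theta$ the polar angle, we have $[g_{ij}] = \text{diag}(1, r^2, r^2\sin^2\theta)$, and the formula above immediately yields the familiar
\begin{align}
\text{grad}(f) = \dfrac{\partial f}{\partial r}\,\mathbf{e}_r + \dfrac{1}{r}\dfrac{\partial f}{\partial \theta}\,\mathbf{e}_{\theta} + \dfrac{1}{r\sin\theta}\dfrac{\partial f}{\partial \phi}\,\mathbf{e}_{\phi}
\end{align}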