I'm tutoring a Calculus I student. We've discussed the power rule in previous weeks, and now we've come to the derivatives of exponential and logarithmic functions. I remember how strange it seemed to me, when I was in calculus, that $\ln(x)$ should have anything to do with power functions. "The power rule is, for the most part, so very nice," I thought. "It says that the derivative of a power function is just another power function." If $f:(0,\infty)\to\mathbb{R}$ is the function $f(x)=x^r$ for $r\in\mathbb{R}$, then the derivative of $f$ is especially simple: $f'(x)=rx^{r-1}$. Letting $r$ take integer values, we may obtain a table like the one that follows. All exponents, even 1 and 0, are included in order to emphasize the general rule relating $f$ to $f'$: \begin{align*} \begin{array}{|c|c|} \hline f(x) & f'(x) \\ \hline \hline x^3 & 3x^2\\ \hline x^2 & 2x^1\\ \hline x^1 & x^0\\ \hline ? & x^{-1}\\ \hline x^{-1} & -x^{-2}\\ \hline x^{-2} & -2x^{-3}\\ \hline \end{array} \end{align*}
In the midst of the otherwise seamless regularity of the power rule is an anomaly: What function has derivative $f'(x)=x^{r}$ when $r=-1$? The pattern established by the power rule would suggest that this function is $f(x)=x^{r+1}$ for $r=-1$. But that would make $f$ the constant function, in which case $f'$ is identically zero!
The power rule doesn't fail to produce $f'$ when $r=0$, of course. It's just that the power rule fails to produce a function whose derivative is the power function $x\mapsto \frac{1}{x}$.
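As a quick side check of how $\ln(x)$ emerges from the power rule's near-misses (an aside, not part of the main argument): the antiderivative of $x^r$ normalized to vanish at $x=1$ is $\frac{x^{r+1}-1}{r+1}$, and its limit as $r\to -1$ is $\ln(x)$. A minimal SymPy sketch (the normalization at $x=1$ is my choice, not anything from the discussion above):

```python
import sympy as sp

x, r = sp.symbols('x r', positive=True)

# Antiderivative of x^r, normalized to vanish at x = 1:
F = (x**(r + 1) - 1) / (r + 1)

# As r -> -1, this family of antiderivatives converges to ln(x):
print(sp.limit(F, r, -1))  # log(x)
```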
There are simple ways to figure out what function should have derivative of the form $\frac{1}{x}$. For example, we can just apply Part I of the Fundamental Theorem of Calculus (FTC I) to the function $F(x)=\int^{x}_{1}\frac{1}{t}dt$. But I wanted to think through this anew, and find another approach. My question is the following: Is the approach below valid, and in particular, is my interpretation, or "sense-making" story, valid? It's a helpful (and true, I think) narrative that makes sense to me. But I'd like another set of eyes on it. Is this sense-making story just fanciful nonsense? To ask the question another way: Should it make the sense it makes to me?
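The FTC approach can be verified symbolically; this SymPy sketch just computes the definite integral and confirms its derivative is $\frac{1}{x}$:

```python
import sympy as sp

x, t = sp.symbols('x t', positive=True)

# FTC I: F(x) = integral from 1 to x of (1/t) dt has F'(x) = 1/x,
# and the integral evaluates to exactly ln(x).
F = sp.integrate(1/t, (t, 1, x))
print(F)              # log(x)
print(sp.diff(F, x))  # 1/x
```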
So, here's what I'm thinking. Let $f(x)=x^r$ for $r\in\mathbb{R}$. Differentiating $f$ with respect to $x$ demonstrates the power rule, but we want more than a demonstration: we want to understand why the power rule behaves as it does.
Here is what we'll do: for $0<a<b<\infty$, we define the one-parameter family of functions \begin{align*} \mathcal{F}=\{g_r\in C([a,b]): r\in\mathbb{R}\}, \end{align*} where $g_r(x)=x^r$. These are again the power functions, but indexing elements of this family by the exponent value, $r$, suggests a change in perspective: we are emphasizing the role of the exponent. After all, it is the value of $r$ that determines when something seems to go awry with the power rule. Evidently, we are interested in how this family of functions behaves as we vary $r$, in particular, when $r=0$. The first derivative with respect to $r$ expresses the most salient (first-order) information about how $g_r$ varies as $r$ varies. Before we take this derivative, let us re-express $g_r$: \begin{align*} g_r(x)&=e^{r\ln(x)}\\ &=\sum^{\infty}_{n=0}\frac{(r\ln(x))^n}{n!}. \end{align*} This series converges uniformly on $[a,b]$ (which is why we define each $g_r$ on a compact interval, $[a,b]$), and the term-by-term differentiated series also converges uniformly for $r$ in any bounded interval, so in differentiating $g_r$ we may swap the order of differentiation and summation: \begin{align*} \frac{\partial}{\partial r}g_r(x)&=\frac{\partial}{\partial r}\Big(\sum^{\infty}_{n=0}\frac{(r\ln(x))^n}{n!}\Big)\\ &=\sum^{\infty}_{n=0}\frac{\partial}{\partial r}\Big(\frac{(r\ln(x))^n}{n!}\Big)\\ &=\sum^{\infty}_{n=1}\frac{n\ln(x)(r\ln(x))^{n-1}}{n!}\\ &=\ln(x)+\sum^{\infty}_{n=2}\frac{n\ln(x)(r\ln(x))^{n-1}}{n!}. \end{align*} Therefore, \begin{align*} \frac{\partial}{\partial r}g_r(x)\Big|_{r=0}=\ln(x), \end{align*} which we recognize as the function that fills the "gap" above in the power rule: the function whose derivative is $h(x)=\frac{1}{x}$.
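The computation above can be checked symbolically. This SymPy sketch differentiates $x^r$ with respect to the exponent $r$ and anchors the result at $r=0$:

```python
import sympy as sp

x, r = sp.symbols('x r', positive=True)

g = x**r
# Differentiate with respect to the exponent r, then evaluate at r = 0:
dg_dr = sp.diff(g, r)       # x**r * log(x)
at_zero = dg_dr.subs(r, 0)  # log(x)
print(at_zero)

# And log(x) indeed fills the gap: its derivative is 1/x.
print(sp.diff(sp.log(x), x))
```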
Here is the sense-making story that helped guide my solution, and that explains (I think) why evaluating $\frac{\partial g_r}{\partial r}$ at the "problematic" value $r=0$ reveals the desired function. The family $\mathcal{F}$ exhibits a nice structure: a regularity in how $g_r$ and $g'_r$ relate that is neatly summarized by the power rule. The wrinkle in the otherwise seamless rule occurs at $r=0$. Now, the derivative $$\frac{\partial}{\partial r}g_r(x)$$ gives us a way to linearly interpolate between elements of this family. (See the footnote below.) In particular, if we evaluate this derivative at $r=0$, anchoring it at the problematic point, then it serves as a kind of bridge, spanning the gap between the elements of $\mathcal{F}$ where the nice structure is exhibited ($g_r$ for $r\neq 0$) and the point where the structure disappears ($g_r$ for $r=0$). Indeed, given that the derivative at $r=0$ extends an outstretched hand to immediately proximate points, $g_r$, that do exhibit the nice structure, we might even expect it to confide information about what remains of this structure at $r=0$. And our hopes are fulfilled: the derivative $\frac{\partial g_r}{\partial r}$ evaluated at $r=0$ reveals the function, $\ln(x)$, that fills the "gap" in the power rule.
So, the natural log fills that gap in the table. But, boy, does it look out of place amid all those power functions:
\begin{align*} \begin{array}{|c|c|} \hline f(x) & f'(x) \\ \hline \hline x^3 & 3x^2\\ \hline x^2 & 2x^1\\ \hline x^1 & x^0\\ \hline \ln(x) & x^{-1}\\ \hline x^{-1} & -x^{-2}\\ \hline x^{-2} & -2x^{-3}\\ \hline \end{array} \end{align*}
At this point, I wanted a story that helps me understand why $\ln(x)$ really isn't out of place in this table, and I thought that I could use the Weierstrass Approximation Theorem to good effect. If $\{f_k\}^{\infty}_{k=1}$ is the sequence of power functions on $[a,b]$ given by $f_k(x)=x^{-1+\frac{1}{k}}$, then the functions $f_{k}$ converge uniformly on $[a,b]$ to $x^{-1}$. More generally, let \begin{align*} \mathcal{S}=\big\{f\in C([a,b]): \text{$f$ is the uniform limit of a sequence $\{p'_n\}^{\infty}_{n=1}$}\big\}, \end{align*} where $\{p'_n\}^{\infty}_{n=1}$ denotes a sequence of derivatives of polynomials, $p_n$, each defined on $[a,b]$. The reciprocal function $x\mapsto\frac{1}{x}$ is an element of $\mathcal{S}$.
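The uniform convergence $f_k\rightrightarrows x^{-1}$ can be checked numerically. A NumPy sketch, estimating the sup-norm error on a grid over an arbitrarily chosen interval $[a,b]=[0.5,2]$ (my choice, for illustration only):

```python
import numpy as np

# Arbitrary compact interval [a, b] with 0 < a < b (chosen for this check).
a, b = 0.5, 2.0
xs = np.linspace(a, b, 1001)

def sup_error(k):
    # Grid estimate of the sup-norm distance between
    # f_k(x) = x^(-1 + 1/k) and x^(-1) on [a, b].
    return np.max(np.abs(xs**(-1 + 1/k) - xs**(-1.0)))

errors = [sup_error(k) for k in (1, 10, 100, 1000)]
print(errors)  # decreasing toward 0
```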
I suspected that $\mathcal{S}$ may be dense in $C([a,b])$. In fact, $\mathcal{S}$ and $C([a,b])$ are equal as sets. For, if $F\in C([a,b])$ is $C^1$, then there exists a sequence of polynomials $\{p_n\}^{\infty}_{n=1}$ satisfying \begin{align*} p_n\rightrightarrows F \hspace{2mm} \text{and}\hspace{2mm} p'_n\rightrightarrows F', \end{align*} where the double arrows denote uniform convergence (together, convergence in the $C^1$ norm): by the Weierstrass Approximation Theorem, choose polynomials $q_n\rightrightarrows F'$ and set $p_n(x)=F(a)+\int^{x}_{a}q_n(t)dt$. But every element $f\in C([a,b])$ is the derivative of a $C^1$ function, namely, its antiderivative $F(x)=\int^{x}_{a}f(t)dt$. Therefore, $f$ is the uniform limit of the derivatives $p'_n=q_n$, so $f\in\mathcal{S}$, and thus $\mathcal{S}=C([a,b])$. As a concrete instance of this limiting picture, the antiderivatives \begin{align*} q_k(x)=\int^{x}_{a}f_{k}(t)dt=k\big(x^{1/k}-a^{1/k}\big) \end{align*} (not polynomials, but antiderivatives of the power functions $f_k$ above) satisfy $q_k\rightrightarrows \ln(x)-\ln(a)=\ln(x/a)$ on $[a,b]$. In this sense, the appearance of $\ln(x)$ amid the power functions (monomials) is not mysterious at all. However, if I'm thinking correctly, $x\mapsto\ln(x)$ (as defined on $[a,b]$) does evidently have the distinction of being, up to an additive constant, the only non-power-function on $[a,b]$ to arise as the uniform limit of antiderivatives of monomials.
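The antiderivatives $\int_a^x f_k(t)\,dt = k\big(x^{1/k}-a^{1/k}\big)$ converge uniformly to $\ln(x)-\ln(a)=\ln(x/a)$, and this too can be checked numerically. A NumPy sketch on an arbitrarily chosen interval (again my choice, for illustration only):

```python
import numpy as np

# Arbitrary compact interval [a, b] with 0 < a < b (chosen for this check).
a, b = 0.5, 2.0
xs = np.linspace(a, b, 1001)

def sup_error(k):
    # Closed form of the antiderivative:
    # q_k(x) = integral from a to x of t^(-1 + 1/k) dt = k (x^(1/k) - a^(1/k)).
    q_k = k * (xs**(1 / k) - a**(1 / k))
    return np.max(np.abs(q_k - np.log(xs / a)))

errors = [sup_error(k) for k in (1, 10, 100, 1000)]
print(errors)  # decreasing toward 0
```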
Footnote: If $\phi:\mathbb{R}\to C([a,b])$ is the function $\phi(r)=g_r$ (that is, $\phi(r)$ is the function $x\mapsto x^r$), then the target space is complicated, but the domain is familiar territory. In particular, we may Taylor expand about $r\in\mathbb{R}$ to give $\phi(r+h)=\phi(r)+h\phi'(r)+O(h^2)$. And since $\phi'(c)=x^c\ln(x)$ for $c\in\mathbb{R}$, the Mean Value Theorem gives, for each fixed $x$, $x^{r+h}=x^r+hx^{\xi}\ln(x)$ for some $\xi\in (r,r+h)$ (with $\xi$ depending on $x$). This implicit talk of tangent lines and secant lines suggests a familiar geometric picture that belies the complexity of the infinite-dimensional function space, $C([a,b])$. But this is fine, since it is only the ambient space that is complicated; the image $\phi(\mathbb{R})$ is a relatively simple one-dimensional sub-manifold of $C([a,b])$. That is, the family of functions, $\mathcal{F}$, is a curve in $C([a,b])$ parametrized by $r$. And we can visualize this curve, sort of: if $\Vert\cdot\Vert_{\sup}$ is the sup norm on $C([a,b])$ and $|\cdot|_{\max}$ is the max norm on $\mathbb{R}^2$, then the evaluation map $$\Phi:(\mathcal{F},\Vert\cdot\Vert_{\sup})\to (\mathbb{R}^2,|\cdot|_{\max})$$ defined by $$\Phi(g_r)=(g_r(a),g_r(b))=(a^r,b^r)$$ is injective and $1$-Lipschitz, since $\max(|a^r-a^s|,|b^r-b^s|)\leq\Vert g_r-g_s\Vert_{\sup}$ (it need not be an isometry, as the sup of $|x^r-x^s|$ can be attained at an interior point of $[a,b]$). Assuming $a\neq 1$, the image is a 1-dimensional sub-manifold of $\mathbb{R}^2$: the graph of the function $$f(u)=u^{\frac{\ln(b)}{\ln(a)}}.$$ I would have liked to have an isometric isomorphism between the two spaces, so that the metric structure as well as the vector space structure is preserved. But of course there is no isometric isomorphism: $\mathcal{F}$ is not a linear subspace of $C([a,b])$ (it is a curve), and in any case the domain and the target space have different dimensions.
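The claim that the image of $\Phi$ lies on the graph of $u\mapsto u^{\ln(b)/\ln(a)}$ can be checked numerically; a small sketch with an arbitrarily chosen interval (my choice, with $a\neq 1$):

```python
import numpy as np

# Arbitrary interval with 0 < a < b and a != 1 (chosen for this check).
a, b = 0.5, 2.0
p = np.log(b) / np.log(a)

for r in (-2.0, -0.5, 0.0, 1.0, 3.0):
    u, v = a**r, b**r  # Phi(g_r) = (g_r(a), g_r(b)) = (a^r, b^r)
    # The image point (u, v) should satisfy v = u^(ln b / ln a):
    assert abs(v - u**p) < 1e-9

print("image lies on the claimed graph")
```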