I came across this post on how to approximate the absolute function $|x|$ using a smooth function.

I was wondering if it's possible to approximate the rectified linear function (ReLU) using a smooth function.

More specifically, I'm looking for an approximation of $f(x, a, b, L) = L + \max\left(b\left(x-a\right),\ 0\right)$, where $a, b, L > 0$; the gradient of this function is discontinuous at $x=a$. I'm looking for an approximation that works well with gradient-based optimization algorithms, like Hamiltonian Monte Carlo.

I am also interested in the parameter values of $a, b$ after fitting this function through some data.

Thanks for any help!

2 Answers

Your function $f(x,a,b,L) = L + \max(b(x-a),0)$ can be obtained from the basic "ramp" function $R(x) = \max(x,0)$ via shifting and rescaling: \begin{equation} f(x,a,b,L) = L + R(b(x-a)) \end{equation} Therefore it suffices to find a smooth approximation to $R(x)$, from which $f(x,a,b,L)$ can be obtained for any values of $a$, $b$, and $L$.

$R(x)$ can be written as the integral of the Heaviside step function \begin{equation} H(x) = \begin{cases} 1 & x > 0\\ 0 & x < 0 \end{cases} \end{equation} (For the purposes of this answer, the value of $H(0)$ doesn't matter.) That is: \begin{equation} R(x) = \int_{-\infty}^x H(y)\, dy \qquad\qquad (1) \end{equation}

Now, a useful approximation to $H(x)$ is \begin{equation} H(x) \approx \frac{1}{2}\left(1 + \tanh\left(\frac{x}{\epsilon}\right)\right)\, ,\qquad\qquad (2) \end{equation} where $\epsilon$ is some small positive number. Plugging this approximation into Eq. (1) yields: \begin{equation} R(x) \approx \int_{-\infty}^x \frac{1}{2}\left(1 + \tanh\left(\frac{y}{\epsilon}\right)\right)\, dy \;=\; \frac{1}{2}\left(x + \epsilon\log\left(2\cosh\left(\frac{x}{\epsilon}\right)\right)\right)\qquad (3) \end{equation}

For gradient-based algorithms, this approximation of $R(x)$ has the advantage that its derivative is given simply by Eq. (2). Eq. (3) is plotted below for $\epsilon = 0.05$.

[Plot of the smooth approximation in Eq. (3) for $\epsilon = 0.05$]
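As a numerical sketch (assuming NumPy; the function names are mine), Eq. (3) can be evaluated stably by rewriting $\frac12\left(x + \epsilon\log\left(2\cosh(x/\epsilon)\right)\right)$ in the algebraically equivalent form $\frac{\epsilon}{2}\log\left(1 + e^{2x/\epsilon}\right)$, which avoids overflow of $\cosh$ for large $|x|/\epsilon$:

```python
import numpy as np

def smooth_ramp(x, eps=0.05):
    """Eq. (3): (1/2)(x + eps*log(2*cosh(x/eps))),
    rewritten as (eps/2)*log(1 + exp(2x/eps)) for numerical stability."""
    return 0.5 * eps * np.logaddexp(0.0, 2.0 * np.asarray(x, dtype=float) / eps)

def f_smooth(x, a, b, L, eps=0.05):
    """Smooth version of f(x,a,b,L) = L + max(b(x-a), 0) via L + R(b(x-a))."""
    return L + smooth_ramp(b * (x - a), eps)
```

For large positive $x$ this returns essentially $x$, for large negative $x$ essentially $0$, and at $x=0$ it returns the answer's stated offset $\frac{\epsilon}{2}\ln 2$.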

John Barber
    $R(x)$ itself can be expressed as $\frac12(x+|x|)$, so any of the smooth approximations of $|x|$ in the Q&A linked by the OP yields such an approximation of $R(x)$ (and thus of $f(x,a,b,L)$ as well). – Semiclassical Sep 17 '24 at 02:32
  • @Semiclassical Good point. – John Barber Sep 17 '24 at 02:40
  • @Semiclassical is there a way to adjust $\epsilon$ with $b$ that can upper bound the absolute error between the smooth approximation and the original function at $x=a$? What I notice with any approximation is that this error is large for very small values of $b$ – chesslad Sep 17 '24 at 15:11
  • @chesslad It depends on how you quantify the error. The max deviation between the exact $f(x,a,b,L)$ and the smoothed approximation provided by John Barber is $\epsilon\cdot \frac12\ln 2$. The error interval, by contrast, is of order $\epsilon/b$ in width. Finally, the integrated error (the area between the two curves) is $\pi^2 \epsilon^2/(24b)$. So if $b$ is small then we need $\epsilon^2$ to be even smaller to keep the integrated error down. – Semiclassical Sep 17 '24 at 17:15
  • @Semiclassical That is very helpful. Thanks! – chesslad Sep 18 '24 at 15:02
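The $\epsilon\cdot\frac12\ln 2$ figure for the max deviation can be checked numerically; a quick sketch (assuming NumPy, with variable names of my choosing) over a grid that contains the worst-case point $x = a$:

```python
import numpy as np

eps, a, b, L = 0.05, 0.0, 1.0, 0.0
x = np.linspace(-1.0, 1.0, 200001)        # grid containing x = a = 0 exactly
exact = L + np.maximum(b * (x - a), 0.0)  # exact f(x, a, b, L)
# Eq. (3) applied to b(x - a), in the numerically stable logaddexp form
smooth = L + 0.5 * eps * np.logaddexp(0.0, 2.0 * b * (x - a) / eps)
max_dev = np.max(np.abs(smooth - exact))  # attained at x = a
print(max_dev, eps * np.log(2.0) / 2.0)   # the two numbers should agree
```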

For the ramp function you could use, for a real number $a>0$, the approximation $$f(x)=\begin{cases} \dfrac{x}{1-e^{-ax}},\quad x\neq 0\\ \dfrac{1}{a},\quad x=0 \end{cases}$$ which can be made as close as desired by increasing the value of $a$. You can see its plot on this question:

animated plot
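A direct translation of this piecewise definition (a sketch; the function name is mine). The only subtleties are the removable singularity at $x=0$, filled in by the limit $1/a$, and the overflow of $e^{-ax}$ for large negative $x$, where the function is essentially $0$:

```python
import math

def smooth_ramp_exp(x, a=20.0):
    """x / (1 - e^{-a x}), extended by its limit 1/a at x = 0."""
    t = a * x
    if t == 0.0:
        return 1.0 / a      # removable singularity: limit as x -> 0
    if t < -700.0:
        return 0.0          # e^{-t} would overflow; the value is ~0 here
    return x / (1.0 - math.exp(-t))
```

Increasing $a$ sharpens the corner: with $a = 20$, for instance, the value at $x = 0$ is $0.05$ rather than the exact ramp's $0$.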

Joako