
I am trying to use least squares regression to fit a curve to a table of values representing a sine wave, similar to this question: Least squares regression of sine wave

Except I want to fit the more general formula $a \sin(bx + c) + d$.

So given a function table I want to get the coefficients $a, b, c, d$ that best fit the table. Later I want to turn this into a computer program, so I want to automate all of the logic.

The objective (if that is the right name for it) I used: $\text{Minimize}\left(\sum_{i=1}^n (a \sin(bx_i + c) + d - y_i)^2\right)$

Based on this I derived these two formulas for $a$ and $d$ (the sums here iterate over the elements of the function table):

$z_i = \sin(bx_i + c)$

$a = \frac{n\sum y_i z_i - (\sum y_i)(\sum z_i)}{n\sum z_i^2 - (\sum z_i)^2}$

$d = \frac{(\sum y_i)(\sum z_i^2) - (\sum z_i)(\sum y_i z_i)}{n \sum z_i^2 - (\sum z_i)^2}$
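Since I eventually want a program, here is how these two formulas read in code. This is only a sketch assuming NumPy; the helper name `fit_a_d` and its signature are mine, and $b$ and $c$ are taken as already known:

```python
import numpy as np

def fit_a_d(x, y, b, c):
    # For fixed b and c, z = sin(b*x + c) and the model y = a*z + d
    # is an ordinary linear regression; these are the formulas above.
    z = np.sin(b * x + c)
    n = len(x)
    denom = n * np.sum(z**2) - np.sum(z)**2
    a = (n * np.sum(y * z) - np.sum(y) * np.sum(z)) / denom
    d = (np.sum(y) * np.sum(z**2) - np.sum(z) * np.sum(y * z)) / denom
    return a, d
```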

I defined $b$ by the period (named $p$), which I will work out how to calculate later:

$b = \frac{2\pi}{p}$

What I am now stuck on is the phase shift $c$. If I calculate it in a similar fashion to $a$ and $d$, I get stuck at this horrible formula: $0 = \sum \cos(bx_i + c)\,(a z_i + d - y_i)$

Using $\sin\theta\cos\theta = \tfrac{1}{2}\sin(2\theta)$ and the angle-addition formulas, this later turns into

$2\sum y_i\cos(bx_i + c) = a\cos(2c)\sum \sin(2bx_i) + a\sin(2c)\sum \cos(2bx_i) + 2d\cos(c) \sum \cos(bx_i) - 2d\sin(c)\sum \sin(bx_i)$

If I instead use the objective $\text{Minimize}\left(\sum\left(\arcsin\left(\frac{y_i - d}{a}\right)-bx_i - c\right)^2\right)$

I get $c = \frac{\sum \frac{\arcsin(\frac{y_i - d}{a}) - bx_i}{\sqrt{a^2 - (d - y_i)^2}}} {\sum \frac{1}{\sqrt{a^2 - (d - y_i)^2}}} $
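In code that weighted average looks like the sketch below (assuming the NumPy import above; note it only works where $|(y_i - d)/a| \le 1$ and it ignores the multivalued branches of $\arcsin$, so it is at best a rough estimate):

```python
def estimate_c(x, y, a, b, d):
    # Weighted average from the formula above; the weights are the
    # factors 1 / sqrt(a^2 - (d - y)^2).
    w = 1.0 / np.sqrt(a**2 - (d - y)**2)
    return np.sum(w * (np.arcsin((y - d) / a) - b * x)) / np.sum(w)
```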

Is there any way to define $c$ from this last equation and simplify the above-mentioned equations?

Demo of the problem: https://www.desmos.com/calculator/iyacah4ors

1 Answer


You have $n$ data points $(x_i,y_i)$ and you want to fit $$y=a \, \sin(b\,x + c) + d$$ which is highly nonlinear in its parameters.

If you expand the sine, this becomes $$y=a \sin (c) \cos (b x)+a \cos(c) \sin (b x)+d$$ Define $\alpha=a \sin (c)$ and $\beta=a \cos(c)$, so that $$y=\alpha \cos(bx)+\beta \sin(bx)+d$$ Now define $u_i=\cos(bx_i)$ and $v_i=\sin(bx_i)$:

$$y=\alpha\, u+\beta\, v+d$$ For a given value of $b$, a two-variable linear regression gives $\alpha(b)$, $\beta(b)$ and $d(b)$. In fact, using the normal equations, you have an explicit solution.
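As a sketch in Python/NumPy (the helper name `linear_fit` is illustrative, not part of the method itself), the fixed-$b$ linear fit can be written as:

```python
import numpy as np

def linear_fit(x, y, b):
    # For fixed b, fit y ~ alpha*cos(b*x) + beta*sin(b*x) + d by
    # linear least squares (equivalent to solving the normal equations).
    A = np.column_stack([np.cos(b * x), np.sin(b * x), np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    alpha, beta, d = coef
    ssq = np.sum((A @ coef - y) ** 2)
    return alpha, beta, d, ssq
```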

So, for this value of $b$, compute the corresponding sum of squares; run a few values of $b$ until you see a minimum. At that point you can recover $a=\sqrt{\alpha^2+\beta^2}$ and $c$ from $\tan(c)=\alpha/\beta$, so you have good estimates of $(a,b,c,d)$ and you can safely run a nonlinear regression.
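Continuing the sketch (and reusing `linear_fit` from above), the scan over $b$ and the recovery of $(a,c)$ from $(\alpha,\beta)$ might look like:

```python
def scan_b(x, y, b_values):
    # Try each candidate b, keep the one with the smallest sum of
    # squares, then invert alpha = a*sin(c), beta = a*cos(c).
    fits = [(b, *linear_fit(x, y, b)) for b in b_values]
    b, alpha, beta, d, _ = min(fits, key=lambda t: t[-1])
    a = np.hypot(alpha, beta)
    c = np.arctan2(alpha, beta)
    return a, b, c, d
```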

You could even minimize this sum of squares as a function of $b$ using Newton's method with numerical derivatives.
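For the final refinement, any standard nonlinear least-squares routine does the job. For instance, with SciPy's `curve_fit` (which by default runs Levenberg-Marquardt with numerical derivatives rather than pure Newton, but the idea is the same), a sketch continuing the code above:

```python
from scipy.optimize import curve_fit

def refine(x, y, a, b, c, d):
    # Nonlinear least squares starting from the preliminary estimates.
    model = lambda t, a, b, c, d: a * np.sin(b * t + c) + d
    (a, b, c, d), _ = curve_fit(model, x, y, p0=[a, b, c, d])
    return a, b, c, d
```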

For illustration purposes, I took the following data set from one of the good books published by @JJacquelin

$$\left( \begin{array}{cc} x & y \\ -1.983 & +0.936 \\ -1.948 & +0.810 \\ -1.837 & +0.716 \\ -1.827 & +0.906 \\ -1.663 & +0.247 \\ -0.815 & -1.513 \\ -0.778 & -1.901 \\ -0.754 & -1.565 \\ -0.518 & -1.896 \\ +0.322 & +0.051 \\ +0.418 & +0.021 \\ +0.781 & +1.069 \\ +0.931 & +0.862 \\ +1.510 & +0.183 \\ +1.607 & +0.311 \\ \end{array} \right)$$

and applied the proposed method.
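With the helpers sketched above, this step amounts to something like the following (the printed sums of squares should reproduce the table below):

```python
x = np.array([-1.983, -1.948, -1.837, -1.827, -1.663, -0.815, -0.778,
              -0.754, -0.518,  0.322,  0.418,  0.781,  0.931,  1.510,  1.607])
y = np.array([ 0.936,  0.810,  0.716,  0.906,  0.247, -1.513, -1.901,
              -1.565, -1.896,  0.051,  0.021,  1.069,  0.862,  0.183,  0.311])

for b in np.linspace(1.6, 2.2, 7):
    print(f"b = {b:.1f}   SSQ = {linear_fit(x, y, b)[3]:.5f}")
```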

For the preliminary step, the results are

$$\left( \begin{array}{cc} b & \text{Sum of Squares} \\ 1.6 & 2.81459 \\ 1.7 & 1.78226 \\ 1.8 & 0.94086 \\ 1.9 & 0.44053 \\ 2.0 & 0.32615 \\ 2.1 & 0.52575 \\ 2.2 & 0.92820 \\ \end{array} \right)$$

The minimum is clearly close to $b=2$. For this value, the estimates are $$a=1.405 \qquad c=0.420 \qquad d=-0.398$$

Now, the nonlinear regression leads to $$a=1.410 \qquad b=1.981 \qquad c=0.417 \qquad d=-0.391$$

Notice that the "unconventional" method proposed by @JJacquelin, based on integral equations, leads to almost identical results.