
We have $n$ points: $(x_1,y_1),\dots,(x_n,y_n)$.
We know that if we use the least squares method we will get a line $y=mx+b$ that gives the minimal value of the function $w=\sum_{i=1}^n (mx_i+b-y_i)^2$.

I need to prove that the minimizing line satisfies: $$b=\frac{\sum_{i=1}^n x_i^2\sum_{i=1}^ny_i -\sum_{i=1}^nx_i\sum_{i=1}^nx_iy_i}{n\sum_{i=1}^nx_i^2-(\sum_{i=1}^nx_i)^2}$$ and: $$m=\frac{n\sum_{i=1}^nx_iy_i-\sum_{i=1}^nx_i \sum_{i=1}^ny_i}{n\sum_{i=1}^nx_i^2-(\sum_{i=1}^nx_i)^2}$$

I really don't know how to begin, and I'd appreciate any help.
Thank you!

whoisit
CS1
  • You simply differentiate $w$ with respect to $b$ and $m$ and equate them to zero, so you get two equations. Solving these two equations simultaneously, you get the above-mentioned $b$ and $m$. I hope this answers your question. – SA-255525 May 13 '14 at 08:32
  • @SA-255525, but how do I get the fractions?? Thank you!! – CS1 May 13 '14 at 08:42
  • @StefanHansen, I didn't understand the answer at your link... – CS1 May 13 '14 at 08:43
  • @SA-255525, and how do I differentiate $w$ with respect to $b$, for example? – CS1 May 13 '14 at 08:45
  • I'm still stuck... :-( – CS1 May 13 '14 at 09:09
  • For anyone else looking for a proof with a bit more explanation, this pdf helped me most for grasping the concept behind the proof. Credits go to the author of the document, Sean Rule @ Central Oregon Community College. – lolcatzftw May 01 '20 at 18:06
  • Hello @CS1: I have updated my answer, providing a detailed explanation for the requested proof below. Kindly either provide me feedback or up-vote the answer if you feel that it is now complete. Hopefully the answer provides everything that you need now. Thank you once again for your question, which has the final results in it - showing that the result is indeed correct! – Stephen Elliott Jan 10 '24 at 09:24

5 Answers


See if this can be of any help; I have shown the partial derivatives with respect to $b$ and $m$. Solve these two equations simultaneously to get $b$ and $m$.

[images of the worked partial derivatives, no longer available]

I hope this will clear a few things up for you. Here are some new hints to help you out.

[image of additional hints, no longer available]
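Since the original images are no longer available, here is a minimal SymPy sketch of the same calculus route (an addition, using a small made-up data set): differentiate $w$ with respect to $m$ and $b$, set both derivatives to zero, solve the resulting $2 \times 2$ linear system, and compare with the closed-form fractions from the question.

```python
import sympy as sp

m, b = sp.symbols('m b')

# Small made-up data set (placeholder values, for illustration only).
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 7]
n = len(xs)

# w = sum of squared residuals for this data set.
w = sum((m * x + b - y) ** 2 for x, y in zip(xs, ys))

# dw/dm = 0 and dw/db = 0 give two linear equations in m and b.
sol = sp.solve([sp.diff(w, m), sp.diff(w, b)], [m, b])

# Closed-form sums from the question, for comparison.
Sx = sum(xs)
Sy = sum(ys)
Sxx = sum(x * x for x in xs)
Sxy = sum(x * y for x, y in zip(xs, ys))
m_closed = sp.Rational(n * Sxy - Sx * Sy, n * Sxx - Sx**2)
b_closed = sp.Rational(Sxx * Sy - Sx * Sxy, n * Sxx - Sx**2)

print(sol[m] == m_closed, sol[b] == b_closed)  # True True
```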

SA-255525

This post is an adjunct to the solution of @SA-255525, who solved the problem via the calculus. Another option is to use linear algebra.

Start with a crisp definition. We have a sequence of measurements of the form $\left\{ x_{k}, y_{k} \right\}_{k=1}^{m}$ and the trial function is $y(x) = c_{0} + c_{1}x$.

This implies the linear system $$ \begin{align} \mathbf{A} c &= y \\ \left[ \begin{array}{cc} 1 & x_{1} \\ 1 & x_{2} \\ \vdots & \vdots \\ 1 & x_{m} \end{array} \right] \left[ \begin{array}{c} c_{0} \\ c_{1} \end{array} \right] &= \left[ \begin{array}{c} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{array} \right]. \end{align} $$ By definition the least squares solution $c$ minimizes the sum of the squares of the residuals: $$ \left[ \begin{array}{c} c_{0} \\ c_{1} \end{array} \right] _{LS} = \left\{ \left[ \begin{array}{c} c_{0} \\ c_{1} \end{array} \right] \in \mathbb{R}^{2} \colon \lVert \mathbf{A} c - y \rVert_{2}^{2} \text{ is minimized} \right\}. $$ One solution path uses column vectors. The column structure of the system matrix is $$ \mathbf{A} = \left[ \begin{array}{cc} \mathbf{1} & x \end{array} \right]. $$ Using the normal equations $\mathbf{A}^{\mathrm{T}}\mathbf{A}c = \mathbf{A}^{\mathrm{T}} y$, the least squares solution is $$ c_{LS} = \left( \mathbf{A}^{\mathrm{T}}\mathbf{A} \right)^{-1} \mathbf{A}^{\mathrm{T}} y, $$ where $\mathbf{A}^{\mathrm{T}}\mathbf{A}$ is invertible as long as the $x_{k}$ are not all equal. This is the same solution found via calculus. Note that $$ \mathbf{A}^{\mathrm{T}} y = \left[ \begin{array}{c} \mathbf{1}\cdot y\\ x \cdot y \end{array} \right], \qquad \mathbf{A}^{\mathrm{T}}\mathbf{A} = \left[ \begin{array}{cc} \mathbf{1}\cdot\mathbf{1} & \mathbf{1}\cdot x \\ x \cdot \mathbf{1} & x \cdot x \end{array} \right], \qquad \left( \mathbf{A}^{\mathrm{T}}\mathbf{A} \right)^{-1} = \left( \det \mathbf{A}^{\mathrm{T}}\mathbf{A} \right)^{-1} \left[ \begin{array}{rr} x \cdot x & -\mathbf{1}\cdot x \\ -x \cdot \mathbf{1} & \mathbf{1}\cdot\mathbf{1} \end{array} \right]. $$ To connect the two notations use $$ \det \mathbf{A}^{\mathrm{T}}\mathbf{A} = \left( \mathbf{1}\cdot\mathbf{1} \right) \left( x \cdot x \right) - \left( \mathbf{1}\cdot x \right)^{2}, $$ and $$ \mathbf{1}\cdot\mathbf{1} = \sum_{k=1}^{m}(1) = m, \quad \mathbf{1}\cdot x = x\cdot\mathbf{1} = \sum_{k=1}^{m}x_{k}, \quad \mathbf{1}\cdot y = \sum_{k=1}^{m} y_{k}, \quad x \cdot y = \sum_{k=1}^{m} x_{k} y_{k}. $$
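To make the column-vector route concrete, here is a minimal NumPy sketch (an addition, with invented sample data): it solves the normal equations directly and then evaluates the closed-form entries spelled out above; the two computations agree.

```python
import numpy as np

# Invented sample data; x and y play the roles of the column vectors above.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

A = np.column_stack([np.ones_like(x), x])   # A = [1 | x]

# Normal equations: (A^T A) c = A^T y, with c = [c0 (intercept), c1 (slope)].
c = np.linalg.solve(A.T @ A, A.T @ y)

# The same solution from the dot-product entries written out above.
m_, Sx, Sxx = len(x), x.sum(), (x * x).sum()
Sy, Sxy = y.sum(), (x * y).sum()
det = m_ * Sxx - Sx ** 2
c_closed = np.array([(Sxx * Sy - Sx * Sxy) / det, (m_ * Sxy - Sx * Sy) / det])

print(np.allclose(c, c_closed))  # True
```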

dantopa

Abstract

The problem statement has:

$y=mx+b$ that gives the minimal value of the function $w=\sum_{i=1}^n (mx_i+b-y_i)^2$.

I need to prove that the minimizing line satisfies: $$b=\frac{\sum_{i=1}^n x_i^2\sum_{i=1}^ny_i -\sum_{i=1}^nx_i\sum_{i=1}^nx_iy_i}{n\sum_{i=1}^nx_i^2-(\sum_{i=1}^nx_i)^2}$$ and: $$m=\frac{n\sum_{i=1}^nx_iy_i-\sum_{i=1}^nx_i \sum_{i=1}^ny_i}{n\sum_{i=1}^nx_i^2-(\sum_{i=1}^nx_i)^2}$$

In the problem statement, $m$ is the slope and $b$ is the intercept.

Hence, in the question:

$$\text{Intercept }=\frac{\sum_{i=1}^n x_i^2\sum_{i=1}^ny_i -\sum_{i=1}^nx_i\sum_{i=1}^nx_iy_i}{n\sum_{i=1}^nx_i^2-(\sum_{i=1}^nx_i)^2}$$ and: $$\text{Slope }=\frac{n\sum_{i=1}^nx_iy_i-\sum_{i=1}^nx_i \sum_{i=1}^ny_i}{n\sum_{i=1}^nx_i^2-(\sum_{i=1}^nx_i)^2}$$

In the step-by-step derivation below, using slightly different variables $y=a+bx$ (where below $a$ is the intercept and $b$ is the slope), the same result is derived:

$$ \text{Intercept }=\frac { - \left( \sum_{i=1}^{i\le n } x_i*x_i \right) *\left( \left( \sum_{i=1}^{i\le n } y_i \right) \right) + \left( \sum_{i=1}^{i\le n } x_i \right) * \left( \left( \sum_{i=1}^{i\le n } y_i*x_i \right) \right) } { \left( \sum_{i=1}^{i\le n } x_i*x_i \right) * \left( -n \right) - \left( \sum_{i=1}^{i\le n } x_i \right) * \left( - 1* \left( \sum_{i=1}^{i\le n } x_i \right) \right) } \tag{15} $$ $$\text{Slope }=\frac {-\left( \sum_{i=1}^{i\le n } x_i \right) \left( \left( \sum_{i=1}^{i\le n } y_i \right) \right) + \left( n*\left(\left( \sum_{i=1}^{i\le n } y_i*x_i \right) \right) \right)} { \left( \sum_{i=1}^{i\le n } x_i \right)\left(-1*\left(\sum_{i=1}^{i\le n } x_i \right)\right) + \left( n*\left(\sum_{i=1}^{i\le n } x_i*x_i\right)\right) } \tag{14}$$

And this result is correct as it also duplicates the result for the slope from the reference (expanding the relevant sums):

$$ \text{Slope }=\frac{n*\left(\sum x_i*y_i\right)-\left(\sum x_i \sum y_i \right)} {n \sum{\left(x_i\right)^2} - \left( \sum{x_i} \right)^2} \tag{16}$$

Succinct Derivation of the Standard Linear Regression Approach

This derivation leans heavily on the succinct answers here, but it adds a bit more detail so that it is more obvious how to go from step to step, and hence whether the answer is correct or not. I am considering how to extend the result for linear regression, so I want to make sure to understand each and every step fully.

I do admire the answer from "SA-255525", but I prefer to work the problem out step by step to be able to troubleshoot any mistakes. The steps there are too large for me to be able to assert the correctness of that answer.

I also reference at the bottom an answer that arrives at the same final result! I have a spreadsheet, or I can use a program, so it is easy for me to calculate the floating-point sums and divisions to further test the results.

First, let the error $w$ (the sum of the squared errors) be defined as follows:

$$w=\sum_{i=1}^{i<i_{right}}\left(y_i-\left(a+b*x_i\right)\right)^2\tag{1}$$

Now there are two variables to solve for, namely $a$ and $b$. And there are two equations, since the derivative of $w$ with respect to each of $a$ and $b$ needs to be zero at a stationary point, such as a maximum or minimum. The reference proves that this zero point gives the minimum error and not the maximum. Hence:

$$w=\sum_{i=1}^{i<i_{right}}\left(y_i-\left(a+b*x_i\right)\right)^2$$ $$\frac{d w}{d a} = \frac {d \sum_{i=1}^{i<i_{right}}\left(y_i-\left(a+b*x_i\right)\right)^2}{d a}=0\tag{2}$$ $$= -2 \sum_{i=1}^{i<i_{right}}\left(y_i-\left(a+b*x_i\right)\right)$$ $$= 0 \text{ at the minimum point for } a$$ $$\underset{\text{implies}}{\longrightarrow}\sum_{i=1}^{i<i_{right}}\left(y_i-\left(a+b*x_i\right)\right)=0 $$

$$\underset{\text{implies}}{\longrightarrow} \text{ (with }n=i_{right}-1) \text{ that }\left( \sum_{i=1}^{i\le n } y_i \right) - b*\left(\sum_{i=1}^{i\le n } x_i \right)-n*a=0\tag{3} $$

Dividing Equation 3 by $n$, defining $\bar{y}=\frac{\sum_{i=1}^{i\le n}{y_i}}{n}$ and $\bar{x}=\frac{\sum_{i=1}^{i \le n}{x_i}}{n}$, and moving the terms that are not $\bar{y}$ to the right side of the equation immediately results in the widely quoted (definitely correct) result:

$$ \bar{y}=a+b*\bar{x} $$

Also, the derivative $\frac{ d w}{d b}=0$ at the minimum and hence: $$\frac{d w}{d b} = \frac {d \sum_{i=1}^{i\le n}\left(y_i-\left(a+b*x_i\right)\right)^2}{d b}=0\tag{4}$$ Then, $$\frac{d w}{d b}= -2 \sum_{i=1}^{i\le n}\left(y_i-\left(a+b*x_i\right)\right)*x_i=0$$ And similarly it follows that:

$$\text{ (with }n=i_{right}-1) \text{ that }\left( \sum_{i=1}^{i\le n } y_i*x_i \right) - b*\left(\sum_{i=1}^{i\le n } x_i*x_i\right) - a \left( \sum_{i=1}^{i\le n } x_i \right)=0\tag{5} $$

To solve for $b$ by eliminating $a$ from Equations 3 and 5, Equation 3 is multiplied by $\left( \sum_{i=1}^{i\le n } x_i \right)$, resulting in Equation 6; and Equation 5 is multiplied by $n$, resulting in Equation 7. Then the modified Equation 7 is subtracted from the modified Equation 6, so that $b$ can be solved from Equation 8, as follows:

$$\left( \sum_{i=1}^{i\le n } x_i \right)\left(\left( \sum_{i=1}^{i\le n } y_i \right) - b*\left(\sum_{i=1}^{i\le n } x_i \right)\right) -\left(\left(\sum_{i=1}^{i\le n } x_i \right)*n*a\right)=0\tag{6} $$

$$n*\left(\left( \sum_{i=1}^{i\le n } y_i*x_i \right) - b*\left(\sum_{i=1}^{i\le n } x_i*x_i\right)\right) - \left( \left( \sum_{i=1}^{i\le n } x_i \right ) *n*a \right)=0\tag{7} $$

$$\left( \sum_{i=1}^{i\le n } x_i \right)\left(\left( \sum_{i=1}^{i\le n } y_i \right) - b*\left(\sum_{i=1}^{i\le n } x_i \right)\right) - \left( n*\left(\left( \sum_{i=1}^{i\le n } y_i*x_i \right) - b*\left(\sum_{i=1}^{i\le n } x_i*x_i\right)\right) \right)$$ $$ =0\tag{8} $$ By manipulation of Equation 8, $b$ can be easily solved for. Equation 3 and Equation 5 can also be used by appropriate multiplications and subtractions to solve for $a$.

To solve for $a$ by eliminating $b$ from Equations 3 and 5, Equation 3 is multiplied by $\left( \sum_{i=1}^{i\le n } x_i*x_i \right)$, resulting in Equation 9; and Equation 5 is multiplied by $\left( \sum_{i=1}^{i\le n } x_i \right)$, resulting in Equation 10. Then the modified Equation 10 is subtracted from the modified Equation 9, so that $a$ can be solved from Equation 11, as follows:

$$\left( \sum_{i=1}^{i\le n } x_i*x_i \right)*\left(\left( \sum_{i=1}^{i\le n } y_i \right) - b*\left(\sum_{i=1}^{i\le n } x_i \right)-n*a \right)=0\tag{9} $$

$$\left( \sum_{i=1}^{i\le n } x_i \right)* \left( \left( \sum_{i=1}^{i\le n } y_i*x_i \right) - b*\left(\sum_{i=1}^{i\le n } x_i*x_i\right) - a \left( \sum_{i=1}^{i\le n } x_i \right) \right) =0\tag{10} $$

$$\left( \sum_{i=1}^{i\le n } x_i*x_i \right)*\left(\left( \sum_{i=1}^{i\le n } y_i \right) -n*a \right) - \left( \sum_{i=1}^{i\le n } x_i \right)* \left( \left( \sum_{i=1}^{i\le n } y_i*x_i \right) - a \left( \sum_{i=1}^{i\le n } x_i \right) \right)$$ $$=0\tag{11} $$

For solving $b$, Equation 8 can be rearranged by putting the terms that include $b$ on the left of the equals sign; the remaining terms are multiplied by $-1$ and moved to the right, resulting in Equation 12 as follows:

$$ \left( \sum_{i=1}^{i\le n } x_i \right)\left(- b*\left(\sum_{i=1}^{i\le n } x_i \right)\right) - \left( - n*b*\left(\sum_{i=1}^{i\le n } x_i*x_i\right)\right) $$

$$=-\left( \sum_{i=1}^{i\le n } x_i \right) \left( \left( \sum_{i=1}^{i\le n } y_i \right) \right) + \left( n*\left(\left( \sum_{i=1}^{i\le n } y_i*x_i \right) \right) \right)\tag{12}$$

For solving $a$, Equation 11 can be rearranged by putting the terms that include $a$ on the left of the equals sign; the remaining terms are multiplied by $-1$ and moved to the right, resulting in Equation 13 as follows:

$$ \left( \sum_{i=1}^{i\le n } x_i*x_i \right) * \left( -n*a \right) - \left( \sum_{i=1}^{i\le n } x_i \right) * \left( - a \left( \sum_{i=1}^{i\le n } x_i \right) \right) $$

$$ = - \left( \sum_{i=1}^{i\le n } x_i*x_i \right) *\left( \left( \sum_{i=1}^{i\le n } y_i \right) \right) + \left( \sum_{i=1}^{i\le n } x_i \right) * \left( \left( \sum_{i=1}^{i\le n } y_i*x_i \right) \right) \tag{13} $$

Now $b$ can be solved from Equation 12, by collecting all the terms on the left that include $b$ and dividing both sides of the equation by them, resulting in Equation 14:

$$b=\frac {-\left( \sum_{i=1}^{i\le n } x_i \right) \left( \left( \sum_{i=1}^{i\le n } y_i \right) \right) + \left( n*\left(\left( \sum_{i=1}^{i\le n } y_i*x_i \right) \right) \right)} { \left( \sum_{i=1}^{i\le n } x_i \right)\left(-1*\left(\sum_{i=1}^{i\le n } x_i \right)\right) + \left( n*\left(\sum_{i=1}^{i\le n } x_i*x_i\right)\right) } \tag{14}$$

Similarly $a$ can be solved from Equation 13, by collecting all the terms on the left that include $a$ and dividing both sides of the equation by them, resulting in Equation 15:

$$ a=\frac { - \left( \sum_{i=1}^{i\le n } x_i*x_i \right) *\left( \left( \sum_{i=1}^{i\le n } y_i \right) \right) + \left( \sum_{i=1}^{i\le n } x_i \right) * \left( \left( \sum_{i=1}^{i\le n } y_i*x_i \right) \right) } { \left( \sum_{i=1}^{i\le n } x_i*x_i \right) * \left( -n \right) - \left( \sum_{i=1}^{i\le n } x_i \right) * \left( - 1* \left( \sum_{i=1}^{i\le n } x_i \right) \right) } \tag{15} $$
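As a quick numerical sanity check of Equations 14 and 15 (this sketch is an addition, using synthetic data; `np.polyfit` serves only as an independent reference implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 50)
y = 2.5 * x + 1.0 + rng.normal(0.0, 0.5, 50)   # synthetic data, made up
n = len(x)

Sx, Sy = x.sum(), y.sum()
Sxx, Sxy = (x * x).sum(), (x * y).sum()

# Equation 14: slope b
b = (-Sx * Sy + n * Sxy) / (-Sx * Sx + n * Sxx)
# Equation 15: intercept a
a = (-Sxx * Sy + Sx * Sxy) / (-n * Sxx + Sx * Sx)

slope, intercept = np.polyfit(x, y, 1)          # independent check
print(np.allclose([b, a], [slope, intercept]))  # True
```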

To check this result, start with the Reference: Derivation of the formula for Ordinary Least Squares Linear Regression. As to why it is important to reproduce the steps: it is to later have the capacity to extend them to non-linear regressions as well, perhaps here if not elsewhere.

The reference uses the model $y_i=m*x_i+b$, whereas here the model $y_i=a+b*x_i$ is applied. It is straightforward enough to adapt the reference to the notation used in the derivation here, and the result is the same. Adapted from the reference:

$$ b=\frac{n*\left(\sum x_i*y_i\right)-\left(\sum x_i \sum y_i \right)} {n \sum{\left(x_i\right)^2} - \left( \sum{x_i} \right)^2} \tag{16}$$

$$ a=\frac{\sum{y_i} - b* \sum{x_i}} {n} \tag{17}$$

  • http://stackprinter.com/ can be used by selecting the "mathematics" site to print out the latest version of the question and answers here, using the Question ID of 792863. It is a lot easier to follow mathematics that has detail with it using the printed version. I printed the question and answers last night and was able to correct and update them today. Make sure to reload the printing from the main site, and not just refresh the printout page, to get the latest version of the question and answers. – Stephen Elliott Jan 10 '24 at 12:34

First write the objective function as $\phi(m,b)= \frac12 \| m\mathbf{x}+b\mathbf{1}_N-\mathbf{y} \|_2^2$.

The derivatives are \begin{eqnarray*} \frac{\partial \phi}{\partial b} &=& \mathbf{1}_N^T \left( m\mathbf{x}+b\mathbf{1}_N-\mathbf{y} \right) = N \left( m\bar{x} + b - \bar{y} \right) \\ \frac{\partial \phi}{\partial m} &=& \mathbf{x}^T \left( m\mathbf{x}+b\mathbf{1}_N-\mathbf{y} \right) = \mathbf{x}^T\mathbf{x} m + N \bar{x} b - \mathbf{x}^T\mathbf{y} \end{eqnarray*} Setting both derivatives to zero yields the linear system $$ \begin{pmatrix} N \bar{x} & N \\ \mathbf{x}^T\mathbf{x} & N \bar{x} \end{pmatrix} \begin{pmatrix} m \\ b \end{pmatrix} = \begin{pmatrix} N \bar{y} \\ \mathbf{x}^T\mathbf{y} \end{pmatrix} $$ The relations you are seeking are easily found by using Cramer's rule to solve this system.
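Spelling out the Cramer's rule step (this is just the computation alluded to above, added for completeness): $$ m = \frac{\det\begin{pmatrix} N \bar{y} & N \\ \mathbf{x}^T\mathbf{y} & N \bar{x} \end{pmatrix}}{\det\begin{pmatrix} N \bar{x} & N \\ \mathbf{x}^T\mathbf{x} & N \bar{x} \end{pmatrix}} = \frac{N^2 \bar{x}\bar{y} - N\,\mathbf{x}^T\mathbf{y}}{N^2 \bar{x}^2 - N\,\mathbf{x}^T\mathbf{x}} = \frac{\mathbf{x}^T\mathbf{y} - N\bar{x}\bar{y}}{\mathbf{x}^T\mathbf{x} - N\bar{x}^2}. $$ Multiplying numerator and denominator by $N$ and using $N\bar{x}=\sum x_i$, $N\bar{y}=\sum y_i$, $\mathbf{x}^T\mathbf{x}=\sum x_i^2$, $\mathbf{x}^T\mathbf{y}=\sum x_i y_i$ recovers the fraction for $m$ in the question.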

A solution is easily found by hand as \begin{eqnarray*} b &=& \bar{y} - m\bar{x} \\ \mathbf{x}^T\mathbf{x} m = \mathbf{x}^T\mathbf{y} - N \bar{x} \left( \bar{y} - m\bar{x} \right) \rightarrow m &=& \frac{\mathbf{x}^T\mathbf{y} - N \bar{x} \bar{y}} {\mathbf{x}^T\mathbf{x} - N \bar{x} \bar{x}} \end{eqnarray*} From here, you should proceed easily to find the different equalities.
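As a quick numeric check of these formulas, here is a minimal NumPy sketch (not part of the original answer; the data values are made up):

```python
import numpy as np

# Placeholder data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
N = len(x)
xbar, ybar = x.mean(), y.mean()

# m = (x.y - N*xbar*ybar) / (x.x - N*xbar^2), then b = ybar - m*xbar.
m = (x @ y - N * xbar * ybar) / (x @ x - N * xbar ** 2)
b = ybar - m * xbar
print(m, b)
```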

Steph

As mentioned before, this is all about finding the global minimum of the two variable function $Q(m,b)= \sum_{i=1}^n (y_i - b -m x_i)^2$. Since this function is convex, the stationary point will be a global minimizer. So,

$$ \frac{\partial Q}{\partial b} = 0 \Leftrightarrow -2 \sum_{i=1}^n (y_i-b-m x_i) = 0 \Leftrightarrow b +m \bar x = \bar y $$

$$ \frac{\partial Q}{\partial m} = 0 \Leftrightarrow -2 \sum_{i=1}^n x_i(y_i-b-m x_i) = 0 \Leftrightarrow b \bar x + m \overline{x^2} = \overline{xy} $$

Now you just solve this $2 \times 2$ system and get to the answer.
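For example, a minimal NumPy sketch (with made-up data, added for illustration) that solves exactly this $2 \times 2$ system:

```python
import numpy as np

# Made-up data for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.9, 5.1, 7.0])
xbar, ybar = x.mean(), y.mean()
x2bar, xybar = (x * x).mean(), (x * y).mean()

# Stationary-point conditions, written in the unknowns (b, m):
#   b      + m*xbar  = ybar
#   b*xbar + m*x2bar = xybar
A = np.array([[1.0, xbar], [xbar, x2bar]])
rhs = np.array([ybar, xybar])
b, m = np.linalg.solve(A, rhs)
print(m, b)
```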

PierreCarre