
In Nocedal and Wright's Numerical Optimization (pages 138-139), the update of the approximate Hessian $B_k$ for the quasi-Newton DFP method, $$B_{k+1} = \left(I-\frac{y_ks_k^T}{y_k^Ts_k}\right)B_k\left(I-\frac{s_ky_k^T}{y_k^Ts_k}\right)+ \frac{y_ky_k^T}{y_k^Ts_k}\tag{1}$$ is explained as the solution of the problem $$\min_B\|B-B_k\|_{F,W} \\ \text{subject to}~B=B^T,~Bs_k=y_k \tag{2}$$ where $\|A\|_{F,W}$ is the weighted Frobenius norm $$\|A\|_{F,W} = \|W^{1/2}AW^{1/2}\|_F$$ and $W$ is any symmetric positive definite matrix satisfying the relation $Wy_k=s_k$.

How can I prove that $B_{k+1}$ given by equation (1) is the solution to the problem (2)?
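Before attempting a proof, one can sanity-check the claim numerically: pick a random symmetric positive definite $W$, set $s_k = Wy_k$ so that $Wy_k = s_k$ holds, build $B_{k+1}$ from formula (1), and verify the constraints of (2) plus optimality against feasible perturbations. The identity $\|A\|_{F,W}^2 = \operatorname{tr}(AWAW)$ for symmetric $A$ avoids computing $W^{1/2}$. A sketch in plain Python (all variable names are mine, not from the book):

```python
import random

# Numerical sanity check, not a proof: formula (1) is symmetric, satisfies the
# secant condition B_{k+1} s = y, and feasible perturbations never decrease the
# weighted Frobenius norm in (2).
random.seed(0)
n = 4

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(n)] for i in range(n)]

def sub(A, B):
    return [[A[i][j] - B[i][j] for j in range(n)] for i in range(n)]

def outer(u, v, c):
    return [[c * u[i] * v[j] for j in range(n)] for i in range(n)]

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(n))

I = [[float(i == j) for j in range(n)] for i in range(n)]

# W = V^T V + I is symmetric positive definite; choosing s = W y enforces W y = s
V = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
Vt = [[V[j][i] for j in range(n)] for i in range(n)]
W = add(mul(Vt, V), I)
y = [random.gauss(0, 1) for _ in range(n)]
s = matvec(W, y)

M = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
B_k = add(M, [[M[j][i] for j in range(n)] for i in range(n)])    # symmetric B_k

rho = 1.0 / sum(y[i] * s[i] for i in range(n))   # 1/(y^T s) > 0 since W is SPD
L = sub(I, outer(y, s, rho))                     # I - y s^T/(y^T s)
R = sub(I, outer(s, y, rho))                     # I - s y^T/(y^T s)
B1 = add(mul(mul(L, B_k), R), outer(y, y, rho))  # formula (1)

assert max(abs(matvec(B1, s)[i] - y[i]) for i in range(n)) < 1e-8   # secant: B1 s = y
assert max(abs(B1[i][j] - B1[j][i]) for i in range(n) for j in range(n)) < 1e-8

def wnorm2(A):
    # ||A||_{F,W}^2 = tr(A W A W) for symmetric A, so no matrix square root needed
    AW = mul(A, W)
    return trace(mul(AW, AW))

Q = sub(I, outer(s, s, 1.0 / sum(t * t for t in s)))   # projector with Q s = 0
base = wnorm2(sub(B1, B_k))
for _ in range(20):
    M2 = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    Ms = add(M2, [[M2[j][i] for j in range(n)] for i in range(n)])
    D = mul(mul(Q, Ms), Q)             # symmetric with D s = 0: B1 + D stays feasible
    assert wnorm2(sub(add(B1, D), B_k)) >= base - 1e-6
```

Every random feasible perturbation increases the weighted distance, consistent with (1) being the minimizer.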

littleO

2 Answers


This answer follows essentially the same lines as my answer about the BFGS update.

  1. Introduce the shorthand notation $$ \min\|\underbrace{W^{1/2}B_kW^{1/2}}_{\hat B_k}-\underbrace{W^{1/2}BW^{1/2}}_{\hat B}\|_F $$ \begin{align} Wy_k=s_k\quad&\Leftrightarrow\quad \underbrace{\color{red}{W^{-1/2}}Wy_k}_{\hat y_k}=\underbrace{\color{red}{W^{-1/2}}s_k}_{\hat s_k}\quad\Leftrightarrow\quad \hat y_k=\hat s_k,\tag{1}\\ Bs_k=y_k\quad&\Leftrightarrow\quad \underbrace{\color{blue}{W^{1/2}}B\color{red}{W^{1/2}}}_{\hat B}\underbrace{\color{red}{W^{-1/2}}s_k}_{\hat s_k}=\underbrace{\color{blue}{W^{1/2}}y_k}_{\hat y_k}\quad\Leftrightarrow\quad \hat B\hat s_k=\hat y_k.\tag{2} \end{align} Then, since $B$ is symmetric if and only if $\hat B$ is, the problem becomes $$ \min\|\hat B_k-\hat B\|_F\quad\text{subject to }\hat B=\hat B^T,\ \hat B\hat s_k=\hat s_k. $$
  2. The constraint forces the optimization variable $\hat B$ to have the eigenvector $\hat s_k$ with eigenvalue one; hence, it is convenient to introduce the new orthonormal basis $$ U=[u\ |\ u_\bot] $$ where $u$ is the normalized eigenvector $\hat s_k$, i.e. $$ u=\frac{\hat s_k}{\|\hat s_k\|}\tag{3}, $$ and $u_\bot$ is any orthonormal complement to $u$. Then $\hat Bu=u$ gives $u^T\hat Bu=u^Tu=1$ and $u_\bot^T\hat Bu=u_\bot^Tu=0$, and the operator matrices in the new basis take the form \begin{align} U^T\hat B_kU-U^T\hat BU&=\begin{bmatrix}u^T\\ u_\bot^T\end{bmatrix}\hat B_k\begin{bmatrix}u & u_\bot\end{bmatrix}-\begin{bmatrix}u^T\\ u_\bot^T\end{bmatrix}\hat B\begin{bmatrix}u & u_\bot\end{bmatrix}=\\ &=\begin{bmatrix}\color{blue}{u^T\hat B_ku} & \color{blue}{u^T\hat B_ku_\bot}\\\color{blue}{u_\bot^T\hat B_ku} & \color{red}{u_\bot^T\hat B_ku_\bot}\end{bmatrix}-\begin{bmatrix}\color{blue}{1} & \color{blue}{0}\\\color{blue}{0} & \color{red}{u_\bot^T\hat Bu_\bot}\end{bmatrix}. \end{align} Since the Frobenius norm is unitarily invariant (as it depends on the singular values only) we have \begin{align} \|\hat B_k-\hat B\|_F^2&=\|U^T(\hat B_k-\hat B)U\|_F^2= \left\|\begin{bmatrix}\color{blue}{u^T\hat B_ku-1} & \color{blue}{u^T\hat B_ku_\bot}\\\color{blue}{u_\bot^T\hat B_ku} & \color{red}{u_\bot^T\hat B_ku_\bot-u_\bot^T\hat Bu_\bot}\end{bmatrix}\right\|_F^2=\\ &=\color{blue}{(u^T\hat B_ku-1)^2+\|u^T\hat B_ku_\bot\|_F^2+\|u_\bot^T\hat B_ku\|_F^2}+\color{red}{\|u_\bot^T\hat B_ku_\bot-u_\bot^T\hat Bu_\bot\|_F^2} \end{align} The blue part cannot be affected by optimization, and to minimize the Frobenius norm, it is clear that we should make the red part zero, that is, the optimal solution satisfies $$ \color{red}{u_\bot^T\hat Bu_\bot}=\color{red}{u_\bot^T\hat B_ku_\bot}. $$
  3. It gives the optimal solution to be \begin{align} \hat B&=U\begin{bmatrix}\color{blue}1 & \color{blue}0\\\color{blue}0 & \color{red}{u_\bot^T\hat B_ku_\bot}\end{bmatrix}U^T=\begin{bmatrix}u & u_\bot\end{bmatrix}\begin{bmatrix}1 & 0\\0 & u_\bot^T\hat B_ku_\bot\end{bmatrix}\begin{bmatrix}u^T \\ u_\bot^T\end{bmatrix}=uu^T+u_\bot u_\bot^T\hat B_ku_\bot u_\bot^T=\\ &=uu^T+(I-uu^T)\hat B_k(I-uu^T) \end{align} where we used the following representation for the projection operator to the complement of $u$ $$ I=UU^T=\begin{bmatrix}u & u_\bot\end{bmatrix}\begin{bmatrix}u^T \\ u_\bot^T\end{bmatrix}=uu^T+u_\bot u_\bot^T\quad\Leftrightarrow\quad u_\bot u_\bot^T=I-uu^T. $$
  4. Changing variables back to the original ones is straightforward via (1), (2), (3): $$ B=W^{-1/2}\hat BW^{-1/2}=W^{-1/2}uu^TW^{-1/2}+(I-W^{-1/2}uu^TW^{1/2})B_k(I-W^{1/2}uu^TW^{-1/2}) $$ where \begin{align} \|\hat s_k\|^2&=\hat s_k^T\hat s_k=\hat y_k^T\hat s_k=y_k^Ts_k,\\ uu^T&=\frac{\hat s_k\hat s_k^T}{\|\hat s_k\|^2}=\frac{\hat y_k\hat y_k^T}{y_k^Ts_k}=W^{1/2}\frac{y_ky_k^T}{y_k^Ts_k}W^{1/2},\\ W^{-1/2}uu^TW^{-1/2}&=\frac{y_ky_k^T}{y_k^Ts_k},\\ W^{1/2}uu^TW^{-1/2}&=\frac{Wy_ky_k^T}{y_k^Ts_k}=\frac{s_ky_k^T}{y_k^Ts_k},\qquad W^{-1/2}uu^TW^{1/2}=\frac{y_ks_k^T}{y_k^Ts_k}. \end{align} Substituting these identities into the expression for $B$ gives exactly the DFP update (1) from the question.
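Steps 2-3 can be checked numerically in the hatted coordinates: the candidate $\hat B = uu^T + (I-uu^T)\hat B_k(I-uu^T)$ is symmetric, maps $u$ to $u$, and adding any feasible perturbation $(I-uu^T)M(I-uu^T)$ with $M$ symmetric does not decrease the plain Frobenius distance to $\hat B_k$. A small pure-Python sketch (variable names are mine):

```python
import random

# Sanity check of steps 2-3: Bhat = uu^T + P Bk P (with P = I - uu^T) is
# feasible, and feasible perturbations only increase ||Bk - Bhat||_F.
random.seed(1)
n = 5

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(n)] for i in range(n)]

def sub(A, B):
    return [[A[i][j] - B[i][j] for j in range(n)] for i in range(n)]

def sym(M):
    return [[M[i][j] + M[j][i] for j in range(n)] for i in range(n)]

def fnorm2(A):
    return sum(A[i][j] ** 2 for i in range(n) for j in range(n))

v = [random.gauss(0, 1) for _ in range(n)]
nv = sum(t * t for t in v) ** 0.5
u = [t / nv for t in v]                                   # unit vector u
I = [[float(i == j) for j in range(n)] for i in range(n)]
uuT = [[u[i] * u[j] for j in range(n)] for i in range(n)]
P = sub(I, uuT)                                           # projector, P u = 0

Bk = sym([[random.gauss(0, 1) for _ in range(n)] for _ in range(n)])
Bhat = add(uuT, mul(mul(P, Bk), P))                       # candidate solution

Bu = [sum(Bhat[i][j] * u[j] for j in range(n)) for i in range(n)]
assert max(abs(Bu[i] - u[i]) for i in range(n)) < 1e-9    # Bhat u = u
assert max(abs(Bhat[i][j] - Bhat[j][i]) for i in range(n) for j in range(n)) < 1e-9

base = fnorm2(sub(Bk, Bhat))
for _ in range(20):
    D = mul(mul(P, sym([[random.gauss(0, 1) for _ in range(n)] for _ in range(n)])), P)
    # Bhat + D is still symmetric and still satisfies (Bhat + D) u = u
    assert fnorm2(sub(Bk, add(Bhat, D))) >= base - 1e-9
```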
A.Γ.

The step in A.Γ.'s proof where the red part is set to zero is not fully justified as written: one has to check that, among the symmetric matrices $\hat B$ with $\hat B u = u$, there actually exists one satisfying $u_{\perp}^{T}(\hat{B} - \hat{B}_k)u_{\perp}=0$, where $u_{\perp}$ is a matrix with orthonormal columns such that $u^T_{\perp} u = 0$. This existence is needed to conclude that the red term is zero at the solution. I found it very difficult to prove this directly, which led me to derive a different proof for your problem.

Let $W$ be a symmetric positive definite matrix such that, for fixed $y$ and $s$, $$ W y = s, \tag{1}$$ and let $\hat{B}$ be a given symmetric matrix (it plays the role of $B_k$). Denote $\hat{B}_{+} = W^{1/2} \hat{B} W^{1/2}$. Let $B$ be an arbitrary symmetric matrix, and call $B_{+} = W^{1/2} B W^{1/2}$. Hence, $$\begin{align}\|B -\hat{B}\|_{F,W} & = \| W^{1/2} B W^{1/2} -W^{1/2} \hat{B} W^{1/2} \|_{F} \\ & = \| B_{+} - \hat{B}_{+} \|_{F}. \end{align} $$ Additionally, notice that, by $(1)$, $$ y_{+} = W^{1/2} y = W^{-1/2} s = s_{+}.\tag{2}$$ Consequently, $ y = B s$ if and only if $$ B_{+} s_{+} = (W^{1/2} B W^{1/2})( W^{-1/2} s ) = W^{1/2} B s = W^{1/2} y = W^{- 1/2} s = s_{+}.$$ Thus, the symmetric matrix $B_{+}$ has the eigenvalue one with eigenvector $s_{+}$.

Write $u = s_{+}/\|s_{+}\|_{2}$ and let $P = I - uu^{T}$ be the orthogonal projector onto the complement of $u$, so that $Pu = 0$ and $P = P^{T} = P^{2}$. For any feasible $B_{+}$, put $X = B_{+} - \hat{B}_{+}$ and expand with $I = uu^{T} + P$: $$ X = uu^{T}Xuu^{T} + uu^{T}XP + PXuu^{T} + PXP. $$ The four terms are pairwise orthogonal in the Frobenius inner product, since every cross term contains the factor $u^{T}P = 0$ inside the trace, and by the symmetry of $X$ the two middle terms have equal norm $\|PXu\|_{2}$. Hence $$ \|B_{+} - \hat{B}_{+}\|^2_{F} = (u^{T}Xu)^2 + 2\,\|PXu\|^2_{2} + \|PXP\|^2_{F}. $$ The constraint $B_{+}u = u$ fixes $Xu = u - \hat{B}_{+}u$, so the first two terms are the same for every feasible $B_{+}$: $$ \|B_{+} - \hat{B}_{+}\|^2_{F} \geq (1 - u^{T}\hat{B}_{+}u)^2 + 2\,\|P\hat{B}_{+}u\|^2_{2}, $$ with equality if and only if $PXP = 0$, that is, $PB_{+}P = P\hat{B}_{+}P$. Moreover, every feasible $B_{+}$ satisfies $$ B_{+} = (uu^{T}+P)B_{+}(uu^{T}+P) = uu^{T} + PB_{+}P, $$ because $B_{+}u = u$ makes the cross terms vanish. Therefore the problem has the unique solution $$B_{\text{sol}} = \dfrac{1}{\|s_{+}\|^2_{2}} s_{+} s_{+}^{T} + A, \qquad A = \Big(I - \dfrac{1}{\|s_{+}\|^2_{2}}s_{+}s_{+}^T\Big) \hat{B}_{+}\Big(I - \dfrac{1}{\|s_{+}\|^2_{2}}s_{+}s_{+}^T\Big), \tag{3}$$ which is symmetric and satisfies $B_{\text{sol}} s_{+} = s_{+}$; changing variables back via $B = W^{-1/2} B_{+} W^{-1/2}$ recovers the DFP formula exactly as in the other answer.
For completeness, the attained minimum can be computed explicitly. Recall that $\| u v^{T}\|_{F} = \|u\|_{2} \|v\|_{2}$, $\|Z\|^2_{F} = \operatorname{tr} (Z Z^{T})$ and $\operatorname{tr}(ZY) = \operatorname{tr} (YZ)$. Since $A = P\hat{B}_{+}P$ and $P^{2} = P$, $$\|A\|^2_{F} = \operatorname{tr}(P\hat{B}_{+}P\,P\hat{B}_{+}P) = \operatorname{tr}(P\hat{B}_{+}P\hat{B}_{+}) = \operatorname{tr} ( A \hat{B}_{+}) = \|\hat{B}_{+}\|^2_{F} - 2\,\|\hat{B}_{+}u\|^2_{2} + (u^{T}\hat{B}_{+}u)^2,$$ where the last equality follows by expanding $P = I - uu^{T}$. And finally, using $Au = 0$, $$\begin{align} & \|A + uu^{T}-\hat{B}_{+}\|^2_{F} \\ = {} & \| A + uu^{T}\|^2_{F} + \|\hat{B}_{+} \|^2_{F} - 2 \operatorname{tr} \big( (A + uu^{T}) \hat{B}_{+}\big) \\ = {} & \| A \|^2_{F} + \|uu^{T}\|^2_{F} + 2 \operatorname{tr} (A\, uu^{T} ) + \|\hat{B}_{+} \|^2_{F} - 2 \operatorname{tr} (uu^{T} \hat{B}_{+}) - 2 \operatorname{tr} (A \hat{B}_{+}) \\ = {} & \| A \|^2_{F} + 1 + \|\hat{B}_{+} \|^2_{F} - 2\, u^{T} \hat{B}_{+} u - 2 \|A\|^2_{F} \\ = {} & - \| A \|^2_{F} + 1 + \|\hat{B}_{+} \|^2_{F} - 2\, u^{T} \hat{B}_{+} u \\ = {} & 2\,\|\hat{B}_{+}u\|^2_{2} - (u^{T}\hat{B}_{+}u)^2 + 1 - 2\, u^{T} \hat{B}_{+} u \\ = {} & \|\hat{B}_{+}u - u\|^2_{2} + \|P\hat{B}_{+}u\|^2_{2}, \end{align}$$ in agreement with the lower bound above, because $\|\hat{B}_{+}u - u\|^2_{2} = (1 - u^{T}\hat{B}_{+}u)^2 + \|P\hat{B}_{+}u\|^2_{2}$.
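The algebraic properties of $A$ used here can be confirmed numerically. The sketch below (pure Python, variable names mine) checks that $A$ is symmetric, annihilates $u = s_{+}/\|s_{+}\|_{2}$, and satisfies $\|A\|_F^2 = \operatorname{tr}(A\hat B_+) = \|\hat B_+\|_F^2 - 2\|\hat B_+u\|_2^2 + (u^T\hat B_+u)^2$:

```python
import random

# Numeric check of the properties of A = (I - uu^T) Bp (I - uu^T),
# where Bp stands for the symmetric matrix B_hat_+ and u is a unit vector.
random.seed(2)
n = 5

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def sub(A, B):
    return [[A[i][j] - B[i][j] for j in range(n)] for i in range(n)]

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

def fnorm2(A):
    return sum(A[i][j] ** 2 for i in range(n) for j in range(n))

def trace(A):
    return sum(A[i][i] for i in range(n))

v = [random.gauss(0, 1) for _ in range(n)]
nv = sum(t * t for t in v) ** 0.5
u = [t / nv for t in v]                                   # u = s_+/||s_+||
I = [[float(i == j) for j in range(n)] for i in range(n)]
P = sub(I, [[u[i] * u[j] for j in range(n)] for i in range(n)])

M = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
Bp = [[M[i][j] + M[j][i] for j in range(n)] for i in range(n)]   # symmetric B_hat_+
A = mul(mul(P, Bp), P)

assert max(abs(A[i][j] - A[j][i]) for i in range(n) for j in range(n)) < 1e-9
assert max(abs(t) for t in matvec(A, u)) < 1e-9                  # A u = 0

Bpu = matvec(Bp, u)
beta = sum(u[i] * Bpu[i] for i in range(n))                      # u^T Bp u
val = fnorm2(A)
assert abs(val - trace(mul(A, Bp))) < 1e-9                       # ||A||_F^2 = tr(A Bp)
assert abs(val - (fnorm2(Bp) - 2 * sum(t * t for t in Bpu) + beta ** 2)) < 1e-9
```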

P.S.: With a similar technique, one can show that, for a given symmetric matrix $A$, the solution of the problem $$\text{minimize}_{B \in \mathbb{R}^{n \times n}}\quad \|B - A\|^2_{F} \quad \text{subject to } \quad B s = 0 \text{ and } B^{T} = B $$ is $$ B = \Big(I - \dfrac{1}{\|s\|^2_{2}}s s^{T}\Big) A \Big(I - \dfrac{1}{\|s\|^2_{2}}s s^{T}\Big).$$
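The P.S. can likewise be checked numerically (assuming $A$ symmetric; names are mine). With $Q = I - ss^T/\|s\|_2^2$, the matrix $B = QAQ$ is symmetric, satisfies $Bs = 0$, and feasible perturbations $B + QMQ$ only increase $\|B - A\|_F$:

```python
import random

# Numeric check of the P.S.: B = Q A Q is feasible and not beaten by
# random feasible perturbations, for symmetric A.
random.seed(3)
n = 5

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(n)] for i in range(n)]

def sub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(n)] for i in range(n)]

def fnorm2(X):
    return sum(X[i][j] ** 2 for i in range(n) for j in range(n))

s = [random.gauss(0, 1) for _ in range(n)]
ns2 = sum(t * t for t in s)
I = [[float(i == j) for j in range(n)] for i in range(n)]
Q = sub(I, [[s[i] * s[j] / ns2 for j in range(n)] for i in range(n)])  # Q s = 0

M0 = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
A = [[M0[i][j] + M0[j][i] for j in range(n)] for i in range(n)]        # symmetric A
B = mul(mul(Q, A), Q)                                                  # claimed solution

Bs = [sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
assert max(abs(t) for t in Bs) < 1e-9                                  # B s = 0
assert max(abs(B[i][j] - B[j][i]) for i in range(n) for j in range(n)) < 1e-9

base = fnorm2(sub(B, A))
for _ in range(20):
    M = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    Ms = [[M[i][j] + M[j][i] for j in range(n)] for i in range(n)]
    D = mul(mul(Q, Ms), Q)            # symmetric with D s = 0, so B + D is feasible
    assert fnorm2(sub(add(B, D), A)) >= base - 1e-9
```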