On page 411 in this book, the authors give the following BFGS formula $$ \boxed{\boldsymbol C_{\textrm{BFGS}} = \boldsymbol C + \underbrace{\frac{\boldsymbol g^\top\boldsymbol\delta+\boldsymbol g^\top\boldsymbol C\boldsymbol g}{(\boldsymbol g^\top\boldsymbol\delta)^2}\boldsymbol\delta\boldsymbol\delta^\top-\frac{1}{\boldsymbol g^\top\boldsymbol\delta}\left(\boldsymbol\delta\boldsymbol g^\top\boldsymbol C + (\boldsymbol\delta\boldsymbol g^\top\boldsymbol C)^\top\right)}_{\textrm{BFGS update}}}\tag{1} $$ by considering the constrained optimisation problem: $$\begin{array}{rlcll} \min_{\boldsymbol A} & \mathcal D(\boldsymbol 0, \boldsymbol C \mid \boldsymbol 0, \boldsymbol A) \\ \textrm{subject to}: & \boldsymbol A\boldsymbol g = \boldsymbol\delta, \\ & \boldsymbol A ~~= \boldsymbol A^\top, \end{array}\tag{2}$$ where $$ \mathcal D(\boldsymbol 0, \boldsymbol C \mid \boldsymbol 0, \boldsymbol A) := \frac12\left(\mathrm{tr}\left(\boldsymbol A^{-1}\boldsymbol C\right) - \log\left(\det(\boldsymbol A^{-1}\boldsymbol C)\right) - n\right)$$ is the Kullback-Leibler divergence between the normal distributions $\mathcal N(\boldsymbol 0, \boldsymbol C)$ and $\mathcal N(\boldsymbol 0, \boldsymbol A)$.
Using Lagrange multipliers with the Lagrangian $$ \mathcal L(\boldsymbol A, \boldsymbol \beta) := \mathcal D(\boldsymbol 0, \boldsymbol C \mid \boldsymbol 0, \boldsymbol A) + \boldsymbol \beta^\top(\boldsymbol A\boldsymbol g - \boldsymbol \delta)$$ gives (also using symmetry of $\boldsymbol A$ and $\boldsymbol C$) $$ \frac{\partial}{\partial\boldsymbol A}\mathcal L(\boldsymbol A, \boldsymbol \beta) = \frac12\left(-\boldsymbol A^{-1}\boldsymbol C\boldsymbol A^{-1} + \boldsymbol A^{-1}\right) + \boldsymbol \beta\boldsymbol g^\top, \quad \boldsymbol{\nabla}_{\boldsymbol \beta}\mathcal L(\boldsymbol A, \boldsymbol \beta) = \boldsymbol A\boldsymbol g - \boldsymbol \delta,$$ and equating these both equal to zero gives $\boldsymbol A\boldsymbol g = \boldsymbol \delta$ and $\boldsymbol A^{-1}(\boldsymbol I - \boldsymbol C\boldsymbol A^{-1}) = -2\boldsymbol \beta \boldsymbol g^\top$. It is not obvious to me how this last expression could be used to identify the corresponding values of $\boldsymbol \beta$ and $\boldsymbol A$. I tried multiplying through by $\boldsymbol \delta$ to use the $\boldsymbol A\boldsymbol g = \boldsymbol \delta$ constraint, yielding $\boldsymbol{\delta}\boldsymbol \beta^\top\boldsymbol \delta = \frac12\left(\boldsymbol C\boldsymbol g - \boldsymbol \delta\right)$, but it is still unclear from this (1) how $\boldsymbol \beta$ is specified and (2) how $\boldsymbol{A}$ would be recovered.
What is the correct way to solve the constrained optimisation problem (2) to obtain the BFGS formula (1) with the method of Lagrange multipliers that gives values for $\boldsymbol A$ and $\boldsymbol \beta$?
This other question about DFP goes in the same direction: https://math.stackexchange.com/questions/2091867/quasi-newton-methods-understanding-dfp-updating-formula
– Antonio Horta Ribeiro Jul 08 '21 at 13:24