
Consider the unconstrained optimization problem $$\underset{\mathbb{R}^n}{\text{min}} f(x) $$ with $$ f (x) = \frac12 x^T Qx - c^T x + \frac{1}{2\mu}(a^T x - \beta)^2 , $$

in which $c$ and $a$ are $n$-vectors, $Q$ is an $n\times n$ symmetric positive-definite matrix, $\beta \in \mathbb{R}$, and $\mu > 0$.

The steepest-descent direction for $f$ at a point $x$ can be written $p_{sd} = -Kx + d$.
a) Determine an expression for the matrix $K$ and the vector $d$ in terms of the objects defined above.

The steepest-descent direction is found by forming $$\phi(t)=f(x_0+td),$$ where $$d=\nabla f(x_0)$$ and $x_0$ is some starting point for the minimization. Let us consider $$f(x)=\frac12x^TQx-c^Tx+\frac{1}{2\mu}(a^Tx-\beta)^2$$ Now put $x=x_0+td$ and obtain:

$$f(x_0+td)=\frac12(x_0+td)^TQ(x_0+td)-c^Tx_0+\frac{1}{2\mu}(a^Tx_0-\beta)^2$$

Since $$\phi(t)=f(x_0+td),$$ then we have

\begin{equation} \phi(t)= \frac12(x_0+td)^TQ(x_0+td)-c^Tx_0+\frac{1}{2\mu}(a^Tx_0-\beta)^2 \end{equation} The descent direction is a vector, and can be found by differentiating the latter with respect to $t$. Hence, we have

\begin{equation} \phi'(t)= \frac12(x_0+d)^TQ(x_0+d)-c^Tx_0+\frac{1}{2\mu}(a^Tx_0-\beta)^2 \end{equation}

Since the descent direction is given by $p_{sd} = -Kx + d$, then we can solve for $K$ by equating this to the latter, and obtain:

\begin{equation} -Kx + d= \frac12(x_0+d)^TQ(x_0+d)-c^Tx_0+\frac{1}{2\mu}(a^T-\beta)^2 \end{equation}

Solving for $K$, we obtain \begin{equation} K= \frac1x\bigg(c^Tx_0+d-\frac12(x_0+d)^TQ(x_0+d)-\frac{1}{2\mu}(a^T-\beta)^2\bigg) \end{equation}

However, I am not sure I differentiated that $\phi(t)$ expression correctly, given that the variables involve matrices and vectors. Isn't $t$ still an $n$-vector variable that must be differentiated accordingly?

UPDATE:

Now we expand the expression $$\phi(t)=f(x_0+td)=\frac12(x_0+td)^TQ(x_0+td)-c^Tx_0+\frac{1}{2\mu}(a^Tx_0-\beta)^2$$, set $d=\nabla f(x_0)$ and assume linearity $f(a+b)=f(a)+f(b)$: \begin{equation} \begin{split} &\phi(t)= \frac{1}{2}\bigg(x_0 - t\nabla f(x_0) \bigg)^T\bigg(Qx_0 - Q\nabla f(x_0)t \bigg)-c^Tx_0\\&+\frac{1}{2\mu}\bigg(a^T\nabla f(x_0)-\beta\bigg)\bigg(a^T\nabla f(x_0)-\beta\bigg) \end{split} \end{equation} which gives \begin{equation} \begin{split} &\phi(t)= \frac{1}{2}\bigg(x_0Q\nabla f(x_0)+x_0Qx_0-Qx_0\nabla f(x_0) t+Q\nabla^2f(x_0)t^2\bigg) -c^Tx_0\\&+\frac{1}{2\mu}\bigg((a^T)^2 \nabla^2f(x_0)-2\beta a^T\nabla f(x_0)+\beta^2\bigg) \end{split} \end{equation} Now we differentiate with respect to $t$ and obtain: \begin{equation} \begin{split} &\phi'(t)= \frac{1}{2}\bigg(2Q\nabla^2f(x_0)t-Qx_0\nabla f(x_0)\bigg) \end{split} \end{equation} To represent $K$, we have:

$$p_{sd} = -Kx + d=\phi'(t),$$ hence, \begin{equation} -Kx + d=\frac{1}{2}\bigg(2Q\nabla^2f(x_0)t-Qx_0\nabla f(x_0)\bigg) \end{equation} which gives the following:

\begin{equation} K=\frac{1}{2x}\bigg(Qx_0\nabla f(x_0)-2Q\nabla^2f(x_0)t+2d\bigg) \end{equation} Assuming $d=-\nabla f(x_0)$, and putting $x=x_0$ we obtain: \begin{equation} K=\frac{1}{2x_0}\bigg(\nabla f(x_0)\big(Qx_0-2)-2Q\nabla^2f(x_0)t\bigg) \end{equation}

Superunknown
  • Aren't you worried that, substituting your $K$ into the equation of $p_{sd}$, the latter becomes independent of $x$? That means, that your quadratic form is sort of a line. :) – Egor Larionov Jul 02 '24 at 08:34
  • In your solution I am puzzled to notice your $f(x)$ lacks the terms linear in $x$. Why did you omit them? And since you know the form of the vector, wouldn't it be easier to just find the gradient vector and consider the matrix in front of the linear term as $K$? – Egor Larionov Jul 02 '24 at 08:37
  • You mean to find $\nabla f(x_0)=d$? What does that look like as an $n\times n$ matrix? – Superunknown Jul 02 '24 at 08:38
  • You don't equate anything. You just find the gradient of the quadratic function and gather all the linear terms together. In the same way that $Q$, being an $n \times n$ matrix, maps a vector $x$ to a vector, $\nabla f(x)$ may take the form $\nabla f(x) = Kx+c$. – Egor Larionov Jul 02 '24 at 08:42
  • @EgorLarionov OK, what I am wondering about is what does the gradient look like? Would it just look like this? https://math.stackexchange.com/questions/156880/what-does-it-mean-to-take-the-gradient-of-a-vector-field , that is, in a generic form? – Superunknown Jul 02 '24 at 08:55
  • The expression for $f(x)$ just after the words 'Let us consider' differs from the one given in the problem statement: there is $c^T$ there instead of $c^T \color{red}x$ and $(a^T - \beta)$ instead of $(a^T \color{red}x - \beta)$. Is this modification intentional? – CiaPan Jul 02 '24 at 11:00
  • @CiaPan thanks for that. Corrected. – Superunknown Jul 02 '24 at 11:11
  • It's not that form that you need: you don't have a vector field to start with, your function is scalar. Just find the gradient and equate the coefficient of the linear term to $K$. – Egor Larionov Jul 02 '24 at 11:16
  • Hi, you seem to have added $x$ to the $a^T$ factor inside the brackets, but the $c^T$ in the middle term still seems to lack its $x$... – CiaPan Jul 02 '24 at 11:17

1 Answer


Once more, I urge you to just calculate the gradient and rearrange the terms. Given a function $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^TQ\mathbf{x}-\mathbf{c}^T\mathbf{x}+\frac{1}{2\mu}{(\mathbf{a}^T\mathbf{x}-\beta)}^{2}$, its gradient reads: $$ \nabla f(\mathbf{x}) = \frac{\partial}{\partial \mathbf{x}}f(x_1, x_2, \dots, x_n) = \frac{1}{2}(Q+Q^T)\mathbf{x}-\mathbf{c}+\frac{1}{\mu}(\mathbf{a}^T\mathbf{x}-\beta)\mathbf{a} = Q\mathbf{x}+\frac{\mathbf{a}\mathbf{a}^T}{\mu}\mathbf{x} -\mathbf{c}-\frac{\beta}{\mu}\mathbf{a} $$ The last equality follows from the fact that $Q=Q^T$. The gradient is already the direction of steepest ascent. To obtain the direction of steepest descent, we reverse the sign.
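As a sanity check, the gradient formula can be compared against central finite differences. The following is only a minimal sketch: the random test data, the helper names `f`, `grad_f`, and `fd_grad`, and the step size `h` are assumptions made for illustration and are not part of the problem.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Hypothetical test data: Q symmetric positive definite, mu > 0.
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)
c = rng.standard_normal(n)
a = rng.standard_normal(n)
beta = 0.7
mu = 0.5


def f(x):
    """Objective from the problem statement."""
    return 0.5 * x @ Q @ x - c @ x + (a @ x - beta) ** 2 / (2 * mu)


def grad_f(x):
    """Closed-form gradient: Q x - c + (a^T x - beta) a / mu."""
    return Q @ x - c + (a @ x - beta) / mu * a


def fd_grad(x, h=1e-6):
    """Central finite-difference approximation of the gradient."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g


x = rng.standard_normal(n)
max_err = np.max(np.abs(grad_f(x) - fd_grad(x)))
print(max_err)  # should be small; only rounding error remains for a quadratic f
```

Because $f$ is quadratic, central differences have no truncation error, so any remaining discrepancy comes from floating-point rounding.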

Finally, we see that the gradient is affine in $\mathbf{x}$: a linear term in $\mathbf{x}$ plus a constant vector. Hence, we can bring its negative into the form $-K\mathbf{x}+\mathbf{d}$ by collecting the terms in the equation above:

$$ -\nabla f(\mathbf{x}) = -Q\mathbf{x}-\frac{\mathbf{a}\mathbf{a}^T}{\mu}\mathbf{x}+\mathbf{c}+\frac{\beta}{\mu}\mathbf{a} = -\underbrace{\left(Q+\frac{\mathbf{a}\mathbf{a}^T}{\mu}\right)}_{K} \mathbf{x}+\underbrace{\mathbf{c}+\frac{\beta}{\mu}\mathbf{a}}_{\mathbf{d}} $$
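The identification of $K$ and $\mathbf{d}$ can also be checked numerically. The sketch below, again on made-up random data (an assumption for illustration), builds $K = Q + \mathbf{a}\mathbf{a}^T/\mu$ and $\mathbf{d} = \mathbf{c} + (\beta/\mu)\mathbf{a}$, confirms that $-K\mathbf{x}+\mathbf{d}$ agrees with the negative gradient, and takes one steepest-descent step. The step length $t = p^Tp / (p^TKp)$ is the exact line-search value for a quadratic with Hessian $K$; it is added here for illustration and is not part of the original question.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Hypothetical test data: Q symmetric positive definite, mu > 0.
B = rng.standard_normal((n, n))
Q = B @ B.T + n * np.eye(n)
c = rng.standard_normal(n)
a = rng.standard_normal(n)
beta, mu = 1.3, 0.25

# K and d as identified in the answer.
K = Q + np.outer(a, a) / mu
d = c + (beta / mu) * a


def f(x):
    """Objective from the problem statement."""
    return 0.5 * x @ Q @ x - c @ x + (a @ x - beta) ** 2 / (2 * mu)


x = rng.standard_normal(n)

# Negative gradient computed term by term, without using K or d.
neg_grad = -(Q @ x - c + (a @ x - beta) / mu * a)
p_sd = -K @ x + d

matches = bool(np.allclose(p_sd, neg_grad))

# K is Q plus a positive semidefinite rank-one term, so it stays SPD.
K_is_spd = bool(np.allclose(K, K.T) and np.all(np.linalg.eigvalsh(K) > 0))

# One steepest-descent step with exact line search (illustrative addition).
t = (p_sd @ p_sd) / (p_sd @ K @ p_sd)
decreased = bool(f(x + t * p_sd) < f(x))

print(matches, K_is_spd, decreased)
```

Note that $K$ is also the Hessian of $f$, which is why it appears in the step length.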

  • Thanks Egor Larionov! – Superunknown Jul 02 '24 at 13:46
  • It still beats me how you got that answer... As you can see in this post, no one got the result you proposed, and no one proposed any method in that direction. Seeing you are from the Russian school of mathematics, that does not surprise me, you have the best mathematicians in the world there! Cheers – Superunknown Jul 09 '24 at 09:40