
Consider the problem

$$\min_{x\in\mathbb{R}^n} f(x)$$ with \begin{equation} f(x) = \frac12 x^T Q x - c^T x + \frac{1}{2\mu}(a^T x - \beta)^2 , \end{equation} in which $c$ and $a$ are vectors in $\mathbb{R}^n$, $Q$ is an $n\times n$ symmetric positive-definite matrix, $\beta \in \mathbb{R}$, and $\mu > 0$.

Find the gradient of this function, $\nabla f(x)$. Then find $\phi(t)=f(x_0-t\cdot\nabla f(x_0) )$ and $\phi'(t)$.
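A worked sketch of the gradient, assuming the objective exactly as written above: since $Q$ is symmetric, $\nabla\big(\tfrac12 x^TQx\big)=Qx$; also $\nabla(c^Tx)=c$, and by the chain rule $\nabla\big(\tfrac{1}{2\mu}(a^Tx-\beta)^2\big)=\tfrac{1}{\mu}(a^Tx-\beta)\,a$. Because $f:\mathbb{R}^n\to\mathbb{R}$ is scalar-valued, the result is an $n$-vector:

$$\nabla f(x) = Qx - c + \frac{1}{\mu}\left(a^Tx-\beta\right)a = \Big(Q+\frac{1}{\mu}aa^T\Big)x - \Big(c+\frac{\beta}{\mu}a\Big).$$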

Then I wrote the gradient as \begin{equation} -\nabla f(x)=-\begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_1}\\ \vdots & \ddots & \vdots \\ \frac{\partial f_1}{\partial x_n} & \cdots & \frac{\partial f_n}{\partial x_n} \end{pmatrix} \end{equation}
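To make the shape question concrete, here is a small NumPy sketch on a random instance (the data and the helper names `f`, `grad_f`, `numerical_grad` are hypothetical, chosen only for illustration). It compares the analytic gradient $Qx-c+\frac{1}{\mu}(a^Tx-\beta)a$ with central finite differences. Since $f$ is scalar-valued, the gradient comes out as a vector of length $n$, not an $n\times n$ matrix.

```python
import numpy as np

# Hypothetical random instance (not from the question); Q is made symmetric positive definite
rng = np.random.default_rng(0)
n, mu, beta = 4, 0.5, 0.3
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)
c = rng.standard_normal(n)
a = rng.standard_normal(n)

def f(x):
    # scalar objective: 1/2 x^T Q x - c^T x + (a^T x - beta)^2 / (2 mu)
    return 0.5 * x @ Q @ x - c @ x + (a @ x - beta) ** 2 / (2 * mu)

def grad_f(x):
    # analytic gradient (uses Q = Q^T): an n-vector, not an n x n matrix
    return Q @ x - c + (a @ x - beta) / mu * a

def numerical_grad(fun, x, h=1e-6):
    # central differences along each coordinate direction e_i
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (fun(x + e) - fun(x - e)) / (2 * h)
    return g

x0 = np.ones(n)
g_exact = grad_f(x0)
g_num = numerical_grad(f, x0)
print(g_exact.shape)  # (4,)
print(np.allclose(g_exact, g_num, atol=1e-5))
```

Central differences are exact for a quadratic up to rounding error, so the two gradients should agree to several digits.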

I have the given expression, \begin{equation} \phi(t)= \frac{1}{2}\bigg(x_0 - t\begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_1}\\ \vdots & \ddots & \vdots \\ \frac{\partial f_1}{\partial x_n} & \cdots & \frac{\partial f_n}{\partial x_n} \end{pmatrix} \bigg)^TQ\bigg(x_0 - t\begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_1}\\ \vdots & \ddots & \vdots \\ \frac{\partial f_1}{\partial x_n} & \cdots & \frac{\partial f_n}{\partial x_n} \end{pmatrix} \bigg)-c^T+\frac{1}{2\mu}(a^T\begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_1}\\ \vdots & \ddots & \vdots \\ \frac{\partial f_1}{\partial x_n} & \cdots & \frac{\partial f_n}{\partial x_n} \end{pmatrix}-\beta)^2 \end{equation}

where $a$ and $c$ are $n$-dimensional vectors, $Q$ is an $n\times n$ matrix and $\beta$ is a real parameter.

The variable is $t$. How do I find $\phi'(t)$?

I thought of the matrices and vectors as constants, since they contain no $t$, and therefore used the usual rule $f(x)=ax+b \to f'(x)=a$.

Is that correct, or am I misinterpreting the structure of the function $f$?
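A sketch of why the linear rule does not apply, with shorthand introduced here only for brevity: write $A=Q+\frac{1}{\mu}aa^T$ and $b=c+\frac{\beta}{\mu}a$, so that $f(x)=\frac12x^TAx-b^Tx+\frac{\beta^2}{2\mu}$, and let $g=\nabla f(x_0)=Ax_0-b$. The vector $g$ is fixed (it does not depend on $t$), but $t$ enters the quadratic term twice, so $\phi$ is a quadratic in $t$:

$$\begin{aligned}
\phi(t) &= f(x_0-tg) = f(x_0) - t\,g^Tg + \frac{t^2}{2}\,g^TAg,\\
\phi'(t) &= -g^Tg + t\,g^TAg = -\nabla f(x_0-tg)^T g,
\end{aligned}$$

where $g^TAg = g^TQg + \frac{1}{\mu}(a^Tg)^2$. The last expression for $\phi'(t)$ is just the chain rule applied to $t\mapsto x_0-tg$.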

Thanks

Superunknown
  • How are you defining $(a^T-\beta)$, or any of the other dimensionally incorrect expressions in $\phi(t)$ ? – Ninad Munshi Jul 02 '24 at 09:30
  • as a constant I would think! – Superunknown Jul 02 '24 at 09:31
  • I don't think you understand my question. How are you defining a vector minus a constant? Or in the case of $x_0-tD_f$, how are you defining a scalar minus a matrix? – Ninad Munshi Jul 02 '24 at 09:32
  • Good question, I didn't think about that. I would define that as $a^T-\beta I$, with I the identity matrix – Superunknown Jul 02 '24 at 09:33
  • First of all, that still is not defined. You've just changed the problem to be what does it mean to have a vector minus a matrix. Second, how do you know what the correct interpretation is for your use case? You haven't defined any context or motivation for your problem so it is hard to offer any useful information. You're putting the cart before the horse. You have to offer a meaningful expression in order to make the choice of which of several derivative conventions and methodologies is appropriate for $\phi$. – Ninad Munshi Jul 02 '24 at 09:35
  • Let me update the question. Note a typo in the function, there is an x after $a^T$ – Superunknown Jul 02 '24 at 09:37
  • It looks like $f:\mathbb{R}^n\rightarrow \mathbb{R}$. The gradient is the vector $\nabla f(x)=(\frac{\partial f}{\partial x_1},\ldots, \frac{\partial f}{\partial x_n})^T$. – user408858 Jul 02 '24 at 09:55
  • Note that $f$ is a scalar function, thus $\nabla f(x)=\begin{pmatrix}\frac{\partial f(x)}{\partial x_1}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n}\end{pmatrix}$ – Peter Melech Jul 02 '24 at 09:58
  • Are you sure that $\phi(t)=f(x_0-t\nabla f)$? Shouldn't it be $\phi(t)=f(x_0-t\nabla f(x_0))$ or something similar? – user408858 Jul 02 '24 at 10:04
  • @user408858 you are right. It should be as you write. Let me correct – Superunknown Jul 02 '24 at 10:07
  • @PeterMelech then, with your correction, would the derivative of $\phi(t)$ simply consider $\nabla f(x)$ as a constant? – Superunknown Jul 02 '24 at 10:13
  • A constant vector. You pick a (constant) vector $x_0\in\mathbb{R}^n$. Then also $\nabla f(x_0)\in\mathbb{R}^n$ is a (constant) vector. Only $t$ is a variable in the function $\phi:\mathbb{R}\rightarrow\mathbb{R}$, $\phi(t)=f(x_0-t\nabla f(x_0))$. – user408858 Jul 02 '24 at 10:21
  • @user408858 then the differentiation of $\phi(t)$ follows the rule of $y=ax+b \to y'=a$? – Superunknown Jul 02 '24 at 10:23
  • No, there is a $t^2$ involved. – user408858 Jul 02 '24 at 10:24
  • You're right, my bad. – Superunknown Jul 02 '24 at 10:25
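Following the comments, a numerical sketch (hypothetical random instance; the names `phi`, `phi_quadratic`, `dphi`, `t_star` are illustrative only) that holds $g=\nabla f(x_0)$ fixed and checks that $\phi(t)=f(x_0)-t\,g^Tg+\frac{t^2}{2}g^T\big(Q+\frac{1}{\mu}aa^T\big)g$, so that $\phi'(t)=-g^Tg+t\,g^T\big(Q+\frac{1}{\mu}aa^T\big)g$ is linear in $t$ rather than constant.

```python
import numpy as np

# Hypothetical random instance; Q is made symmetric positive definite
rng = np.random.default_rng(1)
n, mu, beta = 5, 0.25, 0.7
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)
c, a = rng.standard_normal(n), rng.standard_normal(n)

def f(x):
    return 0.5 * x @ Q @ x - c @ x + (a @ x - beta) ** 2 / (2 * mu)

def grad_f(x):
    return Q @ x - c + (a @ x - beta) / mu * a

x0 = rng.standard_normal(n)
g = grad_f(x0)                          # fixed vector: does not depend on t
gAg = g @ Q @ g + (a @ g) ** 2 / mu     # g^T (Q + a a^T / mu) g

def phi(t):
    return f(x0 - t * g)

def phi_quadratic(t):
    # closed form: f(x0) - t g^T g + (t^2 / 2) g^T A g
    return f(x0) - t * (g @ g) + 0.5 * t**2 * gAg

def dphi(t):
    # phi'(t) = -g^T g + t g^T A g, linear in t because phi is quadratic
    return -(g @ g) + t * gAg

ts = np.linspace(-1.0, 1.0, 11)
h = 1e-5
same_phi = all(np.isclose(phi(t), phi_quadratic(t)) for t in ts)
same_dphi = all(
    np.isclose((phi(t + h) - phi(t - h)) / (2 * h), dphi(t), atol=1e-6) for t in ts
)
t_star = (g @ g) / gAg                  # root of phi'(t) = 0
print(same_phi, same_dphi, abs(dphi(t_star)) < 1e-8)
```

Because $Q$ is positive definite and $\mu>0$, $g^T\big(Q+\frac{1}{\mu}aa^T\big)g>0$ whenever $g\neq 0$, so $\phi$ is a strictly convex parabola and $t^*=g^Tg\,/\,g^T\big(Q+\frac{1}{\mu}aa^T\big)g$ is its minimizer (the exact line-search step along $-\nabla f(x_0)$).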

0 Answers