I understand that in projected gradient descent, I need to project my (intermediate) solution onto the feasible set at the shortest distance, as described in *What is the difference between projected gradient descent and ordinary gradient descent?*:
$$ \min_x f(x) \text{ subject to } x \in C $$
$$ y_{k+1} = x_k - t_k \nabla f(x_k) \\ x_{k+1} = \arg\min_{x \in C} \|y_{k+1}-x\| $$
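To make the two-step iteration concrete, here is a minimal sketch on a toy problem of my own choosing (not the one below): minimize $\|x - c\|^2$ over the box $[0,1]^n$, where projection is just componentwise clipping.

```python
import numpy as np

# Toy example (assumed, for illustration): minimize f(x) = ||x - c||^2
# subject to 0 <= x <= 1, via projected gradient descent.
c = np.array([2.0, -1.0, 0.5])

def grad(x):
    return 2.0 * (x - c)          # gradient of ||x - c||^2

def project(y):
    return np.clip(y, 0.0, 1.0)  # Euclidean projection onto the box [0, 1]^n

x = np.zeros(3)
for _ in range(100):
    y = x - 0.1 * grad(x)  # gradient step: y_{k+1} = x_k - t_k grad f(x_k)
    x = project(y)         # projection step: x_{k+1} = argmin_{x in C} ||y_{k+1} - x||

# x is now approximately [1.0, 0.0, 0.5], i.e. c clipped to the box
```

The only problem-specific ingredient is `project`; everything else is ordinary gradient descent.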
So in my specific case, the optimization problem is to minimize $f(X) = \operatorname{tr}\big((I-X)A(I-X)^T + XBX^T\big)$ subject to $l \leq Xt \leq u$, where $A, B$ are positive semidefinite matrices, $X$ is an $n \times m$ matrix, $l, u$ are known $n \times 1$ vectors, and $t$ is a known $m \times 1$ vector.
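For the gradient step, assuming $n = m$ (so that $I - X$ is square) and $A, B$ symmetric, standard matrix-calculus identities ($\nabla_X \operatorname{tr}(XAX^T) = 2XA$ for symmetric $A$, applied with the chain rule to $I - X$) give

$$ \nabla f(X) = -2(I-X)A + 2XB. $$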
How do I project a given $X$ to above feasible set?
Basically, in the projection step I want to solve something like $\min_X \|X - Y\|^2$ s.t. $l \leq Xt \leq u$, but this seems a bit messy: I don't know how exactly to define $\|X - Y\|^2$ when $X, Y$ are matrices (the Frobenius norm, presumably?).
[EDIT] If it's a problem with $x, y, a$ as vectors and $b$ a scalar, such as $\min_{x} \| x - y \|^{2}$ subject to $a^{T} x \leq b$, then I can find
$$ x = \begin{cases} y & \text{ if } \; {a}^{T} y \leq b \\ y - \frac{{a}^{T} y - b}{ {\left\| a \right\|}_{2}^{2} } a & \text{ if } \; {a}^{T} y > b \end{cases} $$
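If the Frobenius norm is the right choice, that halfspace formula seems to extend directly: $\|X - Y\|_F^2$ decomposes over rows, and the constraint $l_i \leq x_i^T t \leq u_i$ involves only row $i$, so each row can be projected onto its slab independently (project onto the hyperplane $t^T x = u_i$ if the upper bound is violated, onto $t^T x = l_i$ if the lower one is). A sketch under that assumption (function name hypothetical):

```python
import numpy as np

def project_rows(Y, t, l, u):
    """Project Y onto {X : l <= X t <= u} in Frobenius norm,
    row by row, using the closed-form slab projection."""
    X = Y.copy()
    tt = t @ t                  # ||t||_2^2
    s = Y @ t                   # current values of X t, one entry per row
    over = s > u                # rows violating the upper bound
    under = s < l               # rows violating the lower bound
    # Move each violating row to the nearest bounding hyperplane
    X[over] -= np.outer((s[over] - u[over]) / tt, t)
    X[under] -= np.outer((s[under] - l[under]) / tt, t)
    return X
```

Rows already satisfying $l_i \leq x_i^T t \leq u_i$ are left untouched, matching the first case of the formula above.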