
Inspired by this question, I wondered whether it is possible to fully parameterize the inverse optimal control problem. That is, given a stabilizing state feedback policy

$$ u(t) = -K\,x(t), \tag{1} $$

for a linear time-invariant state-space system

$$ \dot{x}(t) = A\,x(t) + B\,u(t), \tag{2} $$

with $A - B\,K$ Hurwitz, find all possible combinations of $Q$, $R$ and $N$ such that $(1)$ minimizes

$$ J(u(t)) = \int_0^\infty x^\top(t)\,Q\,x(t) + 2\,x^\top(t)\,N\,u(t) + u^\top(t)\,R\,u(t)\,dt. \tag{3} $$

Normally $(3)$ can be minimized by solving the continuous time algebraic Riccati equation

$$ A^\top P + P\,A - (P\,B + N)\,R^{-1} (B^\top P + N^\top) + Q = 0, \tag{4} $$

with

$$ K = R^{-1} (B^\top P + N^\top). \tag{5} $$


At first I thought I could simply choose values for $R = R^\top \succ 0$ and $P = P^\top \succeq 0$. Substituting these and the given $K$ into $(4)$ and $(5)$ gives the remaining matrices as

\begin{align} N &= K^\top R - P\,B, \tag{6} \\ Q &= K^\top R\,K - A^\top P - P\,A. \tag{7} \end{align}
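As a quick numerical sanity check of $(6)$ and $(7)$, here is a minimal NumPy sketch. The double-integrator data, the gain and the two choices of $P$ are hypothetical values picked for illustration, and `lqr_weights_from_policy` is a helper name introduced here, not something from the question.

```python
import numpy as np

def lqr_weights_from_policy(A, B, K, R, P):
    """Return (N, Q, S) from equations (6), (7) and S = Q - N R^{-1} N^T."""
    N = K.T @ R - P @ B
    Q = K.T @ R @ K - A.T @ P - P @ A
    S = Q - N @ np.linalg.solve(R, N.T)
    return N, Q, S

# Hypothetical example data (not from the question): a double integrator.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[1.0, 2.0]])  # A - B K has a double eigenvalue at -1
R = np.array([[1.0]])

for P in (np.array([[2.0, 1.0], [1.0, 2.0]]), 5.0 * np.eye(2)):
    N, Q, S = lqr_weights_from_policy(A, B, K, R, P)
    # Gain recovered from (5) and residual of the Riccati equation (4).
    gain = np.linalg.solve(R, B.T @ P + N.T)
    residual = A.T @ P + P @ A - (P @ B + N) @ gain + Q
    print("ARE residual norm:", np.linalg.norm(residual))
    print("gain matches K:", np.allclose(gain, K))
    print("smallest eigenvalue of S:", np.linalg.eigvalsh(S).min())
```

Both choices of $P$ reproduce $K$ and satisfy $(4)$ by construction, but only the first gives a positive semi-definite $Q - N\,R^{-1} N^\top$. For $P = 5\,I$ that matrix has a negative eigenvalue, so an arbitrary $P \succeq 0$ is not enough on its own.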

However, for the resulting LQR problem to be well defined, it is also required that

$$ Q - N\,R^{-1} N^\top = W^\top W \succeq 0, \tag{8} $$

with $(A,W)$ detectable. Since detectability is hard to enforce directly, one could choose the value of $S := Q - N\,R^{-1} N^\top$ instead of choosing the value of $P$. Substituting $(6)$ and $(7)$ into $S$ yields

\begin{align} S &= K^\top R\,K - A^\top P - P\,A - (K^\top R - P\,B) R^{-1} (R\,K - B^\top P), \tag{9a} \\ &= (B\,K - A)^\top P + P\,(B\,K - A) - P\,B\,R^{-1} B^\top P. \tag{9b} \end{align}
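The step from $(9a)$ to $(9b)$ can be made explicit by expanding the last product in $(9a)$. The intermediate lines below are filled in here as a sketch:

\begin{align} (K^\top R - P\,B)\,R^{-1} (R\,K - B^\top P) &= K^\top R\,K - K^\top B^\top P - P\,B\,K + P\,B\,R^{-1} B^\top P, \\ S &= -A^\top P - P\,A + K^\top B^\top P + P\,B\,K - P\,B\,R^{-1} B^\top P. \end{align}

Grouping $K^\top B^\top P - A^\top P = (B\,K - A)^\top P$ and $P\,B\,K - P\,A = P\,(B\,K - A)$ then gives $(9b)$.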

Using $\mathcal{A} = A - B\,K$ in $(9b)$ gives the following algebraic Riccati equation in $P$

$$ \mathcal{A}^\top P + P\,\mathcal{A} + P\,B\,R^{-1} B^\top P + S = 0, \tag{10} $$

which looks very similar to the LQR algebraic Riccati equation $(4)$ with $N=0$, but with a plus sign instead of a minus sign in front of the quadratic term in $P$. Thus parameterizing the inverse optimal control problem is equivalent to determining for which $R = R^\top \succ 0$ and $S = W^\top W \succeq 0$ (subject to the detectability requirement) equation $(10)$ has a positive semi-definite solution $P$. Note that $\mathcal{A}$ in $(10)$ is Hurwitz. Normally $(4)$ could be solved by constructing a Hamiltonian matrix. However, the change from a minus to a plus sign (which is equivalent to using $-R$ instead of $R$) means that the resulting matrix no longer has the sign structure of the standard LQR Hamiltonian matrix, so the usual existence results for a stabilizing solution do not apply directly.

Considering the scalar case of $(10)$ with $\mathcal{A} = a < 0$, $P = p \geq 0$, $B\,R^{-1} B^\top = \rho > 0$ and $S = s \geq 0$ yields

$$ 2\,a\,p + \rho\,p^2 + s = 0, \tag{11} $$

which has the solutions

$$ p = \frac{-a \pm \sqrt{a^2 - \rho\,s}}{\rho}. \tag{12} $$

The value for $p$ should be real, which requires $a^2 \geq \rho\,s$. If this inequality is satisfied, then both solutions from $(12)$ are valid: since $-a > 0$ and $\sqrt{a^2 - \rho\,s} \leq |a|$, both are non-negative. Could this constraint be generalized to the non-scalar case, and how many positive semi-definite solutions can $(10)$ have for $P$? Or is there another way to parameterize this inverse optimal control problem?
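The scalar condition can be checked with a few lines of plain Python. This is only a sketch: the numbers for $a$, $\rho$ and $s$ are hypothetical, and `scalar_riccati_solutions` is a helper name introduced here.

```python
import math

def scalar_riccati_solutions(a, rho, s):
    """Real solutions p of 2*a*p + rho*p**2 + s = 0, i.e. equation (12)."""
    disc = a * a - rho * s
    if disc < 0:
        return []  # no real solution because a^2 < rho*s
    root = math.sqrt(disc)
    return [(-a - root) / rho, (-a + root) / rho]

# Hypothetical values with a < 0, rho > 0 and s >= 0.
a, rho = -1.0, 1.0
for s in (0.75, 2.0):
    sols = scalar_riccati_solutions(a, rho, s)
    # Plug each solution back into equation (11) to confirm it is a root.
    residuals = [2 * a * p + rho * p**2 + s for p in sols]
    print(f"s = {s}: p = {sols}, residuals = {residuals}")
```

With $s = 0.75$ the inequality $a^2 \geq \rho\,s$ holds and two non-negative roots are returned. With $s = 2$ it is violated and the list is empty.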


Edit: I just realized that $(A - B\,R^{-1} N^\top, W)$, rather than $(A, W)$, should be detectable. Using $(6)$, this pair can also be written as $(A - B\,K + B\,R^{-1}B^\top P, W)$, which means that the detectability requirement is not a function of $S$ alone, but also depends on either $R$ and $N$ or $R$ and $P$. The detectability requirement therefore cannot be decoupled from finding the remaining matrices, and all constraints stay intertwined. However, if $S$ is full rank then $(A - B\,R^{-1} N^\top, W)$ is always detectable, so the previous equations can still be used to parameterize a subset of all $Q$, $R$ and $N$.
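The difference between the two detectability conditions can be illustrated numerically with a PBH rank test. The sketch below assumes NumPy. The double-integrator data and the choice $P = I$ are hypothetical, and `psd_factor` and `is_detectable` are helper names introduced here.

```python
import numpy as np

def psd_factor(S, tol=1e-10):
    """Return W with W^T W = S for symmetric S >= 0 (via eigendecomposition)."""
    d, V = np.linalg.eigh(S)
    if d.min() < -tol:
        raise ValueError("S is not positive semi-definite")
    return np.sqrt(np.clip(d, 0.0, None))[:, None] * V.T

def is_detectable(M, W, tol=1e-8):
    """PBH test: rank [lam*I - M; W] = n for every eigenvalue lam with Re(lam) >= 0."""
    n = M.shape[0]
    for lam in np.linalg.eigvals(M):
        if lam.real >= -tol:
            pbh = np.vstack([lam * np.eye(n) - M, W])
            if np.linalg.matrix_rank(pbh, tol=tol) < n:
                return False
    return True

# Hypothetical example data (not from the question): a double integrator.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[1.0, 2.0]])
R = np.array([[1.0]])
P = np.eye(2)

N = K.T @ R - P @ B                                  # equation (6)
Q = K.T @ R @ K - A.T @ P - P @ A                    # equation (7)
S = Q - N @ np.linalg.solve(R, N.T)
W = psd_factor(S)

M = A - B @ np.linalg.solve(R, N.T)                  # A - B R^{-1} N^T
M_alt = A - B @ K + B @ np.linalg.solve(R, B.T @ P)  # same matrix, rewritten via (6)
print("two forms agree:", np.allclose(M, M_alt))
print("(M, W) detectable:", is_detectable(M, W))
print("(A, W) detectable:", is_detectable(A, W))
```

For this particular data $S$ is singular, the pair $(A, W)$ fails the test, and $(A - B\,R^{-1} N^\top, W)$ passes it because that matrix is Hurwitz. This is consistent with the corrected condition being the relevant one.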

  • I know that it is an old question, but are you still looking for answer? This is an interesting problem. – KBS Feb 10 '22 at 16:30
