Relative camera matrix (pose) from global camera matrices

Question

I have a list of camera poses from a given ground truth. Each pose is given as a quaternion and a translation, relative to some arbitrary world origin.

Each pose can be assembled into a 4x4 camera matrix of the form: $ P = \begin{bmatrix} R & t \\ \mathbf{0}^T & 1 \end{bmatrix} $
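As a sketch of the assembly step, the Python below builds $P$ from a quaternion and a translation using only the standard library. It assumes the quaternion is stored scalar-first as (w, x, y, z) in the Hamilton convention; some ground-truth datasets store (x, y, z, w) instead, so check the dataset's documentation before reusing it. The function names are illustrative, not from any library.

```python
import math

def quat_to_rotation(w, x, y, z):
    """Return the 3x3 rotation matrix (list of rows) for quaternion (w, x, y, z).

    Assumes the Hamilton convention with the scalar part w first; the
    quaternion is normalized so small numerical drift does not matter.
    """
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]

def pose_matrix(q, t):
    """Assemble P = [[R, t], [0^T, 1]] from quaternion q = (w, x, y, z) and translation t."""
    R = quat_to_rotation(*q)
    return [R[i] + [float(t[i])] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

# Example: a 90 degree rotation about the z axis, translated by (1, 2, 3).
half = math.pi / 4
P = pose_matrix((math.cos(half), 0.0, 0.0, math.sin(half)), (1.0, 2.0, 3.0))
```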

Now, given $P_1$ and $P_2$ with their respective $R_1|t_1$ and $R_2|t_2$, I want to compute the relative pose $P_{relative}$ between the two.

Is there a way to do that directly with $P_1$ and $P_2$, or do I need to compute the relative rotation and translation separately?

Thank you!

So you have $P_1$ and $P_2$, and you want to know, for example, what $P_2$ is in camera 1's frame? — Carser, Mar 12 '14 at 16:17

Carser · Accepted Answer · 2014-03-12T16:58:54.873

With cameras $C_1$ and $C_2$ with respective camera matrices $P_1^{\{W\}} = \begin{bmatrix} R_1 & t_1 \\ \mathbf{0} & 1 \end{bmatrix}$ and $P_2^{\{W\}} = \begin{bmatrix} R_2 & t_2 \\ \mathbf{0} & 1 \end{bmatrix}$, where $W$ denotes the world frame, we want to find the transformation matrix $P_1^{\{2\}}$ that maps coordinates in $C_1$ to coordinates in $C_2$. You can find it using only $P_1^{\{W\}}$ and $P_2^{\{W\}}$, since both are given in the same frame. The basic process is to transform from $C_1$ to $W$, and then from $W$ to $C_2$.
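In matrix form, going from $C_1$ to $W$ and then from $W$ to $C_2$ means applying $P_1^{\{W\}}$ and then the inverse of $P_2^{\{W\}}$, so the relative matrix can also be obtained directly from the two poses (using $R_2^{-1} = R_2^T$ for a rotation):

$$ P_1^{\{2\}} = \left(P_2^{\{W\}}\right)^{-1} P_1^{\{W\}} = \begin{bmatrix} R_2^{T} & -R_2^{T} t_2 \\ \mathbf{0} & 1 \end{bmatrix} \begin{bmatrix} R_1 & t_1 \\ \mathbf{0} & 1 \end{bmatrix} = \begin{bmatrix} R_2^{T} R_1 & R_2^{T} (t_1 - t_2) \\ \mathbf{0} & 1 \end{bmatrix} $$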

Step 1:
Given a point $q^{\{1\}}$ in $C_1$, its world coordinates are given by $q^{\{W\}} = t_1 + R_1 q^{\{1\}}$

Step 2:
Given a point $q^{\{W\}}$ in $W$, its $C_2$ coordinates are given by $q^{\{2\}} = R_2^{-1} (q^{\{W\}} - t_2)$
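Steps 1 and 2 can be checked numerically with a small Python sketch. The helper names and the two example cameras are hypothetical; rotations are plain 3x3 lists of rows, and each pose maps camera coordinates to world coordinates, as in Step 1.

```python
def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(R):
    """Transpose of a 3x3 matrix; for a rotation this equals its inverse."""
    return [[R[j][i] for j in range(3)] for i in range(3)]

def camera_to_world(R, t, q_cam):
    """Step 1: q^{W} = t + R q^{cam}."""
    Rq = mat_vec(R, q_cam)
    return [t[i] + Rq[i] for i in range(3)]

def world_to_camera(R, t, q_world):
    """Step 2: q^{cam} = R^{-1} (q^{W} - t), with R^{-1} = R^T."""
    return mat_vec(transpose(R), [q_world[i] - t[i] for i in range(3)])

# Hypothetical example: camera 1 is axis-aligned at (1, 0, 0);
# camera 2 is rotated 90 degrees about z and sits at (0, 1, 0).
R1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t1 = [1.0, 0.0, 0.0]
R2 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t2 = [0.0, 1.0, 0.0]

q1 = [0.0, 0.0, 1.0]                 # a point in camera 1's frame
qw = camera_to_world(R1, t1, q1)     # -> [1.0, 0.0, 1.0]
q2 = world_to_camera(R2, t2, qw)     # -> [-1.0, -1.0, 1.0]
```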

Step 3:
Combine steps 1 and 2. You have $$ q^{\{2\}} = R_2^{-1} (q^{\{W\}} - t_2) $$ $$ q^{\{2\}} = R_2^{-1} ((t_1 + R_1 q^{\{1\}}) - t_2) $$ $$ q^{\{2\}} = R_2^{-1} (R_1 q^{\{1\}} + t_1 - t_2) $$ $$ q^{\{2\}} = R_2^{-1} R_1 q^{\{1\}} + R_2^{-1} (t_1 - t_2) $$ which, in homogeneous coordinates, you can write as $$ q^{\{2\}} = P_1^{\{2\}} q^{\{1\}} $$ where $$ P_1^{\{2\}} = \begin{bmatrix} R_2^{-1} R_1 & R_2^{-1} (t_1 - t_2) \\ \mathbf{0} & 1\end{bmatrix} $$ To simplify the notation a bit: since $R_2$ is orthonormal, $R_2^{-1} = R_2^T$, so you can write $$ P_1^{\{2\}} = \begin{bmatrix} R_2^{T} R_1 & t_{12} \\ \mathbf{0} & 1\end{bmatrix} $$ where $t_{12} = t_1^{\{2\}} - t_2^{\{2\}} = R_2^T (t_1 - t_2)$.
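A minimal Python sketch of this closed form, assuming the camera-to-world convention of Step 1 (helper names and example cameras are hypothetical). It also builds the result directly as $(P_2^{\{W\}})^{-1} P_1^{\{W\}}$, which should agree, and applies it to a homogeneous point.

```python
def mat_mul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def make_pose(R, t):
    """Build [[R, t], [0^T, 1]] from a 3x3 rotation and a 3-vector."""
    return [list(R[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def invert_pose(P):
    """Invert a rigid transform: [[R, t], [0, 1]]^{-1} = [[R^T, -R^T t], [0, 1]]."""
    Rt = transpose([row[:3] for row in P[:3]])
    t = [[P[i][3]] for i in range(3)]              # column vector
    return make_pose(Rt, [-v[0] for v in mat_mul(Rt, t)])

def relative_pose(R1, t1, R2, t2):
    """Closed form: P_1^{2} = [[R2^T R1, R2^T (t1 - t2)], [0, 1]]."""
    R2t = transpose(R2)
    d = [[t1[i] - t2[i]] for i in range(3)]        # column vector t1 - t2
    return make_pose(mat_mul(R2t, R1), [v[0] for v in mat_mul(R2t, d)])

# Hypothetical cameras: camera 1 axis-aligned at (1, 0, 0),
# camera 2 rotated 90 degrees about z at (0, 1, 0).
R1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t1 = [1.0, 0.0, 0.0]
R2 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t2 = [0.0, 1.0, 0.0]

P12 = relative_pose(R1, t1, R2, t2)
P12_direct = mat_mul(invert_pose(make_pose(R2, t2)), make_pose(R1, t1))
q2 = mat_mul(P12, [[0.0], [0.0], [1.0], [1.0]])   # homogeneous q^{1} = (0, 0, 1, 1)
```

Using the rigid-transform inverse $[R^T \mid -R^T t]$ avoids a general 4x4 matrix inversion; with a numerical library the same computation is just a few matrix products.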

Are you missing an $R^T_2$ in front of the translation $t_{12}$ in the last $P^{\{2\}}_1$ matrix? It disappeared in the last simplifying step. — benbo, Apr 07 '18 at 17:36
@benbo good question. The $R^{T}_{2}$ is actually accounted for, since $t_{12}$ is defined as being in frame $\{2\}$. I can see how it might look like it is missing, though, and we could definitely still write it as in the line above. — Carser, Mar 26 '21 at 15:21

burrata · Answer 2 · 2024-10-30T22:03:30.423

This is an alternative solution following the notation of Hartley & Zisserman [1], which is commonly used in stereo vision. Note that while the formulation above is fine, this one sets the translation to $t = -R \tilde{C}$, where $\tilde{C}$ is the camera origin (center) in world coordinates; $t$ is therefore expressed in the respective camera's own frame.
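A small Python sketch of this convention (function names and the example camera are hypothetical): it converts a camera center $\tilde{C}$ to $t = -R\tilde{C}$ and back via $\tilde{C} = -R^T t$. A handy sanity check is that the camera center maps to the origin of its own frame, since $R\tilde{C} + t = 0$.

```python
def translation_from_center(R, C):
    """t = -R C, where C is the camera center in world coordinates."""
    return [-sum(R[i][k] * C[k] for k in range(3)) for i in range(3)]

def center_from_translation(R, t):
    """C = -R^T t, inverting the relation above (R^{-1} = R^T for a rotation)."""
    return [-sum(R[k][i] * t[k] for k in range(3)) for i in range(3)]

# Hypothetical camera: rotated 90 degrees about z, centered at (1, 2, 3) in world coordinates.
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
C = [1.0, 2.0, 3.0]
t = translation_from_center(R, C)   # -> [2.0, -1.0, -3.0]
```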

We have two transformation matrices $T_1 = [R_1 | t_1]$ and $T_2 = [R_2 | t_2]$ and want to find the transformation $T_{1 \to 2}$ (written $T_{12}$ below) that maps a point from the coordinate frame of $T_1$ into that of $T_2$.

From the camera transformation equation we get: \begin{gather*} x_{c_1} = R_1 x_w + t_1 \\ x_w = R_1^T (x_{c_1} - t_1) \\ \end{gather*}

$x_w$ is a point in world coordinates.
$x_{c_1}$ corresponds to the world point $x_w$ expressed in the first frame's local coordinate system.

and we also have:

\begin{gather*} x_{c_2} = R_2 x_w + t_2 \end{gather*}

$x_{c_2}$ corresponds to the world point $x_w$ expressed in the second frame's local coordinate system.

Note that this is the important difference from the answer above: $t_1$ and $t_2$ are each expressed in their own camera's local coordinate system, not in world coordinates.

Substituting $x_w$ gives:

\begin{gather*} x_{c_2} = R_2 (R_1^T (x_{c_1} - t_1)) + t_2 \\ x_{c_2} = R_2 R_1^T x_{c_1} - R_2 R_1^T t_1 + t_2 \\ \end{gather*}

and when we substitute $t_1$ and $t_2$ with $-R_1 \tilde{C_1}$ and $-R_2 \tilde{C_2}$ respectively:

\begin{gather*} x_{c_2} = R_2 R_1^T x_{c_1} - R_2 R_1^T (-R_1 \tilde{C_1}) + (-R_2 \tilde{C_2}) \\ x_{c_2} = R_2 R_1^T x_{c_1} + R_2 \tilde{C_1} - R_2 \tilde{C_2} \\ x_{c_2} = R_2 R_1^T x_{c_1} + R_2 (\tilde{C_1} - \tilde{C_2}) \end{gather*}
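To sanity-check this result numerically, the Python sketch below maps a world point into both cameras with $x_c = R x_w + t$, $t = -R\tilde{C}$, and compares camera 2's coordinates with the closed form $R_2 R_1^T x_{c_1} + R_2(\tilde{C_1} - \tilde{C_2})$. The cameras, the point, and the helper names are hypothetical.

```python
def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]

def world_to_cam(R, C, x_w):
    """Hartley & Zisserman convention: x_c = R x_w + t with t = -R C."""
    t = [-v for v in mat_vec(R, C)]
    return [a + b for a, b in zip(mat_vec(R, x_w), t)]

def relative_rt(R1, C1, R2, C2):
    """R_12 = R2 R1^T and t_12 = R2 (C1 - C2), as in the derivation above."""
    R12 = [[sum(R2[i][k] * R1[j][k] for k in range(3)) for j in range(3)]
           for i in range(3)]
    t12 = mat_vec(R2, [c1 - c2 for c1, c2 in zip(C1, C2)])
    return R12, t12

# Hypothetical cameras: camera 1 axis-aligned with center (1, 0, 0), camera 2
# rotated 90 degrees about z with center (0, 1, 0); centers in world coordinates.
R1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
C1 = [1.0, 0.0, 0.0]
R2 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
C2 = [0.0, 1.0, 0.0]

x_w = [2.0, 3.0, 4.0]
x_c1 = world_to_cam(R1, C1, x_w)     # via the world frame
x_c2 = world_to_cam(R2, C2, x_w)
R12, t12 = relative_rt(R1, C1, R2, C2)
x_c2_direct = [a + b for a, b in zip(mat_vec(R12, x_c1), t12)]   # closed form
```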

Written as a homogeneous transformation matrix, as in the answer above, this becomes:

\begin{gather*} T_{12} = \begin{bmatrix} R_2 R_1^T & -R_2 (\tilde{C_2} - \tilde{C_1}) \\ 0 & 1 \end{bmatrix} \end{gather*}

or with $R_{12} = R_2 R_1^T$ and $t_{12} = -R_2 (\tilde{C_2} - \tilde{C_1})$

\begin{gather*} T_{12} = \begin{bmatrix} R_{12} & t_{12} \\ 0 & 1 \end{bmatrix} \end{gather*}
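As a consistency check with the accepted answer: there, each pose maps camera coordinates to world coordinates. Inverting $x_c = R x_w + t$ gives $x_w = R^T x_c + \tilde{C}$ (because $-R^T t = \tilde{C}$), so that answer's $R_i$ and $t_i$ correspond to $R_i^T$ and $\tilde{C_i}$ here. Substituting them into its result $\begin{bmatrix} R_2^T R_1 & R_2^T (t_1 - t_2) \\ \mathbf{0} & 1 \end{bmatrix}$ gives the same matrix:

\begin{gather*} P_1^{\{2\}} = \begin{bmatrix} (R_2^T)^T R_1^T & (R_2^T)^T (\tilde{C_1} - \tilde{C_2}) \\ \mathbf{0} & 1 \end{bmatrix} = \begin{bmatrix} R_2 R_1^T & R_2 (\tilde{C_1} - \tilde{C_2}) \\ \mathbf{0} & 1 \end{bmatrix} = T_{12} \end{gather*}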

I hope this helps.
