
I know that a Sylvester equation $$AX+XB=0$$ has nontrivial solutions exactly when $A$ and $-B$ have a common eigenvalue. This is because if there is a common eigenvalue $k$, then there exist a column eigenvector $v$ of $A$ and a row eigenvector $w$ of $-B$, both for the eigenvalue $k$. Then $vw$ is a nontrivial solution to the equation: $$A(vw)+(vw)B=(Av)w+v(wB)=kvw-kvw=0$$
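A quick numerical sanity check of this rank-one construction over the reals (the matrices below are made up for illustration, not from the question):

```python
import numpy as np

# Illustrative matrices: A has eigenvalue k = 2 with column eigenvector v,
# and -B also has eigenvalue 2, with row eigenvector w.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
B = np.array([[-2.0, 0.0],
              [ 5.0, 7.0]])
v = np.array([[1.0],
              [0.0]])          # A v = 2 v
w = np.array([[1.0, 0.0]])     # w (-B) = 2 w

X = v @ w                      # rank-one candidate solution vw
print(A @ X + X @ B)           # zero matrix, as the computation predicts
```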

But what if we are working over $GF(2)$? Suppose the characteristic polynomials of $A$ and $-B$ are both $x^2+x+1$. This polynomial is irreducible over $GF(2)$, so $A$ and $-B$ do not share an eigenvalue in $GF(2)$. But they do share an eigenvalue in the extension field $GF(4)$, where $x^2+x+1$ splits, so does this mean there is a nontrivial solution over $GF(2)$ or not? The argument above certainly doesn't work, because the entries of $vw$ need not lie in $GF(2)$.

Riley

2 Answers


Let us assume that the characteristic polynomials $\chi_A$ and $\chi_{-B}$ have a common irreducible factor $p$. Let $C$ be the companion matrix of $p$. Then $A$ is similar to an upper triangular block matrix with $C$ in the upper left corner, and $-B$ is similar to a lower triangular block matrix with $C$ in the upper left corner (the other diagonal blocks can be chosen as the companion matrices of the other irreducible factors of the characteristic polynomials). Hence we have $$ A=S_A \begin{pmatrix} C & \star & \dots & \star \\ 0 & \star & \ddots & \vdots \\ \vdots & \ddots & \ddots & \star \\ 0 & \dots & 0 & \star \end{pmatrix} S_A^{-1} $$ and $$ -B=S_B^{-1} \begin{pmatrix} C & 0 & \dots & 0 \\ \star & \star & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ \star & \dots & \star & \star \end{pmatrix} S_B $$ Now choose $$ X=S_A \begin{pmatrix} I & 0 & \dots & 0 \\ 0 & 0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \dots & 0 & 0 \end{pmatrix} S_B $$ where $I$ is the identity matrix of the same size as $C$. Then $$ AX = X(-B) = S_A \begin{pmatrix} C & 0 & \dots & 0 \\ 0 & 0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \dots & 0 & 0 \end{pmatrix} S_B $$ so $X$ is a nontrivial solution of $AX+XB=0$. Note: We only need the first $d$ columns of $S_A$ and the first $d$ rows of $S_B$, where $d$ is the degree of $p.$ The columns of $S_A$ can be found by choosing a nonzero vector $v$ in the kernel of $p(A)$ and using $A^kv,\; k=0,\ldots,d-1,$ as the first $d$ columns of $S_A.$ (The approach for $S_B$ is similar, but we must take care to obtain the same $C$ and not its transpose.)
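To make the note concrete, here is a minimal sketch of finding such a kernel vector over $\mathrm{GF}(2)$; the nullspace routine and the example matrix are illustrative additions, not part of the answer:

```python
import numpy as np

def nullspace_gf2(M):
    """Basis of the kernel of a 0/1 matrix M over GF(2) (elimination mod 2)."""
    M = M.copy() % 2
    rows, cols = M.shape
    pivot_cols, r = [], 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i, c]), None)
        if pivot is None:
            continue
        M[[r, pivot]] = M[[pivot, r]]       # move pivot row into place
        for i in range(rows):
            if i != r and M[i, c]:
                M[i] = (M[i] + M[r]) % 2    # clear column c in other rows
        pivot_cols.append(c)
        r += 1
    basis = []
    for fc in (c for c in range(cols) if c not in pivot_cols):
        v = np.zeros(cols, dtype=int)
        v[fc] = 1
        for i, pc in enumerate(pivot_cols):
            v[pc] = M[i, fc]                # back-substitute the pivot variables
        basis.append(v)
    return basis

A = np.array([[0, 1], [1, 1]])               # companion matrix of p(t) = t^2 + t + 1
pA = (A @ A + A + np.eye(2, dtype=int)) % 2  # here p(A) = 0 (Cayley-Hamilton)
v = nullspace_gf2(pA)[0]                     # any nonzero vector qualifies
print(v)
```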

Edit

We have to find $d$ column vectors $v_0,\ldots,v_{d-1}$ such that $$ A \begin{pmatrix} & & \\ v_0 & \dots & v_{d-1} \\ & & \end{pmatrix} = \begin{pmatrix} & & \\ v_0 & \dots & v_{d-1} \\ & & \end{pmatrix} C $$ and $d$ row vectors $w_0,\ldots,w_{d-1}$ such that $$ \begin{pmatrix} & w_{0} & \\ \;\; & \vdots & \;\; \\ & w_{d-1} & \end{pmatrix} (-B) = C \begin{pmatrix} & w_{0} & \\ \;\; & \vdots & \;\; \\ & w_{d-1} & \end{pmatrix} $$ Let $$ C=\begin{pmatrix} 0 & & 0 & -a_0 \\ 1 & \ddots & & -a_1 \\ & \ddots & 0 & \vdots \\ 0 & & 1 & -a_{d-1} \end{pmatrix} $$ where $a_0,\ldots,a_{d-1}$ are the coefficients of $p$, such that $p(t)=t^d+a_{d-1}t^{d-1}+\ldots+a_0.$

Let $v_0$ be a nonzero vector in $\ker(p(A))$ and $v_k=A^kv_0.$ As $p(A)v_0=0,$ it can easily be seen that $$ A \begin{pmatrix} & & \\ v_0 & \dots & v_{d-1} \\ & & \end{pmatrix} = \begin{pmatrix} & & & \\ v_1 & \dots & v_{d-1} & A^dv_0 \\ & & & \end{pmatrix} \\ = \begin{pmatrix} & & & \\ v_1 & \dots & v_{d-1} & (A^d-p(A))v_0 \\ & & & \end{pmatrix} = \begin{pmatrix} & & \\ v_0 & \dots & v_{d-1} \\ & & \end{pmatrix} C $$ We can also show that the $v_k$ are linearly independent. Assume there were a non-trivial linear combination $\sum_{k=0}^{d-1}\alpha_kv_k=0.$ This is equivalent to the statement that there is a nonzero polynomial $q$ of degree less than $d$ with $q(A)v_0=0.$ Such a $q$ and $p$ must be coprime, because $p$ is irreducible and $\deg q<\deg p.$ But if $q(A)x_1=0$ and $p(A)x_2=0$ for coprime $p$ and $q$ and $x_1\neq 0$ and $x_2\neq 0,$ then $x_1$ and $x_2$ are linearly independent, see here. This gives a contradiction, because in our case $p(A)v_0=q(A)v_0=0,$ so $x_1=x_2=v_0,$ and a vector cannot be linearly independent of itself. Hence there cannot be a non-trivial linear combination $\sum_{k=0}^{d-1}\alpha_kv_k=0,$ and the vectors $v_0,\ldots, v_{d-1}$ are linearly independent.

We select a nonzero row vector $w_{d-1}$ from the "left kernel" of $p(-B),$ i.e. such that $w_{d-1}\,p(-B)=0.$

Let $w_{k-1}=-w_kB + a_kw_{d-1}$ for $k=d-1,d-2,\ldots,0.$

Then $w_{-1}=w_{d-1}\,p(-B) = 0.$ Example ($d=4$): \begin{eqnarray*} w_2 = -w_3B+a_3w_3 &=& w_3(-B+a_3I) \\ w_1 = -w_2B+a_2w_3 &=& w_3((-B+a_3I)(-B)+a_2I) \\ w_0 = -w_1B+a_1w_3 &=& w_3(((-B+a_3I)(-B)+a_2I)(-B)+a_1I) \\ w_{-1}= -w_0B+a_0w_3 &=& w_3((((-B+a_3I)(-B)+a_2I)(-B)+a_1I)(-B)+a_0I) = w_3p(-B) = 0 \end{eqnarray*} see Horner's method. Then we have $$ \begin{pmatrix} & w_{0} & \\ \;\; & \vdots & \;\; \\ & w_{d-1} & \end{pmatrix} (-B) = \begin{pmatrix} & -w_0B & \\ & -w_1B & \\ & \vdots & \\ & -w_{d-1}B & \end{pmatrix} \\ = \begin{pmatrix} & -a_0w_{d-1} & \\ & w_0-a_1w_{d-1} & \\ & \vdots & \\ & w_{d-2}-a_{d-1}w_{d-1} & \end{pmatrix} = C \begin{pmatrix} & w_{0} & \\ \;\; & \vdots & \;\; \\ & w_{d-1} & \end{pmatrix} $$ It can also be shown that the row vectors $w_0,\ldots,w_{d-1}$ are linearly independent, in the same way as we have shown that $v_0,\ldots,v_{d-1}$ are linearly independent. This linear independence guarantees $X\neq 0$ (indeed $X$ has rank $d$, as a product of a matrix with full column rank and a matrix with full row rank) when we set $$ X= \begin{pmatrix} & & \\ v_0 & \dots & v_{d-1} \\ & & \end{pmatrix} \cdot \begin{pmatrix} & w_{0} & \\ \;\; & \vdots & \;\; \\ & w_{d-1} & \end{pmatrix} $$ The complete construction works for any field $F.$
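A small end-to-end sketch of this construction in Python over $\mathrm{GF}(2)$ (the $2\times 2$ matrices are illustrative: $p(t)=t^2+t+1$, $d=2$, and since $-1=1$ in $\mathrm{GF}(2)$ we have $-B=B$; moreover $p(A)=0$ by Cayley-Hamilton, so any nonzero $v_0$ works):

```python
import numpy as np

# Illustrative example over GF(2): both A and -B are the companion matrix
# of p(t) = t^2 + t + 1 (note -B = B, since -1 = 1 in GF(2)).
A = np.array([[0, 1], [1, 1]])
B = np.array([[0, 1], [1, 1]])
a = [1, 1]                      # coefficients a_0, a_1 of p
d = 2

# Columns v_k = A^k v_0 with v_0 in ker(p(A)); p(A) = 0 here, so any
# nonzero v_0 works.
v0 = np.array([1, 0])
V = np.column_stack([np.linalg.matrix_power(A, k) @ v0 % 2 for k in range(d)])

# Rows: w_{d-1} from the left kernel of p(-B), then the Horner-style
# recursion w_{k-1} = -w_k B + a_k w_{d-1} (the minus sign vanishes mod 2).
w = [None] * d
w[d - 1] = np.array([1, 0])
for k in range(d - 1, 0, -1):
    w[k - 1] = (w[k] @ B + a[k] * w[d - 1]) % 2
assert not ((w[0] @ B + a[0] * w[d - 1]) % 2).any()   # this is w_{-1} = 0
W = np.vstack(w)

X = (V @ W) % 2
print(X)                        # a nonzero solution
print((A @ X + X @ B) % 2)      # the zero matrix
```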

  • Thank you so much for your answer! Just one question: how do we know that your construction of the columns of $S_A$ (and rows of $S_B$) is valid? – Riley Oct 04 '17 at 22:58
  • It follows from the existence of the Frobenius normal form – Reinhard Meier Oct 04 '17 at 23:02
  • How is your construction of the $k$th column of $S_A$, $A^kv$, related to the Frobenius normal form? – Riley Oct 05 '17 at 00:40
  • Maybe I mixed up the various normal forms. I will explain the construction in more detail tomorrow. It is late after midnight in my time zone, now. – Reinhard Meier Oct 05 '17 at 00:48
  • I have added the construction of the first $d$ columns of $S_A$ and the first $d$ rows of $S_B.$ – Reinhard Meier Oct 05 '17 at 10:05
  • How do we know that there exist nonzero elements in the kernel of $p(A)$ and the left kernel of $p(-B)$? – Riley Oct 05 '17 at 12:42
  • Take a field extension in which the roots $\lambda_1,\ldots,\lambda_d$ of $p$ exist. Then $\det(p(A))=\det((A-\lambda_1I)\cdot\ldots\cdot(A-\lambda_dI))$ $=\det(A-\lambda_1I)\cdot\ldots\cdot\det(A-\lambda_dI)=\chi_A(\lambda_1)\cdot\ldots\cdot\chi_A(\lambda_d)=0.$ Note that $\chi_A(\lambda_1)\cdot\ldots\cdot\chi_A(\lambda_d)$ can be expressed in the original field, because it is symmetric in $\lambda_1,\ldots,\lambda_d$ and the related elementary symmetric polynomials are $\pm$ the coefficients of $p.$ – Reinhard Meier Oct 05 '17 at 18:34
  • Thanks, that makes sense! Although, I don't see why you have to show that $\chi_A(\lambda_1)\cdots\chi_A(\lambda_d)$ can be expressed in the original field if we already know it equals 0. – Riley Oct 05 '17 at 18:54
  • If $A$ and $-B$ have a common eigenvalue (in an algebraic closure of $F$), do they necessarily share an irreducible factor (over $F$)? – smapers Jun 16 '20 at 19:00
  • Their characteristic polynomials share the minimal polynomial of that eigenvalue as a common factor. – Reinhard Meier Jun 16 '20 at 22:14

For any $n\times n$ matrix $A$ with entries in a field $F$, let $M^A$ denote the vector space $F^n$ equipped with the $F[x]$-module structure in which $x$ acts by left multiplication by $A$. Let $A$ be an $m\times m$ matrix over $F$, let $B$ be an $n\times n$ matrix over $F$, and let $X$ be an $m\times n$ matrix over $F$. Set $C_{A,B}=\{X \in M_{m\times n}(F)\mid AX+XB=0\}$. A solution $X$ of the Sylvester equation $AX+XB=0$ can then be interpreted as a module homomorphism from $M^{-B}$ to $M^A$: the condition $AX=X(-B)$ says precisely that left multiplication by $X$ intertwines the action of $x$ on the two modules. This can be summarized as $$ C_{A,B} \simeq \mathrm{Hom}_{F[x]} (M^{-B},M^A) $$ as $F$-vector spaces.

The RHS is nontrivial if and only if the invariant factors of $M^{-B}$ and $M^A$ have a common irreducible factor (equivalently, if and only if $\chi_A$ and $\chi_{-B}$ do).

In your case, the field is $F=\mathrm{GF}(2)$, and both $M^A$ and $M^{-B}$ have $x^2+x+1$ as an invariant factor. Thus the RHS is nontrivial, and hence $C_{A,B}$ is nontrivial.
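As a computational sanity check (an illustrative addition, not part of the original answer): vectorizing turns $AX+XB=0$ into the linear system $(I_n\otimes A + B^{\mathsf T}\otimes I_m)\,\mathrm{vec}(X)=0$, so $\dim C_{A,B}$ over $\mathrm{GF}(2)$ can be computed by Gaussian elimination mod 2. A minimal Python sketch, with the example matrices chosen so that both characteristic polynomials are $x^2+x+1$:

```python
import numpy as np

def rank_gf2(M):
    """Rank of a 0/1 matrix over GF(2), by Gaussian elimination mod 2."""
    M = M.copy() % 2
    r = 0
    for c in range(M.shape[1]):
        pivot = next((i for i in range(r, M.shape[0]) if M[i, c]), None)
        if pivot is None:
            continue
        M[[r, pivot]] = M[[pivot, r]]          # swap pivot row into place
        for i in range(M.shape[0]):
            if i != r and M[i, c]:
                M[i] = (M[i] + M[r]) % 2       # clear column c elsewhere
        r += 1
    return r

# Illustrative matrices: characteristic polynomial x^2 + x + 1 over GF(2).
A = np.array([[0, 1], [1, 1]])
B = np.array([[0, 1], [1, 1]])
m, n = A.shape[0], B.shape[0]

# AX + XB = 0  <=>  (I_n kron A + B^T kron I_m) vec(X) = 0  (column-major vec)
K = (np.kron(np.eye(n, dtype=int), A) + np.kron(B.T, np.eye(m, dtype=int))) % 2
print(m * n - rank_gf2(K))   # dim of C_{A,B} over GF(2); prints 2 here
```

The kernel here is 2-dimensional, which matches $\mathrm{Hom}_{F[x]}(F[x]/(p),F[x]/(p))\cong F[x]/(p)\cong\mathrm{GF}(4)$, a 2-dimensional $\mathrm{GF}(2)$-vector space.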

Sungjin Kim
  • I didn't specify that the matrices have to be $2\times 2$. I just used $x^2+x+1$ as an example characteristic polynomial. Does your answer generalize to different sizes, given that $X,C$ are $m\times n$ matrices, $A$ is $m\times m$, and $B$ is $n\times n$? – Riley Oct 05 '17 at 01:05
  • Edited the answer accordingly. The same argument works for $X,C$ $m\times n$, $A$ $m\times m$, and $B$ $n\times n$. – Sungjin Kim Oct 05 '17 at 06:42
  • Ok. By the way, it looks like I mistakenly said "$C$ is $m\times n$" when there is no matrix $C$ in the equation. You should remove any references to a matrix $C$ in your answer. Also, I have not learned about modules yet, so I'd appreciate it if you could clarify a few things in your answer for me to understand your answer. 1. How do we know $C_{A,B}$ can be interpreted as a module homomorphism? 2. How do you conclude that the RHS is nontrivial under your condition? 3. You defined $M^A$ in terms of left multiplication. The term $XB$ uses right multiplication. Is using $M^{-B}$ valid? – Riley Oct 05 '17 at 15:35
  • $AX+XB=0$ guarantees that the multiplication by elements in $F[x]$ is preserved in the module homomorphism. For any module homomorphism $f$ and $r\in F[x]$, we must have $f(ra)=rf(a)$ for any $a\in M^{-B}$. – Sungjin Kim Oct 05 '17 at 17:34
  • What's your answer to my other 2 questions? Particularly #3. I don't see how $M^{-B}$ is valid in this case since it is defined in terms of left multiplication but $XB$ uses right multiplication. – Riley Oct 06 '17 at 02:17
  • My previous comment answers your questions 1 and 3. For question 2, see https://en.wikipedia.org/wiki/Structure_theorem_for_finitely_generated_modules_over_a_principal_ideal_domain . In your case, $M^A$ and $M^{-B}$ both contain a factor $F[x]/(x^2+x+1)^k$ for some $k\geq 1$. That's why the RHS is nontrivial. – Sungjin Kim Oct 06 '17 at 04:32
  • I greatly appreciate your answer, but it's just beyond my comprehension at this point. I hope you can understand I accepted the other answer also because it came first, and provides a method of construction for the solutions X. – Riley Oct 06 '17 at 13:33