
Recall that the permanent is the 'positive analog' of the determinant, in which every sign in the cofactor expansion is taken to be positive. That is, the permanent is the immanant corresponding to the trivial character.

Many enumerative problems about permutations, and many in graph theory, can be reformulated in terms of permanents of binary matrices.

I have previously considered the natural combinatorial problem of determining the number A192892$(n)$ of $n \times n$ binary matrices $A$ such that $\det\left(A\right) = \text{perm}\left(A\right)$. Observe that A192892$(n)$ is also equal to the number of binary matrices $\left( a_{i, j} \right)_{n \times n}$ such that the product $$a_{1, \sigma(1)} a_{2, \sigma(2)} \cdots a_{n, \sigma(n)}$$ vanishes for all odd permutations $\sigma \in S_{n}$.
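For small $n$, this equivalence can be checked directly by brute force. Here is a minimal Python sketch (the function names are mine, not from any referenced code) that counts matrices with $\det(A) = \text{perm}(A)$ by expanding both sums over all permutations:

```python
from itertools import permutations, product

def parity(p):
    # Number of inversions mod 2: 0 for even permutations, 1 for odd.
    n = len(p)
    return sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n)) % 2

def count_det_eq_perm(n):
    """Count n-by-n 0/1 matrices A with det(A) == perm(A).

    Equivalently: matrices for which the product
    a[0][s(0)] * ... * a[n-1][s(n-1)] vanishes for every odd permutation s.
    """
    perms = [(p, -1 if parity(p) else 1) for p in permutations(range(n))]
    count = 0
    for bits in product((0, 1), repeat=n * n):
        a = [bits[i * n:(i + 1) * n] for i in range(n)]
        det = perm = 0
        for p, sign in perms:
            term = 1
            for i in range(n):
                term *= a[i][p[i]]
            det += sign * term   # signed sum: determinant
            perm += term         # unsigned sum: permanent
        count += (det == perm)
    return count
```

This reproduces the small values quoted below ($a(2)=12$, $a(3)=343$, $a(4)=34997$), but its $2^{n^2} \cdot n!$ cost makes it hopeless beyond that.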

I have computed A192892$(n)$ for $n \leq 4$. Obviously, brute force algorithms for this enumerative problem are very inefficient. So it is natural to ask:

(1) Is there a simple combinatorial formula for A192892$(n)$?

(2) Is there a polynomial-time algorithm for computing A192892$(n)$?

  • 1
    The narrow case $\det A = 0 = \text{perm} A$ seems easy enough to deserve a mention. – hardmath Mar 21 '16 at 18:52
  • 4
    I can take brute force a little further. Rather than iterating through all $2^{n^2}$ binary $n\times n$ matrices and testing each one for odd permutations, you can iterate through $2^{n^2-n}$ binary $n\times(n-1)$ matrices and test each one for the number of 1s that may be put in the final row. Doing this with optimized C++ code, we can get up to $n=6$ with a few minutes of computation. I found $a(3)=343, a(4)=34997, a(5)=12515441, a(6)=15749457081$. – Chris Culter Mar 25 '16 at 06:27
  • 1
    At this rate, it would take two months to compute $a(7)$ on a single CPU core. – Chris Culter Mar 25 '16 at 06:36
  • 5
    I got $a(7)=72424550598849$ in a few minutes by grouping matrices related by row swaps (a factor of almost $7!$ speedup) and running on $8$ CPU cores. Now, it would take months to compute $a(8)$. – Chris Culter Mar 26 '16 at 09:37
  • can you send me your code? – nik Mar 26 '16 at 12:38
  • 1
    It should be noted that this is the same as the cardinality of the set $$\left\{ A = (a_{i,j}) \in \{0,1\}^{n\times n} \;\middle|\; \forall \sigma\in S_n \text{ with } \operatorname{sgn}(\sigma) = -1 \ \exists i: a_{i,\sigma(i)} = 0 \right\}.$$ But you probably found that already :) This is the result of working out $0 = \text{perm}(A) - \det(A)$ and noting that this is zero if and only if $\sum_{\sigma \text{ odd}} \mathbb{1}\left\{ \forall i: a_{i,\sigma(i)} = 1 \right\} = 0$ (where $\mathbb{1}$ denotes the indicator function), so for every odd $\sigma$, the statement "$\forall i: a_{i,\sigma(i)} = 1$" must be false. – Jasper Mar 26 '16 at 14:59
  • 2
    @nik Here you go: https://github.com/Culter/PermDet I make no guarantees about the quality or stability of this code. :) I've been considering adding tests and reference versions of the algorithm as I continue to improve it, but we'll see. – Chris Culter Mar 27 '16 at 05:14
  • thank you, looks very nice. now to find a(8) – nik Mar 27 '16 at 12:25
  • 4
    @nik No prob, thanks for the compliment! I have a couple ideas for making a(8) fast. To be brief and vague, I can prune and shuffle the set of target permutations earlier in the search tree, and I can optimistically combine permutations before enumerating the final row. I can also re-use some intermediate computations for row values that differ by a single bit. I suspect this will bring a(8) under a day of computation, while a(9) will stay out of reach unless we discover another big win (maybe column swaps)? – Chris Culter Mar 28 '16 at 19:14
  • 4
    Update for those on the edges of their seats: The above improvements got a(7) down to 15 seconds, but I haven't improved the complexity much, so a(8) is still slow. I'm going to try a limited amount of quotienting out column swaps; it won't be another $n!$ speedup, but maybe it'll be enough to reach a(8). – Chris Culter Mar 30 '16 at 16:13
  • 4
    With the above improvements plus dynamic programming to reduce the effort of tracking permutations, I got a(7) down to 0.3 seconds, and $a(8)=1282759836215548737$ in a few minutes! Updated Github. At the current complexity, with some changes to handle numbers greater than 64 bits, a(9) should take less than a day. I might take a break from doing that, though, and write up what I've done so far in an answer. – Chris Culter Apr 12 '16 at 18:33

1 Answer


To address questions (1) and (2), let's start with hardmath's comment. The set of matrices with zero permanent is a subset of the set we want to count, and it's easier to describe. Nonetheless, the number of such matrices is recorded in https://oeis.org/A088672 only up to $6\times6$, combining the efforts of three contributors. In the literature, we find this article:

The authors apply standard techniques such as Inclusion-Exclusion, but they obtain only asymptotic bounds. All this suggests to me that there is no obvious, simple formula or algorithm. Perhaps a simple formula exists, but finding it will take some novel insight into these problems.

Now on to brute force. If we naively iterate through all $2^{n^2}$ matrices and check each one against all $n!/2$ odd permutations, then we can compute only a few terms of the sequence. Even if we perform 20 billion checks per second, computing $a(7)$ would take two years, and $a(8)$ would take 600,000 years.
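These back-of-the-envelope figures can be sanity-checked with a few lines of arithmetic (the rate of $2\times10^{10}$ checks per second is the assumption stated above; the function name is mine):

```python
from math import factorial

CHECKS_PER_SECOND = 20e9            # assumed: "20 billion checks per second"
SECONDS_PER_YEAR = 365 * 24 * 3600

def naive_brute_force_years(n):
    """Years to test all 2**(n*n) binary matrices against all n!/2 odd permutations."""
    checks = 2 ** (n * n) * (factorial(n) // 2)
    return checks / CHECKS_PER_SECOND / SECONDS_PER_YEAR
```

Indeed, `naive_brute_force_years(7)` comes out to about $2.2$ years and `naive_brute_force_years(8)` to roughly $590{,}000$ years, consistent with the estimates above.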

Now, it's hard to erase that $2^{n^2}$ term. Still, there are several key speedups, each of which makes roughly another term feasible. They can be implemented in this order:

  1. Instead of iterating over possible values of the last row, compute how many spaces on the last row are "free", meaning that putting a $1$ there would not complete an odd permutation with the given values in the previous $n-1$ rows. If $k$ is the number of free spaces, then $2^k$ is the number of values of the last row that avoid all odd permutations, so we can immediately add that number to our running total. This increases our efficiency by a factor of $2^n/n$.
  2. Dynamic programming. As we iterate through the rows of the matrix, filling in values, reduce the set of odd permutations to check by grouping them according to their values in the remaining rows. By the time we enumerate the next-to-last row, we have only $n^2$ permutations to check. This increases our efficiency by a factor of $n!/n^2$.
  3. Column swaps. Modulo the sign of the permutations, we really only have to inspect representative matrices from orbits under the action of the symmetry group $(S_n\times S_n)\rtimes\mathbb Z/2$, which acts by permuting the rows and columns and transposing the matrix. In other words, we only need to check bipartite graphs up to graph isomorphism. Now, it's a little awkward to solve the graph isomorphism problem in the inner loop of the algorithm. A better tactic is to break up the orbits into more predictable chunks. First, we don't want to give up the big speedup that we got from projecting out the last row, so the symmetry group is reduced to $S_{n-1}\times S_n$. Let's take advantage of the bigger $S_n$ factor first, which swaps columns. We do this by enumerating columns only in increasing order, reading each column as a binary number with the top row as the most significant bit. This ordering can be checked row-by-row, and it dramatically reduces the number of matrices considered, by almost $n!$. I say "almost" for two reasons. First, some matrices have large stabilizers under this action, equivalently small orbits. Second, we now have to keep track of even permutation matrices as well as odd permutation matrices, since some orbits will have to include odd permutations of the columns. And it takes a little effort to enforce the ordering and compute the stabilizers. But those problems aside, this increases our efficiency by a factor of $n!$.
  4. Row swaps. We can't get a similar $(n-1)!$ speedup by enforcing the same ordering among rows, because the previous column permutations don't preserve the binary value of a row. We can get close, though, by ordering rows according to their Hamming weight, which is preserved by column permutations. There are only $n+1$ possible weights, so this scheme results in unfortunately large stabilizers, and therefore unfortunately small orbits. But even if the typical stabilizer order is $2^{n-1}$, that's still a speedup of $(n-1)!/2^{n-1}$, which is totally worthwhile.
  5. Code optimization. A subset of the set of $2\times n$ partial permutation matrices can be represented as an $n^2$-bit bitfield. If $n\leq8$, this fits in a single $64$-bit machine register, and using $256$-bit SIMD instructions, we can process $4$ of those per cycle. Use a low-level language with support for generic programming, like C++, to eliminate dynamic allocation, minimize the size of arrays, and optimize each row individually. Also, the problem is embarrassingly easy to parallelize to multiple CPU cores, or even multiple machines.
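To make speedup 1 concrete, here is a minimal Python sketch of the free-cells idea (Python for clarity rather than the optimized C++ of the actual implementation; the function names are mine). It enumerates only the first $n-1$ rows and, for each, adds $2^k$ directly:

```python
from itertools import permutations, product

def parity(p):
    # Number of inversions mod 2: 0 for even permutations, 1 for odd.
    n = len(p)
    return sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n)) % 2

def count_free_cell_method(n):
    """Enumerate the first n-1 rows; for each configuration, add 2**k,
    where k is the number of 'free' last-row cells (a 1 there completes
    no odd permutation)."""
    odd = [p for p in permutations(range(n)) if parity(p)]
    total = 0
    for bits in product((0, 1), repeat=n * (n - 1)):
        a = [bits[i * n:(i + 1) * n] for i in range(n - 1)]
        # A column j is blocked if some odd permutation ending at column j
        # is all-ones along the first n-1 rows; a 1 at (n-1, j) would
        # complete that permutation.
        blocked = {p[n - 1] for p in odd
                   if all(a[i][p[i]] for i in range(n - 1))}
        total += 2 ** (n - len(blocked))
    return total
```

This works because each odd permutation touches exactly one cell of the last row, so a last row avoids all odd permutations precisely when it has a $0$ in every blocked column; the valid last rows are then exactly the $2^k$ subsets of the free cells. The sketch reproduces $a(2)=12$, $a(3)=343$, $a(4)=34997$ while iterating over only $2^{n(n-1)}$ prefixes.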

Improvements 1-4 are implemented in https://github.com/Culter/PermDet. They result in the following values:

$a(0)=1$
$a(1)=2$
$a(2)=12$
$a(3)=343$
$a(4)=34997$
$a(5)=12515441$
$a(6)=15749457081$
$a(7)=72424550598849$
$a(8)=1282759836215548737$

In fact, $a(9)$ is well within reach of this algorithm, but computing it would require greater-than-$64$-bit arithmetic, which I haven't implemented.

Chris Culter