Enumerating all subsets of a family of sets by size and size of union

Question

Let $\mathcal B$ be a family of subsets of $\{1,2,\dots, N\}$. (Not necessarily all of them, just some collection.) Let also $n\leq N$. (Note: originally $n=N$.)

We define the polynomial

$$ g(x,y) = \sum_{J\subseteq \mathcal B, \left|\bigcup_{A \in J} A \right|\leq n} x^{|J|}y^{\left|\bigcup_{A \in J} A \right|} $$

That is: we count all subsets of $\mathcal B$ by their size and by the size of their union (and can ignore subsets whose union is larger than $n$).

For example, if $\mathcal{B} = \{ \{1\}, \{2,4\}, \{1,2,3\}, \{2,4,5\} \}$, an example subset is $J = \{ \{1\}, \{2,4\}, \{2,4,5\} \}$, which gives the term $x^3y^4$, as $|J| = 3$ and the union is $\{1,2,4,5\}$, which has size $4$. The whole polynomial for this $\mathcal B$ (with $n = N = 5$) is

$$ x^{4} y^{5} + 2 x^{3} y^{5} + 2 x^{3} y^{4} + x^{2} y^{5} + 2 x^{2} y^{4} + 3 x^{2} y^{3} + 2 x y^{3} + x y^{2} + x y + 1 $$
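
As a sanity check on the definition, here is a minimal brute-force sketch in Python. The function name `brute_force_g` and the dictionary representation of the coefficients are my own choices for illustration, not part of the question. It walks through all $2^{|\mathcal B|}$ subfamilies, so it is only usable for tiny families like the example.

```python
from collections import Counter
from itertools import combinations

def brute_force_g(family, n):
    """Map (|J|, |union of J|) to the coefficient of x^|J| y^|union| in g.

    `family` is a list of frozensets; subfamilies whose union has more
    than n elements are skipped, as in the definition of g.
    """
    coeffs = Counter()
    for size in range(len(family) + 1):
        for J in combinations(family, size):
            union = frozenset().union(*J)  # empty J gives the empty union
            if len(union) <= n:
                coeffs[(size, len(union))] += 1
    return dict(coeffs)

family = [frozenset({1}), frozenset({2, 4}),
          frozenset({1, 2, 3}), frozenset({2, 4, 5})]
coeffs = brute_force_g(family, n=5)
for (r, u), c in sorted(coeffs.items(), reverse=True):
    print(f"{c} * x^{r} * y^{u}")
```

Each printed line is one term of the polynomial, e.g. the key `(3, 4)` collects the two subfamilies of size $3$ whose union has size $4$.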

Is there a more efficient way of calculating $g$ for a given $\mathcal B$ than just going through all the $2^{|\mathcal B|}$ subsets? Can we use the poset structure of $\mathcal B$ somehow?

Background

I encountered this problem while thinking about inclusion-exclusion for solving this problem. So in that case we would have $N=100, n=10$ and $\mathcal B$ is all the subsets of at most size $n$ that sum to $N$. I think $|\mathcal B | = 435886$.

If we find $g(x,y)$, then I believe the solution to the link's question is given by turning $x^r$ into $(-1)^r$ and $y^u$ into $\binom{100-u}{10-u}$. So we would only need the value of $g$ after plugging in $x=-1$, and the question is whether that operation on $y^u$ can be expressed somehow...

And I just realized: $\binom{N-u}{n-u} = 0$ if $u>n$, so we can add the restriction $u\leq n$.
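
To make the substitution concrete, here is a small sketch in Python that applies it to a coefficient table of $g$. The function name `substitute` and the `{(r, u): coefficient}` layout are assumptions made for this illustration; the table used is the one for the small example family from the question, not for the $N=100$ case.

```python
from math import comb

def substitute(coeffs, N, n):
    """Replace x^r by (-1)^r and y^u by C(N-u, n-u) and sum the terms.

    Terms with u > n contribute nothing, which is why they may be
    dropped from g in the first place.
    """
    total = 0
    for (r, u), count in coeffs.items():
        if u > n:
            continue  # C(N-u, n-u) is zero here
        total += count * (-1) ** r * comb(N - u, n - u)
    return total

# Coefficients of the example polynomial, keyed by (exponent of x, exponent of y).
example = {(0, 0): 1, (1, 1): 1, (1, 2): 1, (1, 3): 2, (2, 3): 3,
           (2, 4): 2, (2, 5): 1, (3, 4): 2, (3, 5): 2, (4, 5): 1}
print(substitute(example, N=5, n=3))
```

Only the value at $x=-1$ matters for this step, so in principle the exponent of $x$ never has to be tracked separately from its sign.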

I can think of a DP algo with memory $\mathcal{O}(2^n \cdot |\mathcal{B}|)$ and time $\mathcal{O}(2^n \cdot |\mathcal{B}|^2)$. Not sure if this helps your usecase or not. — EnEm, Jun 17 '24 at 20:02
Saw the background now. Feels like any algorithm which is developed for a general case $\mathcal{B}$, would be an overkill for your usecase. — EnEm, Jun 18 '24 at 15:06

score 1 · Answer 1 · answered Jun 17 '24 at 21:13

I describe below an algorithm using Dynamic Programming.

Let $A_k = \{0, 1, 2, \dots, k\}$ and $A_k^* = A_k \setminus \{0\}$ for all $k\in \mathbb{N}^*$. Also let $\mathcal{B} = \{\mathcal{B}_1, \mathcal{B}_2, \dots ,\mathcal{B}_{|\mathcal{B}|} \}$. Then I define a function $dp: A_{|\mathcal{B}|} \times \mathcal{P}(A_N^*) \times A_{|\mathcal{B}|} \rightarrow \mathbb{N}_0$ as follows $$dp(i, S, m) = \begin{cases} 1 & \text{if } i=0 \land S=\emptyset \land m=0 \\ 0 & \text{if } i=0 \land (S\neq\emptyset \lor m>0) \\ \sum\limits_{\substack{H\in\mathcal{P}(A_N^*) \\ H\cup \mathcal{B}_i = S}} dp(i-1, H, m-1) \;+\; dp(i-1, S, m) & \text{if } i>0 \end{cases}$$ where the sum is taken to be $0$ when $m = 0$ (there is no $dp(i-1, H, -1)$).

Here $dp(i, S, m)$ denotes the number of subsets $J$ of $\{\mathcal{B}_1, \mathcal{B}_2, \dots, \mathcal{B}_i \}$ such that $|J| = m$ and $\bigcup_{H\in J} H = S$. Then we know $$ g(x, y) = \sum_{S \in \mathcal{P}(A_N^*)} \; \sum_{m \in A_{|\mathcal{B}|}} dp(|\mathcal{B}|, S, m) \cdot x^m y^{|S|} $$
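
A direct transcription of this recurrence into Python could look like the sketch below. The names `g_from_recurrence` and `masks` are made up for illustration, sets are encoded as bitmasks (element $e$ is bit $e-1$), and memoisation via `lru_cache` stands in for the dp table. It deliberately keeps the naive inner sum over all $H$, so it is only meant for very small $N$.

```python
from functools import lru_cache

def g_from_recurrence(family, N):
    """Coefficients {(m, |S|): count} of g, computed straight from dp(i, S, m)."""
    # Encode each member B_i of the family as an N-bit mask.
    masks = [sum(1 << (e - 1) for e in members) for members in family]

    @lru_cache(maxsize=None)
    def dp(i, S, m):
        if m < 0:
            return 0  # no subfamily has negative size
        if i == 0:
            return 1 if (S == 0 and m == 0) else 0
        # Subfamilies that contain B_i: the rest must have some union H
        # with H | B_i == S and size m - 1.
        take = sum(dp(i - 1, H, m - 1)
                   for H in range(1 << N) if H | masks[i - 1] == S)
        # Subfamilies that do not contain B_i.
        return take + dp(i - 1, S, m)

    coeffs = {}
    for S in range(1 << N):
        for m in range(len(family) + 1):
            count = dp(len(family), S, m)
            if count:
                key = (m, bin(S).count("1"))
                coeffs[key] = coeffs.get(key, 0) + count
    return coeffs

family = [{1}, {2, 4}, {1, 2, 3}, {2, 4, 5}]
print(g_from_recurrence(family, N=5))
```

For the example family from the question this reproduces the coefficients of the polynomial given there.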

Implementing the above as is takes $\mathcal{O}(2^N \cdot N |\mathcal{B}|^2)$ memory and $\mathcal{O}(2^{2N} \cdot N |\mathcal{B}|^2)$ time. Some optimizations to bring these down:

I am assuming $N$ is small enough ($\le 64$) so we can use a $64$-bit system to store sets like $S \in \mathcal{P}(A_N^*)$, and do set operations like union by bitwise OR. This brings down memory and time by a factor of $N$.
We don't need the full $A_{|\mathcal{B}|} \times \mathcal{P}(A_N^*) \times A_{|\mathcal{B}|}$ dp table at all times: we can initialise a $\mathcal{P}(A_N^*) \times A_{|\mathcal{B}|}$ dp table for $i=0$ and sum up later terms in the same table (after handling some edge cases, such as the order in which entries are updated), which saves memory by a factor of $|\mathcal{B}|$.
For the general case of dp, instead of looping like

for S from 0 to 2^N - 1:
  for H from 0 to 2^N - 1:
    if (H BIT-OR B_i) = S:
      do stuff

we can loop like

for H from 0 to 2^N - 1:
  S <- (H BIT-OR B_i)
  do stuff

saving time by a factor of $2^N$.

So finally, this algo works in $\mathcal{O}(2^N \cdot |\mathcal{B}|)$ memory and $\mathcal{O}(2^{N} \cdot |\mathcal{B}|^2)$ time.
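
Putting the three optimizations together, a sketch of the resulting loop in Python might look as follows. The function name `g_in_place` and the table layout are my own choices for illustration. The one edge case handled here is the update order: $m$ runs downwards so that entries for size $m-1$ are still the values from step $i-1$ when they are read.

```python
def g_in_place(family, N):
    """Coefficients {(m, |S|): count} of g using one 2^N x (|B|+1) table."""
    masks = [sum(1 << (e - 1) for e in members) for members in family]
    K = len(family)
    # table[S][m] plays the role of dp(i, S, m) for the current i.
    table = [[0] * (K + 1) for _ in range(1 << N)]
    table[0][0] = 1  # i = 0: only the empty subfamily
    for mask in masks:
        # Descending m: column m - 1 is untouched while column m is updated.
        for m in range(K, 0, -1):
            for H in range(1 << N):
                if table[H][m - 1]:
                    table[H | mask][m] += table[H][m - 1]
    coeffs = {}
    for S in range(1 << N):
        for m in range(K + 1):
            if table[S][m]:
                key = (m, bin(S).count("1"))
                coeffs[key] = coeffs.get(key, 0) + table[S][m]
    return coeffs

family = [{1}, {2, 4}, {1, 2, 3}, {2, 4, 5}]
print(g_in_place(family, N=5))
```

The "skip $\mathcal{B}_i$" term $dp(i-1, S, m)$ needs no code at all, since the old value simply stays in the table.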

I vaguely remember another optimisation which brings the time down to $\mathcal{O}(2^{N/2} \cdot |\mathcal{B}|^2)$, something like meet-in-the-middle during the general case loop. Will edit that in if I figure it out.

Thanks for your answer! Sadly the factor $2^N$ is too big for intended usage $N=100$ (I edited that to the question, should've mentioned in the first place). And the other factor is kind of big too. But I also realized that we can restrict $|J|\leq 10$, if that's any help. — ploosu2, Jun 18 '24 at 07:41
