Partitioning a multiset into multisets of fixed sizes

Question

Say we have a multiset $S(\mathbf{d}$) where $\mathbf{d}$ is a list of $l$ numbers and the multiplicity of the $i$th element of $S$ is $d_i$. The cardinality $N$ of $S$ is $\sum d_i$.

We want to partition $S$ into $m$ multisets of size $k_i$ respectively, so that $\sum k_i = \sum d_i = N$. How many ways can we do this?

In my mind this is a generalization of the multinomial coefficient $\binom{n}{k_1,k_2,\ldots,k_m}$ representing the number of ways to partition a set of $n=\sum k_i$ objects into $m$ bins of sizes $k_i$, to a sort of number like $\binom{\mathbf{d}}{k_1,k_2,\ldots,k_m}$ or $\binom{\mathbf{d}}{\mathbf{k}}$ representing the number of ways to partition a multiset of $n=\sum k_i = \sum d_i$ into $m$ bins of sizes $k_i$.

There are a few special cases that are simpler to calculate:

If $m=1$, then clearly $k_1 = N$ and you're choosing the whole multiset. So $\binom{\mathbf{d}}{(N)} = 1$
If $m=2$, then you only have to handle choosing $k_1$ or $k_2$ elements from a multiset, because the rest will be the other set. So, as mentioned below, you can use a generating function and $\binom{\mathbf{d}}{(k_1,k_2)}$ is equal to the coefficient of $x^{k_1}$ or $x^{k_2}$ in $\prod\limits_{i=1}^l 1 + x^2 + \cdots + x^{d_i} = \prod\limits_{i=1}^l \frac{1-x^{d_i - 1}}{1 - x}$. But then you also need to account for the fact that order doesn't matter, which I'm not sure how to do. Like in the first example below, you would find that there are $3$ ways to choose $2$ elements, but there are only $2$ ways to split the multiset because you have to choose 2 of them that are compatible.

Examples

Let's say that $\mathbf{d} = (2, 2)$, so $S(\mathbf{d})$ might be $\{a, a, b, b\}$. Let $k_1 = k_2 = 2$, so we need to find all the ways of splitting $S$ into two sub-multisets of size $2$. There are exactly $2$ ways of doing this: $\{\{a,a\},\{b,b\}\}$ and $\{\{a,b\},\{a,b\}\}$, so $\binom{(2,2)}{(2,2)} = 2$.

Another example: $\mathbf{d} = (2,2)$, so $S(\mathbf{d})$ could be $\{a,a,b,b\}$. Let $k_1 = 1$, $k_2 = 1$, and $k_3 = 2$. There are $3$ ways of doing this: $\{\{a\},\{a\},\{b,b\}\}$, $\{\{b\},\{b\},\{a,a\}\}$, and $\{\{a\},\{b\},\{a,b\}\}$. So $\binom{(2,2)}{(1,1,2)}=3$.

My Attempts

I've tried to figure this out two ways. The first was to find a recurrence relation and some base cases, kind of how Stirling numbers of the second kind can be computed using the identity $S(n,k) = kS(n-1,k) + S(n-1,k-1)$. I tried to think about what happens if you already have a partition and want to add an element to the original multiset, but then you have to decide which bin that element will go into or whether or not to add a new bin.

I also tried to derive it in the way that multinomial coefficients are derived, by counting the number of ways to fill the first bin, and then the second, and so on. The number of ways to choose $k_1$ elements from the multiset to put in the first bin can be computed by finding the coefficient of $x^{k_1}$ in $\prod\limits_{i=1}^l 1+x+x^2+\cdots+x^{d_i}$, which isn't explicit but it's a start. But then, depending on which elements you chose, you don't know how to adjust your multiset to reflect the remaining elements.

Partitioning of a set of $n$ distinguished members is counted by the exponential Bell polynomial (and by Stirling number of the second kind and Bell number). This is the set partition problem. Maybe there is a possibility to change the calculation algorithm of this combinatorial numbers to consider multisets. — IV_, Jul 20 '18 at 14:46

Marko Riedel · Accepted Answer · 2019-01-25T22:56:18.127

It would appear that these are multisets of multisets which may be enumerated using the Polya Enumeration Theorem (PET). Let the multiset that is being drawn have factorization

$$\prod_{k=1}^m B_k^{\sigma_{k}}$$

where $k$ is the value of a term and $\sigma_k$ the number of times it ocurs and recall that we have $l$ types of items forming the source multiset

$$\prod_{k=1}^l A_k^{\tau_{k}}.$$

The answer is then given by

$$\left[\prod_{k=1}^l A_k^{\tau_{k}}\right] \prod_{k=1}^m Z\left(S_{\sigma_k}; Z\left(S_k; \sum_{k'=1}^l A_{k'}\right)\right).$$

In terms of combinatorial classes we have made use of the unlabeled class

$$\def\textsc#1{\dosc#1\csod} \def\dosc#1#2\csod{{\rm #1{\small #2}}} \textsc{MSET}_{=\sigma_k} \left(\textsc{MSET}_{=k} \left(\sum_{k'=1}^l \mathcal{A}_{k'}\right)\right).$$

As an example for ${2,2\choose 1,1,2} = 3$ we get

$$\textsc{MSET}_{=2} (\textsc{MSET}_{=1}(\mathcal{A_1}+\mathcal{A}_2)) \times \textsc{MSET}_{=1} (\textsc{MSET}_{=2}(\mathcal{A_1}+\mathcal{A}_2)).$$

As an additional example we find for ${2,2,4\choose 1,1,3,3} = 16$

$$\textsc{MSET}_{=2} (\textsc{MSET}_{=1}(\mathcal{A_1}+\mathcal{A}_2+\mathcal{A}_3)) \times \textsc{MSET}_{=2} (\textsc{MSET}_{=3}(\mathcal{A_1}+\mathcal{A}_2+\mathcal{A}_3)).$$

Here we have used the cycle index of the symmetric group $Z(S_n)$, which is computed from the recurrence by Lovasz which says that

$$Z(S_n) = \frac{1}{n} \sum_{l=1}^n a_l Z(S_{n-l}) \quad\text{where}\quad Z(S_0) = 1.$$

For this to be effective we need to compute the iterated cycle index when $Z(S_k)$ is substituted into $Z(S_{\sigma_k}).$ This is accomplished with the substitution rule for substitution of the former into the latter:

$$a_q = Z(S_k;\; a_1=a_q, \; a_2=a_{2q}, \; a_3=a_{3q}, \; \ldots).$$

We have used the notation $Z(S_k; F)$ for substitution of a generating function and on the previous line, the notation for substitution into the variables of the cycle index. This is in fact all we need and we can start computing some of these multiset coefficients. For example we find for the example given by OP the cycle index

$$Z(B_1^2 B_2) = 1/4\,{a_{{1}}}^{4}+1/2\,{a_{{1}}}^{2}a_{{2}}+1/4\,{a_{{2}}}^{2}.$$

Continuing with the example we get

$$Z(B_1^2 B_2; A_1+A_2) = 1/4\, \left( A_{{1}}+A_{{2}} \right) ^{4} +1/2\, \left( A_{{1}}+A_{{2}} \right) ^{2} \left( {A_{{1}}}^{2}+{A_{{2}}}^{2} \right) \\ +1/4\, \left( {A_{{1}}}^{2}+{A_{{2}}}^{2} \right) ^{2} \\ = {A_{{1}}}^{4}+2\,{A_{{1}}}^{3}A_{{2}} +3\,{A_{{1}}}^{2}{A_{{2}}}^{2}+2\,A_{{1}}{A_{{2}} }^{3}+{A_{{2}}}^{4}$$

and we confirm the value $3$ obtained by OP. This algorithm will make it possible to compute cycle indices not obtainable by enumeration. As an extra example we find the following excerpt from the cycle index for $[2,2,2,3,5,5]:$

$$Z(B_2^3 B_3 B_5^2) = \ldots +{\frac {11\,{a_{{1}}}^{8}a_{{2}}a_{{4}}a_{{5}}}{7200}} +{\frac {49\,{a_{{1}}}^{7}{a_{{2}}}^{2}a_{{3}}a_{{5}}}{14400}} \\ +{\frac {5\,{a_{{1}}}^{7} a_{{2}}{a_{{3}}}^{2}a_{{4}}}{1152}} +{\frac {1021\,{a_{{1}}}^{6}{a_{{2}}}^{3}a_{{3}}a_{{4}}}{69120}} +{\frac {43\,{a_{{1}}}^{7}a_{{2}}a_{{4}}a_{{6}}}{17280}}+\ldots$$

Here are some additional values that may assist the reader who is investigating this problem to verify the results of their approach:

$${1,3,3\choose 3,4} = 7, \; {2,3,3\choose 4,4} = 5, \; {2,3,3\choose 2,2,4} = 16 \quad\text{and}\quad {1,2,3,3\choose 2,3,4} = 87.$$

The Maple code for this problem was as follows.

with(combinat);


pet_cycleind_symm :=
proc(n)
option remember;

    if n=0 then return 1; fi;

    expand(1/n*
           add(a[l]*pet_cycleind_symm(n-l), l=1..n));
end;

pet_varinto_cind :=
proc(poly, ind)
local subs1, subs2, polyvars, indvars, v, pot, res;

    res := ind;

    polyvars := indets(poly);
    indvars := indets(ind);

    for v in indvars do
        pot := op(1, v);

        subs1 :=
        [seq(polyvars[k]=polyvars[k]^pot,
             k=1..nops(polyvars))];

        subs2 := [v=subs(subs1, poly)];

        res := subs(subs2, res);
    od;

    res;
end;

pet_cycleind_comp :=
proc(idxtrg, idx)
local varstrg, vars, vt, sbstrg, sbs, len;

    varstrg := indets(idxtrg);
    vars := indets(idx);

    sbstrg := [];

    for vt in varstrg do
        len := op(1, vt);

        sbs :=
        [seq(v = a[op(1, v)*len], v in vars)];

        sbstrg :=
        [op(sbstrg),
         a[len] = subs(sbs, idx)];
    od;

    expand(subs(sbstrg, idxtrg));
end;

pet_cycleind_mset :=
proc(msetlst)
option remember;
local mset, res, ent, idxtrg, idx;

    mset := convert(msetlst, `multiset`);

    res := 1;

    for ent in mset do
        idx := pet_cycleind_symm(ent[1]);
        idxtrg := pet_cycleind_symm(ent[2]);

        res := res *
        pet_cycleind_comp(idxtrg, idx);
    od;

    expand(res);
end;


GENF :=
proc(src, msetlst)
local vars, srcp, res, gf, term;

    vars := add(A[q], q=1..nops(src));
    srcp := mul(A[q]^src[q], q=1..nops(src));

    gf := expand(pet_varinto_cind
                 (vars, pet_cycleind_mset(msetlst)));

    if not type(gf, `+`) then
        gf := [gf];
    fi;

    res := 0;

    for term in gf do
        if type(srcp/term, `polynom`) then
            res := res + term;
        fi;
    od;

    res;
end;

The syntax to compute ${\mathrm{A}\choose \mathrm{B}}$ is documented by the following examples:

> GENF([1,2,3,3], [2,3,4]);

                        2     3     3
            87 A[1] A[2]  A[3]  A[4]

> GENF([1,2,3,3], [2,2,5]);

                        2     3     3
            33 A[1] A[2]  A[3]  A[4]

> GENF([1,1,1,1], [2,2]);  

             3 A[1] A[2] A[3] A[4].

The last one is $\frac{1}{2} {4\choose 2}.$

Addendum. There is a slight improvement on this algorithm at the following MSE link.

This is fantastic! I'm having some trouble understanding why this works, and specifically why this is a multiset of multisets, which I think is related to why we're itersting the cycle index of the symmetric group. What do you mean by a factorization of a multiset, and source multiset? — JJW5432, Jul 26 '18 at 00:14
Thank you. The cycle index of the symmetric group implements the multiset operator in the symbolic method. Now your subsets are multisets and since we may choose them more than once we get multisets of multisets. The support of a multiset is a partition and in cycle indices these are usually written in factorized form, hence I chose to use this notation here as well. — Marko Riedel, Jul 26 '18 at 09:22
For more information consult Wikipedia on the symbolic method. — Marko Riedel, Jul 26 '18 at 09:31
I've read the first chapter of Flajolet and Sedgewick's "Analytic Combinatorics" on the symbolic method, so now I understand more about combinatorial classes and generator functions. Can you explain why the construction $$\def\textsc#1{\dosc#1\csod} \def\dosc#1#2\csod{{\rm #1{\small #2}}} \textsc{MSET}{=\sigma_k} \left(\textsc{MSET}{=k} \left(\sum_{k'=1}^l \mathcal{A}_{k'}\right)\right)$$ is the correct one? — JJW5432, Jul 28 '18 at 03:39
This part follows by inspection for the user of the symbolic method. I have added two examples to clarify the notation. The book by Harary and Palmer Graphical Enumeration is a useful reference for the nested cycle indices. I also checked my work using several enumeration routines for cycle indices and multisets. — Marko Riedel, Jul 28 '18 at 12:50
Okay I think the last bit I don't understand is your notation for cycle indices. You provide a recurrence relation for the cycle index of symmetric groups, but then you talk about cycle indices like $Z(B_1^2B_2) = \frac{1}{4}a_1^4 + \frac{1}{2}a_1^2a_2 + \frac{1}{4}a_2^2$. What group is that the cycle index of and how do you calculate it? — JJW5432, Aug 07 '18 at 16:45
Given a multiset of values and their multiplicities construct a row of slots by sequentially placing adjacent slots, $k$ of them for each multiset of size $k.$ Now we may permute the slots corresponding to a single multiset in $k!$ ways (symmetric group $S_k$) and we may also simultaneously permute the slots corresponding to multisets of the same size in $\sigma_k!$ ways (symmetric group $S_{\sigma_k}$). (...) — Marko Riedel, Aug 07 '18 at 17:03
(...) The totality of all combinations of these permutations (inside a multiset and between multisets of the same size) forms a group and it is the cycle index of that group acting on the slots that we compute with the substitution rule cited above. The slots are the recipients of the variables $\mathcal{A}_k.$ N.B. The elements $k$ of the multiset that is the source themselves represent multisets of size $k.$ — Marko Riedel, Aug 07 '18 at 17:14
Ah that makes sense, and I understand why that is what's going on with the $\text{MSET}$ notation where you expanded out the product. But how do you calculate those polynomials with the $Z(B_1^2B_2)$ from the cycle groups for some symmetric groups? I think it's related to the substitution rule you mentioned, but I don't understand that so well. — JJW5432, Aug 07 '18 at 19:06
For example $Z(B_1^2B_2) = \frac{1}{4}a_1^4 + \frac{1}{2}a_1^2a_2 + \frac{1}{4}a_2^2$ is equal to $Z(S_2)^2$; why is that what you want here? — JJW5432, Aug 07 '18 at 19:08
For $B_1^2 B_2$ we get two multisets containing one element and one containing two elements, for a total of four slots. We get two permutations of the two multisets $B_1$ containing one element: identity and exchange. There are two permutations of the two slots of the multiset $B_2:$ identity and exchange. Collect and factor to obtain the cycle index. — Marko Riedel, Aug 07 '18 at 19:31
And because there are two permutations for both, both are $S_2$ which is how you get $Z(S_2)^2$? And then how do you do that formulaically/programmatically? — JJW5432, Aug 07 '18 at 19:35
It is all documented in the two examples and the Maple code (first example for your query). — Marko Riedel, Aug 07 '18 at 19:54

score 3 · Answer 2 · answered Jan 06 '19 at 07:14

I'm posting an implementation of Marko Riedel's algorithm in Sage because Sage is open source and widely available. It works on all the examples he posted, but for larger examples like $\binom{49, 49, 1, 1}{10, 10, 10, 10, 10, 10, 10, 10, 10, 10}$ it's hanging.

#!/usr/bin/env sage

import sys
from sage.all import *

Sym = SymmetricFunctions(QQ)
p = Sym.powersum()

def sub_cycle_index(Zout, Zin):
    """Substitutes Zin into Zout to produce Zout evaluated at Zin.

    This is accomplished by replacing p[i] in Zout with Zin, but with
    every p[j] in Zin replaced with p[i*j].
    """

    return p.sum(c * p.prod(Zin.frobenius(i) for i in mu) for mu, c in Zout)

def multiset_cycle_index(ms):
    """The cycle index of the given multiset, given by the formula

    .. math:: \prod_{\{k\}}\left( Z(S_{\sigma_k}; Z(S_k))\right)

    where :math:`\{k\}` are the elements of the multiset and
    :math:`\sigma_k` is the multiplicity of the element :math:`k`.
    """

    Z = lambda i: SymmetricGroup(i).cycle_index()
    return p.prod(sub_cycle_index(Z(sk), Z(k)) for k, sk in ms.items())

def list_to_multiset(els):
    """Converts a list of elements representing a multiset to a dictionary
    where the keys are the elements of the multiset and the values are
    the multiplicities.
    """

    ms = {}
    for x in els:
        ms[x] = ms.get(x,0) + 1
    return ms

def mset_choose(s, d):
    """Compute the "multiset coefficient" :math:`\binom{s}{d}`."""

    A = PolynomialRing(QQ, len(s), 'A').gens()
    mono = prod(a^i for a, i in zip(A, s))
    Z = multiset_cycle_index(list_to_multiset(d))
    return Z.expand(len(A), A).coefficient(mono)

if __name__ == '__main__':
    if len(sys.argv) != 3:
        print("Usage: %s 's_1, s_2, ..' 'd_1, s_2, ..'" % sys.argv[0])
        print("Outputs the number of ways the multiset s can be partitioned into multisets of sizes d_i.")
        sys.exit(1)

    s = map(int, sys.argv[1].split(' '))
    d = map(int, sys.argv[2].split(' '))

    if sum(s) != sum(d):
        print("The sum of the elements of s must equal the sum of the elements of d")
        sys.exit(1)

    print(mset_choose(s, d))

Partitioning a multiset into multisets of fixed sizes

Examples

My Attempts

2 Answers2

Linked

Related