
How can I count all the ordered lists (of any length, including the empty list) that can be made from the letters of a given word, using each letter at most as many times as it occurs in the word? Let's denote this number by $f(w)$. Is there a better way than the following, which groups the lists by how many copies of each repeated character they contain?

Given a word $w$ of length $n$, let $\textbf c$ be the vector of its character counts. Let $m$ be the number of $1$'s in $\textbf c$ (the number of characters that occur exactly once), and form the vector $\textbf t_{tot}$ by dropping the $1$'s from $\textbf c$.

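$$f(w) = \sum_{\textbf t} \sum_{k=0}^{n-|\textbf t|_1} \binom{k+|\textbf t|_1}{k,\ t_1,\ \dots,\ t_r} (m)_k$$

where $\textbf t = (t_1, \dots, t_r)$ runs over all integer vectors satisfying $\textbf 0 \leq \textbf t \leq \textbf t_{tot}$ componentwise, $r$ being the length of $\textbf t_{tot}$. I call these "subtakes". The notation $|\textbf t|_1$ means the sum of the components of $\textbf t$. The lower entries of the multinomial coefficient are $k$ followed by the components of $\textbf t$, and $(m)_k = mPk = m(m-1)\cdots(m-k+1)$ is the falling factorial. Here $k$ is the number of distinct single characters used in the list; terms with $k > m$ vanish because $(m)_k = 0$ there.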
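As a small hand-checkable instance of the formula, take the hypothetical word $w = \text{"aab"}$ (not part of the original example). Then $n = 3$, $\textbf c = (2, 1)$, $m = 1$, $\textbf t_{tot} = (2)$, so $\textbf t$ is a single number $t \in \{0, 1, 2\}$:

$$
\begin{aligned}
f(\text{aab}) &= \sum_{t=0}^{2} \sum_{k=0}^{3-t} \binom{k+t}{k,\ t} (1)_k \\
&= \underbrace{(1 + 1)}_{t=0} + \underbrace{(1 + 2)}_{t=1} + \underbrace{(1 + 3)}_{t=2} = 9,
\end{aligned}
$$

since every term with $k \geq 2$ has $(1)_k = 0$. This matches a direct listing: the empty list, a, b, aa, ab, ba, aab, aba, baa.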

Example: $w = \text{"missisippi"}$.

$\textbf c=(1, 4, 3, 2)$ (the counts of m, i, s, p)

$\textbf t_{tot} = (4,3,2)$

$\textbf t$ runs over the $5 \cdot 4 \cdot 3 = 60$ vectors [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 0), (0, 1, 1), (0, 1, 2), (0, 2, 0), (0, 2, 1), (0, 2, 2), (0, 3, 0), (0, 3, 1), (0, 3, 2), (1, 0, 0), (1, 0, 1), (1, 0, 2), (1, 1, 0), (1, 1, 1), (1, 1, 2), (1, 2, 0), (1, 2, 1), (1, 2, 2), (1, 3, 0), (1, 3, 1), (1, 3, 2), (2, 0, 0), (2, 0, 1), (2, 0, 2), (2, 1, 0), (2, 1, 1), (2, 1, 2), (2, 2, 0), (2, 2, 1), (2, 2, 2), (2, 3, 0), (2, 3, 1), (2, 3, 2), (3, 0, 0), (3, 0, 1), (3, 0, 2), (3, 1, 0), (3, 1, 1), (3, 1, 2), (3, 2, 0), (3, 2, 1), (3, 2, 2), (3, 3, 0), (3, 3, 1), (3, 3, 2), (4, 0, 0), (4, 0, 1), (4, 0, 2), (4, 1, 0), (4, 1, 1), (4, 1, 2), (4, 2, 0), (4, 2, 1), (4, 2, 2), (4, 3, 0), (4, 3, 1), (4, 3, 2)]

$f(w) = 38848$.

I have coded this in SageMath:

```
import itertools
from collections import Counter

def subtakes(a):
    yield from itertools.product(*[range(v+1) for v in a])

def countOrdLists(w):
    lC = list(Counter(w).values())  # char counts
    m = sum(1 for x in lC if x == 1)  # number of single chars
    ret = 0
    for t in subtakes([v for v in lC if v > 1]):
        ret += sum(multinomial([k] + list(t)) * falling_factorial(m, k)
                   for k in range(len(w) - sum(t) + 1))
    return ret

def countOrdListsCheckWithBruteForce(w):
    ret = 0
    for r in range(len(w) + 1):
        for p in Permutations(w, r):
            ret += 1
            #print(p)
    return ret

word = "missisippi"
print(countOrdLists(word))
print(countOrdListsCheckWithBruteForce(word))
```
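For readers without SageMath, here is a minimal plain-Python sketch of the same formula. It assumes Python 3.8+ for `math.perm`, which returns 0 when $k > m$ and so plays the role of Sage's `falling_factorial(m, k)`. The helper `multinomial` and the other function names are my own stand-ins, not Sage's API. The brute-force check collects distinct tuples from `itertools.permutations` in a set, so it is only practical for short words.

```python
import itertools
from collections import Counter
from math import factorial, perm


def multinomial(parts):
    """(sum of parts)! / prod(p!), a stand-in for Sage's built-in multinomial."""
    result = factorial(sum(parts))
    for p in parts:
        result //= factorial(p)
    return result


def count_ord_lists(w):
    counts = list(Counter(w).values())    # character counts
    m = sum(1 for x in counts if x == 1)  # number of single characters
    t_tot = [x for x in counts if x > 1]  # counts of repeated characters
    total = 0
    for t in itertools.product(*(range(v + 1) for v in t_tot)):  # all subtakes
        # perm(m, k) is the falling factorial (m)_k and is 0 for k > m
        total += sum(multinomial([k, *t]) * perm(m, k)
                     for k in range(len(w) - sum(t) + 1))
    return total


def count_ord_lists_brute(w):
    # distinct arrangements of every length r; only feasible for short words
    return sum(len(set(itertools.permutations(w, r))) for r in range(len(w) + 1))


print(count_ord_lists("missisippi"))
print(count_ord_lists("banana"), count_ord_lists_brute("banana"))
```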

Idea: the summand depends only on the multiset of components of $\textbf t$, so (after first sorting $\textbf t_{tot}$) we could restrict the outer sum to non-decreasing $\textbf t$. But how do we count how many unsorted $\textbf t$ correspond to a particular sorted one?

This code generates all sorted subtakes, but the coefficient (multiplicity) that goes with each one should somehow be computed along the way as the subtake is generated.

```
def subtakesD(a):
    a = sorted(a)
    def make(b):
        if len(b) == len(a):
            yield tuple(b)
            return
        first = 0 if len(b) == 0 else b[-1]  # ensure non-decreasing
        last = a[len(b)]
        for v in range(first, last+1):
            yield from make(b+[v])
        return
    yield from make([])
    return
```

EDIT
The generating function solution coded in SageMath:

from collections import Counter

```
def countOrdListsGF(w):
    c = Counter(w).values()
    R.<z> = QQ[]
    f = prod(sum(1/factorial(k)*z^k for k in range(m+1)) for m in c)
    #return integral(e^(-x)*f(x), x, 0, infinity)
    return sum(a*factorial(j) for j,a in enumerate(f.list()))
```
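The same generating-function computation can be sketched in plain Python without a CAS, assuming exact rational arithmetic via `fractions.Fraction` and a hand-written polynomial product on coefficient lists (lowest degree first). The names `poly_mul` and `count_ord_lists_gf` are hypothetical. It returns both the per-length counts $a_j = j!\,[z^j]f$ and their sum.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def count_ord_lists_gf(w):
    f = [Fraction(1)]
    for c in Counter(w).values():
        # truncated exponential 1 + z + z^2/2! + ... + z^c/c! for a letter occurring c times
        f = poly_mul(f, [Fraction(1, factorial(k)) for k in range(c + 1)])
    # j! times the coefficient of z^j is the number of j-letter words
    per_length = [int(a * factorial(j)) for j, a in enumerate(f)]
    return per_length, sum(per_length)


print(count_ord_lists_gf("missisippi"))
```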

ploosu2
In other words, you want to count the number of 'words' (i.e. distinct character strings) that can be made from the letters of a given 'word'. If you expect to only calculate this via code, then a simple recursive anagram routine will suffice. – Daniel Mathias Dec 05 '21 at 13:12
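The comment does not include code; the following is a minimal memoized sketch of what such a recursive routine could look like (the name `count_words` is hypothetical). A word is either empty or a first letter followed by a word built from the remaining letters, and since only the multiset of remaining counts matters, the recursion is cached on the sorted tuple of counts.

```python
from collections import Counter
from functools import lru_cache


def count_words(w):
    """Distinct strings (including the empty one) using each letter at most as often as in w."""

    @lru_cache(maxsize=None)
    def count(counts):
        # counts: sorted tuple of remaining multiplicities, one entry per distinct letter
        total = 1  # the empty word
        for i, c in enumerate(counts):
            if c > 0:
                # use one copy of letter i as the first letter, then recurse on the rest
                rest = list(counts)
                rest[i] -= 1
                total += count(tuple(sorted(rest)))
        return total

    return count(tuple(sorted(Counter(w).values())))


print(count_words("missisippi"))
```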

1 Answer

Here is a solution using exponential generating functions. This approach is most convenient if you have access to a computer algebra system. I used Mathematica, but I suppose SageMath or Wolfram Alpha should work as well. Readers not familiar with generating functions may find many resources in the answers to this question: How can I learn about generating functions?

The exponential generating function for the number of $n$-letter words taken from MISSISIPPI is $$f(x) = (1+x) \left( 1+x+\frac{1}{2!} x^2 \right) \left( 1+x+\frac{1}{2!} x^2 + \frac{1}{3!} x^3 \right) \left( 1+x+\frac{1}{2!} x^2 + \frac{1}{3!} x^3 + \frac{1}{4!} x^4 \right)$$ with one factor per distinct letter (M, P, S, I), each truncated at the number of times that letter occurs. (Note: the usual spelling is MISSISSIPPI, with four S's. I have stuck with the spelling in the OP for the sake of consistency.) The number of $n$-letter words is the coefficient of $\frac{1}{n!} x^n$ when $f(x)$ is expanded: $$\begin{aligned} f(x) = 1 &+ 4x + \frac{15}{2!} x^2 + \frac{53}{3!}x^3 + \frac{175}{4!}x^4 \\ &+ \frac{535}{5!}x^5 + \frac{1490}{6!}x^6 + \frac{3675}{7!}x^7 + \frac{7700}{8!}x^8 \\ &+ \frac{12600}{9!}x^9 + \frac{12600}{10!}x^{10} \end{aligned}$$ so, for example, the number of $5$-letter words is $535$. If we add all these counts together, $1+4+15+ \dots + 12600$, the sum is $38848$. Note that this total includes one zero-length word.
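The first few coefficients can be cross-checked by brute force, assuming it is acceptable to enumerate all $10!/(10-n)!$ letter sequences for small $n$ and deduplicate them with a set. The helper name `count_words_of_length` is hypothetical; lengths beyond about 6 quickly become expensive this way.

```python
from itertools import permutations


def count_words_of_length(w, n):
    # distinct n-tuples of letters drawn from w; set() removes duplicates caused by repeated letters
    return len(set(permutations(w, n)))


word = "missisippi"
print([count_words_of_length(word, n) for n in range(6)])  # → [1, 4, 15, 53, 175, 535]
```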

However, there is a shortcut, provided our algebra system supports integration. In general, if $$f(x) = \sum_{n=0}^{\infty} \frac{a_n}{n!} x^n$$ then since $$\int_0^{\infty} e^{-x} x^n \; dx = n!$$ we have $$\sum_{n=0}^{\infty} a_n = \int_0^{\infty} e^{-x} f(x) \; dx$$ Using Mathematica to evaluate $$\int_0^{\infty} e^{-x} f(x) \; dx$$ we find the result is $38848$, agreeing with our previous result.
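The integral identity can also be checked numerically without a computer algebra system. This is a rough sketch assuming ordinary floating point and composite Simpson's rule; the cutoff $x = 120$ and the 24000 subintervals are my own choices, and the tail beyond the cutoff is negligible because of the $e^{-x}$ factor. The function $f$ is built from the factored form above, not from the expanded coefficients, so the check does not simply re-sum the known counts.

```python
import math


def f(x):
    """Factored EGF for MISSISIPPI: one truncated exponential per letter count."""
    result = 1.0
    for c in (1, 2, 3, 4):  # letter counts of M, P, S, I
        result *= sum(x**k / math.factorial(k) for k in range(c + 1))
    return result


def simpson(g, lo, hi, steps):
    """Composite Simpson's rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


# the e^{-x} factor makes the contribution beyond x = 120 negligible
integral = simpson(lambda x: math.exp(-x) * f(x), 0.0, 120.0, 24000)
print(round(integral, 6))
```

The printed value should agree with $38848$ up to small floating-point and discretization error.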

awkward