
I came across the following question:

"In how many ways can we select 4 letters from the word MISSISSIPPI?"

This question can be solved by considering the different possibilities as follows and adding the numbers:

Ways of selecting {(4 alike)+(3 alike and 1 different)+(2 alike and 2 other alike)+(2 alike and 2 different)+(4 different)}
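For reference, here is how that case analysis works out. This is a worked check that assumes the letter counts M: 1, I: 4, S: 4, P: 2. Two different pairs, such as IISS, form a case of their own alongside "2 alike and 2 different":

$$\begin{aligned}
\text{4 alike (IIII, SSSS)} &: 2 \\
\text{3 alike (I or S) and 1 different} &: 2 \cdot 3 = 6 \\
\text{2 alike and 2 other alike (IISS, IIPP, SSPP)} &: \tbinom{3}{2} = 3 \\
\text{2 alike and 2 different} &: 3 \cdot \tbinom{3}{2} = 9 \\
\text{4 different (MISP)} &: 1 \\
\text{total} &: 2 + 6 + 3 + 9 + 1 = 21
\end{aligned}$$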

But I directly computed $\binom{11}{4}$ and got a much bigger number. What have I done wrong? Why should I approach all word-letter questions like this?

Edit: I'm truly sorry that I didn't look at other questions on the same topic first, but I am glad that I posted anyway, because I have received answers other than the ones based on generating functions, which are a bit too advanced for students at my level. The stars and bars method suggested by Shagnik is much easier to understand.

Please help! Thanks so much in advance :) Regards.

  • The mistake with $\binom{11}{4}$ is that it overcounts some solutions. For example, the choice of "MISS" would be counted several times, because you could choose any of the four I's and any pair of S's. One way to solve this is to assign a variable for the number of each letter selected, subject to each variable being non-negative and $x_M + x_I + x_S + x_P = 4$. The answer is then given by a "stars-and-bars" argument (https://en.wikipedia.org/wiki/Stars_and_bars_(combinatorics)). – Shagnik Aug 29 '16 at 11:51
  • Forgive me for my lack of knowledge, but while reading that page, I came across the term k-tuple, which I didn't understand. If it's not too much to ask, can you please explain the "Stars and bars" argument very briefly? –  Aug 29 '16 at 12:41
  • Possibly also a duplicate of http://math.stackexchange.com/questions/960046/number-of-ways-to-choose-a-sequence-of-three-letters-from-the-letters-of-mississ – Ian Miller Aug 29 '16 at 12:48
  • Oh shucks. Sorry! (This question is a little different though: I'm interested in the total number of combinations, not permutations. Nevertheless, this is not different enough to ask the question all over again.) –  Aug 29 '16 at 12:48
  • Ah, I just realised I overlooked a condition - you cannot choose more letters than there are. For instance, we must have $x_M \le 1$. This adds a couple of wrinkles to the calculation, so I'll post an answer explaining what needs to be done. – Shagnik Aug 29 '16 at 13:11
  • @Shagnik: Can you please briefly explain the "Stars and Bars" argument also in your answer? I feel like it is something that I will find use for in combinatorics but I don't understand the Wikipedia page. Please. Thanks :) –  Aug 29 '16 at 13:13
  • Sure, will do. By the way, a $k$-tuple just means an ordered collection of $k$ objects. In this setting, it refers to the vector $(x_M, x_I, x_S, x_P)$. – Shagnik Aug 29 '16 at 13:17
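To see the overcounting from the first comment concretely, here is a small brute-force sketch in Python (the helper name `selection_counts` is just for illustration). It picks 4 of the 11 positions in every possible way, which is what $\binom{11}{4}$ counts, and then groups the results by which letters were actually picked:

```python
from collections import Counter
from itertools import combinations
from math import comb

WORD = "MISSISSIPPI"


def selection_counts(word: str, r: int) -> Counter:
    """Map each selection of r letters (written as a sorted string)
    to the number of r-position subsets of `word` that produce it."""
    counts = Counter()
    for positions in combinations(range(len(word)), r):
        letters = "".join(sorted(word[p] for p in positions))
        counts[letters] += 1
    return counts


counts = selection_counts(WORD, 4)
print(comb(len(WORD), 4))  # 330 ways to choose 4 positions
print(len(counts))         # 21 distinct selections of letters
print(counts["IMSS"])      # 24 position choices all give "MISS"
```

Each of the 21 groups is one genuine selection. The binomial coefficient counts position choices instead, so MISS (stored sorted as IMSS) is counted once for every way of picking its I and its two S's, that is $4 \cdot \binom{4}{2} = 24$ times.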

2 Answers


We will determine the $4$ letters chosen by asking four questions:

  1. How many I's are there?
  2. How many M's are there?
  3. How many P's are there?
  4. How many S's are there?

We will represent the answers to these questions by $x_I, x_M, x_P$ and $x_S$ respectively*. What conditions do we have on these variables?

Since we have $4$ letters in total, we must have $x_I + x_M + x_P + x_S = 4$. Also, since each variable is counting something, it must be a non-negative integer. That is, $x_I, x_M, x_P, x_S \in \mathbb{Z}_{\ge 0}$.**


Stars and bars

Now there is a general method, often called the "stars and bars" argument, for counting the number of ways of writing a number as a sum of smaller numbers. In full generality, suppose we want to write some non-negative integer $n$ as a sum of $k$ non-negative integers; that is $$ n = y_1 + y_2 + ... + y_k, $$ where $y_1, y_2, ..., y_k \in \mathbb{Z}_{\ge 0}$.

We will draw every such sum using stars and bars; there will be $n$ stars, and $k-1$ bars splitting the separate summands. $y_1$ will be the number of stars before the first bar, $y_2$ will be the number of stars between the first and second bar, and so on, until $y_k$, which will be the number of stars after the last (that is, the $(k-1)$st) bar. For example, the sum $8 = 3 + 0 + 4 + 1$ would look like $$ \underbrace{* * *}_{y_1 = 3} | \underbrace{}_{y_2 = 0} | \underbrace{* * * *}_{y_3 = 4} | \underbrace{*}_{y_4 = 1} \; .$$
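The correspondence between sums and diagrams is easy to play with in code. In this minimal Python sketch (the function names `draw` and `read` are hypothetical), drawing a diagram just joins runs of stars with bars, and splitting on the bars recovers the summands, including the empty ones:

```python
def draw(parts: list[int]) -> str:
    """Draw y_1 + ... + y_k as y_1 stars, a bar, y_2 stars, a bar, ..."""
    return "|".join("*" * y for y in parts)


def read(diagram: str) -> list[int]:
    """Recover (y_1, ..., y_k) by counting the stars between bars."""
    return [len(run) for run in diagram.split("|")]


print(draw([3, 0, 4, 1]))   # ***||****|*
print(draw([4, 3, 1, 0]))   # ****|***|*|
print(read("***||****|*"))  # [3, 0, 4, 1]
```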

One important thing to note is that we are counting ordered sums, so the order of the summands is important. For example, the sum $8 = 4 + 3 + 1 + 0$ has the different diagram $$ \underbrace{* * * *}_{y_1 = 4} | \underbrace{* * *}_{y_2 = 3} | \underbrace{*}_{y_3 = 1} | \underbrace{}_{y_4 = 0} \; ,$$ which will be counted as a distinct solution.

How does this help us count the number of sums? Well, note that every such diagram is simply a permutation of $n$ stars and $k-1$ bars. Moreover, every such permutation corresponds to a different sum. Hence we only need to count these permutations of symbols.

Now there are a total of $n+k-1$ symbols in a line. If we choose $k-1$ of those symbols to be bars, that determines the diagram, since the remaining symbols must be stars. Hence the number of sums is $$\binom{n+k-1}{k-1}.$$ Note that this is equal to $\binom{n+k-1}{n}$, since we could instead choose which $n$ symbols are stars.
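As a sanity check on the formula, here is a sketch that builds every diagram by choosing which $k-1$ of the $n+k-1$ positions hold bars, reads off the sum, and compares the count with $\binom{n+k-1}{k-1}$. It uses the $n = 8$, $k = 4$ example from the diagrams above; `compositions` is just an illustrative name:

```python
from itertools import combinations
from math import comb


def compositions(n: int, k: int):
    """Yield every (y_1, ..., y_k) of non-negative integers with sum n,
    one for each choice of k-1 bar positions among n+k-1 symbols."""
    length = n + k - 1
    for bars in combinations(range(length), k - 1):
        # Pretend there are extra bars just before and just after the line;
        # each y_i is the number of symbols strictly between two bars.
        edges = (-1, *bars, length)
        yield tuple(edges[i + 1] - edges[i] - 1 for i in range(k))


n, k = 8, 4
sums = list(compositions(n, k))
print(len(sums), comb(n + k - 1, k - 1), comb(n + k - 1, n))  # 165 165 165
print((3, 0, 4, 1) in sums, (4, 3, 1, 0) in sums)           # True True
```

Because the two example sums appear as different tuples, the sketch also reflects that the summands are ordered.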


"Going down to Mississippi"

Returning to the problem at hand, this means the number of solutions to $x_I + x_M + x_P + x_S = 4$ is $\binom{7}{3} = 35$.

However, while every set of $4$ letters corresponds to one such sum, not every sum comes from a valid choice of letters. Our final restriction comes from the law of conservation of mass, adapted for mathematical wordplay. We cannot select more copies of a letter than appear in the word "MISSISSIPPI".

This means that, along with our non-negativity assumption, we must also have $x_I \le 4, x_M \le 1, x_P \le 2$ and $x_S \le 4$.

Now note that since we are only selecting $4$ letters in total, we can never have $x_I \ge 5$ or $x_S \ge 5$, hence we can ignore those requirements. This leaves only the conditions $x_M \le 1$ and $x_P \le 2$.

Another very useful observation is that we cannot violate both of these conditions at the same time. Indeed, if we had both $x_M \ge 2$ and $x_P \ge 3$, then $x_I + x_M + x_P + x_S \ge 5$. Hence from the $35$ solutions to the sum we counted earlier, we can subtract the solutions with $x_M \ge 2$ and the solutions with $x_P \ge 3$. Since no solution satisfies both those bounds, every bad solution is subtracted exactly once, as it should be.

So how many solutions have $x_M \ge 2$? Well, a nice way to count these is to introduce a new variable, setting $z_M = x_M - 2$. We then have $x_I + z_M + x_P + x_S = x_I + x_M + x_P + x_S - 2 = 4 - 2 = 2$, with $x_I, z_M, x_P, x_S \in \mathbb{Z}_{\ge 0}$. Since we now only require all our variables to be non-negative, rather than having one being at least $2$, we have reduced this to the stars and bars setting. Using the formula, there are $\binom{2 + 3}{3} = \binom{5}{3}= 10$ such solutions.

What about $x_P \ge 3$? This time we set $z_P = x_P - 3$, so that $x_I + x_M + z_P + x_S = 4 - 3 = 1$ with all variables non-negative. The stars and bars formula then gives $\binom{1+3}{3} = \binom{4}{3} = 4$ solutions in this case.

Hence the final answer, that is, the number of selections of $4$ letters from "MISSISSIPPI", is $$ \binom{7}{3} - \binom{5}{3} - \binom{4}{3} = 35 - 10 - 4 = 21. $$
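If you would like to double-check these numbers, a brute-force sketch in Python can list every non-negative solution of $x_I + x_M + x_P + x_S = 4$ directly and sort out the bad ones. It assumes the tuple order $(x_I, x_M, x_P, x_S)$ used above:

```python
from itertools import product

N = 4
# Copies of each letter in MISSISSIPPI, in the order (x_I, x_M, x_P, x_S).
CAPS = (4, 1, 2, 4)

# Every (x_I, x_M, x_P, x_S) of non-negative integers summing to N.
sols = [s for s in product(range(N + 1), repeat=4) if sum(s) == N]
too_many_m = [s for s in sols if s[1] >= 2]  # x_M >= 2
too_many_p = [s for s in sols if s[2] >= 3]  # x_P >= 3
both = [s for s in too_many_m if s[2] >= 3]
valid = [s for s in sols if all(x <= c for x, c in zip(s, CAPS))]

print(len(sols), len(too_many_m), len(too_many_p), len(both), len(valid))
# 35 10 4 0 21
```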


Final remarks

In closing, let me remark that this is not that much shorter than the case analysis you suggested in your question (indeed, I would suspect that is how you were meant to solve the problem). However, it is always good to know different solutions, and with a longer word this may have proven to be a shortcut.

That being said, with a longer word, several further complications could have arisen. Here we got lucky in that there were only two kinds of bad solutions, and no solution was bad in both ways. In general, this correction would have involved the Inclusion-Exclusion Principle, which would provide an added level of difficulty.
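For a longer word, where several bounds can be violated at once, the correction becomes a full inclusion-exclusion. Here is a hedged sketch of that general version (the name `bounded_solutions` is hypothetical): for each set of letters whose bound is violated, force those variables above their caps, shift them down as we did with $z_M$ and $z_P$, and add or subtract the resulting stars and bars count with alternating signs. For MISSISSIPPI it reduces to exactly the calculation above:

```python
from itertools import combinations
from math import comb


def bounded_solutions(n: int, caps: list[int]) -> int:
    """Count solutions of x_1 + ... + x_k = n with 0 <= x_i <= caps[i],
    by inclusion-exclusion over the set of violated upper bounds."""
    k = len(caps)
    total = 0
    for size in range(k + 1):
        for violated in combinations(range(k), size):
            # Require x_i >= caps[i] + 1 for each violated i, then shift
            # those variables down (like z_M = x_M - 2) to get a plain
            # stars-and-bars count with a smaller total.
            rest = n - sum(caps[i] + 1 for i in violated)
            if rest >= 0:
                total += (-1) ** size * comb(rest + k - 1, k - 1)
    return total


# Caps for (x_I, x_M, x_P, x_S) in MISSISSIPPI.
print(bounded_solutions(4, [4, 1, 2, 4]))  # 35 - 10 - 4 = 21
```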

The nice way to solve these problems in general is to use ordinary generating functions (or exponential generating functions if dealing with permutations), as suggested in @Beta's answer. These are a little more advanced, but very powerful and incredibly interesting, and you should hopefully encounter them in some combinatorics course at a later point.


Footnotes

*Why order the variables this way? In honour of Tyrion, of course!

**Some would call this set the naturals, denoted $\mathbb{N}$, but I prefer to have my naturals start from $1$.

Shagnik
  • Oh my God, thanks so so much for such an elaborate answer. I understood and feel very grateful :) –  Aug 29 '16 at 14:10
  • I have one doubt; you mentioned that (n+k-1)C(k-1) is the same as (n+k-1)Cn. Can you explain why? If there were k-1 bars lying around and I had n stars with me, how are there n positions b/w the bars for the stars to occupy? I'm not certain if I've expressed my doubt clearly enough and I hope that you understand what I'm trying to ask. –  Aug 29 '16 at 14:23
  • @KaumudiHarikumar: we don't think of fixing the bars and fitting the stars in between. Instead, imagine there are $n+k-1$ blank spaces, with each space to be filled with either a bar or a star. From those $n+k-1$ spaces, choose $k-1$ to be occupied by bars. The rest will be filled by stars. There are $\binom{n+k-1}{k-1}$ such choices. Alternatively, choose $n$ of the $n+k-1$ spaces for the stars, with the rest for bars, giving $\binom{n+k-1}{n}$ options. However, these count the same thing (in two different ways), so they must be equal; that is, $\binom{n+k-1}{k-1} = \binom{n+k-1}{n}$. – Shagnik Aug 29 '16 at 14:44
  • Yes, that's the better and definitely the more useful way to think about it. Thanks! Can you think of some other problems in combinatorics where I might benefit from using this method? –  Aug 29 '16 at 14:50
  • And also, I'm sorry for asking so many doubts but I'm afraid I don't understand how you calculated the number of "bad solutions". How did you write this: xI+zM+xP+xS=xI+xM? If it isn't too much to ask for, can you please explain the parts concerning finding the "bad solutions" or perhaps elaborate that JUST a little bit more in your answer? –  Aug 29 '16 at 14:59
  • @KaumudiHarikumar: That's not what I wrote. We have $x_I + z_M + x_P + x_S = x_I + x_M + x_P + x_S - 2$, since $z_M = x_M - 2$. The reason for this change of variables is that the stars-and-bars formula applies when we require our variables to be non-negative. However, here we needed $x_M \ge 2$. By subtracting $2$ from $x_M$, we get a variable that is just required to be non-negative instead. However, you have to be a bit careful: by subtracting $2$, the whole sum becomes $2$ instead of $4$, which this calculation was meant to show. – Shagnik Aug 29 '16 at 15:07
  • @KaumudiHarikumar: Regarding where this method can be used, it is very useful for a wide range of counting problems, specifically when you are trying to divide identical items (the $n$) into distinct parts (the variables $x_i$). For example, if you are sharing an apartment with a total rent of $n$ dollars with $k-1$ other people, how many ways are there for you to split the rent (assuming you each pay an integer amount)? However, this doesn't apply to every counting situation (e.g. if the items are distinct). You could look at https://en.wikipedia.org/wiki/Twelvefold_way for similar problems. – Shagnik Aug 29 '16 at 15:10
  • Okay, thanks :) I see that when there are more conditions on the variables than just non-negativity, this method is also used to remove the bad solutions, yes? –  Aug 29 '16 at 15:33

Very interesting question.

There is a fairly "simple" (well, not that simple) general formula if we use the exponential generating function.

Let $n_i$ be the number of times the $i$th distinct letter appears; the order in which we list the letters doesn't matter, of course. In your example, the word MISSISSIPPI has one M, four I, four S and two P, so we get

$$(n_1, n_2, n_3, n_4) = (1, 4, 4, 2)$$

Now let $A_{\ell}$ be the number of distinct words of length $\ell$ that can be formed from these letters, and let $k$ be the number of distinct letters ($k = 4$ here).

I claim $$\sum_\ell \frac{A_{\ell}}{\ell!} x^\ell = \prod_{i=1}^k \sum_{j=0}^{n_i} \frac{x^j}{j!}$$

in your example this is $$\sum_{\ell} \frac{A_{\ell}}{\ell!} x^\ell = (1+x)^1\left(1+x + \frac{x^2}{2}\right)^1\left(1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24}\right)^2$$

since you have $1$ letter appearing once, $1$ letter appearing twice, and $2$ letters appearing four times.

So if you want to know the number of such words with a specific length, you can extract that coefficient (probably with a computer).

In this case we get

$$\begin{aligned}
&(1+x)^1\left(1+x + \frac{x^2}{2}\right)^1\left(1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24}\right)^2 \\
&\quad = \frac{x^{11}}{1152}+\frac{11 x^{10}}{1152}+\frac{17 x^9}{288}+\frac{149 x^8}{576}+\frac{31 x^7}{36}+\frac{161 x^6}{72} \\
&\qquad +\frac{55 x^5}{12}+\frac{22 x^4}{3}+\frac{53 x^3}{6}+\frac{15 x^2}{2}+4 x+1
\end{aligned}$$

and so the answer is obtained by extracting the coefficient of $x^4$ (because you want 4-letter words) and multiplying by $4!$:

$$\frac{22}{3} \cdot 4! = 176$$
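A short Python sketch of this computation (the helper names `poly_mul` and `egf` are illustrative) uses exact fractions so the coefficient $\frac{22}{3}$ comes out exactly. A brute-force count of the distinct 4-letter strings is included for comparison:

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def poly_mul(a: list, b: list) -> list:
    """Multiply polynomials given as coefficient lists (index = power of x)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def egf(counts: list[int]) -> list[Fraction]:
    """Coefficients of prod_i sum_{j=0}^{n_i} x^j / j!."""
    poly = [Fraction(1)]
    for n_i in counts:
        poly = poly_mul(poly, [Fraction(1, factorial(j)) for j in range(n_i + 1)])
    return poly


coeffs = egf([1, 4, 4, 2])  # M, I, S, P
print(coeffs[4], coeffs[4] * factorial(4))  # 22/3 176

# Brute force: distinct ordered strings using 4 of the 11 letters.
words = {"".join(p) for p in permutations("MISSISSIPPI", 4)}
print(len(words))  # 176
```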

  • Wow, this looks awesome but way too complicated to use in exams at my level. Thank you! :) –  Aug 29 '16 at 12:34
  • @KaumudiHarikumar Unfortunately I don't know any other methods. Maybe there is something easier and faster! :) Your question is indeed interesting because there is no formula in combinatorics that gives a simple answer, at least none that I know of. Maybe someone will make things easier :D –  Aug 29 '16 at 12:37
  • Shagnik has provided a nice method! –  Aug 29 '16 at 12:40
  • @KaumudiHarikumar I just noticed! Thanks –  Aug 29 '16 at 12:43
  • The exponential generating function will give you the number of permutations (ordered choices) of four letters. To get the number of combinations (unordered selections), you should use the ordinary generating function $(1+x)(1+x+x^2)(1+x+x^2+x^3+x^4)^2$ instead. The coefficient of $x^4$ is then $21$, which is the number of selections of $4$ letters. – Shagnik Aug 29 '16 at 13:10
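The ordinary generating function for MISSISSIPPI can be expanded with a few lines of code. This minimal Python sketch (the function names `poly_mul` and `ogf` are hypothetical) multiplies the factors $1 + x + \dots + x^{n_i}$ as integer coefficient lists and reads off the coefficient of $x^4$:

```python
def poly_mul(a: list[int], b: list[int]) -> list[int]:
    """Multiply polynomials given as coefficient lists (index = power of x)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def ogf(counts: list[int]) -> list[int]:
    """Coefficients of prod_i (1 + x + ... + x^{n_i}); the coefficient of
    x^r counts the selections of r letters, ignoring order."""
    poly = [1]
    for n_i in counts:
        poly = poly_mul(poly, [1] * (n_i + 1))
    return poly


coeffs = ogf([1, 2, 4, 4])  # M, P, I, S in MISSISSIPPI
print(coeffs[4])  # 21
```

Each factor has only $0$s and $1$s as coefficients, so no fractions are needed here, unlike the exponential version.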