I am trying to understand the answer provided by @MichaelHardy in this thread (https://math.stackexchange.com/a/204094/554130). He explains how a multinomial variable can be viewed as a sum of multinoulli (categorical) variables. I have pasted his answer below for reference.
Quote
Suppose $X_1,\ldots,X_n$ are independent identically distributed random variables and $$ \Pr(X_1 = (0,0,0,\ldots0,0,\underset{\uparrow}{1},0,0,\ldots,0,0,0)) = p_i $$ where there are $k$ components and the single "$1$" is the $i$th component, for $i=1,\ldots,k$.
Suppose $c_1+\cdots+c_k = n$, and ask what is $$ \Pr((X_1+\cdots+X_n)=(c_1,\ldots,c_k)). $$ The vector $(c_1,\ldots,c_k)$ is a sum of $c_1$ terms equal to $(1,0,0,0,\ldots,0)$, then $c_2$ terms equal to $(0,1,0,0,\ldots,0)$, and so on. The probability of getting any particular sequence of $c_1$ terms equal to $(1,0,0,0,\ldots,0)$, then $c_2$ terms equal to $(0,1,0,0,\ldots,0)$, and so on, is $p_1^{c_1}p_2^{c_2}\cdots p_k^{c_k}$. So the probability we seek is $$ (p_1^{c_1}p_2^{c_2}\cdots p_k^{c_k}) + (p_1^{c_1}p_2^{c_2}\cdots p_k^{c_k}) + \cdots + (p_1^{c_1}p_2^{c_2}\cdots p_k^{c_k}), $$ where the number of terms is the number of distinguishable orders in which we can list $c_1$ copies of $(1,0,0,0,\ldots,0)$, $c_2$ copies of $(0,1,0,0,\ldots,0)$, and so on. That is a combinatorial problem, whose solution is $\dbinom{n}{c_1,c_2,\ldots,c_k}$. Hence the probability we seek is $$ \binom{n}{c_1,c_2,\ldots,c_k} p_1^{c_1}p_2^{c_2}\cdots p_k^{c_k}, $$ so there we have the multinomial distribution.
End of Quote
I don't understand the answer fully, so I tried to craft a numeric example. But I am stuck, so I hope somebody can help me here.
I am assuming the following parameters:
n = 3
k = 3
p = [0.5, 0.3, 0.2]
To find the probability of getting a certain $X = (2, 1, 0)$ I would normally use the joint pmf of the multinomial distribution to get
$$\Pr(X = (2, 1, 0)) = {3 \choose 2, 1, 0}\cdot 0.5^2 \cdot 0.3^1 \cdot 0.2^0 = 3 \cdot 0.25 \cdot 0.3 = 0.225$$
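As a sanity check on that arithmetic, here is a minimal Python sketch (my own, standard library only; the function name `multinomial_pmf` is just an illustrative choice) that evaluates the joint pmf directly:

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """Joint pmf of the multinomial distribution at a given count vector."""
    n = sum(counts)
    # Multinomial coefficient n! / (c_1! c_2! ... c_k!)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)
    # Multiply by p_1^{c_1} p_2^{c_2} ... p_k^{c_k}
    prob = float(coeff)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob

print(multinomial_pmf([2, 1, 0], [0.5, 0.3, 0.2]))  # ≈ 0.225
```

This reproduces the value $0.225$ from the formula above.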
According to the answer, the first step is realizing that the count vector $x$ can be written as a weighted sum of unit vectors (the $c_3$ term vanishes since $c_3 = 0$): $$ x = (2, 1, 0)^T = c_1 \cdot (1, 0, 0)^T + c_2 \cdot (0, 1, 0)^T = 2 \cdot (1, 0, 0)^T + 1 \cdot (0, 1, 0)^T $$
This is the point where I cannot follow the argument. How do I get from this equation to this one: $$ (p_1^{c_1}p_2^{c_2}\cdots p_k^{c_k}) + (p_1^{c_1}p_2^{c_2}\cdots p_k^{c_k}) + \cdots + (p_1^{c_1}p_2^{c_2}\cdots p_k^{c_k}) $$ Wouldn't the probability of observing one of these unit vectors in a single trial be the corresponding entry of $p$? E.g. $$ \Pr(X_1 = (1, 0, 0)) = p_1 = 0.5 $$ And therefore the result would be $$ \Pr(X = (2, 1, 0)) = 2 \cdot \Pr(X_1 = (1, 0, 0)) + \Pr(X_1 = (0, 1, 0)) = 2 \cdot 0.5 + 0.3 = 1.3 $$ Where did I go wrong?