I am trying to understand the answer provided by @MichaelHardy in this thread (https://math.stackexchange.com/a/204094/554130). He explains how a multinomial variable can be viewed as a sum of multinoulli (categorical) variables. I have pasted his answer below for reference.
Quote
Suppose $X_1,\ldots,X_n$ are independent identically distributed random variables and $$ \Pr(X_1 = (0,0,0,\ldots0,0,\underset{\uparrow}{1},0,0,\ldots,0,0,0)) = p_i $$ where there are $k$ components and the single "$1$" is the $i$th component, for $i=1,\ldots,k$.
Suppose $c_1+\cdots+c_k = n$, and ask what is $$ \Pr((X_1+\cdots+X_n)=(c_1,\ldots,c_k)). $$ The vector $(c_1,\ldots,c_k)$ is a sum of $c_1$ terms equal to $(1,0,0,0,\ldots,0)$, then $c_2$ terms equal to $(0,1,0,0,\ldots,0)$, and so on. The probability of getting any particular sequence of $c_1$ terms equal to $(1,0,0,0,\ldots,0)$, then $c_2$ terms equal to $(0,1,0,0,\ldots,0)$, and so on, is $p_1^{c_1}p_2^{c_2}\cdots p_k^{c_k}$. So the probability we seek is $$ (p_1^{c_1}p_2^{c_2}\cdots p_k^{c_k}) + (p_1^{c_1}p_2^{c_2}\cdots p_k^{c_k}) + \cdots + (p_1^{c_1}p_2^{c_2}\cdots p_k^{c_k}), $$ where the number of terms is the number of distinguishable orders in which we can list $c_1$ copies of $(1,0,0,0,\ldots,0)$, $c_2$ copies of $(0,1,0,0,\ldots,0)$, and so on. That is a combinatorial problem, whose solution is $\dbinom{n}{c_1,c_2,\ldots,c_k}$. Hence the probability we seek is $$ \binom{n}{c_1,c_2,\ldots,c_k} p_1^{c_1}p_2^{c_2}\cdots p_k^{c_k}, $$ so there we have the multinomial distribution.
End of Quote
I don't understand the answer fully, so I tried to craft a numeric example. But I am stuck, so I hope somebody can help me here.
I am assuming the following parameters:
n = 3
k = 3
p = [0.5, 0.3, 0.2]
To find the probability of getting a certain $X = (2, 1, 0)$ I would normally use the joint pmf of the multinomial distribution to get
$$\Pr(X = (2, 1, 0)) = {3 \choose 2, 1, 0}\cdot 0.5^2 \cdot 0.3^1 \cdot 0.2^0 = 3 \cdot 0.25 \cdot 0.3 = 0.225$$
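As a sanity check on that arithmetic, here is a minimal Python sketch (my own, standard library only; the function name `multinomial_pmf` is just an illustrative choice) that evaluates the joint pmf directly:

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """Joint pmf of the multinomial distribution at a given count vector."""
    n = sum(counts)
    # Multinomial coefficient n! / (c_1! c_2! ... c_k!)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)
    # Multiply by p_1^{c_1} p_2^{c_2} ... p_k^{c_k}
    prob = float(coeff)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob

print(multinomial_pmf([2, 1, 0], [0.5, 0.3, 0.2]))  # ≈ 0.225
```

This reproduces the value $0.225$ from the formula above.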
According to the answer, the first step is realizing that the count vector $x$ can be written as a weighted sum of unit vectors (the $c_3$ term vanishes since $c_3 = 0$): $$ x = (2, 1, 0)^T = c_1 \cdot (1, 0, 0)^T + c_2 \cdot (0, 1, 0)^T = 2 \cdot (1, 0, 0)^T + 1 \cdot (0, 1, 0)^T $$
This is the point where I cannot follow the argument. How do I get from this equation to this one: $$ (p_1^{c_1}p_2^{c_2}\cdots p_k^{c_k}) + (p_1^{c_1}p_2^{c_2}\cdots p_k^{c_k}) + \cdots + (p_1^{c_1}p_2^{c_2}\cdots p_k^{c_k}) $$ Wouldn't the probability of observing one of these unit vectors in a single trial be the corresponding entry of $p$? E.g. $$ \Pr(X_1 = (1, 0, 0)) = p_1 = 0.5 $$ And therefore the result would be $$ \Pr(X = (2, 1, 0)) = 2 \cdot \Pr(X_1 = (1, 0, 0)) + \Pr(X_1 = (0, 1, 0)) = 2 \cdot 0.5 + 0.3 = 1.3 $$ Where did I go wrong?