
Consider repeated independent trials with two outcomes, S (success) and F (failure), with probabilities $p$ and $q$, respectively. Determine the distribution of the number of trials required for the first occurrence of the event "50 S containing at least one SSSSS", i.e., a total of 50 successes among which a run of 5 consecutive successes occurs at least once.

My efforts:

Let $M_n$ be the number of trials required for the first occurrence of a total of $n$ S. We have calculated that $P(M_n=k)={k-1\choose n-1}p^{n}q^{k-n}$ for $k\geq n$. Let $N_n$ be the number of trials required for the first occurrence of a total of $n$ S containing at least one SSSSS. Then $P(N_n=k)=0$ if $n<5$. We want to determine the distribution of $N_{50}$.

Condition on the following possible initial events:

  1. A1: The first five results were Fxxxx (with probability $q$ ), x = S or F,
  2. A2: The first five results were SFxxx (with probability $pq$ ),
  3. A3: The first five results were SSFxx (with probability $p^2q$),
  4. A4: The first five results were SSSFx (with probability $p^3q$),
  5. A5: The first five results were SSSSF (with probability $p^4q$),
  6. A6: The first five results were SSSSS (with probability $p^5$).

Note that $P(A_1)+P(A_2)+P(A_3)+P(A_4) +P(A_5)+P(A_6)=1$.

Let $k>5$.

In case 1, $P(N_n=k\mid\text{first result was F})=P(N_n=k-1)$, because the first result made no progress towards $n$ S containing SSSSS, and there are now $k-1$ trials remaining to get $n$ S containing SSSSS.

In case 2, $P(N_n=k\mid\text{first two results were SF})=P(N_{n-1}=k-2)$. The first two results made no progress towards SSSSS, but we do have one S, so $(n-1)$ S remain. There are now $k-2$ trials remaining to get $(n-1)$ S containing SSSSS.

In case 3, $P(N_n=k\mid\text{first three results were SSF})=P(N_{n-2} =k-3)$. The first three results made no progress towards SSSSS, but we do have two S, so $(n-2)$ S remain. There are now $k-3$ trials remaining to get $(n-2)$ S containing SSSSS.

In case 4, $P(N_n=k\mid\text{first four results were SSSF})=P(N_{n-3} =k-4)$. The first four results made no progress towards SSSSS, but we do have three S, so $(n-3)$ S remain. There are now $k-4$ trials remaining to get $(n-3)$ S containing SSSSS.

In case 5, $P(N_n=k\mid\text{first five results were SSSSF})=P(N_{n-4} =k-5)$. The first five results made no progress towards SSSSS, but we do have four S, so $(n-4)$ S remain. There are now $k-5$ trials remaining to get $(n-4)$ S containing SSSSS.

In case 6, $P(N_n=k\mid\text{first five results were SSSSS})=P(M_{n-5}=k-5)$. We already have SSSSS, so we no longer need to worry about it. We just need to get $(n-5)$ more S and we are done. There are now $k-5$ trials remaining to get $(n-5)$ S.

Putting this all together using the Law of Total Probability $$P(N_n =k)=P(N_n =k\mid A_1)P(A_1)+P(N_n =k\mid A_2)P(A_2)+\cdots+P(N_n =k\mid A_6)P(A_6),$$ where $A_1, A_2, A_3, \ldots, A_6$ are the 6 possible initial events, we get the recursive formula for $k> 5$,

$$P(N_n =k)=qP(N_n =k-1)+pqP(N_{n-1} =k-2)+p^2qP(N_{n-2} =k-3) +\cdots+ p^4qP(N_{n-4} =k-5)+p^5P(M_{n-5}=k-5)$$

Am I on the right track? Some strange things happened when I tried to calculate the base case $P(N_5=k)$. Please tell me what $P(N_5=k)$ is and help me verify the recursive relation for $P(N_6=k)$.
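A minimal Python sketch of the recursion above, checked against brute-force enumeration for small $n$ and $k$. It rests on assumptions not stated in the question: $\{N_n=k\}$ is read as the joint event "the $n$-th S occurs at trial $k$ and SSSSS has appeared by then", with the boundary conventions $P(M_0=0)=1$ and $P(N_n=k)=0$ for $n<5$ or $k<n$. The function names `pmf_M`, `pmf_N` and `brute_N` are hypothetical.

```python
from functools import lru_cache
from itertools import product
from math import comb, isclose


def pmf_M(n, k, p):
    """P(M_n = k): the n-th success occurs exactly at trial k."""
    q = 1.0 - p
    if n == 0:
        # Assumed convention: zero successes are "reached" after zero trials.
        return 1.0 if k == 0 else 0.0
    if n < 0 or k < n:
        return 0.0
    return comb(k - 1, n - 1) * p**n * q**(k - n)


def pmf_N(n, k, p):
    """Recursion from the question, conditioning on F, SF, SSF, SSSF, SSSSF, SSSSS."""
    q = 1.0 - p

    @lru_cache(maxsize=None)
    def rec(n, k):
        if n < 5 or k < n:  # assumed boundary values
            return 0.0
        total = q * rec(n, k - 1)                     # case 1: F...
        for i in range(1, 5):                         # cases 2-5: S^i F...
            total += p**i * q * rec(n - i, k - i - 1)
        total += p**5 * pmf_M(n - 5, k - 5, p)        # case 6: SSSSS...
        return total

    return rec(n, k)


def brute_N(n, k, p):
    """Enumerate all words of length k: n S in total, last letter S, SSSSS present."""
    q = 1.0 - p
    count = 0
    for w in product("SF", repeat=k):
        word = "".join(w)
        if word.endswith("S") and word.count("S") == n and "SSSSS" in word:
            count += 1
    return count * p**n * q**(k - n)
```

Under this reading the only word contributing to $P(N_5=k)$ is $F^{k-5}SSSSS$, so `pmf_N(5, k, p)` equals $p^5q^{k-5}$, which sums over $k$ to $p^4$ rather than $1$.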

  • Can you please convert the images to text+MathJax or at least share the document into the google docs or somewhere so someone can do it? Thanks. – Alexey Burdin Aug 20 '20 at 04:22
  • This diverges:$$\begin{align} &\sum_{n\ge 62}(n-62)!\binom{n-1}{61}\left(\frac{\gamma}2\right)^{n-50}\left(\frac{1-\gamma}2\right)^{50}\\ &\qquad=\frac{(1-\gamma)^{50}}{2^{50}\cdot 61!}\sum_{n\ge 62}(n-1)!\left(\frac{\gamma}2\right)^{n-50}\\ &\qquad=\frac{\gamma^{12}(1-\gamma)^{50}}{2^{62}\cdot 61!}\sum_{n\ge 0}(n+61)!\left(\frac{\gamma}2\right)^{n} \end{align}$$ – Brian M. Scott Aug 20 '20 at 05:22
  • @AlexeyBurdin A google drive link for downloading the document was added. – Junk Warrior Aug 20 '20 at 18:34
  • @BrianM.Scott Sir, please give me a hint about limsup at https://math.stackexchange.com/questions/3801010/extension-of-continuous-function-on-a-closed-set-of-a-metric-space – Junk Warrior Aug 24 '20 at 15:22
  • @AlexeyBurdin Please see my update. I have made some progress and run into new difficulties. – Junk Warrior Sep 02 '20 at 23:58
  • @BrianM.Scott Sir, please see my progress and new concerns about this problem. – Junk Warrior Sep 03 '20 at 00:46
  • It is not an easy subject .. I am working at your previous post (Markov approach) and shall be able to give soon some directions – G Cab Sep 03 '20 at 01:18
  • @JunkWarrior the first thing I want to try -- to separate the conditions on $50$S and SSSSS, say, if we use inclusion-exclusion, $p=1-P(\le49S)-P(\text{no }SSSSS)+P((\le49S)\cap(\text{no }SSSSS))$ and maybe the last term can be computed directly. – Alexey Burdin Sep 03 '20 at 02:33

3 Answers

2

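This answer is a generating function approach based upon the Goulden-Jackson Cluster Method. Here $M_r$ denotes the number of trials needed to reach a total of $r$ S including at least one run SSSSS (written $N_r$ in the question). We will show for $5\leq r\leq k$: \begin{align*} \color{blue}{P(M_r=k)}&\color{blue}{=\left(\sum_{j\geq 1}(-1)^{j+1}\binom{k-r+1}{j}\binom{k-5j}{k-r}\right.}\\ &\qquad\color{blue}{\left.-\sum_{j\geq 1}(-1)^{j+1}\binom{k-r}{j}\binom{k-1-5j}{k-1-r}\right)p^rq^{k-r}}\tag{1} \end{align*} where the sums in (1) are finite since $\binom{s}{t}=0$ for integers $0\leq s<t$; terms with a negative upper index do not occur in the derivation below and are taken as zero.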

First step: A generating function

We consider the set of words of length $k\geq 0$ built from an alphabet $$\mathcal{V}=\{S,F\}$$ and the set $B=\{SSSSS\}$ of bad words, which are not allowed to be part of the words we are looking for in a first step. We will derive a generating function $G(z)$ where $[z^k]G(z)$, the coefficient of $z^k$ of $G(z)$ gives the number of binary words of length $k$ which do not contain $SSSSS$.

Since we want the number of words which do contain $SSSSS$, we take the generating function of all binary words which is $1+2z+4z^2+8z^3+\cdots = \frac{1}{1-2z}$ and subtract $G(z)$ from it. This way we get $H(z) = \frac{1}{1-2z}-G(z)$.

According to the paper (p.7) the generating function $G(z)$ is \begin{align*} G(z)=\frac{1}{1-dz-\text{weight}(\mathcal{C})}\tag{2} \end{align*} with $d=|\mathcal{V}|=2$, the size of the alphabet and $\mathcal{C}$ the weight-numerator of bad words with \begin{align*} \text{weight}(\mathcal{C})=\text{weight}(\mathcal{C}[SSSSS]) \end{align*}

We calculate according to the paper \begin{align*} \text{weight}(\mathcal{C}[S^5])&=-z^5-z\cdot \text{weight}(\mathcal{C}[S^5])-\cdots-z^4\cdot\text{weight}(\mathcal{C}[S^5])\tag{3}\\ \end{align*}

and get \begin{align*} \text{weight}(\mathcal{C})=\text{weight}(\mathcal{C}[S^5])=-\frac{z^5(1-z)}{1-z^5} \end{align*}

It follows from (2) and (3):

\begin{align*} G(z)&=\frac{1}{1-dz-\text{weight}(\mathcal{C})}\\ &=\frac{1}{1-2z+\frac{z^5(1-z)}{1-z^5}}\\ &=\frac{1-z^5}{1-2z+z^6}\tag{4}\\ \end{align*}

From (4) we obtain \begin{align*} H(z) = \frac{1}{1-2z}-\frac{1-z^5}{1-2z+z^6}\tag{5} \end{align*}
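As a quick sanity check of (4), the coefficients of $G(z)=\frac{1-z^5}{1-2z+z^6}$ can be generated from the linear recurrence implied by the denominator and compared with a brute-force count of words avoiding SSSSS. This is only a sketch; the function names `avoiding_counts` and `brute_avoiding` are made up for illustration.

```python
from itertools import product


def avoiding_counts(kmax):
    """g[k] = [z^k] (1 - z^5) / (1 - 2z + z^6).

    Multiplying out G(z) * (1 - 2z + z^6) = 1 - z^5 gives
    g[k] = 2*g[k-1] - g[k-6] + [k == 0] - [k == 5].
    """
    g = []
    for k in range(kmax + 1):
        value = (1 if k == 0 else 0) - (1 if k == 5 else 0)
        if k >= 1:
            value += 2 * g[k - 1]
        if k >= 6:
            value -= g[k - 6]
        g.append(value)
    return g


def brute_avoiding(k):
    """Count words of length k over {S, F} that do not contain SSSSS."""
    return sum("SSSSS" not in "".join(w) for w in product("SF", repeat=k))
```

The number of words of length $k$ that do contain SSSSS, i.e. $[z^k]H(z)$, is then `2**k - avoiding_counts(k)[k]`.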

Second step: A refinement

Since we are looking for the number of valid words of length $k$ which contain $50$ S, or more generally $r\geq 5$ S, we need a refinement of $H(z)$ that keeps track of the number of successes S. In order to do so we mark successes with $s$.

We obtain from (3) \begin{align*} \text{weight}(\mathcal{C}[S^5])&=-(sz)^5-(sz)\text{weight}(\mathcal{C}[S^5])-\cdots-(sz)^4\text{weight}(\mathcal{C}[S^5]) \end{align*} and get \begin{align*} \text{weight}(\mathcal{C})=-\frac{(sz)^5(1-sz)}{1-(sz)^5} \end{align*}

Using this generalized weight we obtain a generating function $H(z;s)$ \begin{align*} H(z;s)&=\frac{1}{1-(1+s)z}-\frac{1}{1-(1+s)z+\frac{(sz)^5(1-sz)}{1-(sz)^5}}\\ &=\frac{1}{1-(1+s)z}-\frac{1-(sz)^5}{1-(1+s)z+s^5z^6} \end{align*}

Third step: Words terminating with success $S$.

The coefficient $[s^rz^k]H(z;s)$ gives the number of words of length $k$ containing exactly $r$ S and at least one S-run of length $5$, but not necessarily with an S at the end. To force this we subtract the words of length $k$ which contain exactly $r$ S and an S-run of length $5$ and terminate with $F$. Removing the final $F$ shows that their number is $[s^rz^{k-1}]H(z;s)$, which corresponds to multiplying $H(z;s)$ by $(1-z)$.

In this way we finally get the wanted generating function \begin{align*} \color{blue}{H(z;s)(1-z)}&\color{blue}{=\frac{1-z}{1-(1+s)z}-\frac{\left(1-(sz)^5\right)(1-z)}{1-(1+s)z+s^5z^6}}\tag{6}\\ &=s^5z^5+(s+f)s^5z^6+\left(s^2+3sf+f^2\right)s^5z^7\\ &\qquad+\left(s^3+5s^2f+5sf^2+f^3\right)s^5z^8\\ &\qquad+\left(s^4+7s^3f+\color{blue}{12}s^2f^2+7sf^3+f^4\right)s^5z^9+\cdots \end{align*} where the series expansion was calculated with the help of Wolfram Alpha. In the expansion, $f$ additionally marks failures; setting $f=1$ gives the coefficients of (6).

Note the coefficients of the series correspond with the table entries stated by @GCab.

Looking for instance at the blue marked coefficient $12$ of $s^7f^2z^9$, this gives the number of words of length $9$ containing $7$ S, at least one run of $5$ S, and ending with S. These words are \begin{align*} \color{blue}{SSSSS}SFFS&\qquad SSFF\color{blue}{SSSSS}\\ \color{blue}{SSSSS}FSFS&\qquad SFSF\color{blue}{SSSSS}\\ \color{blue}{SSSSS}FFSS&\qquad SFFS\color{blue}{SSSSS}\\ SF\color{blue}{SSSSS}FS&\qquad FSSF\color{blue}{SSSSS}\\ FS\color{blue}{SSSSS}FS&\qquad FSFS\color{blue}{SSSSS}\\ F\color{blue}{SSSSS}FSS&\qquad FFSS\color{blue}{SSSSS} \end{align*} where one run of $5$ S is marked blue in each word.
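The expansion in (6) can also be reproduced without a computer algebra system. The sketch below is my own illustration, not the Wolfram Alpha computation used above: it expands $1/(1-(1+s)z+s^5z^6)$ through the recurrence $a_k=(1+s)a_{k-1}-s^5a_{k-6}$, with polynomials in $s$ stored as plain coefficient lists. The helper names are hypothetical.

```python
from math import comb


def poly_add(a, b, sign=1):
    """Return a + sign*b for polynomials in s given as coefficient lists."""
    out = [0] * max(len(a), len(b))
    for i, c in enumerate(a):
        out[i] += c
    for i, c in enumerate(b):
        out[i] += sign * c
    return out


def poly_shift(a, d):
    """Multiply a polynomial in s by s^d."""
    return [0] * d + list(a)


def expansion_coeffs(kmax):
    """c[k][r] = [s^r z^k] H(z;s)(1-z) for k = 0..kmax."""
    # a[k] = [z^k] 1 / (1 - (1+s) z + s^5 z^6)
    a = []
    for k in range(kmax + 1):
        if k == 0:
            a.append([1])
            continue
        term = poly_add(a[k - 1], poly_shift(a[k - 1], 1))  # (1+s) * a[k-1]
        if k >= 6:
            term = poly_add(term, poly_shift(a[k - 6], 5), sign=-1)
        a.append(term)
    # g[k] = [z^k] (1 - (sz)^5) / (1 - (1+s) z + s^5 z^6)
    g = [poly_add(a[k], poly_shift(a[k - 5], 5), sign=-1) if k >= 5 else a[k]
         for k in range(kmax + 1)]
    # h[k] = [z^k] H(z;s) = (1+s)^k - g[k]
    h = [poly_add([comb(k, r) for r in range(k + 1)], g[k], sign=-1)
         for k in range(kmax + 1)]
    # multiplying by (1 - z) subtracts the previous coefficient
    return [poly_add(h[k], h[k - 1], sign=-1) if k >= 1 else h[0]
            for k in range(kmax + 1)]
```

With `c = expansion_coeffs(9)`, the entry `c[9][7]` is the coefficient of $s^7z^9$, i.e. the blue $12$.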

Coefficients of $H(z;s)(1-z)$:

We finally calculate the coefficients of $H(z;s)(1-z)$. We start with

\begin{align*} [s^rz^k]H(z;s) &=[s^rz^k]\frac{1}{1-(1+s)z}-[s^rz^k]\frac{1}{1-(1+s)z+\frac{(sz)^5(1-(sz))}{1-(sz)^5}}\\ &=[s^rz^k]\frac{1}{1-(1+s)z}-[s^rz^k]\frac{1-(sz)^5}{1-(1+s)z+s^5z^6} \end{align*}

At first the easy part:

\begin{align*} \color{blue}{[s^rz^k]\frac{1}{1-(1+s)z}}=[s^rz^k]\sum_{j=0}^{\infty}(1+s)^jz^j =[s^r](1+s)^k \,\,\color{blue}{=\binom{k}{r}}\tag{7} \end{align*}

Now the somewhat longish part. We obtain

\begin{align*} \color{blue}{[s^rz^k]}&\color{blue}{\frac{1}{1-(1+s)z+s^5z^6}}\\ &=[s^rz^k]\sum_{j=0}^\infty\left((1+s)z-s^5z^6\right)^j\\ &=[s^rz^k]\sum_{j=0}^\infty z^j\left((1+s)-s^5z^5\right)^j\\ &=[s^r]\sum_{j=0}^k[z^{k-j}]\sum_{l=0}^j\binom{j}{l}(-1)^ls^{5l}z^{5l}(1+s)^{j-l}\tag{8}\\ &=[s^r]\sum_{j=0}^k[z^j]\sum_{l=0}^{k-j}\binom{k-j}{l}(-1)^ls^{5l}z^{5l}(1+s)^{k-j-l}\tag{9}\\ &=[s^r]\sum_{j=0}^{\left\lfloor k/5\right\rfloor}[z^{5j}]\sum_{l=0}^{k-5j}\binom{k-5j}{l}(-1)^ls^{5l}z^{5l}(1+s)^{k-5j-l}\tag{10}\\ &=[s^r]\sum_{j=0}^{\left\lfloor k/5\right\rfloor}\binom{k-5j}{j}(-1)^js^{5j}(1+s)^{k-6j}\tag{11}\\ &=\sum_{j=0}^{\min\{\left\lfloor k/5\right\rfloor, \left\lfloor r/5\right\rfloor\}}(-1)^j\binom{k-5j}{j}[s^{r-5j}](1+s)^{k-6j}\\ &=\sum_{j\geq 0}(-1)^j\binom{k-5j}{j}\binom{k-6j}{r-5j}\tag{12}\\ &\,\,\color{blue}{=\sum_{j\geq 0}(-1)^j\binom{k-r}{j}\binom{k-5j}{k-r}}\tag{13} \end{align*}

Comment:

  • In (8) we apply the rule $[z^s]z^tA(z)=[z^{s-t}]A(z)$. We also set the upper limit of the outer sum to $k$ since other values do not contribute to the coefficient of $z^k$.

  • In (9) we change the order of summation of the outer sum $j \to k-j$.

  • In (10) we observe that only indices $j$ that are multiples of $5$ contribute, due to the term $z^{5l}$.

  • In (11) we select the coefficient of $z^{5j}$.

  • In (12) we select the coefficient of $s^{r-5j}$.

  • In (13) we use the binomial identity \begin{align*} \binom{k-5j}{j}\binom{k-6j}{r-5j}&=\frac{(k-5j)!}{j!}\,\frac{1}{(r-5j)!(k-r-j)!}\\ &=\frac{1}{j!(k-r-j)!}\,\frac{(k-5j)!}{(r-5j)!}=\binom{k-r}{j}\binom{k-5j}{k-r} \end{align*}

and it follows from (6) and (13): \begin{align*} [s^rz^k]&\frac{1-(sz)^5}{1-(1+s)z+s^5z^6}\\ &=\left([s^rz^k]-[s^{r-5}z^{k-5}]\right)\frac{1}{1-(1+s)z+s^5z^6}\\ &=\sum_{j\geq 0}(-1)^j\binom{k-r}{j}\binom{k-5j}{k-r}-\sum_{j\geq 0}(-1)^j\binom{k-r}{j}\binom{k-5-5j}{k-r}\\ &=\binom{k}{r}+\sum_{j\geq 1}(-1)^j\binom{k-r}{j}\binom{k-5j}{k-r} +\sum_{j\geq 1}(-1)^j\binom{k-r}{j-1}\binom{k-5j}{k-r}\\ &=\binom{k}{r}+\sum_{j\geq 1}(-1)^j\binom{k-r+1}{j}\binom{k-5j}{k-r}\tag{14} \end{align*}

and we obtain \begin{align*} \color{blue}{[s^rz^k]H(z;s)}&=[s^rz^k]\frac{1}{1-(1+s)z}-[s^rz^k]\frac{1-(sz)^5}{1-(1+s)z+s^5z^6}\\ &\,\,\color{blue}{=\sum_{j\geq 1}(-1)^{j+1}\binom{k-r+1}{j}\binom{k-5j}{k-r}}\tag{15} \end{align*}

Last step: Putting all together

We consider the (interesting) case $5\leq r\leq k$ only. Taking the results from (6) and (15) we can now write the coefficients of $H(z;s)(1-z)$ as

\begin{align*} [s^rz^k]&H(z;s)(1-z)\\ &=[s^rz^k]H(z;s)-[s^rz^{k-1}]H(z;s)\\ &=\sum_{j\geq 1}(-1)^{j+1}\binom{k-r+1}{j}\binom{k-5j}{k-r}\\ &\qquad-\sum_{j\geq 1}(-1)^{j+1}\binom{k-r}{j}\binom{k-1-5j}{k-1-r} \end{align*} and the claim (1) follows.
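The closed form (1) is easy to test numerically. Below is a small sketch comparing the bracketed count in (1) with direct enumeration; it assumes the convention that a binomial coefficient is $0$ whenever its lower index is negative or exceeds the upper index (including negative upper indices). The names `count_formula` and `count_brute` are hypothetical.

```python
from itertools import product
from math import comb


def binom(n, k):
    """Binomial coefficient, taken as 0 outside 0 <= k <= n (assumed convention)."""
    return comb(n, k) if 0 <= k <= n else 0


def count_formula(r, k):
    """Coefficient of p^r q^(k-r) in (1), for 5 <= r <= k."""
    first = sum((-1) ** (j + 1) * binom(k - r + 1, j) * binom(k - 5 * j, k - r)
                for j in range(1, k // 5 + 1))
    second = sum((-1) ** (j + 1) * binom(k - r, j) * binom(k - 1 - 5 * j, k - 1 - r)
                 for j in range(1, k // 5 + 1))
    return first - second


def count_brute(r, k):
    """Words of length k with exactly r S, ending in S, containing SSSSS."""
    return sum(1 for w in product("SF", repeat=k)
               if w[-1] == "S" and w.count("S") == r and "SSSSS" in "".join(w))
```

For example, `count_formula(7, 9)` evaluates to $18-6=12$, the blue coefficient in (6).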

Markus Scheuer
  • Fix r and sum P(Mr=k) over k from r to infinity. The sum is less than 1, e.g., take r=5. Pr(trials needed to reach r S = k) can sum to 1 if summing over k. It is negative binomial distribution. If I interpret P(Mr=k) as Pr(trials needed to reach r S containing 5 consecutive = k), it should sum to 1 like the negative binomial distribution. What is wrong with this interpretation? – Junk Warrior Sep 05 '20 at 21:40
  • @JunkWarrior: We know the probability of the sum of outcomes following the negative binomial distribution is $1$. Consequently we have here as sum of probabilities $1$ minus the sum of probabilities of all the outcomes with runs less than $5$. – Markus Scheuer Sep 05 '20 at 22:20
  • Let Kr be the random variable for the number of trials needed to reach r S with 5 consecutive. The support of Kr is r, r+1, .... Kr should have a probability distribution P(Kr=k) and its distribution should sum up to 1 over its support. So your P(Mr=k) is not P(Kr=k)? Then what is P(Kr=k)? – Junk Warrior Sep 05 '20 at 22:33
  • @JunkWarrior: Let's focus at the negative binomial distribution where we are looking for exactly $r$ times having a success. We consider words of length $k\geq r$ valid if they terminate with a success. Each of these words has probability $p^r(1-p)^{k-r}$ and we know that summing over $k\geq r$ results in a probability of $1$. A subset of these events are those having runs $\geq 5$. – Markus Scheuer Sep 06 '20 at 12:56
  • Summing over these words only results in a probability less than $1$, since we do not consider words of length $k\geq r$ which have $r$ times success and end with success and have S-runs smaller than $5$, which also occur with probability $p^r(1-p)^{k-r}$. – Markus Scheuer Sep 06 '20 at 12:56
  • The expression of H(z;s,f) at the end of Second Step should be $\frac{1}{1-(s+f)z}-\frac{1-(sz)^5}{1-(s+f)z+(sz)^5fz}$. Why do you have a $(sz)^6$ in the denominator? – Junk Warrior Sep 06 '20 at 23:11
  • @JunkWarrior: First of all many thanks for granting an additional bounty. It's very kind of you and yes, you're right, there is a mistake which has to be corrected. I'll check it after work, regards. – Markus Scheuer Sep 07 '20 at 11:44
  • At the beginning of Third Step, to force ending with S, you subtract the words of length k−1 which contain exactly r S and S-runs of S of length 5. Why don't you need to subtract the words of length k-2, k-3, ... which contain exactly r S and S-runs of S of length 5? – Junk Warrior Sep 08 '20 at 18:53
  • @JunkWarrior: Revised, corrected and simplified. Review is welcome. At the beginning of the third step we in fact subtract the words of length $k$ which end with $F$ - reformulated. – Markus Scheuer Sep 08 '20 at 20:35
1

a) Your approach to deducing the recurrence is correct; the problem is to fix the appropriate initial conditions and bounds of validity.

b) To settle those clearly, we proceed as follows.

Given a sequence of Bernoulli trials with probability of success $p$ (and of failure $q=1-p$), allow me to represent it by a binary string, with $1 = $ success and $0 =$ failure, so as to stay consistent with other posts I am going to link to.
For the same reason, and to equip your recurrence with proper initial conditions, allow me to change your notation and consider

the binary strings of length $n$, having $m$ zeros and $s$ ones, including a one which is fixed at the end of the string;
let us also generalize and consider runs of consecutive ones of length $r$.

We denote by $$ P(s,r,n) = N_c (s,r,n)\, p^{\,s } q^{\,n - s} $$ the probability that a string of length $n$ has a total of $s$ ones, terminates with a one, and contains at least one run of consecutive ones of length $r$ or greater.

Now your recurrence, generalized to runs of length $r$, reads $$ \eqalign{ & P(s,r,n) = q\,P(s,r,n - 1) + pq\,P(s - 1,r,n - 2) + p^{\,2} q\,P(s - 2,r,n - 3) + \cr & + \cdots + p^{\,r - 1} q\,P(s - r + 1,r,n - r) + \left( \matrix{ n - 1 - r \hfill \cr s - 1 - r \hfill \cr} \right)p^{\,s} q^{\,n - s} \cr} $$

Note that every term is a multiple of $p^s\, q^{n-s}$, so we need not carry that factor around and can concentrate on the number of strings given by $N_c$, that is $$ \bbox[lightyellow] { \eqalign{ & N_c (s,r,n) = \cr & = \left\{ {\matrix{ 1 & {\left| \matrix{ \;0 \le r \le s \hfill \cr \;1 \le s = n \hfill \cr} \right.} \cr {\sum\limits_{k = 0}^{r - 1} {N_c (s - k,r,n - 1 - k)} + \binom{ n - r - 1 }{ s - r - 1 } } & {\left| \matrix{ \;0 \le r \le s \hfill \cr \;1 \le s < n \hfill \cr} \right.} \cr 0 & {{\rm otherwise}} \cr } } \right. \cr} } \tag{1}$$

Regarding the conditions,

  • the case $s=n$ was not covered in the construction, and must be added;
  • because of the one in last position, $s$ must be at least $1$;
  • the remaining are obvious.

The recurrence above has been checked with direct computation for the smaller values of the parameters.
Example: [image Nb_s_ones_r_cons_2, a table of values of $N_c$]
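For readers who want to repeat that check, here is a hedged Python sketch of recurrence (1), verified against direct enumeration. The names `Nc` and `Nc_brute` are chosen for illustration, and the binomial coefficient is assumed to vanish when the lower index is negative or exceeds the upper one.

```python
from functools import lru_cache
from itertools import product
from math import comb


def binom(n, k):
    """Binomial coefficient, taken as 0 outside 0 <= k <= n (assumed convention)."""
    return comb(n, k) if 0 <= k <= n else 0


@lru_cache(maxsize=None)
def Nc(s, r, n):
    """Recurrence (1): strings of length n with s ones, ending in a one,
    containing a run of at least r consecutive ones."""
    if s < 1 or r < 0 or n < 1:
        return 0
    if s < r or n < s:
        return 0
    if s == n:
        return 1
    # k leading ones followed by a zero (k = 0..r-1), or r leading ones.
    return sum(Nc(s - k, r, n - 1 - k) for k in range(r)) + binom(n - r - 1, s - r - 1)


def Nc_brute(s, r, n):
    """Direct enumeration of the same set of strings."""
    return sum(1 for w in product("01", repeat=n)
               if w[-1] == "1" and w.count("1") == s and "1" * r in "".join(w))
```

For instance `Nc(7, 5, 9)` gives $5+3+1+0+0+\binom{3}{1}=12$, and `Nc(5, 5, n)` is $1$ for every $n\ge 5$.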

c) The recurrence (1) can be solved in closed form as a finite sum.

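Consider in fact the strings of this type

[sketch Nb_s_ones_r_cons_1: a binary string made of $s$ ones and $m$ zeros in arbitrary order ($n=s+m$), followed by a final one]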

Their total number is $\binom{n}{s}$ and those having a run of length $r$ or greater are $N_c (s+1,r, n+1)$. Therefore, the complement of $N_c$ will represent the strings of the same architecture, which have runs up to $r-1$.

The number of strings composed as above but excluding the last one, which have runs of length up to $r-1$ is given by $$ N_b (s,r - 1,m + 1) $$ where $$ N_b (s,r,m)\quad \left| {\;0 \leqslant \text{integers }s,m,r} \right.\quad = \sum\limits_{\left( {0\, \leqslant } \right)\,\,k\,\,\left( { \leqslant \,\frac{s}{r+1}\, \leqslant \,m} \right)} {\left( { - 1} \right)^k \binom{m}{k} \binom { s + m - 1 - k\left( {r + 1} \right) } { s - k\left( {r + 1} \right)}\ } $$ as explained in various posts, refer mainly to this and to this other one.

But because of the presence of the one at the end, we have to deduct from the above the strings which end in zero plus $ r-1$ ones, giving a final run of $r$.
These are $$ N_b (s-r+1,r - 1,m ) $$ and we conclude that $$ \bbox[lightyellow] { \eqalign{ & N_c (s + 1,r,n + 1) = N_c (s + 1,r,s + m + 1) = \cr & = \left( \matrix{ s + m \cr s \cr} \right) - N_b (s,r - 1,m + 1) + N_b (s - r + 1,r - 1,m) = \cr & = \left( \matrix{ n \cr s \cr} \right) - N_b (s,r - 1,n - s + 1) + N_b (s - r + 1,r - 1,n - s) \quad \left| {\;0 \le s,r,m} \right. \cr} } \tag{2}$$
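A hedged numerical check of (2) in Python. Two conventions here are my own assumptions rather than statements from the answer: $N_b(s,r,0)$ is taken as $1$ for $s=0$ and $0$ otherwise (no zeros means no bins), and $N_b$ is taken as $0$ for a negative number of ones or a negative capacity. The names `Nb`, `Nc_closed` and `Nc_brute` are hypothetical.

```python
from itertools import product
from math import comb


def Nb(s, r, m):
    """Ways to place s ones into m bins with at most r ones per bin."""
    if s < 0 or r < 0:
        return 0          # assumed convention
    if m == 0:
        return 1 if s == 0 else 0   # assumed convention
    return sum((-1) ** k * comb(m, k) * comb(s + m - 1 - k * (r + 1), s - k * (r + 1))
               for k in range(min(m, s // (r + 1)) + 1))


def Nc_closed(s, r, n):
    """N_c(s, r, n) from (2), written for s ones in total (final one included)."""
    if s < 1 or n < s:
        return 0
    m = n - s  # number of zeros
    return comb(n - 1, s - 1) - Nb(s - 1, r - 1, m + 1) + Nb(s - r, r - 1, m)


def Nc_brute(s, r, n):
    """Strings of length n with s ones, ending in a one, with a run of >= r ones."""
    return sum(1 for w in product("01", repeat=n)
               if w[-1] == "1" and w.count("1") == s and "1" * r in "".join(w))
```

As an example, `Nc_closed(7, 5, 9)` evaluates to $\binom{8}{6}-N_b(6,4,3)+N_b(2,4,2)=28-19+3=12$.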

$N_b$ is better covered in the literature, satisfies plenty of recurrence relations, and has a simple o.g.f. To keep the answer from getting too long, I will not go into further details.

d) Summing on $n$.

Consider the strings composed as shown in the sketch in para. c) above.

Their total number is $\binom {s+m}{s} = \binom {n}{s}$ and each has the same probability $p^{s+1}\, q^m = p^{s+1}\, q^{n-s}$.

Keeping $n$ fixed, and summing over $s$ we get $$ \sum\limits_{\left( {0\, \le } \right)\,s\,\left( { \le \,n} \right)} {\binom{ n }{ s } p^{\,s + 1} q^{\,n - s} } = p\left( {p + q} \right)^{\,n} = p $$ which is obvious, since if we add the complementary strings ending in zero we get $(p+q)^{n+1} =1$.

Keeping instead $s$ fixed and summing on $n$, which means to sum on $m$, gives $$ \eqalign{ & \sum\limits_{\left( {0\, \le \,s\, \le } \right)\,n\,} {\binom{ n }{s} p ^{\,s + 1} q^{\,n - s} } = \sum\limits_{\left( {0\, \le } \right)\,\,m\,} {\binom{ s + m }{m} p^{\,s + 1} q^{\,m} } = \cr & = p^{\,s + 1} \sum\limits_{\left( {0\, \le } \right)\,\,m\,} {\binom{ - s - 1 }{m} \left( { - 1} \right)^{\,m} q^{\,m} } = {{p^{\,s + 1} } \over {\left( {1 - q} \right)^{\,s + 1} }} = 1 \cr} $$ which is the Negative Binomial distribution.

Since, by its combinatoric meaning we have $$ \left\{ \matrix{ 0 \le N_c (s + 1,r,n + 1) \le N_c (s + 1,r - 1,n + 1) \le \binom{n }{ s } \hfill \cr N_c (s + 1,s + 2,n + 1) = 0 \hfill \cr N_c (s + 1,s + 1,n + 1) = 1 \hfill \cr N_c (s + 1,1,n + 1) = \binom{n }{ s } \hfill \cr N_c (s + 1,0,n + 1) = \binom{n }{ s } \hfill \cr } \right. $$ then $$ 0 \le P_c (s + 1,r,p) = \sum\limits_{\left( {0\, \le \,s\, \le } \right)\,n\,} {N_c (s + 1,r,n + 1)p^{\,s + 1} q^{\,n - s} } \le 1 \quad \left| \matrix{ \,0 \le s \hfill \cr \;0 \le r \le s + 1 \hfill \cr \;0 < p < 1 \hfill \cr} \right. $$ converges (albeit slowly), and given $s,p$, it is a CDF in $(s+1-r)$ (in case with a further shift of the support).
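As a worked instance of this bound (my own illustration, using only the definitions above): for $s+1=r=5$ the only admissible string of length $n+1$ is $n-4$ zeros followed by five ones, so $N_c(5,5,n+1)=1$ for every $n\ge 4$ and $$ P_c (5,5,p) = \sum\limits_{n\, \ge \,4} {p^{\,5} q^{\,n - 4} } = {{p^{\,5} } \over {1 - q}} = p^{\,4} < 1 $$ The missing mass $1-p^{\,4}$ is the probability that the first five ones are not consecutive, which is why the values $P(5,5,n)$ do not sum to $1$ over $n$.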

Unfortunately, to my knowledge, the sum over $n$ of $N_c$ (and of $N_b$) does not have a closed form: refer to the post already cited.
It is possible, however, to compute a double o.g.f. from (2), if you are interested.

G Cab
  • Take r=5 and s=5. Intuitively I think Nc(5,5,n)=1 for all 5<=n. But the resulting P(5,5,n) does not sum to 1. Nc(5,5,5)=1. I also have difficulty using recursive to calculate Nc(5,5,6). This is the biggest problem I have now. Would you please clarify this case? – Junk Warrior Sep 03 '20 at 23:53
  • a) in fact, for $r=s \le n$, you get $N_c = 1$: I rendered the recurrence with the following code
    Nc := proc(s,r,n)
    local Ris, k ;
    begin
    if s<1 or r<0 or n<1 then return(0): end_if:
    if s < r or n<s then return(0): end_if:
    if s =n then return(1): end_if :
    Ris:=_plus(Nc(s-k,r,n-1-k)$k=0..r-1)+CBinr(n-r-1,s-r-1):
    return (Ris):
    end_proc:
    – G Cab Sep 04 '20 at 13:42
  • b) why should $P(5,5,n)$ sum to 1 ? – G Cab Sep 04 '20 at 13:44
  • P(s,r,n) is the probability that given a string of length n, there are totally s ones and terminating with a one, and there are runs of consecutive ones of length r or greater. Then the sampling space is {strings of length n}. P(s,r,n) is probability of existence of desired strings. 1-P(s,r,n) is probability of non-existence. Then I really should not sum over n, since it is fixed. – Junk Warrior Sep 04 '20 at 19:16
  • I'm still not satisfied with P(s,r,n) not summing up to 1 over n. Why can't I interpret P(s,r,n) as the probability that the number of bits required to achieve the first s ones with consecutive ones of length >=r is n? Note that the probability for totally s ones ending with a one does sum up to 1 over n, which is the P(Mn=k) in my post. It has the interpretation of the probability of the number of trials required to achieve the first 50 S. – Junk Warrior Sep 04 '20 at 22:48
  • added para. d): now the picture shall be clear – G Cab Sep 05 '20 at 16:44
  • @GCab: I see, the table entries correspond with the series expansion in my answer. Good work. (+1) – Markus Scheuer Sep 05 '20 at 22:35
  • @MarkusScheuer: thanks Marcus. Do you have any hint regarding the sum over $n$ ? – G Cab Sep 06 '20 at 20:42
  • @GCab: I've added remarks in the comment section of my answer. – Markus Scheuer Sep 06 '20 at 20:48
0

There might be a simpler approach.

Let N be the number of trials and P(N) be its probability given the conditions above, then:

$$P(N) = \sum_{S \in S'} p^5\prod_{j \in S} q^{j} p_{j}$$ where S' is all integer partitions (and their other possible permutations with no repeated duplicate elements) of N-50 including zeroes with fixed length 45 and N>=50.

And in general, if we want to find the distribution of N given that there are M successes and at least one run of m successive successes, then:

$$P(N) = \sum_{S \in S'} p^m \prod_{j \in S} q^{j} p_{j}$$

where S' is all integer partitions (and their other possible permutations with no repeated duplicate elements) of N-M including zeroes with fixed length M-m and N>=M.

P.S. It is not a closed-form solution, but it is useful enough and better than simulation.

Sima Yi