Memoryless property of the discrete geometric distribution

Question

I came across this proof of the memoryless property of the discrete geometric distribution, which states the criterion for memorylessness as:

\begin{equation}\label{memoryless-def} \boldsymbol{\operatorname{P}}(X \ge s+t | X \ge t) = \boldsymbol{\operatorname{P}}(X \ge s) \end{equation}

However, the above clearly does not hold if we think of the geometric distribution as counting the number of "tosses" until the first success (including the successful toss). To see this, set $s=t=1$, in which case the criterion becomes:

\begin{equation}\label{memoryless-def-ex} \boldsymbol{\operatorname{P}}(X \ge 2 | X \ge 1) = \boldsymbol{\operatorname{P}}(X \ge 1) \end{equation}

This cannot hold if we also count the successful "toss": the left-hand side equals $1-p$, which is less than 1 for any $p>0$, whereas the right-hand side is exactly 1 because $X \ge 1$ always holds.

Upon further investigation I realized that the proof uses the following formula for the probability of $X$ being greater than or equal to $x$:

\begin{equation} \boldsymbol{\operatorname{P}}(X \ge x) = (1-p)^x \end{equation}

which shows that the proof is assuming that the geometric distribution only counts the failures till the first success (excluding the first success).
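
For reference, here is a sketch of where that formula comes from and what it becomes under the other convention. The symbols $q = 1-p$ and $Y$ (the count that includes the successful toss) are introduced here only for illustration; $X$ counts failures only, so $\boldsymbol{\operatorname{P}}(X = k) = q^{k} p$ for $k \ge 0$ and $\boldsymbol{\operatorname{P}}(Y = k) = q^{k-1} p$ for $k \ge 1$, assuming $0 < p \le 1$:

\begin{equation} \boldsymbol{\operatorname{P}}(X \ge x) = \sum_{k=x}^{\infty} q^{k} p = q^{x}, \qquad \boldsymbol{\operatorname{P}}(Y \ge y) = \sum_{k=y}^{\infty} q^{k-1} p = q^{y-1}, \qquad \boldsymbol{\operatorname{P}}(Y > y) = q^{y} \end{equation}

The tails $\boldsymbol{\operatorname{P}}(X \ge \cdot)$ and $\boldsymbol{\operatorname{P}}(Y > \cdot)$ are pure powers of $q$, so they multiply: $q^{s+t} = q^{s} q^{t}$. The tail $\boldsymbol{\operatorname{P}}(Y \ge \cdot)$ carries an extra shift, and $q^{s+t-1}$ differs from $q^{s-1} q^{t-1}$ by a factor of $q$.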

I then came across this Wolfram article that clearly states that there are two different definitions of memorylessness depending on how we define the geometric distribution. However, none of the offered definitions are identical to the above (though I have no doubt they're equivalent).

At this point, I understand all proofs and formulas but I have three questions:

1. What is the most widely accepted definition of the geometric distribution, and what is the associated definition of memorylessness?
2. Can someone help me understand intuitively why one definition of memorylessness works in one case but not in the other?
3. Couldn't we equivalently define memorylessness as follows? In the case of the geometric distribution where we also count the successful toss:

\begin{equation} \boldsymbol{\operatorname{P}}(X = s+t | X > t) = \boldsymbol{\operatorname{P}}(X = s) \end{equation}

And in the case of the geometric distribution where we only count the failures:

\begin{equation} \boldsymbol{\operatorname{P}}(X = s+t | X \ge t) = \boldsymbol{\operatorname{P}}(X = s) \end{equation}

It seems to me that the above definitions are equivalent to the standard ones, and perhaps simpler and more intuitive.
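
As a sanity check (not a proof), the criteria discussed in this question can be tested with exact rational arithmetic. This is only a minimal sketch: the function names, the choice $p = 1/3$, and the ranges of $s$ and $t$ are arbitrary assumptions made for illustration.

```python
from fractions import Fraction


def pmf_trials(k, p):
    """P(Y = k) when Y counts tosses up to and including the first success (k >= 1)."""
    return (1 - p) ** (k - 1) * p


def pmf_failures(k, p):
    """P(X = k) when X counts only the failures before the first success (k >= 0)."""
    return (1 - p) ** k * p


def tail_trials_ge(k, p):
    """P(Y >= k) for k >= 1: the first k - 1 tosses all fail."""
    return (1 - p) ** (k - 1)


def tail_trials_gt(k, p):
    """P(Y > k) for k >= 0: the first k tosses all fail."""
    return (1 - p) ** k


def tail_failures_ge(k, p):
    """P(X >= k) for k >= 0: the first k tosses all fail."""
    return (1 - p) ** k


def ge_criterion_trials(p, s, t):
    """Check P(Y >= s+t | Y >= t) == P(Y >= s) when the success toss is counted."""
    return tail_trials_ge(s + t, p) / tail_trials_ge(t, p) == tail_trials_ge(s, p)


def pointwise_trials(p, s, t):
    """Check P(Y = s+t | Y > t) == P(Y = s) when the success toss is counted."""
    return pmf_trials(s + t, p) / tail_trials_gt(t, p) == pmf_trials(s, p)


def pointwise_failures(p, s, t):
    """Check P(X = s+t | X >= t) == P(X = s) when only failures are counted."""
    return pmf_failures(s + t, p) / tail_failures_ge(t, p) == pmf_failures(s, p)


p = Fraction(1, 3)  # arbitrary success probability, chosen for illustration
print(ge_criterion_trials(p, 1, 1))  # False
print(all(pointwise_trials(p, s, t) for s in range(1, 6) for t in range(0, 6)))  # True
print(all(pointwise_failures(p, s, t) for s in range(0, 6) for t in range(0, 6)))  # True
```

The first line reproduces the $s=t=1$ counterexample above, and the other two are consistent with the two pointwise definitions proposed in the third question, at least for the values tried.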

The memoryless property for a 1-shifted geometric random variable is based on the strictly exceeds condition, $X>t$. — Graham Kemp, Jun 02 '18 at 08:27
@GrahamKemp thanks; strictly exceeded on all parts? I.e. P(X>s+t | X>t) = P(X > s) ? — Marcus Junius Brutus, Jun 02 '18 at 15:54
