
I was trying to find the formula for the probability of selecting the $k$th-best person, where, out of 100, the best has rank 100 and the second best has rank 99. $P(K = 100)$ is given everywhere on the internet, but how do we find $P(K=k)$ where $1 \le k < n$? I tried to derive it, but I think my result is wrong, because the probabilities do not add up to 1.

My Solution

$\displaystyle P( k) =\frac{r}{n}\sum _{i=r+1}^{k}\frac{1}{i-1} \cdot \frac{\binom{n-i}{n-k}}{\binom{n-1}{n-k}} \ =\ \frac{r\cdot ( k-1) !}{n\cdot ( n-1) !}\sum _{i=r+1}^{k}\frac{1}{i-1} \cdot \frac{( n-i) !}{( k-i) !}$

This other solution is equivalent to mine and yields the same answer, just in slightly different notation.

$\displaystyle P( k) =\frac{r}{n}\sum _{i=r+1}^{k}\frac{1}{i-1} \cdot \prod _{j=0}^{n-k-1}\frac{n-i-j}{n-j-1}$
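A quick numerical check of the binomial form above can show how far the total falls short of 1. This is only a sketch in R: `kthbestpmf` is a hypothetical helper name, and it assumes rank $n$ is the best candidate and that the first $r$ candidates are always rejected.

```r
# Hypothetical helper: evaluate the binomial form of P(k) for every rank k = 1..n.
# Assumes 1 <= r < n, rank n is the best, and the first r candidates are rejected.
kthbestpmf <- function(n, r) {
  sapply(seq_len(n), function(k) {
    if (k <= r) return(0)            # the sum over i = r+1..k is empty
    i <- (r + 1):k                   # positions at which rank k could be picked
    (r / n) * sum(choose(n - i, n - k) / ((i - 1) * choose(n - 1, n - k)))
  })
}

pmf <- kthbestpmf(100, 37)
sum(pmf)     # total probability given by the formula
pmf[100]     # the familiar P(K = n) term
```

Comparing `sum(pmf)` with `1 - r/n` indicates how much probability the formula leaves unassigned.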

Please let me know where I am going wrong.

My Approach (summarized by ChatGPT)

The probability distribution I need (obtained by simulating the selection process). I need to derive a PMF whose values are similar to those in this image.

The probability distribution given by my, Tolstoyevsky's, and Henry's formulas

Thank You

Adi

1 Answer


My calculation is that, with $n$ candidates and an initial block of size $m$ that you interview without selecting anyone, the probability of selecting the candidate of rank $k$ (for $m < k \le n$) is the sum, over the possible maximum rankings $x$ in the initial block, of the product of

  • the probability $x$ is the maximum ranking in the initial block, which is $\dfrac{x \choose m}{{n \choose m}}-\dfrac{x-1 \choose m}{{n \choose m}}=\dfrac{x-1 \choose m-1}{{n \choose m}}$
  • the probability $k$ is the first of the better candidates to be interviewed after the initial block, which is $\frac{1}{n-x}$ since there are $n-x$ better candidates each equally likely to be seen before the others and $k$ is one of them if $k>x$.

That gives

$$P(k)=\sum_{x=m}^{k-1} \frac{x-1 \choose m-1}{{n \choose m}(n-x)}$$ which can be written as $$P(k)=P(k-1)+\frac{k-2 \choose m-1}{{n \choose m}(n-k+1)}$$ starting at $P(m)=0$ since the best candidate in the initial block of size $m$ is rank at least $m$.
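The recurrence is straightforward to evaluate numerically. The following is a minimal R sketch, not a definitive implementation: `secretarypmf` is a hypothetical name, and it assumes $1 \le m < n$ with rank $n$ being the best candidate.

```r
# Hypothetical helper: P(k) for k = 1..n under the rule "skip the first m,
# then take the first candidate who beats everyone in the initial block".
# Assumes 1 <= m < n and that rank n is the best candidate.
secretarypmf <- function(n, m) {
  p <- numeric(n)                    # p[k] = P(k); stays 0 for k <= m
  for (k in (m + 1):n) {
    # P(k) = P(k-1) + C(k-2, m-1) / (C(n, m) * (n - k + 1)), starting at P(m) = 0
    p[k] <- p[k - 1] + choose(k - 2, m - 1) / (choose(n, m) * (n - k + 1))
  }
  p
}

pmf <- secretarypmf(100, 37)
pmf[c(38, 96, 100)]   # individual probabilities
sum(pmf)              # total probability that some candidate is selected
```

Using the recurrence rather than the full sum keeps the work linear in $n$, and `choose` returns doubles, so the large binomial coefficients do not overflow for $n = 100$.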

For example, with $n=100$ and $m=37$, that gives $P(38)\approx 4.6\times 10^{-30}$, $P(96)\approx 0.015$, and $P(100)\approx 0.371$. Summing $P(k)$ over $k$ gives exactly $0.63$, which, as you observe, is less than $1$. This is to be expected, as there is probability $\frac mn$ ($0.37$ here) that the maximum ranking in the initial block is $x=n$, in which case no later, better selection can be made.
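The total can also be checked for general $n$ and $m$ by swapping the order of summation. Each $x$ contributes $\frac{1}{n-x}$ to exactly $n-x$ values of $k$ (namely $k=x+1,\dots,n$), and what remains is the hockey-stick identity $\sum_{x=m}^{n-1}{x-1 \choose m-1}={n-1 \choose m}$:

$$\sum_{k=m+1}^{n}P(k)=\sum_{x=m}^{n-1}\frac{x-1 \choose m-1}{{n \choose m}}\cdot\frac{n-x}{n-x}=\frac{n-1 \choose m}{{n \choose m}}=\frac{n-m}{n}=1-\frac mn$$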

Here are $10^5$ simulations with $n=100$ and $m=37$ using R. They fit my calculations after allowing for simulation noise. In particular, in about $37\%$ of cases no candidate is selected.

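# Simulate one run: skip the first `initialblocksize` candidates, then select the
# first later candidate who beats the best of the initial block (NA if none does).
secretaryselect <- function(numbercandidates, initialblocksize){
  ranks <- sample(numbercandidates)                # random interview order; highest rank is best
  bestinitial <- max(ranks[1:initialblocksize])    # best rank seen in the initial block
  selectable <- (1:numbercandidates) > initialblocksize & ranks > bestinitial
  return(ifelse(any(selectable), ranks[min(which(selectable))], NA))
}

set.seed(2024)
numbersims <- 10^5
sims <- replicate(numbersims, secretaryselect(100, 37))
table(sims, useNA = "ifany") / numbersims          # empirical distribution of the selected rank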

     79      81      82      86      87      88      89      90      91      92
0.00001 0.00001 0.00001 0.00004 0.00007 0.00014 0.00013 0.00053 0.00065 0.00127
     93      94      95      96      97      98      99     100    <NA>
0.00209 0.00430 0.00795 0.01391 0.03053 0.06203 0.13511 0.37025 0.37097

Henry
  • The function you provide and the one I derived give the same output, but I have a strong feeling that both are wrong. I don't know whether I can upload an image here, so please check my question: I have added a new image showing the true distribution, which I got through simulation. And yes, the sum from my formula and from yours is approximately 0.63, not 1. But I did not understand what you were saying. – Adi Jun 27 '24 at 09:55
  • BTW, I think I forgot to consider another possibility in my approach: the k-1 element is not in the initial rejection list, but we still select k. For example, with r = 3, n = 6, k = 6: {1,2,4,3,6,5} or {1,2,4,6,3,5}. Both of these cases should also be counted, and I am not counting them. – Adi Jun 27 '24 at 10:10
  • @RevepoGaming Somehow your simulation gives $P(1) \approx 0.0032$. How is it not $0$? You do not select anybody in the initial block or anybody whose rank is worse than the best in the initial block. So there should be no possibility of selecting the worst ranked individual as they will be worse than the best in the initial block. – Henry Jun 27 '24 at 15:42
  • Let n = 7, r = 2, and k = 1. P(k) means selecting 1, given that we initially reject 2 and have 7 people in total. Some example cases are [7, 2, 5, 6, 3, 4, 1], [4, 7, 6, 2, 5, 3, 1], and so on. In all these cases, we end up selecting 1. – Adi Jun 27 '24 at 18:32
  • @RevepoGaming - that is not how I think of the classic secretary problem. You are suggesting that if the best ranked candidate is in the initial block (probability $\frac mn$) then the final candidate is selected (and is equally likely to be any of the $n-1$ other candidates). So your version would add $\frac {m}{n(n-1)}$ to each of the $P(k)$ when $k<n$ and this would make $\sum_1^n P(k) =1$ rather than $1-\frac mn$. For $n=100,m=37$ it adds $\frac{37}{9900} \approx 0.0037$ ninety-nine times, and for example $P(96)$ increases from about $0.015$ to about $0.019$, close to your simulation. – Henry Jun 27 '24 at 20:12
  • Yeah, I completely forgot this case. I was considering adding this only for k < m, but I need to add it for all cases except k = n. I still feel that I am missing the case where the k-1 element is not in the initial rejection list but we still select k. Do you see it being accounted for? So far this is what I have considered: $\displaystyle P(k-1 \text{ element in } r) = \frac{r}{i-1}, \ P(k \text{ element is in } i \text{th position })= \frac{1}{n},\ P(k+1 \text{ to n elements in }i+1 \text{ to } n) = \frac{\binom{n-i}{n-k}}{\binom{n-1}{n-k}}$ – Adi Jun 27 '24 at 22:23
  • and after your observation, $\ \displaystyle P(n \text{ in } r \text{ and } k \text{ is in the last}) = \frac{r}{n} \cdotp \frac{1}{n-1}$ – Adi Jun 27 '24 at 22:31
  • @Adi (you seem to have changed your user name): The case where neither $k$ nor $k-1$ are in the initial block, and the maximum of the initial block is less than $k-1$, with $k$ selected before $k-1$, is taken into account in the calculation of $P(k)$. It is part of the $\frac1{n-x}$ in my second bullet point. – Henry Jun 28 '24 at 01:31
  • Yes, I understand that it is in your formula, but I never accounted for it in mine, and yet my derived function gives the same output as yours. – Adi Jun 28 '24 at 04:04
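The variant discussed in these comments, where the last candidate is taken whenever the best one falls in the initial block, can be sketched in R by adding $\frac{m}{n(n-1)}$ to every $P(k)$ with $k<n$. This is an illustration under that assumption only: `variantpmf` is a hypothetical name, and the code assumes $1 \le m < n$ with rank $n$ being the best.

```r
# Hypothetical helper for the variant rule: if the best candidate (rank n) is in
# the initial block of size m, the last candidate interviewed is selected.
variantpmf <- function(n, m) {
  p <- numeric(n)
  for (k in (m + 1):n) {             # classic rule, via the recurrence for P(k)
    p[k] <- p[k - 1] + choose(k - 2, m - 1) / (choose(n, m) * (n - k + 1))
  }
  # forced last pick: probability m/n, spread uniformly over ranks 1..n-1
  p[-n] <- p[-n] + m / (n * (n - 1))
  p
}

pmf <- variantpmf(100, 37)
sum(pmf)        # total probability under the variant rule
pmf[c(1, 96)]   # P(1) is now positive; P(96) can be compared with the value quoted above
```

With this adjustment the worst-ranked candidate gets positive probability, which is the behaviour the simulation in the question shows.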