I have $n$ birth-and-death processes. Each is a machine that is either working or failed. When a machine is working, the time until its next failure is exponential with rate $\lambda$; when it has failed, the time until recovery is exponential with rate $\mu$. I observe these $n$ machines over a time period of length $t$. I'm interested in $M(t)$, the maximum number of machines that are simultaneously failed at any instant within that period. What is $E(M(t))$? How does it grow with $t$? Is it $O(t)$, $O(\log(t))$, or neither? My intuition says it can't be $O(1)$, since given more time, more of the machines will tend to be failed at once, but it also shouldn't be as bad as $O(t)$.
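For checking intuition numerically, here is a minimal simulation sketch of the setup described above. It tracks only the count of failed machines, which moves up at rate $(n-j)\lambda$ and down at rate $j\mu$ when $j$ machines are failed. The function names, the default number of runs, and the choice to start with all machines working are assumptions made for illustration, not part of the problem statement.

```python
import random

def max_failed(n, lam, mu, t, rng):
    """Draw one sample of M(t): the peak number of failed machines on [0, t]."""
    failed = 0      # assumption: every machine starts out working
    peak = 0
    clock = 0.0
    while True:
        up_rate = (n - failed) * lam   # rate at which some working machine fails
        down_rate = failed * mu        # rate at which some failed machine recovers
        total = up_rate + down_rate
        if total == 0:                 # nothing can ever happen (e.g. n == 0)
            return peak
        clock += rng.expovariate(total)  # time to the next event
        if clock > t:
            return peak
        if rng.random() * total < up_rate:
            failed += 1
            peak = max(peak, failed)
        else:
            failed -= 1

def estimate_expected_max(n, lam, mu, t, runs=2000, seed=0):
    """Monte Carlo estimate of E(M(t)) averaged over `runs` independent samples."""
    rng = random.Random(seed)
    return sum(max_failed(n, lam, mu, t, rng) for _ in range(runs)) / runs
```

Calling `estimate_expected_max(n, lam, mu, t)` for a range of `t` values and plotting the results against `t` or `log(t)` gives a rough picture of the growth. Note that every sample lies between $0$ and $n$.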
EDIT: Having thought about it, we can divide $t$ into $k$ small intervals such that, within each interval, every machine is either working or down. In steady state, the probability that a machine is down is $p=\frac{\lambda}{\lambda+\mu}$ (and the probability that it is working is $\frac{\mu}{\lambda+\mu}$). Then the number of machines down in any interval is binomial with parameters $n$ and $p$. Now we're talking about taking the max of $k$ binomials, and as shown here: Expected value of the maximum of binomial random variables, this increases as $\sqrt{\log(k)}$. It would be interesting to see if someone can come up with the exact expression. Also, I didn't understand how @cdipaolo came up with the "asymptotically correct bound" in that case.
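The discretized picture can be explored with a small Monte Carlo sketch. It assumes the $k$ binomial counts are independent, which the real process does not satisfy because adjacent intervals are correlated. The comparison formula $np + \sqrt{2np(1-p)\log k}$ is the usual Gaussian-approximation heuristic, capped at $n$ because the maximum can never exceed $n$. It is not necessarily the bound from the linked answer. The function names and defaults are hypothetical.

```python
import math
import random

def mean_max_binomial(n, p, k, runs=200, seed=0):
    """Monte Carlo estimate of E[max of k independent Binomial(n, p) draws]."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        # Each binomial draw is a sum of n Bernoulli(p) indicators.
        total += max(
            sum(rng.random() < p for _ in range(n)) for _ in range(k)
        )
    return total / runs

def gaussian_heuristic(n, p, k):
    """Heuristic n*p + sqrt(2*n*p*(1-p)*log(k)), capped at n (assumed form)."""
    return min(n, n * p + math.sqrt(2 * n * p * (1 - p) * math.log(k)))
```

Comparing `mean_max_binomial(n, p, k)` with `gaussian_heuristic(n, p, k)` for increasing `k` shows where the $\sqrt{\log(k)}$ growth is a reasonable description and where the cap at $n$ takes over.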