I have been thinking about these problems for a while and think I might have found a way to partly answer them. These problems deal with estimating the birth and death rate of a system in different scenarios.
Scenario 1:
- Suppose at time=0 my population is $k_1$. At some final time=T, my population is $k_2$ ($k_1, k_2$ are positive integers).
- At each time point in (0,T), I assume that there is a probability $p_1$ that the current population increases by $j$ units and a probability $p_2$ that the current population decreases by $l$ units (assume $p_1, p_2, j, l$ are all constant).
- I only observe the system at time=0 and time=T (i.e. I know the exact values of $k_1, k_2$).
- Assuming that the population can never go below 0, what are the most likely values of $p_1, p_2, j, l$ ?
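To make the setup concrete, here is a minimal simulation sketch of the process described above. It rests on two assumptions that the description does not settle: at most one event (increase or decrease) happens per time point, and a decrease that would take the population below 0 is simply not applied. The function name `simulate_population` is hypothetical.

```python
import random

def simulate_population(k1, T, p1, p2, j, l, seed=None):
    """Simulate T time steps and return the population path (length T + 1).

    Assumptions (illustrative, not from the problem statement):
    - at most one event per step: +j with prob p1, -l with prob p2,
      otherwise no change;
    - a decrease that would go below 0 is blocked (population unchanged).
    """
    rng = random.Random(seed)
    path = [k1]
    for _ in range(T):
        u = rng.random()
        current = path[-1]
        if u < p1:
            current += j
        elif u < p1 + p2 and current - l >= 0:
            current -= l
        path.append(current)
    return path

path = simulate_population(k1=5, T=10, p1=0.3, p2=0.2, j=2, l=1, seed=0)
k2 = path[-1]  # in Scenario 1 only path[0] and path[-1] would be observed
```

Other ways of enforcing the lower bound (for example truncating at 0) would give a different process, and therefore a different likelihood.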
Scenario 2:
- Suppose at time=0 my population is $k_1$. At some final time=T, my population is $k_2$ ($k_1, k_2$ are positive integers).
- At each time point in (0,T), I assume that there is a probability $p_1$ that the current population increases by $j$ units and a probability $p_2$ that the current population decreases by $l$ units (assume $p_1, p_2, j, l$ are all constant).
- I observe the population at time=0 and time=T, as well as at some intermediate times $t_i$ (e.g. suppose I have time points 0,1,2,3,4,5,6,7,8,9,10 and I observe the population at times 0, 5, 6, 9, 10: at time=0 the population is $k_1$, at time=5 it is $a$, at time=6 it is $b$, at time=9 it is $c$, and at time=10 it is $k_2$).
- Assuming that the population can never go below 0, what are the most likely values of $p_1, p_2, j, l$ ?
My Answers:
For Scenario 1, I used a latent variable approach (similar in spirit to the EM algorithm/Gaussian mixtures) and wrote:
$$ L(p_1, p_2, j, l) = \sum_{j=0}^{J} \sum_{l=0}^{L} \sum_{n_1=0}^{T} \sum_{n_2=0}^{T} \left[ p_1^{n_1} \cdot (1 - p_1)^{T - n_1} \cdot p_2^{n_2} \cdot (1 - p_2)^{T - n_2} \right] \cdot I(n_1 \cdot j - n_2 \cdot l = k_2 - k_1) $$
- $p_1^{n_1}$ represents the probability of $n_1$ increases in the population. $p_2^{n_2}$ represents the probability of $n_2$ decreases in the population.
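- $(1 - p_1 - p_2)^{T - n_1 - n_2}$ represents the probability of the remaining time intervals during which the population neither increases nor decreases. This happens with probability $1 - p_1 - p_2$, and there are $T - n_1 - n_2$ such intervals. Note that the formula above does not contain this term: it uses the separate factors $(1 - p_1)^{T - n_1}$ and $(1 - p_2)^{T - n_2}$ instead, which treats increases and decreases as independent within a time point.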
- $I(n_1 \cdot j - n_2 \cdot l = k_2 - k_1)$ is an indicator function that takes the value 1 if the condition is true and 0 otherwise. This ensures that the total numbers of increases and decreases are consistent with the initial and final population.
- However, I am not sure whether this likelihood function prevents the population from going below 0 at some intermediate time point. I am also not sure whether a combinatorial/multinomial term is needed in the likelihood.
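One way to check both doubts numerically is to compute the endpoint probability by propagating the population distribution forward one time point at a time. The sketch below is not the formula above; it swaps in a forward (dynamic programming) recursion under the assumptions that at most one event occurs per time point and that a decrease below 0 is blocked. For fixed $j, l$ it counts every ordering of increases and decreases automatically, so no explicit combinatorial term has to be written. The name `endpoint_likelihood` is hypothetical.

```python
from collections import defaultdict

def endpoint_likelihood(k1, k2, T, p1, p2, j, l):
    """P(population == k2 at time T | population == k1 at time 0).

    Assumptions (illustrative): per step, +j with prob p1, -l with prob p2,
    no change otherwise; a decrease that would go below 0 is blocked, so its
    probability mass is added to "no change".
    """
    dist = {k1: 1.0}  # distribution over population sizes at the current time
    for _ in range(T):
        nxt = defaultdict(float)
        for k, prob in dist.items():
            nxt[k + j] += prob * p1
            if k - l >= 0:
                nxt[k - l] += prob * p2
                nxt[k] += prob * (1 - p1 - p2)
            else:
                nxt[k] += prob * (1 - p1)
        dist = nxt
    return dist.get(k2, 0.0)

# Two steps, start and end at 10 with j = l = 1:
# up-down, down-up, or stay-stay = 2 * 0.3 * 0.2 + 0.5 ** 2
print(endpoint_likelihood(10, 10, 2, 0.3, 0.2, 1, 1))
```

The factor 2 in the small example is the number of orderings of one increase and one decrease, which is what a multinomial coefficient would contribute when the boundary at 0 is never reached. Evaluating this function over a grid of candidate $(p_1, p_2, j, l)$ values would give a likelihood surface to inspect.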
I think it's more difficult to write the likelihood function for Scenario 2, even though Scenario 2 has more information than Scenario 1.
I think this question can be broken down into two parts: A likelihood function for the times where we have full information, and a likelihood function for the times where we have missing information:
$$ L(p_1, p_2, j, l) = L_{obs}(p_1, p_2, j, l) \cdot L_{unobs}(p_1, p_2, j, l) $$
From this point on, I think we can use a similar approach as in Scenario 1:
$$ L_{obs}(p_1, p_2, j, l) = \prod_{i=0}^{N_{obs}-1} \sum_{n_1=0}^{t_{o_{i+1}} - t_{o_i}} \sum_{n_2=0}^{t_{o_{i+1}} - t_{o_i}} \sum_{j=0}^{J} \sum_{l=0}^{L} \left[ p_1^{n_1} \cdot (1 - p_1)^{t_{o_{i+1}} - t_{o_i} - n_1} \cdot p_2^{n_2} \cdot (1 - p_2)^{t_{o_{i+1}} - t_{o_i} - n_2} \right] \cdot I(n_1 \cdot j - n_2 \cdot l = k_{t_{o_{i+1}}} - k_{t_{o_i}}) $$
$$ L_{unobs}(p_1, p_2, j, l) = \prod_{i=0}^{N_{unobs}-1} \sum_{n_1=0}^{t_{u_{i+1}} - t_{u_i}} \sum_{n_2=0}^{t_{u_{i+1}} - t_{u_i}} \sum_{j=0}^{J} \sum_{l=0}^{L} \left[ p_1^{n_1} \cdot (1 - p_1)^{t_{u_{i+1}} - t_{u_i} - n_1} \cdot p_2^{n_2} \cdot (1 - p_2)^{t_{u_{i+1}} - t_{u_i} - n_2} \right] \cdot I(n_1 \cdot j - n_2 \cdot l = k_{t_{u_{i+1}}} - k_{t_{u_i}}) $$
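For comparison, here is a hedged sketch of a different way to organise the Scenario 2 likelihood. If the process is Markov (the next step depends only on the current population), the likelihood of the observed sequence is a product of transition probabilities between consecutive observation times, and the unobserved time points inside each gap are summed out by the per-gap computation. The sketch assumes at most one event per time point and a blocked decrease at 0; the names `transition_prob` and `log_likelihood` and the example counts are hypothetical.

```python
import math
from collections import defaultdict

def transition_prob(start, end, steps, p1, p2, j, l):
    """P(population == end after `steps` steps | population == start)."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for k, prob in dist.items():
            nxt[k + j] += prob * p1
            if k - l >= 0:
                nxt[k - l] += prob * p2
                nxt[k] += prob * (1 - p1 - p2)
            else:
                # Assumption: a decrease below 0 is blocked.
                nxt[k] += prob * (1 - p1)
        dist = nxt
    return dist.get(end, 0.0)

def log_likelihood(times, counts, p1, p2, j, l):
    """Sum of log transition probabilities over consecutive observations."""
    total = 0.0
    for i in range(len(times) - 1):
        gap = times[i + 1] - times[i]
        prob = transition_prob(counts[i], counts[i + 1], gap, p1, p2, j, l)
        if prob == 0.0:
            return -math.inf  # observations impossible under these parameters
        total += math.log(prob)
    return total

# Observation times from the example; the counts are made-up placeholders
# standing in for k_1, a, b, c, k_2.
times = [0, 5, 6, 9, 10]
counts = [10, 12, 12, 11, 12]
print(log_likelihood(times, counts, p1=0.3, p2=0.2, j=1, l=1))
```

Under this factorisation there is no separate term for the unobserved times: each gap of length greater than 1 already averages over every path the population could have taken inside it.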
I have been thinking about how to correctly write the likelihood functions for both Scenario 1 and Scenario 2 for quite some time and find myself getting lost/confused. Can someone please help me write these correctly?