I was reading this tutorial on expectation maximization, and in section 4 the author suggests that it is difficult (impossible?) to maximize the marginal log likelihood by differentiating it and setting the gradient to zero.
Specifically, section 4 says:
"We note that there is a summation inside the log. This couples the Θ parameters. If we try to maximize the marginal log likelihood by setting the gradient to zero, we will find that there is no longer a nice closed form solution, unlike the joint log likelihood with complete data. The reader is encouraged to attempt this to see the difference."
Here is the link to the tutorial (section 4): http://pages.cs.wisc.edu/~jerryzhu/cs838/EM.pdf
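If I understand the setup correctly, the contrast is between the following two quantities (this is my notation, not necessarily the tutorial's; I am assuming a generic mixture model with mixing weights $\pi_z$ and component parameters $\theta_z$, so $\Theta = (\pi_1, \dots, \pi_K, \theta_1, \dots, \theta_K)$). The marginal log likelihood has the summation inside the log:

$$\ell(\Theta) = \sum_{i=1}^{n} \log \sum_{z=1}^{K} \pi_z \, p(x_i \mid \theta_z),$$

whereas the complete-data log likelihood, with the hidden labels $z_i$ observed, factors so that the log splits over the product:

$$\ell_c(\Theta) = \sum_{i=1}^{n} \left( \log \pi_{z_i} + \log p(x_i \mid \theta_{z_i}) \right).$$

In the second expression each $\theta_k$ can be maximized on its own; in the first, the sum sitting inside the log is what I take the author to mean by the parameters being coupled.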
Why exactly is this difficult? I have seen several arguments in other discussions about EM. Is differentiating the marginal log likelihood simply awkward, or is finding a closed-form maximizer actually impossible?
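Taking up the author's suggestion to attempt it, here is how far I get (again in my own notation). Differentiating with respect to one component's parameters $\theta_k$ gives

$$\frac{\partial \ell}{\partial \theta_k} = \sum_{i=1}^{n} \underbrace{\frac{\pi_k \, p(x_i \mid \theta_k)}{\sum_{z=1}^{K} \pi_z \, p(x_i \mid \theta_z)}}_{=:\ \gamma_{ik}} \, \frac{\partial}{\partial \theta_k} \log p(x_i \mid \theta_k).$$

The derivative itself exists, so the trouble seems to be that the weights $\gamma_{ik}$ depend on all of $\Theta$, and setting the gradient to zero leaves $\theta_k$ on both sides of the equation with no way (that I can see) to isolate it. Is that the correct reading of the author's remark?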