I understand the Kullback-Leibler divergence well enough when it comes to a probability distribution over a single variable. However, I'm currently trying to teach myself variational methods, and the use of the KL divergence with conditional distributions is catching me out. The source I'm working from is here.
Specifically, the author represents the KL divergence as follows:
$$ \operatorname{KL} (Q_{\phi} (Z|X) || P(Z|X)) = \sum_{z \in Z} q_{\phi} (z|x) \log\frac{q_{\phi} (z|x)}{p(z|x)} $$
What confuses me is that the summation runs over $Z$ alone. Given that $z \in Z$ and $x \in X$, I would have expected (by analogy with conditional entropy) a double sum of the form:
$$ \operatorname{KL} (Q_{\phi} (Z|X)||P(Z|X)) = \sum_{z \in Z} \sum_{x \in X} q_{\phi} (z|x) \log\frac{q_{\phi} (z|x)}{p(z|x)} $$
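(For reference, the conditional-entropy expression I'm reasoning from is, as I understand it,

$$ H(Z|X) = -\sum_{x \in X} \sum_{z \in Z} p(x)\, p(z|x) \log p(z|x), $$

so it's possible I'm simply mis-applying that analogy.)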
Without the sum over $x$, it seems to me that the KL divergence is only being calculated for a single sample from $X$. Am I missing something basic here? And if my intuitions are off, any tips on getting them back on track would be appreciated; I'm teaching myself this material, so I don't have the benefit of formal instruction.
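In case it helps pinpoint where my thinking goes wrong, here is the toy calculation I have in mind, written in plain NumPy (the numbers are made up; this is only to make the two readings concrete):

```python
import numpy as np

# Toy numbers (made up) just to illustrate the two readings.
# Rows index x in X, columns index z in Z; each row is a conditional
# distribution over Z given that particular x.
q = np.array([[0.7, 0.3],    # q_phi(z | x=0)
              [0.4, 0.6]])   # q_phi(z | x=1)
p = np.array([[0.5, 0.5],    # p(z | x=0)
              [0.2, 0.8]])   # p(z | x=1)

# Single-sum form as written in the source: KL for ONE fixed x,
# summing over z only.
def kl_given_x(q, p, x):
    return np.sum(q[x] * np.log(q[x] / p[x]))

# Double-sum form I expected: also summing over every x in X.
def kl_double_sum(q, p):
    return sum(kl_given_x(q, p, x) for x in range(q.shape[0]))

print(kl_given_x(q, p, x=0))  # value depends on which x is chosen
print(kl_given_x(q, p, x=1))
print(kl_double_sum(q, p))    # a single number aggregated over X
```

The single-sum version gives a different number for each choice of $x$, which is exactly what makes me think it is a per-sample quantity, whereas the double-sum version gives one number for the whole pair of distributions.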