The paper *Asymptotic theory of information-theoretic experimental design* studies Bayesian experimental design in which, in each round $n$, the experimenter selects a stimulus $X_n$ that maximizes the mutual information between the parameter $\theta$ and the next observation $Y_n$ given the previously collected data, i.e., $$X_n=\arg\max_{x} ~\mathrm{I}\big(\theta;Y\mid \{x_i,y_i\}_{i=1}^{n-1},x\big).$$ In the proof of Lemma 1, the author argues that, by a uniform law of large numbers argument, $$\log f_{N}(\theta) \approx -\sum_{i} D_{K L}\left(\theta_{0} ; \theta \mid x_{i}\right)$$ (the right-hand side is the expectation of the left-hand side, taken under the true parameter $\theta_0$), where $\theta_0$ is the true parameter, $$f_{N}(\theta)=\prod_{i=1}^{N} \frac{p\left(y_{i} \mid x_{i}, \theta\right)}{p\left(y_{i} \mid x_{i}, \theta_{0}\right)},$$ and

$$D_{K L}\left(\theta_{0} ; \theta \mid x\right) \equiv \int_{Y} d p\left(y \mid x, \theta_{0}\right) \log \frac{p\left(y \mid x, \theta_{0}\right)}{p(y \mid x, \theta)}.$$
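To make sure I am reading the design rule correctly, here is the toy version I have in mind (a minimal sketch only: the Bernoulli model $p(y=1\mid x,\theta)=\sigma(\theta x)$, the grids, and the value of $\theta_0$ are my own illustrative choices, not from the paper):

```python
import numpy as np

# Toy version of the greedy rule X_n = argmax_x I(theta; Y | data, x):
# Bernoulli observations with p(y=1 | x, theta) = sigmoid(theta * x),
# posterior kept on a discretized theta grid. All modeling choices are
# illustrative, not from the paper.

rng = np.random.default_rng(0)
thetas = np.linspace(-3, 3, 201)   # parameter grid
xs = np.linspace(-2, 2, 41)        # candidate stimuli
theta0 = 1.2                       # "true" parameter
posterior = np.full(thetas.size, 1.0 / thetas.size)  # uniform prior

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

for n in range(50):
    # p(y=1 | x, theta) on the (theta, x) grid
    p1 = 1.0 / (1.0 + np.exp(-np.outer(thetas, xs)))
    # I(theta; Y | data, x) = H(Y | x, data) - E_posterior[ H(Y | x, theta) ]
    mi = binary_entropy(posterior @ p1) - posterior @ binary_entropy(p1)
    x = xs[np.argmax(mi)]                                 # greedy stimulus
    y = rng.random() < 1.0 / (1.0 + np.exp(-theta0 * x))  # y ~ p(. | x, theta0)
    lik = 1.0 / (1.0 + np.exp(-thetas * x))               # Bayes update
    posterior = posterior * (lik if y else 1.0 - lik)
    posterior /= posterior.sum()

print("posterior mean:", posterior @ thetas)  # concentrates near theta0
```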

My question is: why does this step hold here? The $Y_i$'s are clearly not i.i.d., and I cannot see how to apply a uniform law of large numbers.
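Here is as far as I get. Writing $$\log f_{N}(\theta)=\sum_{i=1}^{N}\ell_{i}(\theta),\qquad \ell_{i}(\theta)=\log \frac{p\left(y_{i} \mid x_{i}, \theta\right)}{p\left(y_{i} \mid x_{i}, \theta_{0}\right)},$$ and letting $\mathcal{F}_{i-1}=\sigma\left(x_{1}, y_{1}, \ldots, x_{i-1}, y_{i-1}\right)$, the stimulus $x_i$ is $\mathcal{F}_{i-1}$-measurable (it is chosen as a function of the past data), so $$\mathbb{E}\left[\ell_{i}(\theta) \mid \mathcal{F}_{i-1}\right]=-D_{K L}\left(\theta_{0} ; \theta \mid x_{i}\right).$$ Hence $\log f_{N}(\theta)+\sum_{i=1}^{N} D_{K L}\left(\theta_{0} ; \theta \mid x_{i}\right)$ is a martingale in $N$, and I would guess that some martingale strong law replaces the i.i.d. LLN here; what I do not see is how to make the convergence uniform in $\theta$, which the lemma appears to require.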

In addition, I would appreciate pointers to references, other than this paper, on the consistency of Bayesian experimental design with mutual information as the utility function.

Thanks in advance!
