Our goal is to predict the value $y$, given a vector of values $(x_1, \dots ,x_n)$ for some $n\geq 1$.
The binary random variable $Y$ is distributed uniformly on the set $\{0, 1\}$, so $P(Y = 1) = P(Y = 0) = 1/2$.
Each random variable $X_i$ is also binary and takes values from the set $\{0, 1\}$. Its conditional distribution is determined by the parameter $p_i = P(X_i = y\mid Y = y)$. Moreover, we assume the $X_i$ are conditionally independent given $Y$.
Without loss of generality (reorder the indices, and flip any $X_i$ with $p_i < 1/2$), we can assume $$p_1 \geq p_2 \geq \dots \geq p_n\geq 1/2\text{.}$$
Each example in our data is generated as follows (a code sketch follows the list):
- $y$ is drawn according to the distribution of $Y$
- $x_i$ is drawn according to the distribution of $X_i$ conditional on $Y = y$
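For concreteness, here is a minimal Python sketch of this generative process (the helper name `draw_example` is my own):

```python
import random

def draw_example(p):
    """One example from the model: y ~ Uniform{0, 1}, then each
    x_i independently equals y with probability p[i]."""
    y = random.randint(0, 1)
    x = [y if random.random() < pi else 1 - y for pi in p]
    return x, y
```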
However, the value $y$ is hidden from us; we only know the values $x_i$. Given these values, we predict the value $\hat{y}(x_1, \dots, x_n)$.
What is the maximal expected accuracy (over the possible inputs $(x_1, \dots , x_n)$), i.e., $$\max_{\hat{y}}\;E_{X_1, \dots,X_n}P_Y[\hat{y}(X_1,\dots ,X_n) =Y]?$$
I managed to solve this for $n = 1$ and $n = 2$. The key observation is the following: $$ P(Y = y\mid \vec{X} = \vec{x}) = \frac{P(Y = y)}{P(\vec{X} = \vec{x})}P(\vec{X} =\vec{x}\mid Y = y) = \frac{P(Y = y)}{P(\vec{X} = \vec{x})}\prod_i P(X_i = x_i \mid Y = y )\text{,} $$ where $\vec{X} = (X_1, \dots, X_n)$ and $\vec{x} = (x_1, \dots, x_n)$. Since $P(Y = y) = 1/2$ and $P(\vec{X} = \vec{x})$ does not depend on $y$, we will maximize the probability $P(Y = y\mid \vec{X} = \vec{x})$ if we maximize the value of $$ \prod_i P(X_i = x_i \mid Y = y) = \left(\prod_{i:\; x_i = y} p_i\right)\left(\prod_{i:\; x_i \neq y} (1 - p_i)\right) \quad (*) $$
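In code, the resulting decision rule is a direct translation of $(*)$ (a minimal sketch; the names `star` and `y_hat` are mine, and ties are broken toward $y = 0$):

```python
def star(x, y, p):
    """The product (*): P(X = x | Y = y) under conditional independence."""
    v = 1.0
    for xi, pi in zip(x, p):
        v *= pi if xi == y else 1 - pi
    return v

def y_hat(x, p):
    """MAP prediction: the value of y maximizing (*)."""
    return 1 if star(x, 1, p) > star(x, 0, p) else 0
```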
Solutions:
$n = 1$:
The expression $(*)$ has value $p_1$ for $y = x_1$ and $1-p_1$ for $y = 1 - x_1$. Since $p_1\geq 1 - p_1$ by our assumption, it is optimal to predict $\hat{y}(x_1) = x_1$ and the answer to the question is $p_1$.
$n = 2$:
Now, we give the values of $(*)$ in the following table: $$\begin{array}{c|c|c|c} x_1 & x_2 & y = 0 & y = 1 \\ \hline 1 & 1 & (1-p_1)(1-p_2) & \color{red}{p_1 p_2} \\ 1 & 0 &(1-p_1)p_2 & \color{red}{p_1(1-p_2)}\\ 0 & 1 & \color{red}{p_1(1-p_2)}& (1 - p_1)p_2 \\ 0 & 0 & \color{red}{p_1 p_2} &(1-p_1)(1-p_2) \\ \end{array}$$ Taking into account $p_1\geq p_2 \geq 1/2$, one can check that the value of $y$ corresponding to the red-coloured $(*)$-value is the optimal one; for instance, for $\vec{x} = (1, 0)$ we have $p_1(1-p_2) \geq (1-p_1)p_2$ precisely because $p_1 \geq p_2$. Again, we see that $\hat{y}(x_1, x_2) = x_1$. The answer to the question is again $p_1$.
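This claim can also be verified numerically by brute force over a grid of admissible parameters (a sketch under the stated assumptions; `star` is again my name for $(*)$):

```python
from itertools import product

def star(x, y, p1, p2):
    """The product (*) for n = 2."""
    a = p1 if x[0] == y else 1 - p1
    b = p2 if x[1] == y else 1 - p2
    return a * b

# check that y = x_1 maximizes (*) whenever p1 >= p2 >= 1/2
grid = [0.5 + 0.01 * k for k in range(51)]
assert all(star(x, x[0], p1, p2) >= star(x, 1 - x[0], p1, p2)
           for p1 in grid for p2 in grid if p1 >= p2
           for x in product((0, 1), repeat=2))
```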
Problems:
E.g., $n = 3$:
Now, the table would have eight rows for $\vec{x}$. Rows such as $\vec{x} = (1, 1, 1)$ are easy, since $(1 - p_1)(1 - p_2)(1 - p_3)\leq p_1 p_2 p_3$ because $1 - p_i \leq p_i$. Even $\vec{x} = (1, 0, 1)$ can be settled: $p_1(1-p_2)p_3 \geq (1-p_1)p_2(1-p_3)$ follows by multiplying $p_1(1-p_2) \geq (1-p_1)p_2$ (from $p_1 \geq p_2$) with $p_3 \geq 1-p_3$. However, for the input vector $\vec{x} = (1, 0, 0)$ the ordering of $p_1(1-p_2)(1-p_3)$ and $(1-p_1)p_2 p_3$ depends on the concrete values of the $p_i$, so the optimal value for $y$ is not clear.
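A quick numerical check confirms that both outcomes occur for $\vec{x} = (1, 0, 0)$ (the parameter values are just illustrative):

```python
# values of (*) for x = (1, 0, 0) under two parameter settings
for p1, p2, p3 in [(0.8, 0.7, 0.7), (0.9, 0.6, 0.6)]:
    v0 = (1 - p1) * p2 * p3        # candidate y = 0
    v1 = p1 * (1 - p2) * (1 - p3)  # candidate y = 1
    print((p1, p2, p3), v0, v1)
# (0.8, 0.7, 0.7): v0 ≈ 0.098 > v1 ≈ 0.072 -> predict y = 0
# (0.9, 0.6, 0.6): v0 ≈ 0.036 < v1 ≈ 0.144 -> predict y = 1
```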
Thoughts:
Is it always optimal to predict $\hat{y}(x_1, \dots, x_n) = x_1$? By doing so, and by proving that $P(X_1 = Y) = p_1$ (I can show that), we can assert that the answer to the question is at least $p_1$. Maybe there is a more elegant approach than computing the red values in the table.
EDIT:
Is it always optimal to predict $\hat{y}(x_1, \dots, x_n) = x_1$?
When $n\geq 3$: not necessarily. With $p_1 = 0.8$ and $p_2 = p_3 = 0.7$, we can predict the right value of $y$ with probability $\doteq 0.825$ (result based on simulation), which is strictly more than $p_1 = 0.8$.
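This figure can also be computed exactly instead of simulated: since $P(Y = 0) = P(Y = 1) = 1/2$, the accuracy of the MAP rule is $\frac{1}{2}\sum_{\vec{x}} \max_y (*)$. A minimal sketch (helper names are mine); it yields $0.826$, consistent with the simulation:

```python
from itertools import product

def star(x, y, p):
    """The product (*), repeated here so the snippet runs on its own."""
    v = 1.0
    for xi, pi in zip(x, p):
        v *= pi if xi == y else 1 - pi
    return v

def bayes_accuracy(p):
    """P[MAP prediction = Y] = (1/2) * sum over all x of max_y (*)."""
    return 0.5 * sum(max(star(x, 0, p), star(x, 1, p))
                     for x in product((0, 1), repeat=len(p)))

print(bayes_accuracy((0.8, 0.7, 0.7)))  # ≈ 0.826 > p_1 = 0.8
```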