I'm reading about linear cryptanalysis of an SPN and have some questions about how practical it is. The example I'm looking at is from Section 3.3.3 of Stinson's book, and I believe the same example is given in these notes (p. 13 for the diagram).
First, a question of practicality. The SPN is composed of many independent S-boxes. In both Stinson's book and the notes, we seem to be given the linear approximation for each S-box under consideration. How would one get this information in practice? It seems that an attacker would need to know both the input and the output of each S-box in order to find the linear approximation, but in the attack the only known information is the plaintext-ciphertext pairs. So, for instance, how does one find that $S_{2}^1$ can be approximated by $U_5^1\oplus U_7^1 \oplus U_8^1 \oplus V_6^1$?
Here, the $U^1$'s are the inputs to the first-round S-boxes and the $V^1$'s are their outputs.
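To make the first question concrete, here is a minimal sketch of what I understand the approximation to mean. It assumes the 4-bit S-box table from the example (E4D12FB83A6C5907 in hex) is known to the attacker, and it numbers bits from the left, so within $S_2^1$ the bits $U_5^1, U_7^1, U_8^1$ are local input bits 1, 3, 4 and $V_6^1$ is local output bit 2. The names `SBOX`, `parity`, `count_zero`, and `bias` are my own, not from the book.

```python
# S-box table assumed to be the one used in the textbook example.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]


def parity(x):
    """XOR of all bits of x."""
    return bin(x).count("1") & 1


def count_zero(in_mask, out_mask, sbox=SBOX):
    """Count inputs x where the XOR of the masked input bits of x and the
    masked output bits of sbox[x] equals 0."""
    return sum(
        1 for x in range(len(sbox))
        if parity(x & in_mask) == parity(sbox[x] & out_mask)
    )


def bias(in_mask, out_mask, sbox=SBOX):
    """Probability that the masked XOR is 0, minus 1/2."""
    return count_zero(in_mask, out_mask, sbox) / len(sbox) - 0.5


# Local input bits 1, 3, 4 -> mask 0b1011; local output bit 2 -> mask 0b0100.
hits = count_zero(0b1011, 0b0100)
print(hits, bias(0b1011, 0b0100))
```

If this is right, the count depends only on the S-box table itself and not on any key or on observed S-box inputs, which is part of what I would like confirmed.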
Second, you would need to know, for instance, that $S_2^2$ feeds into $S_2^3$ and $S_4^3$. Wouldn't this require knowledge of the permutation used by the SPN? If so, how does an attacker get this knowledge?
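As a sketch of the wiring I mean, assuming the bit permutation from the example is the usual one that sends output bit $j$ of S-box $i$ to input bit $i$ of S-box $j$ (bits numbered 1 to 16), the active S-boxes in the next round can be read off like this. `PERM`, `sbox_of`, and `next_round_sboxes` are hypothetical names of mine.

```python
# Assumed permutation: position i (1..16) maps to 4*((i-1) % 4) + (i-1)//4 + 1,
# e.g. 2 -> 5, 6 -> 6, 8 -> 14.
PERM = {i: 4 * ((i - 1) % 4) + (i - 1) // 4 + 1 for i in range(1, 17)}


def sbox_of(bit):
    """Index (1..4) of the S-box that contains the given bit position."""
    return (bit - 1) // 4 + 1


def next_round_sboxes(output_bits):
    """S-boxes in the next round reached by the given output bit positions."""
    return sorted({sbox_of(PERM[b]) for b in output_bits})


# Output bits 6 and 8 of round 2 both leave S-box 2.
print(next_round_sboxes([6, 8]))
```

This only works because I wrote the permutation down, which is exactly the knowledge I am asking about.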
Lastly, how do these linear approximations allow us to recover any of the subkey bits? I do not find the explanations clear enough. Does anybody have a reference that explains this clearly?