ShiftRows is just one operation within a round, and key bits are mixed into every round, so this does not necessarily introduce a weakness; no such weakness has been discovered.
ShiftRows, together with MixColumns and SubBytes, gives AES very efficient diffusion, which increases its strength. There are related questions and answers here; for example, see this answer.
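To make the diffusion point a bit more concrete, here is a minimal Python sketch of ShiftRows on its own. It assumes the usual picture of the state as four rows of four bytes, with row `r` rotated left by `r` positions. The function name and the trick of labelling each byte by its starting column are mine, purely for illustration.

```python
def shift_rows(state):
    """Rotate row r of a 4x4 state left by r positions (illustrative sketch)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

# Label every byte with the index of the column it starts in.
state = [[c for c in range(4)] for _ in range(4)]
shifted = shift_rows(state)

# Collect, for each column after ShiftRows, the original columns it draws from.
columns = [sorted(shifted[r][c] for r in range(4)) for c in range(4)]
print(columns)  # [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]
```

After ShiftRows every column holds one byte from each original column, so the column-wise mixing that follows combines bytes that started out far apart.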
At a higher level, Kerckhoffs's principle says that the algorithm should be public and only the key secret, which is what makes proper cryptanalysis possible.
As to why $GF(2^8)$ rather than a larger field: it is small enough that, at design time, linear, differential and other types of cryptanalysis could be carried out nearly exhaustively by computational (not just theoretical) methods.
The byte-sized field also fits a modular design philosophy: the designers started from the Square cipher and kept a conceptual $4\times4$ arrangement of Sboxes; the structure is also illustrated in the answer I linked.
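The remark above about nearly exhaustive computational analysis over $GF(2^8)$ can be illustrated with a short Python sketch. It rebuilds the AES Sbox from field inversion modulo $x^8+x^4+x^3+x+1$ followed by the standard affine map, then scans every input difference to find the largest entry of the difference distribution table. The helper names are my own, and this is only one of many checks a designer would run, not a reconstruction of the designers' actual tooling.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    """Inverse via a^254, with 0 mapped to 0 as in AES."""
    if a == 0:
        return 0
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def rotl8(x, k):
    """Rotate a byte left by k bits."""
    return ((x << k) | (x >> (8 - k))) & 0xFF

def aes_sbox():
    """Build the 256-entry Sbox: field inversion, then the affine map."""
    table = []
    for x in range(256):
        b = gf_inv(x)
        table.append(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63)
    return table

def differential_uniformity(sbox):
    """Largest count over all nonzero input differences and all output differences."""
    worst = 0
    for dx in range(1, len(sbox)):
        counts = [0] * len(sbox)
        for x in range(len(sbox)):
            counts[sbox[x] ^ sbox[x ^ dx]] += 1
        worst = max(worst, max(counts))
    return worst

SBOX = aes_sbox()
print(differential_uniformity(SBOX))  # 4
```

The whole scan touches only $255 \times 256$ input pairs, which is trivial for an 8-bit Sbox. The same table-based approach over a 16-bit or 32-bit field grows quadratically in the field size and quickly stops being something you can simply enumerate.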
Finally, implementing algebraically complex operations over a larger field is more costly in terms of the number of instructions. Also, suppose you design an Sbox using mainly non-algebraic considerations; you would then want to derive, say, its algebraic normal form from the Sbox input/output table. In a large field, this would be computationally more expensive; see here for details.
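As a rough illustration of the algebraic normal form computation mentioned above, here is a small Python sketch using the binary Möbius transform on a truth table. The function names and the 3-bit example function are assumptions of mine, chosen only to keep the example checkable by hand. For an Sbox you would run this once per output bit.

```python
def anf(truth_table):
    """Return ANF coefficients of a Boolean function given its truth table.

    Index x encodes the input bits (bit i of x is variable x_i). In the result,
    entry m is 1 when the monomial with variable mask m appears in the ANF.
    """
    a = list(truth_table)
    n = len(a).bit_length() - 1
    assert len(a) == 1 << n, "truth table length must be a power of two"
    for i in range(n):                      # n passes ...
        step = 1 << i
        for x in range(len(a)):             # ... each doing 2^(n-1) XORs
            if x & step:
                a[x] ^= a[x ^ step]
    return a

def degree(coeffs):
    """Algebraic degree: the largest number of variables in any monomial present."""
    return max((bin(m).count("1") for m, c in enumerate(coeffs) if c), default=0)

# Hypothetical example: f(x0, x1, x2) = x0*x1 XOR x2
tt = [((x & 1) & ((x >> 1) & 1)) ^ ((x >> 2) & 1) for x in range(8)]
coeffs = anf(tt)
print(coeffs)          # [0, 0, 0, 1, 1, 0, 0, 0]
print(degree(coeffs))  # 2
```

For $n$ input bits the transform costs about $n \cdot 2^{n-1}$ XORs per output bit and needs the full $2^n$-entry table in memory. That is negligible for $n = 8$ but becomes a real cost as the field grows.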
The book "The Design of Rijndael" by Daemen/Rijmen gives a careful account of the design principles used. The AES proposal document is also useful in this regard.