The most basic point is that requiring a code to be linear (i.e., a subspace of $F^n$) lets you bring linear algebra to bear: you get a dimension, bases, and the whole machinery of generator matrices and parity check matrices. Simply put, adding more structure to the code makes it more predictable and easier to analyze. A code that is just some arbitrary subset of $F^n$ enjoys none of these conveniences.
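To make that concrete, here is a minimal Python sketch using the binary $[7,4]$ Hamming code in standard form: with $G = [\,I_k \mid A\,]$, the matrix $H = [\,A^T \mid I_{n-k}\,]$ is a parity check matrix, and every codeword $c = mG$ satisfies $Hc^T = 0$. (The particular matrix $A$ and the helper names are my own choices for illustration, not from any library.)

```python
# Binary [7,4] Hamming code in standard form: G = [I_k | A], H = [A^T | I_r].
A = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 1]]
k, r = 4, 3
n = k + r

G = [[int(i == j) for j in range(k)] + A[i] for i in range(k)]
H = [[A[i][j] for i in range(k)] + [int(t == j) for t in range(r)] for j in range(r)]

def encode(m):                  # codeword c = m G over F_2
    return [sum(m[i] * G[i][t] for i in range(k)) % 2 for t in range(n)]

def syndrome(c):                # H c^T; the zero vector iff c is a codeword
    return [sum(H[j][t] * c[t] for t in range(n)) % 2 for j in range(r)]

c = encode([1, 0, 1, 1])
print(c, syndrome(c))           # [1, 0, 1, 1, 0, 1, 0] [0, 0, 0]
```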
The study of cyclic linear codes is an even more interesting connection. Identifying a word $(c_0, c_1, \dots, c_{n-1})$ with the polynomial $c_0 + c_1x + \dots + c_{n-1}x^{n-1}$, you can easily prove that the cyclic linear codes of length $n$ are exactly the ideals of the ring $R = F[x]/(x^n-1)$ (a cyclic shift becomes multiplication by $x$). Since $F[x]$ is a principal ideal domain, $R$ is a principal ideal ring, so every ideal in it is generated by a single polynomial. But such a generator must be a divisor of $x^n-1$, and those divisors can be exhaustively computed. Moreover, once you find the generator polynomial $g(x)$, you automatically have a "parity check polynomial" $h(x) = (x^n-1)/g(x)$, whose reciprocal generates the dual code.
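As a small illustration, over $F = \mathbb{F}_2$ with $n = 7$ the polynomial $g(x) = 1 + x + x^3$ divides $x^7 - 1$ and generates the cyclic $[7,4]$ Hamming code; the coefficient-list representation and the helper `pmul` below are just my own throwaway sketch.

```python
# Polynomials over F_2 as coefficient lists, lowest degree first.
def pmul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= ai & bj
    return out

g = [1, 1, 0, 1]       # g(x) = 1 + x + x^3, a divisor of x^7 - 1
h = [1, 1, 1, 0, 1]    # h(x) = (x^7 - 1)/g(x) = 1 + x + x^2 + x^4

print(pmul(g, h))      # [1, 0, 0, 0, 0, 0, 0, 1]: g(x)h(x) = x^7 + 1 = x^7 - 1 over F_2

def encode(m):         # cyclic encoding: message polynomial m(x) times g(x)
    return pmul(m, g)

print(encode([1, 0, 1, 1]))    # [1, 1, 1, 1, 1, 1, 1], a codeword of the [7,4] code
```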
There is even a connection between the roots of such polynomials and the minimum distance of the code: the BCH bound says that if $g(x)$ has $\delta - 1$ consecutive powers $\alpha^b, \alpha^{b+1}, \dots, \alpha^{b+\delta-2}$ of a primitive $n$-th root of unity $\alpha$ among its roots, then the code has minimum distance at least $\delta$. That is exactly what is exploited when designing BCH codes.
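Here is a rough sketch of that connection for the same $g(x) = 1 + x + x^3$, building $\mathbb{F}_8$ as $\mathbb{F}_2[a]/(a^3 + a + 1)$ with elements packed into 3-bit integers (my own encoding, purely for illustration):

```python
def gf8_mul(x, y):                 # multiply in F_2[a]/(a^3 + a + 1)
    r = 0
    for i in range(3):
        if (y >> i) & 1:
            r ^= x << i
    for i in range(4, 2, -1):      # reduce modulo a^3 + a + 1 (bit pattern 0b1011)
        if (r >> i) & 1:
            r ^= 0b1011 << (i - 3)
    return r

def gf8_pow(x, k):
    r = 1
    for _ in range(k):
        r = gf8_mul(r, x)
    return r

alpha = 0b010                      # the class of a: a primitive 7th root of unity
g = [1, 1, 0, 1]                   # g(x) = 1 + x + x^3

def g_at(x):                       # evaluate g (F_2 coefficients) at x in F_8
    v = 0
    for i, c in enumerate(g):
        if c:
            v ^= gf8_pow(x, i)
    return v

print([j for j in range(7) if g_at(gf8_pow(alpha, j)) == 0])
# [1, 2, 4]: the consecutive run alpha^1, alpha^2 gives delta - 1 = 2, so the
# BCH bound guarantees minimum distance >= 3 (the Hamming code's true distance).
```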
Another really beautiful result is the MacWilliams identity, which relates the weight distribution of a linear code to that of its dual code via a polynomial transform.
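For binary codes, one form of the identity is $B_j = \frac{1}{|C|}\sum_i A_i K_j(i)$, where $A_i$ and $B_j$ are the weight distributions of $C$ and $C^\perp$ and $K_j$ is a Krawtchouk polynomial. A brute-force check on the $[7,4]$ Hamming code (everything below is my own throwaway code, not a library):

```python
from itertools import product
from math import comb

n = 7
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]

def span(rows):                    # all F_2 linear combinations of the rows
    words = set()
    for coeffs in product([0, 1], repeat=len(rows)):
        words.add(tuple(sum(c * row[t] for c, row in zip(coeffs, rows)) % 2
                        for t in range(n)))
    return words

C = span(G)
dual = {v for v in product([0, 1], repeat=n)
        if all(sum(a * b for a, b in zip(v, row)) % 2 == 0 for row in G)}

def weights(code):                 # A_i = number of words of weight i
    A = [0] * (n + 1)
    for w in code:
        A[sum(w)] += 1
    return A

A, B = weights(C), weights(dual)

def kraw(j, i):                    # Krawtchouk polynomial K_j(i)
    return sum((-1) ** s * comb(i, s) * comb(n - i, j - s) for s in range(j + 1))

print(B)                                                            # computed directly
print([sum(A[i] * kraw(j, i) for i in range(n + 1)) // len(C)
       for j in range(n + 1)])                                      # via MacWilliams
# both lines: [1, 0, 0, 0, 7, 0, 0, 0]
```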
If you go on to study convolutional codes (often introduced as the codes produced by a linear shift register), you find that, for the non-exotic ones at least, the shift register can be encoded as a polynomial (in the simplest case, one generator polynomial per output stream), and certain properties of the code can be read off from those polynomials.
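For a feel of what that looks like, here is a toy rate-$1/2$ encoder with the classic constraint-length-3 generator polynomials $g_1(x) = 1 + x + x^2$ and $g_2(x) = 1 + x^2$; the tap masks and bit conventions below are my own.

```python
G1, G2 = 0b111, 0b101      # register taps for g1, g2; lowest bit = newest input

def encode(bits):
    state = 0                                        # the shift register contents
    out = []
    for b in bits:
        state = ((state << 1) | b) & 0b111           # shift the newest bit in
        out.append(bin(state & G1).count('1') % 2)   # parity of the g1-tapped bits
        out.append(bin(state & G2).count('1') % 2)   # parity of the g2-tapped bits
    return out

print(encode([1, 0, 1, 1]))    # [1, 1, 1, 0, 0, 0, 0, 1]
```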
The book *Coding Theory and Cryptography* is a pretty good introduction to using algebra in coding theory.
For more advanced study, I think a very useful resource is Huffman and Pless's *Fundamentals of Error-Correcting Codes*.
Another one I used, which has really cool material on the discrete Fourier transform and linear complexity, is Blahut's *Algebraic Codes for Data Transmission*.