Normally, test vectors, and in particular test vectors for intermediate values, are found alongside whatever counts as the "official specification" for the algorithm. MARS has not been blessed with a standard (few algorithms are), so the "official specification" is what IBM distributes, namely the MARS package (a compressed archive).
This package contains the specification itself as a PDF file (in MARS/Algorithm/mars.pdf), sample implementations, and test vectors (in the MARS/Test_Values/ directory). In particular, have a look at the ecb_ivt.txt file, which contains some "intermediate values".
As a more general rule, when faced with low-quality or nonexistent test vectors ("low-quality" meaning, e.g., that there are some hexadecimal bytes but no clear endianness rules), the usual method is to find a "reference implementation" and sprinkle printf() statements throughout its code, so as to get the intermediate values. I recommend using the Java code, which, in practice, will give you far less trouble than any C code (that's the benefit of the strictness of the Java platform).
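If you do end up instrumenting C code, a small helper keeps the dumps unambiguous. The following is a minimal sketch, not taken from the MARS package: format_words() and dump_words() are hypothetical names, and the sketch assumes the state is held as 32-bit words. It prints words as numeric values, so the output does not depend on the byte order of the host.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of any MARS distribution): format n 32-bit
 * words as "label: xxxxxxxx xxxxxxxx ..." into out. Words are printed as
 * numeric values, so the text is the same on any host byte order.
 * Returns the number of characters written, or -1 if out is too small.
 */
int
format_words(char *out, size_t out_len, const char *label,
             const uint32_t *w, size_t n)
{
    size_t pos;
    int r = snprintf(out, out_len, "%s:", label);

    if (r < 0 || (size_t)r >= out_len)
        return -1;
    pos = (size_t)r;
    for (size_t i = 0; i < n; i++) {
        r = snprintf(out + pos, out_len - pos, " %08" PRIx32, w[i]);
        if (r < 0 || (size_t)r >= out_len - pos)
            return -1;
        pos += (size_t)r;
    }
    return (int)pos;
}

/* Print one labelled line of intermediate state to stderr. */
void
dump_words(const char *label, const uint32_t *w, size_t n)
{
    char line[256];

    if (format_words(line, sizeof line, label, w, n) >= 0)
        fprintf(stderr, "%s\n", line);
}
```

Calling something like dump_words("after round 3", state, 4) at the same points in the reference code and in your own code (the label and the variable name are placeholders) lets you diff the two outputs and find the first step where they diverge.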
The NESSIE project was a big effort at analyzing many cryptographic algorithms, and it produced a large number of test vectors for all of them. It is a good source in general, but MARS was not submitted as a "NESSIE candidate", so no luck there.
If all else fails, you can always send an email to the function author. Most function authors are researchers who will gladly help you out.
On the question of endianness, in particular in C, I strongly recommend against playing games with the memory representation, such as using a union between a sequence of bytes and an integer type. Such games limit portability and, more importantly, tend to break in awful ways in the presence of an optimizing compiler (because of strict aliasing analysis). Instead, use generic encoding and decoding "functions", which you can later optimize with inline functions and macros, and possibly inline assembly, if you detect a machine and compiler on which you know a smarter and cheaper implementation is possible. As an illustration, have a look at sphlib, which is a library implementing many hash functions in C and Java. In the C code, all the endianness "smartness" is concentrated in the sph_types.h file, which defines uniform names for the integer types and encoding/decoding functions (e.g. sph_dec32be() for 32-bit big-endian decoding).
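To make the "generic functions" idea concrete, here is a minimal sketch of portable 32-bit encoding and decoding in C. The names dec32be(), enc32be(), dec32le() and enc32le() are hypothetical; they follow the naming style mentioned above but this is not the sphlib code. The functions only use shifts on values, never the in-memory layout of an integer, so they behave identically on little-endian and big-endian hosts and involve no aliasing tricks.

```c
#include <stdint.h>

/* Decode 4 bytes, most significant byte first, into a 32-bit value. */
uint32_t
dec32be(const unsigned char *src)
{
    return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16)
         | ((uint32_t)src[2] << 8) | (uint32_t)src[3];
}

/* Encode a 32-bit value into 4 bytes, most significant byte first. */
void
enc32be(unsigned char *dst, uint32_t x)
{
    dst[0] = (unsigned char)(x >> 24);
    dst[1] = (unsigned char)(x >> 16);
    dst[2] = (unsigned char)(x >> 8);
    dst[3] = (unsigned char)x;
}

/* Decode 4 bytes, least significant byte first, into a 32-bit value. */
uint32_t
dec32le(const unsigned char *src)
{
    return (uint32_t)src[0] | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24);
}

/* Encode a 32-bit value into 4 bytes, least significant byte first. */
void
enc32le(unsigned char *dst, uint32_t x)
{
    dst[0] = (unsigned char)x;
    dst[1] = (unsigned char)(x >> 8);
    dst[2] = (unsigned char)(x >> 16);
    dst[3] = (unsigned char)(x >> 24);
}
```

With the bytes 01 23 45 67, dec32be() yields 0x01234567 and dec32le() yields 0x67452301. Once the code is verified against the test vectors, these bodies can be swapped for faster platform-specific versions without touching the callers.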