in my head, i thought "hey, that small string that represents the huge one (the hash) looks pretty neat! if it doesn't represent any other huge one of the same size, then we can say it became, for all intents and purposes, a compression of that huge one! but that can't be right, or else we wouldn't need compression algorithms!!".
now that i've read some of this back again... i still don't know exactly what a hash is!
but...
the hash is just an index
looks like the perfect analogy.
the hash tries to index any data using the fewest bytes necessary.
so the algorithms simply try to assign "page numbers" to each combination we're likely to create, while skipping combinations that are unlikely to ever exist, so we don't waste page numbers on things that should probably never show up.
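here's a rough python sketch of that "page number" idea. the bucket count and the words are made up, purely for illustration:

```python
# a minimal sketch of "hash as page number": the hash of a key just tells us
# which bucket (page) to look in, so lookups don't have to scan everything.
NUM_BUCKETS = 8  # tiny on purpose; real hash tables use many more

def bucket_for(key: str) -> int:
    # built-in hash() gives a big integer (randomized per run);
    # modulo folds it into a small "page number"
    return hash(key) % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

for word in ["apple", "banana", "cherry", "kiwi"]:
    buckets[bucket_for(word)].append(word)

# to find "cherry" we only check one bucket, not the whole list
print(bucket_for("cherry"), buckets[bucket_for("cherry")])
```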
a hash could still generate a collision-free string
but it would probably need to be too big for any practical purpose. some say it can't be any smaller than the original string, and the pigeonhole principle backs that up: if there are more possible inputs than possible hash values, at least two inputs have to share a value.
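a tiny sketch of why. this uses an artificially truncated 8-bit hash (just the first byte of sha-256) so the collision shows up quickly:

```python
# with only 256 possible hash values, hashing 257 distinct inputs is
# guaranteed to repeat at least one value (pigeonhole principle); in practice
# a repeat shows up much sooner than that.
import hashlib

def tiny_hash(data: bytes) -> int:
    # keep just the first byte of SHA-256 -> only 256 possible outputs
    return hashlib.sha256(data).digest()[0]

seen = {}
for i in range(300):
    h = tiny_hash(str(i).encode())
    if h in seen:
        print(f"collision: {seen[h]} and {i} both hash to {h}")
        break
    seen[h] = i
```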
vs compression
on another note...
we can easily compress arbitrarily large amounts of non-random data into a few bytes.
for instance, an "infinite amount of zeroes" could be represented by something like "0~".
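a toy run-length encoder, roughly what that "0~" idea looks like in code (the names and the (count, byte) format are just mine, not any real standard):

```python
# long runs of the same character collapse into (count, char) pairs, which is
# why "a million zeroes" compresses down to almost nothing.
from itertools import groupby

def rle_encode(data: str) -> list[tuple[int, str]]:
    return [(len(list(run)), ch) for ch, run in groupby(data)]

def rle_decode(pairs: list[tuple[int, str]]) -> str:
    return "".join(ch * count for count, ch in pairs)

zeroes = "0" * 1_000_000
encoded = rle_encode(zeroes)
print(encoded)                        # [(1000000, '0')] -- a handful of bytes
print(rle_decode(encoded) == zeroes)  # True: fully reversible
```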
the whole issue with compression lies basically in finding patterns in what appears to be random... (hence the hutter prize, which contributes to A.I. research by simply focusing on compression...)
as for hashing, none of that matters: the infinite amount of zeroes would still produce a string of the same size as anything else being hashed.
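quick sanity check with python's hashlib (sha-256 here, but any fixed-size hash behaves the same way):

```python
# the hash output size doesn't depend on the input size: a single byte and a
# million zeroes both come out as 32 bytes (64 hex chars) from SHA-256.
import hashlib

short = b"0"
huge = b"0" * 1_000_000

print(len(hashlib.sha256(short).hexdigest()))  # 64
print(len(hashlib.sha256(huge).hexdigest()))   # 64
```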
prevent confusion
hashes are related to (and often confused with) checksums, check digits, fingerprints, lossy compression, randomization functions, error-correcting codes, and ciphers. although the concepts overlap to some extent, each one has its own uses and requirements and is designed and optimized differently.
hash functions differ from those concepts above mainly in terms of data integrity: hashing makes no attempt to preserve the original data.
tl;dr;
in a way, we do data compression to get a smaller representation of the same data (which can then be decompressed back to the same data, or something very close to it, depending on whether we do it lossless or lossy).
and we do hashing to make data faster to find (which can easily be confused with decompression, because we go from something small to something huge, but the hash itself doesn't contain any of the original data).
completely different beasts.
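if you want to see the "completely different beasts" part in code, here's a rough sketch contrasting zlib (reversible) with sha-256 (one-way, fixed size); the sample data is made up:

```python
# compression is reversible: we get the exact original back.
# a hash is a fixed-size one-way summary: there is no way back.
import hashlib
import zlib

data = b"the quick brown fox jumps over the lazy dog " * 100

compressed = zlib.compress(data)
print(len(data), len(compressed))           # smaller, and...
print(zlib.decompress(compressed) == data)  # ...fully recoverable

digest = hashlib.sha256(data).hexdigest()
print(len(digest))                          # always 64 hex chars
# there is no "sha256 decompress": the original data simply isn't in there
```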