16

I was wondering why most "normal/unsafe" crypto hashes like SHA-256, SHA-512, Whirlpool, RIPEMD-160, MD5, etc. are hex encoded.

But most "secure" crypto hashes (KDFs) like bcrypt and scrypt are Base64 encoded. Why?

Somewhere I heard that Base64 shortens the string by something like 20%. Isn't that extremely bad for password hashes during iterations, and doesn't it make them less collision resistant?

And if Base64 is really for some reason more secure, then why does Argon2 output HEX encoding?

Mike Edward Moras
Richard R. Matthews

4 Answers

33

The algorithms themselves just output binary (i.e. bytes) if you read their specifications. It's the implementations in APIs and applications that output hexadecimals and/or base64.
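To illustrate the point, here is a minimal Python sketch using the standard `hashlib` and `base64` modules. The input string is just a placeholder I picked; the point is only that the digest is raw bytes and that the text form is applied afterwards.

```python
import base64
import hashlib

# The hash function itself returns raw bytes; the text form is chosen afterwards.
raw = hashlib.sha256(b"example input").digest()

hex_text = raw.hex()                               # same string as hexdigest()
b64_text = base64.b64encode(raw).decode("ascii")   # standard base64 with padding

print(type(raw).__name__, len(raw))
print(hex_text)
print(b64_text)

# Both text forms decode back to the identical bytes.
assert bytes.fromhex(hex_text) == raw
assert base64.b64decode(b64_text) == raw
```

Neither encoding adds or removes information, so the choice has no effect on the hash value itself.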

Sometimes there are also ad hoc standards / common practices that specify a certain output format. This is for instance the case for the output of the bcrypt password hashing algorithm. In that case it's not just the hash that is displayed but also the type of algorithm, the number of iterations and of course the salt.
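As a rough sketch of what such a combined format looks like, the snippet below splits a bcrypt string into its parts. The helper `split_bcrypt` and the `BcryptFields` type are hypothetical names of mine, not part of any library, and the sample string is only an illustrative value in the commonly used `$<version>$<cost>$<salt><hash>` layout. Note that the cost field is the base-2 logarithm of the iteration count, so a cost of 10 means 2^10 iterations.

```python
from typing import NamedTuple


class BcryptFields(NamedTuple):
    version: str   # algorithm identifier, e.g. "2a" or "2b"
    cost: int      # log2 of the number of iterations
    salt: str      # 22 characters in bcrypt's own base64 alphabet
    digest: str    # 31 characters in bcrypt's own base64 alphabet


def split_bcrypt(encoded: str) -> BcryptFields:
    """Split a bcrypt string of the form $<version>$<cost>$<salt><hash>."""
    _, version, cost, rest = encoded.split("$")
    if len(rest) != 53:
        raise ValueError("expected 22 salt characters followed by 31 hash characters")
    return BcryptFields(version, int(cost), rest[:22], rest[22:])


# Illustrative sample value only.
example = "$2a$10$N9qo8uLOickgx2ZMRZoMyeIjZAgcfl7p92ldGxad68LJZdL17lhWy"
fields = split_bcrypt(example)
print(fields)
```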

Base64 is more efficient than hex, while hex allows developers to easily see the value of the encoded bytes. Both the values of the bytes and the number of bytes are easier to see in hex; the number of stored bytes is for instance simply half the number of displayed hex digits. However, for textual formats or indeed larger hash values base64 may be chosen for its efficiency (~33% overhead for base64 vs 100% for hex, assuming each character occupies one byte).
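The overhead figures can be checked with a little arithmetic. This is a small sketch assuming padded standard base64 (4 characters per 3-byte group); the function names are my own.

```python
import math


def hex_len(n_bytes: int) -> int:
    # Two hex digits per byte.
    return 2 * n_bytes


def base64_len(n_bytes: int) -> int:
    # Four characters per (possibly partial) group of three bytes, padding included.
    return 4 * math.ceil(n_bytes / 3)


for name, n in [("SHA-256", 32), ("SHA-512", 64)]:
    print(f"{name}: {n} bytes -> {hex_len(n)} hex chars, {base64_len(n)} base64 chars")
# SHA-256: 32 bytes -> 64 hex chars, 44 base64 chars
# SHA-512: 64 bytes -> 128 hex chars, 88 base64 chars
```

So for a 32-byte hash, base64 needs 44 characters (about 38% overhead once padding is counted) where hex needs 64 (100% overhead).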

The command line utilities md5sum, sha1sum and their successors have always kept to outputting hex; it's to be expected that hex is therefore more likely to be output by applications that want to remain compatible.


Note that I've changed the case of the terms "Base64" and "HEX" in this answer to lowercase to be compatible with RFC 4648: The Base16, Base32, and Base64 Data Encodings which tries to standardize the encodings. It only uses the uppercase variant in the title. "Hex" is an abbreviation, not an acronym, so all uppercase does not make sense.

Personally I prefer all uppercase for hexadecimals; people recognize the upper part of letters / digits more easily, so it makes sense to use it as default (and on all my old computers the characters were also in uppercase, so they are in most debuggers).


Note that many (online) tools do not clearly specify the input / output format. In that case it makes sense to look for better tools rather than trying to find out what kind of format the tool accepts.

Maarten Bodewes
29

Using Base64/HEX has nothing to do with security of a hash algorithm.

Base64 and HEX are ways to represent binary data, which is the actual output of a hash algorithm.

Base64 is shorter simply because it uses a larger charset than HEX (64 characters vs 16 characters).

Besides, algorithms like SHA-256 and SHA-512 are only "unsafe" when used for password hashing (or similar scenarios). In fact, PBKDF2 and scrypt are built on top of these normal algorithms (bcrypt is built on the Blowfish cipher instead), and all of them use additional techniques (salt, many iterations with a MAC, …) to construct an algorithm that is secure for password hashing.

DDoSolitary
0

Base64 and Hex are just two different representations of the same value. Base64 uses more characters and therefore the representation is shorter.

However, the Hex representation is unambiguous except for upper or lower case letters. This is not true for Base64. Bcrypt, for example, uses its own Base64 encoding, which increases the implementation effort considerably and is an additional source of errors.
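To make the bcrypt case concrete: bcrypt uses the alphabet `./A-Za-z0-9` and no `=` padding, whereas standard Base64 uses `A-Za-z0-9+/`. The sketch below converts between the two by translating characters. It assumes bcrypt packs the bits in the same order as standard Base64 (which is my understanding of the reference implementation); the helper names are my own.

```python
import base64

STD_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
BCRYPT_ALPHABET = "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

_TO_BCRYPT = str.maketrans(STD_ALPHABET, BCRYPT_ALPHABET)
_TO_STD = str.maketrans(BCRYPT_ALPHABET, STD_ALPHABET)


def bcrypt_b64encode(data: bytes) -> str:
    # Encode with the standard alphabet, drop padding, then swap alphabets.
    std = base64.b64encode(data).decode("ascii").rstrip("=")
    return std.translate(_TO_BCRYPT)


def bcrypt_b64decode(text: str) -> bytes:
    # Swap back to the standard alphabet and restore the padding.
    std = text.translate(_TO_STD)
    std += "=" * (-len(std) % 4)
    return base64.b64decode(std)


print(base64.b64encode(b"\x00\x00\x00").decode("ascii"))  # AAAA
print(bcrypt_b64encode(b"\x00\x00\x00"))                  # ....
```

The same three zero bytes come out as `AAAA` in one alphabet and `....` in the other, which is exactly the kind of mismatch that causes errors when the wrong decoder is used.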

If no representation is specified by the algorithm, the hex representation is longer but also less error-prone.

BeloumiX
0

If you are saving the hash to a SQL Server database, I would also suggest that you use hex rather than Base64.

With SQL Server the default configuration is case insensitive, and Base64 uses letters of different case as part of its encoding ('a' and 'A' encode different values). If you use your hash to find or match a hash in the DB (using WHERE hash = 'inputValue'), the query would mistakenly match a stored value that differs only in case.
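The effect can be reproduced outside the database. In this sketch a case-insensitive collation is only simulated with Python's `str.casefold()`, which is my stand-in and not how SQL Server actually compares strings; `ci_equal` is a hypothetical helper.

```python
import base64


def ci_equal(x: str, y: str) -> bool:
    # Stand-in for a case-insensitive column comparison.
    return x.casefold() == y.casefold()


a, b = "AAAA", "aaaa"

# The two base64 strings encode different bytes...
print(base64.b64decode(a) == base64.b64decode(b))   # False
# ...but a case-insensitive comparison treats them as the same value.
print(ci_equal(a, b))                                # True

# With hex, strings that differ only in case always decode to the same bytes.
print(bytes.fromhex("DEADBEEF") == bytes.fromhex("deadbeef"))  # True
```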

As the original question and the other answers agree, Base64 requires less storage space as a VARCHAR. But if you have to decide between avoiding false matches and saving storage, the former may be preferred for security or accuracy.

Note: If you look up the hash value in your DB by username and compare the password hash in code, you also avoid the case-insensitivity problem. In that case Base64 would make sense, as the potential problem is mitigated.

Wasted_Coder