Because there are so many ways to implement a hardware design, I am not aware of any standard comparison protocol. The main comparison points are power, area, and throughput, and comparisons are generally done on the same process, 32nm GF SOI for example. For a given process, area can be measured in gate equivalents (GE) or mm$^2$.
The algorithm itself will usually be the limiting factor when it comes to clock speed, and this may not be an issue if the throughput is still good. Unrolling loops uses more area but can increase throughput dramatically. Constants like those in SHA-2 use a large amount of area, whereas those in SHA-3 use very little.
An implementation can be tuned to be better at a specific trait, such as power usage or speed, or tuned to have a good combination of two of them at the expense of another. Throughput/area or throughput/watt then become the usual target for a given implementation.
        Area     Power    Throughput
    A   1 mm^2   10 mW     1.00 Mb/s
    B   1 mm^2   30 mW     2.45 Mb/s
    C   3 mm^2   10 mW     2.45 Mb/s
    D   3 mm^2   30 mW     6.00 Mb/s
    E   3 mm^2   90 mW    14.7  Mb/s
    F   6 mm^2   90 mW    44.1  Mb/s
Each implementation has an advantage and a cost. D has a 100% increase in throughput/area and throughput/watt over A, but at triple the area and triple the power. B and C have the same throughput, but B uses less area and C uses less power. For a given area, tripling the power budget gives a 2.45X increase in throughput. Because the 3 mm$^2$ implementation can radiate more heat thanks to its larger surface area, it can be pushed beyond the power limits of the 1 mm$^2$ implementation. Despite using 1/3 the area and 1/9 the power, A is only 14.7 times slower than E. F is 3X faster than E at the same power, with only double the area used.
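The ratios quoted above can be reproduced with a few lines of Python. This is only a sketch over the example figures in the table; the names `IMPLEMENTATIONS`, `derived_metrics`, and `relative_gain` are made up for illustration and are not part of any standard tool.

```python
# Example figures from the table: (area in mm^2, power in mW, throughput in Mb/s).
IMPLEMENTATIONS = {
    "A": (1, 10, 1.00),
    "B": (1, 30, 2.45),
    "C": (3, 10, 2.45),
    "D": (3, 30, 6.00),
    "E": (3, 90, 14.7),
    "F": (6, 90, 44.1),
}


def derived_metrics(area_mm2, power_mw, throughput_mbps):
    """Return (throughput per mm^2, throughput per mW) for one implementation."""
    return throughput_mbps / area_mm2, throughput_mbps / power_mw


def relative_gain(new, old):
    """Percentage improvement of `new` over `old`."""
    return (new / old - 1.0) * 100.0


# Print both derived metrics for every row of the table.
for name, spec in IMPLEMENTATIONS.items():
    per_area, per_mw = derived_metrics(*spec)
    print(f"{name}: {per_area:.2f} Mb/s per mm^2, {per_mw:.3f} Mb/s per mW")

# D versus A on both metrics (the "100% increase" mentioned above).
a_area, a_mw = derived_metrics(*IMPLEMENTATIONS["A"])
d_area, d_mw = derived_metrics(*IMPLEMENTATIONS["D"])
print(f"D vs A: {relative_gain(d_area, a_area):.0f}% per area, "
      f"{relative_gain(d_mw, a_mw):.0f}% per mW")
```

Which of the two derived columns matters more depends on whether area or power is the scarcer resource in the final product.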
So how do you compare them? Which one is the best? The largest implementation, F, obviously has the best throughput and the best throughput/watt, but you may not have the area or the power to spare. Would one consider A more efficient than E, or the other way around? E has 63% better throughput/watt than A. But if A could be run at the same 90 mW, the 2.45X-per-tripling scaling would put it at about 6.00 Mb/s in 1 mm$^2$, which is 22% better throughput/area than E's 4.9 Mb/s per mm$^2$.
In the end, a good comparison can only be made by keeping the process constant, then deciding which implementation is better or more efficient based on the requirements of the final product.
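As a rough illustration of "based on the requirements of the final product", the sketch below filters the example implementations by an area and power budget and then ranks what is left by a chosen metric. The function names and the budget values are hypothetical, chosen only to show how the answer changes with the requirements.

```python
# Example figures from the table: (area in mm^2, power in mW, throughput in Mb/s).
IMPLEMENTATIONS = {
    "A": (1, 10, 1.00),
    "B": (1, 30, 2.45),
    "C": (3, 10, 2.45),
    "D": (3, 30, 6.00),
    "E": (3, 90, 14.7),
    "F": (6, 90, 44.1),
}


def throughput(area_mm2, power_mw, mbps):
    return mbps


def throughput_per_watt(area_mm2, power_mw, mbps):
    return mbps / power_mw


def throughput_per_area(area_mm2, power_mw, mbps):
    return mbps / area_mm2


def best_under_budget(impls, max_area_mm2, max_power_mw, metric):
    """Name of the best implementation that fits the budget, or None if none fits."""
    feasible = {
        name: spec
        for name, spec in impls.items()
        if spec[0] <= max_area_mm2 and spec[1] <= max_power_mw
    }
    if not feasible:
        return None
    return max(feasible, key=lambda name: metric(*feasible[name]))


# Same 3 mm^2 / 30 mW budget, two different goals, two different winners.
print(best_under_budget(IMPLEMENTATIONS, 3, 30, throughput))
print(best_under_budget(IMPLEMENTATIONS, 3, 30, throughput_per_watt))
```

With the same budget, ranking by raw throughput and ranking by throughput/watt pick different rows, which is why there is no single "best" implementation without stated requirements.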