Articles
Properties of Secure Hash Functions
By Erhan Kartaltepe
Recent discussions on NIST's secure hash function contest reminded me of the cryptography lectures I gave at UTSA. One of the hardest concepts my students had grasping was in defining properties of a secure cryptographic function, mostly because of the number theory, but also in realizing the differences between three properties of a secure hash function: preimage resistance, collision resistance, and second preimage resistance.
Primer on Secure Cryptographic Hash Function Properties
In dealing with cryptography, it helps to remember that a message to a cryptographic function is a number. For example, "hi" in ASCII is represented as the bit string 00000110 00000111. Evaluated as a 16 bit number, "hi" is equal to 210 +29 + 22+21+20, or 26729. Thus, hash("hi") is better thought as hash(26729). Also, in cryptography, "hard" is defined as it taking a reasonable adversary super polynomial time to complete the task. In hash functions, the ideal case is, for a hash function with a range of size 2k, it should take the adversary 2k/2 or 2k-1 attempts before he or she finds the solution.
Preimage Resistance:
A hash function is preimage resistant if, given a hash value h, it is hard to find any message m such that h = hash(k, m), where k is the hash key.
This is the most usual property developers think of when they think of a cryptographic hash function. Unlike an encryption, there should be no "dehash" function. A good preimage resistant function should be "hard" to invert. An example of a hash function that is not preimage resistant is h = hash(k, m) = m mod 2k, since it is very easy to invert the function, after guaranteeing that for any value of h, a message of size m can be found (basically, every message of the form h + x 2k, where x is an integer.
Collision Resistance
A hash function is collision resistant if, given two messages m1 and m2, it is hard to find a hash h such that h = hash(k, m1) = hash(k, m2), where k is the hash key.
What this says is that given complete control over picking any messages you want, it should be "hard" to find two of them such that have the same hash value the same hash. An example of a hash function that is not collision resistant is hash(k, m) = 4, since all hashes result in 4, making it 100% likely that two messages will have the same hash. Since hash functions have an infinite domain space (that is, a hash function should take any message of any size), but a finite range space (for example, the SHA-256 algorithm has a range space of 256 bits), a good collision resistant hash function should have each hash value be about as evenly distributed as possible. For example, given a hash function with a range space of 2128 and a message m, any number between 0 and 2128 - 1 should have the same chance (1 out of 2128) of being hash(k, m). One last sub-property would be there would be no "hints" given in the hash function. Thus, even a change of one bit in the message should have large changes in the output (ideally, about half the bits).
Second Preimage Resistance
A hash function is second preimage resistant if given a message m1, it is hard to find a different message m2 such that hash(k, m1) = hash(k, m2), where k is the hash key.
This is the toughest of the three for students in cryptography to get their head around. On the surface this seems to be an easier version of collision resistance, because it seems that we have extra information to use. We only need one more message rather than two. In actuality, this is actually the birthday paradox in action. Second preimage resistance is a much harder standard for a hash function to achieve than collision resistance.
Birthday Paradox
The birthday paradox basically asks two questions. "How many people does it take before the odds of having two people with the same birthday are 50% or better?" and "How many people does it take before the odds of having another person with the same birthday as you are 50% or better?" The first is asking for two random people (think m1 and m2), while the second is asking for one given one already (think m2 given m1).
How many randomly chosen people would it take before the odds that any two of them have the same birthday is greater than 50%? This is a hard problem to solve, but it turns out it is very easy to solve its opposite. For example, for a group of one person the odds that two of them do not have the same birthday is 100%, but for a group of two, the odds are 1*(1 - 1/365), or roughly 99.8%. From this, we can see that the odds of them sharing a birthday are 100% - 99.8%, or 0.2%. For three, the odds are 1 - (1*(1 - 1/365)* (1 - 2/365)), or 0.9%, and so on. This formula is 1 - product(1 - n/365, 1, n) After 23 people, the odds go over 50%. This is collision resistance.
However, given a particular birthday (for example, March 4), the number of people needed to get better than even odds is different. This is known as the "Same Birthday as You" question. The formula is more straightforward, 1 - 364n/365. For two people, the odds are the same, but it drops off rather quickly. For n=23, for example, the odds are only 6.1%, as opposed to 50.7%. It turns out that a group of roughly 253 people are necessary to have the same birthday, much larger than before. Figure 1 shows the graph of the two problems. The solution to the "Same Birthday as You" question is exemplifies second preimage resistance.
Figure 1. "Birthday Problem" (p) vs. "Same Birthday as You" (q) as the Population Increases(1)
Differences Between the Three Properties
The birthday paradox expresses succinctly the counterintuitive nature of second preimage resistance. Although you are given both a hash function and one message and asked to find another, this is actually a harder problem to solve than given just the hash function and finding two messages that satisfy these properties.
Of course, then, it is possible for a collision resistant function to not be second preimage resistant, but not vice versa. It is also possible for a preimage resistant function to not be second preimage resistant, and vice versa.
This turns out to be easy to prove. Let's take a simple hash function H1: {0,1}8 → {0,1}8 | H1(m) = 0. This says that a hash function H has a domain of one byte (a string of eight bits), a range of one byte, and that H1 always returns 0. Thus, H1(0) = 0, H1(1) = 0,..., H1(255) = 0. This is clearly preimage resistant because given the hash function H and its output, it is impossible to know what m is (since every m hashes to zero). However, it is easy to find two values m1 and m2 that hash to the same value, since again, they all hash to zero. This property means H1 is not second-preimage resistant.
Taking the other extreme, let's take a hash function H2: {0,1}8 → {0,1}8 | H2(m) = m. Thus, H2(0) = 0, H2(1) = 1,..., H2(255) = 255. This is certainly second-preimage resistant because given m1, it is impossible to find m2 such that H2(m1) = H2(m2), because each input maps to a unique hash-there are no collisions. However, it is easy to find an m given H2(m), since H2(m) = m. Thus, H2 is not preimage resistant.
Conclusion
Of course, it's important to realize that hashes are "digests", and not "encryption". Encryption is a two-way operation. That is, given a message m, an encryption algorithm enc and its corresponding decryption algorithm dec, dec(enc(m)) = m. Hash functions are different-there is (ideally) no way to "decrypt" a hashed message, per preimage resistance. Because of this, it is common to say that hash functions are one-way operations.
Resources
http://en.wikipedia.org/wiki/Cryptographic_hash_function
Introduction to cryptographic hash functions.
http://www.csrc.nist.gov/groups/ST/hash/sha-3/index.html
The official NIST SHA-3 algorithm competition.
http://www.damninteresting.com/?p=402
A great article on the birthday paradox and its complexity.
Works Cited
1. Wikipedia. Birthday Problem. Wikipedia. [Online] 11 26, 2007. [Cited: 11 26, 2007.] http://en.wikipedia.org/wiki/Birthday_problem.
