# Limited Entropy Dot ComNot so random thoughts on security featured by Eloi Sanfèlix

22Nov/091

## Crypto Series: Cryptographic hash functions – SHA-2

So far, we've looked at block and stream ciphers in this series, including examples of each of them. Before going into asymmetric crypto I want to explain a little bit about cryptographic hash functions and some of their applications. We'll look at hash functions in general and at the SHA-1 hash function as an example.

Note that I'll often skip the 'cryptographic' adjective throughout this series of posts, but I'll always refer to cryptographic hash functions and not to regular hash functions. And as usual, this is by no means complete but just tries to give a basic understanding of what hash functions are and how they usually look like.

I must say I never studied hash functions too deeply, so this stuff will serve as a reminder for me as well. If something is not as accurate as you'd hope for, let me know in the comments ;-).

Cryptographic Hash functions: properties

A cryptographic hash function is defined as a series of operations over an input message of arbitrary length, producing an output of fixed length (hash or message digest) such that a change to the message would not come unadvertised. It should be easy to compute a hash function from a message, but given a hash value it should be infeasible to find a message that would produce that value. Further, given a message it should be infeasible to find a second message producing the same message digest and as I stated before, it should be infeasible to modify a message without modifying its hash value.

Therefore, the desired properties of a cryptographic hash function are as follows:

• Preimage resistance: given a hash value h, it should be infeasible to find a message m with h=hash(m). Otherwise the function would be vulnerable to preimage attacks
• Second preimage resistance: given a message $m_1$ it should be infeasible to find a second message, $m_2$ which provides the same message. I.e., given $m_1$, it should be difficult to find $m_2$ such that $h=hash(m_1) = hash(m_2)$. Otherwise, the function is said to be vulnerable to second preimage attacks.
• Collision resistance: It should be difficult to find two messages with the same message digest. Obviously, given a hash function with output size of n bits, if you try $2^{n}+1$ messages, you'll get two of them with the same hash. The theory behnd birthday attacks tells us that for a n bit hash function we'd have to try out about $2^{n/2}$ inputs to find a collision. That number is called the birthay bound.

Typical structure of a hash function

A hash function typically consists of a compression function which takes blocks of a fixed length as input and produced blocks of a fixed length (the output length of the hash function). Additionally, the output of the previous block is fed back to the input so that the next block depends in all the previous blocks. Otherwise, the hash function would be looking at the last block only 😉

Merkle-Damgard construction

The structure shown is known as the Merkle-Damgård construction, and most popular hash functions are based on this construction. However, alternative structures exist and many of the proposals for the SHA-3 contest are based on different constructions.

The SHA-2 family

Although MD5 and SHA-1 are way more popular, I decided to take a look and describe here the structure of the SHA-2 family of hash functions. The reason for this is that MD5 was broken a while ago, first by dr. Wang's team and later by a group of researchers including dr. Benne de Weger. I already talked about it here, although it's only in Spanish. You can see the hashcalc project's page if you don't read Spanish ;-).

Further, SHA-1 is very similar to MD5 and the same sort of problems usually apply to it. Therefore, I chose to look at the next family of hash functions, the SHA-2 family. This includes several hash functions with different output lengths: SHA-224, SHA-256, SHA-384, and SHA-512 where the number defines the number of output bits.

SHA-256 and SHA-512 use 32 and 64 bit words respectively, while SHA-224 and SHA-384 are just truncated versions of them. In the remaining of this section I'll explain SHA-256 since SHA-512's structure is basically the same but with different word size and initial values.

Bascially, the input message is divided in 512 bit blocks $M_i$, and is padded with additional information that includes the length of the original message. Then, for each of these blocks a message schedule is run which produces 64 variables $W_t$.

These 64 variables are processed with the compression function shown in this picture, where variables a..f are initialized according to the standard:

SHA-2 Compression function

After this processing, the intermediate hash value is computed as the addition (modulo 32) of the variables a..f and the previous intermediate hash value. This process is run for each message block and finally yields the message digest.

Of course, this is a very high level description of the algorithm. If you want to know the details, see the FIPS 180-2 standard publication.

The SHA-3 contest

Currently, an open contest is being held by the NIST to create a new hashing standard, SHA-3. Currently, the contest is in its second round and there are 14 second round candidates. The Second SHA-3 Candidate Conference is planned for August 2010 and the idea is to publish a revised Hash Function Standard by 2012.

More information on the contest and the submissions can be found in the NIST Hash competition website.