Limited Entropy Dot Com Not so random thoughts on security featured by Eloi Sanfèlix

22Nov/091

Crypto Series: Cryptographic hash functions – SHA-2

Posted by Eloi Sanfèlix

So far, we've looked at block and stream ciphers in this series, including examples of each of them. Before going into asymmetric crypto I want to explain a little bit about cryptographic hash functions and some of their applications. We'll look at hash functions in general and at the SHA-1 hash function as an example.

Note that I'll often skip the 'cryptographic' adjective throughout this series of posts, but I'll always refer to cryptographic hash functions and not to regular hash functions. And as usual, this is by no means complete but just tries to give a basic understanding of what hash functions are and how they usually look like.

I must say I never studied hash functions too deeply, so this stuff will serve as a reminder for me as well. If something is not as accurate as you'd hope for, let me know in the comments ;-).

Cryptographic Hash functions: properties

A cryptographic hash function is defined as a series of operations over an input message of arbitrary length, producing an output of fixed length (hash or message digest) such that a change to the message would not come unadvertised. It should be easy to compute a hash function from a message, but given a hash value it should be infeasible to find a message that would produce that value. Further, given a message it should be infeasible to find a second message producing the same message digest and as I stated before, it should be infeasible to modify a message without modifying its hash value.

Therefore, the desired properties of a cryptographic hash function are as follows:

  • Preimage resistance: given a hash value h, it should be infeasible to find a message m with h=hash(m). Otherwise the function would be vulnerable to preimage attacks
  • Second preimage resistance: given a message m_1 it should be infeasible to find a second message, m_2 which provides the same message. I.e., given m_1, it should be difficult to find m_2 such that h=hash(m_1) = hash(m_2) . Otherwise, the function is said to be vulnerable to second preimage attacks.
  • Collision resistance: It should be difficult to find two messages with the same message digest. Obviously, given a hash function with output size of n bits, if you try 2^{n}+1 messages, you'll get two of them with the same hash. The theory behnd birthday attacks tells us that for a n bit hash function we'd have to try out about 2^{n/2} inputs to find a collision. That number is called the birthay bound.

Typical structure of a hash function

A hash function typically consists of a compression function which takes blocks of a fixed length as input and produced blocks of a fixed length (the output length of the hash function). Additionally, the output of the previous block is fed back to the input so that the next block depends in all the previous blocks. Otherwise, the hash function would be looking at the last block only 😉

Merkle-Damgard construction

Merkle-Damgard construction

The structure shown is known as the Merkle-Damgård construction, and most popular hash functions are based on this construction. However, alternative structures exist and many of the proposals for the SHA-3 contest are based on different constructions.

The SHA-2 family

Although MD5 and SHA-1 are way more popular, I decided to take a look and describe here the structure of the SHA-2 family of hash functions. The reason for this is that MD5 was broken a while ago, first by dr. Wang's team and later by a group of researchers including dr. Benne de Weger. I already talked about it here, although it's only in Spanish. You can see the hashcalc project's page if you don't read Spanish ;-).

Further, SHA-1 is very similar to MD5 and the same sort of problems usually apply to it. Therefore, I chose to look at the next family of hash functions, the SHA-2 family. This includes several hash functions with different output lengths: SHA-224, SHA-256, SHA-384, and SHA-512 where the number defines the number of output bits.

SHA-256 and SHA-512 use 32 and 64 bit words respectively, while SHA-224 and SHA-384 are just truncated versions of them. In the remaining of this section I'll explain SHA-256 since SHA-512's structure is basically the same but with different word size and initial values.

Bascially, the input message is divided in 512 bit blocks M_i , and is padded with additional information that includes the length of the original message. Then, for each of these blocks a message schedule is run which produces 64 variables W_t .

These 64 variables are processed with the compression function shown in this picture, where variables a..f are initialized according to the standard:

SHA-2 Compression function

SHA-2 Compression function

After this processing, the intermediate hash value is computed as the addition (modulo 32) of the variables a..f and the previous intermediate hash value. This process is run for each message block and finally yields the message digest.

Of course, this is a very high level description of the algorithm. If you want to know the details, see the FIPS 180-2 standard publication.

The SHA-3 contest

Currently, an open contest is being held by the NIST to create a new hashing standard, SHA-3. Currently, the contest is in its second round and there are 14 second round candidates. The Second SHA-3 Candidate Conference is planned for August 2010 and the idea is to publish a revised Hash Function Standard by 2012.

More information on the contest and the submissions can be found in the NIST Hash competition website.

31Dec/081

CA falsa, ¿se rompe Internet (o su PKI) ?

Posted by Eloi Sanfèlix

Ayer día 30 se publicó en el CCC una vulnerabilidad en la PKI de Internet, explicando cómo se había conseguido tener un certificado perfectamente válido para la mayoría de los navegadores web pero con datos falsos. Se obtuvo un certificado firmado por una autoridad de certificación (CA) de confianza en la mayoría de navegadores, al que se le pudieron cambiar los datos de identidad: la CA firmó un certificado, pero se obtuvo otro derivado para el que la misma firma sigue siendo válida y los datos se han modificado arbitrariamente.

El ataque fue publicado por un grupo de investigadores, contando entre ellos con Benne de Weger, profesor de la TU Eindhoven que me dio clase de Cryptography 2 el año pasado.

¿Cómo es posible esto?

Básicamente, se ha usado la misma técnica que para la predicción de presidentes de los EE.UU. de la que ya hablé hace un tiempo. Se ha obtenido un certificado válido para un sitio aparentemente legal en una CA de confianza, y luego se ha creado un certificado que genere colisiones en MD5.

El método exacto es, como ya dije, tener un prefijo del certificado a elección, una sección aleatoria para hacer colisionar algún resultado intermedio del hash MD5, y después una parte final común a ambos certificados. Puesto que a partir de la colisión, ambos documentos son idénticos, el hash MD5 de éstos será el mismo.

Aun así, no es tan fácil buscar la colisión, puesto que hay ciertos elementos que varían entre un certificado y otro y se deben predecir. Por ejemplo, el número de serie del certificado o el periodo de validez de los certificados que nos va a dar la CA.

Para el ataque, los investigadores probaron con diferentes CAs y trataron de predecir estos valores. Finalmente, seis CAs comerciales les dieron certificados que pudieron usar para crear colisiones.

¿Y qué significa esto?

Básicamente, alguien podría crearse su certificado firmado por una de estas entidades para usos maliciosos. Por tanto, podrían hacer un MiTM en conexions SSL sin que vieramos que hay un intruso porque el certificado usado sería aceptado por nuestro navegador sin rechistar.

Ahora bien, es esto realmente un problema taaaan grave? Para empezar, para la mayoría de los usuarios el cartel de 'el certificado no es válido' solo significa algo así como 'tienes que darle a aceptar para entrar'. En esta situación, ni certificados ni leches, no merece la pena perder el tiempo en crearte un certificado válido si los usuarios van a pasar de los mensajes y entrar igualmente.

Además, el problema no es de la PKI sino de MD5. La solución pasa por irnos a otro sistema de hash como SHA-256 o SHA-512, y usar IDs aleatorios para los certificados firmados. Todo lo que se pueda predecir puede llevar a problemas, ya lo vimos con DNS y lo vemos ahora con los certificados.

Como dicen en layer 8:

What I’m here to say is, I don’t really think this matters all that much except to security researchers.  Here’s why:  normal users’ trust has very little to do with certificates.

Otro sitio donde intentan tranquilizar es en securosis. Y además veo que Verisign ya ha arreglado el entuerto en su RapidSSL.

Si queréis información técnica de cómo funciona esto exactamente, aquí lo explican todo con detalle. Yo seguro que me he dejado algo... pero solo quería decir que no es taaan malo y que se veía venir después de que publicaran los anteriores ataques a MD5. Y lo mismo con SHA-1, si se sigue usando probablemente acabará en algo muy similar a esto.