## Crypto Series: Authentication codes

This time we'll treat two well known techniques used to solve a common problem in cryptography: authentication. To put it simple, authentication is the process of establishing an identity or a message's origin.

To achieve this using symmetric cryptography, two basic mechanisms exist. The first of them, commonly referred to as *Message Authentication Codes *(MAC), is based on using block ciphers with a shared key between the party claiming an identity (or sending a message) and the party verifying the identity or the origin of the message.

The second one, known as *Hashed Message Authentication Codes *(HMAC) is based on the use of a hash function together with some shared key. In the remaining of this post, I briefly describe the basic idea behind these two ways of assuring message authentication.

**Message Authentication Codes using block ciphers**

A common way to authenticate messages is to use a block cipher, such as DES, in a mode of operation which makes the latest encrypted block dependent on both the key and all the previous plaintext blocks. For instance, one can think of using 3DES in CBC mode to create a MAC over a message: encrypt the message in CBC mode using 3DES with the shared key, get the last output block and attach it to your original message.

When the recipient gets the message with its MAC, it does the same operation: encrypt each block using CBC mode and takes the last block. The result is compared against the MAC attached to the message: if there is a match, the sender of the message must have known the key (unless the encryption used is broken).

Despite being one of the most popular techniques for MAC generation (if not the most popular), CBC-MAC has some security problems and other techniques exist. For instance, you can take a look at Special Publication 800-38B by NIST.

**Hashed Message Authentication Codes**

HMAC is a standardized way of using hash functions for authentication purposes. The idea is to incorporate the usage of a key into a hash function, in such a way that the resulting hash could not be produced without knowing the key.

The obvious choices of prefixing the message with the key or appending the key after the message before computing the hash have security problems (see Stop using unsafe keyed hashes, use HMAC by Nate Lawson). Therefore, a slightly more complex structure was invented to avoid such problems.

The HMAC construction is defined as follows:

Where opad (outer pad) is the constant 0x5c...5c and ipad (inner pad) is the constant 0x36...36. These constants, as well as the key, are of the same length as the hash function's block length.

With this, one would follow the same approach as with any MAC: compute the HMAC value for the given message, and send it attached to the message. The recipient will perform the same computation, and if it matches the one attached to the message he will conclude that the message was sent by someone who knows K (which is hopefully only the person/entity he shared it with 😉 ).

This concludes my introduction to authentication codes. If you are looking for a good security analysis on HMAC functions, wait for Nate's post because I'm sure it will be very interesting.

## Crypto Series: Cryptographic hash functions – SHA-2

So far, we've looked at block and stream ciphers in this series, including examples of each of them. Before going into asymmetric crypto I want to explain a little bit about cryptographic hash functions and some of their applications. We'll look at hash functions in general and at the SHA-1 hash function as an example.

Note that I'll often skip the 'cryptographic' adjective throughout this series of posts, but I'll always refer to cryptographic hash functions and not to regular hash functions. And as usual, this is by no means complete but just tries to give a basic understanding of what hash functions are and how they usually look like.

I must say I never studied hash functions too deeply, so this stuff will serve as a reminder for me as well. If something is not as accurate as you'd hope for, let me know in the comments ;-).

**Cryptographic Hash functions: properties**

A cryptographic hash function is defined as a series of operations over an input message of arbitrary length, producing an output of fixed length (hash or message digest) such that a change to the message would not come unadvertised. It should be easy to compute a hash function from a message, but given a hash value it should be infeasible to find a message that would produce that value. Further, given a message it should be infeasible to find a second message producing the same message digest and as I stated before, it should be infeasible to modify a message without modifying its hash value.

Therefore, the desired properties of a cryptographic hash function are as follows:

**Preimage resistance**: given a hash value*h*, it should be infeasible to find a message*m*with*h=hash(m)*. Otherwise the function would be vulnerable to preimage attacks**Second preimage resistance**: given a message it should be infeasible to find a second message, which provides the same message. I.e., given , it should be difficult to find such that . Otherwise, the function is said to be vulnerable to second preimage attacks.**Collision resistance**: It should be difficult to find two messages with the same message digest. Obviously, given a hash function with output size of*n*bits, if you try messages, you'll get two of them with the same hash. The theory behnd birthday attacks tells us that for a*n*bit hash function we'd have to try out about inputs to find a collision. That number is called the birthay bound.

**Typical structure of a hash function**

A hash function typically consists of a compression function which takes blocks of a fixed length as input and produced blocks of a fixed length (the output length of the hash function). Additionally, the output of the previous block is fed back to the input so that the next block depends in all the previous blocks. Otherwise, the hash function would be looking at the last block only 😉

The structure shown is known as the Merkle-Damgård construction, and most popular hash functions are based on this construction. However, alternative structures exist and many of the proposals for the SHA-3 contest are based on different constructions.

**The SHA-2 family**

Although MD5 and SHA-1 are way more popular, I decided to take a look and describe here the structure of the SHA-2 family of hash functions. The reason for this is that MD5 was broken a while ago, first by dr. Wang's team and later by a group of researchers including dr. Benne de Weger. I already talked about it here, although it's only in Spanish. You can see the hashcalc project's page if you don't read Spanish ;-).

Further, SHA-1 is very similar to MD5 and the same sort of problems usually apply to it. Therefore, I chose to look at the next family of hash functions, the SHA-2 family. This includes several hash functions with different output lengths: SHA-224, SHA-256, SHA-384, and SHA-512 where the number defines the number of output bits.

SHA-256 and SHA-512 use 32 and 64 bit words respectively, while SHA-224 and SHA-384 are just truncated versions of them. In the remaining of this section I'll explain SHA-256 since SHA-512's structure is basically the same but with different word size and initial values.

Bascially, the input message is divided in 512 bit blocks , and is padded with additional information that includes the length of the original message. Then, for each of these blocks a *message schedule* is run which produces 64 variables .

These 64 variables are processed with the compression function shown in this picture, where variables a..f are initialized according to the standard:

After this processing, the intermediate hash value is computed as the addition (modulo 32) of the variables a..f and the previous intermediate hash value. This process is run for each message block and finally yields the *message digest*.

Of course, this is a very high level description of the algorithm. If you want to know the details, see the FIPS 180-2 standard publication.

**The SHA-3 contest**

Currently, an open contest is being held by the NIST to create a new hashing standard, SHA-3. Currently, the contest is in its second round and there are 14 second round candidates. The Second SHA-3 Candidate Conference is planned for August 2010 and the idea is to publish a revised *Hash Function Standard *by 2012.

More information on the contest and the submissions can be found in the *NIST Hash competition* website.

## Crypto Series: Mifare Crypto1

Let's go back into Cryptography. This time I'll tell you how the (in)famous Crypto1 cipher works. It is used in the Mifare Classic RFID tags, typically used for building access control but also for many other systems such as the Oyster Card in London, the OV-Chipkaar in The Netherlands, etc.

We won't talk about the protocol details, nor about how the published attacks work. You'll find a couple of interesting links at the end though ;-).

Note: Images obtained from the papers linked at the end of the post.

**The Crypto1 cipher**

Crypto1 is a proprietary stream chiper from NXP found in the RFID tags from the Mifare Classic family. At first, it was studied by Karsten Nohl reverse engineering the chip itself. This information was published in the CCC 07, although not many details about the cipher were published.

In parallel, the Radboud Universiteit from Nijmegen was studying this kind of cards and with the help of the information published at CCC completely reverse engineered the cipher and published the details. Let's see how it works then...

## Linear Feedback Shift Registers (LFSRs)

In this post I'll provide a very simple description of linear feedback shift registers (LFSR for short). Further, we'll see how they are used to create *stream ciphers*. And all these things without going into mathematical details, for which I refer the interested reader to documents such as the *Handbook of Applied Cryptography* or this LFSR Reference.

**Shift Registers**

A shift register is basically a construction with interconnected several memory cells, where every cell stores one bit. So, the value of these cells conforms the so-called *state* of the register. When the register steps from one state to the next one (usually at each clock tick), the new state is created by simply shifting the bit in a cell to the cell next to it. Thus, the right-most bit *goes out* of the register, and a new bit *goes into* the left-most cell.

In this picture we can see an implementation of a 4 bit shift register:

We can see an input line (*Data in*), 4 points where one can read the current state (Q1-Q4) and a clock input, which governs the register telling it in which moment it should step into the next state.

**Linear Feedback Shift Registers (LFSRs)**

Well, once you know what a shift register is, it is fairly straightforward to understand how a LFSR works. We just take the previous register and set the input as a linear combination of the different cells. Since there is a loop which *feeds* the register based on its previous state, we have *feedback*. Further, since this feedback is based on a linear function, then we have *linear feedback*, hence the name :).

**LFSRs' use in cryptography**

So far, you probably have guessed that the main use of an LFSR in encryption systems is generating a series of *pseudo-random* bits to be used as a *key stream* in a stream cipher.

The idea is to generate a stream of bits with the minimum repetition possible, i.e. with maximal period. For its study, the connections in an LFSR are usually represented as a *polynomial* and the properties such a polynomial needs to meet to achieve maximal period are analyzed.

Basically, we need to get the LFSR to run through all its possible states before going back into the first one. So, if we have 16 bit registers, we'd want to have the LFSR pass through the 2^16-1 states before cycling back to the first one. And yes, I said 2^16-1 instead of 2^16 because the zero state should never appear. Otherwise the LFSR will never leave this state, since the feedback function is linear. For the curious readers, an LFSR will have maximal period if its *generating polynomial* is a so-called *primitive polynomial* (I'm pretty sure this name will ring a bell for some of you guys, although maybe not as a happy memory :P).

Setting aside the study of these polynomials, which involves somewhat comlex maths (to my non-mathematician opinion 😆 ), an LFSR by itself should not be used as a *key stream* generator because its properties make it fairly predictable. In fact, given an *n* bit LFSR, obtaining *2n* bits of its output it is possible to recover the *generating polynomial* and be able to decrypt any subsequent text.

Therefore, LFSRs are not directly used in crypto, but they are generally used in one of these modes:

- Nonlinear combination of LFSRs: the output from several LFSRs is combined in a non-linear fashion to obtain a key stream.
- Nonlinear filter generator: the output is generated from a non-linear combination of the state.
- Clock-controlled generators: In this mode, several LFSRs step based on some rules, instead of stepping for every clock cycle.

With this kind of constructions it is possible to improve LFSR's properties for the creation of *secure* stream ciphers. And that's it for LFSRs from my side, for more information refer to these references:

*Handbook of Applied Cryptography*

## Crypto Series: Introduction to stream ciphers

Today we're gonna step a little further in our Crypto series. We'll see the main properties of the so-called *Stream Ciphers*, how they work and some things that should be taken into account when they are used.

Later in this series, we'll see how Linear Feedback Shift Registes (LFSR) work and we'll see one of the most used stream ciphers, together with an example of wrong usage.

Keep on reading to learn more about this class of ciphers.

## Crypto Series: DES and AES Visualization

Thanks to previous posts in this series we already know how the DES and AES ciphers work. Now we'll use ryptool to get a better understanding of these algorithms and see how every one of the described steps works.

To this end, we first launch Cryptool and then go to *Indiv. Procedures > Visualization of Algorithms > DES* to see the DES algorithm. We'll get a window which will show the different steps in the algorightm, together with navigation controls that we can move somewhere else if they are placed on top of part of the animation due to our screen resolution.

This image shows one of the steps in the DES visualization:

For AES we have two options. The first of them, which can be found in *Indiv. Procedures > Visualization of Algorithms > AES > Rijndael animation, *provides a Flash animation explaining every step in the AES cipher.

The second one, accessible through *Indiv. Procedures > Visualization of Algorithms > AES > Rijndael Inspector, *allows us to see the AES state in every intermediate step throughout the algorithm. This would allow us to check an hypothetical AES implementation:

As you can see, once again Cryptool proves itself as a valuable educational tool for Cryptography, both classical and modern algorithms.

## Understanding RHUL’s SSH attack

Last week a rumor about a *0 day* exploit on OpenSSH being actively exploited arose. In some places I found links to Bugtraq's note about the OpenSSH attack published in a paper of the Royal Holloway University of London back in November 2008.

The paper, Plaintext Recovery Attacks Against SSH, describes an attack which provides knowledge of **32 bits** from an **arbitrary** ciphertext block from an **SSH **connection when **CBC mode** is used. Personally, I didn't read the paper when it was published, I just took a quick look at it and I didn't feel like reading it completely.

However, when I saw the generated stir around this issue I thought it was time to take it again. And this post is the result: an attempt to explain how the attack described in this paper works.

**SSH Binary Packet Protocol**

SSH BPP is the protocol in charge of defining the binary packet structure of SSH, which supports the encrypted packets that conform an SSH connection. A data packet in an SSH connection is encoded as follows:

The *Length* field ( 4 bytes) indicates the size of the remainder of the packet. The *Padding length* field (1 byte) encodes the length of the final padding, which makes the packet a multiple of the block size. After this field, the message is added, and finally the padding, which has to be between 4 and 255 bytes of random data.

Two cryptographic operations are applied to this message: an encryption and a MAC. The MAC is computed over a sequence number which is never transmitted (it is kept by the two ends of the communication) concatenated with the original message depicted above. The encryption is applied to the original message only.

Therefore, when SSH recieves a packet, the first thing it has to do is decrypting the first ciphertext block to obtain the message length. Then, it waits to decrypt as many bytes as the packet indicates, in order to decrypt them and check its integrity with the MAC, which provides message authentication and protects your SSH sessions against modification while they are traveling from client to server.

**SSH BPP's problem**

** **

The problem with SSH BPP described in the paper is a consequence of the combination of several factors. On one side, we have that OpenSSH returns different error messages for different situations: when the decrypted length does not pass some *sanity checks* (e.g. it is not a multiple of the block size) and when the computed MAC does not match with the valid one. Therefore, observing the error messages one can obtain information on what caused the interruption of the connection.

On the other hand, we have that the first block indicates how many bytes the server waits for before calculating the complete MAC. Therefore, assuming the *sanity checks* over the message length are passed, an attacker could inject a data block, and then inject block by block until the server says "*Hey, wait, the MAC does not match!".*

At that point, the attacker knows that after decrypting the injected block with the server's key, the result contains in its right-most 32 bits a value equal to the number of blocks injected in the connection.

But now the complicated part starts, because the attacker needs to link these obtained 32 bits with the 32 bits that the original packet held. If the packet was encrypted with a completely unrelated key, then knowing that decrypting it under a different key gives a certain value provides basically no information on the original message.

But here CBC mode comes to play. We already know from our previous block ciphers post that this mode performs an XOR of the previous encrypted block with the current plaintext block before encrypting it with a fixed key.

Let's assume we want to obtain data from a previous packet , which is part of the current SSH connection. After decrypting it we would have:

But when we inject it into the connection, assuming the previous ciphertext block is known, , then the SSH server will compute this:

So, it will decrypt the injected block, and will XOR it with the previous block. This result will be regarded as the first block of a packet, and therefore its initial 32 bits will tell the server the packet length.

Thus, we can start injecting new blocks and observing the reaction of the SSH server. Once it returns a MAC error, this means that the initial 32 bits of contain the number of bytes we injected so far.

Further, from the previous two equations we can obtain the following relation:

Where the left-most 32 bits of all values at the right side of the equation are known, and the value at the left side of the equation is what we wanted to obtain. In this way we can get to know the left-most 32 bits of the target block.

**Implications of this attack**

So, we know how the attack works... now it's time for asking ourselves whether we should be worried about it or not. In principle, the attack is not too complex and allows the retrieval of 32 arbitrary bites of an SSH connection with a probability of . This probability comes from the conditions that need to be satisfied in order to pass the sanity checks after decrypting the injected block.

This means that, on average, an attacker would succeed one out of 262.000 times, forcing the client to reconnect so many times to the server. I'm pretty sure anyone would be tired after 3 consecutive attemps and would think that something is wrong ;-).

Further, the attacker needs to perform a *man in the middle* to be able to inject data into the connection. In local area networks it's not a big deal, but over the Internet it gets more complicated.

And even then, an attacker would have 32 bits of data, 4 ASCII characters of your password... which I hope has some more than that :D. Therefore, in my opinion the attack does not have practical implications, although it's always good to update to a patched version and/or use a mode different than CBC.

What this attack actually does is contributing as an interesting example of how important details are in crypto applications and protocols. If the protocol would not depend on a length field which needs to be decrypted for waiting such a number of bytes, or would not use CBC mode or simply would not return additional information about errors (simply closing the connection indicating that *something *was wrong, regardless of what this *something* is), none of this would be possible.

## Requirements for (secure) Electronic Voting

Some time ago kuasar told me about an initiative named *Partido de Internet* (PdI for short). The idea is to create a *political party* which would vote every law proposition and every initiative in the parliament depending on the results of an individual electronic election. So, affilates would vote in a per-initiative basis through the Internet, and the representatives of the PdI in the parliament would vote according to the results.

Setting aside the political implications, whether I (or you) think this is a good idea or not, I want to talk here about electronic voting. As soon as she mentioned it to me, I started thinking *"Wait... this is not so easy. You know, we want elections to be secure, one doesn't want everyone to know what he voted, but he wants his vote to be counted in the right way... there are several requirements which are not so easy to meet"*.

So I asked her about how would they implement it... and the answer was that probably using the e-DNI, the spanish electronic id card, which of course provides digital signatures. Yeah, that's right, you could use such a device to implement an electronic voting scheme... but there is quite a lot of thinking involved in order to make it right!

Let's start here a series of posts for brainstorming about e-voting. I'll start setting up what I (and the literature I have from last year's subjects about crypto protocols 😉 ) think an electronic voting scheme needs to provide. These are the requirements I can see, but maybe you can think of some other so feel free to comment on it!

**Privacy or****Anonymity**: A voter wants his vote to remain anonymous. There should be absolutely no way to link a voter with its vote, neither by other voters or by the election authorities.**Eligibility**: Only voters who are eligible for voting should be able to vote, and only once. A legitimate voter should not be able to cast two different votes, and of course a non-legitimate voter should not be able to vote.**Fariness**: The results can only be obtained at the end of the elections... so that other voters cannot be influenced.**Verifiability:**The outcome of the elections needs to be fair, i.e. the results have to be equal to the votes casted by the voters.**Individual verifiability:**An individual voter can verify that his vote was actually counted.**Receipt free:**A voter cannot prove that he voted for a given party. This way one cannot be coerced to vote for a given party.

Some of these requirements are more important than others, some of them are really required for any fair electronic voting system that one can think of, and others are just desirable. In subsequent posts we will look at some electronic voting protocols and try to see which requirements they meet.

Before finishing, one thing is clear: meeting the requirements is not trivial. For instance, vote privacy and verifiability seem to pose a contradiction... how come one can only vote if he is eligible to it, but a vote cannot be revealed to anyone? We'll see some ways of doing it in some time ;-).

## Crypto Series: Advanced Encryption Standard

Last time I wrote about the DES cipher, so today (yes, you guessed it) I'm writing about how the AES works. AES was created as a result of an open contest proposed by the NIST. In 1997, the NIST announced their wish to have a new encryption standard which would substitute the *Data Encryption Standard** *and was to be called AES: Advanced Encryption Standard.

Several researchers submitted their proposals to the AES contest, but the winning candidate was the so called Rijndael cipher. This cipher was originally created by two Belgian cryptographers, Joan Daemen and Vincent Rijmen, who submitted it to the AES selection process.

The other finalists were Twofish (Bruce Schneier and others), Serpent (Ross Anderson and others), MARS (the team included Don Coppersmit) and RC6 (Ron Rivest [the R in RSA :-p ] and others).

After the contest, the NIST published AES as a FIPS standard, and since then the AES cipher has been extensively used and analyzed. In the remaining of this post we see how AES works, as we did with DES in the previous post.

NOTE: Just as in previous entry, images are taken from Wikipedia. Let me know if you try to read the post and they don't work anymore.

## Crypto Series: Block Ciphers – Data Encryption Standard (DES)

The Data Encryption Standard ( DES ) was designed by IBM in 1973 as a submit for a call for proposals by the National Bureau of Standards of the United States. There was some controversy regarding to the involvement of the NSA in the development of the cipher, especially to the *mysterious* S-boxes and the reduced key size used, but years later it was shown that the S-boxes used where more resistant to Differential Cryptanalysis than if they had been selected at random.

The algorithm was approved as a FIPS standard in 1976, and revised up to three times in 1988,1993 and 1999. The last revision FIPS-46-3 describes the 3DES extension as a method to enlarge the key size of the DES cipher by using 3 DES operations in a row, encrypting the first time, decrypting the second time, and encrypting again the third time. This was done in order to withstand an efficient brute force attack published in 1998.

After the break (click *Read more!*) we'll see how it works and the main components of the algorithm.

NOTE: All images in this post are directly linked to Wikipedia. If the images are not visible anymore, let me know in the comments and I'll post my own version of the images.