Limited Entropy Dot Com Not so random thoughts on security featured by Eloi Sanfèlix


HW Security: fault injection techniques

So you read my last post and were left wondering how the heck you would be able to inject temporary faults into hardware devices? Here is your answer 🙂

In that post, I explained how to extract keys from cipher implementations assuming we could somehow inject faults during the execution of the cipher. Besides DFA attacks, I also said we could achieve something similar to what we do with software protections (i.e. modifying control flow, bypassing checks, etc.) using fault injection techniques. I thought it was worth giving a few examples of how to inject faults in real hardware to complete the picture.

When hardware is designed, it is engineered to work under certain conditions of temperature, input voltage ranges, clock frequencies, etc. The hardware is tested under those conditions and is supposed to function in that range... and there are no guarantees that it will operate correctly if you bring it outside them.

I guess you follow my reasoning already 🙂 So if we want to inject faults into hardware, a good place to start looking is exactly in those gray areas around the operating conditions. Of course, we want the chip to be functioning properly most of the time, and we want it to fail at the precise moment at which it is computing something sensitive (say a secure boot check, or an RSA-CRT signature). Thus, we probably need to have some control over the timing, and inject the fault only temporarily.

In this post I am introducing from an intuitive perspective three ways of injecting faults: voltage, clock and laser/optical glitching.

Voltage glitching

The first example I want to touch on is that of voltage (or VCC) glitching. In this case, we typically run the chip at its nominal voltage (say 3.3V), and whenever we want to inject a fault, we drop voltage down to e.g. 1V.

Example of voltage glitching. The supply voltage is set to 0.8V during a short moment of time.

At this moment, the input voltage to certain gates within the chip will be too low due to the lack of supply voltage. Thus, these gates will receive an input voltage which is below the threshold that indicates whether the signal is a zero or a one, no matter what value it was supposed to be.

Then we increase the voltage again to the nominal voltage of 3.3V, and we have a functioning chip that just failed to execute one of its operations. For instance, it failed to execute a conditional jump and fell through to the code that we wanted to have executed.

The trick here is to find the proper parameters for the glitch: voltage drop (do we go to 0V, to 0.4V, to 1V ...), length of the glitch (a few nanoseconds, a few microseconds?) and the timing. Typically, if voltage drop and length of glitch are too small, the chip will function properly. If they are too large, it will just die (mute, reset, or maybe even physically damaged). Of course, if the timing is wrong then the attacker will never see the effects he wants to see.

As a protection against this kind of glitching, most modern smart cards and some embedded devices incorporate voltage sensors that detect whether the voltage went below a certain value or not. However, this attack is still effective against a wide range of products.

Clock glitching

Clock glitching is similar to VCC glitching in the sense that it affects another critical parameter of the chip that can be controlled by the attacker. In this case, what we do is injecting spurious clock cycles that are way shorter than the original clock cycle.

Example of clock glitching. A very short spurious clock cycle is inserted at the beginning of a normal cycle.

Since the internal logic of the chip operates based on its clock, a short clock cycle will trigger a new operation before the results of the previous one were completely computed or propagated through the device.

Imagine you have to multiply two values, and then add a third value to them. Normally, multiplying values takes longer than adding them up. Thus, the clock frequency for a chip that only performs these two operations would be long enough for the multiplication to occur and its result to be ready at the input of the next stage, since that is the critical operation.

Now, if I tell you to start adding up before you received the multiplication result, you will be using invalid (old?) data instead of the correct result. Thus, you will fail at computing the correct result.

Clock glitching exploits exactly that situation. Again, finding the right parameters in this case is the key to success.

As for hardware level protections, frequency sensors as well as using internally generated clocks (using on-die oscillators) are generally the most common ways to protect against clock glitching.

Additionally, fast clocks make these attacks less practical for attackers, since they need to inject even faster clock cycles and synchronize their attack at a higher speed.

This is why clock glitching is less effective nowadays: most high-end smart cards use their own on-die clocks, and embedded systems require much higher clock speeds.

Optical glitching

After clock and VCC glitching, we move to the real king of current fault injection attacks. Optical fault injection, or most commonly Laser fault injection, uses a light beam to inject faults into semiconductor devices.

How is this possible? Well, light (physicists, don't kill me!) basically consists of a number of photons carrying a certain amount of energy. Roughly, when these photons reach a semiconductor (typically silicon in electronic devices), their energy is absorbed by the semiconductor.

Given enough energy, electrons that would otherwise be still within the semiconductor will start to move, creating current. So, for our chips, this means that some of the transistors in the chip will actually switch when they should not!

The big difference between this fault injection technique and the previous ones is that in this case we actually have spatial selectivity (or resolution): we can choose which parts of the chip we attack by pointing the laser beam to them.

Of course, this is very powerful but at the same time it adds extra complexity to the attack: now you need to find the sensitive spots in the chip. As before, there are a number of parameters one needs to play with in order to successfully inject faults: glitch timing, glitch length, wavelength of the injected light and amount of energy injected.

Also, this attack is semi-invasive: we need to open up the chip package so that the light radiation can reach the die. Otherwise, the light will be blocked by the package or the plastic around the smart card die. Thus, this attack provides additional power at the cost of additional complexity, as usual.

In terms of hardware level protections, this is also the most difficult attack to prevent. Typically light sensors are scattered around the chip, but manufacturers cannot place sensors everywhere (that's expensive!) so there is always open spots.

At the end of the day, fault injection protection requires a combination of hardware and software prevention and detection mechanisms: typically sensors at the hardware level and double-checks and redundancy at the software side.

Due to the difficulty of completely preventing this kind of attacks, fault injection attacks are nowadays one of the main threats to secure hardware. Additionally, this difficulty together with the physical nature of the attacks also means that simulating them is typically not enough to assure appropriate protection levels, making fault injection testing key for secure hardware.

Posted by Eloi Sanfèlix