Fault injection attacks on microcontrollers: clock glitching tutorial

Many published papers describe practical glitching attacks on cryptographic devices or embedded systems in general. These papers usually detail the glitching process itself but not the setup used to inject the glitches. At most, they say what kind of FPGA (or commercial station) they use and what “glitching capabilities” they get (frequency, resolution, etc.).

If you look for schematics and code to replicate the attacks in these papers, you will not find much. Almost nothing is published, so the reader might think that glitching is complicated and cannot easily be performed without specialized and expensive equipment. This creates a false sense of security against these attacks.

However, the truth is that glitching can be done with simple and cheap hardware, as has already been shown, for example, by the XBOX360 glitch hack or the unloopers that compromised pay-TV smartcards in the mid-2000s.

Today I am going to show you how to clock-glitch with less than $15 of equipment.

Note: Giant is, as far as I know, the only published tool useful not only for glitching but also for side-channel analysis. Although it is very affordable compared to other options, the estimated 300€ cost can dissuade some people from playing with it. It also features an FPGA, which can be a barrier for people not used to VHDL.

WHAT IS A FAULT INJECTION ATTACK?

A fault injection attack is a procedure to maliciously introduce an error in a computing device in order to alter the software execution. Fault injection mainly aims for two effects: skipping the execution of an instruction and corrupting the data the processor is working with. These effects are often used to compromise the security of embedded devices by bypassing security checks or leaking private keys.

There are several types of fault injection attacks. The most common and easiest to implement are clock and voltage glitching, but there are many others such as optical, EM, heat or radiation glitching.

CLOCK GLITCHING ATTACK

A clock glitch is a sudden increase of the system clock frequency for a very short moment.

In a digital integrated circuit (microcontroller, processor, FPGA…), the clock signal is not distributed evenly and doesn’t reach every point at the same time due to differences in the distribution paths, such as their length or the capacitance of traces and transistor gates.

The maximum working frequency specified by the manufacturer is the one that ensures that the clock signal reaches every register properly. Working permanently beyond that limit would make the IC malfunction and thus be unusable. However, if we force the IC to work beyond that frequency limit only for the duration of one instruction, the CPU fails to execute just that single instruction correctly, while the following instructions are processed correctly once the clock is back to the normal frequency.
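A toy model can make this concrete. The sketch below is only an illustration: it assumes the whole chip can be summarized by a single worst-case propagation delay, and it marks a clock cycle as faulty whenever its period is shorter than that delay. The function name and all the numbers are assumptions chosen for the example, not measurements.

```python
def faulty_cycles(periods_ns, critical_path_ns):
    """Return the indexes of the clock cycles that are too short.

    periods_ns: duration of each successive clock cycle, in nanoseconds.
    critical_path_ns: assumed worst-case propagation delay inside the chip.
    """
    return [i for i, period in enumerate(periods_ns) if period < critical_path_ns]


# Illustrative numbers only: a 50 ns (20 MHz) clock with one short 16 ns
# cycle inserted, and an assumed critical path of 33 ns.
clock = [50.0, 50.0, 16.0, 50.0, 50.0]
print(faulty_cycles(clock, critical_path_ns=33.0))  # -> [2]
```

Only the shortened cycle violates the timing, which is why the instructions before and after the glitch still execute normally.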

The most wanted effect when “clock glitching” is to skip the execution of an entire instruction. However, other effects can be achieved, such as loading or storing corrupted data. I recommend reading the paper “An In-depth and Black-box Characterization of the Effects of Clock Glitches on 8-bit MCUs” (Balasch, Gierlichs and Verbauwhede), which details the different effects of clock glitching on an Atmel microcontroller according to the length of the glitch.

The typical application of the clock glitch attack is to skip the execution of a jump instruction, thereby modifying the program flow.
Have you ever cracked computer software? Have you changed a JE instruction to a JNE to bypass a protection? This is something very similar. By injecting the clock fault, we can modify the instruction “on the fly” so that the branch is not taken.

OUR SETUP

Thanks to their speed, FPGAs are used by most research and commercial stations to generate the glitches. A high-frequency clock signal is divided by the FPGA to obtain the desired working frequency. When a glitch is needed, the division value is changed to obtain a faster clock. By modifying the signal path it is possible to add some delay and change the clock phase as needed.
I personally would suggest the Xilinx Spartan3 because it is cheap and features a Digital Clock Manager (DCM) module that allows synthesizing different clock frequencies and adjusting their phase with precision. There is no need to go for a bigger and faster FPGA, as the Spartan3 would give you glitches up to 150MHz, enough for attacking most microcontrollers.

I am aware that not everybody knows how to work with FPGAs, so for educational reasons I decided to use a microcontroller in this article to show how clock glitching works.

This host microcontroller will run software that generates a clock signal by toggling an output IO port at a constant frequency. This generated signal is the clock used by the target system we are attacking.
To inject a glitch, we need the host microcontroller to toggle the IO port as fast as possible. Because the host microcontroller needs two instructions to generate a clock cycle (one instruction to set the port to 1, another to set it to 0), the maximum frequency of the generated clock fed to the target is half the instruction rate of the host microcontroller.

To clarify this with an example, let’s suppose we are using as host microcontroller a Microchip PIC 18F running at 32 MHz and 8 MIPS. The fastest clock it can generate is 4 MHz. The resolution, or minimum period of the glitch, is 1/(4 MHz) = 250 ns, and any glitch we generate can only be a multiple of this minimum period (500 ns, 750 ns, 1 µs…).
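The same arithmetic can be written as a short sketch. The function names are made up for this example; the only inputs taken from the text are the 8 MIPS instruction rate and the two instructions needed per generated clock cycle.

```python
def max_generated_clock_hz(instructions_per_second, instructions_per_clock_cycle=2):
    """Fastest clock the host can generate by toggling a pin in software."""
    return instructions_per_second / instructions_per_clock_cycle


def glitch_resolution_ns(generated_clock_hz):
    """Minimum glitch period, i.e. one period of the fastest generated clock."""
    return 1e9 / generated_clock_hz


pic18_instructions_per_second = 8_000_000  # 8 MIPS, as in the example above
fastest = max_generated_clock_hz(pic18_instructions_per_second)
print(fastest / 1e6, "MHz")                 # 4.0 MHz
print(glitch_resolution_ns(fastest), "ns")  # 250.0 ns
```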
As I said before, to successfully inject a clock fault we need to generate a glitch using a clock frequency higher than the maximum operating frequency of the target system. In fact, we will probably need to go two or three times faster to be successful.

It is obvious that with this PIC-based setup we are going nowhere, as most current microcontrollers work at 20MHz or more.

We need a fast microcontroller! Something that can generate at least a 40MHz clock signal. PICs, AVRs, MSP430… are not fast enough. We need an ARM Cortex!

In recent years several manufacturers like Texas Instruments, NXP or STM have been releasing cheap development boards to promote their ARM SoCs. Some of these boards have been very successful due to their aggressive prices and good performance. I will use one of these boards as the host system to generate the glitches. More precisely, I will use the NXP LPCXpresso board for LPC1769 (www.nxp.com/demoboard/OM13000.html) designed by Embedded Artists (http://www.embeddedartists.com/products/lpcxpresso/lpc1769_xpr.php).
For a short time this board was sold for 12€ as an introductory promotion, but it can still be bought for 20€.

The LPCXpresso board is powered by an LPC1769 Cortex M3 microcontroller running at up to 120MHz. Thanks to the fast GPIO bus featured by the LPC1769, it is possible to generate a 60 MHz clock signal to feed the target. Thus, the glitch resolution is 1/(60 MHz) ≈ 16.67 ns.

LPCxpresso board for LPC1769

Another well-known cheap board is the TIVA C Launchpad from Texas Instruments. It has a Cortex M4 but runs at 80MHz, so only a 40MHz signal can be generated, which is probably enough to glitch 20MHz targets. The code I show here is easily adaptable to this board or any other Cortex board.

Now we have to choose a target to attack.
I chose to experiment with my favourite 8-bit microcontroller family, the Microchip PIC. Specifically I chose the PIC 16F88, but these experiments are repeatable on other Microchip PIC microcontrollers (I tested successfully on PIC 12F675, 12F683, 16F84, 16F628, 16F648 and 16F876) and on many microcontrollers from other manufacturers (I also successfully tested the attack on Atmel AVR).

According to the datasheet, the 16F88 operates at a maximum frequency of 20MHz. In reality the microcontroller can go further and reach almost 30MHz under optimal conditions, but the manufacturer keeps a safety margin in the specifications and rounds the maximum speed down to 20MHz. In order to produce any effect on the target when glitching, we need to work with glitches faster than those 30 MHz.
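To compare the hosts mentioned so far against this target, the sketch below computes how many times faster than the rated 20 MHz each host can glitch. It is a rough illustration: the toggle rates are the ones quoted in the text (one GPIO toggle per CPU cycle on the Cortex boards, one per instruction on the PIC 18F), and the names `HOSTS` and `overclock_ratio` are made up for the example.

```python
# Rated maximum of the target, as quoted from the 16F88 datasheet in the text.
TARGET_RATED_MAX_HZ = 20e6

# GPIO toggles per second each host can sustain, following the text.
HOSTS = {
    "PIC 18F (8 MIPS)": 8e6,
    "TIVA C Launchpad (80 MHz)": 80e6,
    "LPC1769 (120 MHz)": 120e6,
}


def overclock_ratio(toggles_per_second, target_max_hz=TARGET_RATED_MAX_HZ):
    """Glitch frequency divided by the rated maximum frequency of the target."""
    glitch_hz = toggles_per_second / 2  # two toggles make one clock cycle
    return glitch_hz / target_max_hz


for name, toggles in HOSTS.items():
    print(f"{name}: {overclock_ratio(toggles):.1f}x the rated maximum")
```

Only the two Cortex boards reach the "two or three times faster" range suggested earlier, and only their glitch frequencies (40 and 60 MHz) exceed the roughly 30 MHz the 16F88 tolerates in practice.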

The PIC is connected to two LEDs that we are going to use to show which state the PIC is in. Its clock input and reset signal are connected to two GPIOs of our ARM board, as shown in the schematic.

The ARM board is connected to two push buttons that will be used to trigger and launch the glitch.

Glitch tutorial setup - schematic

Glitch tutorial setup

THE SIMPLEST EXPERIMENT EVER

Let’s try a very simple experiment to understand the inner workings of glitching.

Our target microcontroller is going to be trapped in an infinite loop and we will break that loop with a glitch.

This is the main part of the source code in our target PIC:

As you can observe, the PIC enters an infinite loop at instructions 8 and 16. LED1 is on while the microcontroller is in the first infinite loop and LED2 is lit during the second infinite loop. The goal of this experiment is to bypass the GOTO loops with a clock glitch.

Note: you can download this and the rest of the sources of this article here
https://github.com/RamiroPareja/FaultInjectionTutorial

Remember! If you want to try this code and burn it into a PIC, be sure to configure it to use the EC (external) oscillator, set MCLRE and LVP to ON, and disable the watchdog and the brown-out reset.

The ARM firmware generates a 15 MHz clock signal to feed the target, but when the PB1 button is pressed, it generates a glitch of 4 cycles at 60 MHz.
Because we have to toggle the GPIO very fast, at the limit of the microcontroller, there is no room for a C compiler. This routine has to be written in ASM.
There is no free CPU time to debounce PB1 in software, so I use the second button (PB2) to “load” the glitch. PB2 prepares the glitch and PB1 injects it.

This is the core part of the ARM firmware, where the clock and the glitch are generated:

Observe how the CPU cycles are carefully counted to generate a square clock signal with the desired frequency. If you modify the source, be sure to keep the correct timing by adding or removing instructions that waste CPU cycles.

In the Cortex M3 architecture the NOP instruction is not guaranteed to make the microcontroller wait one instruction cycle. The processor might remove the NOP from the pipeline before it reaches the execution stage, so no delay is produced.
To be sure that the processor waits one instruction cycle, I repeat the last GPIO access instead of using a NOP.
Most of the time, the glitch has to be just one cycle long. If we make it longer than one cycle, the CPU will skip not only the desired instruction but also the following ones. However, because the PIC architecture uses 4 clock cycles to execute each instruction (except branches), I will use a four-cycle glitch.
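The cycle counting can be illustrated with a small model. This is a Python sketch of the timing only, not the ARM assembly used in the firmware: it assumes one GPIO write per host CPU cycle at 120 MHz, and the helper name `clock_levels` is made up for the example.

```python
def clock_levels(cpu_cycles_per_period, periods):
    """GPIO level (1 or 0) at each host CPU cycle of a square clock."""
    half = cpu_cycles_per_period // 2
    return ([1] * half + [0] * half) * periods


CPU_HZ = 120_000_000     # host CPU clock
NORMAL_HZ = 15_000_000   # clock normally fed to the target
GLITCH_HZ = 60_000_000   # clock fed to the target during the glitch

normal = CPU_HZ // NORMAL_HZ  # 8 CPU cycles per normal clock period
glitch = CPU_HZ // GLITCH_HZ  # 2 CPU cycles per glitch clock period

# Two normal periods, a 4-cycle glitch, then two more normal periods.
wave = clock_levels(normal, 2) + clock_levels(glitch, 4) + clock_levels(normal, 2)
print("".join("#" if level else "_" for level in wave))
# -> ####____####____#_#_#_#_####____####____
```

In this model the four glitch cycles occupy the same 8 CPU cycles as a single normal clock period, so the target receives a whole instruction's worth of clock edges in the time it normally gets one.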

This is a capture with the oscilloscope of the generated glitch. Sorry for the low resolution:

Clock glitch

Note: because my oscilloscope has only 100MHz of bandwidth, much less than the bandwidth needed for a 60 MHz square signal, in this capture the clock has been slowed down 4 times.

As the target is executing the same GOTO instruction all the time, we can launch the glitch whenever we like because it is going to “hit” the GOTO instruction. No synchronization is needed. However, if the loop executed some logic in every iteration, we would have to synchronize the glitch so that it is produced just at the moment the target is executing the GOTO. I will show this later.

This video shows the process of the attack:

At the beginning, the 16F88 is stuck in the first infinite loop, as the green light indicates.
When the left button is pressed, the clock glitch is injected and the PIC skips the branch instruction, breaking the loop. The orange LED indicates that the PIC is now in the second infinite loop.
Pressing the right button prepares the glitch and the left one injects it once again. The green light shows that the attack has been successful because we are back in the first infinite loop.

Around 0:09 a new attack is executed and both LEDs are lit. This occurs because the glitch affected the GOTO instruction, exiting the loop, and then also affected the BCF LED1 instruction, so LED1 is not switched off.

As you can see in the video, not all the glitches are successful. Sometimes you need several tries before getting a result. One reason is that the glitch is not synchronized in any way and is not injected at the optimal moment. As we will see later, the best results are obtained if the glitch is synchronized to be injected in the third clock cycle of the GOTO instruction.

PLAYING WITH THE VOLTAGE

The voltage can be crucial when injecting a clock glitch.

To show the effect of the voltage, I modified the system clock of the ARM board to run at 80 MHz instead of 120 MHz. The rest of the program is intact.
With this new configuration, the PIC is glitched with a 40 MHz signal (25 ns period) instead of a 60 MHz one.

At 5 volts, the attack does not work. However, as I lower the voltage, I get more chances to successfully glitch the microcontroller. At 2.7 volts, much lower than the 4 volts the datasheet recommends as minimum voltage, the attack works on almost every try.

The voltage affects the propagation delay of the signals. Sometimes varying the voltage and pushing it to the lower limit can help the attack. Be aware that some microcontrollers have their own internal voltage regulator, so they will not be affected by voltage changes (or at least not so much).

Note that we are talking about changing the supply voltage but keeping it constant all the time, not about lowering the voltage for a short period of time (voltage glitch). I will talk about voltage glitching in another article.

Temperature is another externally controllable factor that can affect the propagation of the signals and help us with the glitches.

TO DO: Write about the physical reasons behind the voltage effect.

A MORE REALISTIC SCENARIO

The previous example was very simple. Let’s try again but using a more realistic scenario.

In this case, we are going to attack a hypothetical system that asks the user for a PIN code and, after three failed tries, blocks itself and becomes unusable. We are going to assume that the system is already locked because there are no tries left, and we want to unlock it by bypassing the check.

The number of available tries is stored in EEPROM. After power-up, the microcontroller checks this number and, if it is zero, goes to sleep. We want to bypass the branch instruction that sends the microcontroller to the sleep routine after checking that no PIN tries are left.

Here is an extract of the target source:

Now we can’t launch the glitch at a random moment as we did in the previous example, because the odds of corrupting and crashing the system would be high. Of course, we could do “trial and error” and pray to hit the branch instruction, but the proper way is to synchronize the host and the target to glitch exactly on the GOTO instruction. We need to know how many clock cycles pass from a recognizable event (the trigger) to the moment we need to inject the fault. As a trigger we can choose an input event (e.g. a button press or an input command on a bus) or an output event (e.g. an LED being switched on).

In this case we are going to use as trigger the reset signal generated by our host system. The host system will hold the RESET signal of the 16F88 low to keep it in the reset state. Once the reset is released and the PIC starts to run, the ARM has to count the number of cycles before executing the glitch attack.
To bypass the “tries check” we have to skip the execution of the GOTO GO_SLEEP instruction at line 6. From the source of the target firmware (the complete one, not the extract posted above) we know the number of instructions from the beginning of the program to this GOTO. We have to wait 14 instruction cycles, that is, 14 × 4 = 56 clock cycles.

After some experiments I came to the conclusion that the GOTO instruction is skipped most reliably when the glitch is injected in clock cycles 3, 4 and 5 of the instruction (remember, a branch opcode in the PIC architecture needs 8 clock cycles to execute!), so we should wait 58 cycles and not 56.
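The delay calculation can be sketched as follows. The function name is hypothetical; the figures (4 clock cycles per instruction cycle, 14 instruction cycles before the GOTO, glitch starting on the 3rd clock cycle of the instruction) are the ones given in the text.

```python
CLOCKS_PER_INSTRUCTION_CYCLE = 4  # PIC architecture, as stated in the text


def clocks_to_wait(instruction_cycles_before, first_glitched_clock):
    """Target clock cycles to wait after reset before starting the glitch.

    first_glitched_clock is 1-based: 3 means "start on the 3rd clock cycle
    of the instruction we want to skip".
    """
    clocks_before = instruction_cycles_before * CLOCKS_PER_INSTRUCTION_CYCLE
    return clocks_before + (first_glitched_clock - 1)


print(clocks_to_wait(14, 3))  # 58
```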

The next listing is an extract of the ARM code that injects a 3-cycle glitch after releasing the reset and waiting 58 clock cycles (3rd cycle of the GOTO instruction):

PB2 puts the target microcontroller in the reset state by holding the reset line low, and pressing PB1 executes the attack by waiting the indicated number of cycles (line 7) and clock glitching for 3 cycles.

This attack would unlock the target system, allowing us to try another PIN, but in a real scenario we would still have to bypass the PIN check, so a second glitch attack would be needed.

Observe this capture of the generated glitch:

The glitch is injected 58 clock cycles after the reset.

Note: the long low period in the first clock cycle after the reset and after the glitch is due to the variable duration of branch instructions in the Cortex M3 architecture.

DETERMINING WHEN TO INJECT THE GLITCH

In the last example we had the original source that runs on the target, so it was easy to determine when to inject the clock glitch. However, in the usual scenario the source is not available and can’t be recovered from the target. We have two approaches to try to glitch at the appropriate moment.

First of all, we can try a kind of “brute force”. We can inject the glitch in a particular clock cycle and check if something happens. If nothing occurs, we move to the next cycle and try again. At some point we will glitch the instruction we were looking for and see the expected effects.
We can also use an external event, like a change of state on an I/O pin or a bus, to narrow down where our instruction of interest is located.

This method can be automated and is very useful for a fast and simple vulnerability check. Unfortunately it can also be tedious, and if our attack requires skipping two or more GOTOs instead of one, it could be impossible to complete in a reasonable time.
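An automated search of this kind might look like the sketch below. It is a minimal outline, not a finished tool: `inject_at` is a hypothetical callback that would have to drive the real hardware (reset the target, wait the given number of clock cycles, inject the glitch and report whether the expected effect was seen), and the simulated target exists only so the example can run.

```python
def find_glitch_offset(first_cycle, last_cycle, inject_at, tries_per_cycle=5):
    """Return the first cycle offset where a glitch works, or None.

    inject_at(cycle) is a hypothetical callback supplied by the user; it
    should return True when the glitch produced the expected effect.
    """
    for cycle in range(first_cycle, last_cycle + 1):
        # Glitches do not always succeed, so retry each offset a few times.
        for _ in range(tries_per_cycle):
            if inject_at(cycle):
                return cycle
    return None


# Stand-in for real hardware, for illustration only: pretend the target
# is vulnerable exactly 58 clock cycles after reset.
def simulated_target(cycle):
    return cycle == 58


print(find_glitch_offset(0, 200, simulated_target))  # 58
```

With two glitches the search has to cover pairs of offsets, so the number of attempts grows with the product of the two ranges, which is why skipping several GOTOs this way quickly becomes impractical.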

Another approach is to use power analysis techniques to determine which clock cycle has to be glitched.

I will write about this another time, but the basic idea is that if we measure the power consumed by the microcontroller, the execution of different instructions produces different consumption patterns. Because, in most microcontroller architectures, the branch instruction needs to flush the execution pipeline and takes two instruction cycles, the “power signature” of a GOTO instruction is easy to tell apart from other instructions.

Also, if the processor is running in an infinite loop because a security check failed, the autocorrelation of the power signal will show how many instructions long the loop is, which can help to guess where to glitch.
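The autocorrelation idea can be sketched in a few lines. This is an illustration under strong assumptions: the trace is synthetic (a made-up pattern repeated without noise), the function names are invented for the example, and the result is a period in samples, which still has to be converted to instructions using the number of samples captured per instruction cycle.

```python
def autocorrelation(trace, lag):
    """Unnormalized autocorrelation of the mean-removed trace at a given lag."""
    mean = sum(trace) / len(trace)
    centered = [x - mean for x in trace]
    return sum(a * b for a, b in zip(centered, centered[lag:]))


def loop_period(trace, max_lag):
    """Lag between 1 and max_lag (in samples) with the strongest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(trace, lag))


# Synthetic data for illustration only: a made-up 6-sample consumption
# pattern repeated 20 times, as a loop would produce.
pattern = [1.0, 1.2, 0.9, 3.0, 2.8, 1.1]
trace = pattern * 20
print(loop_period(trace, max_lag=15))  # 6
```

A real power trace is noisy, so the peak is less sharp, but the strongest lag should still correspond to one iteration of the loop.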