## Correlation Power Analysis vs AES

Recently, I spent some time learning about power analysis attacks. This has been an interesting journey with many interesting applications. The theory behind this is simple – that different operations (or – crucially – the same operation with different data) consume differing amounts of power, and that by measuring power consumption of a device over time, we can determine (with the help of some statistical trickery) what the device is doing.

All the code for the below, as well as the targets, are downloadable from github.com/CreateRemoteThread/fuckshitfuck. This code is provided as-is, and is a working copy – it may break unexpectedly.

# Setup / Acquiring Traces

The first step of our journey is to acquire power traces across a target device. To do this, I built a simple ATMega328p target board:

This is a standard ATMega328p with a voltage regulator running at 3.3v, and a bypass cap to help stabilize the power supply (this realistically didn’t do shit). UART is broken out for communications, and an debug header is present for editing firmware quickly. Two extra debug wires are present, used to trigger the oscilloscope (or simply provide a second informational channel).

Crucially, a 1.5 Ohm shunt resistor is present from the ground pins of the target to the power supply ground. This is used to measure current consumption by the microcontroller.

The board runs a very simple AES target – you can load a key with ‘k’, and encrypt data with ‘e’. When the ‘e’ command is sent, the following steps are taken in order:

1. a trigger line is increased to a logical high
2. AES encryption is performed with the currently loaded key
3. the trigger line is set to logical low
4. the result is printed to uart out

A single power trace looks something like this:

Along with the trace, we should optimally record the plaintext for the next step of this attack.

# Power Consumption Modelling

Our attack targets the power consumption of the first round, exploiting two fundamental properties (in the lack of mitigating factors):

• The first operation of AES is to xor the plaintext with the key, then use the result to index a known array (the s-box), forming an intermediate value. Therefore, as we known the plaintext, we can know this intermediate value for every possible key byte.
• We assume that more “1”‘s in the intermediate value means very slightly more power is consumed during the first round.

In isolation, this information is not immediately helpful – our measurement is not accruate enough to extract the key byte through analysis of a single trace. However, if we take, hypothetically, 3 traces, we can begin forming a hypothesis:

• The plaintext is assumed known
• If key[0] is byte 1,
• the intermediate value of key[0] for trace[0] is 0x01
• the intermediate value of key[0] for trace[1] is 0x11
• the intermediate value of key[0] for trace[2] is 0xFF
• Trace 2 will consume more power than trace 1, which consumes more power than trace 0
• If key[0] is byte 2,
• the intermediate value of key[0] for trace[0] is 0xFF
• the intermediate value of key[0] for trace[1] is 0x11
• the intermediate value of key[0] for trace[2] is 0x01
• Trace 0 will consume more power than trace 1, which consumes more power than trace 2
• A similar model is performed for every key[0] from 0 to 255
• We have taken 3 traces of the target with an unknown key. Whichever key guess’s hypothesis most strongly corresponds with the 3 actual traces with unknown key, is the key guess.
• This is inherently resistant to noise, as it’s a statistical attack – it’s not realistic to expect 100% correlation strength, but 60% correlation where everything else is 4% is an extremely strong indicator of correctness

This approach scales infinitely, limited only by time (and memory): the attacker may capture as many traces as he or she pleases.

# Correlating Traces

A mathematical tool, known as Pearson’s Correlation Coefficient, can be used to quantify how similar one set of traces is (or rather, it’s delta from the mean of all traces) to a hypothesized key byte’s set of traces. This is well expressed by Wikipedia:

It is quite simple to understand once expressed in code. With a little bit of Matplotlib, it is possible to graphically represent Pearson’s correlation coefficient:

Each line in the plot represents a byte of the key, with clear spikes, representing strong correlation, for one byte value (the correct guess) for each position of the key. Even from a low number of noise-tainted initial captures, the results are extremely clear: it is possible to use this method to extract an AES key from a set of power traces, having an encryption oracle.

The same technique is possible against the final round of AES (i.e. only requiring a known ciphertext), but I have not demonstrated this. This may be possible to execute against AES decryption, but I also have not demonstrated this – this may be useful in attacking MAC implementations.

In conclusion, this was a very interesting attack to be able to demonstrate, and the practical applications of this attack method are limitless. I will follow up with a post on a related attack, differential power analysis.