Over the past few days, I have spent some time reviewing side-channel analysis of the Keeloq cipher. This post documents the results of this review.
This work is not “new” per se: various forms of side-channel attack against Keeloq have been carried out in the past. Some public resources around this are below:
In addition, the 25C3 presentation on breaking Keeloq is worth watching, as is (albeit slightly beyond the scope of this post) this presentation at RuhrSec 2017.
To begin with, let’s take an overview of the Keeloq algorithm:
Put simply, Keeloq “mixes in” a single bit of key material from each round – therefore, in theory, we should be able to use this as a distinguisher for a simple DPA attack, as set forth in marc-invalid’s work.
To start, I took an HCS301-based entry system from Aliexpress. Opening a remote, I was able to locate the HCS301 chip, which I could extract with a little bit of hot air. I placed a 10 Ohm shunt in the VCC line, wrote a simple AVR trigger, and reviewed my initial power trace:
On the trace, some “blocks” are clearly visible, including one in the middle which appears to be a cryptographic block. The RF synchronisation pulses can be seen on the right: unfortunately, this revealed nothing. A tip from the NewAE forums pointed me to using a larger shunt resistor, which together with using the LC filter from the ChipWhisperer’s CW308 board did the trick, and gave me clean traces:
This looks much more like the power trace in the 2008 paper, and from here, I can make a guess at the location of the cryptographic block. Note that since 2008, this chip completes the Keeloq encryption a bit quicker, and is finished in <20ms from the trigger point, suggesting some implementation change.
Onwards, to Key Recovery!
I initially attempted to go down the marc-invalid path of identifying the ciphertext output via a TVLA test, but this didn’t produce the expected correlation peaks (TVLA against any bit of ciphertext output):
No matter – we can identify that our power trace has the same general layout as the 2008 paper, albeit with some timing differences, so let’s assume that the encryption operation is in the same general place:
We begin our exploration by performing a CPA attack against the Hamming distance of the last intermediate value of Keeloq to the final value, in the final round of decryption with a guessed key but (LSB of the key guess below, turns out setting the maximum key guess value to 2 breaks reporting functionality elsewhere):
Unfortunately,the raw difference in correlation value isn’t that huge, and this introduces the additional complexity of having to backtrack when a bit is guessed incorrectly (in the 2008 paper, this is when all candidates for the next key guessing round show low correlation values). However, we can note that given the round-based nature of Keeloq, the Hamming distance between a “guessed” intermediate value and the actual intermediate value should broaden: that is, the error from a single incorrect guessed bit of Keeloq should cascade through subsequent rounds, resulting in a different intermediate values (and different Hamming distance in the state register) over the next few rounds.
In truth, the actual difference is tiny, but we should be able to correlate this with known ciphertexts and known intermediate values to make this difference more significant.
We can capitalise on this by grouping Keeloq rounds into blocks of 8 bits of each, and perform a correlation analysis, correctly picking the actual key against a reprogrammed test chip correctly, with a stronger degree of confidence:
This result is backed up by continued success on the second round:
We can further confirm this across a variety of independent traces, further rounds, against the VCC line and across a ground shunt, etc.
Interestingly, it is possible to get some TVLA result on intermediate values that lends credibility (imo) to the hypothesis that the round-to-round Hamming distance “cascades” with previous rounds.
Firstly, a TVLA with incorrect partitioning (partitioned by intermediate value of first key byte, all bits incorrect):
And now, with four correct bits:
And now with correct partitioning:
Note that this doesn’t actually identify the specific leakage point (or rather, indicates that there is a discernable, statistically significant difference of means across a large part of a trace), but can give some indicating by comparison whether a Keeloq bit is correct. Frustratingly, DPA (using the same partitioning code as above) doesn’t seem to work:
The work is not yet complete, and there are still tooling improvements to make. This technique fails eventually, due to clock drift – the clock is inconsistent within a single trace, as well as between traces.
The marc-invalid fork of ChipWhisperer provides some clues to implementation, allowing the user to “slice” each trace and fit them onto a new trace, aligning each “slice” with the corresponding slices in other traces via their peak or minimum points – it’s my intention to implement something similar, and experiment with various peak extraction techniques.
There is also the matter of why DPA doesn’t work (and “fails hard”, not just ghost peaks). I wonder if it’s got to do with the fact that DPA is instantaneous (and the “raw” maximum difference of means is larger for an incorrect guess, than the correct guess).
All the code used above can be found here, and sample traces for Keeloq here (known key starts with 0x18). Note that you’ll need to edit support/attacks/Keeloq.py to enter fragments of the known key to replicate the above.