This post is a follow-up to my last post regarding correlation power analysis (CPA). A second technique, differential power analysis (DPA), can also be used to analyze power traces to extract information.
The specific attack I will illustrate below is the “difference of means” attack. The code is again available at https://github.com/CreateRemoteThread/fuckshitfuck.
Unlike CPA, DPA relies on a mechanism called the distinguishing function. Put simply, this is a true-false hypothesis about an intermediate value, used to separate power traces into two groups that should differ measurably in power consumption. For example, a distinguishing function might be:
“During the first round of AES, the last bit of the S-box output (the intermediate value) is 1. At some time-point in that round, this causes a different amount of power to be consumed than if the bit were 0.”
Now, using the power traces we have gathered, and knowing the plaintexts, we run the distinguishing function once for each possible key guess, separating the power traces into two “buckets” each time.
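To make the bucketing concrete, here is a minimal Python sketch of the example distinguishing function. It assumes that "last bit" means the least significant bit, and that the intermediate value is the first-round S-box output for one plaintext byte XORed with one key byte guess. The names `build_sbox`, `distinguish` and `partition` are illustrative and not necessarily what the repository uses. The S-box is generated from its GF(2^8) definition instead of being pasted in as a table.

```python
def _gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result


def build_sbox():
    """Build the AES S-box: multiplicative inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 is the inverse of x in GF(2^8)
                inv = _gf_mul(inv, x)
        s = inv
        for shift in range(1, 5):  # XOR with the byte rotated left by 1..4 bits
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def distinguish(plaintext_byte, key_guess):
    """True if the least significant bit of the first-round S-box output is 1."""
    return (SBOX[plaintext_byte ^ key_guess] & 1) == 1


def partition(plaintext_bytes, key_guess):
    """Return (indices where the hypothesis is true, indices where it is false)."""
    ones, zeros = [], []
    for index, p in enumerate(plaintext_bytes):
        (ones if distinguish(p, key_guess) else zeros).append(index)
    return ones, zeros
```

Calling `partition` with the plaintext byte of every captured trace and a single key guess gives the two buckets for that guess. Repeating it for all 256 guesses gives one pair of buckets per hypothesis.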
Difference of Means
We then take the mean trace of each bucket and subtract one from the other, taking the absolute value at every point of the resulting “difference of means” trace. Let’s pretend we’ve done this for every possible key value across 10,000 traces. For any given key hypothesis, there are two possible outcomes:
- If our hypothesis for the key is correct, there will be X traces in Group 1, where the distinguishing function is true, and Y traces in Group 2, where the distinguishing function is false.
  - The mean of Group 1 will, at some time-point, consume more or less power than the mean of Group 2, representing the power used to move the final bit of the intermediate value into memory / register / etc.
  - The “difference of means” will have a spike at that time-point.
- If our hypothesis for the key is incorrect, the distinguishing function is effectively a random sorting function.
  - The mean of Group 1 will be similar to the mean of Group 2.
  - The “difference of means” will not have a distinguishable spike.
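The two outcomes can be reproduced on toy data. The sketch below is a plain-Python illustration, not the repository's code. It assumes each trace is a list of samples of equal length, and that `selected` holds the true/false result of the distinguishing function for each trace. The function names and the synthetic traces are made up for the example. Real captures would normally go through an array library for speed.

```python
def mean_trace(traces):
    """Point-wise mean of a non-empty list of equal-length traces."""
    count = len(traces)
    return [sum(column) / count for column in zip(*traces)]


def difference_of_means(traces, selected):
    """Absolute point-wise difference between the means of the two buckets."""
    group1 = [t for t, s in zip(traces, selected) if s]
    group2 = [t for t, s in zip(traces, selected) if not s]
    if not group1 or not group2:
        raise ValueError("both groups need at least one trace")
    mean1 = mean_trace(group1)
    mean2 = mean_trace(group2)
    return [abs(a - b) for a, b in zip(mean1, mean2)]


# Synthetic traces: traces 0 and 2 draw extra power at sample index 2.
traces = [
    [1.0, 1.0, 3.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 3.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]

# A sorting that matches the leak produces a spike at sample 2.
matching = difference_of_means(traces, [True, False, True, False])

# A sorting unrelated to the leak produces a flat trace.
unrelated = difference_of_means(traces, [True, True, False, False])
```

With real, noisy traces the flat case will not be exactly zero, which is why many traces are needed before the spike for the correct key stands out.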
We can then easily plot the greatest difference of means for a given distinguishing function, for each key hypothesis:
(The label on the Y axis is incorrect; it should be “Maximum Difference of Means”.) Each line in the plot represents a given byte position in the key. The x-axis represents the key guess from 0 to 255, and the y-axis represents the maximum point in the difference of means trace when the distinguishing function is run for that key hypothesis.
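As a rough sketch of how the values for one line of such a plot could be computed, the function below returns the maximum difference of means for every key guess of a single key byte. It is an illustration under assumptions, not the repository's implementation. The `distinguish` argument is any callable taking `(plaintext_byte, key_guess)` and returning true or false, and `key_space` defaults to the 256 possible byte values. Both names are hypothetical.

```python
def mean_trace(traces):
    """Point-wise mean of a non-empty list of equal-length traces."""
    return [sum(column) / len(traces) for column in zip(*traces)]


def max_dom_per_guess(traces, plaintext_bytes, distinguish, key_space=range(256)):
    """Return the peak of the absolute difference-of-means trace for each guess."""
    peaks = []
    for guess in key_space:
        group1, group2 = [], []
        for trace, p in zip(traces, plaintext_bytes):
            (group1 if distinguish(p, guess) else group2).append(trace)
        if not group1 or not group2:
            # One bucket is empty, so no difference can be measured.
            peaks.append(0.0)
            continue
        mean1, mean2 = mean_trace(group1), mean_trace(group2)
        peaks.append(max(abs(a - b) for a, b in zip(mean1, mean2)))
    return peaks


def best_guess(peaks, key_space=range(256)):
    """Return the key guess with the largest peak."""
    keys = list(key_space)
    return keys[max(range(len(peaks)), key=lambda i: peaks[i])]
```

Plotting the returned list against the key guess gives one line for one key byte position. The guess returned by `best_guess` is the candidate for that byte, assuming enough traces were captured for the correct hypothesis to stand out.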
It is clear that both of these attacks are incredibly powerful, and can extract information from minute differences in power consumption through statistical trickery. I look forward to exploring more of these attacks and related attack scenarios.