This is my keylogger. There are very few like it – this one is mine.
Over the past few weeks (and on and off before that), I have done some work on electromagnetic keylogging, with some success. This post documents my progress so far and collates some thoughts on future work in this area. All the code can be downloaded from my GitHub – due to the extreme work-in-progress nature of this project, I recommend you review the available branches before experimenting (or get in touch and let’s work together!).
To explain this keylogger’s operation, we must look at how a keyboard actually works. A modern keyboard consists primarily of two sheets of flexible plastic with conductive traces printed on them. The traces on the two sheets intersect at locations corresponding to the physical keys – when a key is pressed, contact is made between the two sheets, allowing a conductive path to form between a row and a column of the scan matrix.
A microcontroller scans the matrix by driving each column pin high in turn for a short period of time, while monitoring the voltage on each of the rows.
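As a rough illustration of that scan loop, here is a minimal simulation. It is only a sketch: the function name `scan_matrix` and the `pressed` set of (row, column) pairs are made up for this example, and real firmware would be driving and reading GPIO pins rather than checking a Python set.

```python
from typing import List, Set, Tuple


def scan_matrix(pressed: Set[Tuple[int, int]],
                num_rows: int,
                num_columns: int) -> List[Tuple[int, int]]:
    """Simulate one full scan cycle of a keyboard matrix.

    `pressed` holds the (row, column) intersections currently bridged by a
    key press. Returns the intersections the scan detects, in scan order.
    """
    detected = []
    for col in range(num_columns):
        # Drive this one column high; every other column stays low.
        for row in range(num_rows):
            # A row only reads high if a pressed key connects it to the
            # column that is being driven right now.
            if (row, col) in pressed:
                detected.append((row, col))
    return detected


if __name__ == "__main__":
    # Two keys held down on a hypothetical 3x4 matrix.
    print(scan_matrix({(0, 3), (2, 1)}, num_rows=3, num_columns=4))
```

Note that each column is driven on every cycle whether or not a key is down, which is why the scan itself is observable from outside.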
Viewed through another lens, the microcontroller intermittently activates one of num_columns antennas, signalling that it is scanning. Previously, Martin Vuagnoux and Sylvain Pasini exploited this property to detect keystrokes via delays in the scan cycle, caused by the microcontroller buffering a USB packet or building a PS/2 message (I am not sure of the proper names for these). In my experimentation, I was unable to reliably observe this delay, but I was able to (visually!) observe the difference in magnitude of the leaked energy – this was enough to perform imperfect keylogging on a target with training data (and presumably without it).
To measure and correlate my attacks, I built some simple magnetic pickups – these were just lengths of wire on a cardboard plate, with some aluminum foil noise shielding on one side (I have not done enough testing to determine how effective this shielding is), as follows:
Some manual analysis was performed to determine the trigger threshold, sample rate and sample count for a keystroke, and some signal processing was done to align the measured signals – from there, two primary paths of correlation became apparent:
- In circumstances where a high current flowed between the column and the row (creating a large magnetic spike), I could isolate only the single spike corresponding to a single row-column scan event, and correlate the magnetic field generated against test data.
- In circumstances where not enough current flowed to reliably trigger on a single column-row pair, it appears possible to continuously capture signal, align the captured signals in a known way and correlate that against training data.
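Both paths boil down to aligning a capture against a stored trace and then correlating the two. The sketch below shows that idea in plain Python. It is not the code from my repository: `pearson`, `best_lag`, `align`, `classify` and the `max_lag` parameter are names invented for this example, and the alignment is a simple brute-force search over shifts.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length traces."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A flat trace has no variance, so treat it as uncorrelated.
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def best_lag(capture: Sequence[float], template: Sequence[float],
             max_lag: int) -> int:
    """Shift (in samples) of `capture` that maximises its dot product
    with `template`, searched over -max_lag..max_lag."""
    best, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(capture[i + lag] * template[i]
                    for i in range(len(template))
                    if 0 <= i + lag < len(capture))
        if score > best_score:
            best, best_score = lag, score
    return best


def align(capture: Sequence[float], template: Sequence[float],
          max_lag: int) -> List[float]:
    """Shift `capture` onto `template`, zero-padding past the edges."""
    lag = best_lag(capture, template, max_lag)
    return [capture[i + lag] if 0 <= i + lag < len(capture) else 0.0
            for i in range(len(template))]


def classify(capture: Sequence[float],
             templates: Dict[str, Sequence[float]],
             max_lag: int = 8) -> Tuple[str, Dict[str, float]]:
    """Return the best-matching template label and all the scores."""
    scores = {key: pearson(align(capture, tpl, max_lag), tpl)
              for key, tpl in templates.items()}
    return max(scores, key=scores.get), scores
```

The brute-force lag search is fine for short windows around a single scan event; for continuous captures an FFT-based cross-correlation would likely be the better choice.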
In some cases, feature extraction and signal processing techniques can be used to clean up the captured signal and reduce background noise, though I found that when capturing single row-column scan events I had to be careful to avoid information loss. Continuous visualization with matplotlib helped here.
A further improvement was to increase the fidelity of the information I captured by using two coils instead of one. Triggering was still done on a single coil (this was reliable enough on a test keyboard), but correlation was performed on both channels of data. This was enough to correctly capture words. As with any such approach, particularly under the time-limited sampling constraint of typing, the results are imperfect, but a limited test case (constantly recapturing training data is a pain in the fucking ass) shows promise:
The arrays above represent the “key guess”, and a weighted sum of correlation coefficients of scan events vs training data (“correct” hits with >0.9 correlation are weighted more heavily).
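For the curious, the weighting can be sketched roughly as follows. The 0.9 threshold is the one mentioned above; the `strong_weight` value of 3.0 and the function names `weighted_score` and `rank_keys` are placeholders chosen for this example, not the values or names used in the actual code.

```python
from typing import Dict, List, Tuple


def weighted_score(correlations: List[float],
                   threshold: float = 0.9,
                   strong_weight: float = 3.0) -> float:
    """Sum correlation coefficients, boosting any above `threshold`.

    `strong_weight` is an arbitrary illustrative multiplier.
    """
    return sum(c * (strong_weight if c > threshold else 1.0)
               for c in correlations)


def rank_keys(per_key: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank candidate keys by weighted score, best guess first.

    `per_key` maps a candidate key to the correlation coefficients of its
    scan events against training data (both channels can go in one list).
    """
    scored = [(key, weighted_score(cs)) for key, cs in per_key.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up coefficients, purely to show the call shape.
    print(rank_keys({"j": [0.95, 0.92], "k": [0.60, 0.70]}))
```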
Observe further the below two two-channel signal plots representing a full scan cycle, the first with the “J” key held down during capture:
And a second, without:
The difference is already as clear as day, and is consistent across samples; some trivial signal analysis can clean up the traces and do the necessary correlation to identify the key.
A “Combined Arms” Approach
In truth, the core of this project is not in the electromagnetic pickup – the principle of signal correlation should be applicable to almost any side channel, with some interpretation where appropriate. Most empowering perhaps is the potential to use different inputs to solve the detection vs identification problem referred to by John Monaco in his work, “SoK: Keylogging Side Channels”.
Particularly in cases where a keypress does not generate a significant enough event to trigger off alone, a secondary channel, such as sound or vibration picked up via a smartphone, could allow us to reduce the amount of data we need to sample. Alternatively, a “SAD Match” trigger (as per the ChipWhisperer) could feasibly be used, though this would greatly increase the cost of the equipment required for the attack, ChipWhisperer or no.
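For reference, the idea behind a sum-of-absolute-differences (SAD) match can be sketched in software as below. This is only an illustration of the principle: it is not the ChipWhisperer's hardware implementation or its API, and `sad_trigger`, `pattern` and `threshold` are names made up for this example.

```python
from typing import Optional, Sequence


def sad_trigger(stream: Sequence[float],
                pattern: Sequence[float],
                threshold: float) -> Optional[int]:
    """Return the first index in `stream` where the window matches `pattern`.

    A window "matches" when the sum of absolute differences between it and
    the reference pattern is at or below `threshold`. Returns None when no
    window matches.
    """
    n = len(pattern)
    for start in range(len(stream) - n + 1):
        sad = sum(abs(stream[start + i] - pattern[i]) for i in range(n))
        if sad <= threshold:
            return start
    return None


if __name__ == "__main__":
    # Toy trace with the reference shape [2, 5, 2] buried in it.
    print(sad_trigger([0, 0, 1, 0, 2, 5, 2, 0, 0], [2, 5, 2], threshold=0.5))
```

Doing this in software after the fact is cheap; the expense comes from doing it in real time at the sample rate, which is what dedicated hardware provides.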
I am also keen on investigating the potential interpretation of this signal in the frequency domain – if I am able to reduce the effective sample rate (currently sampling at ~125MSPS, requiring the use of an oscilloscope at the very least), perhaps by using higher-frequency harmonics, it may be possible to greatly reduce the cost of this attack. Most importantly, I am keen to explore the relationship between the fundamental frequency and its harmonics in a way that may be applicable to other “loud” electromagnetic leakage sources.
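As a starting point for that kind of exploration, the sketch below computes a magnitude spectrum and picks out the strongest frequency bins, which is one way to see a fundamental and its harmonics. It uses a naive DFT in plain Python so that it stands alone (in practice `numpy.fft.rfft` does the same job far faster), and the function names and the synthetic test signal are assumptions made for this example rather than anything measured from a keyboard.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def magnitude_spectrum(samples: Sequence[float],
                       sample_rate: float) -> List[Tuple[float, float]]:
    """Naive DFT of a real signal.

    Returns (frequency_hz, magnitude) pairs for bins 0..n//2, with the
    magnitude normalised by the number of samples.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        spectrum.append((k * sample_rate / n, abs(acc) / n))
    return spectrum


def strongest_bins(spectrum: Sequence[Tuple[float, float]],
                   count: int) -> List[Tuple[float, float]]:
    """The `count` strongest bins, ignoring the DC bin at index 0."""
    return sorted(spectrum[1:], key=lambda fm: fm[1], reverse=True)[:count]


if __name__ == "__main__":
    # Synthetic signal: a 50 Hz fundamental plus a weaker third harmonic.
    rate, n = 1000.0, 200
    signal = [math.sin(2 * math.pi * 50 * i / rate)
              + 0.3 * math.sin(2 * math.pi * 150 * i / rate)
              for i in range(n)]
    for freq, mag in strongest_bins(magnitude_spectrum(signal, rate), 2):
        print(f"{freq:.1f} Hz  magnitude {mag:.3f}")
```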
I am excited by the progress made in this project, and as my understanding of the theory behind this class of attack improves, the vast horizons opening up ahead; a warm reminder of the heady days of my youth, learning for the first time that impacting a user’s view of a web application was known as “cross-site scripting”. If you are interested in working together on this, please do leave a note – I am most keen to keep exploring this area.