Effective Electromagnetic Keyboard Side-Channel on a (kinda) Budget

Update: a primitive GUI is now available to help remove the last false positives from signal matching. To use this, execute keyboards/classifier-read.py with the “–gui” argument to use this, code is in keyboards/ui.py.

This week, I made good progress on electromagnetic keylogging (in particular, distinguishing keystrokes from a high-current-leakage keyboard). Just as the traveller may miss the forest for the trees, so I was blinded by my own mistaken interpretation of the data at hand – depending on key, a series of unique peaks is identifiable *after* the initial strong electromagnetic leakage pulse signifying a closed circuit within the scan matrix.

I will not yet claim to remotely understand the physics behind what I presume is the decay of the magnetic field over time (and by extension, the way this phenomena translates into measured electrical signal over time) – but it is  enough to know the behavior exists, and how to apply it to recovering keystroke data.

Observe:

It is immediately apparent that the “unknown” signals of “0.npy” and “2.npy” are basically time-shifted versions of “p” – however, the initial pulse (the trigger) must be exempted for this to work. This is uniquely the case for every letter of the target keyboard I have tested thus far, forming an electromagnetic signature for each key, independent of position information from the scan matrix.

From here, the bulk of the work is in signal processing. I know little of the subject, but my stumbling about in Python has gotten a somewhat viable solution, as follows:

  • Firstly, extract the peaks from each signal. This is an inherently fuzzy problem, so we approximate by extracting signals from the 99.95 percentile. We only accept peaks which are a minimum distance apart (the maximum point in the proximity of an  identified peak). A set of this is saved as training data.
  • We perform an equivalent extraction against each identified key pulse in the “unknown” signal.
  • For each pairing of unknown signal and training data, we cull the longer signal to match the shorter signal, by discarding signal points of lowest absolute value. The assumption is that these peaks are swallowed by noise.
  • We “match” a key when the following conditions are met:
    • The position of the spikes is within a tolerance limit
    • The correlation coefficient of the values of both trace-peak sets is within a tolerance limit
    • The difference of absolute value of each equivalent peak-pair is within a tolerance limit (perhaps this and the time-position of each spike could be converted to a distance, and the entire thing fed to something starting with import tensorflow :P)

Of this, the peak extraction was the most finicky to get right – I ended up playing with a number of approaches before I landed on the current one, which seems fairly resistant to noise:

While imperfect (though I am semi-competent at Python, I am no data scientist), this generates extremely strong matches with relatively low false-positive and low false-negative rates. An improvement over previous matching efforts is that we can now generate true-negatives, where a sample simply matches nothing in our training dataset.

Still, p<unknown>ssw<unknown>rd really doesn’t leave too much to the imagination – and due to our ability to measure our certainty, for the instances where we’re unsure, we can lay the traces on top of one another like tracing paper and manually determine the best match. We can also improve this by saving all matches for a given unknown signal, and sieving a wordlist for common passwords / using language features to improve our signal matching.

During the investigation, I also explored whether unique frequency components were present in the decaying magnetic field, but I am currently unable to use a frequency-based interpretation to perform reliable matching. Observe, a matching signal overlaid on the correct training data in the frequency domain (via a power spectral density estimate using scipy.signal.welch):

Now, observe the same, though with an unmatched training data set:

This isn’t to rule out a frequency-based side-channel, simply that I don’t know how to do it – I am eager to explore other interpretations of the same data!

It is crucial to note that while conceptually reliable, this attack is severely affected by electromagnetic noise: a typist working by the light of the aurora should be all but immune to this. To successfully correlate training data, we must ensure there are no noise sources which can either cause extraneous peak detection, or missing peaks. Preferrably, training should be done with the same noise background as actual signal capture.

This attack is quite cheap to carry out, requiring only a readily available off-the-shelf USB oscilloscope (PicoScope 2206B) and two loops of thin wire. Best of all, due to per-keystroke triggering, no external storage is required for the attack. All the code required for the attack is available on Github – and sample data is available here. Enjoy!

Posted in Bards, Computers, Jesting | Leave a comment

Writeups – Elecduino, python_jail (NCTF)

This weekend, I participated in NCTF. This was a beginner CTF, and it was good both as a morale boost, particularly as other research wades into unexplored territory, as well as a refresher on basic skill. It is always a delight to taste a few fleeting moments of focus, against the lifeless static of life.

For completeness, I will present some of the challenges solved below:

Elecduino

This challenge was presented as a link to a Tinkercad project. The Tinkercad website presents a particularly chaotic Arduino Uno setup:

I’m not really sure what this is intended to do, but the code gives away the game – this is some manner of bit-banging on port 12, 10 and 4. We can create a C stub, replacing the GPIO activations with printfs, depending on the pin number activated.

This quickly bears fruit, revealing a short binary-encoded flag:

A little bit of wrangling with the flag format later, and the flag is ours.

python_jail

This challenge was presented as a host/port combination. On connecting, we are presented with the source code to a challenge:

This is a reasonably “light” jail (but a refreshingly Python3 one): only a few keywords are blocked. We can easily escape this, with the following payload:

To break this down a bit, we first use the Python “__builtins__” global (and for future reference, if __builtins__ is destroyed, try reload(__builtins__). __builtins__ doesn’t allow us to directly subscript a member, but we do this with __dir__, allowing us to get the exec function.

Once this is done, we exec a small secondary payload to import os, and then run shell commands – all the while taking care to avoid the blacklisted keywords in the challenge, giving us the flag.

Thankyou to the NCTF team for organising this event, I benefited from it. Condolences on the unexpected downtime halfway, and best wishes on any future events you organize. See you all in the Pwn2Win CTF next weekend.

Posted in Bards, Computers, Jesting | Leave a comment

Preliminary Notes on Electromagnetic Keylogging

This is my keylogger. There are very few like it – this one is mine.

Over the past few weeks (and on and off before that), I have done some work on electromagnetic keylogging – this has met with some success. This post will document my progress so far, and collate some thoughts on future work in this area. All the code can be downloaded at my Github – due to the extreme work-in-progress nature of this project, I recommend you review the available branches before experimenting (or get in touch and let’s work together!)

Background

To explain this keylogger’s operation, we must look into the actual operation of a keyboard. A modern keyboard is comprised primarily of two sheets of flexible plastic, with conductive traces printed on them. The traces on the two sheets of plastic intersect at locations corresponding to the physical keys – when a key is pressed, a contact is made between the two plastic sheets, allowing a conductive path to form between a row and a column of the scan matrix.

A microcontroller scans the matrix by setting each pin in a column to high for a period of time, and monitoring the voltage of each of the rows:

(credit: http://www.pykett.org.uk/picscannermidi.htm)

Viewed through another lens, a microcontroller activates one of num_columns antennas intermittently, signalling that it is scanning. Previously, Martin Vuagnoux and Sylvain Pasini exploited this property to determine the presence of keystrokes via delays in the scan cycle caused by the microcontroller buffering a USB packet or creating a PS2 message (not sure of the name of these). In my experimentation, I was unable to reliably observe this delay, but I was able to (visually!) observe the difference in magnitude of leaked energy – this was enough to perform imperfect keylogging on a target with training data (and presumably, without).

Implementation

To measure and correlate my attacks, I built some simple magnetic pickups – these were simply lengths of wire on a cardboard plate, with some aluminum foil noise shielding on one side (there is insufficient testing to determine how effective this shielding is), as follows:

Some manual analysis was performed to determine the trigger threshold, sample rate and sample count for a keystroke, and some signal processing was done to align the measured signals – from there, two primary paths of correlation became apparent:

  • In circumstances where a high current flowed between the column and the row (creating a large magnetic spike), I could isolate only the single spike corresponding to a single row-column scan event, and correlate the magnetic field generated against test data.
  • In circumstances where not enough current flowed to reliably trigger on a single column-row pair, it appears possible to continuously capture signal, align the captured signals in a known way and correlate that against training data.

In some cases, feature extraction and signal processing techniques can be used to clean up the captured signal to reduce background noise: though I found when capturing single row-column scan events, I had to be careful to avoid information loss. Continuous visualization with matplotlib helped here.

A further improvement was to increase the fidelity of the information I captured, using two coils instead of one. Triggering was still done on a single coil (this was reliable enough on a test keyboard), but correlation was performed on both channels of data. This was enough to correctly capture words – though with any such approach, particularly under the time-limited sampling constraint of typing, the results are imperfect, but a limited test case (continuous recapturing of training data is continuously a pain in the fucking ass) shows promise:

The arrays above represent the “key guess”, and a weighted sum of correlation coefficients of scan events vs training data (“correct” hits with >0.9 correlation are weighted more heavily).

Observe further the below two two-channel signal plots representing a full scan cycle, the first with the “J” key held down during capture:

And a second, without:

The difference is already as clear as day, and is consistent across samples, some trivial signal analysis can clean up the traces and do the necessary correlation to identify the key.

A “Combined Arms” Approach

In truth, the core of this project is not in the electromagnetic pickup – the principle of signal correlation should be applicable to almost any side channel, with some interpretation where appropriate. Most empowering perhaps is the potential to use different inputs to solve the detection vs identification problem referred to by John Monaco in his work, “SoK: Keylogging Side Channels“.

Particularly in cases where a keypress does not generate a significant enough event to trigger off alone, a secondary channel, such as sound or vibration picked up via a smartphone, could allow us to reduce the amount of data we need to sample. Alternatively, a “SAD Match” trigger (as per the ChipWhisperer) could feasibly be used, though greatly increasing the cost of the equipment required for the attack, ChipWhisperer or no.

I am also keen on investigating the potential interpretation of this signal in the frequency domain – if I am able to reduce the effective sample rate (currently sampling at ~125MSPS, requiring the use of an oscilloscope at the very least), perhaps by using higher-frequency harmonics, it may be possible to greatly reduce the cost of this attack. Most importantly, I am keen to explore the fundamental relationship between the fundamental frequency and it’s harmonics in a way that it may be applicable to other “loud” electromagnetic leakage sources.

I am excited by the progress made in this project, and as my understanding of the theory behind this class of attack improves, the vast horizons opening up ahead; a warm reminder of the heady days of my youth, learning for the first time that impacting a user’s view of a web application was known as “cross-site scripting”. If you are interested in working together on this, please do leave a note – I am most keen to keep exploring this area.

Posted in Bards, Computers, Jesting | Leave a comment

Writeups – coffee_break (SECCON), cobol_otp (Hack.lu)

Over the past two weeks, I have spent some time diving back into CTF challenges. My skills have decayed over so long of not doing this (or the challenges could have gotten harder, but let’s get real, it’s skill decay), a timely reminder of the need to commit to deliberate practice.

Nevertheless, two of the challenges solved will be presented below.

coffee_break

This challenge is presented as a crypto challenge, with a Python encryption oracle, and an encrypted flag. You can download the original challenge here. On inspection, we note that the encrypt function is ultimately a trivial character substitution cipher: with this keyspace, it’s simply faster to brute force each character rather than wrangling a decryptor.

We then feed this intermediate decrypted value to AES decrypt, giving us the flag:

Thanks to the SECCON CTF team for organising this – I had the presence of mind to grab a few binaries for the road, and will hopefully have the opportunity to test myself against them as time goes on.

cobol_otp

This challenge was presented as a COBOL file and accompanying output, which you can download here and here. The goal was to work out what input was fed to the Cobol program to get the output.

From initial inspection, the actual encryption is just XOR, but the key is unknown. We start by using the flag format to derive the first five letters of the key (xor out to “flag{“). We can extend the key with zeros to work out likely key lengths, then tweak one character of the key at a time, based on likely words in the flag.

The solution is fairly simple, which you can download here.

Thankyou to the hack.lu team for organising this event. I’m a little frustrated by my inability to solve the no-risc-no-future challenge, stymied at the last minute by non-working shellcode (when I could have just used pwntools shellcraft shellcode) – I’ll chalk this up as a lesson learned.

Posted in Bards, Computers, Jesting | Leave a comment

Quick Note: WiFi Bash Bunny

This weekend, while sulking over my lack of forward progress in smartcard power analysis, I added WiFi control to a Bash Bunny I’ve got lying around. Somehow, Google turns up nothing on the topic (surely I’m not the only person to think of this), so I’ll place the steps to modify a Bash Bunny here. To do this, you’ll need:

  • Bash Bunny
  • ESP-01 module
  • A way to program the ESP-01 module
  • 2x 10K Resistors (not strictly needed, but incase you want field upgrades).

Cracking open a Bash Bunny reveals a fairly standard USB-dev-board piece of kit. The bulk of the case is empty space, held up by a block of gel:

UART is helpfully broken out, as well as a 3.3v power supply, so this should be an easy job. An initial UART connection revealed that this ran actual Debian, which is nice. I figure there’s probably some open source WiFi-to-serial bridges for backdooring routers and whatnot on Github, so I got to searching:

The most popular solution seems to be jeelabs/esp-link, which is way, way overengineered for what we need. About 5 minutes of Arduino later, and we’ve got a much more lightweight, single-file no-bells-and-whistles WiFi bridge, which you can download here.

Before we wire this up, we need to get an idea of power consumption. Plugging the Bash Bunny into a USB power meter shows a peak of approximately 200mA at 5V (and a running current of maybe 150mA). From this website, we can see that the ESP draws approximately 500mA, but at 3.3V. Ignoring loss from… random components, this is just enough to fit under the 500mA / 5V power draw permitted by modern USB.

Now, we flash the firmware to the ESP-01 and connect it to the Bash Bunny. We use a “permanent on” configuration with 2 pull-up resistors on RST and CH_PD. You can wire these up directly to VCC if you want, but the resistors are useful if you want to update the ESP firmware later. The final product looks like this:

There’s enough space to hide the entire setup inside the original case, if you prefer.

Plugging this in gives us a shell on the actual device. Note the local echo – you’ll need to turn this off in your client.

Enjoy!

Posted in Bards, Computers, Jesting | Leave a comment