Trace Alignment!

Over the past week, I have progressed my work in power analysis of smartcards. By replacing my original power shunt (1.5R) with a higher resistance (4.7R) alternative, I’m now able to retrieve significantly higher resolution power traces, even at 1.8V. Through the lens of a software low pass filter, we can clearly see the rounds of some cryptographic operation (suspected AES, rest of trace snipped for brevity):

Unfortunately, this isn’t quite ready for further analysis:

I spent some time investigating the possible strategies for realigning the traces time-wise. Two approaches were identified:

  • Firstly, the Sum of Absolute Difference approach can be used. A “window” of traces is selected, and is shifted back and forward until the sum of absolute differences with a reference window is minimized. This is the approach taken by ChipWhisperer control software.
  • Alternatively, the window can be shifted back and forth until the correlation coefficient between it and a reference window is maximised, similar to a CPA key matching attack. This is probably the easier of the two approaches to independently implement, with numpy.corrcoef doing the heavy lifting for us.
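As a rough illustration of the two strategies above, here is a minimal pure-Python sketch. It is not the toolkit's actual code: the function names, the window parameters and the hand-rolled Pearson correlation (standing in for numpy.corrcoef) are all assumptions made for the example.

```python
import math


def sad(a, b):
    """Sum of absolute differences between two equal-length windows."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 for a flat window."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_shift(trace, reference, start, length, max_shift, method="sad"):
    """Slide a window over `trace` and return (shift, score) of the best match.

    The reference window is reference[start:start + length]. The score is
    the negated SAD or the correlation, so higher is always better.
    """
    ref_window = reference[start:start + length]
    best = None
    for shift in range(-max_shift, max_shift + 1):
        lo = start + shift
        if lo < 0 or lo + length > len(trace):
            continue  # window would fall off the end of the trace
        window = trace[lo:lo + length]
        if method == "sad":
            score = -sad(window, ref_window)
        else:
            score = pearson(window, ref_window)
        if best is None or score > best[1]:
            best = (shift, score)
    return best
```

A positive shift here means the feature appears later in the trace than in the reference, so the trace would need to be moved that many samples earlier to line up.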

Both strategies should produce roughly the same results. I suspect a Sum-of-Absolute-Difference match gives more fine-grained control over exactly how much a trace can differ from a reference, but it may produce false negative matches when low-frequency fluctuations in the power supply present themselves (aka why USB, why), pushing the sum of absolute differences over the chosen maximum threshold.

Both are relatively simple to implement, and the tooling is now available as part of the fuckshitfuck toolkit. Note that this tool only implements “soft matching”: that is, it will use the selected matching strategy only up to a point. If it can’t find a match within a certain number of samples, it will discard the offending trace.
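The “soft matching” behaviour can be sketched roughly as follows. This is an illustration under assumptions, not the toolkit's implementation: the names soft_align and shift_trace, the zero-padding on realignment and the pure-Python correlation are all made up for the example.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 for a flat window."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def shift_trace(trace, shift):
    """Move samples `shift` places earlier (or later if negative), zero-padded."""
    if shift >= 0:
        return trace[shift:] + [0] * shift
    return [0] * (-shift) + trace[:len(trace) + shift]


def soft_align(traces, reference, start, length, max_shift, min_corr):
    """Realign each trace to the reference window, discarding poor matches.

    A trace is kept only if some shift within +/- max_shift samples reaches
    a correlation of at least min_corr against the reference window.
    """
    ref_window = reference[start:start + length]
    kept = []
    for trace in traces:
        best_shift, best_corr = None, min_corr
        for shift in range(-max_shift, max_shift + 1):
            lo = start + shift
            if lo < 0 or lo + length > len(trace):
                continue
            corr = pearson(trace[lo:lo + length], ref_window)
            if corr >= best_corr:
                best_shift, best_corr = shift, corr
        if best_shift is not None:
            kept.append(shift_trace(trace, best_shift))
        # otherwise: no acceptable match within max_shift, drop the trace
    return kept
```

Running it twice, first with a loose min_corr on one window and then with a tight one on another, gives a multi-pass alignment.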

Window selection is also crucial, particularly when looking at cryptographic operations with repeating rounds that are similar in execution. For the below trace, I took a two-pass approach: firstly, I matched the “prefix” (second “column”) loosely, matching to a minimum correlation coefficient of 0.7. I then matched the third/fourth columns more tightly, with 0.95 minimum correlation – this managed to mostly (visual inspection) eliminate the temporal misalignment of the remaining traces.

Also, the tooling’s a bit slow (though it’s questionable whether this is a problem for a one-off preprocessing activity). Initial attempts at multi-threading the code met with failure, due to what I suspect is an IO bottleneck (in other words, a single huge memory mapped array of traces).

My attempts at massaging out a working Ki+OPc value from the smartcard’s MILENAGE (I suspect? I don’t think I can really “check” unless I go work for Gemalto or GCHQ) have yet to succeed, but thin rays of hope shine through the signal-to-noise ratio. Observe:

What exciting times we live in!

Posted in Bards, Computers, Jesting | Leave a comment

ESP-on-a-PMOD Runtime FPGA Configuration

Following on from my work yesterday, I needed something to quickly reconfigure the edge triggering FPGA (i.e. to use it as a generic smartcard trigger, albeit without support for true ISO7816 pattern triggering). My solution was to stick a PMOD connector onto an ESP-01 and just bit-bang it over a clocked serial line:

Simply put, the ESP-01 stands up a temporary wireless access point and a webserver. For all the shit-talking Arduino gets, I’d sure as fuck rather spend 15 minutes writing this in Arduino with high-level functions than 3 days trying to debug why DHCP isn’t working.

The webserver serves three endpoints:

  • /, which displays a usage message
  • /configure?io=X&clk=Y, configuring the IO edge count and clock edge count
  • /program, which bit-bangs IO and CLK parameters to the FPGA
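From the host side, driving these endpoints might look something like the sketch below. The base address (192.168.4.1 is a typical ESP8266 soft-AP default, but the post does not state it), the use of plain GET requests, the 32-bit range check and all function names are assumptions for illustration only.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://192.168.4.1"  # assumption: not given in the post


def configure_url(io_edges, clk_edges, base=BASE_URL):
    """Build the /configure URL for the given IO and clock edge counts."""
    for name, value in (("io", io_edges), ("clk", clk_edges)):
        if not 0 <= value < 2 ** 32:
            raise ValueError(f"{name} must fit in 32 bits, got {value}")
    return f"{base}/configure?" + urlencode({"io": io_edges, "clk": clk_edges})


def program_url(base=BASE_URL):
    """Build the /program URL, which kicks off the bit-banging."""
    return f"{base}/program"


def reprogram(io_edges, clk_edges, base=BASE_URL, timeout=30):
    """Configure, then program. Needs the ESP's network to be reachable."""
    with urlopen(configure_url(io_edges, clk_edges, base), timeout=timeout) as r:
        r.read()
    with urlopen(program_url(base), timeout=timeout) as r:
        r.read()
```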

The ESP-FPGA communication is a simple fixed-length clocked signal, sampled on each rising edge of a dedicated clock line. It’s extremely slow, holding each signal for 40ms, but as a delay-tolerant configuration activity I don’t really care. An LED on the FPGA board is quickly repurposed to serve as a “programming indicator”.

A trivial Verilog state machine allows us to receive this incoming logic, and set the FPGA’s internal registers accordingly. For future reference and re-use:

always @(posedge prog_clk)
begin
    if(prog_reset == 0)
    begin
        // reset: restart the bit counter and wait for the select bit
        prog_shift <= 0;
        prog_wait_state <= 1;
    end
    else if(prog_wait_state == 1)
    begin
        // first bit after reset selects which register to program
        if(prog_io == 1)
        begin
            IO_EDGE_TARGET <= 0;
            prog_state_io <= 1;
            prog_state_clk <= 0;
        end
        else
        begin
            CLK_EDGE_TARGET <= 0;
            prog_state_io <= 0;
            prog_state_clk <= 1;
        end
        prog_wait_state <= 0;
    end
    else
    begin
        if(prog_state_io == 1)  // start shifting bits in, LSB first
            IO_EDGE_TARGET <= IO_EDGE_TARGET + (prog_io << prog_shift);
        else if(prog_state_clk == 1)
            CLK_EDGE_TARGET <= CLK_EDGE_TARGET + (prog_io << prog_shift);
        prog_shift <= prog_shift + 1;
    end
end

For now, the ESP will always take 32-bit arguments and send the full 32 bits, but there is room for later optimisation.
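To sanity-check the framing, the protocol can be modelled in a few lines of Python. This is a behavioural sketch only, under the assumptions that the first bit after reset selects the register (1 for IO, 0 for CLK) and that data bits follow LSB first, as the state machine above suggests. The class and function names are hypothetical and are not part of the ESP or FPGA code.

```python
def frame_bits(value, target, width=32):
    """Bits presented on prog_io, one per rising prog_clk edge."""
    if target not in ("io", "clk"):
        raise ValueError("target must be 'io' or 'clk'")
    select = 1 if target == "io" else 0
    return [select] + [(value >> i) & 1 for i in range(width)]


class Receiver:
    """Software model of the receiving state machine."""

    def __init__(self):
        self.io_edge_target = 0
        self.clk_edge_target = 0
        self.shift = 0
        self.waiting = True
        self.state = None

    def rising_edge(self, prog_reset, prog_io):
        if prog_reset == 0:
            # reset is sampled on a clock edge, like everything else
            self.shift = 0
            self.waiting = True
        elif self.waiting:
            if prog_io == 1:
                self.io_edge_target = 0
                self.state = "io"
            else:
                self.clk_edge_target = 0
                self.state = "clk"
            self.waiting = False
        else:
            if self.state == "io":
                self.io_edge_target += prog_io << self.shift
            elif self.state == "clk":
                self.clk_edge_target += prog_io << self.shift
            self.shift += 1


def program(receiver, value, target):
    """Send one reset edge, then the select bit and 32 data bits."""
    receiver.rising_edge(0, 0)
    for bit in frame_bits(value, target):
        receiver.rising_edge(1, bit)
```

Sending fewer than 32 bits for small values would be one obvious optimisation, since untouched high bits stay zero after the select bit clears the register.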


Adventures in ISO7816 “Smart” I/O Triggering

I recently wanted to build something to help me trigger off ISO7816 traffic. This led to a week of learning-through-failure, and this post talks through some of the learning experiences, and provides a solution for anyone else attempting to solve the same problem (not a true “smart” trigger, but at least something to land you in the right general place).

An initial design constraint for me was to not use a host emulation approach: while it would be simple to build a “smartcard proxy” which sent the appropriate trigger whenever I wanted, I think this would be extremely limiting in cases where the smartcard tries to verify the host via something like baud rate support or similar.

ISO7816 communication can be summarized as a half-duplex, one-line serial protocol with a non-standard baud rate that is derived from the clock line and may change over the lifetime of a session. It can be decoded using UART, but you’ll need to experiment with baud rate to get a clear reading (the following traffic was at 149K).
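To narrow down the baud rate guesswork: in ISO 7816-3, one bit period (an “etu”) lasts F/D clock cycles, with defaults of F=372 and D=1 until the card and reader negotiate something else. A minimal sketch of that arithmetic follows. The clock frequencies used in the comments are assumptions for illustration, not measurements from this setup.

```python
def clocks_per_etu(f=372, d=1):
    """Clock cycles per bit period for the given F and D parameters."""
    return f / d


def iso7816_baud(clock_hz, f=372, d=1):
    """Effective baud rate: one bit lasts F/D cycles of the card clock."""
    return clock_hz * d / f


# With the default F=372, D=1, a 3.5712 MHz clock gives a familiar 9600 baud.
default_rate = iso7816_baud(3_571_200)

# A hypothetical card negotiating F=512, D=16 on a 4.768 MHz clock would land
# at 149000 baud. Whether that matches the card in this post is a guess.
fast_rate = iso7816_baud(4_768_000, f=512, d=16)
```

If the clock frequency is known from a scope capture, dividing it by the observed bit period in cycles gives the F/D ratio directly, which saves some trial and error in the UART decoder.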

Furthermore, there are quirks: for example, the client will “echo” one byte of a command back to the reader, typically before arguments are provided, which may not be visible at first glance.

Upon initially facing the problem, I immediately entered a fit of madness, and decided I would use an ATMega168’s external interrupts to count IO edges. I suspect I primarily wanted to do this:

This was a learning adventure in using external interrupts on the ATMega168/328, which I had not actually done before. In a nutshell, they can be used by configuring two registers:

  • EICRA, which configures when an interrupt should occur (rising edge, falling edge, any change).
  • EIMSK, which configures which interrupts are permitted.

The actual interrupt routine is defined as a special function (in the below example, for interrupt 0).

#include <avr/interrupt.h>
ISR(INT0_vect)
{
    PORTB ^= (1 << PINB1);  // toggle PB1 on every INT0 event
    PORTD ^= (1 << PIND0);  // toggle PD0 as well
}

While aesthetically pleasing in its own hot-glue-and-duct-tape way, the ATmega168 solution is unfortunately a bit too slow for this work. At 3.3V, we’re able to run at 8MHz at best, which isn’t enough to do clock-cycle-level triggering on something running at just over 4MHz. We can’t really use a higher supply voltage either: at 5V, we’ll miss the 1.8V logic signal (and I didn’t have any level shifters handy, and a grand total of 3 2N3904’s left).

On rethinking the problem, I took the more sane approach of using an FPGA for this task. I started with a simple edge counter, which worked fine for static traffic – but I needed to be able to send somewhat variable traffic to the target for the task at hand. I dug out my old workhorse Arty board and a Saleae for debugging, and got to work:

I settled on a hybrid approach, where I first counted rising edges on the I/O line to get me “close”, then counted clock edges wherever variable (but fixed-length) data was present. This can be represented by the following logic diagram:

This unfortunately resulted in a stack of errors around “multi-driven nets”. To debug this error, I could refer to the Schematic, under “Open Elaborated Design” on the navigation menu in Vivado 2018.3. This opens up a schematic representing which inputs drive which outputs:

A correct flow graph looks like this, with your inputs driving all outputs (i.e. connected left/right). Any outputs which are driven multiple times (which Vivado turns into driven once, and ignored) should stick out pretty quickly.

I tackled this hurdle by fixing my code to use scard_clk as a “Master External Clock” controlling the sampling of all other inputs, and then using a state machine model to drive state transitions between waiting for IO edges and waiting for clock edges. Truly, FPGA programming is always a breath of fresh air and fresh thinking onto a problem.
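The two-phase state machine can be modelled in software to check edge counts before touching the FPGA. The following is a behavioural sketch under assumptions, not a translation of the actual Verilog: the function name, the state names, and the convention that the IO line is sampled once per rising clock edge are all illustrative, and both targets are assumed to be at least 1.

```python
def find_trigger(io_samples, io_edge_target, clk_edge_target):
    """Return the clock cycle index at which the trigger would fire.

    io_samples holds the IO line level (0 or 1) as sampled on each rising
    edge of the card clock. Returns None if the trigger never fires.
    """
    state = "WAIT_IO"
    io_edges = 0
    clk_edges = 0
    prev = io_samples[0] if io_samples else 0
    for cycle, level in enumerate(io_samples):
        if state == "WAIT_IO":
            # phase 1: count rising edges on IO to get "close"
            if prev == 0 and level == 1:
                io_edges += 1
                if io_edges == io_edge_target:
                    state = "WAIT_CLK"
                    clk_edges = 0
        else:
            # phase 2: count clock edges through the variable data
            clk_edges += 1
            if clk_edges == clk_edge_target:
                return cycle
        prev = level
    return None
```

Because every decision is made on a clock edge, there is only one driver for each piece of state, which is the software analogue of avoiding multi-driven nets.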

The result is a nice, clean consistent trigger, down to the clock cycle (lines are CLK/IO/trigger):

15 minutes of SPA (what a fancy name for “looking at it”) later, and we are able to identify the 14 rounds of the first full-size software AES operation (in this case, testing of a supplied AUTN parameter as part of MILENAGE – what exciting times we live in!).

The code is available in the “x/” directory of the source repository. As a future improvement, I’m keen to make a re-usable, on-the-fly configurable core (though this seems to be a rock-and-a-weird-place choice between convenience vs overhead – I will study the chipwhisperer source code for clues).

I am keen to hear more about other people’s approaches to this problem – I am sure there are more elegant solutions out there. If you have a different implementation strategy, please do get in touch (or just comment below).


On “Hardware Hacking” Tools

I got to thinking this weekend – with the advent of one-click shopping, it’s incredibly easy to stack shiny tools which basically do the same thing… and then you always end up writing custom code to do something just slightly out of reach of existing tools.

Still, while it is convenient to have a variety of these tools available, it’s a good learning experience (and generally more productive) to write your own code to do something, once you’re done prototyping with a BusPirate or similar.

I want to provide some thoughts on the common tools available, as well as some unusual alternatives down the bottom. As always, the focus is actually hacking at the thing instead of what to type to make openocd work, so take the below with an appropriate serving of salt.

This post isn’t a dig at any of these tools. I respect the effort that has gone into their development and production. To each their own.


BusPirate

Price: $29.95 USD (Sparkfun)

The BusPirate is an FTDI USB controller attached to a PIC uC. Some custom firmware bit-bangs common protocols (it has to – it remaps the same GPIO’s depending on mode). Of note, the default flywire assembly (the test clips thingy) sucks: I’ve never used the BusPirate without a multimeter testing which test clip connects to which IO pin, every fucking time.

The firmware is pretty decent – my favourite feature is the ability to simply type in data to send via say, SPI: instead of writing code, you can simply use a menu-driven system to enter SPI mode, and type something like [0xFF 0x12 0x34] and it will send the bytes, and handle chip select (the square brackets do this). An auxiliary pin you can manually toggle is always handy as well if you need to violate some specs.

Of note, the BusPirate has level shifting circuitry, allowing you to safely interface with a variety of targets.

GoodFET (Facedancer lol)

Price: $49.95 USD (Adafruit)

Just imagine the MAX3421 is a bunch of GPIO’s broken out.

The GoodFET (and its descendants) are based on an MSP430 controller, tied to an FTDI USB controller to handle host communications. The Facedancer ties the MSP430 to a MAX3421E USB controller (thus its role as USB Swiss army knife). The MSP430 is loaded with a basic OS, and a number of “apps” baked into the firmware.

These “apps” communicate with things the MSP430 is attached to, sometimes containing logic, mostly acting as a passthrough proxy. In the case of the Facedancer, the host sends data to the MSP430, which mostly passes it straight across to the MAX3421E via its SPI interface, and grabs a reply. While not as easy to prototype on as the BusPirate, you can get this going in a few lines of Python and maybe a half-hour of reading datasheets.

The MSP430 is surprisingly pleasant to code for using free tools, and I managed to add some GPIO triggering without lighting the board, my laptop or my person on fire, and have it work the first try.


GreatFET

Price: $89 USD (Hak5)

The logical successor to the GoodFET, the GreatFET is implemented on a more up to date LPC core, with integrated USB capability, but otherwise offering the same general capability as the MSP430.

I don’t have one, so I haven’t played with the firmware – but if the GoodFET is any indication, the GreatFET should be just as excellent in terms of usability.


Shikra

Price: From $45 USD

You may notice that the Shikra is a surprisingly bare-bones device. In fact, only one IC is present on the device, an FTDI USB controller. That’s right, the Shikra is an FTDI breakout board, except with fewer pins broken out.

This becomes more obvious as you read the Xipiter page for how to use the device. To use the Shikra to dump an SPI Flash rom, you use the following command.

flashrom -p ft2232_spi:type=232H -r spidump.bin

Reading through the documentation some more, a small EEPROM is also available for configuration data (VID/PID, descriptor strings, etc).


HydraBus

Price: $196 USD (Converted from Euro, Lab401)

The HydraBus is an STM32 devkit – in my opinion, this is a beefier BusPirate (with support for more protocols), minus the level shifters. The USB controller is again on-board. This also has an SD card for storing data, though it doesn’t seem that easy to actually interface say, SPI operations, to SD card (without some custom firmware).

Again, a menu-driven firmware system is used (similar to the buspirate), but the menu is much, much larger here.

I have one, but I haven’t played with it (primarily because there are too many alternatives), but STMCube can generally help kick-start development with STM* family microcontrollers if you want to start from scratch.

For a more in-depth review, take a look at this.

These tools undoubtedly serve their purpose, and the last time I needed to pass an SPI command to a target IC, I reached for a buspirate instead of opening Atmel Studio (or insert tool of choice here), same as you.

Now, with that out of the way, let’s take a look at some alternative options…

FT2232 Mini Module

Price: $27 (Digikey)

Basically a Shikra, with more pins broken out. Anything you can do with the Shikra, you can do with this, and with less worry about damage to your USB connector because you can use a regular USB cable.


ATMega328p

Price: $less-than-a-coffee

Another alternative is to simply use a microcontroller – the ATMega328p is my go-to out of familiarity, particularly when you need a project to have a limited amount of smarts (e.g. “send this thing via SPI, check the results pass this ruleset, beep at me if it doesn’t, otherwise perform logic X, loop”).

With a $10 spare parts ZIF programming jig – or an Arduino board – and a library of sample code in nice, familiar C, you can be up and running in minutes. The bare minimum circuitry is (I think) one resistor for the reset pullup – you can run this off an internal clock as well as a crystal, configurable via fuses.

While this doesn’t have built-in USB support, it has UART, making it perfect for interfacing with other tools (e.g. a chipwhisperer front-end).

PSoC Dev Kit

Price: $17 USD (rs-online, CY8CKIT-059 variant)

The PSoC is a unique line of microcontrollers – you can think of them as a microcontroller ring-fenced by a CPLD. In effect, this lets you create logical functionality (like UART), then arbitrarily map the I/O to any compatible physical pin. The two are then independent – if you want to remap the pins later, you can via the PSoC Creator IDE.

Unfortunately, the software is a bit clunky, and the autogenerated code for logical functionality can be a bit special (in that you need to work with PSoC a bit to learn how these things are named, and after that it’s fine).

FX3 SuperSpeed USB Development Kit

Price: $48 USD (rs-online)

Reading material: Toolset seems Windows-centric.

Potentially the best until last – Cypress sells this as a USB3 development kit, but this seems a bit… fancy for a USB controller, isn’t it? Flip it over, and you discover the pleasant surprise of a fully-featured 32-bit 200MHz ARM9 core.

Multiple power domains are available (unsure how flexible, this is sourced from the datasheet) which should allow flexible interfacing to a range of targets, as well as DMA-based I/O if your name is CNLohr and you want to make this a logic analyzer in defiance of convention.

You can even get addon boards for this development kit (!). For approximately the same price, you can get an expansion board with a Xilinx CPLD (CYUSB3ACC-007) if you want to offload some logic, or high-speed connector boards for both Xilinx and Altera boards, and something about a machine vision interface.

This all comes in a nice foam-padded magnet-clasp box. As an added bonus, you even get a USB3 controller thrown in you can use if you’re into that kind of thing.

I hope this helps someone choosing their next shiny to buy. If you’ve got thoughts on these products, or if I’ve missed a major feature, please do comment!


USB Descriptor Glitching with Facedancer21

Over the past week, I have been slowly progressing on glitching USB descriptors of a Trezor ONE. I have succeeded in creating a framework for this, and other USB-based glitching attacks, and will document this work for future reference. As background reading, I advise that you review the following resources:


In a nutshell, this attack was performed using a Chipwhisperer back-end to insert a VCC glitch, a Facedancer to handle USB interfacing and some Python scripting to tie it all together.

The Facedancer is a GoodFET core (well, an MSP430 microcontroller) tied to a MAX3421 front-end over an SPI bus, and an FTDI UART IC for host communication. It was originally designed to interface arbitrarily with USB, and has no support for glitch triggering – but it is easy to modify the MAXUSB app (goodfet/firmware/apps/usb/maxusb.c) to override any command to do what READ/WRITE does, but also pull a pin high.

As an aside: the FTDI protocol is quite simple, and leads me to wonder why more pins are not broken out on the Facedancer for integration with other projects (and in this case, freeing a laptop USB port by driving the UART directly from the ChipWhisperer). I may add these in future.

Note that this isn’t perfect – there’s still significant horizontal jitter between when the MSP430 tells the MAX3421 to send a USB packet, and when vulnerable code executes on the target – but it’s better than nothing.

The final Facedancer modification looks like this:

You’ll note an extra GPIO pin is broken out (second from the right, far right is trigger) – this drives a 2N7000 which resets the Trezor after every glitch insertion.

Finally, the Python driver code must be modified to support a variant form of USB transaction, where we simply call IN_Transfer continuously until it falls over or nothing is transmitted. This is described in scanlime’s glitching video, and is included in the source repository listed below.


One of the most significant challenges in this exercise was to correctly time the glitch. To an extent, this is impossible – due to natural jitter on both ends of the USB pipe, no glitch can be timed perfectly with any degree of reliability. Instead, my goal was to time the glitch so it landed in the right ballpark. This is compounded by the ChipWhisperer’s clock not being synchronised with the target’s internal PLL.

This is ultimately a small problem that can be solved with some logic analyzer time:

Of course, the actual parameters used will be different for you, depending on your clock generator configuration and USB sender setup.

Using a logic analyzer, I was able to also see when my glitches caused the device to reset, or enter an unrecoverable error state – this helped refine the size of my inserted glitches.

Note that it is also possible to generate glitches by affecting the output (at the other end, when a USB IN transfer occurs), but this (imo) is less likely to affect the actual integrity of the USB packet (just thinking logically, how would you implement it with a discrete USB controller? Buffering it and waiting for the host to give you bandwidth is the only sane way to do it).

Target Preparation

Some work also needed to be done on the target, to prepare it for VCC glitching. Primarily, the decoupling capacitors need to be removed. If you are reading the datasheets for this, be careful that you count all the capacitors (including the VBAT ones) – these aren’t all located in one place on the official Trezor schematic.

An additional 2N7000 is used to allow the Facedancer to conveniently reset the device without power cycling the USB.

A power cleaning net is used to provide a relatively stable source of 3.3v power to the CPU: a pair of capacitors is used to clean the power supply (not perfect, but fuck knows it’s better than USB straight through a 3.3v regulator), and a small resistor (10 Ohms) is used to isolate the switching noise of the CPU so we’re better able to analyze our work with an oscilloscope.

Project Control

Finally, a Python script is used to tie together the Chipwhisperer and the Facedancer. This is similar to my other glitching experiments, and a copy of the code can be found on my Github. Note that no integration is available at the time of writing with my graphing / project review utilities; I’ll add these as time permits.

Note that you must build and update the Facedancer’s firmware if you want to use this: the Python scripts make use of new commands which are not implemented by default.

Alternatively, Kate Temkin’s work might also be useful, particularly if you are using an alternative back-end to the MAX3421.

And with this, we sit back and wait for the glitches to appear:
