ESP-on-a-PMOD Runtime FPGA Configuration

Following on from my work yesterday, I needed something to quickly reconfigure the edge-triggering FPGA (i.e. to use it as a generic smartcard trigger, albeit without support for true ISO7816 pattern triggering). My solution was to stick a PMOD connector onto an ESP-01 and just bit-bang the configuration over a clocked serial line:

Simply put, the ESP-01 stands up a temporary wireless access point and a webserver. For all the shit-talking Arduino gets, I’d sure as fuck rather spend 15 minutes writing this in Arduino with high-level functions than 3 days trying to debug why DHCP isn’t working.

The webserver serves three endpoints:

  • /, which displays a usage message
  • /configure?io=X&clk=Y, configuring the IO edge count and clock edge count
  • /program, which bit-bangs IO and CLK parameters to the FPGA

The ESP-FPGA communication is a simple fixed-length clocked signal, sampled on each rising edge of a dedicated clock line. It’s extremely slow, holding each signal for 40ms, but as this is a delay-tolerant configuration activity I don’t really care. An LED on the FPGA board is quickly repurposed to serve as a “programming indicator”.

A trivial Verilog state machine allows us to receive this incoming logic, and set the FPGA’s internal registers accordingly. For future reference and re-use:

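// Everything is sampled on the rising edge of the dedicated programming clock.
always @(posedge prog_clk)
begin
    // Synchronous, active-low reset: takes effect on a prog_clk rising edge.
    if(prog_reset == 0)
    begin
        prog_shift <= 0;
        prog_wait_state <= 1;
    end
    else
    begin
        // First bit after reset selects the register to program:
        // prog_io == 1 -> IO_EDGE_TARGET, prog_io == 0 -> CLK_EDGE_TARGET.
        if(prog_wait_state == 1)
        begin
            if(prog_io == 1)
            begin
                IO_EDGE_TARGET <= 0;
                prog_state_io <= 1;
                prog_state_clk <= 0;
            end
            else
            begin
                CLK_EDGE_TARGET <= 0;
                prog_state_io <= 0;
                prog_state_clk <= 1;
            end
            prog_wait_state <= 0;
        end
        else
        begin
            // Remaining bits arrive LSB first; prog_shift is the bit position.
            if(prog_state_io == 1)  // start shifting bits in
            begin
                IO_EDGE_TARGET <= IO_EDGE_TARGET + (prog_io << prog_shift);
            end
            else if(prog_state_clk == 1)
            begin
                CLK_EDGE_TARGET <= CLK_EDGE_TARGET + (prog_io << prog_shift);
            end
            prog_shift <= prog_shift + 1;
        end
    end
end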

For now, the ESP always takes 32-bit arguments and sends the full 32 bits, but there is room for later optimisation.
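As a quick sanity check of the framing, here is a minimal Python model of what the state machine above expects on the wire: one selector bit (1 for the IO target, 0 for the CLK target), followed by the value LSB first. The function names are mine, not part of the ESP firmware, and the model assumes the FPGA has just been reset and that the shift counter does not wrap within one frame.

```python
def encode_frame(select_io, value, width=32):
    """Return the prog_io level to present on each rising prog_clk edge."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the requested width")
    selector = 1 if select_io else 0
    # Payload is sent least-significant bit first, matching prog_shift.
    payload = [(value >> shift) & 1 for shift in range(width)]
    return [selector] + payload


def decode_frame(bits):
    """Software model of the receiving state machine after a reset."""
    selector, payload = bits[0], bits[1:]
    target = 0
    for shift, bit in enumerate(payload):
        # Same accumulation as TARGET <= TARGET + (prog_io << prog_shift).
        target += bit << shift
    return ("io" if selector else "clk", target)


frame = encode_frame(True, 1234)
print(len(frame), decode_frame(frame))
```

A full frame is therefore 33 clock edges for a 32-bit argument, which is where any later optimisation would have to come from.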

Posted in Uncategorized | Leave a comment

Adventures in ISO7816 “Smart” I/O Triggering

I recently wanted to build something to help me trigger off ISO7816 traffic. This led to a week of learning-through-failure, and this post talks through some of the learning experiences, and provides a solution for anyone else attempting to solve the same problem (not a true “smart” trigger, but at least something to land you in the right general place).

An initial design constraint for me was to not use a host emulation approach: while it would be simple to build a “smartcard proxy” which sent the appropriate trigger whenever I wanted, I think this would be extremely limiting in cases where the smartcard tries to verify the host via something like baud rate support or similar.

ISO7816 communication can be summarized as a half-duplex, one-line serial protocol with a non-standard baud rate that is derived from the clock line and may change over the lifetime of a session. The character framing is asynchronous, so it can be decoded using UART, but you’ll need to experiment with baud rate to get a clear reading (the following traffic was at 149K).
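To narrow down the baud rate instead of guessing, you can work it out from the clock. In ISO 7816-3 one bit period (an "etu") lasts F/D clock cycles, with F=372 and D=1 as the defaults until the card and reader negotiate something else. The sketch below only encodes that relationship; the F and D values actually in use for the traffic above are not something I have recorded here.

```python
def iso7816_baud(clock_hz, f=372, d=1):
    """Bit rate implied by the card clock and the F/D parameters."""
    return clock_hz * d / f


def clock_cycles_per_bit(f=372, d=1):
    """Number of card clock cycles in one etu (one bit period)."""
    return f / d


# The textbook case: a 3.5712 MHz clock with default parameters.
print(iso7816_baud(3_571_200))
print(clock_cycles_per_bit())
```

If the decoded traffic turns to garbage partway through a session, a change of F or D is the first thing to check.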

Furthermore, there are quirks: for example, the card will “echo” one byte of a command back to the reader, typically before arguments are provided, which may not be visible at first glance.

Upon initially facing the problem, I immediately entered a fit of madness, and decided I would use an ATMega168’s external interrupts to count IO edges. I suspect I primarily wanted to do this:

This was a learning adventure in using external interrupts on the ATMega168/328, which I had not actually done before. In a nutshell, they can be used by configuring two registers:

  • EICRA, which configures when an interrupt should occur (rising edge, falling edge, any change).
  • EIMSK, which configures which interrupts are enabled.

The actual interrupt routine is defined as a special function (in the below example, for interrupt 0).

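ISR (INT0_vect)
{
    // Toggle PB1 and PD0 on every INT0 event.
    // PINB1 and PIND0 are used here purely as bit positions (1 and 0).
    PORTB ^= (1 << PINB1);
    PORTD ^= (1 << PIND0);
}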
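For reference, the register values themselves are easy to work out by hand. The helper below is a small host-side calculator, not AVR code: it assumes the ATMega168/328 layout, where each interrupt gets a two-bit sense field in EICRA (ISC01:ISC00 for INT0, ISC11:ISC10 for INT1) and one enable bit in EIMSK. The function and dictionary names are my own.

```python
# Two-bit interrupt sense control encodings (ISCn1:ISCn0).
ISC_MODES = {
    "low_level": 0b00,
    "any_change": 0b01,
    "falling": 0b10,
    "rising": 0b11,
}


def external_interrupt_registers(config):
    """config maps interrupt number (0 or 1) to a sense mode name.

    Returns the (EICRA, EIMSK) values to write.
    """
    eicra = 0
    eimsk = 0
    for number, mode in config.items():
        if number not in (0, 1):
            raise ValueError("only INT0 and INT1 exist on this part")
        eicra |= ISC_MODES[mode] << (2 * number)  # sense field for INTn
        eimsk |= 1 << number                      # enable bit for INTn
    return eicra, eimsk


eicra, eimsk = external_interrupt_registers({0: "rising"})
print(f"EICRA=0x{eicra:02X} EIMSK=0x{eimsk:02X}")
```

Global interrupts still have to be enabled with sei() before the ISR will fire.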

While aesthetically pleasing in its own hot-glue-and-duct-tape way, the ATmega168 solution is unfortunately a bit too slow for this work. At 3.3V, we’re able to run at 8MHz at best, which isn’t enough to do clock-cycle-level triggering on something running at just over 4MHz. We can’t really use a higher voltage either: at 5V, we’ll miss the 1.8V logic signal (and I didn’t have any level shifters handy, and a grand total of 3 2N3904’s left).

On rethinking the problem, I took the more sane approach of using an FPGA for this task. I started with a simple edge counter, which worked fine for static traffic – but I needed to be able to send somewhat variable traffic to the target for the task at hand. I dug out my old workhorse Arty board and a Saleae for debugging, and got to work:

I settled on a hybrid approach, where I first counted rising edges on the I/O line to get me “close”, then counted clock edges wherever variable data (but fixed-length data) was present. This can be represented by the following logic diagram:

This unfortunately resulted in a stack of errors about “multi-driven nets”. To debug this error, I could refer to the Schematic, under “Open Elaborated Design” on the navigation menu in Vivado 2018.3. This opens up a schematic representing which inputs drive which outputs:

A correct flow graph looks like this, with your inputs driving all outputs (i.e. connected left/right). Any outputs which are driven multiple times (which Vivado turns into driven once, and ignored) should stick out pretty quickly.

I tackled this hurdle by fixing my code to use scard_clk as a “Master External Clock” controlling the sampling of all other inputs, and then using a state machine model to drive state transitions between waiting for IO edges and waiting for clock edges. Truly, FPGA programming is always a breath of fresh air and fresh thinking onto a problem.
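The two-phase counting is easier to reason about in software first. This is a rough Python model of the idea, not a translation of the actual Verilog: it assumes the IO line is sampled once per rising scard_clk edge, that an IO rising edge is a 0-to-1 change between consecutive samples, and that both targets are at least 1. All names are illustrative.

```python
def find_trigger(io_samples, io_edge_target, clk_edge_target):
    """io_samples[i] is the IO level seen on the i-th rising clock edge.

    Returns the index of the clock edge on which the trigger would fire,
    or None if the capture ends first.
    """
    io_edges = 0
    clk_edges = 0
    counting_clocks = False
    previous = io_samples[0] if io_samples else 0
    for index, level in enumerate(io_samples):
        if not counting_clocks:
            # Phase 1: count IO rising edges to get "close".
            if previous == 0 and level == 1:
                io_edges += 1
                if io_edges == io_edge_target:
                    counting_clocks = True
        else:
            # Phase 2: count clock edges across the variable data.
            clk_edges += 1
            if clk_edges == clk_edge_target:
                return index
        previous = level
    return None


capture = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0]
print(find_trigger(capture, io_edge_target=2, clk_edge_target=3))
```

Because a single clock drives both phases, there is only ever one process writing each counter, which is what makes the multi-driven net errors go away.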

The result is a nice, clean consistent trigger, down to the clock cycle (lines are CLK/IO/trigger):

15 minutes of SPA (what a fancy name for “looking at it”) later, and we are able to identify the 14 rounds of the first full-size software AES operation (in this case, testing of a supplied AUTN parameter as part of MILENAGE – what exciting times we live in!).

The code is available in the “x/” directory of github.com/CreateRemoteThread/fuckshitfuck. As a future improvement, I’m keen to make a re-usable, on-the-fly configurable core (though this seems to be a rock-and-a-weird-place choice between convenience and overhead – I will study the chipwhisperer source code for clues).

I am keen to hear more about other people’s approaches to this problem – I am sure there are more elegant solutions out there. If you have a different implementation strategy, please do get in touch (or just comment below).

Posted in Bards, Computers, Jesting | Leave a comment

On “Hardware Hacking” Tools

I got to thinking this weekend – with the advent of one-click shopping, it’s incredibly easy to stack shiny tools which basically do the same thing… and then you always end up writing custom code to do something just slightly out of reach of existing tools.

Still, while it is convenient to have a variety of these tools available, it’s a good learning experience (and generally more productive) to write your own code to do something, once you’re done prototyping with a BusPirate or similar.

I want to provide some thoughts on the common tools available, as well as some unusual alternatives down the bottom. As always, the focus is actually hacking at the thing instead of what to type to make openocd work, so take the below with an appropriate serving of salt.

This post isn’t a dig at any of these tools. I respect the effort that has gone into their development and production. To each their own.

BusPirate

Price: $29.95 USD (Sparkfun)

The BusPirate is an FTDI USB controller attached to a PIC uC. Some custom firmware bit-bangs common protocols (it has to – it remaps the same GPIO’s depending on mode). Of note, the default flywire assembly (the test clips thingy) sucks: I’ve never used the BusPirate without a multimeter testing which test clip connects to which IO pin, every fucking time.

The firmware is pretty decent – my favourite feature is the ability to simply type in data to send via, say, SPI: instead of writing code, you can use a menu-driven system to enter SPI mode, type something like [0xFF 0x12 0x34], and it will send the bytes and handle chip select (the square brackets do this). An auxiliary pin you can manually toggle is always handy as well if you need to violate some specs.

Of note, the BusPirate has level shifting circuitry, allowing you to safely interface with a variety of targets.

GoodFET (Facedancer lol)

Price: $49.95 USD (Adafruit)

Just imagine the MAX3421 is a bunch of GPIO’s broken out.

The GoodFET (and its descendants) are based on an MSP430 controller, tied to an FTDI USB controller to handle host communications. The Facedancer ties the MSP430 to a MAX3421E USB controller (thus its role as USB swiss army knife). The MSP430 is loaded with a basic OS, and a number of “apps” baked into the firmware.

These “apps” communicate with things the MSP430 is attached to, sometimes containing logic, mostly acting as a passthrough proxy. In the case of the Facedancer, the host sends data to the MSP430, which mostly passes it straight across to the MAX3421E via its SPI interface, and grabs a reply. While not as easy to prototype on as the BusPirate, you can get this going in a few lines of Python and maybe a half-hour of reading datasheets.

The MSP430 is surprisingly pleasant to code for using free tools, and I managed to add some GPIO triggering without setting the board, my laptop or my person on fire, and have it work the first try.

GreatFET

Price: $89 USD (Hak5)

The logical successor to the GoodFET, the GreatFET is implemented on a more up-to-date LPC core, with integrated USB capability, but otherwise offering the same general capability as the MSP430-based GoodFET.

I don’t have one, so I haven’t played with the firmware – but if the GoodFET is any indication, the GreatFET should be just as excellent in terms of usability.

Shikra

Price: From $45 USD (int3.cc)

You may notice that the Shikra is a surprisingly bare-bones device. In fact, only one IC is present on the device, an FTDI USB controller. That’s right, the Shikra is an FTDI breakout board, except with fewer pins broken out.

This becomes more obvious as you read the Xipiter page for how to use the device. To use the Shikra to dump an SPI Flash rom, you use the following command.

flashrom -p ft2232_spi:type=232H -r spidump.bin

Reading through the documentation some more, a small EEPROM is also available for configuration data (VID/PID, descriptor strings, etc).

HydraBus

Price: $196 USD (Converted from Euro, Lab401)

The HydraBus is an STM32 devkit – in my opinion, this is a beefier BusPirate (with support for more protocols), minus the level shifters. The USB controller is again on-board. This also has an SD card for storing data, though it doesn’t seem that easy to actually interface say, SPI operations, to SD card (without some custom firmware).

Again, a menu-driven firmware system is used (similar to the BusPirate), but the menu is much, much larger here.

I have one, but I haven’t played with it (primarily because there are too many alternatives). STMCube can generally help kick-start development with STM* family microcontrollers if you want to start from scratch.

For a more in-depth review, take a look at this.

These tools undoubtedly serve their purpose, and the last time I needed to pass an SPI command to a target IC, I reached for a BusPirate instead of opening Atmel Studio (or insert tool of choice here), same as you.

Now, with that out of the way, let’s take a look at some alternative options…

FT2232 Mini Module

Price: $27 (Digikey)

Basically a Shikra, with more pins broken out. Anything you can do with the Shikra, you can do with this, and with less worry about damage to your USB connector because you can use a regular USB cable.

ATMega328p

Price: $less-than-a-coffee

Another alternative is to simply use a microcontroller – the ATMega328p is my go-to out of familiarity, particularly when you need a project to have a limited amount of smarts (e.g. “send this thing via SPI, check the results pass this ruleset, beep at me if it doesn’t, otherwise perform logic X, loop”).

With a $10 spare parts ZIF programming jig – or an Arduino board – and a library of sample code in nice, familiar C, you can be up and running in minutes. The bare minimum circuitry is (I think) one resistor for the reset pullup – you can run this off an internal clock as well as a crystal, configurable via fuses.

While this doesn’t have built-in USB support, it has UART, making it perfect for interfacing with other tools (e.g. a chipwhisperer front-end).

PSoC Dev Kit

Price: $17 USD (rs-online, CY8CKIT-059 variant)

The PSoC is a unique line of microcontrollers – you can think of them as a microcontroller ring-fenced by a CPLD. In effect, this lets you create logical functionality (like UART), then arbitrarily map the I/O to any compatible physical pin. The two are then independent – if you want to remap the pins later, you can via the PSoC Creator IDE.

Unfortunately, the software is a bit clunky, and the autogenerated code for logical functionality can be a bit special (in that you need to work with PSoC a bit to learn how these things are named, and after that it’s fine).

FX3 SuperSpeed USB Development Kit

Price: $48 USD (rs-online)

Reading material: https://github.com/cnlohr/fx3fun. Toolset seems Windows-centric.

Potentially the best until last – Cypress sells this as a USB3 development kit, but this seems a bit… fancy for a USB controller, isn’t it? Flip it over, and you discover the pleasant surprise of a fully-featured 32-bit 200MHz ARM9 core.

Multiple power domains are available (unsure how flexible, this is sourced from the datasheet) which should allow flexible interfacing to a range of targets, as well as DMA-based I/O if your name is CNLohr and you make this a logic analyzer in defiance of convention.

You can even get addon boards for this development kit (!). For approximately the same price, you can get an expansion board with a Xilinx CPLD (CYUSB3ACC-007) if you want to offload some logic, or high-speed connector boards for both Xilinx and Altera boards, and something about a machine vision interface.

This all comes in a nice foam-padded, magnet-clasp box. As an added bonus, you even get a USB3 controller thrown in you can use if you’re into that kind of thing.

I hope this helps someone choosing their next shiny to buy. If you’ve got thoughts on these products, or if I’ve missed a major feature, please do comment!


USB Descriptor Glitching with Facedancer21

Over the past week, I have been slowly progressing on glitching USB descriptors of a Trezor ONE. I have succeeded in creating a framework for this, and other USB-based glitching attacks, and will document this work for future reference. As background reading, I advise that you review the following resources:

Overview

In a nutshell, this attack was performed using a Chipwhisperer back-end to insert a VCC glitch, a Facedancer to handle USB interfacing and some Python scripting to tie it all together.

The Facedancer is a GoodFET core (well, an MSP430 microcontroller) tied to a MAX3421 front-end over an SPI bus, and an FTDI UART IC for host communication. It’s originally designed to interface arbitrarily with USB, and has no support for glitch triggering – but it is easy to modify the MAXUSB app (goodfet/firmware/apps/usb/maxusb.c) to override any command to do what READ/WRITE does, but also pull a pin high.

As an aside: the FTDI protocol is quite simple, and leads me to wonder why more pins are not broken out on the Facedancer for integration with other projects (and in this case, freeing a laptop USB port by driving the UART directly from the ChipWhisperer). I may add these in future.

Note that this isn’t perfect – there’s still significant horizontal jitter between when the MSP430 tells the MAX3421 to send a USB packet, and when vulnerable code executes on the target – but it’s better than nothing.

The final Facedancer modification looks like this:

You’ll note an extra GPIO pin is broken out (second from the right, far right is trigger) – this drives a 2N7000 which resets the Trezor after every glitch insertion.

Finally, the Python driver code must be modified to support a variant form of USB transaction, where we simply call IN_Transfer continuously until it falls over or nothing is transmitted. This is described in scanlime’s glitching video, and is included in the source repository listed below.
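The shape of that variant transaction is roughly the loop below. This is a sketch only: in_transfer stands in for whatever call the modified driver exposes to perform one IN transfer and return the bytes received, and the broad exception handler is there because I am not asserting which error the real driver raises when the device falls over. The packet cap is an arbitrary safety limit of mine.

```python
def drain_in_endpoint(in_transfer, max_packets=4096):
    """Keep requesting IN packets until the device stops answering.

    in_transfer is a hypothetical callable returning the bytes of one
    IN transfer (empty when nothing was transmitted).
    """
    received = bytearray()
    for _ in range(max_packets):
        try:
            chunk = in_transfer()
        except Exception:
            # The device fell over (reset, stall, timeout): stop reading.
            break
        if not chunk:
            # Nothing transmitted: the device has run out of data.
            break
        received.extend(chunk)
    return bytes(received)


# Stand-in for a device that returns two packets and then goes quiet.
packets = iter([b"\x12\x01", b"\x00\x02", b""])
data = drain_in_endpoint(lambda: next(packets))
print(data.hex())
```

A successful glitch shows up as the returned data being longer than the descriptor length the device should have sent.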

Timing

One of the most significant challenges in this exercise was to correctly time the glitch. To an extent, this is impossible – due to natural jitter on both ends of the USB pipe, no glitch can be timed perfectly with any degree of reliability. Instead, my goal was to time the glitch so it landed in the right ballpark. This is compounded by the ChipWhisperer’s clock not being synchronised with the target’s internal PLL.

This is ultimately a small problem that can be solved with some logic analyzer time:

Of course, the actual parameters used will be different for you, depending on your clock generator configuration and USB sender setup.

Using a logic analyzer, I was able to also see when my glitches caused the device to reset, or enter an unrecoverable error state – this helped refine the size of my inserted glitches.

Note that it is also possible to generate glitches by affecting the output (at the other end, when a USB IN transfer occurs), but this (imo) is less likely to affect the actual integrity of the USB packet. Just thinking logically about how you would implement it with a discrete USB controller: buffering the packet and waiting for the host to give you bandwidth is the only sane way to do it.

Target Preparation

Some work also needed to be done on the target, to prepare it for VCC glitching. Primarily, the decoupling capacitors need to be removed. If you are reading the datasheets for this, be careful that you count all the capacitors (including the VBAT ones) – these aren’t all located in one place on the official Trezor schematic.

An additional 2N7000 is used to allow the Facedancer to conveniently reset the device without power cycling the USB.

A power cleaning net is used to provide a relatively stable source of 3.3V power to the CPU: a pair of capacitors is used to clean the power supply (not perfect, but fuck knows it’s better than USB straight through a 3.3V regulator), and a small resistor (10 Ohms) is used to isolate the switching noise of the CPU so we’re better able to analyze our work with an oscilloscope.

Project Control

Finally, a Python script is used to tie together the Chipwhisperer and the Facedancer. This is similar to my other glitching experiments, and a copy of the code can be found on my Github. Note that no integration with my graphing / project review utilities is available at the time of writing; I’ll add this as time permits.
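In outline, the control loop is a parameter sweep: reset the target, arm a glitch at a given offset and width, request the descriptor, and log anything that does not look normal. The sketch below shows only that structure. The four callables are hypothetical stand-ins for the ChipWhisperer and Facedancer calls, not real API names, and "abnormal" is simply defined here as a reply whose length differs from the expected one.

```python
import itertools


def run_campaign(offsets, widths, arm_glitch, reset_target,
                 read_descriptor, normal_length):
    """Try every (offset, width) pair and collect abnormal replies."""
    interesting = []
    for offset, width in itertools.product(offsets, widths):
        reset_target()             # start each attempt from a clean boot
        arm_glitch(offset, width)  # glitch fires on the next trigger
        data = read_descriptor()   # the USB request that raises the trigger
        if len(data) != normal_length:
            interesting.append((offset, width, data))
    return interesting


# Fake target: only one parameter pair makes it leak extra bytes.
state = {}

def fake_arm(offset, width):
    state["params"] = (offset, width)

def fake_read():
    return b"x" * (64 if state["params"] == (2, 5) else 18)

hits = run_campaign(range(4), [5, 6], fake_arm, lambda: None, fake_read, 18)
print(hits[0][:2])
```

Short or empty replies are worth logging too, since they mark where the glitch is too aggressive and the device simply crashes.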

Note that you must build and update the Facedancer’s firmware if you want to use this: the Python scripts make use of new commands which are not implemented by default.

Alternatively, Kate Temkin’s work might also be useful, particularly if you are using an alternative back-end to the MAX3421.

And with this, we sit back and wait for the glitches to appear:


The Z80 Adventure Part II

Today, I was able to use the Z80 I built last week to boot Grant Searle’s Z80 BASIC ROM without modifications, with access to both banks of SRAM. You can download this ROM file here.

Several core pieces of functionality had to be implemented to enable full operation, which will be described here.

“UART” from Z80

If you have been reviewing datasheets, you will note that unlike modern microcontrollers, there is no hardware serial RX/TX on the Z80. Instead, this function can be implemented in myriad different ways, depending on agreement between the BIOS (a handful of interrupt vectors counts, right?) and the system designer – anything from memory mapped IO to in/out on special ports was OK.

In this case, as I wanted to run Grant Searle’s BASIC ROM, my implementation had to be compatible with what the ROM was expecting. The ROM implemented IO using IORQ accesses to ports 0x80 (Control) and 0x81 (Data). Both are implemented by the ATMega.

Truth be told, I didn’t understand the control register implementation, but I didn’t need to – the BASIC ROM only used this register to check for readiness to send and receive, so I faked it with fixed values, which was good enough to support the minimal requirement of allowing the Z80 to send and receive characters.
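The port handling boils down to something like this Python model of what the ATMega does on each IO request. It is a sketch under assumptions: the fixed status byte of 0x02 is a placeholder of mine (the actual values I used are not given here), and the class and method names are illustrative.

```python
from collections import deque

CONTROL_PORT = 0x80
DATA_PORT = 0x81
FIXED_STATUS = 0x02  # placeholder "always ready" value, an assumption


class FakeSerialPorts:
    """Model of the two IO ports the BASIC ROM talks to."""

    def __init__(self):
        self.rx = deque()      # characters waiting for the Z80
        self.tx = bytearray()  # characters the Z80 has sent

    def io_read(self, port):
        if port == CONTROL_PORT:
            return FIXED_STATUS  # never reflects real UART state
        if port == DATA_PORT:
            return self.rx.popleft() if self.rx else 0x00
        return 0xFF              # unmapped port

    def io_write(self, port, value):
        if port == DATA_PORT:
            self.tx.append(value & 0xFF)
        # Writes to the control port are accepted and ignored.


ports = FakeSerialPorts()
ports.rx.extend(b"OK")
ports.io_write(DATA_PORT, ord("Z"))
print(ports.io_read(CONTROL_PORT), chr(ports.io_read(DATA_PORT)), bytes(ports.tx))
```

The obvious cost of a fixed status is that the Z80 can never learn that a character is waiting by polling, which is why interrupts come next.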

This leads us to our next problem…

Interrupts

Send and receive functionality is enough for an extremely primitive terminal, but is not enough for the BASIC console to work. The interrupt stub for the BASIC ROM reveals why:

In truth, incoming characters were handled by the interrupt vector at 0x38, i.e. RST 38h (no equivalent poll loops were present in BASIC.ASM).

I handled this by first converting the AVR’s UART to generate interrupts. This can be done by:

  • Setting the RXCIE bit in the UART control register.
  • At build/flash-time, registering the UART Receive Complete interrupt vector.
  • After enabling the hardware UART at runtime, enabling global interrupts with sei().

This done, I then modified the interrupt handler to buffer the received character and set a global “interrupt pending” flag, such that the interrupt line could be asserted low at the next available clock cycle. A bit of trickery here – as the Z80 does not execute one instruction per cycle, we must hold the interrupt line low until the interrupt is acknowledged, which the Z80 signals by driving the IORQ and M1 lines low together, as per the following diagram:

To do this, I carved out another line from the AVR-Z80 address bus (remember: we need to pull each “unmanaged” address line to 0 so it doesn’t fragment our memory accesses) for M1. I wrote some handler code to assert an interrupt until acknowledgement, then insert a fake clock cycle to return an interrupt vector, in case I want Mode 2 interrupts in future.
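The handler logic is small enough to model in a few lines. This is a Python sketch of the behaviour described above rather than the AVR code itself: signal levels are active low (0 means asserted), and the vector byte of 0xFF is a placeholder I picked for illustration.

```python
class InterruptLine:
    """Model of the AVR side of the Z80 INT handshake."""

    def __init__(self, vector=0xFF):
        self.pending = False
        self.buffered = None
        self.vector = vector  # placeholder value, an assumption

    def on_uart_rx(self, byte):
        # Called from the UART receive interrupt: stash and flag.
        self.buffered = byte
        self.pending = True

    def int_level(self):
        # INT is held low for as long as an interrupt is pending.
        return 0 if self.pending else 1

    def on_bus_cycle(self, m1, iorq):
        """Return the vector on an acknowledge cycle, otherwise None."""
        if self.pending and m1 == 0 and iorq == 0:
            self.pending = False  # acknowledged: release INT
            return self.vector
        return None


line = InterruptLine()
line.on_uart_rx(ord("A"))
held = line.int_level()
ignored = line.on_bus_cycle(m1=0, iorq=1)  # ordinary opcode fetch
acked = line.on_bus_cycle(m1=0, iorq=0)    # interrupt acknowledge
print(held, ignored, acked, line.int_level())
```

The buffered character stays put until the Z80's interrupt routine reads it from the data port.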

By this point, I had enough for the BASIC interpreter to function, which leads us to our next problem…

SRAM / Bus Width

The BASIC ROM (interrupt stub + BASIC interpreter) was all of 8KB. Our newly shrunken address bus simply was not able to write 8KB to memory by itself – nor did I want to forcibly assert control of the WR/RD lines while the Z80 was held in reset.

Instead, I relied on using a small bootloader from the PSoC project, here. I modified this slightly to complete more quickly (as I only needed 0x2000, not 0x4000 bytes), and hard-coded this into the AVR program.

I then wrote a fake IO port at 0xFF as the original author did, serving up the BASIC ROM. Unfortunately, the BASIC ROM was too large to fit into the AVR’s SRAM, so I modified the code further to place both the bootloader and BASIC ROM into program memory, carefully refactoring code as necessary to support the change.
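The loading scheme amounts to the Z80 reading the same port 0x2000 times while the AVR walks through the ROM image. Below is a Python model of both halves, with assumptions flagged: the load address is passed in rather than fixed because I have not stated it here, reads past the end of the image return 0xFF as a filler of my choosing, and the names are illustrative.

```python
ROM_PORT = 0xFF
ROM_SIZE = 0x2000


class RomPort:
    """AVR side: each read of the port returns the next ROM byte."""

    def __init__(self, image):
        self.image = image
        self.position = 0

    def io_read(self, port):
        if port != ROM_PORT:
            return 0xFF
        in_range = self.position < len(self.image)
        byte = self.image[self.position] if in_range else 0xFF
        self.position += 1
        return byte


def bootloader_copy(rom_port, ram, base, size=ROM_SIZE):
    """Z80 side: pull size bytes from the port into RAM at base."""
    for offset in range(size):
        ram[base + offset] = rom_port.io_read(ROM_PORT)


image = bytes(range(256)) * 32  # 8KB of dummy data standing in for the ROM
ram = bytearray(0x10000)
bootloader_copy(RomPort(image), ram, base=0x0000)
print(ram[:ROM_SIZE] == image)
```

On the real AVR the image lives in program memory, so each read becomes a flash access instead of an array lookup.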

With all this done, BASIC successfully launches:

As an added bonus, the memory-selftest function indicates that both banks of SRAM are recognized by the Z80, and accessible to the user transparently. Hooray!
