Flash Glitching to Recovery Bootloaders

Occasionally, one comes across an interesting attack which is beautiful in both its simplicity and effectiveness. Earlier this week, I read a post about glitching Flash chips to get to recovery bootloaders (unfortunately, I didn’t save the link), and earlier today I reproduced the attack successfully.

The theory of this attack is rather simple: the purpose of a recovery bootloader is to allow a technician to repair a device in a “bricked” state: for example, when a key component fails. If we simulate this behaviour, we get to the recovery bootloader.

My experiment was against a donated NetComm N300 wireless router. I started by taking apart the device, looking for the Flash memory chip in use:

This is on the underside of the PCB, where the UART and JTAG interfaces are also reachable. This Flash chip was a Macronix MX25L12835FMI – I proceeded to review the data sheet, to identify the pin out:

To simulate data corruption, I would temporarily ground pin 8 during the boot process, effectively causing data read operations to return incorrect data. The timing for this could be leeched from pin 16 (or really, 15), making this particular attack 100% reliable.

With this plan of attack, I wired up a transistor to the Flash circuit:

I then built a quick-and-dirty platform to demo the attack.

 always @(posedge youre_not_the_boss_of_me_in) // SCLK (or SI?)
 begin
   ctr <= ctr + 1;
   if (ctr == 3'b100)
   begin
     // every fifth clock: restart the counter and arm a two-cycle glitch
     ctr <= 0;
     glitch_status <= 2'b10;
   end
   if (glitch_status > 0) begin
     // drive the glitch output while the glitch is armed
     glitch_r <= 1'b1;
     glitch_status <= glitch_status - 1;
   end else begin
     glitch_r <= 1'b0;
   end
 end

As you can see, I opted for the “wide net” approach, figuring that if I put glitches all over the slave out line, the kernel would surely be corrupted. My initial attempts met with failure, with the device not producing any output on the UART interface, and indicating an error state via its front LEDs (disabling all but two, then turning them off).
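
To get a feel for how wide this net is, the Verilog above can be modelled in a few lines of Python. This is only a sketch under assumptions that are not in the original: the registers start at zero, ctr is three bits wide, one data bit is shifted out per clock, and a grounded output line always reads as a 0. The function names are made up for illustration.

```python
def glitch_trace(n_clocks):
    """Return glitch_r after each rising clock edge (registers assumed to start at 0)."""
    ctr, glitch_status, glitch_r = 0, 0, 0
    trace = []
    for _ in range(n_clocks):
        # Non-blocking assignments: next values are computed from current ones.
        next_ctr = (ctr + 1) & 0b111  # assumed 3-bit counter
        next_status = glitch_status
        if ctr == 0b100:
            next_ctr = 0
            next_status = 0b10
        if glitch_status > 0:
            next_glitch = 1
            next_status = glitch_status - 1  # the later assignment wins
        else:
            next_glitch = 0
        ctr, glitch_status, glitch_r = next_ctr, next_status, next_glitch
        trace.append(glitch_r)
    return trace


def corrupt(data, trace):
    """Force every data bit that coincides with a glitch to 0 (MSB first)."""
    out = bytearray()
    for i, byte in enumerate(data):
        for bit in range(8):
            if trace[i * 8 + bit]:
                byte &= ~(0x80 >> bit) & 0xFF
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    trace = glitch_trace(16)
    print(trace)
    print(corrupt(b"\xff\xff", trace).hex())
```

Under these assumptions, two clocks in every five carry a glitch, so very little of a read would come back intact.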

I hypothesized that this was due to the fault being injected too early into the Flash chip: that is, the device simply figured it wasn’t worth booting, instead of trying to read Flash and getting corrupted data. I quickly added a manual toggle switch, which I could hit while the device was booting, and was met with success:

From here, we’re able to dump (and modify!) the Jboot bootloader itself, otherwise inaccessible from the operating system:

The ability to write memory and the ability to jump to memory are both available – while there is no POC at the time of writing, this is clearly enough to compromise the bootloader and load whatever you want – in theory (with some effort, i.e. more than rewriting the Flash chip in this case), bypassing whatever controls the OS loads later.

You can download the final code used for the project here.

Note that in this particular example, all this is moot – you can enter the recovery bootloader by simply sending a newline during the boot process. Poor choice of target, but all I had at the time.

Posted in Bards, Computers, Jesting | Tagged , , , | Leave a comment

Reference – Setting up Digilent Arty with Vivado 2017.3

This post documents the process needed to program a Digilent Arty board, using the Vivado 2017.3 software with a Verilog “hello world” project. Unfortunately, the Vivado software is confusing at best, and the 2017 version contains a few key differences from the last guide I could find (a 2015 version – eb.dy.fi/2015/11/arty-hello-world/), which had me stuck for a while.

Installation

Firstly, you’ll need to download the Vivado software from the Xilinx website. The WebPack version suffices for this device. Once this is done, install the Board Support files from Digilent, which you can find here. Unzip the file, and copy zipfile\vivado-boards-master\new\board_files\* to vivado_path\data\boards\board_files.

Note that on purchasing an Arty kit from Digilent, you’ll be provided with a slip of paper with a license code for the Vivado software. This (to my understanding) provides you with support for the Design Edition of this software, for one year, locked to the device you’ve purchased – it is not necessary to make use of this at any point.

Creating a Project

Now, open Vivado, and click “Create Project”. Select the “RTL Project” option, and tick the “Do not specify sources at this time” box for simplicity’s sake. Keep clicking through the application until you get to select a “Default Part”, and click “Boards”:

You should see the “Arty” board – select this, and hit “Finish”.

Creating a Module

Our next step is to create our top module. In the Flow Navigator (the box on the left), click “Add Sources”, and select “Add or create design sources” when prompted. In the next window, click “Create File”. You should see something like this:

Leave this as Verilog for now, and name it what you want. Hit OK to create the file, then hit Finish. A new dialog box should pop up, allowing you to specify the IO parameters of the file. We want to define some switches for input, and some LEDs for output, as follows:

You may be thinking, “where do I assign IO ports” – don’t worry, we’ll come back to this later – just hit OK for now. Let’s create a very simple Verilog hello world example:

module top1(
 output [3:0] leds,
 input [3:0] switches
 );
 reg [3:0] led_status;
 assign leds = led_status;
 always @(*) begin
 led_status = switches;
 end
endmodule

Once this is entered, save your work, and use the Run Synthesis function on the left (or press F11). A box should pop up, asking you where to save the results – just leave it for now, this will save to the project directory you created right at the start. Once done, select the “Open Synthesized Design” option. You should see something like this:

Mapping IO Ports

Our next task is to tell the software which IO pins map to which ports. Go to the Window menu, and click the “I/O Ports” item. A new tab should appear in the status section at the bottom, and gain focus. It should look like this:

Here, we can define what the “leds” and “switches” actually map to on the board. For the pin names, we can refer to the Arty reference guide (shamelessly copied from http://eb.dy.fi/2015/11/arty-hello-world/):

You can download a more full-featured reference from the Digilent website here. It explains how to interact with the rest of the board.

Let’s map the buttons and the LEDs to their respective pins, and operate them at 3.3 V, as follows:

Once this is done, hit “Run Implementation” in the Flow Navigator box. When prompted, save the constraints file to whatever name you like.

Note that this process of adding constraints will make our synthesis outdated – when prompted, simply re-run the synthesis process. If all goes well, you should see the following dialog box:

At this point, you can generate the bit stream, and we will proceed to program the device.

Programming the Device

At this point, connect your device to your computer and open the Hardware Manager (last option in the Flow Navigator box). Select “Open Target”, then “Auto Connect”. The “Program Device” button should now be enabled. On clicking this, you’ll be prompted for a bit stream file.

Be careful at this step – this “Open File” dialog box will default to the last place you used it. This may be your last project, and if all your file names are “fuck.v” or “lol.v”, this can result in some sadness.

Select your bit stream file – this will be in projectdir/runs/impl_1 – then hit Program. Your LEDs should go off, but will light up once you hit the corresponding buttons.

Using a Clock Generator

One major change from the IceStorm development environment is the presence of the Clocking Wizard. This generates wrapper code to turn an on-board 100 MHz clock into whatever clock output you want – the maximum (according to Xilinx) is 1.6 GHz. It is possible to tap into the on-board clock signal directly via pin E3, but you can also use the Clocking Wizard IP to make a new clock, as follows.

To add a clock, in the Flow Navigator window, click the “IP Catalog” icon. Search for and select the “Clocking Wizard” – you should see something like this:

On double clicking, you’ll see the clock configuration window, as follows (for me, the “Show disabled ports” was ticked by default – but this is irrelevant for now):

In this example, we’ll be using the system 100 MHz clock to drive the pin. Set the CLK_IN1 input to “sys clock”.

At this point, you may be tempted to link the “EXT_RESET_IN” input to “reset”: but don’t. If you do, the clock will only run while the reset button is pressed, so you’ll need to hold down the reset button on the board to operate the clock.

Next, we’ll go to the “Output Clocks” option. Here, change the Requested output frequency to what you want:

Then, hit OK. You should see something like this:

Here, leave everything and hit Generate. This won’t actually tie up anything in your design, but will generate instantiation code, which lets you “use” the clock: that is, this step generates a Verilog module which you can instantiate and feed an actual clock into. To access it, navigate back to the Sources view, and expand the new node in the source tree (by default, clk_wiz_0). Open the new Verilog file, and you should see this:

Copy the code at the bottom – think of this as a black-box module you can now use. Now, instantiate this in your code as follows:

module clktest(
 input clk_in1,
 input reset,
 output [3:0] leds
 );
 
 wire lock_led;
 reg [32:1] q;
 reg [3:0] leds_reg;
 wire clk_out1;
 
 
 clk_wiz_0 instance_name
 (
 .clk_out1(clk_out1),
 .reset(~reset),
 .locked(lock_led),
 .clk_in1(clk_in1));
 
 assign leds = leds_reg;
 
 initial begin
 q = 0;
 leds_reg = 4'b0;
 end

always @(posedge clk_out1)
 begin
 q <= q + 1;
 if (q[26] == 1'b1)
 begin
 leds_reg <= leds_reg + q[26];
 q <= 32'b0;
 end
 end
endmodule

Now, follow the above guide to create a bitstream and program your device, and you should see blink.v come to life.
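
As a sanity check on the blink rate: q is declared as [32:1], so q[26] is the 26th bit of the register and carries a weight of 2^25 rather than 2^26. The Python below is a rough model of that counter. The function names are mine, and the 100 MHz figure is only an assumption; substitute whatever output frequency you requested in the wizard.

```python
def cycles_per_led_step(tested_bit=26, lsb_index=1):
    """Clock edges between LED increments, in closed form."""
    # q counts up to the weight of the tested bit, then one more edge
    # sees that bit set, bumps leds_reg and clears q.
    return (1 << (tested_bit - lsb_index)) + 1


def simulate_cycles_per_led_step(tested_bit, lsb_index=1):
    """Same figure by brute force; only sensible for small tested_bit values."""
    weight = 1 << (tested_bit - lsb_index)
    q = 0
    edges = 0
    while True:
        edges += 1
        if q & weight:
            return edges
        q += 1


if __name__ == "__main__":
    f_clk_hz = 100e6  # assumed clk_out1 frequency
    cycles = cycles_per_led_step()
    print(cycles, "cycles per step,", cycles / f_clk_hz, "seconds at the assumed clock")
```

The brute-force version exists only to check the closed form on small widths.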

Simulating a Design

It is also possible to use the software to simulate a design. In fairness – this is a big improvement over the Icarus Verilog / GTKWave workflow for the IceStorm environment. To set up a simulation, right click “Simulation Sources” in the Sources window, and add a new simulation source file. In this example, we can use the following simple simulation logic:

module sim_1(
 );
 reg [3:0] buttons;
 wire [3:0] e_leds;
 
 top1 test_target (e_leds,buttons);

initial
 begin
 #50
 buttons <= 4'b1010;
 #100
 buttons <= 4'b1011; 
 end
endmodule

This will feed a pattern of “1010” to the buttons after 50 time units, then a pattern of “1011” a further 100 time units later (i.e. at time 150). Then, click “Run Simulation” in the Flow Navigator area, and click “Run Behavioral Simulation”. You should see something like this:

Note that simulating a clock is a little different – while it is possible to right click a clock input signal and use the “Force Clock” option, it may be more logical to write a test bench which simply flips the clock quickly (you can feed this into a PLL / clockgen):

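module clktest_tb(

);
 
 reg clk_in1_t;
 reg reset_t;
 wire [3:0] leds_t;
 
 initial begin
 clk_in1_t = 1'b1;
 // clktest inverts reset before passing it to the clock wizard,
 // so holding it high keeps the clock out of reset
 reset_t = 1'b1;
 end
 
 clktest c (clk_in1_t, reset_t, leds_t);
 // toggle the clock every time unit
 always
 begin
 #1 clk_in1_t = ~clk_in1_t;
 end
endmodule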

I hope this post helps you get started with this device. I’m learning this as I go, so if I’ve made a mistake, or designed something in an ineffective manner, please let me know via the comments section 🙂


Writeup – Flare-on Challenge 2017 notepad.exe

Over the past few weeks, I devoted intermittent chunks of time to the 2017 Flare-on Challenge. During the time allocated, I was able to solve 4 challenges (maybe? I got to pewpewboat.exe). I will present my writeup for one of the simpler challenges below.

notepad.exe

This challenge was presented as a Windows executable, which looked similar to notepad.exe. You can download the binary here.

Upon initial inspection in IDA, we can see that this is glaringly different from a regular notepad executable. We can begin by sifting some order from the madness: by converting the initial stack initialization to strings, we can begin to see a few clues:

We can also note an interesting pattern in the initial calls: that one function (let’s call it Function A) is called once, then Function B is called a number of times. Function B seems to be passed a hash of some manner of magic value, and the results are stored on the stack:

At this point, I was suspicious – this smelled awfully like old-school function lookups: and this would explain the fact that other functions in the application appeared to be passed a pointer to the beginning of the “function table”. We can confirm this with some quick analysis in WinDbg. Firstly, we can confirm that sub_10153D0 at 0x01013c59 returns the base address of kernel32.dll:

We then test sub_1015310, and can confirm that it is indeed resolving functions:

We can then proceed down the list, and make our own “function table”, corresponding to what’s loaded at runtime, to help us disassemble the rest of the challenge. We know that the order of functions cannot change, due to static magic numbers being used to reference the functions.
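
For readers who haven’t met this trick before, here is a rough Python sketch of hash-based function lookup. The hash below is the ROR-13 scheme commonly seen in shellcode; it is a stand-in chosen for illustration, not the hash this binary uses, and the export list is a hypothetical handful of kernel32 names rather than a real export table.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def name_hash(name):
    """ROR-13 additive hash over the ASCII bytes of an export name (stand-in hash)."""
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, 13) + ch) & 0xFFFFFFFF
    return h


def resolve(exports, wanted_hash):
    """Walk the export names and return the first one whose hash matches."""
    for name in exports:
        if name_hash(name) == wanted_hash:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical export list, for illustration only.
    exports = ["CreateFileA", "ReadFile", "WriteFile", "CloseHandle"]
    magic = name_hash("WriteFile")
    print(hex(magic), "->", resolve(exports, magic))
```

The point is that only the magic numbers appear in the binary, never the function names, which is why the strings view is so unhelpful until the table is rebuilt by hand.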

From here, this looks like a stock standard file infecting replicator – the file infection code lies at 0x01014e20: but we note an interesting detour to the side at 0x010146c0. This function plays some fun games with the timestamp of each file:

By following the data trail (i.e. manual WinDbg), we know that this function compares *both* the timestamp in the FileHeader of the current process, as well as the “file to infect” – if they both match a magic value, some data is read from the target file, and written to target.bin (certainly an interesting way to hide the key).

There are 5 such values, and a sixth to trigger the “win” message box, decrypting something with a key from key.bin and displaying it (presumably, the flag). I took CFF explorer and dutifully created 6 copies of notepad.exe, with one of the magic values each in the FileHeader->TimeDateStamp field. I then downloaded the Flare-on Challenge 2016 binaries, and placed them in %USERPROFILE%\flareon2016challenge. Running the binaries in order, and then running the final “win” binary, produces the flag:

Success.
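
I did the timestamp patching by hand in CFF Explorer, but the same edit is easy to script. The sketch below relies only on the standard PE layout: e_lfanew lives at offset 0x3C, and TimeDateStamp sits 8 bytes after the start of the “PE\0\0” signature. The helper names, the synthetic header builder and the 0xDEADBEEF value are placeholders of mine, not the challenge’s magic values.

```python
import struct


def timestamp_offset(pe):
    """Locate FileHeader.TimeDateStamp inside a PE image."""
    e_lfanew = struct.unpack_from("<I", pe, 0x3C)[0]
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    # Skip the 4-byte signature, then Machine (2) and NumberOfSections (2).
    return e_lfanew + 4 + 4


def get_timestamp(pe):
    return struct.unpack_from("<I", pe, timestamp_offset(pe))[0]


def set_timestamp(pe, value):
    """Return a copy of the image with TimeDateStamp replaced."""
    patched = bytearray(pe)
    struct.pack_into("<I", patched, timestamp_offset(pe), value)
    return bytes(patched)


def fake_pe(timestamp):
    """Build a tiny synthetic header, just enough to exercise the helpers."""
    buf = bytearray(0x100)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)
    buf[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<HHI", buf, 0x84, 0x14C, 1, timestamp)
    return bytes(buf)


if __name__ == "__main__":
    image = fake_pe(0x11111111)
    patched = set_timestamp(image, 0xDEADBEEF)  # placeholder, not a real magic value
    print(hex(get_timestamp(image)), "->", hex(get_timestamp(patched)))
```

Against a real copy of notepad.exe, you would read the file in binary mode, call set_timestamp once per magic value, and write each result out under a new name.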

All in all, this was an enjoyable, yet humbling experience. I am thankful for the experience and the lesson – that not having time is no excuse, that it is up to us to make time for what we hold dear (in this case, trying to git gud). Thanks to the FireEye folks for putting on this challenge, year after year – I look forward to exceeding my progress next year.

There will likely be no new post this weekend, due to attending Ruxcon (though I’m chipping away at the Vivado command-line tools – which is probably worth a post on its own).


FPGA-Driven Glitching (Regular and Crowbar) with Phase-Locked Loop

I spent the majority of this weekend working on improving my knowledge of hardware security, focussing on my experiments with glitching. In my previous posts on the topic, I had investigated how one could use two Arduinos to generate glitches against each other, by cutting the power supply.

In this post, I will cover how to do the same with an FPGA, and how to use a crowbar circuit to generate an alternative style of glitch, which is advantageous for attacking production devices where we cannot (or do not want to) modify the target hardware.

iCE40 HX8K

In these experiments, I will be using the iCE40 HX8K breakout board to drive my glitches:

This board can be purchased for just under $65 each from Digikey at the time of writing – it comes with an inbuilt programmer, and can be programmed with the lightweight, vim-friendly Project IceStorm toolchain. In the above photo, this device is configured to be programmed into volatile memory only (J6 jumpers horizontal, no jumper on J7).

Crowbar Circuit Glitching

We will first create a crowbar circuit to generate glitches against an Arduino board (Arduino Uno-compatible target). Compared to previous glitches where we simply broke the power circuit on a glitch trigger, this circuit will short out VCC to GND. In diagram form:

I couldn’t find the transistor symbol on draw.io, so the little triangles are transistors.

In practice, we can implement this by removing the microcontroller from the Arduino board and inserting it into a breadboard, connecting a minimal set of pins back to the board, and then using a transistor across the VCC and GND lines, as per the following diagram:

All the blue pins should be wired up to their corresponding sockets on the host, with a transistor (I used a 2N7000 I found on the floor) creating a short circuit when a trigger signal is high. If all goes well, you should be able to switch on the board and have it work as normal (mostly). Our test code is a simple counting loop:

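// Assumed wrapper: the original snippet showed only the loop body.
// The baud rate is a placeholder; use whatever your serial monitor expects.
void setup()
{
  Serial.begin(9600);
}

void loop()
{
  int x;
  int y;
  double i = 0;
  while(1)
  {
    for(x = 0;x < 500;x++)
    {
      for(y = 0;y < 500;y++)
      {
        i += 1;
      }
    }
    // a glitched run shows up as an unexpected count here
    Serial.println(i,DEC);
  }
}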

The final result should resemble something like this:

Here, the purple and blue wires are VCC, and the orange and grey wires are GND. The red wire is the glitch trigger (and black is common ground for the FPGA): when it’s raised past the transistor’s threshold voltage, the transistor will begin conducting, shorting out VCC and GND. In its default state (with the transistor off), the Arduino board should function as normal.

Increasing Clock Speed with PLLs

The next step is to build the control code for this. The control code is simple – to generate continuous glitches, we perform an infinite counting loop – once we reach a sufficiently large number, we pull an IO pin high (to switch on the transistor) for a number of clock cycles, and then we pull the pin low and reset the counting loop until the next glitch.

This code is well suited for an FPGA, but the onboard clock of our board only runs at 12 MHz. We can increase the clock speed dramatically by using a Phase Locked Loop (PLL). In a nutshell, a PLL is a way to generate a signal at a fixed ratio to the input clock. The IceStorm compiler toolkit provides a macro (SB_PLL40_CORE) to configure a PLL in the FPGA, and documentation for the iCE40 FPGA itself describes the parameters (from Page 7 of “iCE40 sysCLOCK PLL Design and Usage Guide”).

We can then construct our PLL initialization macro (by “construct”, I mean “copy from the Internet and adjust as necessary”):

SB_PLL40_CORE #(.FEEDBACK_PATH("SIMPLE"),
 .PLLOUT_SELECT("GENCLK"),
 .DIVR(4'b0000),
 //.DIVF(7'b0001000), //25Mhz
 .DIVF(7'b0111111), // 48/96MHz
 //.DIVQ(3'b001), //25Mhz
 // .DIVQ(3'b100), //48Mhz
 .DIVQ(3'b011), //96Mhz
 .FILTER_RANGE(3'b001) // wfm without PLL is broken
 ) uut (
 .REFERENCECLK(clk),
 .PLLOUTCORE(clk_25), // named clk_25, but runs at 96MHz with the settings above
 // .LOCK(P16),
 .RESETB(1'b1),
 .BYPASS(1'b0)
 );
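
To check which divider values give which frequency, the arithmetic can be done in a couple of lines of Python. As I understand the Lattice guide, with FEEDBACK_PATH set to SIMPLE the output is F_ref × (DIVF + 1) / (2^DIVQ × (DIVR + 1)). The function name is mine; treat this as a sketch for sanity-checking settings rather than a replacement for the vendor tools.

```python
def pll_out_hz(f_ref_hz, divr, divf, divq):
    """Output frequency of the iCE40 PLL with FEEDBACK_PATH = "SIMPLE"."""
    return f_ref_hz * (divf + 1) / ((2 ** divq) * (divr + 1))


if __name__ == "__main__":
    f_ref = 12e6  # onboard oscillator of the HX8K breakout board
    for label, divq in (("DIVQ=3'b011", 0b011), ("DIVQ=3'b100", 0b100)):
        mhz = pll_out_hz(f_ref, 0b0000, 0b0111111, divq) / 1e6
        print(label, "->", mhz, "MHz")
```

With DIVR = 0 and DIVF = 63, the two DIVQ settings from the snippet come out at 96 MHz and 48 MHz, matching the comments.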

The actual glitching loop is a trivial counting loop with adjustable glitch width and toggling LED patterns to indicate when glitches occur, as follows:

always @(posedge clk_25) begin
  if (rst) begin
    glitch_reg <= 0;
    ctr_p <= 0;
    ctr_q <= 0;
  end else begin
    ctr_p <= ctr_p + 1;
    if (ctr_p == 32'h00800000) begin
      ctr_q <= 32'h00000fff; // target at 12mhz: how long does this need to be?
      if (blinkptn == 8'b10101010) begin
        blinkptn <= 8'b11110000;
      end else begin
        blinkptn <= 8'b10101010;
      end
      ctr_p <= 0;
    end
    if (ctr_q > 0) begin
      glitch_reg <= 1;
      ctr_q <= ctr_q - 1;
    end else begin
      glitch_reg <= 0;
    end
  end
end
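
To turn those two constants into real time, here is a small Python model of the loop. It assumes the PLL is producing 96 MHz (as the comments in the PLL snippet suggest) and that the counters start from zero after reset; the function names are my own.

```python
def simulate_glitch(period_count, width_count, n_clocks):
    """Return glitch_reg after each rising clock edge, starting from reset."""
    ctr_p, ctr_q, glitch = 0, 0, 0
    trace = []
    for _ in range(n_clocks):
        # Non-blocking assignments: next values are computed from current ones.
        next_p = ctr_p + 1
        next_q = ctr_q
        if ctr_p == period_count:
            next_q = width_count
            next_p = 0
        if ctr_q > 0:
            next_glitch = 1
            next_q = ctr_q - 1  # the later assignment wins
        else:
            next_glitch = 0
        ctr_p, ctr_q, glitch = next_p, next_q, next_glitch
        trace.append(glitch)
    return trace


def glitch_timing(period_count, width_count, f_clk_hz):
    """Return (seconds between glitches, seconds per glitch)."""
    return (period_count + 1) / f_clk_hz, width_count / f_clk_hz


if __name__ == "__main__":
    period_s, width_s = glitch_timing(0x00800000, 0x00000FFF, 96e6)  # 96 MHz assumed
    print("one glitch every", period_s, "s, each lasting", width_s, "s")
```

Running simulate_glitch with small numbers (say a period of 10 and a width of 3) is a quick way to convince yourself that the pin stays high for exactly width_count cycles.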

We then program the FPGA, power on the contraption, and if all goes well, you should see glitches appear reliably (you’ll need to adjust to find the correct glitch width, and change the frequency to taste).

(My attempts in) Glitching the ESP8266…

In our previous experiments, I had used an ESP8266 device (a LoLin NodeMCU from the BSides Canberra badge – thanks Silvio!) to drive glitches against an Arduino target. I wanted to see if I could use a crowbar circuit to glitch this target. I first identified the VCC and GND rails, and tried the same circuit as above:

I also tried glitching the voltage regulator, shorting out the output to the ground pin, to little success. My next step was to remove the metal casing: 

Underneath, we can see an ESP8266EX MCU, as well as a NAND Flash, an oscillator crystal (26MHz) and a bunch of decoupling ~things~. At this point, I figured if I removed all the decoupling components, I’d be able to impact the power supply more meaningfully, but this was not successful either.

Note that one of these is not a decoupling component, and actually connects to the reset pin – I’ll leave identifying which one as an exercise for the reader.

After a little bit of scratching my head, I went back and re-read Colin O’Flynn’s excellent paper on crowbar circuits (https://eprint.iacr.org/2016/810.pdf), and my eye was drawn to an interesting comment:

 This particular MOSFET has a RDS(ON) of 0.035Ω, meaning it would
 be less effective against low-impedance power rails likely to be
 found on high-speed processor boards.

Going back to the datasheet of the 2N7000 transistor, I noticed a potentially fatal flaw:

It is my conjecture that this high of a resistance prevents a crowbar glitch from being effective against the ESP8266, as there is a much lower resistance (well, impedance) path through the MCU – therefore, even when VCC is shorted with GND, enough current flows through the MCU to make this not matter.
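
One crude way to picture this is as a resistive divider: while the transistor conducts, the rail sits at roughly V × R_on / (R_source + R_on), where R_source is the impedance of whatever is feeding the rail. The Python below is a back-of-the-envelope sketch only. The 0.5 Ω source impedance and the 5 Ω “small-signal MOSFET” figure are made-up illustrative values, not measurements or datasheet numbers; the 0.035 Ω figure is the one quoted from the paper above.

```python
def crowbar_rail_voltage(v_supply, r_source, r_on):
    """Rail voltage while the crowbar transistor conducts (simple resistive divider)."""
    return v_supply * r_on / (r_source + r_on)


if __name__ == "__main__":
    v_supply = 3.3   # volts
    r_source = 0.5   # ohms, hypothetical supply impedance
    for label, r_on in (("0.035 ohm MOSFET", 0.035), ("5 ohm MOSFET (assumed)", 5.0)):
        print(label, "->", crowbar_rail_voltage(v_supply, r_source, r_on), "V")
```

With these placeholder numbers, the low-resistance part drags the rail most of the way to ground, while the high-resistance part barely dents it, which is at least consistent with what I saw.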

Furthermore, I made no attempt to solder directly to the pins of the MCU, as an error-isolating measure, primarily due to laziness. I will try this in future.

When all else fails!

At this point, I wanted to experiment with regular MITM-style glitching against the ESP8266, partly to confirm my own suspicion. The new setup was much simpler:

Here, I used an Arduino board to supply 3.3V directly to a 3.3V input on the target, and used a transistor to intercept the ground rail. With some slight modifications to the glitching code, I could reliably produce glitches here as well:

Strangely, this experiment produced far fewer arithmetic errors than a similar experiment against an Arduino. I’m not sure why this might be (my glitching is not precise enough? I need shorter glitch pulses?), but I will experiment with this further when time permits.

Summary

In this article, we covered the basic theory of a crowbar circuit for glitching, and demonstrated it practically against two targets: one successful (Arduino / Atmega328p) and one unsuccessful (BSides Badge / ESP8266EX). This has left us with several points for further investigation, which I am most excited to continue working on.

You can download the control code (as well as board configuration file, and makefile) here.

Bonus!

Next week, I am running a hardware workshop, in conjunction with our friends from Sectalks – it is bitterly sobering that there is so little activity on this topic in the local security community that I, of all people, feel comfortable standing up in front of a crowd and talking about it.

Nevertheless, building community and capability starts somewhere, and here, it starts with a slide pack of destiny:

 


Writeup – Backdoor Pi (Kaspersky Labs CTF)

This weekend, I spent some time participating in the Kaspersky Labs CTF. I was able to solve a few challenges in the time allocated – it is a humbling reminder of how quickly one’s mental acuity can dull, if not exercised relentlessly (or perhaps it’s just old age catching up to me).

From this CTF, I will present my solution to the Backdoor Pi challenge below.

Backdoor Pi

This challenge was presented as a zip file, which you can download here (warning: large file (approx. 96MB)). The challenge claims that this is “parts of the filesystem” from a Raspberry Pi SD Card, and the title implies we are looking for a backdoor.

We can start by extracting the archive into our filesystem. A little manual delving, and we quickly find a suspicious file:

The “file” command says that this is a compiled Python binary, and a quick “strings” shows something about a “fl4g”: a smart way to evade a simple strings check. We can use the “uncompyle” utility, an essential part of any reverse engineering toolkit, to grab source code from this file. From here, we can infer the algorithm:

We can identify the “user” input from /etc/passwd (“b4ckd00r_us3r”), but there is no indication of what the pincode is: no matter, here we can deploy brute force, to quickly identify the pincode, and thus, the flag (“b4ckd00r_us3r:12171337”).

You can download the brute force script here.
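
For a rough idea of the shape of such a script, here is a minimal Python sketch. The backdoor’s real check is in the decompiled file and is not reproduced here; the check function below is a stand-in of my own (a SHA-256 comparison over “user:pin”), as is the four-digit demo target. The recovered pincode is eight digits long, so a real run would use digits=8.

```python
import hashlib
from itertools import product


def check(user, pin, target_digest):
    """Stand-in for the backdoor's test; swap in the decompiled logic here."""
    return hashlib.sha256(f"{user}:{pin}".encode()).hexdigest() == target_digest


def brute_force(user, target_digest, digits):
    """Try every numeric pincode of the given length, in order."""
    for candidate in product("0123456789", repeat=digits):
        pin = "".join(candidate)
        if check(user, pin, target_digest):
            return pin
    return None


if __name__ == "__main__":
    user = "b4ckd00r_us3r"
    # Made-up four-digit target so the demo finishes instantly.
    target = hashlib.sha256(f"{user}:1337".encode()).hexdigest()
    print(brute_force(user, target, digits=4))
```

An eight-digit numeric space is only 10^8 candidates, which is why brute force is a perfectly reasonable approach here.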

Thanks to the Kaspersky Labs team, who put together this CTF – I enjoyed the time I spent playing it (though I am frustrated by my inability to solve the more difficult forensics problems in time – so close, but not close enough), and I look forward to playing again when the opportunity next presents itself.
