Writeups – nothing, reversing (TWCTF)

This weekend, I took part in Tokyo Westerns CTF 2020. I only managed a few hours on Sunday night, but I was able to solve 2 (warmup) challengs. The writeups are below, for future reference:

Nothing More to Say

This challenge was presented as a 64-bit Linux ELF file and corresponding source code. You can download this here. The challenge text describes it as a format string vulnerabiilty.

Initial inspection shows that the challenge is reasonably simple, but established some constraints:

  • The input is constrained to 0x100 characters. Minus 3*8 characters (for offsets), then minus 3*7 (for “%..$hhn”) means we have an absolute maximum of 0xd3 characters.
  • read() is the input mechanism, so null bytes are OK
  • Input is passed to printf directly.
  • We have unlimited attempts to exploit
  • execstack is enabled – maybe this is an avenue

Initially, I leaked the remote libc by printing the GOT. Using an online libc database, we quickly find this is an Ubuntu libc, with three magic ROP gadgets – unfortunately, overwriting printf with these doesn’t seem to work (constraints not satisfied?)

Unfortunately, on this version of libc, system’s address ends with 0xe0, placing this out of reach of our format string – but on reversing the libc file, we note that system simply proxies the call to 0x4EF50:

While 0xEF is out of reach of our format string, libc base randomisation might cause an overflow to the next byte (depending on where libc is loaded at). Therefore, we can hope the second and third least significant bytes on the offset of libc+0x4ef50 are lower than 0xd3, allowing us to overwrite printf in a single go.

A few tries later, and success:

The resulting script is here. Life is made significantly easier by the fact that null bytes are accepted – we can simply stash our %n pointers at the end of our input, and reference them using static / known indexes (they’re always 33/34/35 – we can pad the input’s length to put them there.

It’s been so long since I’ve seen a format string exploit, it’s good to practice this again, even if it’s a bit unrealistic.

Reversing iS Amazing

This challenge was presented as a Linux binary, which you can download here. Initial reverse engineering shows that this uses the OpenSSL Public Key API to sign the user’s input, then compares the signed input to a static block:

The simplest approach (IMO) is to extract the key and static block from the binary during execution (breaking at BIO_new_mem_buf and memcmp), save it to file and

We can use gdb’s “dump” command to extract memory:

A little reading of the OpenSSL man pages later, and we have a stub executable that can read the private key and signed block from memory, and use RSA_public_decrypt to recover the message digest into readable form:

I also attempted the web “urlcheck” challenge for fun, but wasn’t able to solve it in the time allocated.

Thankyou to the Tokyo Westerns team for putting together another great CTF, I look forward to playing again next year.

Posted in Bards, Computers, Jesting | Leave a comment

Writeups – roppity, slithery (CSAW 2020)

This weekend, I participated briefly in the CSAW 2020 CTF. I solved a number of challenges, and below are the writeups for two (fairly straightforward, unfortunately) challenges, and one I missed because I did something incredibly dumb in the exploitation process.

roppity

This challenge was presented as a 64-bit Linux binary, which you can download here.

Initial reverse engineering showed this to be a straightforward two-stage ROP challenge – leaking the location of something in libc, then directly calling one_gadget. A puts call at 0x4005F5 can be exploited, leaking the GOT – and then helpfully, this continues onto a gets call at 0x400606.

We can use ropper and one_gadget to help us construct the rop chain, and moments later, we have a working exploit:

slithery

This challenge was presented as a host / port combination, which hosted a listening service, purporting to be a Python sandbox, with source code provided. The sandbox can be found here.

Initial exploration showed that we actually have a working exec primitive, but didn’t consistently work. That said, I could print items, and the blacklist didn’t cover itself (i.e. didn’t cover “blacklist”).

The following silly trick worked to get us out of the sandbox, by deleting the blacklist, importing the “os” module then system’ing bash:

It looks like the intended solution was something different, triggering some kind of exception and hitting a second sandbox (not provided):

Thankfully, we never had to deal with this.

grid

This challenge was presented as a 64-bit Linux binary, which you can download here.

Initial reverse engineering showed that this was a mess of C++. The application seemed to allow manipulation of a “grid”, which appeared to be implemented as a linked list, and then “copied” to the stack when the “d” command was used to display the grid:

Using this as an exploit primitive, we can write arbitrary (within reason: cin dies on some characters) content to the stack, including ROP gadgets (here: a call to cout trying to point to the GOT, but notice I can’t enter the full address of the GOT onto the stack):

I spent a little while searching for a GOT leak, but I didn’t realize: you could leak libc content simply by hitting “d” without overwriting anything.

FML.

With something in libc, it’s trivial to then overwrite a return address to hit a one_gadget special and get a shell. No flag this time, better luck next time 🙂

Thanks to the CSAW team for putting together this CTF – I look forward to playing again next year.

Posted in Bards, Computers, Jesting | Leave a comment

EMFI Privilege Escalation on Raspberry Pi

Over the past week, I have been exploring the concept of extending the KERNELFAULT attack to use EMFI, with some success, in the context of setting up further attacks against other targets.

The key questions to answer for me were as follows:

  • By nature, the transient EM pulse must be an order of magnitude shorter than the crowbar glitch employed previously. What does this look like in practice?
  • Is the fault model consistent – are we still encountering the same kind of program corruption as we experienced previously, or are there new faults to deal with?
  • Where are the most vulnerable locations above (or below!) the Broadcom SoC? What effect does different coil configuration have?

This exercised used a ChipSHOUTER and a ChipWhisperer-Lite running a simple counting loop as a timer. The same trigger program was used as the KERNELFAULT experiments – no modification was needed.

An afternoon of brute forcing led us to a successful result:

Of note:

  • The pulse length is significantly shorter than the previous use case, indicating (as expected) that the impact of decoupling capacitors is greatly reduced or removed in this scenario.
  • The fault model was consistent, the same type of crashes was occurring whether I’m VCC glitching or using EMFI.

From a positioning perspective, I noted a different result to Trouchkine et al’s paper (here):

This could be due to a difference in coil selection / manufacturing imperfection in the coil, or maybe it was wind pushing the board slightly. After watching some of Colin O’flynn’s work, I’ve purchased a Panavise clamp to hold the target in place, so I can open a window and not impact the results in future.

No testing has yet been done on the effect of coil choice, but I expect this to have a similar impact to moving the position of the coil: the shape and direction of the generated magnetic field would differ, so naturally, the results would too.

Some improvements were made to the classification script, which can help us identify where the successful glitches are occuring (red diamond marks indicating where code is executing down the commit_creds path – there’s some missing “bad crashes” here to remove clutter):

Interestingly, I was able to force execution down the commit_creds path when glitching using an ext_offset of approximately 200 (same as the VCC glitching experiment which previously succeeded), but was unable to wrangle this into a successful privilege escalation (crashes in commit_creds: I suspect this timing represents a before-and-after of the ns_capable_setid call in sys_setresuid.

This experiment required minimal code changes, everything is in Github here in the experiments/pi folder.

Much, much more work is to be done here, and I look forward to extending the attack against other complex SoC’s.

Posted in Bards, Computers, Jesting | Leave a comment

Writeups – Basics, Beginner (Google CTF 2020)

Last weekend, I spent a little bit of time on Google CTF. I solved 2 challenges – just goes to show how out of practice I am, use it or lose it! – and I’ll present the writeups below, for reference.

Beginner

This challenge is presented as a Linux ELF file. You can download this here.

This file requests a flag from stdin, and verifies it using some unusual instructions:

Due to my unbelievable penchant for laziness (on the subject of which – I recently read that surely we should measure success by our ability to be idle in this age of plenty, with which I strongly agree), I decided to solve this using symbolic execution. Unfortunately, it turns out angr had changed since I last used it, so a little bit of Google later, and we’ve got a new working “easy brute force” script, which you can download here.

Of note, there are multiple solutions to this challenge, but only one viable flag – flag input constraints were used to limit the search space.

A few moments of Python later, and we have a viable flag:

Basics

This challenge was presented as a Verilator “pack” of a SystemVerilog file and a C++ test bench. You can download this here.

Upon initial inspection, the critical “check” function is as follows:

wire [55:0] magic = {
{memory[0], memory[5]},
{memory[6], memory[2]},
{memory[4], memory[3]},
{memory[7], memory[1]}
};

wire [55:0] kittens = { magic[9:0], magic[41:22], magic[21:10], magic[55:42] };
assign open_safe = kittens == 56'd3008192072309708;

The challenge seems simple enough, but the devil is in the details, and is annoying enough to warrant some thought for how to correctly reverse this.

I initially thought to simply modify the test bench to print the value of kittens on each clock cycle, but Verilator had optimised this out, and I couldn’t get this to work trivially. Instead, I ended up simply brute forcing the bit associations between input and output: once I knew which bits in the input influenced which bits in the output, it’s a trivial exercise to construct the input from the desired flag value:

You can download the brute forcer here.

Thankyou to the Google team for putting together a fantastic CTF as always – a shame I only spent the time/effort to solve 2 challenges (but it’s good to be dipping my toes back into the water again).

Posted in Bards, Computers, Jesting | Leave a comment

Side-Channel Analysis of Keeloq

Over the past few days, I have spent some time reviewing side-channel analysis of the Keeloq cipher. This post documents the results of this review.

Background

This work is not “new” per se: various forms of side-channel attack against Keeloq have been carried out in the past. Some public resources around this are below:

In addition, the 25C3 presentation on breaking Keeloq is worth watching, as is (albeit slightly beyond the scope of this post) this presentation at RuhrSec 2017.

To begin with, let’s take an overview of the Keeloq algorithm:

Put simply, Keeloq “mixes in” a single bit of key material from each round – therefore, in theory, we should be able to use this as a distinguisher for a simple DPA attack, as set forth in marc-invalid’s work.

Initial Analysis

To start, I took an HCS301-based entry system from Aliexpress. Opening a remote, I was able to locate the HCS301 chip, which I could extract with a little bit of hot air. I placed a 10 Ohm shunt in the VCC line, wrote a simple AVR trigger, and reviewed my initial power trace:

On the trace, some “blocks” are clearly visible, including one in the middle which appears to be a cryptographic block. The RF synchronisation pulses can be seen on the right: unfortunately, this revealed nothing. A tip from the NewAE forums pointed me to using a larger shunt resistor, which together with using the LC filter from the ChipWhisperer’s CW308 board did the trick, and gave me clean traces:

This looks much more like the power trace in the 2008 paper, and from here, I can make a guess at the location of the cryptographic block. Note that since 2008, this chip completes the Keeloq encryption a bit quicker, and is finished in <20ms from the trigger point, suggesting some implementation change.

Onwards, to Key Recovery!

I initially attempted to go down the marc-invalid path of identifying the ciphertext output via a TVLA test, but this didn’t produce the expected correlation peaks (TVLA against any bit of ciphertext output):

No matter – we can identify that our power trace has the same general layout as the 2008 paper, albeit with some timing differences, so let’s assume that the encryption operation is in the same general place:

We begin our exploration by performing a CPA attack against the Hamming distance of the last intermediate value of Keeloq to the final value, in the final round of decryption with a guessed key but (LSB of the key guess below, turns out setting the maximum key guess value to 2 breaks reporting functionality elsewhere):

Unfortunately,the raw difference in correlation value isn’t that huge, and this introduces the additional complexity of having to backtrack when a bit is guessed incorrectly (in the 2008 paper, this is when all candidates for the next key guessing round show low correlation values). However, we can note that given the round-based nature of Keeloq, the Hamming distance between a “guessed” intermediate value and the actual intermediate value should broaden: that is, the error from a single incorrect guessed bit of Keeloq should cascade through subsequent rounds, resulting in a different intermediate values (and different Hamming distance in the state register) over the next few rounds.

In truth, the actual difference is tiny, but we should be able to correlate this with known ciphertexts and known intermediate values to make this difference more significant.

We can capitalise on this by grouping Keeloq rounds into blocks of 8 bits of each, and perform a correlation analysis, correctly picking the actual key against a reprogrammed test chip correctly, with a stronger degree of confidence:

This result is backed up by continued success on the second round:

We can further confirm this across a variety of independent traces, further rounds, against the VCC line and across a ground shunt, etc.

Interestingly, it is possible to get some TVLA result on intermediate values that lends credibility (imo) to the hypothesis that the round-to-round Hamming distance “cascades” with previous rounds.

Firstly, a TVLA with incorrect partitioning (partitioned by intermediate value of first key byte, all bits incorrect):

And now, with four correct bits:

And now with correct partitioning:

Note that this doesn’t actually identify the specific leakage point (or rather, indicates that there is a discernable, statistically significant difference of means across a large part of a trace), but can give some indicating by comparison whether a Keeloq bit is correct. Frustratingly, DPA (using the same partitioning code as above) doesn’t seem to work:

Further Work

The work is not yet complete, and there are still tooling improvements to make. This technique fails eventually, due to clock drift – the clock is inconsistent within a single trace, as well as between traces.

The marc-invalid fork of ChipWhisperer provides some clues to implementation, allowing the user to “slice” each trace and fit them onto a new trace, aligning each “slice” with the corresponding slices in other traces via their peak or minimum points – it’s my intention to implement something similar, and experiment with various peak extraction techniques.

There is also the matter of why DPA doesn’t work (and “fails hard”, not just ghost peaks). I wonder if it’s got to do with the fact that DPA is instantaneous (and the “raw” maximum difference of means is larger for an incorrect guess, than the correct guess).

All the code used above can be found here, and sample traces for Keeloq here (known key starts with 0x18). Note that you’ll need to edit support/attacks/Keeloq.py to enter fragments of the known key to replicate the above.

Posted in Bards, Computers, Jesting | Leave a comment