Writeup – Old Crypto Server (AeroCTF)

This weekend, I spent a little bit of time doing AeroCTF, but couldn’t maintain focus due to the sheer number of other active projects. I solved one challenge in the time allocated, a fairly straightforward AES padding game that you can download here.

Astoundingly, I have no writeups of any challenges like this, so here we go.

The vulnerability in the system is straightforward as can be: the server, using AES in ECB mode (i.e. an input block will always have the same output block), accepts some input and pads it into block-sized chunks. We’re able to manipulate this by controlling the plaintext to some degree (allowing us to insert an arbitrary block), which we compare to the last block:

From here, we can brute force the flag one byte at a time – we add one character to the prefix, which pushes one more character (the second-last character of the flag) into the final block. By brute forcing the first character of our inserted block, we can determine what the second-last character of the flag is:

Assuming a few key functions are available, we can automate the attack. The solver script I used for this challenge is here.
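As a rough illustration of the idea (not the solver script linked above), here is a minimal Python sketch of the byte-at-a-time attack against a local stand-in oracle. Several things here are assumptions: the server is modelled as encrypting pad(input + flag) with PKCS#7 padding, the names make_oracle, secret_length and recover_secret are hypothetical, and AES is replaced by a keyed SHA-256 truncated to 16 bytes, since the attack only relies on equal plaintext blocks producing equal ciphertext blocks.

```python
import hashlib

BLOCK = 16

def pkcs7_pad(data):
    """Pad to a multiple of BLOCK bytes (assumed padding scheme)."""
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n

def make_oracle(secret, key=b"demo key"):
    """Local stand-in for the server: 'encrypts' pad(input + secret) block by block."""
    def encrypt_block(block):
        # Not AES: a deterministic keyed function is enough to show the ECB weakness.
        return hashlib.sha256(key + block).digest()[:BLOCK]

    def oracle(user_input):
        padded = pkcs7_pad(user_input + secret)
        return b"".join(encrypt_block(padded[i:i + BLOCK])
                        for i in range(0, len(padded), BLOCK))
    return oracle

def secret_length(oracle):
    """Grow the input until the ciphertext gains a block; that reveals the secret length."""
    base = len(oracle(b""))
    for n in range(1, BLOCK + 1):
        if len(oracle(b"A" * n)) > base:
            return base - n
    raise ValueError("ciphertext never grew; padding assumption is wrong")

def recover_secret(oracle):
    """Recover the secret from its last byte backwards."""
    total = secret_length(oracle)
    known = b""
    while len(known) < total:
        k = len(known) + 1                       # length of the suffix we are testing
        filler = b"A" * (-(total - k) % BLOCK)   # aligns that suffix to a block boundary
        tail_blocks = k // BLOCK + 1             # blocks occupied by pad(suffix)
        for c in range(256):
            guess = pkcs7_pad(bytes([c]) + known)[:BLOCK]
            ct = oracle(guess + filler)
            start = len(ct) - tail_blocks * BLOCK
            if ct[:BLOCK] == ct[start:start + BLOCK]:
                known = bytes([c]) + known
                break
        else:
            raise ValueError("no candidate byte matched")
    return known

if __name__ == "__main__":
    demo = make_oracle(b"Aero{hypothetical_flag_for_demo}")
    print(recover_secret(demo))
```

Against a real server, oracle would wrap the network round trip instead of the local function, and the padding scheme would need to be confirmed first.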

Thanks to the AeroCTF team for putting these challenges together – while I didn’t manage to solve any more during the time allocated, the other challenge I began looking at (aerofloat) looked fairly well put together.

See you in UTCTF.

Posted in Bards, Computers, Jesting | Leave a comment

Fault Injection on Linux: Practical KERNELFAULT-Style Attacks

This post will detail my examination of the KERNELFAULT paper, which you can download here. In brief summary, the KERNELFAULT paper describes a methodology for injecting faults into a system running Linux, for the purpose of privilege escalation to root, as well as for controlling the program counter.

The code used in this attack can be downloaded here, within the “pi” folder.

I began my experiment by preparing a Raspberry Pi 4. I made a software reset puller (controlled via ChipWhisperer’s GPIOs) and broke out the UART. I also removed the 1.2V decoupling capacitors on the other side of the PCB, opposite the Broadcom SoC. These are marked below, with purple and blue markings:

Conceptually, the path to root is clear:

  • We begin as a standard user
  • We set all our unused registers to 0
  • We execute a setresuid(0,0,0) system call
  • We induce a fault of some description within the setresuid call, causing the operating system to incorrectly assign us a privilege.
  • We check if we now have elevated privileges, and if so, return the user to a root shell.

Let us examine the setresuid call further, pasted below for reference in case it changes down the track:

/*
 * This function implements a generic ability to update ruid, euid,
 * and suid. This allows you to implement the 4.4 compatible seteuid().
 */
long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
{
	struct user_namespace *ns = current_user_ns();
	const struct cred *old;
	struct cred *new;
	int retval;
	kuid_t kruid, keuid, ksuid;

	kruid = make_kuid(ns, ruid);
	keuid = make_kuid(ns, euid);
	ksuid = make_kuid(ns, suid);

	if ((ruid != (uid_t) -1) && !uid_valid(kruid))
		return -EINVAL;

	if ((euid != (uid_t) -1) && !uid_valid(keuid))
		return -EINVAL;

	if ((suid != (uid_t) -1) && !uid_valid(ksuid))
		return -EINVAL;

	new = prepare_creds();
	if (!new)
		return -ENOMEM;

	old = current_cred();

	retval = -EPERM;
	if (!ns_capable_setid(old->user_ns, CAP_SETUID)) {
		if (ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) &&
		    !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid))
			goto error;
		if (euid != (uid_t) -1 && !uid_eq(keuid, old->uid) &&
		    !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid))
			goto error;
		if (suid != (uid_t) -1 && !uid_eq(ksuid, old->uid) &&
		    !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid))
			goto error;
	}

	if (ruid != (uid_t) -1) {
		new->uid = kruid;
		if (!uid_eq(kruid, old->uid)) {
			retval = set_user(new);
			if (retval < 0)
				goto error;
		}
	}
	if (euid != (uid_t) -1)
		new->euid = keuid;
	if (suid != (uid_t) -1)
		new->suid = ksuid;
	new->fsuid = new->euid;

	retval = security_task_fix_setuid(new, old, LSM_SETID_RES);
	if (retval < 0)
		goto error;

	return commit_creds(new);

error:
	abort_creds(new);
	return retval;
}

The fault must be injected precisely, such that the ns_capable_setid check passes. We are unable to precisely control the fault to within an instruction’s execution time (we are – as far as I’m aware – unable to synchronise to the Broadcom SoC’s internal PLL). Fortunately, the instructions highlighted in blue are not required for our attack to succeed, so some instruction corruption here is fine, as long as we do not end up with a kernel panic. And there are some weird, wallpaper-worthy kernel panics:

To set up our test harness, we create a small usermode program, available in the git repo, which uses a memory mapped GPIO to trigger a ChipWhisperer’s delay, after which a glitch is inserted in the middle of the sys_setresuid call. Several additional checks are made to determine if we have “won”, along with a sanity counter to determine when our glitches are too far forward: regular corruption of the counting loop indicates that the ext_offset of our glitches is too large.

We must be careful in leveraging our success as well – “/bin/bash” and “/bin/sh” both drop privileges upon execution, so we must use “/bin/dash”, which does not – this cost me a number of successful glitches.
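The control flow of one attempt can be sketched in Python as below. This is only an illustration of the logic, not the harness from the repo: the trigger callable is a hypothetical stand-in for the memory mapped GPIO write, the register zeroing step is not reproducible from Python, and the function names are my own.

```python
import os

def attempt_escalation(trigger, setresuid=os.setresuid, geteuid=os.geteuid):
    """Run one glitch attempt and report whether we ended up with euid 0."""
    trigger()  # hypothetical: raise the GPIO line that starts the glitcher's delay
    try:
        # For an unprivileged user this normally fails with EPERM;
        # the glitch is timed to land inside this system call.
        setresuid(0, 0, 0)
    except PermissionError:
        pass
    return geteuid() == 0

def glitch_loop(trigger, attempts, shell="/bin/dash"):
    """Repeat attempts; on success, replace this process with a shell."""
    for _ in range(attempts):
        if attempt_escalation(trigger):
            os.execv(shell, [shell])  # does not return on success
    return False
```

The setresuid and geteuid parameters exist only so the logic can be exercised without real privileges; on the target they would be left at their defaults.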

A little while of glitching later, and we have success:

Along the way, a large number of crashes within the kernel were generated. I built a simple Python script to generate some simple statistics. Of note:

  • At a high level, 4982 glitch attempts were made across a few hours in one afternoon (on and off). Of these, 1393 were “crashes” of some description, and 4 attempts resulted in a successful privilege escalation, giving this instance of the attack a success rate of just under 0.1%.
    • Note that the average time to a successful glitch is 15 to 30 minutes, once a narrow range of successful glitching has been established.
  • The top three locations in the kernel for crashes to occur during glitching (giving clues to the severity and type of corruption) are:
    • cap_capable+0x2C with 106 instances
    • commit_creds+0x90 with 94 instances
    • cap_capable+0x14 with 58 instances
  • A good portion of the cap_capable glitches involved pointer dereferencing to near-zero addresses. Given that we are likely seeding the zero pointer, perhaps it is possible to *vastly* increase the likelihood of success by pointing cap_capable to pre-poisoned structures, established in userland (and semi-bypassing KASLR from crash reporting?)
  • Twice, the system suffered “unsafe recovery”, where the system would appear to recover, but executables would fail to run, as if some fundamental system ABI had been unpredictably corrupted. For example:

  • A variant of this is that the executable would “mute”, and simply do nothing, needing to be rebuilt before working again.
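A tally like the one above can be produced with a few lines of Python. The sketch below is not the script I used; it assumes the crash logs contain the usual ARM oops line naming the faulting symbol and offset (either "PC is at sym+0x.." or "pc : sym+0x.."), and the sample lines, including the function sizes after the slash, are made up.

```python
import re
from collections import Counter

# Assumed oops formats: "PC is at cap_capable+0x2c/0x80" or "pc : cap_capable+0x2c/0x80"
PC_PATTERN = re.compile(r"\bpc\s*(?:is at|:)\s*([A-Za-z_]\w*\+0x[0-9a-fA-F]+)",
                        re.IGNORECASE)

def crash_sites(log_lines):
    """Count how often each symbol+offset appears as the faulting PC."""
    counts = Counter()
    for line in log_lines:
        match = PC_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def success_rate(successes, attempts):
    """Success rate as a percentage."""
    return 100.0 * successes / attempts

if __name__ == "__main__":
    sample = [
        "[  101.2] pc : cap_capable+0x2c/0x80",
        "[  202.4] PC is at commit_creds+0x90/0x1f0",
        "[  303.6] pc : cap_capable+0x2c/0x80",
    ]
    for site, count in crash_sites(sample).most_common(3):
        print(site, count)
    print(round(success_rate(4, 4982), 3), "%")
```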

Thank you to the Riscure team for their original public discussion of this type of attack – given the potential impacts, I’m surprised this style of attack is not further discussed in the industry. This type of work is eminently enjoyable, and I look forward to investigating similar techniques further.


Late Writeup – SimpleMachine (Codegate 2020 Teaser)

This weekend, I participated in the Codegate 2020 Teaser CTF. I started late (in my defence, ctftime’s time reporting was inaccurate), and while I was able to solve the SimpleMachine challenge, it was far too late to get points. The writeup is presented below.

SimpleMachine

This challenge was presented as an executable and a “target” file, which you can download here. You can also download a copy of my working notes here.

The binary is a stripped Linux executable, but the challenge name indicates it’s some kind of virtual machine. Fortunately, the executable is not obfuscated, making reverse engineering relatively simple. Moving through the disassembly (i.e. following the call flow from main()), we can note the opcode processor at 0x17C0:

Looking at the function, we can make a few key observations:

  • The byte at $rdi + 0x30 holds the opcode.
  • The word at $rdi + 0x34 holds argument 1
  • The word at $rdi + 0x34 holds argument 2
  • The return value is held in $rdi + 0x3E. We don’t know where it goes for now.
  • 8 total operations exist – read, write, load, xor, multiply (without carry), add, compare, jump-if-zero (I initially mistakenly thought this was “exit”).

Next, we instrument the application with GDB to observe this in action. We can see it load the first opcode, a “read” opcode, corresponding to the first 8 bytes in the “target” file (the xxd options reverse the byte endianness, and group bytes into pairs):

Matching this to the disassembly, this is the “read” opcode, reading 0x24 bytes into address {base} + 0x4000. By following the read call itself, base is defined by the first qword pointer at $rdi[0] within the opcode processor function.

Further tracing of the application, correlating application behaviour with disassembly listing, sheds light on the instruction format (little endian):

AABB XXXX YYYY ZZZZ 
AA: Addressing mode (controls behaviour of XYZ)
BB: opcode
XXXX: where the result is stored.
YYYY: arg1
ZZZZ: arg2
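
For anyone writing their own tracer, the layout above could be unpacked along these lines. This is a sketch under assumptions: each field is taken to be a 16-bit little-endian word, with AA as the high byte and BB as the low byte of the first word, and the example instruction is made up rather than taken from the “target” file.

```python
import struct
from collections import namedtuple

Instruction = namedtuple("Instruction", "mode opcode dest arg1 arg2")

def decode_instruction(raw):
    """Split one 8-byte instruction into its fields (assumed little-endian words)."""
    head, dest, arg1, arg2 = struct.unpack("<4H", raw)
    return Instruction(mode=head >> 8, opcode=head & 0xFF,
                       dest=dest, arg1=arg1, arg2=arg2)

def decode_program(blob):
    """Decode every complete 8-byte instruction in a byte string."""
    usable = len(blob) - len(blob) % 8
    return [decode_instruction(blob[i:i + 8]) for i in range(0, usable, 8)]

if __name__ == "__main__":
    # Made-up instruction word: mode 0x01, opcode 0x06.
    example = struct.pack("<4H", 0x0106, 0x4000, 0x0000, 0x0024)
    print(decode_instruction(example))
```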

Spending a bit more time debugging the executable, we can observe the following behavior:

  • Firstly, the program reads a flag (the 06 opcode)
  • The program compares the first part of the flag to CODEGATE2020 through static compares. If any bytes don’t match, the program is exited through the use of opcode 5, with an argument of 0x1a0 (which leads to opcode 8, exit).
  • The program then loads a number of constants into memory, for an unknown purpose. Let’s call this the “key material”.

At this point, we are at 0xf8 in the “target” file. The program continues:

  • We load the constant 0xdead into a virtual register
  • We load the constant 0x1 into a virtual register
  • We double 0xdead, and save the result in a register. This is at address 0x108 – this is important later.
  • We xor 0xdead with the doubled 0xdead, and save the result in a register. The result is 0x63f7.
  • I’m not sure what the next instruction is for. We multiply 2 by 0 and save the result?
  • We load 0xf974, a part of the key material loaded earlier, into 0x154. Note that 0x154 is part of the initial code, 0xFFFF in the “target” file – the program has now become self-modifying. Let’s press on.
  • We save the constant 0x400c, after “CODEGATE2020”, into 0x14c. This is also self-modifying. Note that 0x400c points to the next bytes of the flag after “CODEGATE2020”.

At this point, we can monitor the executable with gdb, using the “rwatch” command:

rwatch <address>
  • Three “nops” are executed, consisting of 0000 0000 0000 0000 instructions.
  • We xor the two bytes of the input flag with 0x63f7
  • We add the result to the first two bytes of the key material, 0xf974
  • We compare the above result to 0.

We can represent the above operation as follows (the xor happens before the add, and the sum must wrap to zero in 16 bits):

(input_flag ^ 0x63f7) + 0xf974 = 0x10000
input_flag = (0x10000 - 0xf974) ^ 0x63f7

We can quickly check this in Python:
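
A minimal version of that check, assuming the two flag bytes are read as a little-endian 16-bit word (the variable names are my own):

```python
key_word = 0xF974   # first word of the key material
xor_mask = 0x63F7   # 0xdead xored with its (16-bit) double

flag_word = (0x10000 - key_word) ^ xor_mask
print(hex(flag_word))                   # 0x657b
print(flag_word.to_bytes(2, "little"))  # b'{e'
```

The two bytes decode to “{e”, which is what we would expect directly after “CODEGATE2020”.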

Great, this looks pretty sane. Let’s move forward:

  • If the last result was not zero (i.e. if the flag was incorrect), jump to 0x1a0
  • The next three operations appear to be a loop counter, checking if we’ve hit 0xc iterations, and if not, jumping to 0x108.

At this point, instead of doubling 0xdead, it doubled the 0x63f7. Going through the loop a few times, we note the following repeating cycle occurring:

From here, it is a simple matter to derive the key manually, and confirm it via the simple_machine executable:
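
I derived the key by hand, but the same loop could be sketched in Python roughly as follows. The assumptions here are mine: the mask is truncated to 16 bits after doubling, each iteration consumes the next word of key material in order, and flag words are little-endian. Only the first key word (0xf974) comes from the trace above; the rest would have to be read out of the “target” file.

```python
MASK16 = 0xFFFF

def next_mask(value):
    """Double the value (16-bit, no carry) and xor it with the original."""
    return ((value * 2) & MASK16) ^ value

def recover_flag(key_words, seed=0xDEAD):
    """Invert (flag_word ^ mask) + key_word == 0x10000 for each key word in turn."""
    recovered = bytearray()
    mask = seed
    for key_word in key_words:
        mask = next_mask(mask)
        flag_word = ((0x10000 - key_word) & MASK16) ^ mask
        recovered += flag_word.to_bytes(2, "little")
    return bytes(recovered)

if __name__ == "__main__":
    print(hex(next_mask(0xDEAD)))   # mask used for the first two flag bytes
    print(recover_flag([0xF974]))   # first two bytes after "CODEGATE2020"
```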

My solving methodology for this challenge was particularly haphazard, with a half-hearted attempt at writing an emulator and many, many failures at tracing the program, mostly due to my own fuckups in patching virtual compare operations. A shortcut I learned was that gdb can skip X iterations of a breakpoint, with the following command:

ignore 1 15
(ignores breakpoint 1 the next 15 times)

I disagree with the assessment that it is “ezpz”, but in another light, it is a humbling reminder of how much more I have to learn, and what a challenge it is to keep my skills up to date, while also expanding into other areas.

Thank you to the Codegate 2020 CTF organisers for putting together these challenges. See you in Aero CTF.


The Z80 Adventure Part III

In my last post on this topic, I discussed improvements to my Z80 computer, which allowed the use of peripherals from within the Z80 OS. I recently rebuilt this computer, to allow for a bit more extensibility (and eventually, LED debugging).

Conceptually, this is similar to the last iteration, with an ATMega32A acting as a custom I/O port to the Z80, its gateway to the outside world. The address bus accessible to the ATMega has been reduced to 8 bits – the upper 4 bits weren’t being used, except for debugging (and we can debug with LEDs and a manual clock input).

To add wifi and disk support, I used the ESP-01 module: this acts as a WiFi bridge, so you can connect to it via netcat. It also supports a series of specially formatted (but human readable, for debugging) UART commands, which allow disk access:

The actual protocol is fairly simple:

  • !xx!yy!zz\r\n:
    • xx is set to 0xFF for read operations
    • yy is set to the low byte of the address bus.
    • zz is the data bus
    • This will wait for one byte – if it’s a “write-only” operation, like selecting the disk number, the byte will be treated as an “ack” and ignored. If it’s a read operation, like reading a byte from disk, the byte will be placed on the data bus (or routed as the ATMega sees fit).
  • !!! means an actual “!” is being printed over UART.
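
To make the framing concrete, here is a small Python sketch of how a test client might build and parse these commands. Parts of it are guesses: I am assuming each field is sent as two uppercase hex digits, and the value used for xx on non-read operations (write_marker) is a hypothetical placeholder, since only the 0xFF read marker is described above.

```python
def encode_frame(address_low, data, read=False, write_marker=0x00):
    """Build one !xx!yy!zz command line (two-digit hex fields are an assumption)."""
    op = 0xFF if read else write_marker  # write_marker is a hypothetical value
    line = "!{:02X}!{:02X}!{:02X}\r\n".format(op, address_low & 0xFF, data & 0xFF)
    return line.encode("ascii")

def decode_frame(line):
    """Parse a command line back into (is_read, address_low, data)."""
    parts = line.decode("ascii").rstrip("\r\n").split("!")
    if len(parts) != 4 or parts[0] != "":
        raise ValueError("not a !xx!yy!zz frame")
    op, address_low, data = (int(field, 16) for field in parts[1:])
    return op == 0xFF, address_low, data

def escape_text(text):
    """Escape literal '!' characters in ordinary output as '!!!'."""
    return text.replace("!", "!!!")

if __name__ == "__main__":
    frame = encode_frame(0x10, 0x00, read=True)
    print(frame)
    print(decode_frame(frame))
    print(escape_text("Ready!"))
```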

To correctly support replies, a change had to be made to the way the ATMega handled UART input. Previously, when the ATMega was simply bridging for the Z80, an asynchronous interrupt-based UART was fine – this was the way the Z80’s BASIC ROM BIOS expected input. Here, we need to temporarily disable interrupts every time we want to wait for a response from the ESP-01 as part of an I/O operation, otherwise the interrupt routine will consume the input:

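#include <avr/io.h>
#include <avr/interrupt.h>

/* Blocking read of one UART byte, with the RX interrupt routine kept out of the way. */
char async_getchar()
{
	char tempByte;
	cli();				/* stop the interrupt routine consuming the reply */
	while(!(UCSRA & (1 << RXC)));	/* wait until a byte has been received */
	tempByte = UDR;
	sei();				/* restore interrupt-driven input */
	return tempByte;
}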

Given the static nature of the Z80, all this occurs in one clock cycle – the clock is simply held, while the ATMega waits for a reply from the ESP module.

Finally, to connect to this system, we need to disable line buffering (stty -icanon; nc 192.168.1.1 23) before we use netcat, for characters to be sent correctly to the CP/M OS:

With this in place, the next step is to implement a working CP/M disk structure – but this can be fully implemented on the ESP side (which is, incidentally, a lot easier to program). I’m setting aside this project temporarily, to spend my time elsewhere. The code can be found on Github as usual.


Notes on RF Retroreflectors

Over the past few weeks, I’ve been spending some time learning about RF retroreflectors. The goal of this is to improve my understanding of the theory and application of RF in general.

An RF retroreflector is a device which takes a target signal, and when illuminated by a continuous wave of RF energy, amplitude modulates the wave as it’s reflected back:

The primary advantage of this device is that it doesn’t need power, and doesn’t “transmit” unless it’s actively illuminated.

Three primary resources exist for playing with RF Retroreflectors.

  • GBPPR2’s YouTube videos, here
  • Mike Ossmann’s work, here
  • The Wakabayashi paper, here

All three are worth reviewing, and the entirety of GBPPR’s internet contributions are well worth watching. If you’re still out there, I hope you’re carrying out your good work still.

The concept of an RF retroreflector is simple: a piece of wire is used as a passive reflecting antenna, and a MOSFET controls its capacitance by connecting and disconnecting it to a ground plane. The device is trivial, and can be made as follows:

The MOSFET is flexible: I used a BSS138, but a standard 2N7000 works too. Select this based on the rise time, corresponding to the source signal you want to exfiltrate. A finished device looks like this:

The antenna can be any length, but a quarter-wavelength monopole is recommended from experimentation. More on this below.
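
For reference, the free-space quarter-wavelength is simply c / (4f). A quick Python helper to work it out (the velocity_factor parameter is my own addition for wire that is not in free space, and defaults to 1.0):

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def quarter_wave_mm(frequency_hz, velocity_factor=1.0):
    """Length of a quarter-wavelength monopole in millimetres."""
    return SPEED_OF_LIGHT * velocity_factor / (4.0 * frequency_hz) * 1000.0

if __name__ == "__main__":
    for freq in (2.41e9, 2.47e9):
        print(freq, round(quarter_wave_mm(freq), 1), "mm")
```

At the illuminator frequencies used in this post, that works out to roughly 3 cm of wire.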

We can use gnuradio to generate the illuminator wave, as follows:

Theoretically, the antenna should best modulate at its resonant frequency. In practice, we can observe a very wide frequency envelope in which we can successfully recover signal (subject to noise – I’m waiting for some nice log-periodic antennas to ship). Here is a 10 kHz square wave reflected at approximately 2.41 GHz:

And the same, at 2.47 GHz:

Through experimentation, the impact of frequency is eclipsed by the impact of illuminator strength – no wonder the NSA used a dedicated unit, all praise to the unblinking eye watching us all.

Recovering the target signal can be done with a simple moving average and a threshold filter:
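
Outside of GNURadio, the same two steps can be sketched in plain Python. This is an illustration rather than the flowgraph I used: it assumes the input is a list of received magnitude samples, and the window size and threshold level are parameters you would tune by eye.

```python
def moving_average(samples, window):
    """Smooth samples with a trailing moving average of the given window size."""
    smoothed = []
    total = 0.0
    for i, value in enumerate(samples):
        total += value
        if i >= window:
            total -= samples[i - window]
        smoothed.append(total / min(i + 1, window))
    return smoothed

def threshold(samples, level=None):
    """Slice samples into 0/1; defaults to halfway between min and max."""
    if level is None:
        level = (max(samples) + min(samples)) / 2.0
    return [1 if value > level else 0 for value in samples]

if __name__ == "__main__":
    # Synthetic magnitude trace: a square wave (1.0 / 0.6) with alternating noise.
    raw = [(1.0 if (i // 20) % 2 == 0 else 0.6) + (0.25 if i % 2 else -0.25)
           for i in range(80)]
    bits = threshold(moving_average(raw, 4), level=0.8)
    print("".join(str(b) for b in bits))
```

Without the averaging step, the noise in this synthetic trace crosses the threshold on every other sample, which is the situation the filter is there to fix.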

To provide a reference for relative amplitude, here’s the same code running, with the retroreflector disabled:

Nevertheless, this is useless unless we can recover actual logical data. My initial attempt with a knock-off Arduino failed for reasons unknown, so I made a new PIC firmware to constantly transmit a stream of the character 0xAA (for timing clarity) over UART in a loop. To begin with, let’s confirm that we can “see” the signal:

This waveform can be downsampled and pushed out to file, all in GNURadio (it’s possible to stream it to a custom Python block, but it means your Python block needs to implement a USART receiver – example code for the streaming is in the Github below).
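
If you do go down the custom block route, the receiver itself is not much code. Below is a rough sketch of an 8N1 software UART decoder working on an already-thresholded list of 0/1 samples. It assumes the line idles high, a fixed whole number of samples per bit, and LSB-first data; depending on how the reflection is thresholded, the bits may need inverting first. The encoder is only there to generate test input.

```python
def uart_encode(data, samples_per_bit):
    """Generate an idle-high 8N1 sample stream for the given bytes."""
    samples = [1] * samples_per_bit
    for byte in data:
        frame = [0] + [(byte >> bit) & 1 for bit in range(8)] + [1]
        for level in frame:
            samples.extend([level] * samples_per_bit)
    samples.extend([1] * samples_per_bit)
    return samples

def uart_decode(samples, samples_per_bit):
    """Recover bytes by sampling the middle of each bit after a falling edge."""
    decoded = bytearray()
    i = 1
    while i < len(samples):
        if not (samples[i - 1] == 1 and samples[i] == 0):
            i += 1
            continue
        centre = i + samples_per_bit // 2      # middle of the start bit
        stop = centre + 9 * samples_per_bit    # middle of the stop bit
        if stop >= len(samples):
            break
        if samples[centre] == 0 and samples[stop] == 1:
            byte = 0
            for bit in range(8):
                byte |= samples[centre + (bit + 1) * samples_per_bit] << bit
            decoded.append(byte)
            i = stop + 1                       # resume inside the stop bit
        else:
            i += 1                             # glitch or framing error: keep looking
    return bytes(decoded)

if __name__ == "__main__":
    stream = uart_encode(b"\xaa\xaa\xaa", 8)
    print(uart_decode(stream, 8))
```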

You can download the source code here. Have fun!
