Earlier this year, I participated in the Nuit du Hack CTF. During this challenge, I attempted a high-point-value reversing challenge, “Matroichka”. I was able to solve the first three stages of this, but was unable to solve the fourth stage of the challenge. Today, I was able to solve this challenge to completion.
Before I begin, I’d like to credit the “hexpresso” team: you can read their writeup here. I felt I did not understand their writeup very much, so I was driven to reattempt this challenge. During this process, I referred to the hexpresso writeup for the “QEMU method”, and to confirm my result.
Congratulations, hexpresso, on completing this challenge in time (I think), and thankyou for your writeup.
Without further ado, I present the writeup for Matroichka Stage 4 below.
This challenge is presented as part of the Matroichka series of challenges. By solving the first three challenges of this series, the player is presented with stage 4: stage4 download. A quick “file” command shows this to be a Master Boot Record (MBR). The MBR typically lives at the beginning of one’s hard-drive at physical offset 0, but is not a traditional executable.
It is imperative that we grasp how an MBR works if we are to tackle this challenge, therefore, I would advise reading this page and keeping this page bookmarked. For the purposes of this CTF writeup, it is enough to know the following:
- At the completion of the BIOS phase of boot, we proceed to the MBR phase
- The first 512 bytes of the MBR (the first disk sector) is loaded to offset 0x7c00
- The “DL” register is set to the disk number which the MBR is loaded from
- Control is transferred to this address.
We will tackle this problem using mostly static analysis, with dynamic analysis to complement our findings. You can use QEMU to emulate this MBR, as follows (stolen from hexpresso’s writeup):
qemu-system-x86_64 stage4 -s -S -monitor stdio
Then, you can use GDB to connect remotely to this QEMU instance, as follows:
gdb target remote localhost:1234 set architecture i8086 b *0x7C00 c
You can restart execution with system_reset from QEMU’s monitor console. While you may be tempted to run this on an AWS host, I recommend doing this on a graphical terminal, for reasons I will explain further into this writeup. Your setup should hopefully look something like this:
Down the rabbit hole we go!
We begin our adventure by carving the first 512 bytes of the challenge binary with “dd”, and then loading this into IDA Pro as a 16-bit raw executable at the offset of 0x7c00. Initial static analysis reveals a heavily obfuscated binary, with a calls to 0x7C02, 0x7C0E and 0x7C1B. These functions look like this:
These functions simply return and skip one byte, two bytes, and four bytes respectively. Continuing on with analysis with this in mind, we can see that the first 512 bytes are relatively simple, with no meaningful branches. The operations performed are:
- at 0x7CC4, Interrupt 0x13 is invoked, with AH being 0. This resets the disk drive which the MBR was loaded from.
- at 0x7D11, Interrupt 0x13 is invoked once more, with AH being 2, which is a disk read operation. The parameters are as follows:
- CH = 0 (use disk cylinder 0)
- CL = 2 (begin from disk sector 2)
- AL = 0x32 (read 0x32 sectors)
- ES:BX = 0x1000 (save at 0x1000)
This second interrupt returns an error code, with AL set to 9, indicating that only 9 sectors were read. This is correct, as the total file is 5120 bytes in length, and we’re starting from the second sector.
Following this, code eventually 0x7C4A, where we execution is transferred to 0x1000h:
While it is likely possible to use a single disassembly database to complete this challenge, it is far easier to carve out binary[512:], and load the new disassembly at 0x1000 (again, as 16-bit code).
Initial analysis reveals that the clusterfuck of skip_byte, skip_word and skip_dword have returned. This section is more complex, and there are three meaningful function calls:
- At 0x1202, a call is made to 0x107C. This function is a print loop, which prints out the data at 0x101B until it reaches a zero. This data is “What\x27s the magic word ?\x0a\x0d>”
- At 0x121A, a call is made to 0x1117. This function is a read loop, which reads characters from the keyboard, and checks that they are between 0x20 and 0x7E.
- At 0x1163, during each iteration of the above read loop, a call is made to 0x10D3. This function is an echo function, which simply plays back whichever character the user enters.
We know that this challenge is based on the user entering the correct key, so we will need to find out what happens to the user’s input. Static analysis at 0x1166 seems to indicate that this is stored at ds[si], with si being set to 3 prior to the function being called (0x1214):
We can use a breakpoint at 0x1166 to confirm that the user input is actually stored there:
Once the binary has completed reading the user input, it then proceeds to the next steps of a traditional boot process: a global descriptor table is loaded at 0x1286 (not relevant to this CTF), and then the CPU enters protected mode:
From Wikipedia, this mode was initially intended to allow access to more than 512KB of memory than was permitted in real mode. For the purposes of this CTF, this allows the use of 32-bit addressing, as well as the Interrupt Descriptor Table, which is used for the next “stage” of this challenge.
The binary also initializes the stack (to 0x9F000), and loads the address of user input into the “eax” register. Execution is then transferred to 0x1400:
Sectors 3 and onwards…
As the execution mode has changed from real mode (effective 16-bit) to protected mode (32-bit), it makes sense to reload the executable. Once again, we can use ‘dd’ to carve out the code from 0x1400 onward, and load this at 0x1400 as a 32-bit binary.
At this point, we can breathe a sigh of relief, as the skip_byte, skip_word and skip_dword seem to not be present in this executable: however, the analysis is no easier. We quickly realize that the structure of the executable is a bit odd – the user’s input, loaded into “eax” by stage 2, is moved to a static address, an interrupt descriptor table (IDT) is set up, and the executable enters an infinite loop:
From here, my next stop for analysis was the Interrupt Descriptor Table itself. The IDT is loaded via the “LIDT” instruction, from 0x17FE. This is stored as an array of 8-byte structs, as follows:
The IDTDescr structure is not available by default in IDA, but this can be defined by hitting Shift F9 (Views -> Open Subviews -> Structures) and then defining your own structure:
An interrupt descriptor table is similar to an exception handler, in the sense that it defines a series of conditions which describe when an interrupt “fires”, and the function that it runs when the interrupt triggers (the poorly-named offset_2).
Without further clues to how the code worked, I began disassembling the interrupt handlers one by one. An interesting pattern became apparent in how the interrupt descriptors finished off:
At 0x14C7, at the conclusion of the second interrupt handler, the application appears to modify the handler for Interrupt 0x20, and then re-initialize the interrupt descriptor table. This appears to have been used as a substitute for flow control obfuscation, but this challenge is solveable without completely reversing this.
After quickly reviewing them all, it becomes apparent that there are a few primary “types” of interrupt handlers:
- A few interrupt handlers attempt to compare the passcode against known values.
- Some interrupt handlers compared the passcode to “Good_Game_!”
- Some interrupt handlers compare the passcode to semi-printable characters (handlers 5, 26 and 28).
- A few interrupt handlers appeared to call an XOR function.
- Interrupt handler 25 (0x167D) appears to do this against the user’s entered password, xoring against key material stored at 0x19CB.
- A few interrupt handlers clears the interrupt flag, sets it again, and then exits (these seem to do nothing?)
- A few interrupt handlers move zero bytes into other parts of code. I initially feared these were obfuscation traps, designed to slow down analysis by making the on-disk and in-memory copies different, but these turned out to be irrelevant.
From here, I attempted to enter “Good_Game_!”, which causes the “win” message to show up, but according to the hexpresso team’s writeup, this is not the flag.
The other obvious path of analysis is the passcode XOR, along with interrupt handlers 5, 26 and 28 – presumably, the user input (initially all printable characters) would be XOR’ed with a static key, and then compared against unprintable characters.
The cipher is simple enough, and a little bit of Python reveals a second key:
Entering the key shows the “win” message again, confirming the result of the hexpresso team’s analysis:
Unfortunately, this challenge was not completed in time to score points in the Nuit du Hack CTF challenge, but it was an interesting challenge which taught me something new. Thanks to the Nuit du Hack team for building this challenge – I can only imagine the level of work which went into preparing this exercise.
In hindsight, it was my laziness alone which caused me not to solve this challenge – after breaking this challenge apart, none of the components were particularly difficult, just time-consuming with the skip_* mangling of the disassembly.
For those of you playing, I’ll see you in the pwn.voting CTF this weekend (though I’m unsure how much time I can commit to this, I am keen to work on other firmware reverse engineering tasks – I am excited that my toolchain appears to be finally working for my target devices).
For those of you sitting at your desks, doing business excellence emails and slide packs, let this serve as a reminder. Row, row, fight the power, never let the business excellence win – no matter the cost.