Earlier this week, I spent some time helping a friend unpack a PyInstaller’ed Python 2.6 executable whose magic numbers appeared to have been deliberately broken in an effort to confuse publicly available unpacking tools.
In this post, I present my approach to unpacking these files, together with some source code. Note that this is not the fastest way to do this (in hindsight, I could have modified existing tools to ignore the magic number checks and tweaked the PYZ table of contents check – but this is the scenic route).
Analysis + Method
The target is a CGI binary for a media device, which can be found here: file_upload.
(This isn’t actually a Zip file, just trying to get around WordPress’ restrictions).
My first step is to inspect the executable: it appears to be a standard 32-bit ELF binary, with a number of string references to _MEIPASS2, which is a dead giveaway for PyInstaller. Using pyi-archive-viewer (as well as some other tools I had collected over the course of CTF’ing) doesn’t succeed:
The stack trace seems to strongly indicate that we’re not finding a magic number that we’re expecting.
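A quick way to confirm this suspicion is to scan the raw file for the markers the tools rely on. The snippet below is a minimal sketch rather than what the tools do internally: the `find_markers` and `looks_tampered` helpers are hypothetical names, and it assumes the CArchive cookie magic is the byte string `MEI\014\013\012\013\016` used by stock PyInstaller builds. It is only a heuristic, since some bootloaders may embed the same bytes in their own code.

```python
import re

# Assumption: marker strings used by stock PyInstaller builds.
MEIPASS_MARKER = b"_MEIPASS2"
CARCHIVE_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"  # i.e. "MEI\014\013\012\013\016"


def find_markers(data):
    """Return the offsets of each marker inside a bytes object."""
    return {
        "meipass": [m.start() for m in re.finditer(re.escape(MEIPASS_MARKER), data)],
        "carchive_magic": [m.start() for m in re.finditer(re.escape(CARCHIVE_MAGIC), data)],
    }


def looks_tampered(data):
    """Heuristic: bootloader strings are present, but no intact archive magic."""
    hits = find_markers(data)
    return bool(hits["meipass"]) and not hits["carchive_magic"]
```

To use it, read the whole executable with `open(path, "rb").read()` and pass the bytes to `looks_tampered`.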
My next approach was to step through the executable in a disassembler, and get a closer view on what was happening in the binary. Initial analysis showed a standard embedded Python executable:
I had some previous experience with embedded Python programming, so I had some knowledge of the APIs. Broadly speaking, an embedded Python program works as follows:
- A “Python State” is created, and modules / variables / objects are added to this (similar to “import”)
- To “bind” native functions so that they are callable from Python, you use a PyMethodDef structure and bind them to a module, which you import and use from within Python
- To call into Python from C, you use PyRun_SimpleString and friends, which run the code you pass them against the appropriate “state”.
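The last point can be tried without writing any C, because CPython exposes the same entry point to `ctypes`. The following is only an illustrative sketch, not code from the target binary: `run_in_main` and `embedded_demo_value` are made-up names. It shows that the first argument to `PyRun_SimpleStringFlags` is simply the source text, and that the code executes in the `__main__` namespace.

```python
import ctypes
import sys

# C signature: int PyRun_SimpleStringFlags(const char *command, PyCompilerFlags *flags)
_run = ctypes.pythonapi.PyRun_SimpleStringFlags
_run.argtypes = [ctypes.c_char_p, ctypes.c_void_p]
_run.restype = ctypes.c_int


def run_in_main(source):
    """Execute source (bytes) in the __main__ namespace; 0 means success."""
    return _run(source, None)


# Hypothetical demo: define a variable through the C API, then read it back.
status = run_in_main(b"embedded_demo_value = 6 * 7")
main_namespace = sys.modules["__main__"].__dict__
```

Because the source arrives as a plain C string, it can be read straight out of the process at the moment of the call.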
Investigation of the executable led down two interesting paths: firstly, at 0x0804B4B8, there was a call to PyRun_SimpleStringFlags which appeared to take its input from a call to extract. Secondly, the “extract2fs” function appeared to write data to /tmp.
At this point, I started my investigation with PyRun_SimpleStringFlags: I breakpointed PyRun_SimpleStringFlags and dumped the first stack parameter, which would presumably contain the Python code:
The output file is unfortunately not too interesting:
Without the custom (ONELAN_*, Page_file_upload) classes, this file isn’t meaningful. My next approach was a similar breakpointing exercise against PyImport_ExecCodeModule, but the application modules were never imported, as “pyexpat” wasn’t available – this executable requires a Python 2.6 environment, which is somewhat troublesome to set up correctly.
From here, I parked this approach and went for dumping files to disk: at first, I breakpointed extract2fs, but this didn’t trigger. Inspecting the disassembly, we quickly see why:
A bit of gdb later, to set al to 0x62 at the check above extract2fs, we are rewarded with one out1.pyc file in /tmp.
Unfortunately, we can’t extract this file – ArchiveViewer, pyi-archive-viewer and other tools fail, noting that the table of contents is broken. A little research later, I stumbled across this script, which I modified to disable the magic number check and to adjust the Table of Contents calculation so that it gives a sane number of files.
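To make the idea concrete, here is a minimal sketch of a PYZ reader with an optional magic check. It is not the modified script itself, and the layout it relies on is my understanding of the format, so treat it as an assumption: a 4-byte `PYZ\0` magic, a 4-byte Python magic, a 4-byte big-endian offset to a marshalled table of contents, and zlib-compressed entries stored as `(ispkg, pos, length)`. The function names are hypothetical.

```python
import marshal
import struct
import zlib

PYZ_MAGIC = b"PYZ\x00"


def read_pyz_toc(data, check_magic=True):
    """Parse the table of contents of a PYZ blob held in memory.

    Assumed layout: 4-byte archive magic, 4-byte Python magic,
    4-byte big-endian TOC offset, then a marshalled TOC at that offset.
    """
    if check_magic and data[:4] != PYZ_MAGIC:
        raise ValueError("bad PYZ magic: %r" % data[:4])
    pymagic = data[4:8]
    (toc_pos,) = struct.unpack("!i", data[8:12])
    # Sanity-check the offset before trusting it.
    if not 12 <= toc_pos < len(data):
        raise ValueError("implausible TOC offset: %d" % toc_pos)
    toc = marshal.loads(data[toc_pos:])
    # Some versions store a dict, others a list of (name, entry) pairs.
    if isinstance(toc, list):
        toc = dict(toc)
    return pymagic, toc


def extract_pyc(data, pymagic, entry):
    """Rebuild a Python 2 style .pyc: magic + 4-byte timestamp + code."""
    ispkg, pos, length = entry
    body = zlib.decompress(data[pos:pos + length])
    return pymagic + b"\x00\x00\x00\x00" + body
```

Calling `read_pyz_toc(data, check_magic=False)` skips the check that a tampered archive would fail. Since the table of contents is marshalled, it is safest to run this under an interpreter close to the one that built the archive.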
This new script does the trick, allowing us to extract a folder full of pycs, including the application logic modules we were missing earlier:
These can be decompiled with uncompyle2 (remember to get the pip install --upgrade uncompyle2 version, or uncompyle6, to take apart Python 2.6 files – the apt-get install version won’t work on modern OSes, as these pycs are too old).
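Before feeding the extracted files to a decompiler, it can help to confirm which interpreter produced them. The sketch below reads the `.pyc` header and maps it to a version. The `identify_pyc` helper is a hypothetical name, and the table only covers the two CPython 2.x magic numbers relevant to this post (62161 for 2.6 and 62211 for 2.7).

```python
import struct

# Magic numbers for the CPython 2.x releases relevant here.
KNOWN_MAGICS = {62161: "Python 2.6", 62211: "Python 2.7"}


def identify_pyc(header):
    """Map the first four bytes of a .pyc file to an interpreter version.

    Returns None when the header is too short, lacks the trailing CR LF,
    or carries a magic number that is not in KNOWN_MAGICS.
    """
    if len(header) < 4 or header[2:4] != b"\r\n":
        return None
    (number,) = struct.unpack("<H", header[:2])
    return KNOWN_MAGICS.get(number)
```

For a file on disk, pass in `open(path, "rb").read(4)`. A result of `None` suggests either a different Python version or a header that has been tampered with.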
You can find the modified PYZ unpacker script here.
Tooling – Cerbero Profiler
During this investigation, I stumbled upon Cerbero Profiler. This is the successor to CFF Explorer – I haven’t used Profiler, but I’m willing to bet that it’s worth looking into on pedigree alone. I intend to acquire a copy at some point, as it looks very useful for binary analysis tasks.