Manually Unpacking PyInstaller (Python 2.6)

Earlier this week, I spent some time helping a friend unpack a Python 2.6 PyInstaller’ed executable in which a number of magic numbers appeared to have been broken, presumably in an effort to confuse publicly available unpacking tools.

In this post, I present my approach to unpacking these files, together with some source code. Note that this is not the fastest way to do it (in hindsight, I could have modified existing tools to ignore the magic number checks and tweaked a check on the PYZ table of contents), but this is the scenic route.

Analysis + Method

The target is a CGI binary for a media device, which can be found here: file_upload.

(This isn’t actually a Zip file, just trying to get around WordPress’ restrictions).

My first step is to inspect the executable: it appears to be a standard 32-bit ELF binary, with a number of string references to _MEIPASS2, which is a dead giveaway for PyInstaller. Using pyi-archive-viewer (as well as some other tools I had collected over the course of CTF’ing) doesn’t succeed:


The stack trace seems to strongly indicate that we’re not finding a magic number that we’re expecting.
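A quick way to double-check this from outside the tools is to scan the file for the markers yourself. The sketch below is my own illustration, not part of any PyInstaller tool: it looks for the _MEIPASS2 string and for the cookie that, as far as I know, stock PyInstaller bootloaders use to locate the embedded archive. The function names are hypothetical.

```python
# Magic cookie that (to my knowledge) stock PyInstaller uses for its CArchive trailer.
CARCHIVE_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"
# Environment variable name referenced by older bootloaders.
MEIPASS_MARKER = b"_MEIPASS2"


def find_all(data, needle):
    """Return every offset at which needle occurs in data."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


def scan_for_pyinstaller(data):
    """Report the offsets of both markers in a binary blob."""
    return {
        "meipass": find_all(data, MEIPASS_MARKER),
        "cookie": find_all(data, CARCHIVE_MAGIC),
    }


def main(path):
    """Print a short report for the file at path."""
    with open(path, "rb") as handle:
        report = scan_for_pyinstaller(handle.read())
    print("_MEIPASS2 references:", report["meipass"])
    print("CArchive cookie offsets:", report["cookie"] or "none (magic may be altered)")
```

If the _MEIPASS2 references are present but the cookie is not, that is consistent with a PyInstaller binary whose magic has been tampered with.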

My next approach was to step through the executable in a disassembler, and get a closer view on what was happening in the binary. Initial analysis showed a standard embedded Python executable:


I had some previous experience with embedding Python, so I had some knowledge of the APIs. Broadly speaking, an embedded Python program works as follows:

  • A “Python State” is created, and modules / variables / objects are added to this (similar to “import”)
  • To “bind” native functions so that they are callable from Python, you describe them in a PyMethodDef structure and attach them to a module, which you then import and use from within Python
  • To call into Python from C, you use PyRun_SimpleString and friends, and pass them the appropriate “state”.
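To get a feel for the last point without writing any C, the same entry point can be driven from Python itself through ctypes. This is only a sketch of the call the bootloader makes, assuming a CPython 3 interpreter that exports PyRun_SimpleStringFlags; the helper name run_simple_string and the variable _demo_value are made up for the example.

```python
import ctypes

# C prototype: int PyRun_SimpleStringFlags(const char *command, PyCompilerFlags *flags)
_run = ctypes.pythonapi.PyRun_SimpleStringFlags
_run.argtypes = [ctypes.c_char_p, ctypes.c_void_p]
_run.restype = ctypes.c_int


def run_simple_string(source):
    """Run source in the __main__ namespace; 0 means success, -1 means an exception."""
    # Passing None for flags is equivalent to calling PyRun_SimpleString.
    return _run(source.encode("utf-8"), None)


# The string is plain Python source, which is why dumping this argument
# from a running process reveals the code that is about to execute.
status = run_simple_string("_demo_value = 6 * 7")
```

The code runs in the __main__ module, so anything it defines ends up there.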

Investigation of the executable led down two interesting paths: firstly, at 0x0804B4B8, there was a call to PyRun_SimpleStringFlags which appeared to take its input from a call to extract. Secondly, the “extract2fs” function appeared to write data to /tmp.

At this point, I started my investigation with PyRun_SimpleStringFlags: I breakpointed PyRun_SimpleStringFlags and dumped the first stack parameter, which would presumably contain the Python code:
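The same idea can be sketched as a gdb Python script. This is an illustration rather than the exact session I used, and it makes a few assumptions: the target is 32-bit x86 using cdecl (so the first argument sits at esp + 4 on function entry), gdb can resolve the PyRun_SimpleStringFlags symbol at the time the script is loaded, and the class name, helper name and dump file names are all hypothetical.

```python
"""Sketch of a gdb script; load it from inside gdb with `source <this file>`."""
try:
    import gdb  # only importable inside a gdb session
except ImportError:
    gdb = None

# 32-bit x86 cdecl: on entry, [esp] is the return address and [esp + 4]
# is the first argument (here, the char * holding the Python source).
FIRST_ARG_EXPR = "*(char **)($esp + 4)"


def dump_name(hit_count):
    """Hypothetical naming scheme for the dumped sources."""
    return "pyrun_dump_%03d.py" % hit_count


if gdb is not None:
    class DumpPyRun(gdb.Breakpoint):
        """Write the source string to disk on every hit, then keep running."""

        def __init__(self):
            # The leading '*' breaks on the exact first instruction, before
            # any prologue has moved esp. If you break on the call site
            # instead, the argument is at [esp] rather than [esp + 4].
            super(DumpPyRun, self).__init__("*PyRun_SimpleStringFlags")
            self.hits = 0

        def stop(self):
            source = gdb.parse_and_eval(FIRST_ARG_EXPR).string(errors="replace")
            with open(dump_name(self.hits), "w") as handle:
                handle.write(source)
            self.hits += 1
            return False  # do not halt the target

    DumpPyRun()
```

Returning False from stop() lets the program continue, so every string passed to the function is captured in one run.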



The output file is unfortunately not too interesting:


Without the custom classes (ONELAN_*, Page_file_upload), this file isn’t meaningful. My next approach was to use a similar breakpointing exercise against PyImport_ExecCodeModule, but we never imported the old modules, as “pyexpat” wasn’t available – this executable required a Python 2.6 environment, which is somewhat troublesome to set up correctly.

From here, I parked this approach and went for dumping files to disk: at first, I breakpointed extract2fs, but this didn’t trigger. Inspecting the disassembly, we quickly see why:


After a bit of gdb to set al to 0x62 at the check above extract2fs, we are rewarded with a single out1.pyc file in /tmp.

Unfortunately, we can’t extract this file – ArchiveViewer, pyi-archive-viewer and other tools fail, noting that the table of contents is broken. A little research later, I stumbled across this script, which I modified to disable the magic number check and to adjust the Table of Contents calculation so that it gives a sane number of files.
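To give an idea of what “disabling the magic number check” looks like, here is a minimal sketch of a tolerant PYZ reader. It is not the script I modified, and it does not reproduce my Table of Contents change. It assumes the layout I understand stock PyInstaller to use for PYZ archives: a 4-byte archive magic, a 4-byte pyc magic, a big-endian 4-byte TOC offset, zlib-compressed entries, and a marshalled TOC mapping names to (is_pkg, position, length). All function names are hypothetical, and the demo archive is synthetic.

```python
import marshal
import struct
import zlib

PYZ_MAGIC = b"PYZ\x00"
HEADER = struct.Struct("!4s4si")  # archive magic, pyc magic, TOC offset


def read_pyz(blob, strict=False):
    """Return {name: decompressed bytes}; a bad magic is ignored unless strict."""
    magic, _pyc_magic, toc_offset = HEADER.unpack_from(blob, 0)
    if magic != PYZ_MAGIC and strict:
        raise ValueError("bad PYZ magic: %r" % magic)
    if not HEADER.size <= toc_offset < len(blob):
        raise ValueError("TOC offset %d is outside the file" % toc_offset)
    toc = marshal.loads(blob[toc_offset:])
    # Assumption: older archives store a dict, newer ones a list of pairs.
    entries = toc.items() if isinstance(toc, dict) else toc
    modules = {}
    for name, (_is_pkg, pos, length) in entries:
        modules[name] = zlib.decompress(blob[pos:pos + length])
    return modules


def build_demo_pyz(modules, magic=PYZ_MAGIC):
    """Build a tiny synthetic archive in the assumed layout (demo only)."""
    body = b""
    toc = {}
    for name, payload in modules.items():
        packed = zlib.compress(payload)
        toc[name] = (0, HEADER.size + len(body), len(packed))
        body += packed
    toc_offset = HEADER.size + len(body)
    return HEADER.pack(magic, b"\xd1\xf2\r\n", toc_offset) + body + marshal.dumps(toc)


# An archive with a deliberately wrong magic still unpacks in tolerant mode.
demo = build_demo_pyz({"app.logic": b"payload"}, magic=b"XYZ\x00")
extracted = read_pyz(demo)
```

With strict=True the same blob is rejected, which is roughly how the stock tools behaved on the real file. I have not run this sketch against the actual target.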

This new script does the trick, allowing us to extract a folder full of pycs, including the application logic modules we were missing earlier:


These can be decompiled with uncompyle2 (remember to use the pip install --upgrade uncompyle2 version, or uncompyle6, to take apart Python 2.6 files – the apt-get install version won’t work on modern OSes, as these pycs are too old).
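If you are unsure which Python version produced a given pyc, the first four bytes tell you. The sketch below is a small helper of my own (the names are hypothetical) that maps a few CPython 2.x magic numbers to versions; the table is deliberately incomplete.

```python
import struct

# A few CPython 2.x pyc magic numbers (first two bytes, little-endian).
KNOWN_MAGIC = {62131: "2.5", 62161: "2.6", 62211: "2.7"}


def pyc_version(header):
    """Guess the CPython version from the first four bytes of a .pyc file."""
    # A pyc magic is a 16-bit number followed by "\r\n".
    if len(header) < 4 or header[2:4] != b"\r\n":
        return None
    (number,) = struct.unpack("<H", header[:2])
    return KNOWN_MAGIC.get(number)


def pyc_version_of_file(path):
    """Read the header of the file at path and guess its version."""
    with open(path, "rb") as handle:
        return pyc_version(handle.read(4))
```

A result of None means either that the header is damaged or that the version is simply not in the table.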

You can find the modified PYZ unpacker script here.

Tooling – Cerbero Profiler

During this investigation, I stumbled upon Cerbero Profiler, the successor to CFF Explorer. I haven’t used Profiler, but given its pedigree I’m willing to bet that it’s worth looking into. I intend to acquire a copy at some point, as it looks very useful for binary analysis tasks.


About Norman

Sometimes, I write code. Occasionally, it even works.