0x00 pyinstaller

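Today Wilson mentioned that he wanted to package his fenghuang-scan into an executable, apparently with a tool called pyinstaller, which made me curious. I had only heard of this kind of tool before, so I tried it myself. What I mainly wanted to find out, though, was whether the source code can be recovered from the packaged file.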

Let's first look at how it works. The documentation has a section on this: https://pyinstaller.readthedocs.io/en/stable/operating-mode.html

PyInstaller reads a Python script written by you. It analyzes your code to discover every other module and library your script needs in order to execute. Then it collects copies of all those files – including the active Python interpreter! – and puts them with your script in a single folder, or optionally in a single executable file.

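According to this, pyinstaller analyzes your Python source, then copies the libraries it needs, along with the current Python interpreter, into a single folder; it can also produce a single executable file. Sounds impressive. So how does that single file work? The documentation covers this as well: https://pyinstaller.readthedocs.io/en/stable/operating-mode.html#how-the-one-file-program-works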

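I'll summarize briefly here; see the documentation for the details. Once everything is packaged into one file, the core of that file is really a bootloader. When run, it creates a folder named _MEIXXXXX under the temporary directory, and the bootloader unpacks the files the Python script needs into it, most of which are .so files. Then the Python script is executed, and the folder is deleted when the program exits.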
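From inside a packaged program, the extraction folder described above can be observed directly. Here is a minimal sketch, assuming the `sys.frozen` and `sys._MEIPASS` attributes that the pyinstaller bootloader sets at run time; the helper name `runtime_dir` is made up for illustration.

```python
import os
import sys


def runtime_dir(default=None):
    """Return the folder the bundled files were unpacked into.

    Inside a one-file build the bootloader is assumed to set sys.frozen and
    sys._MEIPASS (the _MEIXXXXX temporary folder). When run as a plain script
    neither attribute exists, so fall back to `default` or the current
    working directory.
    """
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        return sys._MEIPASS
    return default if default is not None else os.getcwd()


if __name__ == "__main__":
    print(runtime_dir())
```

Run as an ordinary script, this just prints the working directory; only a packaged build should report the temporary folder.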

temporary files created by pyinstaller

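The packaged file contains no Python source code; compiled pyc files are bundled instead. The bundled pyc files do receive some special treatment, which is covered later.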

0x01 Packaging a file

I wrote a small program that shows my own IP. The code is simple:

#!/usr/bin/env python
# coding: utf-8
# file: myip.py

from __future__ import print_function

import time

import requests
from api import API

def main():
    url = API.url
    r = requests.get(url)
    print(r.content)

if __name__ == "__main__":
    # time.sleep(120)
    main()
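The second file, api.py:

#!/usr/bin/env python
# coding: utf-8
# file: api.py

class API(object):
    # Endpoint that reports the caller's public IP
    url = "http://myip.ipip.net"

    def __init__(self):
        # Python 2 style super() needs both the class and the instance
        super(API, self).__init__()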

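To exercise the packaging, I deliberately made this slightly more complex by splitting it into two files. Now let's merge them into a single executable: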

# lightless @ VM-UBUNTU in ~/program/pyinstaller [22:18:27] 
$ pyinstaller -F myip.py
15 INFO: PyInstaller: 3.2.1
15 INFO: Python: 2.7.12
16 INFO: Platform: Linux-4.4.0-77-generic-x86_64-with-Ubuntu-16.04-xenial
16 INFO: wrote /home/lightless/program/pyinstaller/myip.spec
19 INFO: UPX is not available.
20 INFO: Extending PYTHONPATH with paths
['/home/lightless/program/pyinstaller', '/home/lightless/program/pyinstaller']
20 INFO: checking Analysis
24 INFO: Building because /home/lightless/program/pyinstaller/myip.py changed
24 INFO: Initializing module dependency graph...
25 INFO: Initializing module graph hooks...
57 INFO: running Analysis out00-Analysis.toc
74 INFO: Caching module hooks...
76 INFO: Analyzing /home/lightless/program/pyinstaller/myip.py
2739 INFO: Loading module hooks...
2740 INFO: Loading module hook "hook-httplib.py"...
2740 INFO: Loading module hook "hook-requests.py"...
2742 INFO: Loading module hook "hook-encodings.py"...
3012 INFO: Looking for ctypes DLLs
3058 INFO: Analyzing run-time hooks ...
3067 INFO: Looking for dynamic libraries
3272 INFO: Looking for eggs
3273 INFO: Python library not in binary depedencies. Doing additional searching...
3296 INFO: Using Python library /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
3299 INFO: Warnings written to /home/lightless/program/pyinstaller/build/myip/warnmyip.txt
3357 INFO: checking PYZ
3359 INFO: Building because toc changed
3360 INFO: Building PYZ (ZlibArchive) /home/lightless/program/pyinstaller/build/myip/out00-PYZ.pyz
3777 INFO: Building PYZ (ZlibArchive) /home/lightless/program/pyinstaller/build/myip/out00-PYZ.pyz completed successfully.
3835 INFO: checking PKG
3836 INFO: Building because /home/lightless/program/pyinstaller/build/myip/out00-PYZ.pyz changed
3836 INFO: Building PKG (CArchive) out00-PKG.pkg
5693 INFO: Building PKG (CArchive) out00-PKG.pkg completed successfully.
5699 INFO: Bootloader /usr/local/lib/python2.7/dist-packages/PyInstaller/bootloader/Linux-64bit/run
5699 INFO: checking EXE
5699 INFO: Rebuilding out00-EXE.toc because pkg is more recent
5699 INFO: Building EXE from out00-EXE.toc
5700 INFO: Appending archive to ELF section in EXE /home/lightless/program/pyinstaller/dist/myip
5728 INFO: Building EXE from out00-EXE.toc completed successfully.

# lightless @ VM-UBUNTU in ~/program/pyinstaller [22:36:53] 
$ ./dist/myip           
当前 IP:1.1.1.1  来自于:中国 浙江 杭州 联通

0x02 Extracting the Python code

Reverse engineering the bootloader is a topic for another time and not the focus here. Let's concentrate on how to extract the Python code from the packaged file.

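From what I could find, pyinstaller maintains its own data format called PYZ and appends it to the end of the executable; this data begins with PYZ. (Judging by the build log above, the PYZ is first wrapped in a PKG, a CArchive, and that archive is what gets appended to the executable.) There also appears to be an official tool for reading this data, see this file: https://github.com/pyinstaller/pyinstaller/blob/develop/PyInstaller/utils/cliutils/archive_viewer.py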
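To check the "begins with PYZ" claim by hand, one can scan the executable for that marker. This is only a rough sketch: the 4-byte marker `PYZ\x00` is my assumption about the exact signature, the function names are made up, and a hit may simply be a coincidence in unrelated data.

```python
def find_marker_offsets(data, marker=b"PYZ\x00"):
    """Return every offset in `data` where `marker` occurs."""
    offsets = []
    start = data.find(marker)
    while start != -1:
        offsets.append(start)
        start = data.find(marker, start + 1)
    return offsets


def scan_file(path, marker=b"PYZ\x00"):
    """Read a file fully into memory and list candidate marker offsets."""
    with open(path, "rb") as f:
        return find_marker_offsets(f.read(), marker)
```

Calling `scan_file("dist/myip")` on the binary built earlier should return a list of candidate offsets. The official archive_viewer is still the better tool for actually reading the archive.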

Let's grab that file and try it:

# lightless @ VM-UBUNTU in ~/program/pyinstaller/dist [22:43:37] C:2
$ python archive_viewer.py myip 
Traceback (most recent call last):
  File "archive_viewer.py", line 266, in <module>
    run()
  File "archive_viewer.py", line 258, in run
    PyInstaller.log.__process_options(parser, args)
  File "/usr/local/lib/python2.7/dist-packages/PyInstaller/log.py", line 49, in __process_options
    logger.setLevel(level)
NameError: global name 'logger' is not defined

It fails with a mysterious error. I don't know how other people dealt with it; I simply patched log.py myself.
archive_viewer reading the file

Four commands are available:

U: go Up one level
O <name>: open embedded archive name
X <name>: extract name
Q: quit

Two entries in the listing deserve attention: (3626573, 1576197, 1576197, 0, 'z', u'out00-PYZ.pyz') and (13072, 301, 497, 1, 's', u'myip').
out00-PYZ.pyz holds all the libraries we import; use the O command to look inside it.

Now let's extract the pyc file:
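To make those two tuples easier to read, here is a small sketch that gives each position a name. The field names are my assumption, based on the column header archive_viewer prints (pos, length, uncompressed, iscompressed, type, name); `TocEntry` and `describe` are made-up helpers, not part of pyinstaller.

```python
from collections import namedtuple

# Assumed meaning of each tuple position in the archive listing.
TocEntry = namedtuple(
    "TocEntry",
    ["pos", "length", "uncompressed", "is_compressed", "type_code", "name"],
)

# The two entries quoted from the listing above.
entries = [
    TocEntry(3626573, 1576197, 1576197, 0, "z", "out00-PYZ.pyz"),
    TocEntry(13072, 301, 497, 1, "s", "myip"),
]


def describe(entry):
    """Format one entry as a human-readable line."""
    state = "compressed" if entry.is_compressed else "stored as-is"
    return "%s: type %r, %d bytes in archive, %d bytes unpacked, %s" % (
        entry.name, entry.type_code, entry.length, entry.uncompressed, state)


for e in entries:
    print(describe(e))
```

Read this way, the numbers are at least self-consistent: the PYZ entry has flag 0 and identical lengths, while the myip entry has flag 1 and a smaller stored size than its unpacked size.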

? x myip
to filename? myip.pyc
? 

0x03 Recovering the Python source

There are countless ways to recover Python source from a pyc; I used easypythondecompiler here. Decompiling the file directly fails with a cold, merciless error.
easy python decompiler error

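The cause is this error: Invalid pyc/pyo file - Magic value mismatch! Every pyc file starts with a magic header, and this is the special treatment mentioned earlier: pyinstaller strips the magic portion from the pyc, so we have to put it back ourselves. I tested with Python 2, so 8 bytes need to be added in total: the first 4 bytes identify the Python version that compiled the file, and the last 4 bytes are a timestamp. Since I built this file myself, I know the first four bytes should be \x03\xf3\x0d\x0a. When decompiling a file packaged by someone else, you would have to look the value up in a table of magic numbers and guess.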
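Patching the header can be done in a hex editor, but it is also a few lines of Python. A minimal sketch, assuming the Python 2 layout described above (4-byte magic followed by a 4-byte little-endian timestamp) and a zero timestamp; `add_pyc_header` and `fix_file` are made-up names.

```python
import struct

# Magic number of the Python 2.7 build used in this post.
PY27_MAGIC = b"\x03\xf3\r\n"


def add_pyc_header(body, magic=PY27_MAGIC, timestamp=0):
    """Prepend the 8-byte Python 2 pyc header (magic + mtime) to `body`."""
    if len(magic) != 4:
        raise ValueError("magic must be exactly 4 bytes")
    return magic + struct.pack("<I", timestamp) + body


def fix_file(src, dst, magic=PY27_MAGIC):
    """Read a headerless pyc from `src` and write a patched copy to `dst`."""
    with open(src, "rb") as f:
        body = f.read()
    with open(dst, "wb") as f:
        f.write(add_pyc_header(body, magic))
```

For the file extracted earlier, something like `fix_file("myip.pyc", "myip_fixed.pyc")` should give the decompiler a header it accepts. Other Python versions use other magic numbers and header sizes, so the defaults here only fit this Python 2.7 case.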

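There seems to be a trick here: pull the pyc of a standard-library module out of the same executable and you will find that its first four bytes are actually present.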

struct pyc magic number

So we simply copy those 4 bytes over. After patching, the file looks like this.

fix pyc
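The borrowing trick can be sketched in code as well. This assumes the reference pyc taken from the same executable still starts with its 4-byte magic, and that the headerless file needs the magic plus a 4-byte timestamp (zeroed here); both function names are made up for illustration.

```python
def borrow_magic(reference_pyc):
    """Return the 4-byte magic number at the start of a complete pyc."""
    magic = reference_pyc[:4]
    # CPython magic numbers are a 2-byte version code followed by b"\r\n".
    if len(magic) != 4 or magic[2:] != b"\r\n":
        raise ValueError("reference does not start with a pyc magic number")
    return magic


def patch_with_borrowed_magic(headerless, reference_pyc, timestamp=b"\x00" * 4):
    """Prefix `headerless` with the borrowed magic and a 4-byte timestamp."""
    if len(timestamp) != 4:
        raise ValueError("timestamp must be exactly 4 bytes")
    return borrow_magic(reference_pyc) + timestamp + headerless
```

The `\r\n` check is only a sanity test; it catches an obviously wrong reference file but does not prove the magic matches the interpreter that built the target.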

After that, the source can be recovered.

python source code

Likewise, applying the same steps to the api entry inside out00-PYZ.pyz yields the source of api.py.

0xFF References