Avast open-sources its machine-code decompiler

Threat Intelligence Team, 12 December 2017

After seven years of development, Avast open-sources its machine-code decompiler for platform-independent analysis of executable files.

Avast released its analytical tool, RetDec, to help the cybersecurity community fight malicious software. The tool allows anyone to study the code of applications to see what the applications do, without running them. Let's fight the bad guys together!

As we announced in our Botconf 2017 presentation at the beginning of December (slides), RetDec, our machine-code decompiler, is now open, which means anyone can freely use it, study its source code, modify it, and redistribute it.

The goal behind open-sourcing RetDec is to provide a generic tool to transform platform-specific code, such as x86/PE executable files, into a higher-level representation, such as C source code. By generic, we mean that the tool should not be limited to a single platform, but rather support a variety of platforms, including different architectures, file formats, and compilers. At Avast, RetDec is actively used for analysis of malicious samples for various platforms, such as x86/PE and ARM/ELF.

The source code of the decompiler and other related tools is now available on GitHub under the MIT license. By open-sourcing the decompiler, we would like to make its use more widespread and invite others to cooperate with us on its continued development.

What is a decompiler?

Before we dive into how RetDec works, let's briefly explain what decompilers do in general.

A decompiler is a program that takes an executable file as its input and attempts to transform it into a high-level representation while preserving its functionality. For example, the input file may be application.exe, and the output can be source code in a higher-level programming language, such as C. A decompiler is, therefore, the exact opposite of a compiler, which compiles source files into executable files; this is why decompilers are sometimes also called reverse compilers.

By preserving a program's functionality, we mean that the source code should reflect what the input program does as accurately as possible; otherwise, we risk assuming the program does one thing when it really does another.

Generally, decompilers are unable to perfectly reconstruct the original source code because a lot of information, such as comments, formatting, and often variable names and types, is lost during the compilation process. Furthermore, malware authors often use various obfuscation and anti-decompilation tricks to make the decompilation of their software as difficult as possible.
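To make the point about lost information concrete, here is a minimal sketch. It uses Python's built-in compile() and CPython bytecode purely as an analogy for native compilation; RetDec itself works on machine code, not bytecode. The bytecode_of helper and the temperature example are illustrative assumptions, not part of RetDec.

```python
def bytecode_of(source: str) -> bytes:
    """Compile Python source text and return the raw instruction bytes."""
    return compile(source, "<example>", "exec").co_code


# Two versions of the same statement. The first has a comment, spacing,
# and redundant parentheses that help a human reader.
ORIGINAL = """
# Convert a temperature reading to Fahrenheit.
fahrenheit = (celsius * 9 / 5) + 32   # readable layout, helpful comment
"""

STRIPPED = "\nfahrenheit = celsius*9/5+32\n"

# The compiler emits the same instructions for both, so nothing in the
# compiled form tells a decompiler which version the author wrote.
same = bytecode_of(ORIGINAL) == bytecode_of(STRIPPED)
print("identical instructions:", same)
```

A native compiler discards even more than this, which is why a decompiler can aim only for code that behaves the same, not for the code that was originally written.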

RetDec addresses the above-mentioned issues by supporting a large set of architectures and file formats, and by using in-house heuristics and algorithms to decode and reconstruct applications. RetDec is also the only decompiler of its scale that is built on the proven LLVM infrastructure and provided for free under the MIT license.

Decompilers can be used in a variety of situations. The most obvious is reverse engineering when searching for bugs or vulnerabilities, or when analyzing malicious software. Decompilation can also be used to retrieve lost source code, to compare two executables, or to verify that a compiled program does exactly what is written in its source code.

You may have already heard about disassemblers and may think that a decompiler is basically the same thing. Wrong. There are several important differences between a decompiler and a disassembler. The former tries to reconstruct an executable file into platform-agnostic, high-level source code, while the latter gives you low-level, platform-specific assembly instructions. The assembly output is non-portable, error-prone when modified, and requires specific knowledge about the instruction set of the target processor. Another positive aspect of decompilers is that the high-level source code they produce, such as C source code, can be read by people who know nothing about the assembly language of the particular processor being analyzed.

We would like to note that many different types of decompilers exist. RetDec is a machine-code decompiler, which means it only supports the decompilation of programs executing native processor code (e.g. for Intel x86). A machine-code decompiler is thus unable to decompile bytecode (e.g. .NET, Python, Java).

Introducing RetDec: Avast's machine-code decompiler

RetDec is a machine-code decompiler that has been in development since 2011. It was originally created as a joint project by the Faculty of Information Technology of the Brno University of Technology in the Czech Republic, and AVG Technologies. Since the acquisition of AVG Technologies by Avast in 2016, Avast has continued to develop the decompiler.

The name RetDec stands for Retargetable Decompiler. We have already explained what a decompiler is, but what is a retargetable decompiler? We chose the name because the decompiler is not limited to a single target architecture, operating system, or executable file format.

To give you an idea of what the decompiler can do, let's look at an overview of its features:

  • Supported file formats: ELF, PE, Mach-O, COFF, AR (archive), Intel HEX, and raw machine code.
  • Supported architectures (32-bit only): Intel x86, ARM, MIPS, PIC32, and PowerPC.
  • Static analysis of executable files with detailed information.
  • Compiler and packer detection.
  • Loading and instruction decoding.
  • Signature-based removal of statically linked library code.
  • Extraction and utilization of debugging information (DWARF, PDB).
  • Reconstruction of instruction idioms.
  • Detection and reconstruction of C++ class hierarchies (RTTI, vtables).
  • Demangling of symbols from C++ binaries (GCC, MSVC, Borland).
  • Reconstruction of functions, types, and high-level constructs.
  • Integrated disassembler.
  • Output in two high-level languages: C and a Python-like language.
  • Generation of call graphs, control-flow graphs, and various statistics.
  • IDA plugin that allows decompilation of files directly from the IDA disassembler.
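
As a rough illustration of what the first step of handling several file formats and architectures involves, the sketch below guesses a file's container format from its leading bytes and, for ELF files, reads the word size, byte order, and target architecture from the header. This is not RetDec's implementation: the function names detect_format and describe_elf are hypothetical, and only the magic numbers and header offsets come from the public format specifications. COFF and raw machine code have no reliable signature, so the sketch reports them as unknown.

```python
import struct

# Mach-O magic numbers as they appear on disk (32/64-bit, both byte orders).
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",
}

# A few ELF e_machine values from the ELF specification.
ELF_MACHINES = {3: "x86", 8: "MIPS", 20: "PowerPC", 40: "ARM"}

HEX_DIGITS = b"0123456789abcdefABCDEF"


def detect_format(data: bytes) -> str:
    """Guess the container format of `data` from its leading bytes."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:8] == b"!<arch>\n":
        return "AR"
    if data[:4] in MACHO_MAGICS:
        return "Mach-O"
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # The DOS header field at 0x3C holds the offset of the PE signature.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE"
    # Intel HEX is text: each record is ':' followed by at least 10 hex digits.
    first_line = data.split(b"\n", 1)[0].strip()
    if (first_line[:1] == b":" and len(first_line) >= 11
            and all(c in HEX_DIGITS for c in first_line[1:])):
        return "Intel HEX"
    return "unknown"


def describe_elf(data: bytes):
    """Return (bits, byte order, architecture) for an ELF header, or None."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        return None
    bits = {1: 32, 2: 64}.get(data[4])                           # EI_CLASS
    order = {1: ("little", "<"), 2: ("big", ">")}.get(data[5])   # EI_DATA
    if bits is None or order is None:
        return None
    (machine,) = struct.unpack_from(order[1] + "H", data, 18)    # e_machine
    return bits, order[0], ELF_MACHINES.get(machine, f"unknown ({machine})")


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        header = f.read(4096)
    print(detect_format(header), describe_elf(header))
```

Run as a script with a file path, it prints the guessed format and, for ELF inputs, the header details. A real decompiler front end goes much further, parsing sections, symbols, and relocations before any instruction decoding starts.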

That sounds great! Where can I try the decompiler?

The easiest way to try out the decompiler is via our web service. From your favorite web browser, you simply upload the executable file you want to decompile and press the decompilation button. After the decompilation finishes, you can view the results:

[Screenshot: decompilation results in the RetDec web service]

If you have the IDA disassembler installed, you can use our IDA plugin to perform decompilations directly within IDA:

[Screenshot: decompilation inside IDA using the RetDec plugin]

If you want more programmatic access, you can use our REST API, which allows anyone to write applications that interact with RetDec by sending HTTP requests. The easiest (and recommended) way of using the decompiler via the API is by using retdec-python:

[Screenshot: example use of retdec-python]

Finally, since the source code of the decompiler is available on GitHub, you can build, install, and use our decompiler directly on your PC. Currently, RetDec supports the Linux and Microsoft Windows operating systems.

We hope that you will find our decompiler useful. If you have any questions or would like to provide feedback, feel free to contact us.

For more information, we encourage you to visit the RetDec home page. More of Avast's open-sourced projects can be found on our GitHub page.