Paper Reading 1 - All You Ever Wanted to Know About x86/x64 Binary Disassembly But Were Afraid to Ask

One sentence summary

The authors systematize binary disassembly by studying nine popular open-source tools, and set out to answer three important questions.

  • Q1: What are the algorithms and heuristics used in existing disassembly tools, and how do they interact?
  • Q2: What is the coverage & accuracy of heuristic methods in comparison to algorithmic ones? Are there trade-offs?
  • Q3: What errors do existing disassembly tools make, and what are the underlying causes?

Background

The disassembly of binary programs is a crucial task in reverse engineering and software security. However, some information is still lost when disassembling a binary, because heuristic methods offer no assurance of correctness. Past research lacks sufficient qualitative and quantitative study of this problem.

Method

  • Inspecting the tools' source code to conduct a qualitative study, avoiding the ambiguities and out-of-date information found in documentation and publications.
  • Applying the nine tools to a corpus of 3,788 benchmark binaries, consisting of utilities, client/server programs, and popular libraries on both Linux and Windows systems.
  • For the evaluation of coverage and accuracy, the authors build a framework based on LLVM, GCC, the Gold Linker, and Visual Studio that automatically collects the ground truth while building the corpus.
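To make the idea of scoring a tool against ground truth concrete, here is a minimal sketch. It assumes that coverage is expressed as recall and accuracy as precision over sets of instruction start addresses; the function name `score_disassembly` and all addresses are hypothetical and made up for illustration, not taken from the paper's framework.

```python
def score_disassembly(reported, ground_truth):
    """Compare instruction start addresses reported by a tool with the ground truth.

    Returns (precision, recall):
      precision = share of reported addresses that are real instructions
      recall    = share of real instructions that the tool reported
    """
    reported = set(reported)
    ground_truth = set(ground_truth)
    true_positives = reported & ground_truth
    # Treat an empty set as trivially perfect to avoid dividing by zero.
    precision = len(true_positives) / len(reported) if reported else 1.0
    recall = len(true_positives) / len(ground_truth) if ground_truth else 1.0
    return precision, recall


# Hypothetical addresses, invented for this example only.
ground_truth = {0x1000, 0x1003, 0x1005, 0x100A}
tool_output = {0x1000, 0x1003, 0x1006, 0x100A, 0x100C}

precision, recall = score_disassembly(tool_output, ground_truth)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.60 recall=0.75
```

In this toy case the tool reports two addresses that are not real instructions, which lowers precision, and misses one real instruction, which lowers recall. An aggressive heuristic tends to raise recall at the cost of precision, which is one way to read the coverage-correctness trade-off.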

Scope of research

Binaries

  • Have been produced with mainstream compilers and linkers
  • May include hand-written assembly
  • Have not been obfuscated
  • Are stripped, i.e., symbol availability is not assumed
  • Are x86/x64 binaries only
  • Run on Linux or Windows operating systems

Tools

  • Are designed for disassembly or have an independent module for disassembly.
  • Can do automated disassembly without user interaction.
  • Are open source, so that their implemented strategies can be studied.
  • Have unique strategies that are not fully covered by other tools.
  • Can process the targeted binaries, to support the quantitative evaluation.

Analysis of the tools

Evaluation

Discussion

This research has four important findings.

  • Complex constructs are common and heuristics are indispensable to handle them.
  • Heuristics inherently introduce coverage-correctness trade-offs.
  • Tool selection should be demand-specific.
  • Broader and deeper evaluation is needed for future improvements.

Reading summary

  • What is the motivation?

Past research lacks a systematic study of currently popular disassemblers and has not pointed out the influence that heuristics have on their results.

  • What is the novelty & contribution?
  1. Presents a thorough systematization of binary disassembly from the perspective of algorithms and heuristics.
  2. Develops a compiler-based framework for automated end-to-end collection of ground truth for binary disassembly and uses it to compose a benchmark data set for assessing binary disassembly tools.
  3. Makes new observations and improves the understanding of binary-disassembly strategies and tools.