
One-Sentence Summary
The authors systematize binary disassembly by studying nine popular open-source tools and answering three key questions:
- Q1 – What are the algorithms and heuristics used in existing disassembly tools and how do they interact?
- Q2 – What is the coverage and accuracy of heuristic methods compared with algorithmic ones? Are there trade-offs?
- Q3 – What errors do existing disassembly tools make and what are the underlying causes?
Background
Disassembling binary programs is a crucial task in reverse engineering and software security. However, compilation discards much of the information that disassembly needs (e.g., symbols and the boundaries between code and data), so disassemblers rely on heuristics that do not guarantee correctness. Prior research also lacks sufficient qualitative and quantitative analysis of these heuristics.

Method
- Conduct a qualitative study by inspecting the tools' source code rather than relying on documentation and publications, which can be ambiguous or outdated.
- Run the nine tools on a corpus of 3,788 benchmark binaries, including utilities, client/server programs, and popular libraries on both Linux and Windows.
- To evaluate coverage and accuracy, build a framework based on LLVM, GCC, the Gold linker, and Visual Studio that automatically collects ground truth while the corpus is being built.
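The method scores each tool against the ground truth collected at build time. A minimal sketch of one common way to express such a score, assuming both the ground truth and a tool's output are reduced to sets of instruction start addresses: precision is the fraction of recovered instructions that are real, and recall is the fraction of real instructions that were recovered. The function name `precision_recall` and the sample addresses are hypothetical and only illustrate the metric; they are not taken from the paper's framework.

```python
def precision_recall(recovered: set[int], truth: set[int]) -> tuple[float, float]:
    """Return (precision, recall) of recovered instruction start addresses.

    By convention in this sketch, an empty set scores 1.0 for the metric
    whose denominator it would be (nothing claimed means nothing wrong).
    """
    true_pos = len(recovered & truth)  # addresses the tool got right
    precision = true_pos / len(recovered) if recovered else 1.0
    recall = true_pos / len(truth) if truth else 1.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical ground truth: instruction starts recorded by the toolchain.
    truth = {0x1000, 0x1003, 0x1008, 0x100A, 0x1010}
    # Hypothetical tool output: misses 0x1010 and wrongly decodes data at 0x1020.
    recovered = {0x1000, 0x1003, 0x1008, 0x100A, 0x1020}
    p, r = precision_recall(recovered, truth)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.80
```

Missing a real instruction lowers recall, while decoding data as code lowers precision; the two kinds of error are tracked separately.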

Scope of Research
Binaries
- Produced with mainstream compilers and linkers
- May include hand-written assembly
- Not obfuscated
- Stripped, i.e., no symbol information is assumed to be available
- Limited to x86/x64 binaries
- Run on Linux or Windows operating systems
Tools
- Are designed for disassembly or include an independent disassembly module
- Support automated disassembly without user interaction
- Are open source, which allows the implemented strategies to be studied
- Use strategies that are not fully covered by other tools
- Can process the target binaries used in the quantitative evaluation
Analysis of the Tools

Evaluation

Discussion
The paper reports four key findings:
- Complex constructs are common and heuristics are indispensable to handle them.
- Heuristics inherently introduce coverage-correctness trade-offs.
- Tool selection should be demand-specific.
- Broader and deeper evaluation is needed for future improvements.
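The coverage-correctness trade-off in the second finding can be made concrete with a toy model. This is an illustration under stated assumptions, not how any of the nine tools is implemented. The hypothetical `BINARY` table stands in for an x86 decoder by mapping each address to the instruction that would be decoded there, and the hypothetical `is_prologue` flag stands in for matching a function-prologue byte pattern such as `push rbp`. Recursive descent from the entry point follows only statically known control flow, so it misses a function that is reached only through a pointer. Adding a prologue-matching heuristic recovers that function but also decodes a data byte that happens to look like a prologue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    size: int                    # instruction length in bytes
    successors: tuple[int, ...]  # statically known control-flow targets
    is_prologue: bool = False    # hypothetical: bytes match e.g. `push rbp`


# Hypothetical decoder view: what would be decoded if decoding started here.
BINARY = {
    0x00: Insn(1, (0x01,), is_prologue=True),  # main: reached from the entry point
    0x01: Insn(3, (0x04,)),
    0x04: Insn(1, ()),                         # ret
    0x20: Insn(1, (0x21,), is_prologue=True),  # callback: only called via a pointer
    0x21: Insn(3, (0x24,)),
    0x24: Insn(1, ()),                         # ret
    0x40: Insn(1, (0x41,), is_prologue=True),  # data byte that looks like a prologue
    0x41: Insn(2, ()),
}
# Ground truth: real instruction starts (the data at 0x40 is not code).
TRUTH = {0x00, 0x01, 0x04, 0x20, 0x21, 0x24}


def recursive_descent(decode: dict[int, Insn], entries: list[int]) -> set[int]:
    """Follow statically known control flow from the given entry points."""
    seen: set[int] = set()
    work = list(entries)
    while work:
        addr = work.pop()
        if addr in seen or addr not in decode:
            continue
        seen.add(addr)
        work.extend(decode[addr].successors)
    return seen


def with_prologue_heuristic(decode: dict[int, Insn], entries: list[int]) -> set[int]:
    """Recursive descent, then treat uncovered prologue matches as function starts."""
    found = recursive_descent(decode, entries)
    for addr in sorted(decode):
        # Bytes already claimed by recovered instructions are skipped.
        covered = {b for a in found for b in range(a, a + decode[a].size)}
        if addr not in covered and decode[addr].is_prologue:
            found |= recursive_descent(decode, [addr])
    return found


def score(recovered: set[int], truth: set[int]) -> tuple[float, float]:
    """Precision and recall of recovered instruction starts (non-empty sets assumed)."""
    true_pos = len(recovered & truth)
    return true_pos / len(recovered), true_pos / len(truth)


if __name__ == "__main__":
    for name, result in [
        ("recursive descent", recursive_descent(BINARY, [0x00])),
        ("+ prologue heuristic", with_prologue_heuristic(BINARY, [0x00])),
    ]:
        p, r = score(result, TRUTH)
        print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

With these made-up addresses, recursive descent alone reaches precision 1.00 and recall 0.50. The heuristic raises recall to 1.00 but lowers precision to 0.75, which is the kind of trade-off the finding describes.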
Reading Summary
- What is the motivation?
Past research lacks a systematic study of currently popular disassemblers and has not clearly explained the impact of using heuristics.
- What are the novelty and contributions?
- Present a thorough systematization of binary disassembly from the perspective of algorithms and heuristics.
- Develop a compiler-based framework for automated, end-to-end collection of binary-disassembly ground truth, and use it to build a benchmark dataset for assessing disassembly tools.
- Provide new observations that improve the understanding of binary-disassembly strategies and tools.