One sentence summary
The authors systematize binary disassembly by studying nine popular open-source tools, and set out to answer three questions.
- Q1 – What are the algorithms and heuristics used in existing disassembly tools and how do they interact?
- Q2 – What is the coverage & accuracy of heuristic methods in comparison to algorithmic ones? Are there trade-offs?
- Q3 – What errors do existing disassembly tools make and what are the underlying causes?
Background
The disassembly of binary programs is a crucial task in reverse engineering and software security. However, some information is still lost while disassembling a binary, because heuristic methods offer no assurance of correctness. Prior research lacks a sufficient qualitative and quantitative study.
Method
- Inspecting the source code to conduct qualitative study, avoiding ambiguities and out-of-date information found in documentation and publications
- Running the nine tools on a corpus of 3,788 benchmark binaries, consisting of utilities, client/server programs, and popular libraries on both Linux and Windows systems.
- For the evaluation of coverage and accuracy, the authors build a framework based on LLVM, GCC, the Gold Linker, and Visual Studio to automatically collect the ground truth while building the corpus.
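To make the coverage and accuracy comparison concrete, here is a minimal sketch of how a tool's output could be scored against ground truth. It assumes that both are represented as sets of instruction start addresses, that coverage is measured as recall, and that accuracy is measured as precision. The function name `coverage_and_accuracy` and the sample addresses are hypothetical and do not come from the paper's framework.

```python
def coverage_and_accuracy(ground_truth, reported):
    """Score reported instruction start addresses against ground truth.

    Returns (recall, precision):
      recall    = share of true instructions the tool recovered (coverage)
      precision = share of reported instructions that are real (accuracy)
    Both metric definitions are assumptions made for this sketch.
    """
    ground_truth = set(ground_truth)
    reported = set(reported)
    true_positives = ground_truth & reported

    # Treat an empty set as a perfect score to avoid dividing by zero.
    recall = len(true_positives) / len(ground_truth) if ground_truth else 1.0
    precision = len(true_positives) / len(reported) if reported else 1.0
    return recall, precision


# Hypothetical data: the tool misses 0x1005 and wrongly reports 0x1007.
truth = {0x1000, 0x1003, 0x1005, 0x100A}
tool_output = {0x1000, 0x1003, 0x1007, 0x100A}

recall, precision = coverage_and_accuracy(truth, tool_output)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

A tool that reports more candidate instructions tends to raise recall while risking lower precision, which is one way to read the coverage-correctness trade-off noted in the discussion.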
Scope of research
Binaries
- Have been produced with mainstream compilers and linkers
- May include hand-written assembly
- Have not been obfuscated
- Are assumed to be stripped, i.e., no symbols are available
- Only consider X86/X64 binaries
- Run on Linux or Windows operating systems
Tools
- Designated for disassembly or have an independent module for disassembly
- Can do automated disassembly without user interaction.
- Are open source, so that their implemented strategies can be studied.
- Have unique strategies that are not fully covered by other tools.
- Can run the targeted binaries to support the quantitative evaluation.
Analysis of the tools
Evaluation
Discussion
This research has four important findings.
- Complex constructs are common and heuristics are indispensable to handle them.
- Heuristics inherently introduce coverage-correctness trade-offs.
- Tool selection should be demand-specific.
- Broader and deeper evaluation is needed for future improvements.
Reading summary
- What is the motivation?
Past research lacks a systematized study of current popular disassemblers and has not pointed out the influence of using heuristics.
- What is the novelty & contribution?
- Present a thorough systematization of binary disassembly from the perspective of algorithms and heuristics.
- Develop a compiler-based framework for automated end-to-end collection of ground truth for binary disassembly and use it to compose a benchmark data set for assessing binary disassembly tools.
- Make new observations and improve the understanding of binary-disassembly strategies and tools.