Tool Learning 0 - Basic Knowledge of Fuzzilli

Introduction

Fuzzing is a practical, widely deployed technique for finding bugs in complex, real-world programs such as JavaScript engines. Recent research, including Park et al. (2020), has explored this topic and extended it to commercial applications.

To learn how to fuzz a JavaScript engine, I first focused on an open-source project: Fuzzilli. In this post, I record my experience learning Fuzzilli and my attempt to apply it to ChakraCore, the open-source JavaScript engine once used in the Edge browser.

For this introduction, I referred to content from this blog post and these slides.

What Is Fuzzilli?

Fuzzilli is a fuzzer for JavaScript engines. It was developed by Samuel Groß, who graduated from the Karlsruhe Institute of Technology. The tool was first proposed in his master's thesis, and his motivation was to find bugs in JavaScript engines, especially JIT compilers.

Fuzzilli uses an intermediate representation (IR) language called FuzzIL, which is well suited for mutation. Any program written in FuzzIL can be converted into valid JavaScript code. The official GitHub repository currently supports JavaScriptCore (the JavaScript engine for Safari), JerryScript (a JavaScript engine for IoT devices), QuickJS (a lightweight embeddable JavaScript engine), SpiderMonkey (the JavaScript engine for Mozilla Firefox), V8 (the JavaScript engine for Google Chrome), and Duktape (an embeddable lightweight JavaScript engine).

Terminology

Term	Description
REPRL(read-eval-print-reset-loop).	REPRL is a mechanism to reduce the overhead of launching a new process. As many fuzzer architectures use a forkserver, Fuzzilli uses the REPRL approach, which saves the overhead caused by fork() so that execution per sample can be faster.
Coverage	LLVM provides this kind of function. The Clang compiler has an option called `-fsanitize-coverage=trace-pc-guard`. With it, we can receive a pointer to the `start` and `end` of the regions, which are initialized by the guard number. Whenever a branch is executed, the `__sanitizer_cov_trace_pc_guard` callback is called.
Generation	The generation program is described in the thesis. All generated programs start from this simple JavaScript expression and grow through a sequence of mutators: `Object()`

Requirements

Valid syntax of produced samples

The main idea is to model JavaScript as a context-free grammar and then apply random production rules. The script below was generated by a grammar-based fuzzer.

...;
var v4 = new Array(42, v3, “foobar”);
for (var v5 = 0; v5 < 1000; v5++) {    
    v4 = v5 * 7;    
    var v6 = v4.slice(v1, v1, v2);
    }
...;

However, an exception may occur when v4.slice is not a function, which prevents the following statement from executing. This could be solved by adding try-catch, but that would mean something very different for a JIT compiler.

High degree of semantic correctness

This requirement is harder to achieve than syntactic correctness. There are multiple options:

Precise type tracking in generated code.

Generate JavaScript code step by step, validating after each step.

Use a mutation-based approach and keep only semantically valid samples in the corpus.

Definition of sensible mutations of JavaScript code

Mutation is possible at different levels, from the source code level to the bytecode level.

FuzzIL and Mutation

As mentioned above, FuzzIL is an intermediate language used to capture control flow and data flow. Four mutation methods are defined on the IL: operation mutator, input mutator, splice mutator, and insertion mutator.

Minimization

Since mutation can only grow a program in size, minimization is required before inserting programs into the corpus. The minimization algorithm is simple; the pseudocode is shown below:

remove one instruction (starting at end)

if the program still valid:

	execute the program

	if behavior (crash, new edges) has changed:

		restore instruction

	else if the behavior not change :

		continue to remove next instruction

else if the program is not valid:

	restore instruction

Thus, the program is minimized until no instruction can be removed without changing its behavior. This algorithm is very expensive, so there is much room for improvement here.

Guided Fuzzing

The feedback system is plugged in to encourage programs to undergo further mutations. A method called edge-coverage, which is similar to AFL, has been implemented.

The guided-fuzzing algorithm is shown below:

The architecture of Fuzzilli is shown below:

Synchronization

I find this part interesting because the process reminds me of Hadoop. There is a master node and several worker nodes. At the beginning, the workers explore the program independently. When one of them finds something interesting, it notifies the master node.

Start	Find something interesting	Master receives the information

After the master node executes and checks the sample, it notifies the other worker nodes.

Master executes	Checking finishes	Share information

The other worker nodes then execute and evaluate the sample on their own. This process can be set up with GCE and Docker, and it could probably be implemented on Kubernetes.

Worker executes and evaluates	End	Can be scaled further

How It Works

For more details about Fuzzilli, refer to this document.

Reference

Park, S., Xu, W., Yun, I., Jang, D., & Kim, T. (2020). Fuzzing JavaScript Engines with Aspect-preserving Mutation. 2020 IEEE Symposium on Security and Privacy (SP), 1629–1642. https://doi.org/10.1109/SP40000.2020.00067