Introduction
Fuzzing is a practical, widely-deployed technique to find bugs in complex, real-world programs like JavaScript engines. Some researchers (Park et al., 2020) nowadays have conducted research on this topic and make their attention to some commercial application.
Thus, to learn how to fuzz on JS engine, firstly I need to pay my attention to a open source project. My choice is Fuzzilli. In this post, I will record the experience of learning Fuzzilli, and my attempt to implement this to Chakracore (which is a open source JSE implemented in Edge broswer).
In the whole part of introduction, I have referred some content in this blog and this slide.
What is Fuzzilli
Fuzzilli is a fuzzer for fuzzing JavaScript engines. It was developed by Samuel Groß, who graduated from Karlsruher Institut für Technologie. This tool was firstly proposed in his master thesis, and its motivation of designing this is to find bugs in JS engines (JIT compilers, for instance).
It uses an intermediate representation (IR) language called FuzzIL, which is perfectly suitable for mutation. Any program written in FuzzIL could be converted to valid JS code. Currently, the official github repo support JavaScriptCore (JS engine for safari), Jerryscript (JS engine for IoT devices), QuickJS (Light weight embeddable JS engine), Spider monkey (JS engine for Mozilla Firefox), V8 (JS engine for Google Chrome), duktape (Embeddable Javascript engine, can be regarded as light weight v8).
Terminology
Term | Description |
---|---|
REPRL(read-eval-print-reset-loop). | REPRL is a mechanism to reduce the overhead of launching a new process have been implemented. As many fuzzer architectures use Forkserver, Fuzzilli use REPRL approach, which saves the overhead caused by fork() and execution per sample could be faster. |
Coverage | LLVM provide this kind of function. Clang complier has a schema called -fsanitize-coverage=trace-pc-guard , we can receive a pointer to the start and end of the regions, which are initialized by the guard number. Whenever a branch is executed, the __sanitizer_cov_trace_pc_guard callback is called. |
Generation | The generation program has been described in the thesis. All the program stem from this simple JavaScript by cleverly applying multiple mutators: Object() |
Requirements
Valid syntax of produced samples
The main idea is to formulate JavaScript language as context-free grammar, then apply random production rules. The below script was generated by grammar-based fuzzer.
1 | ...; |
However, the exception may occur when v4.slice is not a function, which will make the following statement would not be executed. This could be solved by adding try-catch, but it will make two pretty different things for a JIT compiler.
High degree of semantic correctness
This requirement is harder to achieve than syntactical correctness. There are multiple options for this:
Precise type tracking in generated code.
Generate JavaScript code “step-by-step”, validating after each step.
Use mutational approach, only keep semantically valid samples in corpus.
Definition of sensible mutations of JavaScript code
The mutation possible at different levels, from source code level to bytecode level.
FuzzIL and mutation
As mentioned in the previous part, FuzzIL is a intermediate language, and it used to capture control flow and data flow. There are four mutation methods defined on the IL, which are operation mutator, input mutator, splice mutator and insertion mutator.
Minimization
Since mutation can only grow a program in size, minimization has been required before inserting them into the corpus. The algorithm of minimization is simple, the psuedocode is as below:
1 | remove one instruction (starting at end) |
Thus, the program will be minimized until it has no instruction could be remove without changing its behavior. The cost of this algorithm is very expensive, so there is much room for improvement here.
Guided Fuzzing
The feedback system will be plugged in to stimulate the programs have future mutations. There is a method called edge-coverage, which is similar to afl, has been implemented.
The algorithm of guided fuzzing is as below:
The architecture of fuzzilli could be as below:
Synchronization
I think this part is interesting, this process let me think about hadoop. There is a master node and a couple of worker node. At begin, the worker will begin to find something in the program independently, when one of them find some interesting, they will notify the master node.
At begin | Find some interesting | Master got the information |
---|---|---|
![]() |
![]() |
![]() |
After master node executed and checked, it would notify this to other worker node.
Master executed | Finish checking | Share information |
---|---|---|
![]() |
![]() |
![]() |
The other worker node will executed and evaluated by their own. This process can be setup with GCE and docker, and probably can implement on Kubernetes.
Worker executed and evaluated | End | Can be scaled further |
---|---|---|
![]() |
![]() |
![]() |
How it works
For more details of fuzzilli, can refer to this document.
Reference
Park, S., Xu, W., Yun, I., Jang, D., & Kim, T. (2020). Fuzzing JavaScript Engines with Aspect-preserving Mutation. 2020 IEEE Symposium on Security and Privacy (SP), 1629–1642. https://doi.org/10.1109/SP40000.2020.00067