Architecture Overview
The bf-jit system follows a modular architecture with a unified frontend that can target multiple backends:
Brainfuck Source -> Tokenizer -> Intermediate Representation
                                 |
                                 v
     +----------------+----------+-------+------------------+
     |                |                  |                  |
Interpreter       Assembler        ELF Compiler       JIT Compiler
     |                |                  |                  |
 treewalk       x86 assembly     Standalone binary  Native execution
Tokenizer and Parser
The system begins with a generic tokenizer that turns Brainfuck source code into tokens representing the eight Brainfuck commands (> < + - . , [ ]). Consecutive repeats of the same command are compressed into a single token that carries a repeat count as metadata. Loop boundaries ([ and ]) are the exception and are never merged.
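The run-length compression described above can be sketched as follows. This is a minimal illustration, not the project's actual tokenizer: the `Token` type, its field names, and the `tokenize` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List

COMMANDS = set("><+-.,[]")
LOOP_TOKENS = set("[]")


@dataclass(frozen=True)
class Token:
    op: str     # one of the eight Brainfuck commands
    count: int  # repeat count; always 1 for '[' and ']'


def tokenize(source: str) -> List[Token]:
    """Collapse runs of identical commands into one token each."""
    tokens = []
    for ch in source:
        if ch not in COMMANDS:
            continue  # any other character is a comment in Brainfuck
        if tokens and ch not in LOOP_TOKENS and tokens[-1].op == ch:
            # Same command as the previous token: bump its repeat count.
            tokens[-1] = Token(ch, tokens[-1].count + 1)
        else:
            tokens.append(Token(ch, 1))
    return tokens
```

For example, `tokenize("+++>>[-]")` yields five tokens: `+` x3, `>` x2, then `[`, `-`, and `]` with a count of 1 each. Keeping brackets unmerged means every loop boundary stays a distinct jump target for the backends.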
1. Interpreter
The interpreter is the simplest backend, executing Brainfuck commands directly:
- Maintains a 30,000-byte tape buffer
- Uses a program counter (pc) and a data pointer
- Runs a simple loop that executes the token at the pc and then advances it
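A minimal sketch of such an interpreter loop, assuming tokens arrive as (op, count) pairs. The function name `run`, the precomputed bracket table, and the choice to store 0 on end of input are assumptions for illustration, not details confirmed by the project.

```python
TAPE_SIZE = 30_000


def run(tokens, stdin=b""):
    """Execute (op, count) tokens and return the bytes written by '.'."""
    tape = bytearray(TAPE_SIZE)

    # Assumption: matching brackets are resolved once, up front.
    jump, stack = {}, []
    for i, (op, _) in enumerate(tokens):
        if op == "[":
            stack.append(i)
        elif op == "]":
            j = stack.pop()
            jump[i], jump[j] = j, i

    pc = dp = 0  # program counter and data pointer
    inp = iter(stdin)
    out = bytearray()
    while pc < len(tokens):
        op, n = tokens[pc]
        if op == ">":
            dp += n
        elif op == "<":
            dp -= n
        elif op == "+":
            tape[dp] = (tape[dp] + n) % 256
        elif op == "-":
            tape[dp] = (tape[dp] - n) % 256
        elif op == ".":
            out.extend(bytes([tape[dp]]) * n)
        elif op == ",":
            for _ in range(n):
                tape[dp] = next(inp, 0)  # assumption: EOF stores 0
        elif op == "[" and tape[dp] == 0:
            pc = jump[pc]  # skip the loop body
        elif op == "]" and tape[dp] != 0:
            pc = jump[pc]  # go back to the matching '['
        pc += 1
    return bytes(out)


# Tokens for "++++++++[>++++++++<-]>+." (8 * 8 + 1 = 65, ASCII 'A')
program = [("+", 8), ("[", 1), (">", 1), ("+", 8), ("<", 1),
           ("-", 1), ("]", 1), (">", 1), ("+", 1), (".", 1)]
print(run(program))  # b'A'
```

The sketch does no bounds checking on the data pointer and assumes balanced brackets; a real interpreter would likely want both checks.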
2. Assembler
The assembler backend generates x86-64 assembly code from the Brainfuck IR:
- Maps Brainfuck operations to equivalent x86-64 instructions
- Reserves a 30,000-byte buffer, tape, in the .bss section
- Uses labels to manage looping
- Outputs a .s file that can be assembled and linked with a tool like clang
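The mapping above might look roughly like the sketch below, which turns (op, count) pairs into AT&T-syntax assembly for x86-64 Linux. The register choice (%rbx as the data pointer), the label names, the raw `read`/`write`/`exit` syscalls, and the `emit_asm` function itself are all assumptions made for illustration; only the `tape` buffer in .bss and the use of labels come from the description above.

```python
TAPE_SIZE = 30_000

PROLOGUE = f"""\
    .section .bss
tape:
    .zero {TAPE_SIZE}
    .text
    .globl _start
_start:
    leaq tape(%rip), %rbx
"""

# exit(0) via the Linux x86-64 syscall ABI
EPILOGUE = """\
    movl $60, %eax
    xorl %edi, %edi
    syscall
"""

# write(1, rbx, 1) and read(0, rbx, 1)
WRITE = "    movl $1, %eax\n    movl $1, %edi\n    movq %rbx, %rsi\n    movl $1, %edx\n    syscall\n"
READ = "    xorl %eax, %eax\n    xorl %edi, %edi\n    movq %rbx, %rsi\n    movl $1, %edx\n    syscall\n"


def emit_asm(tokens):
    """Translate (op, count) tokens into a complete .s file as a string."""
    out = [PROLOGUE]
    stack, next_label = [], 0
    for op, n in tokens:
        if op == ">":
            out.append(f"    addq ${n}, %rbx\n")
        elif op == "<":
            out.append(f"    subq ${n}, %rbx\n")
        elif op == "+":
            out.append(f"    addb ${n % 256}, (%rbx)\n")
        elif op == "-":
            out.append(f"    subb ${n % 256}, (%rbx)\n")
        elif op == ".":
            out.append(WRITE * n)
        elif op == ",":
            out.append(READ * n)
        elif op == "[":
            stack.append(next_label)
            out.append(f"    cmpb $0, (%rbx)\n    je .Lend_{next_label}\n.Lstart_{next_label}:\n")
            next_label += 1
        elif op == "]":
            label = stack.pop()
            out.append(f"    cmpb $0, (%rbx)\n    jne .Lstart_{label}\n.Lend_{label}:\n")
    out.append(EPILOGUE)
    return "".join(out)
```

Because this sketch defines `_start` and does not use libc, the output would probably need to be built with something like `clang -nostdlib -static prog.s -o prog`; the project's real output may be laid out differently.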
3. ELF64 Compiler
The most complex static compilation backend generates complete ELF64 executables:
- Manually constructs the ELF header, program headers, and section headers in a raw byte buffer
- Sets up a proper entry point and a 30,000-byte tape buffer in .bss
- Like the assembler, maps each Brainfuck instruction to its x86-64 instruction(s), but emits raw machine code instead of assembly text
- To handle loops, tracks how many bytes each emitted instruction occupies and encodes each bracket as a relative jump over the byte count between the matching brackets
- Outputs a raw executable file
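The loop handling can be illustrated with a small backpatching sketch. It assumes %rbx holds the data pointer and that each bracket becomes `cmp byte [rbx], 0` followed by a 32-bit relative `je` or `jne`; those choices, the `emit_code` name, and the omission of I/O and ELF headers are simplifications for illustration rather than the project's actual encoding.

```python
import struct


def emit_code(tokens):
    """Emit x86-64 machine code for (op, count) tokens (no I/O, no ELF headers)."""
    code = bytearray()
    open_loops = []  # byte offset of the first body instruction of each open '['
    for op, n in tokens:
        if op == "+":
            code += bytes([0x80, 0x03, n % 256])            # add byte [rbx], n
        elif op == "-":
            code += bytes([0x80, 0x2B, n % 256])            # sub byte [rbx], n
        elif op == ">":
            code += b"\x48\x81\xC3" + struct.pack("<i", n)  # add rbx, n
        elif op == "<":
            code += b"\x48\x81\xEB" + struct.pack("<i", n)  # sub rbx, n
        elif op == "[":
            code += b"\x80\x3B\x00"                         # cmp byte [rbx], 0
            code += b"\x0F\x84" + b"\x00" * 4               # je rel32 (patched at ']')
            open_loops.append(len(code))
        elif op == "]":
            body_start = open_loops.pop()
            code += b"\x80\x3B\x00"                         # cmp byte [rbx], 0
            code += b"\x0F\x85" + b"\x00" * 4               # jne rel32
            loop_end = len(code)
            # rel32 is measured from the end of the jump instruction.
            struct.pack_into("<i", code, loop_end - 4, body_start - loop_end)
            struct.pack_into("<i", code, body_start - 4, loop_end - body_start)
        else:
            raise NotImplementedError(f"{op!r} is omitted from this sketch")
    return bytes(code)


print(emit_code([("[", 1), ("-", 1), ("]", 1)]).hex(" "))
# 80 3b 00 0f 84 0c 00 00 00 80 2b 01 80 3b 00 0f 85 f4 ff ff ff
```

In the `[-]` example the loop body plus the closing check is 12 bytes, so the forward `je` carries a displacement of +12 (`0c 00 00 00`) and the backward `jne` carries -12 (`f4 ff ff ff`).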
4. JIT Compiler
The JIT compiler provides the speed of the compiler with the convenience of the interpreter:
- Allocates executable memory pages using mmap with PROT_EXEC
- Allocates the 30,000-byte tape buffer in the host program's own memory before running the mmap-ed code, and simply passes a pointer to it
- Emits raw x86-64 machine code directly into that memory, in the same way the ELF compiler does
- When emission finishes, treats the memory as a function pointer and calls it directly
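The allocate, emit, and call sequence can be sketched as below. This is a simplified illustration for x86-64 Linux only: it handles just `+ - > <`, uses %rdi (the first argument register in the System V calling convention) as the data pointer, and the `jit_run` name is hypothetical. Systems that forbid writable and executable mappings will reject the `mmap` call.

```python
import ctypes
import mmap
import struct

TAPE_SIZE = 30_000


def jit_run(tokens):
    """JIT-compile (op, count) tokens, run them, and return the final tape."""
    # Emit the body of `void f(uint8_t *tape)`; the tape pointer arrives in rdi.
    code = bytearray()
    for op, n in tokens:
        if op == "+":
            code += bytes([0x80, 0x07, n % 256])            # add byte [rdi], n
        elif op == "-":
            code += bytes([0x80, 0x2F, n % 256])            # sub byte [rdi], n
        elif op == ">":
            code += b"\x48\x81\xC7" + struct.pack("<i", n)  # add rdi, n
        elif op == "<":
            code += b"\x48\x81\xEF" + struct.pack("<i", n)  # sub rdi, n
        else:
            raise NotImplementedError(f"{op!r} is omitted from this sketch")
    code += b"\xC3"                                         # ret

    # Round up to whole pages and map them readable, writable, and executable.
    size = -(-len(code) // mmap.PAGESIZE) * mmap.PAGESIZE
    mem = mmap.mmap(-1, size,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    mem.write(bytes(code))

    # The tape lives in ordinary process memory; only its address is passed in.
    tape = (ctypes.c_ubyte * TAPE_SIZE)()
    entry = ctypes.addressof(ctypes.c_char.from_buffer(mem))
    func = ctypes.CFUNCTYPE(None, ctypes.c_void_p)(entry)
    func(ctypes.addressof(tape))
    return bytes(tape)
```

On a suitable machine, `jit_run([("+", 3), (">", 1), ("+", 2)])` should return a tape whose first two cells are 3 and 2. Loops and I/O would be emitted with the same relative-jump and syscall sequences the static compiler uses.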