Architecture Overview
The TAC compiler follows a traditional multi-stage compilation pipeline:
Source Code → Tokenizer → Token Stream → Parser → AST → TAC Generator → TAC Stream → Register Allocator → TAC Stream w/ Context → Assembly
The Tokenizer and Parser are both nothing amazing, just thrown together so I could get down to the new stuff I hadn't touched yet:
1. Three Address Code Generation
The IR generator transforms the AST into Three Address Code (TAC) intermediate representation:
- Converts AST expressions into sequences of simple operations with at most three addresses each (one result and up to two operands)
- Introduces temporary variables to hold intermediate computation results
- Produces a linear sequence of ThreeAddressCode instructions
- Each TAC instruction has an operator and up to two operands (literals, references, or variables)
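To make the lowering step concrete, here is a minimal sketch in Python (chosen only for illustration; it is not necessarily the language the project is written in). The AST node types `Num`, `Var`, and `BinOp`, the field names on `ThreeAddressCode`, and the `generate_tac` helper are all assumptions made for this example, not the project's actual definitions.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Union


# Hypothetical AST node types; the real AST is not shown in this write-up.
@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]
Operand = Union[int, str]


@dataclass
class ThreeAddressCode:
    # Assumed layout: one destination, one operator, two operands.
    dest: str
    op: str
    arg1: Operand
    arg2: Operand

    def __str__(self) -> str:
        return f"{self.dest} = {self.arg1} {self.op} {self.arg2}"


def generate_tac(expr: Expr) -> List[ThreeAddressCode]:
    """Flatten an expression tree into a linear list of TAC instructions."""
    code: List[ThreeAddressCode] = []
    temps = count()  # yields 0, 1, 2, ... for naming temporaries t0, t1, ...

    def lower(node: Expr) -> Operand:
        # Leaves become operands directly; no instruction is emitted.
        if isinstance(node, Num):
            return node.value
        if isinstance(node, Var):
            return node.name
        # Lower both children first, then emit one instruction that
        # stores the result in a fresh temporary.
        left = lower(node.left)
        right = lower(node.right)
        dest = f"t{next(temps)}"
        code.append(ThreeAddressCode(dest, node.op, left, right))
        return dest

    lower(expr)
    return code


# a + b * 2
for instr in generate_tac(BinOp("+", Var("a"), BinOp("*", Var("b"), Num(2)))):
    print(instr)
# prints:
# t0 = b * 2
# t1 = a + t0
```

Because children are lowered before their parent, each instruction only refers to temporaries that were defined earlier in the stream, which is what lets the later stages treat the output as a flat list.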
2. Register Allocation
The register allocator maps virtual temporaries to physical CPU registers:
- Analyzes liveness of values across the instruction stream
- Uses 14 available x86-64 registers (rax, rbx, rcx, rdx, rsi, rdi, r8-r15)
- Implements a simplified version of linear scan register allocation
- Computes live ranges and allocates registers based on instruction dependencies
- Spills to memory when registers are exhausted (though with 14 registers this is rare for the simple cases I was testing)
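The linear scan idea from the list above can be sketched roughly as follows. This is a textbook-style illustration in Python, not the project's allocator: it assumes straight-line code (no branches), instructions shaped as `(dest, op, arg1, arg2)` tuples, temporaries named `t0`, `t1`, ..., and a `"spill"` marker standing in for a stack slot. The names `live_ranges` and `linear_scan` are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Operand = Union[int, str]
Instr = Tuple[str, str, Operand, Operand]  # (dest, op, arg1, arg2)

# The 14 registers listed above (rsp and rbp are left out).
REGISTERS = ["rax", "rbx", "rcx", "rdx", "rsi", "rdi",
             "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def is_temp(x: Operand) -> bool:
    # Assumption: temporaries are exactly the names t<digits>.
    return isinstance(x, str) and x[:1] == "t" and x[1:].isdigit()


def live_ranges(code: List[Instr]) -> Dict[str, Tuple[int, int]]:
    """Map each temporary to (first index, last index) where it appears."""
    ranges: Dict[str, Tuple[int, int]] = {}
    for i, (dest, _op, a1, a2) in enumerate(code):
        for name in (dest, a1, a2):
            if is_temp(name):
                start, _ = ranges.get(name, (i, i))
                ranges[name] = (start, i)
    return ranges


def linear_scan(ranges: Dict[str, Tuple[int, int]],
                registers: List[str]) -> Dict[str, str]:
    """Assign a register (or "spill") to every temporary."""
    free = list(registers)
    active: List[Tuple[int, str]] = []  # (end, name) currently in a register
    location: Dict[str, str] = {}
    # Visit intervals in order of increasing start point.
    for name, (start, end) in sorted(ranges.items(), key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts.
        for old_end, old_name in list(active):
            if old_end < start:
                active.remove((old_end, old_name))
                free.append(location[old_name])
        if free:
            location[name] = free.pop(0)
            active.append((end, name))
            continue
        # No register left: spill whichever interval ends last.
        spill_end, spill_name = max(active)
        if spill_end > end:
            location[name] = location[spill_name]
            location[spill_name] = "spill"
            active.remove((spill_end, spill_name))
            active.append((end, name))
        else:
            location[name] = "spill"
    return location


code = [("t0", "*", "b", 2), ("t1", "+", "a", "t0"), ("t2", "-", "t1", 1)]
print(linear_scan(live_ranges(code), REGISTERS))
# prints: {'t0': 'rax', 't1': 'rbx', 't2': 'rax'}
```

In this sketch an interval is only expired when it ends strictly before the next one starts, so a destination never shares a register with an operand of the same instruction. That is slightly wasteful but keeps the later code generation simple.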
3. Assembly Generation
The assembler translates register-allocated TAC instructions to NASM-compatible x86-64 assembly:
- Maps TAC operations to their x86-64 instruction equivalents
- Generates proper instruction sequences with register operands
- Creates a complete NASM assembly file with proper sections and system call exit
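As a rough illustration of this last stage, the sketch below turns register-allocated TAC into NASM text. It is written in Python for illustration and rests on several assumptions that are not taken from the project: instructions are `(dest_reg, op, arg1, arg2)` tuples whose operands are register names or integer literals, only `+`, `-`, and `*` are supported, the destination register differs from the second operand's register, and the target is Linux (where `exit` is system call 60). The `emit_nasm` name and `OPCODES` table are hypothetical.

```python
from typing import List, Tuple, Union

Operand = Union[int, str]
Instr = Tuple[str, str, Operand, Operand]  # (dest_reg, op, arg1, arg2)

# Assumed operator table; a real backend would cover many more operations.
OPCODES = {"+": "add", "-": "sub", "*": "imul"}


def emit_nasm(code: List[Instr], result_reg: str) -> str:
    """Emit a NASM program that exits with the value held in result_reg."""
    lines = ["global _start", "section .text", "_start:"]
    for dest, op, a1, a2 in code:
        # x86-64 arithmetic is two-operand (dest op= src), so the first
        # operand is copied into dest unless it is already there.
        # Assumes dest is not the same register as a2.
        if dest != a1:
            lines.append(f"    mov {dest}, {a1}")
        lines.append(f"    {OPCODES[op]} {dest}, {a2}")
    # Linux exit(status): move the status into rdi before loading the
    # syscall number into rax, in case the result itself lives in rax.
    lines += [
        f"    mov rdi, {result_reg}",
        "    mov rax, 60",
        "    syscall",
    ]
    return "\n".join(lines) + "\n"


# (3 + 4) * 5 after allocation: t0 -> rax, t1 -> rbx
print(emit_nasm([("rax", "+", 3, 4), ("rbx", "*", "rax", 5)], "rbx"))
```

Assembling and linking the output (for example with `nasm -f elf64` followed by `ld`) should give a binary whose exit status is the computed value, which makes for a cheap end-to-end check on small expressions.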
Overall, this project was a great start in the world of compilers, taking me a step further from AST land toward an optimized binary. I'm looking forward to my next step into this world ;)