A 16-bit RISC CPU implemented in Sparkle HDL to demonstrate hardware design with formal verification.
Sparkle-16 is a simplified 16-bit RISC architecture designed for clarity and ease of formal verification.
- Data Width: 16 bits
- Address Width: 16 bits (64KB address space)
- Registers: 8 general-purpose registers (R0-R7)
- R0 is hardwired to 0 (zero register)
- R1-R7 are writable
- Architecture: Harvard (separate instruction and data memory)
- Execution: Single-cycle (simplified, no pipelining)
| Instruction | Encoding | Description |
|---|---|---|
LDI Rd, Imm8 |
000_ddd_iiiiiiii_pp |
Load 8-bit immediate into Rd (zero-extended) |
ADD Rd, Rs1, Rs2 |
001_ddd_sss_ttt_pppp |
Rd = Rs1 + Rs2 |
SUB Rd, Rs1, Rs2 |
010_ddd_sss_ttt_pppp |
Rd = Rs1 - Rs2 |
AND Rd, Rs1, Rs2 |
011_ddd_sss_ttt_pppp |
Rd = Rs1 & Rs2 |
LD Rd, [Rs1] |
100_ddd_sss_ppppppp |
Load word from memory[Rs1] |
ST [Rs1], Rs2 |
101_sss_ttt_ppppppp |
Store Rs2 to memory[Rs1] |
BEQ Rs1, Rs2, Offset |
110_sss_ttt_ooooooo |
Branch to PC+Offset if Rs1==Rs2 |
JMP Addr |
111_aaaaaaaaaaaaa |
Jump to absolute address (13-bit) |
Legend: ooo=opcode, ddd=Rd, sss=Rs1, ttt=Rs2, iii=immediate, ppp=padding
Defines the instruction set architecture with:
Instructioninductive type for all 8 instructionsencode : Instruction → BitVec 16- Convert instruction to machine codedecode : BitVec 16 → Option Instruction- Parse machine code- Helper functions:
isBranch,writesRegister,destReg?
Example:
open Sparkle16
-- Create an instruction
let instr := Instruction.LDI ⟨1, by omega⟩ 0x42
-- Encode to machine code
let encoded := Instruction.encode instr -- Returns BitVec 16
-- Decode back
let decoded := Instruction.decode encoded -- Returns some instr
-- Display
#eval instr.toString -- "LDI R1, 66"Arithmetic Logic Unit performing:
- ADD: 16-bit addition
- SUB: 16-bit subtraction
- AND: 16-bit bitwise AND
Interface:
module ALU (
input logic [2:0] opcode,
input logic [15:0] a, b,
output logic [15:0] result,
output logic zero
);Features:
- Combinational logic (no registers)
- All operations computed in parallel
- Mux selects result based on opcode
- Zero flag indicates result == 0
Generate Verilog:
lake env lean --run Examples/Sparkle16/ALU.lean8-register file with special R0 behavior:
Interface:
module RegisterFile (
input logic clk, rst,
input logic we, // Write enable
input logic [2:0] rd_addr, wr_addr,
input logic [15:0] wr_data,
output logic [15:0] rd_data
);Features:
- R0 hardwired to 0 (cannot be written)
- R1-R7 are 16-bit writable registers
- Single read port (asynchronous)
- Single write port (synchronous on clk)
- Reset clears all registers to 0
Implementation:
- Each register (R1-R7) has conditional write enable
- 8-way mux tree for read selection
- Proper feedback to maintain register state
Generate Verilog:
lake env lean --run Examples/Sparkle16/RegisterFile.leanComplete CPU implementation integrating all components:
Features:
- Fetch-Decode-Execute state machine
- 3-phase pipeline: Fetch, Decode, Execute
- Harvard architecture: Separate instruction and data memory
- Single-cycle execution: Simplified design for clarity
- Memory interface: SimMemory for simulation
- Full ISA support: All 8 instructions implemented
CPU State:
structure CPUState where
pc : Word -- Program counter
regs : Fin 8 → Word -- Register file (functional)
phase : Phase -- Current execution phase
instr : Option Instruction -- Currently decoded instruction
memData : Word -- Data from memory loadExecution Phases:
- Fetch: Read instruction from instruction memory at PC
- Decode: Parse instruction fields (opcode, registers, immediates)
- Execute: Perform operation, update registers, PC, or memory
Run CPU Simulation:
lake env lean --run Examples/Sparkle16/Core.leanExample Output:
=== Sparkle-16 CPU Core ===
Example 1: Simple arithmetic
Program:
LDI R1, 10
LDI R2, 20
ADD R3, R1, R2
SUB R4, R3, R1
After 12 cycles:
PC=4 Phase=Fetch Instr=None
R0=0x0000#16 R1=0x000a#16 R2=0x0014#16 R3=0x001e#16
R4=0x0014#16 R5=0x0000#16 R6=0x0000#16 R7=0x0000#16
✓ All values correct!
Implementation Details:
- Pure functional execution semantics (
executeInstruction) - Separate memory handling for LD/ST instructions
- R0 always reads as 0, writes ignored
- Branch instructions update PC directly
- Jump instruction sets PC to absolute address
Provides both simulation and synthesis-level memory:
SimMemory (for testing):
- Functional array model (256 words × 16 bits)
- Synchronous read/write operations
- Address wrapping for bounds safety
Hardware Modules (for synthesis):
instructionMemoryModule: Read-only SRAM for instructionsdataMemoryModule: Read/write SRAM for data- Uses vendor SRAM primitives (SRAM_256x16)
Run Memory Tests:
lake env lean --run Examples/Sparkle16/Memory.leanFundamental lemmas about BitVec operations:
- Commutativity:
bitvec_add_comm,bitvec_and_comm, etc. - Associativity:
bitvec_add_assoc,bitvec_and_assoc, etc. - Identity elements:
bitvec_add_zero,bitvec_or_zero, etc. - De Morgan's Laws:
bitvec_not_and,bitvec_not_or - Bit manipulations: Extensions, truncations, shifts
Correctness proofs for ALU operations:
Proven Theorems:
alu_add_correct: ADD operation matches mathematical additionalu_sub_correct: SUB operation matches mathematical subtractionalu_and_correct: AND operation matches bitwise ANDalu_zero_flag_correct: Zero flag is set iff result is zeroalu_deterministic: Same inputs always produce same outputalu_add_comm: Addition is commutativealu_add_assoc: Addition is associativealu_and_comm: AND is commutativealu_preserves_width: All operations preserve 16-bit width
Example:
import Sparkle.Verification.ALUProps
-- Prove that ADD is correct
theorem my_add_proof (a b : BitVec 16) :
aluSpec .ADD a b = a + b := by
apply alu_add_correctISA correctness proofs for instruction encoding/decoding:
Proven Theorems:
opcode_encode_decode: Opcode encoding is bijective (all opcodes roundtrip)opcode_decode_total: Every 3-bit value maps to an opcodewrites_has_dest: Instructions that write have a destination registerbranch_no_write: Branch instructions don't write to registersencode_preserves_dest: Encoding preserves destination registersencode_decode_id: Full instruction encode/decode roundtrip (in progress)
Example:
import Sparkle.Verification.ISAProps
-- Prove opcode roundtrip
example : Opcode.fromBitVec (Opcode.toBitVec Opcode.ADD) = some Opcode.ADD :=
opcode_encode_decode Opcode.ADD
-- Prove branch instructions don't write
example (instr : Instruction) (h : instr.isBranch = true) :
instr.writesRegister = false :=
branch_no_write instr hTest ISA Proofs:
lake env lean --run Examples/Sparkle16/ISAProofTests.lean- Lean 4 (v4.26.0 or later)
- Lake (Lean build tool)
# Build entire project
lake build
# Build and test ALU
lake env lean --run Examples/Sparkle16/ALU.lean
# Build and test RegisterFile
lake env lean --run Examples/Sparkle16/RegisterFile.lean
# Build and test Memory
lake env lean --run Examples/Sparkle16/Memory.lean
# Run CPU Core simulation
lake env lean --run Examples/Sparkle16/Core.leanRunning the examples generates SystemVerilog files:
ALU.sv- Arithmetic Logic UnitRegisterFile.sv- 8-register fileInstructionMemory.sv- Instruction memory (SRAM)DataMemory.sv- Data memory (SRAM)
These files are synthesizable and can be used in FPGA or ASIC flows.
Examples/Sparkle16/
├── ISA.lean # Instruction set definitions
├── ALU.lean # Arithmetic Logic Unit
├── RegisterFile.lean # 8-register file
├── Memory.lean # Memory interface (instruction & data)
├── Core.lean # Complete CPU core with state machine
├── ISAProofTests.lean # ISA correctness proof tests
└── README.md # This file
Sparkle/Verification/
├── Basic.lean # Fundamental BitVec lemmas
├── ALUProps.lean # ALU correctness proofs
└── ISAProps.lean # ISA encoding/decoding correctness
- Simplicity: Single-cycle execution, no complex pipelining
- Formal Verification: Proofs alongside implementation
- Modularity: Components can be tested independently
- Clear Mapping: Hardware directly corresponds to Lean code
- Type Safety: Lean's type system prevents many HDL bugs
- Test Programs: More comprehensive assembly programs for validation
- ISA Correctness Proofs: Encode/decode bijectivity, instruction semantics
- State Machine Proofs: PC increment correctness, phase transitions
- Hardware Synthesis: Generate complete CPU Verilog (currently simulation-only)
- More Instructions: OR, XOR, SLT (set less than), shifts, rotates
- Branch Prediction: Simple static or dynamic prediction
- Pipeline: Multi-stage pipeline with hazard detection
- Interrupts: Basic interrupt handling and exception support
- Cache: Simple instruction/data caches
- ✅ Prove ALU operations are correct
- ⏳ Prove R0 always reads as 0
- ⏳ Prove PC increments correctly (except branches)
- ⏳ Prove instruction decode is bijective
- ⏳ Prove register writes are isolated (no crosstalk)
- ⏳ Prove memory safety (bounds checking)
- ⏳ Prove state machine invariants hold
import Sparkle16.ISA
open Sparkle16
-- LDI R1, 42
let ldi := Instruction.LDI ⟨1, by omega⟩ 42
let encoded := Instruction.encode ldi
-- encoded = 0b000_001_00101010_00
-- ADD R2, R1, R3
let add := Instruction.ADD ⟨2, by omega⟩ ⟨1, by omega⟩ ⟨3, by omega⟩
#eval Instruction.encode add
-- BEQ R1, R2, 5
let beq := Instruction.BEQ ⟨1, by omega⟩ ⟨2, by omega⟩ 5
#eval beq.toString -- "BEQ R1, R2, 5"import Sparkle.Verification.ALUProps
open Sparkle.Verification.ALUProps
-- Prove commutativity
example (a b : BitVec 16) : aluSpec .ADD a b = aluSpec .ADD b a := by
apply alu_add_comm
-- Prove associativity
example (a b c : BitVec 16) :
aluSpec .ADD (aluSpec .ADD a b) c = aluSpec .ADD a (aluSpec .ADD b c) := by
apply alu_add_assocThis is an educational project demonstrating:
- Functional HDL design in Lean 4
- Formal verification of hardware
- Hardware synthesis to SystemVerilog
- Type-safe hardware development
Contributions welcome! Areas of interest:
- Additional CPU instructions
- Pipelining
- Memory system
- More formal proofs
- Test programs
- Documentation
- Sparkle HDL Documentation
- RISC-V Specification (inspiration)
- Computer Organization and Design by Patterson & Hennessy
MIT
Status: CPU Core Complete ✓
Completed Components:
- ✅ ISA defined with encode/decode
- ✅ ALU implemented with formal correctness proofs
- ✅ RegisterFile implemented with R0=0 invariant
- ✅ Memory interface (simulation and hardware modules)
- ✅ CPU Core with fetch-decode-execute state machine
- ✅ Verification framework with BitVec lemmas
- ✅ Example programs demonstrating all instruction types
Verification Status:
- ✅ ALU correctness proven (9 theorems)
- ✅ ISA opcode correctness (encode/decode bijection)
- ✅ ISA instruction classification (branches, register writes)
- ⏳ Full instruction encode/decode bijectivity (in progress)
- ⏳ State machine invariants
- ⏳ R0=0 invariant
- ⏳ Memory safety proofs
Performance-optimized simulation using proven temporal properties to skip stable periods.
Run it:
lake env lean --run Examples/Sparkle16/CycleSkipping.leanExample scenarios:
-
Reset Sequence (2x speedup)
- CPU stays at initial state for 100 cycles
- Proven stable, skipped in single evaluation
-
Busy-Wait Loop (1.5x speedup)
- Polling loop waiting for flag
- PC oscillates predictably
-
NOP Sled (3x speedup)
- 200 NOP instructions with stable registers
- Only PC increments, all else constant
-
Full CPU Simulation
- Multiple optimization phases combined
- Reset + Initialization + Wait states
- Demonstrates realistic CPU workload
Performance Summary:
┌─────────────────────┬───────────┬──────────────┐
│ Scenario │ Cycles │ Speedup │
├─────────────────────┼───────────┼──────────────┤
│ Reset sequence │ 100 │ 2x │
│ Initialization │ 100 │ 2x │
│ Busy-wait polling │ 50 │ 1.5x │
│ NOP sled │ 200 │ 2.5x │
│ Wait states │ 200 │ 2.5x │
└─────────────────────┴───────────┴──────────────┘
Key Features:
- Correctness proven via temporal logic
- Zero accuracy loss vs standard simulation
- Typical speedup: 2-3x for real CPU workloads
- Soundness guaranteed by Lean's type system
Next Steps:
- Complete full instruction encode/decode roundtrip proofs
- State machine verification (phase transitions, PC increment)
- R0=0 invariant proof
- Hardware synthesis for complete CPU
- More comprehensive test programs