A complete pipelined RV32IMA CPU with S-mode, Sv32 MMU, and peripherals — written entirely in Signal DSL. Boots Linux 6.6.0 via OpenSBI v0.9 on Verilator — both hand-written and auto-generated SystemVerilog reach the same kernel init point.
OpenSBI v0.9 — Platform: Sparkle RV32IMA SoC — ISA: rv32imasu
Linux version 6.6.0 ... #6 Thu Feb 26 06:29:23 UTC 2026
Machine model: Sparkle RV32IMA SoC
Memory: 26208K/28672K available (1279K kernel code, 465K rwdata, ...)
| Hand-written SV | Generated SV (from Lean) | |
|---|---|---|
| UART output | 3944 bytes | 5250 bytes |
| Linux boot | Kernel init | Kernel init |
| Final PC | 0xC013A9xx | 0xC013A9xx |
| Firmware test | ALL PASS | ALL PASS |
-- 122 registers in a single Signal.loop — full SoC in one function
-- State fields accessed by name via declare_signal_state macro
def rv32iSoCSimulate (firmware : BitVec 12 → BitVec 32) : Signal dom SoCState :=
Signal.loopMemo fun state => rv32iSoCWithFirmwareBody firmware state- 4-stage pipeline: IF/ID/EX/WB with hazard detection and data forwarding
- RV32IMA ISA: Full integer + M-extension (MUL/DIV/REM) + A-extension (LR.W/SC.W/AMO)
- Multi-cycle divider: Restoring division algorithm in Signal DSL
- S-mode + M-mode: Full privilege separation with trap delegation (medeleg/mideleg)
- Sv32 MMU: 4-entry TLB + hardware page table walker (PTW), megapage support
- CSRs: mstatus, mie, mtvec, mepc, mcause, mtval, satp, sstatus, stvec, sepc, scause, stval, medeleg, mideleg, mcounteren, scounteren, PMP stubs
- UART 8250: TX/RX with LSR/IER/LCR/DLL/DLM registers (Linux-compatible)
- CLINT: Machine timer with mtime/mtimecmp
- Memory: 32 MB DRAM (byte-addressable, sub-word load/store: LB/LH/LBU/LHU/SB/SH)
Ergonomic operators and macros reduce boilerplate in hardware descriptions:
open Sparkle.Core.Signal
-- Hardware equality (replaces (· == ·) <$> a <*> Signal.pure val)
let isIdle := fsmReg === (0#4)
-- Bool operators (replaces (· && ·) <$> a <*> b etc.)
let startAndIdle := start &&& isIdle
let shouldFlush := branchTaken ||| trap_taken
-- Implicit constant lifting via Coe (replaces Signal.pure)
let p3SavedNext := Signal.mux cond (true : Signal dom _) p3SavedReg
-- Hardware conditional macro (replaces deeply nested Signal.mux)
let fsmNext := hw_cond fsmReg -- default value
| startAndIdle => (1#4 : Signal dom _) -- first match wins
| stemDone => (2#4 : Signal dom _)
| stageConvDone => (3#4 : Signal dom _)
| isDone => (0#4 : Signal dom _)All features are synthesis-compatible — === expands to (· == ·) <$> a <*> b, hw_cond expands to nested Signal.mux calls, and Coe unfolds to Signal.pure.
Eliminates error-prone magic-number indices for hardware state tuples:
-- Declare state with named fields, types, and defaults
declare_signal_state BottleneckState
| fsmReg : BitVec 2 := 0#2
| residualReg : BitVec 8 := 0#8
| resultReg : BitVec 8 := 0#8
| doneReg : Bool := false
-- Access fields by name (no more projN! state 122 47)
let fsmReg := BottleneckState.fsmReg state
let residualReg := BottleneckState.residualReg stateGenerates a tuple type alias, synthesis-compatible accessor defs, default value, and Inhabited instance. The RV32 SoC uses this for all 122 registers — adding/removing a field no longer requires updating every index.
Two APIs: Signal API (drop-in replacement for loopMemo) and Streaming API (O(1) memory for long runs):
-- Signal API: same interface as loopMemo, JIT speed under the hood
let soc ← rv32iSoCJITSimulate (jitCppPath := "verilator/generated_soc_jit.cpp") (firmware := fw)
let out := soc.atTime 1000 -- SoCOutput with pc, uartValid, uartData, ...
-- Streaming API: 10M+ cycles with per-cycle callback, O(1) memory
rv32iSoCJITRun (jitCppPath := cppPath) (firmware := fw) (cycles := 10000000)
(callback := fun cycle vals => do
let out := SoCOutput.fromWireValues vals
if out.uartValid then IO.println s!"UART: {out.uartData}"
return true) -- continue- Compile C++ to shared library at runtime, load via
dlopenfrom Lean - Hash-based caching: recompilation skipped if source unchanged
- Uses stable named output wires (
_gen_pcReg,_gen_uartValidBV, etc.) — immune to DCE - 6 observable wires (via
SoCOutput.wireNames), 11 memories, 6 input ports
| Backend | Speed | vs Verilator | vs Lean |
|---|---|---|---|
| JIT evalTick (fused) | 13.0M cyc/s | 1.17x faster | ~2600x |
| JIT evalTick + 6 wires | 12.7M cyc/s | 1.14x faster | ~2540x |
| JIT eval+tick (separate) | 12.7M cyc/s | 1.14x faster | ~2540x |
| Verilator 5.044 | 11.1M cyc/s | 1.00x | ~2220x |
| CppSim (-O3 AOT) | 6.0M cyc/s | 0.54x | ~1200x |
| Lean loopMemo | ~5K cyc/s | — | 1x |
JIT exceeds Verilator speed (1.17x faster) thanks to: (1) no mutex/thread overhead (Verilator 5.x wastes 17.4% on locks even single-threaded), (2) observable wire optimization (33 class members + 321 locals, L1-cache friendly), (3) fewer CPU instructions per sim-cycle, and (4) fused evalTick() with stack-local _next variables (eliminates ~260 intermediate memory ops/cycle). See docs/BENCHMARK.md for detailed benchmark results, profiling data, and bottleneck analysis.
Detects when the CPU is stuck in a tight halt loop and skips forward by advancing the cycle counter + CLINT timer, achieving 706x effective speedup. The oracle receives the JITHandle directly for dynamic register reads, bulk memory ops (memsetWord), and direct state mutation:
import Sparkle.Core.Oracle
open Sparkle.Core.Oracle
-- Create oracle: detects PC stuck in ≤12-byte range for 50+ consecutive cycles
-- Oracle receives JITHandle per-call and handles all state mutations internally
let (oracle, statsRef) ← mkSelfLoopOracle {}
-- Run with oracle — 10M cycles complete in 9ms instead of 5.5 seconds
let cycles ← JIT.runOptimized handle 10_000_000 wireIndices oracle callback
-- Post-run: 9,998 triggers, 9,998,000 cycles skipped, UART output identical
let stats ← statsRef.get
IO.println s!"Skipped {stats.totalSkipped} cycles in {stats.triggerCount} triggers"
-- Bulk memory fill (e.g., BSS zeroing for Linux boot)
JIT.memsetWord handle memIdx addr 0 count| Metric | Without Oracle | With Oracle |
|---|---|---|
| Wall-clock time (10M cycles) | ~770 ms | 7 ms |
| Effective cyc/s | 13.0M | ~1.4 billion |
BSS-Clear Speculative Warp: A custom inline firmware (7-instruction BSS-clear loop) demonstrates the full pattern — the oracle detects the memory-clearing loop, bulk-zeros all 4 DMEM byte banks via memsetWord, and skips ~100K cycles in <1 ms (389 triggers, 99,584 cycles skipped). See Tests/RV32/JITDynamicWarpTest.lean.
Speculative Simulation with Rollback: Full-state snapshot/restore via C++ default copy constructor enables guard-and-rollback speculation. The oracle snapshots state, speculatively applies bulk updates, checks guard conditions (e.g., timer interrupt), and rolls back if the guard fails — providing bit-accurate cycle-skipping even with interrupts. See Tests/RV32/JITSpeculativeWarpTest.lean.
Also includes register snapshot/restore API (130 registers), bulk memory API, and full-state snapshot/restore for state introspection:
- Auto-generated SystemVerilog via
#writeDesign— boots Linux (5250 UART bytes at 10M cycles) - Hand-written SystemVerilog reference at
verilator/rv32i_soc.sv(3944 UART bytes at 10M cycles) - Both reach same kernel init region (PC 0xC013A9xx) — generated SV matches reference behavior
- VCD waveform tracing for debugging
- Firmware test mode + OpenSBI/Linux boot mode
# Lean simulation
lake test # Includes RV32 simulation tests
# JIT simulation from Lean (compile + load + run)
lake exe rv32-jit-test verilator/generated_soc_jit.cpp firmware/firmware.hex 5000
# JIT loop test (loopMemoJIT Signal API + Streaming API)
lake exe rv32-jit-loop-test verilator/generated_soc_jit.cpp firmware/firmware.hex 5000
# JIT cycle-skip test (register snapshot/restore roundtrip)
lake exe rv32-jit-cycle-skip-test
# JIT oracle test (self-loop detection, 10M cycles with cycle-skipping)
lake exe rv32-jit-oracle-test
# JIT dynamic warp test (memsetWord + dynamic oracle with JITHandle access)
lake exe rv32-jit-dynamic-warp-test
# JIT speculative warp test (snapshot/restore + guard-and-rollback)
lake exe rv32-jit-speculative-warp-test
# CppSim (standalone C++)
cd verilator && make build-cppsim && make run-cppsim
# Verilator simulation (much faster)
cd verilator && make build
./obj_dir/Vrv32i_soc ../firmware/firmware.hex 500000 # Firmware tests
# Linux kernel boot
./obj_dir/Vrv32i_soc ../firmware/opensbi/boot.hex 10000000 \
--dram /tmp/opensbi/build/platform/generic/firmware/fw_jump.bin \
--dtb ../firmware/opensbi/sparkle-soc.dtb \
--payload /tmp/linux/arch/riscv/boot/ImageParse existing SystemVerilog RTL (e.g., PicoRV32) into Sparkle IR and execute via JIT — no Verilator required.
picorv32.v + picorv32_soc_m.v
→ [SV Parser] Parse to SV AST (250 items, 32 always blocks)
→ [Lower] MemorySSA sequential emitter + MUX for always @*
→ [Flatten] Inline sub-modules with parameter overrides
→ [CppSim] Generate C++ simulation class
→ [GCC -O2] Compile to shared library
→ [JIT dlopen] Load and simulate from Lean
GCC-compiled bare-metal firmware runs on the transpiled PicoRV32 SoC with M-extension:
# Build RV32I firmware
cd firmware
riscv32-none-elf-gcc -march=rv32i -mabi=ilp32 -O2 -nostdlib \
-T link.ld -o firmware_rv32i.elf boot_rv32i.S main_rv32i.c
# Build RV32IM firmware (MUL/DIV/REM)
riscv32-none-elf-gcc -march=rv32im -mabi=ilp32 -O2 -nostdlib \
-T link.ld -o firmware_rv32im.elf boot_rv32i.S main_rv32im.c
# Run 34 tests via Sparkle JIT (no external dependencies)
lake exe svparser-test| Test | Expected | Actual |
|---|---|---|
| Fibonacci | 0,1,1,2,3,5,8,13,21,34 | ✓ exact match |
| Array Sum | 360 | ✓ |
| Bubble Sort | 3,8,17,42,55,99 | ✓ exact match |
| GCD | 6,25,1 | ✓ exact match |
| MUL 7*6 | 42 | ✓ |
| MUL 12345*6789 | 83810205 | ✓ |
| DIV 42/6, 100/7 | 7, 14 | ✓ |
| REM 42%6, 100%7 | 0, 2 | ✓ |
| Factorial 10! | 3628800 | ✓ |
| Store/Load 0x12345678 | 0x12345678 | ✓ |
| Final (RV32I) | 0xCAFE0000 | ALL PASSED |
| Final (RV32IM) | 0xCAFE0000 | ALL PASSED |
All tests embed Verilog as string literals — no external file dependencies:
| Tests | Description |
|---|---|
| 1-6 | Parser, lowering, round-trip, JIT simulation |
| 7-11 | PicoRV32 parse, SoC "Hello", C firmware (requires /tmp files) |
| 12-16 | IR regression: nested if/else, case(1'b1) priority, concat-LHS, for-loop SSA, array read |
| 17-21 | pcpi_mul: IR structure, SSA chain, topo sort, JIT 76/100100/12345*6789/consecutive/wrapper |
| 22-26 | JIT pairs: if/else+register, array read, case decoder, read-overwrite, for-loop accumulator |
| 27-29 | Bit replication {4{en}}, byte-lane write/read, multi-bit replication {2{din}} |
- MemorySSA Sequential Emitter: Process
always @*statements top-to-bottom, creating SSA wires for each variable write. Correctly handles read-then-overwrite patterns that cause cyclic dependencies under pure MUX. - If-Conversion (Guarded Assignments): Walk statement tree tracking guard conditions; emit flat priority mux chain with
!coveredguard for first-match-wins incase(1'b1). - Generate Block Evaluation: Resolve
generate if (PARAM)using parameter defaults + overrides. 2-pass architecture for nested hierarchies. - Concat-LHS Bit Scatter: Handle
{next_rdt[j+3], next_rd[j+:4]} = exprvia read-modify-write decomposition with__RMW_BASE__placeholder resolved bystmtsToMuxExprBlocking. - Byte-Strobe Memory: Detect
if(wstrb[N]) memory[addr][hi:lo] <= wdata[hi:lo]and generate read-modify-write with per-byte mask and proper shift. - Bit Replication:
{N{expr}}→(0-val) & maskfor 1-bit,concatfor multi-bit. - For-Loop Unrolling: Static unroll with parameter substitution. MemorySSA handles sequential dependencies natively.
- $signed Sign Extension: SHL+ASR pattern with min-32-bit types and 64-bit promotion to prevent C++ UB.
Write Verilog directly in Lean, prove properties at compile time:
import Tools.SVParser.Macro
verilog! "
module counter8_en (input clk, input rst, input en, output [7:0] count);
reg [7:0] count_reg;
assign count = count_reg;
always @(posedge clk) begin
if (rst) count_reg <= 0;
else if (en) count_reg <= count_reg + 1;
assert(rst ? (count_reg == 0) : 1); // auto-proved!
end
endmodule
"
open counter8_en.Verify
-- These theorems prove against the auto-generated nextState
theorem counter_holds (s : State) :
nextState s { rst := 0, en := 0 } = s := by simp [nextState]
-- auto_assert_0 was auto-generated and auto-proved by bv_decide
-- Change the assert condition → instant error in editorFeatures:
verilog!macro: Parse Verilog at elaboration time, inject State/Input/nextStateassert(cond): Auto-generatetheoremand prove viasimp [nextState]; bv_decide- Instant feedback: Edit Verilog, save → proofs re-check in milliseconds
- No Lean knowledge needed: Write standard Verilog assertions, get mathematical proofs