RV32IMA RISC-V SoC

RV32IMA RISC-V SoC — Boots Linux

A complete pipelined RV32IMA CPU with S-mode, Sv32 MMU, and peripherals — written entirely in Signal DSL. Boots Linux 6.6.0 via OpenSBI v0.9 on Verilator — both hand-written and auto-generated SystemVerilog reach the same kernel init point.

OpenSBI v0.9 — Platform: Sparkle RV32IMA SoC — ISA: rv32imasu
Linux version 6.6.0 ... #6 Thu Feb 26 06:29:23 UTC 2026
Machine model: Sparkle RV32IMA SoC
Memory: 26208K/28672K available (1279K kernel code, 465K rwdata, ...)

	Hand-written SV	Generated SV (from Lean)
UART output	3944 bytes	5250 bytes
Linux boot	Kernel init	Kernel init
Final PC	0xC013A9xx	0xC013A9xx
Firmware test	ALL PASS	ALL PASS

-- 122 registers in a single Signal.loop — full SoC in one function
-- State fields accessed by name via declare_signal_state macro
def rv32iSoCSimulate (firmware : BitVec 12 → BitVec 32) : Signal dom SoCState :=
  Signal.loopMemo fun state => rv32iSoCWithFirmwareBody firmware state

CPU Core

4-stage pipeline: IF/ID/EX/WB with hazard detection and data forwarding
RV32IMA ISA: Full integer + M-extension (MUL/DIV/REM) + A-extension (LR.W/SC.W/AMO)
Multi-cycle divider: Restoring division algorithm in Signal DSL

Privilege & Virtual Memory

S-mode + M-mode: Full privilege separation with trap delegation (medeleg/mideleg)
Sv32 MMU: 4-entry TLB + hardware page table walker (PTW), megapage support
CSRs: mstatus, mie, mtvec, mepc, mcause, mtval, satp, sstatus, stvec, sepc, scause, stval, medeleg, mideleg, mcounteren, scounteren, PMP stubs

Peripherals

UART 8250: TX/RX with LSR/IER/LCR/DLL/DLM registers (Linux-compatible)
CLINT: Machine timer with mtime/mtimecmp
Memory: 32 MB DRAM (byte-addressable, sub-word load/store: LB/LH/LBU/LHU/SB/SH)

Signal DSL Ergonomics

Ergonomic operators and macros reduce boilerplate in hardware descriptions:

open Sparkle.Core.Signal

-- Hardware equality (replaces (· == ·) <$> a <*> Signal.pure val)
let isIdle := fsmReg === (0#4)

-- Bool operators (replaces (· && ·) <$> a <*> b etc.)
let startAndIdle := start &&& isIdle
let shouldFlush  := branchTaken ||| trap_taken

-- Implicit constant lifting via Coe (replaces Signal.pure)
let p3SavedNext := Signal.mux cond (true : Signal dom _) p3SavedReg

-- Hardware conditional macro (replaces deeply nested Signal.mux)
let fsmNext := hw_cond fsmReg            -- default value
  | startAndIdle  => (1#4 : Signal dom _)  -- first match wins
  | stemDone      => (2#4 : Signal dom _)
  | stageConvDone => (3#4 : Signal dom _)
  | isDone        => (0#4 : Signal dom _)

All features are synthesis-compatible — === expands to (· == ·) <$> a <*> b, hw_cond expands to nested Signal.mux calls, and Coe unfolds to Signal.pure.

Named State Accessors (`declare_signal_state`)

Eliminates error-prone magic-number indices for hardware state tuples:

-- Declare state with named fields, types, and defaults
declare_signal_state BottleneckState
  | fsmReg      : BitVec 2   := 0#2
  | residualReg : BitVec 8   := 0#8
  | resultReg   : BitVec 8   := 0#8
  | doneReg     : Bool        := false

-- Access fields by name (no more projN! state 122 47)
let fsmReg := BottleneckState.fsmReg state
let residualReg := BottleneckState.residualReg state

Generates a tuple type alias, synthesis-compatible accessor defs, default value, and Inhabited instance. The RV32 SoC uses this for all 122 registers — adding/removing a field no longer requires updating every index.

JIT Simulation — Transparent & Fast (~200x faster than Lean)

Two APIs: Signal API (drop-in replacement for loopMemo) and Streaming API (O(1) memory for long runs):

-- Signal API: same interface as loopMemo, JIT speed under the hood
let soc ← rv32iSoCJITSimulate (jitCppPath := "verilator/generated_soc_jit.cpp") (firmware := fw)
let out := soc.atTime 1000  -- SoCOutput with pc, uartValid, uartData, ...

-- Streaming API: 10M+ cycles with per-cycle callback, O(1) memory
rv32iSoCJITRun (jitCppPath := cppPath) (firmware := fw) (cycles := 10000000)
  (callback := fun cycle vals => do
    let out := SoCOutput.fromWireValues vals
    if out.uartValid then IO.println s!"UART: {out.uartData}"
    return true)  -- continue

Compile C++ to shared library at runtime, load via dlopen from Lean
Hash-based caching: recompilation skipped if source unchanged
Uses stable named output wires (_gen_pcReg, _gen_uartValidBV, etc.) — immune to DCE
6 observable wires (via SoCOutput.wireNames), 11 memories, 6 input ports

Simulation Performance (10M cycles, Apple Silicon)

Backend	Speed	vs Verilator	vs Lean
JIT evalTick (fused)	13.0M cyc/s	1.17x faster	~2600x
JIT evalTick + 6 wires	12.7M cyc/s	1.14x faster	~2540x
JIT eval+tick (separate)	12.7M cyc/s	1.14x faster	~2540x
Verilator 5.044	11.1M cyc/s	1.00x	~2220x
CppSim (-O3 AOT)	6.0M cyc/s	0.54x	~1200x
Lean loopMemo	~5K cyc/s	—	1x

JIT exceeds Verilator speed (1.17x faster) thanks to: (1) no mutex/thread overhead (Verilator 5.x wastes 17.4% on locks even single-threaded), (2) observable wire optimization (33 class members + 321 locals, L1-cache friendly), (3) fewer CPU instructions per sim-cycle, and (4) fused evalTick() with stack-local _next variables (eliminates ~260 intermediate memory ops/cycle). See docs/BENCHMARK.md for detailed benchmark results, profiling data, and bottleneck analysis.

JIT Cycle-Skipping — Dynamic Oracle

Detects when the CPU is stuck in a tight halt loop and skips forward by advancing the cycle counter + CLINT timer, achieving 706x effective speedup. The oracle receives the JITHandle directly for dynamic register reads, bulk memory ops (memsetWord), and direct state mutation:

import Sparkle.Core.Oracle
open Sparkle.Core.Oracle

-- Create oracle: detects PC stuck in ≤12-byte range for 50+ consecutive cycles
-- Oracle receives JITHandle per-call and handles all state mutations internally
let (oracle, statsRef) ← mkSelfLoopOracle {}

-- Run with oracle — 10M cycles complete in 9ms instead of 5.5 seconds
let cycles ← JIT.runOptimized handle 10_000_000 wireIndices oracle callback

-- Post-run: 9,998 triggers, 9,998,000 cycles skipped, UART output identical
let stats ← statsRef.get
IO.println s!"Skipped {stats.totalSkipped} cycles in {stats.triggerCount} triggers"

-- Bulk memory fill (e.g., BSS zeroing for Linux boot)
JIT.memsetWord handle memIdx addr 0 count

Metric	Without Oracle	With Oracle
Wall-clock time (10M cycles)	~770 ms	7 ms
Effective cyc/s	13.0M	~1.4 billion

BSS-Clear Speculative Warp: A custom inline firmware (7-instruction BSS-clear loop) demonstrates the full pattern — the oracle detects the memory-clearing loop, bulk-zeros all 4 DMEM byte banks via memsetWord, and skips ~100K cycles in <1 ms (389 triggers, 99,584 cycles skipped). See Tests/RV32/JITDynamicWarpTest.lean.

Speculative Simulation with Rollback: Full-state snapshot/restore via C++ default copy constructor enables guard-and-rollback speculation. The oracle snapshots state, speculatively applies bulk updates, checks guard conditions (e.g., timer interrupt), and rolls back if the guard fails — providing bit-accurate cycle-skipping even with interrupts. See Tests/RV32/JITSpeculativeWarpTest.lean.

Also includes register snapshot/restore API (130 registers), bulk memory API, and full-state snapshot/restore for state introspection:

Verilator Backend (~1000x faster)

Auto-generated SystemVerilog via #writeDesign — boots Linux (5250 UART bytes at 10M cycles)
Hand-written SystemVerilog reference at verilator/rv32i_soc.sv (3944 UART bytes at 10M cycles)
Both reach same kernel init region (PC 0xC013A9xx) — generated SV matches reference behavior
VCD waveform tracing for debugging
Firmware test mode + OpenSBI/Linux boot mode

# Lean simulation
lake test  # Includes RV32 simulation tests

# JIT simulation from Lean (compile + load + run)
lake exe rv32-jit-test verilator/generated_soc_jit.cpp firmware/firmware.hex 5000

# JIT loop test (loopMemoJIT Signal API + Streaming API)
lake exe rv32-jit-loop-test verilator/generated_soc_jit.cpp firmware/firmware.hex 5000

# JIT cycle-skip test (register snapshot/restore roundtrip)
lake exe rv32-jit-cycle-skip-test

# JIT oracle test (self-loop detection, 10M cycles with cycle-skipping)
lake exe rv32-jit-oracle-test

# JIT dynamic warp test (memsetWord + dynamic oracle with JITHandle access)
lake exe rv32-jit-dynamic-warp-test

# JIT speculative warp test (snapshot/restore + guard-and-rollback)
lake exe rv32-jit-speculative-warp-test

# CppSim (standalone C++)
cd verilator && make build-cppsim && make run-cppsim

# Verilator simulation (much faster)
cd verilator && make build
./obj_dir/Vrv32i_soc ../firmware/firmware.hex 500000  # Firmware tests

# Linux kernel boot
./obj_dir/Vrv32i_soc ../firmware/opensbi/boot.hex 10000000 \
    --dram /tmp/opensbi/build/platform/generic/firmware/fw_jump.bin \
    --dtb ../firmware/opensbi/sparkle-soc.dtb \
    --payload /tmp/linux/arch/riscv/boot/Image

SV→Sparkle Transpiler {#sv-transpiler}

Parse existing SystemVerilog RTL (e.g., PicoRV32) into Sparkle IR and execute via JIT — no Verilator required.

Pipeline

picorv32.v + picorv32_soc_m.v
    → [SV Parser]    Parse to SV AST (250 items, 32 always blocks)
    → [Lower]         MemorySSA sequential emitter + MUX for always @*
    → [Flatten]       Inline sub-modules with parameter overrides
    → [CppSim]        Generate C++ simulation class
    → [GCC -O2]       Compile to shared library
    → [JIT dlopen]    Load and simulate from Lean

C Firmware Execution

GCC-compiled bare-metal firmware runs on the transpiled PicoRV32 SoC with M-extension:

# Build RV32I firmware
cd firmware
riscv32-none-elf-gcc -march=rv32i -mabi=ilp32 -O2 -nostdlib \
    -T link.ld -o firmware_rv32i.elf boot_rv32i.S main_rv32i.c

# Build RV32IM firmware (MUL/DIV/REM)
riscv32-none-elf-gcc -march=rv32im -mabi=ilp32 -O2 -nostdlib \
    -T link.ld -o firmware_rv32im.elf boot_rv32i.S main_rv32im.c

# Run 34 tests via Sparkle JIT (no external dependencies)
lake exe svparser-test

Test	Expected	Actual
Fibonacci	0,1,1,2,3,5,8,13,21,34	✓ exact match
Array Sum	360	✓
Bubble Sort	3,8,17,42,55,99	✓ exact match
GCD	6,25,1	✓ exact match
MUL 7*6	42	✓
MUL 12345*6789	83810205	✓
DIV 42/6, 100/7	7, 14	✓
REM 42%6, 100%7	0, 2	✓
Factorial 10!	3628800	✓
Store/Load 0x12345678	0x12345678	✓
Final (RV32I)	0xCAFE0000	ALL PASSED
Final (RV32IM)	0xCAFE0000	ALL PASSED

34 CI-Safe JIT Tests

All tests embed Verilog as string literals — no external file dependencies:

Tests	Description
1-6	Parser, lowering, round-trip, JIT simulation
7-11	PicoRV32 parse, SoC "Hello", C firmware (requires `/tmp` files)
12-16	IR regression: nested if/else, case(1'b1) priority, concat-LHS, for-loop SSA, array read
17-21	pcpi_mul: IR structure, SSA chain, topo sort, JIT 76/100100/12345*6789/consecutive/wrapper
22-26	JIT pairs: if/else+register, array read, case decoder, read-overwrite, for-loop accumulator
27-29	Bit replication `{4{en}}`, byte-lane write/read, multi-bit replication `{2{din}}`

Key Algorithms

MemorySSA Sequential Emitter: Process always @* statements top-to-bottom, creating SSA wires for each variable write. Correctly handles read-then-overwrite patterns that cause cyclic dependencies under pure MUX.
If-Conversion (Guarded Assignments): Walk statement tree tracking guard conditions; emit flat priority mux chain with !covered guard for first-match-wins in case(1'b1).
Generate Block Evaluation: Resolve generate if (PARAM) using parameter defaults + overrides. 2-pass architecture for nested hierarchies.
Concat-LHS Bit Scatter: Handle {next_rdt[j+3], next_rd[j+:4]} = expr via read-modify-write decomposition with __RMW_BASE__ placeholder resolved by stmtsToMuxExprBlocking.
Byte-Strobe Memory: Detect if(wstrb[N]) memory[addr][hi:lo] <= wdata[hi:lo] and generate read-modify-write with per-byte mask and proper shift.
Bit Replication: {N{expr}} → (0-val) & mask for 1-bit, concat for multi-bit.
For-Loop Unrolling: Static unroll with parameter substitution. MemorySSA handles sequential dependencies natively.
$signed Sign Extension: SHL+ASR pattern with min-32-bit types and 64-bit promotion to prevent C++ UB.

Inline Formal Verification (`verilog!` Macro)

Write Verilog directly in Lean, prove properties at compile time:

import Tools.SVParser.Macro

verilog! "
module counter8_en (input clk, input rst, input en, output [7:0] count);
    reg [7:0] count_reg;
    assign count = count_reg;
    always @(posedge clk) begin
        if (rst) count_reg <= 0;
        else if (en) count_reg <= count_reg + 1;
        assert(rst ? (count_reg == 0) : 1);  // auto-proved!
    end
endmodule
"

open counter8_en.Verify

-- These theorems prove against the auto-generated nextState
theorem counter_holds (s : State) :
    nextState s { rst := 0, en := 0 } = s := by simp [nextState]

-- auto_assert_0 was auto-generated and auto-proved by bv_decide
-- Change the assert condition → instant error in editor

Features:

verilog! macro: Parse Verilog at elaboration time, inject State/Input/nextState
assert(cond): Auto-generate theorem and prove via simp [nextState]; bv_decide
Instant feedback: Edit Verilog, save → proofs re-check in milliseconds
No Lean knowledge needed: Write standard Verilog assertions, get mathematical proofs

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RV32IMA RISC-V SoC