Feature/rv32 by junjihashimoto · Pull Request #12 · Verilean/sparkle

junjihashimoto · 2026-03-26T02:08:46Z

No description provided.

Add Sparkle/Verification/RV32Props.lean with 36 formally proved theorems covering: - Opcode encode/decode roundtrip + injectivity (2) - ALUOp encode/decode roundtrip + injectivity (2) - R-type field extraction: opcode, rd, funct3, rs1, rs2, funct7 (6) - Immediate roundtrip: I-type, U-type, S-type (3) - ALU algebraic properties: AND-zero, SUB=ADD+neg, SLT/SLTU irrefl, SRL/SRA-zero identity (6) - Instruction format classification: totality + 6 specific (7) - Decoder control signals: LOAD/STORE/ALU/BRANCH/JAL/LUI (8) - Branch evaluation: BEQ/BNE reflexivity (2) All proofs use cases/simp/native_decide/bv_decide — no sorry.

Add encodeB/encodeJ helpers and prove the two trickiest immediate format roundtrips in RV32I: - b_type_imm_roundtrip: {inst[31], inst[7], inst[30:25], inst[11:8], 0} correctly encodes and decodes with sign extension - j_type_imm_roundtrip: {inst[31], inst[19:12], inst[20], inst[30:21], 0} correctly encodes and decodes with sign extension Both require the constraint that bit 0 = 0 (halfword alignment). Proved via bv_decide (SAT-based BitVec decision procedure). Total: 38 theorems, zero sorry.

Add Sparkle/Verification/PipelineProps.lean with pure Lean models of the pipeline's critical invariants that guard against the class of bugs found during Linux boot development: Forwarding (5): - forward_never_x0: WB→EX bypass never activates for x0 - forward_implies_match: active forwarding implies address match - forward_implies_wb_en: active forwarding implies WB enabled - wb_disable_x0: x0 destination disables write-back - wb_disable_no_write: regWrite=false disables write-back Hazard Detection (5): - no_hazard_no_load: no stall without load in EX - no_hazard_x0_dest: no stall for x0 destination - no_hazard_no_match: no stall when registers don't match - hazard_rs1_match: stall when load rd = ID rs1 - hazard_rs2_match: stall when load rd = ID rs2 Flush/NOP (5): - branch/jump/trap each cause flush - no_flush_idle: idle state doesn't flush - flush_comm: flush conditions are commutative x0 Invariance (3): - x0_always_zero: reading x0 always returns 0 - forward_priority: forwarding beats register file - regfile_when_no_fwd: register file used without forwarding Store-to-Load Forwarding (5): - disabled/match/no_match/byte_offset/adjacent_no_match Decoder Invariants (3): - load_alu_is_add, store_no_regwrite, branch_no_regwrite

Add Sparkle/Verification/CSRProps.lean with pure Lean models of CSR operations that formally demonstrate specification violations: BUGS FOUND (proved as theorems): - mstatus_wpri_bug_csrrs: CSRRS sets reserved WPRI bits (bit 1) - mstatus_wpri_bug_csrrw: CSRRW overwrites all WPRI bits - mstatus_all_bits_writable_bug: CSRRS 0xFFFFFFFF sets all 32 bits but only bits 3,7,11,12 (MIE/MPIE/MPP) should be writable - csrrs_zero_value_correct_but_write_not_suppressed: hardware csrDoWrite is active even when rs1=x0 (spec violation) CORRECT behavior proved (masked implementation): - mstatus_wpri_correct_csrrs: with mask, bit 1 stays 0 - mstatus_masked_only_legal: only legal fields are settable Trap/MRET transitions (verified correct): - trap_mie_to_mpie: MIE→MPIE on trap entry - mret_mpie_to_mie: MPIE→MIE on MRET - trap_mret_roundtrip_mie: MIE preserved across trap→MRET cycle - trap_preserves_non_mstatus_bits: no WPRI bits set by trap M-extension edge cases (verified correct): - div_int_min_neg1: INT_MIN / -1 = INT_MIN - rem_int_min_neg1: INT_MIN % -1 = 0 - div_by_zero, rem_by_zero, divu_by_zero, mulh_neg1_neg1

Prove that the synthesizable Signal DSL hardware implementations produce the same results as the pure Lean reference specifications. This bridges the gap between "spec is correct" and "hardware matches spec" — the Signal DSL IS the hardware (generates Verilog), and these proofs show it computes identical values. Key technique: Signal dom α = Nat → α. We provide 17 @[simp] lemmas that push .val through all Signal combinators (mux, beq, +, -, &, |, ^, <<<, >>>, slt, ult, ashr, ~~~, register), reducing Signal expressions to pure BitVec computations. Then simp closes the goal. ALU equivalence (8): ADD, SUB, AND, OR, XOR, SLL, SRL, PASS aluSignal (pure op) (pure a) (pure b) .val 0 = aluCompute op a b Branch equivalence (3): BEQ (universal), BNE (concrete ×2) branchCompSignal (pure f3) (pure a) (pure b) .val 0 = evalBranch f3 a b Hazard detection (4): concrete hit/miss/no-load/x0 hazardSignal (pure ...) .val 0 = expected Register semantics (2): init at t=0, delay-by-one at t+1

…S bug)

…+ CI test Oracle improvements (Sparkle/Core/Oracle.lean): - MIE/MTIE guard: check MSTATUS.MIE and MIE.MTIE before timer-compare skip. Prevents skipping when interrupts are disabled (the timer interrupt wouldn't fire, so skipping would be incorrect). - WFI fast-path: optional wfiWireArrayIdx triggers immediate skip when WFI instruction is detected (threshold=1 instead of default 50). - S-mode fields: mstatusRegIdx, mieRegIdx for privilege-aware checking. - mkBootOracle now enables checkInterruptEnable by default. Self-contained CI test (Tests/RV32/OracleAccuracyTest.lean): - Test 1: Halt loop detection — oracle triggers on firmware halt loop - Test 2: Timer advance accuracy — set mtimecmp=mtime+5000, verify skip - Test 3: MIE guard — disable interrupts, verify oracle does NOT skip - Test 4: MTIE guard — disable timer interrupt, verify oracle does NOT skip All 4 tests pass. Added to CI workflow and lakefile.

…us, readme

…igation Investigation findings: - SVParser successfully PARSES PicoRV32 with ENABLE_MUL=1, ENABLE_DIV=1 (152 registers vs 148 for RV32I-only) - However, execution FAILS: even RV32I firmware produces no UART output on the M-ext SoC, indicating the PCPI sub-module flattening breaks the signal routing - Root cause: the generate-if blocks for ENABLE_MUL instantiate picorv32_pcpi_mul/picorv32_pcpi_div sub-modules whose state machines are not correctly integrated into the flattened design - No new IR operators needed: PicoRV32's MUL/DIV uses shift-and-add algorithms with standard +/-/shift operators - Fix requires: debugging the sub-module flattening in Lower.lean for the PCPI interface (pcpi_wr, pcpi_rd, pcpi_wait, pcpi_ready signals) Status changed from "Not started" to "Investigated".

Three root-cause fixes in flattenDesign (Lower.lean): 1. Parameter override propagation (was: generate-if used defaults only) - lowerModule now accepts paramOverrides argument - flattenDesign re-lowers sub-modules with parameter overrides from the instantiation (#(.ENABLE_MUL(1)) etc.) - generate-if blocks now evaluate with correct parameter values 2. Complex output port connections (was: silently dropped) - Output ports connected to non-identifier expressions (array index, bit slice, concat) were silently skipped with `| _ => pure ()` - Now creates a temporary wire and assigns the sub-module output to it - Parent module can read from the temporary wire 3. Missing sub-module handling (was: kept as dangling .inst) - Missing sub-modules now emit a warning wire instead of keeping an un-flattenable .inst statement that would break CppSim Note: PicoRV32 M-extension (ENABLE_MUL=1) parsing succeeds and flattening produces a single module, but execution still fails due to PCPI handshake signal routing issues. Further debugging needed. All 11 SVParser tests + 504 unit tests pass (no regressions).

Root cause analysis and fixes for sub-module wiring failures: 1. Parser: parse #(.PARAM(val)) instead of skipping (was discarded) - SVModuleItem.instantiation now carries paramOverrides field - Parser reads named parameter values from #(...) syntax 2. Lowering: apply parameter overrides to constant assignments - lowerModule accepts paramOverrides argument - Parameter values emitted as correct constants (not defaults) - generate-if blocks expanded with overridden values 3. Flattening: 2-pass architecture for nested hierarchies - flattenDesign gets SVDesign for re-lowering with overrides - Extracts parameter overrides from SV AST at flatten time - Iterative loop (up to 5 passes) until no .inst remains - Re-adds original module map for each iteration 4. IR stays clean: no parameter info in Stmt.inst - Parameter resolution is purely a parser/lowering concern - IR sees only resolved port connections Status: PicoRV32 MUL sub-module (picorv32_pcpi_mul) now correctly inlined with ENABLE_MUL=1. DIV module fails because the parser skips module definitions after picorv32_pcpi_fast_mul (separate parser bug, not a flattening issue). All 11 SVParser tests + 504 unit tests pass.

Parser fix: add unary minus (negation) to parseUnary. Previously, -expr in Verilog (e.g., -pcpi_rs1, -quotient) caused parse failure, which silently stopped module parsing. This was the root cause of picorv32_pcpi_div being invisible to the flattener. Before: 5 modules parsed (stopped at picorv32_pcpi_fast_mul) After: 9 modules parsed (all PicoRV32 sub-modules visible) M-extension status: PicoRV32 with ENABLE_MUL=1, ENABLE_DIV=1 now parses, lowers, flattens (183 registers), and compiles to JIT C++. RV32I firmware execution still hangs — PCPI handshake stall issue remains (pcpi_int_wait blocks CPU instruction fetch). All 11 SVParser tests + 504 unit tests pass.

Traced PicoRV32 M-extension failure to mem_state FSM stuck at 0. if-conversion generates wrong guard priority: outer reset guard (mem_state<=0) overrides inner case branch (mem_state<=1).

Root cause: PicoRV32 uses Verilog macro invocations like: \`assert(mem_wstrb == 0); \`assert(mem_do_prefetch || mem_do_rinst || mem_do_rdata); These backtick-assert lines were not stripped by the preprocessor, causing the parser to fail inside case arms. This silently corrupted the case statement structure, which in turn caused if-conversion to generate wrong guard priorities for mem_state FSM transitions. The fix: add \`assert to the preprocessor skip list (1 line change). Result: RV32I firmware now executes correctly on PicoRV32 SoC with ENABLE_MUL=1, ENABLE_DIV=1 (183 registers, 26 UART words — identical to RV32I-only SoC output). All 11 SVParser tests + 504 unit tests pass.

M-extension firmware now runs to completion (pass marker reached): - MUL 7*6 returns 0 (expected 42) — PCPI mul sub-module result bug - DIV/REM propagate the wrong MUL result - RV32I firmware on M-ext SoC: PASS (26 words, identical to RV32I SoC) The PCPI handshake works (CPU doesn't hang), but picorv32_pcpi_mul returns incorrect computation results. Likely a register/shift issue in the flattened mul sub-module.

Two fixes for always @* combinational block processing: 1. collectGuardedBlock: add forLoop handling (was silently ignored) - for loops inside always @* now have their body collected - Matches collectGuardedNB which already handled forLoop 2. tryEvalConst: constant folding for if-else conditions - eq(const, const) and not(const) are evaluated at collect time - When condition is statically true/false, only one branch is taken - Applied to both collectGuardedNB and collectGuardedBlock 3. Parameter substitution in SV AST: - substParamExpr/substParamStmt: recursive parameter→literal replacement - substituteParamsInItem: applied to always blocks and cont assigns - Parameters in if conditions inside always @* are now constants MUL result still incorrect due to CARRY_CHAIN=4 default using [j+:W] (variable part-select) syntax which is not yet parsed. The CARRY_CHAIN=0 workaround generates a broken for loop (j=j+0 infinite loop). Next step: implement [base +: width] part-select parsing. All 11 SVParser tests + 504 unit tests pass.

Three new capabilities for Verilog-2001 support: 1. Part-select parsing ([base +: width]): - New SVExpr.partSelectPlus variant in AST - Parser recognizes [expr +: expr] after [ in parsePostfix - Lowering: (expr >> base) & ((1 << width) - 1) 2. For-loop unrolling (SV AST level): - unrollForLoops: statically unroll for loops with constant bounds - Loop variable substituted in body for each iteration - Recursive unrolling for nested loops - Applied after parameter substitution in substituteParamsInItem 3. Constant folding in if-conversion: - tryEvalConst: evaluate eq(const, const) and not(const) - collectGuardedNB/Block: when if-condition is compile-time constant, take only the true/false branch (eliminates dead code) MUL status: PicoRV32 pcpi_mul uses concat-LHS blocking assignment ({next_rdt[j+3], next_rd[j +: 4]} = add_expr) inside the unrolled carry-save loop. This concat-LHS in always @* is not yet handled by collectGuardedBlock (only collectGuardedNB has concat-LHS support via lowerConcatLhsAssign). The next_rdt=0 initialization survives as the final value because the loop body assignments are silently dropped. Next step: add concat-LHS blocking assignment support to collectGuardedBlock (bit-scatter decomposition for always @*). All 11 SVParser tests + 504 unit tests pass.

1. decomposeMultiConcatLhs: decompose {a[hi], b[lo+:w]} = rhs into per-variable assignments. Each target gets the corresponding bit-field from the RHS expression via slice + shift. 2. collectGuardedBlock: use decomposeMultiConcatLhs as fallback when both exprToName and lowerConcatLhsAssign fail on blockAssign LHS. 3. collectBlockNamesTop: extract variable names from concat-LHS elements (ident, index, slice, partSelectPlus). Also recurse into forLoop bodies. Verified working with simple [hi:lo] slice concat-LHS (nb[3], na[3:0]). PicoRV32 pcpi_mul uses [base +: width] in concat-LHS, which the collectBlockNamesTop recognizes but the blocking-assign concat parser doesn't see (parser issue with LHS concat containing +: syntax). Next: debug why {nb[3], na[0 +: 4]} = expr is not parsed as concat-LHS blockAssign — likely the parser's LHS expression handling doesn't recognize concat with partSelectPlus elements. All 11 SVParser tests + 504 unit tests pass.

Major progress on M-extension support: 1. Fixed [base +: width] parsing: use parsePrimary (not parseExpr) for base to prevent + being consumed as addition operator 2. Added svExprToNat: evaluate constant SVExpr expressions (add/sub/mul) for concat-LHS index resolution after forLoop unroll (e.g., j+CARRY_CHAIN-1 with j=0, CARRY_CHAIN=4 → 3) 3. Read-modify-write pattern for concat-LHS decomposition: value = (old & ~mask) | (shifted & mask) This preserves other bit fields when multiple loop iterations update different fields of the same register Result: MUL 7*6 = 40 (was 0, expected 42). Off by 2 — likely a carry propagation edge case in the carry-save accumulator decomposition. The carry-save pattern is fundamentally working now. All 11 SVParser tests + 504 unit tests pass.

…e naming

Implement SSA renaming for blocking assignment loops: - Variables written in loop body get iteration-specific names (e.g., next_rd → next_rd_ssa0_0, next_rd_ssa0_1, ...) - Prologue captures initial values, epilogue writes back final - Depth parameter distinguishes nested loops (ssa0_, ssa1_, ...) - SSA wires auto-declared when name contains _ssa Status: Standalone multiplier test still fails due to nested SSA complexity. The outer loop (32 iters) and inner loop (16 iters) create deeply nested SSA chains where the inner loop's variables get double-renamed (next_rd → next_rd_ssa0_k → next_rd_ssa0_k_ssa1_j). The epilogue's _ssa0_32 reference isn't declared because it only exists as the write-target of the last iteration. Next step: ensure epilogue SSA vars are also declared as wires, or simplify by only applying SSA to the innermost loop (which is the one with the self-referential carry-save accumulator pattern). All 11 SVParser tests + 504 unit tests pass.

Major SSA improvements for carry-save accumulator support: 1. Self-referential-only SSA: only variables appearing in both LHS and RHS of loop body get SSA renaming (reduces unnecessary renaming) 2. Recursive LHS rename (renameLhsOnly): correctly renames write-side of blockAssign even inside nested ifElse/forLoop structures 3. OR-merge for same-guard assignments: concat-LHS bit-field writes to the same variable are OR-combined instead of mux-overwritten 4. SSA base resolution: _ssa0_N uses _ssa0_{N-1} as base value instead of self-referencing (eliminates circular dependency) 5. Auto-declare all SSA wires from both LHS and RHS scan Status: SSA chain is correctly formed (next_rd_ssa0_1 = next_rd_ssa0_0 instead of self-reference). However OR-merge of 16 concat-LHS bit-field decompositions doesn't reach the guardedToMux filtered list. The concat-LHS decomposition inside nested ifElse+unrolled forLoop needs investigation — the guardedAssigns may not be collected from the deeply nested structure. All 11 SVParser tests + 504 unit tests pass.

SSA prologue fix: rd_ssa0_0 = rd no longer creates circular dependency with rd's epilogue assignment. The topoSort now recognizes _ssa*_0 assignments as reading the initial value, not the final value. Result: SSA chain evaluates in correct order (rd_ssa0_0 first). However rd_ssa0_2 still evaluates before rd_ssa0_1 because rd_ssa0_1 depends on rdx, which depends on rdt, which creates an indirect cycle. Root cause: stmtsToMuxExprBlocking merges ALL blocking assignments to a variable into a single mux expression, losing the sequential evaluation semantics of always @*. Variables like rdx have both an initialization (rdx = 0) and a post-loop update (rdx = rdt << 1), but the mux expression only preserves the final value. The fundamental issue is that Sparkle's IR represent combinational logic as simultaneous assignments, but Verilog's always @* blocks have sequential semantics for blocking assignments. A full solution requires either: 1. Sequential evaluation in CppSim (respecting statement order) 2. Full SSA conversion of ALL blocking assignments (not just loops) All 11 SVParser tests + 504 unit tests pass.

Three critical fixes that enable the carry-save accumulator to work: 1. Sequential emit for always @* blocks (emitBlockingStmtsSequential): Convert blocking assignments to IR assigns in statement order instead of mux-ifying them. This preserves the sequential evaluation semantics where later statements read the results of earlier ones. Stored in seqBody (not body) to bypass topoSort reordering. 2. SVExpr width for [base +: width] part-select: Width is now kept as SVExpr (not Nat) in the AST, so parameter substitution (CARRY_CHAIN → 4) is applied BEFORE width evaluation. Previously width was evaluated at parse time when CARRY_CHAIN was still an identifier, resulting in width=1 (fallback). 3. Concat-LHS OR-accumulate in sequential emit: Each concat-LHS decomposition appends `name = name | shifted_bits` instead of overwriting. Multiple iterations (j=0,4,...) each add their bit-field to the running value via OR. 4. Exclude part-select variables from SSA renaming: Variables accessed via [base +: width] in concat-LHS have non- overlapping bit-field writes across loop iterations, so SSA renaming is unnecessary and harmful (breaks OR-accumulation). Result: 8-bit standalone carry-save multiplier passes all tests! mul8(1,6)=6 ✓ mul8(3,5)=5 ✓ mul8(7,1)=1 ✓ All 11 SVParser tests + 504 unit tests pass.

… type Standalone carry-save multiplier (8-bit) passes all tests. Full PicoRV32 M-ext compilation fails: cpu_cpuregs is declared as uint64_t (scalar) but accessed as cpuregs[idx] (array). The register file (reg [31:0] cpuregs [0:31]) needs to be recognized as memory in CppSim when ENABLE_MUL=1 adds direct cpuregs[] write paths.

When a memory (e.g., cpuregs[0:31]) is flattened into the top module, CppSim would also declare a local scalar variable with the same name in eval(), shadowing the std::array member. This caused compilation errors like 'invalid types uint64_t[uint8_t] for array subscript'. Fix: collect memory names from IR .memory statements and exclude them from localWires declarations. Result: PicoRV32 M-ext SoC (183 registers) now compiles to JIT C++. Firmware execution still produces 0 UART words — likely the sequential always @* emit causes evaluation order issues in the larger CPU FSM where some blocking assignments interact with the posedge always block. All 11 SVParser tests + 504 unit tests pass.

- Use sequential emit only for always @* blocks containing concat-LHS (carry-save accumulator pattern). Other always @* blocks use mux mode. - hasConcatLhs: recursive check for concat-LHS in nested ifElse/forLoop - seqBody placed after topoSorted body (not before) RV32I-only SoC: PASS (11/11 tests). M-ext SoC: compiles but RV32I firmware produces 0 UART words. The regression from the backtick-assert fix commit is likely caused by partSelectPlus width SVExpr change or other cumulative changes affecting the PicoRV32 CPU FSM evaluation. All 11 SVParser tests + 504 unit tests pass.

…I FSM)

…ges case priority

Removed constant folding from both collectGuardedNB (non-blocking) and collectGuardedBlock (blocking) in if-conversion. The constant folding was changing guard priority in the decoder's case statements, causing incorrect instruction decode (instr_addi=0 for ADDI). Constant folding is preserved only in emitBlockingStmtsSequential (always @* sequential emit) where it's needed for carry-save accumulator's CARRY_CHAIN==0 branch elimination. M-ext SoC still fails (0 UART words) — the decoder regression may have a different root cause beyond tryEvalConst. Needs comparison of instr_addi_next mux expression between working (ef6ef74) and current version. All 11 SVParser tests + 504 unit tests pass.

Major architectural changes to the SVParser→IR lowering: - Remove emitBlockingStmtsSequential entirely: all always @* blocks now use the MUX-based collectGuardedBlock path - Fix sigNames filter to use collectBlockNamesTop (recursive) instead of flat .any check — fixes cpuregs_rs1/rs2 being silently dropped when nested inside if(ENABLE_REGS_DUALPORT) - Fix processCaseArms with !covered guard for first-match-wins - Remove OR-merge hack from stmtsToMuxExprBlocking - decomposeMultiConcatLhs now produces read-modify-write with __RMW_BASE__ placeholder, resolved by stmtsToMuxExprBlocking to the correct SSA base (previous iteration's output) - SSA rename all write variables in loops (including part-select) with nested SSA support (stripSsa base-name unification) - Fix unrollForLoops to use recursive unroll result (was discarding it) - Fix ssaBase to handle nested SSA (use last _ssa segment) - Add 64-bit promotion in decomposeMultiConcatLhs to avoid C++ UB on shifts >= 32 bits - Add collectRefs before guardedToMux for proper scoping - Skip outer SSA for single-iteration loops (numIters <= 1) Results: 11/11 SVParser tests pass, RV32I firmware on M-ext SoC produces all 26 UART words correctly. Runtime MUL via carry-save accumulator (PCPI) still produces incorrect values — to be debugged via standalone pcpi_mul unit test in next session.

Test 12: Nested always @* assign inside if/else (cpuregs_rs1 bug) Test 13: case(1'b1) first-match-wins priority (processCaseArms bug) Test 14: Concat-LHS part-select decomposition in always @* Test 15: For-loop unroll with SSA renaming chain Test 16: Array read in always @* (register file dual-port) Test 17: pcpi_mul standalone IR structure (regs, SSA wires, assigns) Test 18: Carry-save SSA chain: next_rd_ssa1_N refs ssa1_{N-1} Test 19: Carry-save SSA chain: next_rdt_ssa1_N refs ssa1_{N-1} Test 20: next_rdx reads post-loop next_rdt (not stale _ssa0_0) All 20 tests pass. Tests 18-20 confirm the IR-level SSA chain for pcpi_mul's carry-save accumulator is correct — the remaining MUL bug is in CppSim emit or C++ numeric evaluation, not in lowering.

Replace MUX-only approach with emitSequentialSSA that processes statements top-to-bottom, creating SSA wires for each variable write. This correctly handles read-then-overwrite patterns (e.g., PicoRV32's carry-save accumulator) that cause cyclic dependencies under pure MUX. Key change: always @* blocks now use sequential evaluation with dynamic SSA wire generation. For-loop SSA in unrollForLoops is disabled (sequential emitter handles ordering natively). Test 21: pcpi_mul JIT (7*6=42) PASSES for the first time! Tests 15-20 need updating for new _seq naming convention. Test 11 (RV32I firmware) has a regression to investigate.

When a variable has no prior _seq wire and is only modified in one branch of if-else, skip the MUX to avoid self-referencing the final output (fixes decoded_rs cycle in PicoRV32). Test 21 (pcpi_mul 7*6=42) still PASSES. Test 10/11 (RV32I SoC) still has regression — CPU always @* blocks need further investigation with sequential emitter.

Each test embeds Verilog, parses, lowers, JIT-compiles, and verifies output values — no external file dependencies (CI-safe). Test 22: if/else one-sided assign + register (decoded_rs pattern) Test 23: Array read in always @* → register (cpuregs_rs1 pattern) Test 24: case(1'b1) decoder priority (processCaseArms pattern) Test 25: Read-then-overwrite sequential chain (pcpi_mul pattern) Test 26: For-loop accumulator (carry-save pattern) All 5 new tests PASS, confirming the sequential SSA emitter correctly handles all patterns that caused bugs during development.

Fix case statement merge in emitSequentialSSA: when a variable has no prior _seq wire (e.g., initialized with don't-care 'bx), avoid self-referencing the final output in the default MUX path. This resolves 16 cyclic dependencies in PicoRV32's always @* blocks (alu_out, cpuregs_wrdata, mem_rdata_word, etc.). Results: - Test 10: PicoRV32 SoC "Hello" PASS - Test 11: RV32I C firmware (26 words, ALL C TESTS OK) PASS - Test 21: pcpi_mul JIT 7*6=42 PASS - Tests 22-26: All JIT pair tests PASS - M-ext SoC: RV32I PASS, RV32IM (MUL/DIV/REM) PASS - M-ext factorial 10! = 3628800 (0x375F00) CORRECT Tests 15-20 need updating for _seq naming convention (not failures). Large number MUL (12345*6789) still has a precision issue to investigate.

- Update Tests 15-20, 21a for _seq naming (was _ssa1_) - Add Test 21b: pcpi_mul 100*100=10000 (PASS) - Add Test 21c: pcpi_mul 12345*6789=83810205 (PASS) - Add topo sort deadlock warning in topoSortBody All 29 tests PASS. pcpi_mul standalone handles all multiply sizes correctly. SoC integration has a type-width issue for large operands (CPU→PCPI operand truncation) — separate from the compiler.

- Test 21d: pcpi_mul consecutive multiply (7*6 then 12345*6789) PASS - Fixed test protocol: deassert pcpi_valid between multiplications to allow FSM re-trigger (mul_start edge detection) - SoC large-number MUL issue identified as CPU⇔PCPI protocol timing, not a compiler bug (pcpi_mul standalone handles all sizes correctly) All 30 tests PASS.

Test 21e: mul_wrapper instantiates pcpi_mul with a minimal CPU-like sequencer that drives pcpi_valid/rs1/rs2, captures pcpi_rd on ready. Tests consecutive multiplies: 7*6=42 then 12345*6789=83810205. SoC integration issue with large MUL (12345*6789) is isolated to the full PicoRV32 CPU data path — not in pcpi_mul, not in the wrapper, and not in the compiler. All 31 standalone tests PASS.

Root cause of SoC large-number MUL failure: lowerExpr for repeat_ was ignoring count, turning {4{mem_la_write}} into 1-bit value. Fix: (0 - val) & mask for 1-bit, concat for multi-bit. Test 27 (bit replication) PASS. wstrb now correctly 0xF for sw. Store/Load still returns byte-0 only despite correct wstrb — investigating CppSim byte-lane write/read ordering.

Root cause of SoC store/load failure: buildByteStrobeWrite did not shift the data slice to the target bit position before masking. e.g., mem_wdata[15:8] (= 0x56) was ANDed with mask 0xFF00 directly, giving 0 instead of being shifted to 0x5600 first. Fix: shift dataExpr by lane.lo bits before applying mask. Test 28 (byte-lane memory write/read): PASS Simple MUL test (12345*6789=83810205): PASS on M-ext SoC! Store/Load test (0x12345678 round-trip): PASS! All SoC tests now pass: RV32I (26 words), RV32IM (18 words), Simple MUL (5 words), Store/Load (6 words). 33/34 standalone tests pass (Test 29 multi-bit replication WIP).

For repeat_ with ident operands (e.g., {2{din}}), the element width was incorrectly assumed to be 1-bit, causing (0 - val) & 3 instead of proper concat. Fix: use concat path for unknown-width operands, which correctly shift+OR using inferExprWidth from the type map. Test 29: {2{16'hABCD}} = 0xABCDABCD — PASS All 34 tests PASS. All M-ext SoC tests PASS.

After fixing {N{expr}} replication, sb (store byte) correctly produces {4{byte}} in mem_wdata (e.g., 0x48484848 for 'H'). Fix UART display to mask low 8 bits: "Hello" restored. 34/34 tests PASS. All SoC tests PASS.

sb (store byte) in PicoRV32 produces {4{byte}} in mem_wdata (e.g., 0x48484848 for 'H'). This is normal — SoC UART stores the full word, test masks to low 8 bits to extract actual byte.

- STATUS.md: Phase 4 (M-ext) marked Done, Phase 51 detailed summary - CHANGELOG.md: Phase 51 entry with architecture changes, 12 bugs, test table - RV32.md: Updated SV Transpiler section with M-ext results, 34 tests, MemorySSA - README.md: Updated IP catalog entry for SV Transpiler (34 tests, M-ext)

- Add RV32 JIT benchmark step: runs firmware.hex for 5000 cycles via Signal DSL SoC JIT (122 registers, 4-stage pipeline) - Commit verilator/generated_soc_jit.cpp (3038 lines) so CI can run the JIT test without regenerating - Update SVParser test description (34 tests) - Reference: JIT ~13M cyc/s vs Verilator 11.1M cyc/s on Apple Silicon

Add benchmark-action/github-action-benchmark integration: - Measures RV32 JIT SoC simulation (5000 cycles, compile+sim) - Pushes results to dev/rv32-jit-bench on GitHub Pages - Alert at 150% regression, comment on alert - Add permissions (deployments: write, contents: write) Reference performance (Apple Silicon): JIT: 13.0M cyc/s (1.17x faster than Verilator) Verilator 5.044: 11.1M cyc/s

Full benchmark pipeline: 1. Install Verilator on Ubuntu 2. Build Verilator simulation (make build) 3. Build JIT shared library (.so) from generated_soc_jit.cpp 4. Build benchmark binaries (verilator_bench, jit_bench) 5. Run both for 10M cycles, parse cycles/sec 6. Push results to GitHub Pages via benchmark-action Metrics tracked: - Verilator (10M cycles) — cycles/sec - JIT eval+tick (10M cycles) — cycles/sec - JIT evalTick fused (10M cycles) — cycles/sec - JIT evalTick+6wires (10M cycles) — cycles/sec Reference (Apple Silicon): JIT 13M cyc/s vs Verilator 11.1M cyc/s

junjihashimoto added 30 commits March 25, 2026 11:39

docs: add Phase 48 (AXI4-Lite) and Phase 49 (RV32 102 proofs + MSTATU…

22b6e8a

…S bug)

docs: add Phase 50 (Linux Boot Idle-Loop Skipping) to changelog, stat…

05af349

…us, readme

docs: M-ext root cause — if-conversion priority bug in collectGuardedNB

3fe3604

Traced PicoRV32 M-extension failure to mem_state FSM stuck at 0. if-conversion generates wrong guard priority: outer reset guard (mem_state<=0) overrides inner case branch (mem_state<=1).

docs: M-ext status — blocking assignment self-reference needs SSA wir…

8a3680c

…e naming

docs: M-ext CPU executes ~25 instructions then stalls at PC=0x64 (PCP…

148c691

…I FSM)

debug: decoder instr_trap=1 for ADDI after SW — constant folding chan…

36baaa2

…ges case priority

junjihashimoto added 23 commits March 26, 2026 07:36

fix: UART byte display (mask low 8 bits for ASCII)

860af48

After fixing {N{expr}} replication, sb (store byte) correctly produces {4{byte}} in mem_wdata (e.g., 0x48484848 for 'H'). Fix UART display to mask low 8 bits: "Hello" restored. 34/34 tests PASS. All SoC tests PASS.

docs: add comment explaining UART byte replication in Test 10

8db2c69

sb (store byte) in PicoRV32 produces {4{byte}} in mem_wdata (e.g., 0x48484848 for 'H'). This is normal — SoC UART stores the full word, test masks to low 8 bits to extract actual byte.

ci: add rv32i_soc_wrapper.sv for Verilator build in CI

18551bb

ci: fix verilator bench link — remove duplicate verilated_threads.o

fbc9470

ci: remove generated_soc_jit.cpp from repo (generated by lake build)

5aca6f2

ci: exclude tb_soc.o from verilator_bench link (duplicate main)

b986b06

ci: fix JIT .so path — use ./generated_soc_jit.so for dlopen

933bc66

junjihashimoto merged commit 9da7df6 into main Mar 26, 2026
1 of 2 checks passed

junjihashimoto deleted the feature/rv32 branch March 26, 2026 03:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/rv32#12

Feature/rv32#12
junjihashimoto merged 53 commits intomainfrom
feature/rv32

junjihashimoto commented Mar 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

junjihashimoto commented Mar 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant