Merged
52 commits
07fe598
feat: LiteX SoC parse support - Phase 1 PASS (1730 lines, 784 items)
junjihashimoto Mar 26, 2026
c8ea374
feat: LiteX Phase 2-3 PASS (Lower + JIT C++ generation)
junjihashimoto Mar 26, 2026
0c247b7
feat: LiteX SoC Phase 4 PASS - JIT compile + simulate (100 cycles)
junjihashimoto Mar 26, 2026
6362c06
perf: enable IR optimization for SVParser path (+12% JIT speed)
junjihashimoto Mar 26, 2026
931be6a
perf: add safe constant folding patterns (mux(0), not(not))
junjihashimoto Mar 26, 2026
d173b6c
perf: fix Int const folding with toUnsigned — 35% C++ reduction
junjihashimoto Mar 26, 2026
3aaeccc
perf: function splitting for eval() — 38% faster, 59% smaller C++
junjihashimoto Mar 26, 2026
aa1038a
perf: per-chunk local wire promotion in function splitting
junjihashimoto Mar 26, 2026
43a6049
perf: type precision for _seq wires — inherit width from base variable
junjihashimoto Mar 26, 2026
0c6a21e
perf: type-precise constant literals (U vs ULL suffix)
junjihashimoto Mar 26, 2026
dd8fe0c
analysis: mask reduction hurts perf — GCC uses masks for optimization
junjihashimoto Mar 26, 2026
385f7ba
analysis: CSE attempted but reverted — GCC already does it better
junjihashimoto Mar 26, 2026
98dde47
analysis: trigger-based eval feasibility study
junjihashimoto Mar 26, 2026
e1cc1b5
feat: 2-partition IR splitting + threaded C++ codegen
junjihashimoto Mar 26, 2026
3435ebd
feat: partitioned JIT simulation — Phase 6 PASS
junjihashimoto Mar 26, 2026
38060f2
analysis: 2-thread per-cycle spinlock too slow for small peripheral
junjihashimoto Mar 26, 2026
e1998a0
analysis: MUX→if-else conversion — 4% slower (branch misprediction)
junjihashimoto Mar 27, 2026
11529ae
perf: peripheral-skip trigger-based eval — +21% (6.05M cyc/s)
junjihashimoto Mar 27, 2026
8652429
perf: PCPI guard + peripheral skip — 6.83M cyc/s (0.79x Verilator)
junjihashimoto Mar 27, 2026
1ed347d
perf: auto-detect PCPI guard pattern — general-purpose conditional skip
junjihashimoto Mar 27, 2026
d6a98f4
feat: JIT optimization Phase 52 + LiteX SoC support + Timer Oracle
junjihashimoto Mar 29, 2026
3eeab19
feat: multi-core LiteX SoC generator + scaling benchmark
junjihashimoto Mar 30, 2026
6c53bd8
ci: add LiteX SoC benchmark to GitHub Actions
junjihashimoto Mar 30, 2026
113b0bb
feat: sub-module instantiation support — hierarchical JIT codegen
junjihashimoto Mar 30, 2026
7c588ef
docs: add multi-core scaling analysis + fair comparison explanation
junjihashimoto Mar 30, 2026
7616030
fix: fair multi-core benchmark — all UART ports connected as real I/O
junjihashimoto Mar 30, 2026
2d3ace0
feat: shared bus multi-core benchmark — Sparkle 11.2x faster at 8-core
junjihashimoto Mar 30, 2026
67c17a7
docs: update benchmark with honest multi-core results
junjihashimoto Mar 30, 2026
9976e02
feat: multi-core parallel JIT runner — 3.87x speedup on 8 cores
junjihashimoto Mar 30, 2026
c5e6065
ci: add multi-core parallel benchmark to GitHub Actions
junjihashimoto Mar 30, 2026
154db23
feat: benchmark suite — bench.sh + preprocess + standalone C++ tools
junjihashimoto Mar 30, 2026
ac1eae5
docs: update all docs with final Phase 52 results
junjihashimoto Mar 30, 2026
9a1130e
feat: verified reverse synthesis, OracleReduction type class, sim!/\#…
junjihashimoto Apr 1, 2026
719961b
refactor: move production IPs from Examples/ to IP/
junjihashimoto Apr 1, 2026
2b09509
fix: restore lean-toolchain, slim verilog! macro, update .gitignore
junjihashimoto Apr 2, 2026
9d007c8
fix: inlineAssigns infinite loop in verilog! macro (10min → 1.4s)
junjihashimoto Apr 2, 2026
e4c182a
fix: rewrite Basic.lean using stdlib lemmas (eliminate omega on 2^n)
junjihashimoto Apr 2, 2026
995e77f
docs: add multi-domain parallel section to tutorial, sim_parallel! TODO
junjihashimoto Apr 2, 2026
4a4b807
chore: remove committed build artifacts from tracking
junjihashimoto Apr 2, 2026
30c6717
ci: fix Examples.RV32 → IP.RV32 in GitHub Actions
junjihashimoto Apr 2, 2026
9e80e8b
fix: add all registers as reachability DCE roots (fix wire elimination)
junjihashimoto Apr 2, 2026
8693c67
fix: protect register-input wires from single-use inlining
junjihashimoto Apr 4, 2026
91f0fde
fix: include register input refs in evalTick tickRefs
junjihashimoto Apr 5, 2026
9ef14b2
fix: filter identity assigns (x = ref x) in IR optimizer and CppSim
junjihashimoto Apr 7, 2026
edb77d2
docs: add KnownIssues.md documenting pcpi_mul FSM bug and other open …
junjihashimoto Apr 7, 2026
70f9b48
fix(CppSim): resolve Issue 1 — pcpi_mul standalone FSM freeze
junjihashimoto Apr 8, 2026
bdba703
fix(CppSim): resolve Issue 6 — disable unsound wrapConditionalGuards
junjihashimoto Apr 8, 2026
a249596
fix(Optimize): resolve Issue 7 — unsound AND-with-allones mask removal
junjihashimoto Apr 8, 2026
613bd14
fix(ci): verilator Makefile target Examples.RV32 → IP.RV32
junjihashimoto Apr 8, 2026
c8b54b1
fix: rename remaining Examples.RV32 references to IP.RV32
junjihashimoto Apr 8, 2026
53a9286
chore: gitignore verilator/generated_soc_cppsim.h build artifact
junjihashimoto Apr 8, 2026
cd9d63c
fix(CppSim): dedupe statement declarations by identifier
junjihashimoto Apr 8, 2026
304 changes: 302 additions & 2 deletions .github/workflows/build.yml
@@ -48,7 +48,7 @@ jobs:
run: lake env lean --run Examples/Sparkle16/Core.lean

- name: Build RV32 Signal DSL
- run: lake build Examples.RV32
+ run: lake build IP.RV32

- name: Build CDC Multi-Clock example
run: lake build Examples.CDC
@@ -94,7 +94,7 @@ jobs:
# ---- Synthesis Tests ----

- name: Build SoC Verilog (generates verilator/ files)
- run: lake build Examples.RV32.SoCVerilog
+ run: lake build IP.RV32.SoCVerilog

- name: Run Verilog generation tests
run: lake exe verilog-tests
@@ -200,6 +200,306 @@ jobs:
comment-on-alert: true
fail-on-alert: false

# ---- LiteX SoC Benchmark ----

- name: "Fetch PicoRV32 RTL for LiteX benchmark"
run: |
curl -sL https://raw.githubusercontent.com/YosysHQ/picorv32/main/picorv32.v -o /tmp/picorv32.v
echo "PicoRV32: $(wc -l < /tmp/picorv32.v) lines"

- name: "Build LiteX Verilator simulation"
run: |
python3 Tests/SVParser/fixtures/gen_litex_multicore.py 1
cat Tests/SVParser/fixtures/litex_1core.v /tmp/picorv32.v > /tmp/litex_verilator.v
cat > /tmp/tb_litex.cpp << 'TBEOF'
#include <cstdio>
#include <cstdint>
#include <chrono>
#include "Vsim_1core.h"
int main() {
auto* top = new Vsim_1core;
top->sys_clk = 0; top->eval();
auto t0 = std::chrono::high_resolution_clock::now();
for (uint64_t c = 0; c < 10000000; c++) {
top->sys_clk = 1; top->eval();
top->sys_clk = 0; top->eval();
}
auto t1 = std::chrono::high_resolution_clock::now();
double ms = std::chrono::duration<double,std::milli>(t1-t0).count();
printf("%d\n", (int)(10000000.0/ms*1000.0));
delete top; return 0;
}
TBEOF
verilator --cc --exe --build -j 0 \
-Wno-WIDTHEXPAND -Wno-WIDTHTRUNC -Wno-UNUSEDSIGNAL \
-Wno-CASEINCOMPLETE -Wno-UNOPTFLAT -Wno-LATCH -Wno-MULTIDRIVEN \
-Wno-COMBDLY -Wno-PINMISSING \
--top-module sim_1core -CFLAGS "-O2" \
/tmp/litex_verilator.v /tmp/tb_litex.cpp --Mdir /tmp/litex_obj

- name: "Build LiteX JIT"
run: |
python3 -c "
import re
with open('Tests/SVParser/fixtures/litex_sim_minimal.v') as f: litex = f.read()
pico = ''
try:
with open('/tmp/picorv32.v') as f: pico = f.read()
except: pass
c = litex + '\n' + pico
c = c.replace('@(*)', '@*')
lines = c.split('\n')
r = ['' if l.lstrip().startswith('integer ') else (l.split('begin : ')[0]+'begin' if 'begin : ' in l else l) for l in lines]
c = '\n'.join(r)
import re as _re
pat = r'for\s*\((\w+)\s*=\s*0\s*;\s*\1\s*<\s*4\s*;\s*\1\s*=\s*\1\s*\+\s*1\)\s*\n((?:\t\t|\s{8}).*(?:\n(?:\t\t\t|\s{12}).*)*)'
c = _re.sub(pat, lambda m: '\n'.join(m.group(2).replace(f'{m.group(1)}*8',f'{i*8}').replace(f'{m.group(1)}]',f'{i}]') for i in range(4)), c)
p2 = r'if \((\w+)\[0\]\)\s*\n\s*(\w+)\[(\w+)\]\[0 \+: 8\] <= (\w+)\[0 \+: 8\];'
for m in _re.finditer(p2, c):
we,arr,addr,data = m.group(1),m.group(2),m.group(3),m.group(4)
rep = f'\tif ({we}) {arr}[{addr}] <= {data};'
for i in range(4):
old = f'\tif ({we}[{i}])\n\t\t\t{arr}[{addr}][{i*8} +: 8] <= {data}[{i*8} +: 8];'
c = c.replace(old, rep if i==0 else '')
with open('/tmp/litex_pp.v','w') as f: f.write(c)
"
cat > /tmp/gen_litex_jit.lean << 'GENEOF'
import Tools.SVParser
import Sparkle.Backend.CppSim
open Tools.SVParser.Lower
def main : IO Unit := do
let src ← IO.FS.readFile "/tmp/litex_pp.v"
let design ← IO.ofExcept (parseAndLowerFlat src)
let jitCpp := Sparkle.Backend.CppSim.toCppSimJIT design
IO.FS.writeFile "/tmp/litex_jit.cpp" jitCpp
GENEOF
lake env lean --run /tmp/gen_litex_jit.lean
g++ -O2 -std=c++17 -shared -fPIC -o /tmp/litex_jit.so /tmp/litex_jit.cpp

- name: "Run LiteX benchmark (10M cycles)"
run: |
# Verilator
LITEX_VLTR=$(/tmp/litex_obj/Vsim_1core 2>/dev/null)
echo "LiteX Verilator: ${LITEX_VLTR} cyc/s"
# JIT
cat > /tmp/bench_litex.cpp << 'BEOF'
#include <cstdio>
#include <cstdint>
#include <chrono>
#include <dlfcn.h>
typedef void* (*fn0)(); typedef void (*fn1)(void*);
int main() {
void* lib = dlopen("/tmp/litex_jit.so", RTLD_LAZY);
auto create = (fn0)dlsym(lib,"jit_create");
auto destroy = (fn1)dlsym(lib,"jit_destroy");
auto reset = (fn1)dlsym(lib,"jit_reset");
auto et = (fn1)dlsym(lib,"jit_eval_tick");
void* ctx = create(); reset(ctx);
auto t0 = std::chrono::high_resolution_clock::now();
for (uint64_t c = 0; c < 10000000; c++) et(ctx);
auto t1 = std::chrono::high_resolution_clock::now();
double ms = std::chrono::duration<double,std::milli>(t1-t0).count();
printf("%d\n", (int)(10000000.0/ms*1000.0));
destroy(ctx); dlclose(lib); return 0;
}
BEOF
g++ -O2 -std=c++17 -o /tmp/bench_litex /tmp/bench_litex.cpp -ldl
LITEX_JIT=$(/tmp/bench_litex)
echo "LiteX JIT: ${LITEX_JIT} cyc/s"
# Write combined results
cat <<EOF > litex-bench-results.json
[
{ "name": "LiteX Verilator (10M cycles)", "unit": "cycles/sec", "value": ${LITEX_VLTR:-0} },
{ "name": "LiteX JIT evalTick (10M cycles)", "unit": "cycles/sec", "value": ${LITEX_JIT:-0} }
]
EOF
cat litex-bench-results.json

- name: Store LiteX benchmark result
uses: benchmark-action/github-action-benchmark@v1
if: github.event_name == 'push'
with:
name: LiteX PicoRV32 SoC Benchmark (Verilator vs JIT)
tool: customBiggerIsBetter
output-file-path: litex-bench-results.json
benchmark-data-dir-path: dev/litex-bench
github-token: ${{ secrets.GITHUB_TOKEN }}
auto-push: true
alert-threshold: "80%"
comment-on-alert: true
fail-on-alert: false
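
Review note on the "Build LiteX JIT" step above: the inline Python rewrites the LiteX Verilog before it reaches the SVParser. Below is a minimal, more readable sketch of the simpler rewrites, assuming the intent is only what the one-liners do: replace `@(*)` with `@*`, blank out `integer` declarations, strip `begin : label` names, and substitute the loop variable when a 4-iteration byte-lane loop is unrolled. The function names `normalize` and `unroll` are hypothetical and not part of this PR, and `unroll` covers only the body substitution, not the regex that locates the `for` loop.

```python
def normalize(src: str) -> str:
    """Line-level rewrites mirroring the workflow's inline script."""
    src = src.replace("@(*)", "@*")
    out = []
    for line in src.split("\n"):
        if line.lstrip().startswith("integer "):
            # Blank the declaration but keep the line so line numbers stay stable.
            out.append("")
        elif "begin : " in line:
            # Drop the block label, keep the 'begin' keyword.
            out.append(line.split("begin : ")[0] + "begin")
        else:
            out.append(line)
    return "\n".join(out)


def unroll(body: str, var: str, count: int) -> str:
    """Emit `count` copies of a loop body with the loop variable substituted.

    Only the two textual forms handled by the workflow script are replaced:
    `var*8` (byte offset) and `var]` (index).
    """
    copies = []
    for k in range(count):
        copies.append(body.replace(f"{var}*8", str(k * 8)).replace(f"{var}]", f"{k}]"))
    return "\n".join(copies)


if __name__ == "__main__":
    print(normalize("always @(*) begin : comb\n  integer i;\n  y = a;\nend"))
    print(unroll("if (we[i]) mem[adr][i*8 +: 8] <= dat[i*8 +: 8];", "i", 4))
```

Because the substitution is plain string replacement, it would also rewrite any identifier that happens to end in the loop variable followed by `]`, so it is only safe on the known shape of the generated fixture.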

# ---- Multi-Core Parallel Benchmark ----

- name: "Build multi-core JIT (hierarchical)"
run: |
# Generate 1-core and 8-core wrappers (Verilator-style hierarchy)
for n in 1 8; do
python3 Tests/SVParser/fixtures/gen_litex_multicore.py $n
done
# Generate hierarchical JIT
cat > /tmp/gen_hier_jit.lean << 'GENEOF'
import Tools.SVParser
import Sparkle.Backend.CppSim
open Tools.SVParser.Lower
def main : IO Unit := do
let src ← IO.FS.readFile "Tests/SVParser/fixtures/litex_1core.v"
let pico ← IO.FS.readFile "/tmp/picorv32.v"
let design ← IO.ofExcept (parseAndLowerHierarchical (src ++ "\n" ++ pico))
let jitCpp := Sparkle.Backend.CppSim.toCppSimJIT design
IO.FS.writeFile "/tmp/sparkle_hier.cpp" jitCpp
IO.println s!"JIT: {jitCpp.length} chars, {design.modules.length} modules"
GENEOF
lake env lean --run /tmp/gen_hier_jit.lean
g++ -O2 -std=c++17 -shared -fPIC -o /tmp/sparkle_hier.so /tmp/sparkle_hier.cpp

- name: "Build multi-core runner"
run: g++ -O2 -std=c++20 -shared -fPIC -o /tmp/multicore_runner.so c_src/cdc/multicore_runner.cpp -lpthread

- name: "Build Verilator 8-core"
run: |
cat Tests/SVParser/fixtures/litex_8core.v /tmp/picorv32.v > /tmp/litex_8core.v
cat > /tmp/tb_8core.cpp << 'TBEOF'
#include <cstdio>
#include <cstdint>
#include <chrono>
#include "Vsim_8core.h"
int main() {
auto* top = new Vsim_8core;
top->sys_clk = 0;
top->serial_sink_data_0 = 0; top->serial_sink_valid_0 = 0; top->serial_source_ready_0 = 1;
top->serial_sink_data_1 = 0; top->serial_sink_valid_1 = 0; top->serial_source_ready_1 = 1;
top->serial_sink_data_2 = 0; top->serial_sink_valid_2 = 0; top->serial_source_ready_2 = 1;
top->serial_sink_data_3 = 0; top->serial_sink_valid_3 = 0; top->serial_source_ready_3 = 1;
top->serial_sink_data_4 = 0; top->serial_sink_valid_4 = 0; top->serial_source_ready_4 = 1;
top->serial_sink_data_5 = 0; top->serial_sink_valid_5 = 0; top->serial_source_ready_5 = 1;
top->serial_sink_data_6 = 0; top->serial_sink_valid_6 = 0; top->serial_source_ready_6 = 1;
top->serial_sink_data_7 = 0; top->serial_sink_valid_7 = 0; top->serial_source_ready_7 = 1;
top->eval();
auto t0 = std::chrono::high_resolution_clock::now();
for (uint64_t c = 0; c < 10000000; c++) {
top->sys_clk = 1; top->eval();
top->sys_clk = 0; top->eval();
}
auto t1 = std::chrono::high_resolution_clock::now();
double ms = std::chrono::duration<double,std::milli>(t1-t0).count();
printf("%d\n", (int)(10000000.0/ms*1000.0));
delete top; return 0;
}
TBEOF
verilator --cc --exe --build -j 0 \
-Wno-WIDTHEXPAND -Wno-WIDTHTRUNC -Wno-UNUSEDSIGNAL \
-Wno-CASEINCOMPLETE -Wno-UNOPTFLAT -Wno-LATCH -Wno-MULTIDRIVEN \
-Wno-COMBDLY -Wno-PINMISSING -Wno-UNDRIVEN \
--top-module sim_8core -CFLAGS "-O2" \
/tmp/litex_8core.v /tmp/tb_8core.cpp --Mdir /tmp/litex_8core_obj

- name: "Run multi-core benchmark"
run: |
# 1-core single-thread baseline
cat > /tmp/bench_1core.cpp << 'BEOF'
#include <cstdio>
#include <cstdint>
#include <chrono>
#include <dlfcn.h>
typedef void* (*fn0)(); typedef void (*fn1)(void*);
int main() {
void* lib = dlopen("/tmp/sparkle_hier.so", RTLD_LAZY);
auto create = (fn0)dlsym(lib,"jit_create");
auto destroy = (fn1)dlsym(lib,"jit_destroy");
auto reset = (fn1)dlsym(lib,"jit_reset");
auto et = (fn1)dlsym(lib,"jit_eval_tick");
void* ctx = create(); reset(ctx);
auto t0 = std::chrono::high_resolution_clock::now();
for (uint64_t c = 0; c < 10000000; c++) et(ctx);
auto t1 = std::chrono::high_resolution_clock::now();
double ms = std::chrono::duration<double,std::milli>(t1-t0).count();
printf("%d\n", (int)(10000000.0/ms*1000.0));
destroy(ctx); dlclose(lib); return 0;
}
BEOF
g++ -O2 -std=c++17 -o /tmp/bench_1core /tmp/bench_1core.cpp -ldl
JIT_1CORE=$(/tmp/bench_1core)
echo "JIT 1-core: ${JIT_1CORE} cyc/s"

# 8-core sequential
cat > /tmp/bench_8seq.cpp << 'BEOF'
#include <cstdio>
#include <cstdint>
#include <chrono>
#include <dlfcn.h>
typedef void* (*fn0)(); typedef void (*fn1)(void*);
int main() {
void* lib = dlopen("/tmp/sparkle_hier.so", RTLD_LAZY);
auto create = (fn0)dlsym(lib,"jit_create");
auto destroy = (fn1)dlsym(lib,"jit_destroy");
auto reset = (fn1)dlsym(lib,"jit_reset");
auto et = (fn1)dlsym(lib,"jit_eval_tick");
void* cores[8];
for (int i = 0; i < 8; i++) { cores[i] = create(); reset(cores[i]); }
auto t0 = std::chrono::high_resolution_clock::now();
for (uint64_t c = 0; c < 10000000; c++)
for (int i = 0; i < 8; i++) et(cores[i]);
auto t1 = std::chrono::high_resolution_clock::now();
double ms = std::chrono::duration<double,std::milli>(t1-t0).count();
printf("%d\n", (int)(10000000.0/ms*1000.0));
for (int i = 0; i < 8; i++) destroy(cores[i]);
dlclose(lib); return 0;
}
BEOF
g++ -O2 -std=c++17 -o /tmp/bench_8seq /tmp/bench_8seq.cpp -ldl
JIT_8SEQ=$(/tmp/bench_8seq)
echo "JIT 8-core sequential: ${JIT_8SEQ} per-core cyc/s"

# 8-core parallel (batch=10000)
cat > /tmp/bench_8par.cpp << 'BEOF'
#include <cstdio>
#include <cstdint>
#include <dlfcn.h>
struct MulticoreResult { uint64_t total_cycles; double elapsed_ms; double mcycles_per_sec; int success; };
typedef MulticoreResult (*mc_fn)(void*, int, uint64_t, int);
int main() {
void* jit = dlopen("/tmp/sparkle_hier.so", RTLD_LAZY);
void* runner = dlopen("/tmp/multicore_runner.so", RTLD_LAZY);
auto mc_run = (mc_fn)dlsym(runner, "multicore_run");
auto r = mc_run(jit, 8, 10000000, 10000);
printf("%d\n", (int)(r.mcycles_per_sec * 1000000.0));
dlclose(runner); dlclose(jit); return 0;
}
BEOF
g++ -O2 -std=c++17 -o /tmp/bench_8par /tmp/bench_8par.cpp -ldl
JIT_8PAR=$(/tmp/bench_8par)
echo "JIT 8-core parallel: ${JIT_8PAR} per-core cyc/s"

# Verilator 8-core
VLTR_8CORE=$(/tmp/litex_8core_obj/Vsim_8core 2>/dev/null)
echo "Verilator 8-core: ${VLTR_8CORE} cyc/s"

# Write results
cat <<EOF > multicore-bench-results.json
[
{ "name": "JIT 1-core single-thread", "unit": "cycles/sec", "value": ${JIT_1CORE:-0} },
{ "name": "JIT 8-core sequential", "unit": "cycles/sec", "value": ${JIT_8SEQ:-0} },
{ "name": "JIT 8-core parallel (batch=10K)", "unit": "cycles/sec", "value": ${JIT_8PAR:-0} },
{ "name": "Verilator 8-core", "unit": "cycles/sec", "value": ${VLTR_8CORE:-0} }
]
EOF
cat multicore-bench-results.json

- name: Store multi-core benchmark result
uses: benchmark-action/github-action-benchmark@v1
if: github.event_name == 'push'
with:
name: Multi-Core Benchmark (8-core LiteX PicoRV32)
tool: customBiggerIsBetter
output-file-path: multicore-bench-results.json
benchmark-data-dir-path: dev/multicore-bench
github-token: ${{ secrets.GITHUB_TOKEN }}
auto-push: true
alert-threshold: "80%"
comment-on-alert: true
fail-on-alert: false
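
Review note on the benchmark harnesses above: each one prints `10000000.0/ms*1000.0`, that is, simulated cycles per wall-clock second truncated to an integer. In the 8-core sequential harness one loop iteration ticks all eight contexts, so the printed figure is a per-core rate, as the `echo` label says. A small sketch of that arithmetic follows; the helper names are hypothetical and the timings are made up for illustration.

```python
def cycles_per_sec(cycles: int, elapsed_ms: float) -> int:
    """Same arithmetic as the C++ harnesses: cycles / ms * 1000, truncated."""
    return int(cycles / elapsed_ms * 1000.0)


def aggregate_rate(per_core_rate: int, cores: int) -> int:
    """Total core-cycles per second when every core advances in lockstep."""
    return per_core_rate * cores


if __name__ == "__main__":
    # Made-up timing: 10M loop iterations finishing in 2000 ms.
    per_core = cycles_per_sec(10_000_000, 2000.0)
    print(per_core, aggregate_rate(per_core, 8))
```

Keeping the two figures separate matters when comparing against the single-process Verilator 8-core number, which counts one `sys_clk` cycle for the whole design.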

# ---- Unit Tests ----

- name: Run full test suite (481 tests)
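
Review note on the two results files written by the workflow: both are JSON arrays of `name` / `unit` / `value` objects assembled with a shell heredoc, where `${VAR:-0}` falls back to 0 when the variable is unset or empty. The sketch below builds the same structure with a JSON library, so that a non-numeric benchmark output cannot produce an invalid file. `bench_entries` is a hypothetical helper, not something in this PR, and the values in the usage lines are placeholders.

```python
import json


def bench_entries(results: dict) -> str:
    """Render {name: raw_stdout} as the array-of-objects layout used above."""
    entries = []
    for name, raw in results.items():
        text = raw.strip()
        # Mirror the shell fallback: anything that is not a plain integer becomes 0.
        value = int(text) if text.isdecimal() else 0
        entries.append({"name": name, "unit": "cycles/sec", "value": value})
    return json.dumps(entries, indent=2)


if __name__ == "__main__":
    # Placeholder values, not measured results.
    print(bench_entries({"JIT 1-core single-thread": "1234567\n", "Verilator 8-core": ""}))
```

Unlike the heredoc, this also maps garbage output (for example an error message on stdout) to 0 rather than embedding it verbatim in the file.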
32 changes: 31 additions & 1 deletion .gitignore
@@ -41,8 +41,38 @@ Thumbs.db
*.bak
*.log

# ELF binaries and compiled objects
c_src/cdc/cdc_example
c_src/cdc/cdc_test
c_src/cdc/*.so

# Verilator build artifacts
verilator/obj_dir/
verilator/obj_dir_notrace/
verilator/generated_soc_jit.cpp
verilator/generated_soc_cppsim.h
verilator/verilator_bench

# Python virtual environment
venv/

# Planning documents
plans/
Sparkle/Verification/Generated/
Examples/CDC/gen/
verilator/generated_soc_jit.cpp

# Generated firmware (rebuild from source)
firmware/*.bin
firmware/*.elf
firmware/*.map
firmware/firmware_litex_*.hex
firmware/firmware_multest.hex
firmware/firmware_rv32i.hex
firmware/firmware_rv32im.hex
firmware/firmware_storeload.hex

# Generated Verilog fixtures (regenerate with gen_litex_multicore.py)
Tests/SVParser/fixtures/*_flat.v

# Nix shell artifacts
result