A RISC-V CPU simulator for computer architecture education, based on the RV64-IM ISA. It supports functional interpretation, an in-order pipeline, out-of-order execution (Tomasulo + ROB), branch prediction, multi-core simulation, and a two-level cache hierarchy, which makes it suitable for architecture lab experiments and performance analysis.
Compatible with NJU AbstractMachine (AM) (ARCH=riscv64-nemu). Tested with Super Mario Bros. via LiteNES.
- RV64-IM: Base integer ISA + multiply/divide extension
- Zicsr: CSR instructions (`mhartid`, `mcycle`, etc.); supports `rdcycle` for benchmarking
- System instructions: FENCE, FENCE.I, ECALL, MRET, EBREAK
| Mode | Description | Flag |
|---|---|---|
| Functional (default) | Interpreted execution; correctness baseline | (default) |
| In-order pipeline | 5-stage IF/ID/EX/MEM/WB with hazard handling | --inorder |
| Out-of-order (Tomasulo) | ROB + reservation stations + physical register file + RAT/RRAT | --ooo |
- Branch Target Buffer (BTB)
- Tournament predictor: local history (LHT/LPHT) + global history (GHR/GPHT) + meta selector
- Two-level cache: private L1I/L1D per core + shared L2
- Configurable sets (`sbits`) and ways (`w`); fixed 64-byte cache lines
- LRU replacement, write-back + write-allocate
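
To make the cache parameters concrete, the sketch below splits an address into tag, set index, and line offset for a given `sbits`, and models one cache level with LRU replacement and a write-back, write-allocate policy. The function and class names are hypothetical, and the model tracks only tags and dirty bits (no data); it is not taken from the simulator source.

```python
from collections import OrderedDict

LINE_BITS = 6  # 64-byte cache lines


def split_address(addr, sbits):
    """Return (tag, set index, byte offset) for a cache with 2**sbits sets."""
    offset = addr & ((1 << LINE_BITS) - 1)
    index = (addr >> LINE_BITS) & ((1 << sbits) - 1)
    tag = addr >> (LINE_BITS + sbits)
    return tag, index, offset


class Cache:
    """Set-associative cache model: LRU, write-back, write-allocate."""

    def __init__(self, sbits, ways):
        self.sbits = sbits
        self.ways = ways
        # One OrderedDict per set, mapping tag -> dirty flag, oldest first.
        self.sets = [OrderedDict() for _ in range(1 << sbits)]
        self.hits = self.misses = self.writebacks = 0

    def access(self, addr, is_write=False):
        tag, index, _ = split_address(addr, self.sbits)
        lines = self.sets[index]
        hit = tag in lines
        if hit:
            self.hits += 1
            lines.move_to_end(tag)  # mark as most recently used
        else:
            self.misses += 1
            if len(lines) == self.ways:
                _, dirty = lines.popitem(last=False)  # evict the LRU line
                self.writebacks += int(dirty)         # dirty victims go to the next level
            lines[tag] = False  # write-allocate: fill on both read and write misses
        if is_write:
            lines[tag] = True   # write-back: only mark dirty, no immediate write
        return hit


if __name__ == "__main__":
    l1d = Cache(sbits=6, ways=4)  # hypothetical geometry: 64 sets x 4 ways x 64 B = 16 KiB
    for addr in range(0, 64 * 1024, 8):
        l1d.access(addr)
    print(l1d.hits, l1d.misses, l1d.writebacks)
```
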
- Dual-hart simulation with independent L1I/L1D, pipeline/OOO state, and branch predictor per core
- Write-invalidate cache coherence protocol
- FENCE instruction enforces memory ordering across OOO cores
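
The idea behind write-invalidate coherence can be illustrated with a minimal model. The sketch assumes an MSI-style state per line (`"M"` modified, `"S"` shared, absent means invalid) and treats each line as a single value; the README does not specify which states the simulator uses, so the class and its state encoding are assumptions.

```python
class CoherentCaches:
    """Toy write-invalidate model: private per-core caches over one shared memory."""

    def __init__(self, num_cores=2):
        self.memory = {}                                  # line address -> value
        self.caches = [dict() for _ in range(num_cores)]  # line -> [state, value]

    def _others(self, core):
        return [c for i, c in enumerate(self.caches) if i != core]

    def read(self, core, line):
        own = self.caches[core]
        if line not in own:
            # A modified copy elsewhere is written back and downgraded to shared.
            for other in self._others(core):
                entry = other.get(line)
                if entry and entry[0] == "M":
                    self.memory[line] = entry[1]
                    entry[0] = "S"
            own[line] = ["S", self.memory.get(line, 0)]
        return own[line][1]

    def write(self, core, line, value):
        # Invalidate every other copy before this core becomes the sole owner.
        for other in self._others(core):
            entry = other.pop(line, None)
            if entry and entry[0] == "M":
                self.memory[line] = entry[1]
        self.caches[core][line] = ["M", value]


if __name__ == "__main__":
    system = CoherentCaches()
    system.write(0, 0x40, 7)      # core 0 owns the line in state M
    print(system.read(1, 0x40))   # core 1 reads the value core 0 wrote
    system.write(1, 0x40, 9)      # core 0's copy is invalidated
    print(0x40 in system.caches[0])
```
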
- Serial (UART output for bare-metal programs)
- Flash, RTC, VGA, Keyboard

Linux (Debian/Ubuntu):

    sudo apt install gcc g++ make libsdl2-dev libreadline-dev llvm-11-dev

macOS (Homebrew):

    brew install llvm sdl2 readline riscv-gnu-toolchain python3
    pip3 install kconfiglib

macOS notes:
- The build system auto-detects macOS (`uname -s`) and switches to `clang`/`clang++` with Homebrew paths.
- LLVM 20+ is supported (compatible with LLVM ≥ 11).
- `kconfiglib` replaces Linux-specific `mconf`/`conf` for `make menuconfig`.

Build with:

    make

The simulator binary is produced at `build/sustemu`.

Configure and test:

    make menuconfig   # interactive Kconfig menu; writes include/generated/autoconf.h
    make test         # builds test/kernel.bin and runs it; exit 0 = pass

Run an image:

    ./build/sustemu <image.bin>                        # functional mode
    ./build/sustemu --inorder <image.bin>              # in-order pipeline
    ./build/sustemu --ooo --bpred <image.bin>          # OOO + branch predictor
    ./build/sustemu --ooo --bpred --dual <image.bin>   # dual-core OOO
    ./build/sustemu -b -e <image.elf> <image.bin>      # with ELF symbol info

Run the benchmarks:

    make bench             # functional / in-order / in-order+bpred / OOO+bpred on kernel
    make bench-dhrystone   # Dhrystone 2.1 across all modes
    make bench-dual        # dual-core OOO + bpred on independent Dhrystone workloads

SUSTemu can run a full NES emulator (LiteNES) as a RISC-V workload, which provides a realistic stress test of the pipeline and branch predictor.
The RISC-V cross-compiler and SDL2 must be installed (see Installation).
A pre-built binary is included at litenes/build/; rebuild with:

    AM_HOME=$(pwd)/am make -C litenes ARCH=riscv64-nemu

| Target | Mode | Command |
|---|---|---|
| OOO + branch predictor (default) | `--ooo --bpred` | `make run-mario` |
| In-order pipeline + bpred | `--inorder --bpred` | `make run-mario-inorder` |
| Functional interpreter | (default) | `make run-mario-functional` |
| OOO + difftest | `--ooo --bpred --difftest` | `make run-mario-difftest` |

    make run-mario

On Linux, the emulator is pinned to core 0 via `taskset -c 0` for consistent timing. On macOS, SDL2 renders the NES display in a native window; ensure a display is available.
All lab assignments live under `labs/`. Each lab contains numbered questions (Q1, Q2, …) with a Makefile providing `run`, `run-inorder`, `run-ooo`, etc. targets. Build the simulator first (`make` in the repo root) before running any lab.
Explores instruction-level parallelism, Tomasulo scheduling, and memory-latency hiding under OOO execution.
| Question | Topic |
|---|---|
| Q1 (`ilp/`) | ILP: in-order vs OOO throughput on independent instruction chains |
| Q2 (`tomasulo/`) | Tomasulo scheduling: RAW dependence chains, ROB commit order |
| Q3 (`memlat/`) | Memory-latency tolerance: pointer-chasing under in-order vs OOO |

    cd labs/ooo_lab/Q1/ilp && make run-inorder   # in-order baseline
    cd labs/ooo_lab/Q1/ilp && make run-ooo       # OOO execution

Measures branch misprediction costs and evaluates predictor designs across synthetic and real-world branch patterns.
| Question | Topic |
|---|---|
| Q1 | Misprediction penalty: OOO without vs with branch predictor |
| Q2 | Loop unrolling and its effect on branch frequency |
| Q3 | In-order vs OOO misprediction recovery cost |
| Q4 | Tournament predictor vs confidence-fusion predictor |
| Q5 | Spectre-style speculative execution side-channel |
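
When interpreting the cycle counts from these questions, a first-order model of misprediction cost can help. The sketch below uses the standard textbook approximation, where the extra cycles per instruction are the branch fraction times the misprediction rate times the flush penalty. The function name and all numbers are hypothetical and are not measurements from the simulator.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """First-order CPI estimate: base CPI plus the average misprediction stall."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 10-cycle flush penalty.
    for rate in (0.50, 0.10, 0.02):
        cpi = effective_cpi(1.0, 0.20, rate, 10)
        print(f"mispredict rate {rate:.2f} -> CPI {cpi:.2f}")
```
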

    cd labs/predictor_lab/Q1 && make run-bpred

Demonstrates relaxed memory ordering on the OOO dual-core simulator and the role of FENCE instructions.
| Question | Topic |
|---|---|
| Q1 | Litmus test (store-load reordering) with and without FENCE |
| Q2 | Shared counter: race conditions under relaxed ordering |
| Q3 | Peterson's mutual-exclusion algorithm: correctness requires FENCE |
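
The store-load litmus test in Q1 can be reasoned about with a small enumeration. The sketch below is an abstract model, not the lab's code: it assumes the classic store-buffering shape (thread 0 runs `x = 1; r0 = y`, thread 1 runs `y = 1; r1 = x`), lets each thread's store and load complete in either order when no fence is present, and enforces program order between them when a fence is present.

```python
from itertools import permutations

# (thread, kind, variable) in program order for each thread.
OPS = [(0, "store", "x"), (0, "load", "y"), (1, "store", "y"), (1, "load", "x")]


def outcomes(fenced):
    """Return the set of (r0, r1) results reachable under the chosen ordering rule."""
    seen = set()
    for order in permutations(range(len(OPS))):
        pos = {op: t for t, op in enumerate(order)}
        # With a fence, each thread's store must complete before its load.
        if fenced and not (pos[0] < pos[1] and pos[2] < pos[3]):
            continue
        mem = {"x": 0, "y": 0}
        regs = [None, None]
        for op in order:
            thread, kind, var = OPS[op]
            if kind == "store":
                mem[var] = 1
            else:
                regs[thread] = mem[var]
        seen.add(tuple(regs))
    return seen


if __name__ == "__main__":
    print("no fence:", sorted(outcomes(fenced=False)))
    print("fence:   ", sorted(outcomes(fenced=True)))
```

In this model the result `(0, 0)` is reachable only without the fence, which is the reordering the litmus test is designed to expose.
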

    cd labs/order_lab/Q1 && make run         # no fence (may observe reordering)
    cd labs/order_lab/Q1 && make run-fence   # with FENCE (sequentially consistent)

Measures cache performance effects on memory-bound workloads.
| Question | Topic |
|---|---|
| Q1 (`stream/`) | Memory bandwidth: sequential streaming access |
| Q2 | Matrix multiplication: cache-friendly vs naive layout |
| Q3 | Sequential scan: working-set size vs L1/L2 capacity |
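
The effect of access order on miss counts can be previewed with a simplified model before running the lab. The sketch below is not the lab's matrix-multiplication kernel: it only walks a row-major matrix in row order and in column order through a small fully associative LRU cache with 64-byte lines. The cache size, matrix size, and function name are assumptions for illustration.

```python
from collections import OrderedDict


def count_misses(addresses, num_lines=32, line_bytes=64):
    """Count misses for an address trace in a fully associative LRU cache."""
    cache = OrderedDict()  # line number -> True, oldest first
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in cache:
            cache.move_to_end(line)  # refresh LRU position on a hit
        else:
            misses += 1
            if len(cache) == num_lines:
                cache.popitem(last=False)  # evict the least recently used line
            cache[line] = True
    return misses


N, ELEM = 64, 8  # hypothetical 64x64 matrix of 8-byte elements, stored row-major

# Same elements, two traversal orders.
row_order = [(i * N + j) * ELEM for i in range(N) for j in range(N)]
col_order = [(i * N + j) * ELEM for j in range(N) for i in range(N)]

if __name__ == "__main__":
    print("row-order misses:   ", count_misses(row_order))
    print("column-order misses:", count_misses(col_order))
```

Row order touches each 64-byte line eight times in a row, while column order touches 64 different lines before returning to the first, which exceeds the 32-line capacity assumed here.
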

    cd labs/cache_lab/Q2 && make run

Licensed under the Mulan Permissive Software License, Version 2 (Mulan PSL v2).
