forked from OpenXiangShan/XiangShan
Open
Infra
- Verilator multi-threaded simulation
- Trace based Difftest
- ISA simulator #9
- Online Difftest
Functionality
Async GEMM
- XS pipeline modification (decoding, scheduling, and commit)
- RhyMAX modification (mrelease support)
- Monitor PUT queue in HBL2
- Test with attention
- Test with quant/dequant
Testcases
- M=N=64, K=256
- M=N=128, K=256
- Case with epilogue
| M | K | N | Status |
|---|---|---|---|
| 512+32 | 512 | 512 | Testcase not ready |
| 512 | 512+32 | 512 | Testcase not ready |
| 512 | 512 | 512+32 | Testcase not ready |
| 512+32 | 512+32 | 512 | Testcase not ready |
| 512 | 512+32 | 512+32 | Testcase not ready |
| 512+32 | 512 | 512+32 | Testcase not ready |
| 512+32 | 512+32 | 512+32 | Testcase not ready |
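
The rows above appear to cover every combination in which at least one of M, K, N is padded from 512 to 512+32. A minimal sketch of how such a list could be generated is shown below; `BASE`, `PAD`, and `unaligned_shapes` are hypothetical names chosen for illustration and are not taken from the repository.

```python
from itertools import product

BASE = 512  # aligned dimension used in the table (assumed fixed for all three axes)
PAD = 32    # extra rows/columns that make a dimension unaligned


def unaligned_shapes(base=BASE, pad=PAD):
    """Return every (M, K, N) in which at least one dimension is base + pad."""
    shapes = []
    for pad_m, pad_k, pad_n in product((0, pad), repeat=3):
        # Skip the fully aligned case; the table lists only padded shapes.
        if pad_m or pad_k or pad_n:
            shapes.append((base + pad_m, base + pad_k, base + pad_n))
    return shapes


if __name__ == "__main__":
    for m, k, n in unaligned_shapes():
        print(f"M={m} K={k} N={n}")
```

Generating the shapes this way keeps the list exhaustive if the base size or the padding changes later.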
Low precision
- int8 DPA
- mxfp8 DPA
- nvfp4 DPA
- basic tile register
- tile register with e8m0/ue4m3 scaling factor
- datapath for basic tile A/B load
- datapath for tile A/B load with scaling factor
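
For reference when writing tests for the e8m0 scaling factor, the sketch below decodes an e8m0 byte the way the OCP Microscaling (MX) format defines it: an exponent-only encoding with bias 127, where `0xFF` is reserved for NaN. The function name `decode_e8m0` is hypothetical, and the sketch says nothing about how the tile register or the datapath actually stores or applies the scale.

```python
import math


def decode_e8m0(byte):
    """Decode one e8m0 scale byte to a Python float.

    Assumes the OCP MX definition: value = 2 ** (byte - 127), 0xFF = NaN.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("e8m0 scale must fit in one byte")
    if byte == 0xFF:
        return math.nan
    # ldexp(1.0, e) computes 1.0 * 2**e exactly for this exponent range.
    return math.ldexp(1.0, byte - 127)


def apply_block_scale(elements, scale_byte):
    """Multiply already-decoded block elements by their shared e8m0 scale."""
    scale = decode_e8m0(scale_byte)
    return [x * scale for x in elements]
```

A golden model along these lines could be compared against the hardware result for the scaled tile load; ue4m3 scales would need a separate decoder.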
GEMM Performance test
Compute bound cases
- M=K=N=512/1024/2048/4096/8192 #12
- K=512,M=N=512/1024/2048/4096/8192
Memory bound cases
- M=N=64/128/256, K=4096
- M=64, K=2048, N=8192
- M=64, K=16x1024, N=7x1024
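
The split between compute-bound and memory-bound cases can be estimated from arithmetic intensity (FLOPs per byte moved). The sketch below is a rough model only: it assumes A, B, and the output are each transferred exactly once, and the default element sizes (1-byte inputs, 4-byte outputs) are assumptions for illustration, not figures from this issue.

```python
def arithmetic_intensity(m, k, n, in_bytes=1, out_bytes=4):
    """Estimate FLOPs per byte for C[m, n] = A[m, k] @ B[k, n].

    Assumes each matrix crosses the memory interface once (no re-fetch).
    in_bytes / out_bytes are hypothetical element sizes.
    """
    flops = 2 * m * n * k  # one multiply and one add per MAC
    traffic = in_bytes * (m * k + k * n) + out_bytes * m * n
    return flops / traffic


if __name__ == "__main__":
    cases = {
        "M=K=N=4096": (4096, 4096, 4096),
        "M=N=64, K=4096": (64, 4096, 64),
        "M=64, K=2048, N=8192": (64, 2048, 8192),
    }
    for name, (m, k, n) in cases.items():
        print(f"{name}: {arithmetic_intensity(m, k, n):.1f} FLOPs/byte")
```

Under these assumptions the square cases have far higher intensity than the skinny ones, which matches the grouping above; the actual crossover depends on the machine's peak throughput and bandwidth.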
HBL2
- Create dedicated repo for HBL2
- Support back pressure with grant buffer: testing
- Support native Put via A-Channel #13
- Solve the routing problem on the 512B shifter: not started
- Support CHI
- Integrate HBL2 with Zhujiang
- Improve L2-L3 bandwidth
CUTE