XSAI Roadmap 2025H2 #4

@shinezyy

Infra

  • Verilator multi-threaded simulation
  • Trace-based Difftest
  • ISA simulator #9
  • Online Difftest

Functionality

Async GEMM

  • XS pipeline modification (decoding, scheduling, and commit)
  • RhyMAX modification (mrelease support)
  • Monitor PUT queue in HBL2
  • Test with attention
  • Test with quant/dequant

Testcases

  • M=N=64, K=256
  • M=N=128, K=256
  • Case with epilogue
| M      | K      | N      | Status             |
|--------|--------|--------|--------------------|
| 512+32 | 512    | 512    | Testcase not ready |
| 512    | 512+32 | 512    | Testcase not ready |
| 512    | 512    | 512+32 | Testcase not ready |
| 512+32 | 512+32 | 512    | Testcase not ready |
| 512    | 512+32 | 512+32 | Testcase not ready |
| 512+32 | 512    | 512+32 | Testcase not ready |
| 512+32 | 512+32 | 512+32 | Testcase not ready |
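
The seven unaligned shapes listed above are every combination in which at least one of M, K, N carries a 32-element tail on top of 512. A minimal sketch of how such a list could be enumerated for a test harness; the function name and the `BASE`/`TAIL` constants are illustrative, not part of any existing XSAI script:

```python
from itertools import product

BASE = 512  # aligned dimension used in the table
TAIL = 32   # extra elements that break the alignment

def unaligned_shapes(base=BASE, tail=TAIL):
    """Yield (M, K, N) for every combination where at least one dim has a tail."""
    for flags in product((0, 1), repeat=3):
        if any(flags):  # skip the fully aligned 512x512x512 case
            yield tuple(base + tail * f for f in flags)

shapes = list(unaligned_shapes())
for m, k, n in shapes:
    print(f"M={m} K={k} N={n}")
```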

Low precision

  • int8 DPA
  • mxfp8 DPA
  • nvfp4 DPA
  • basic tile register
  • tile register with e8m0/ue4m3 scaling factor
  • datapath for basic tile A/B load
  • datapath for tile A/B load with scaling factor
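
For reference when writing golden models for the scaled DPA items above, here is a small sketch of the two scale encodings. It assumes the common definitions: e8m0 is a biased power-of-two exponent (bias 127, 0xFF reserved for NaN), and ue4m3 is E4M3 with the sign bit ignored (bias 7, all-ones reserved for NaN). How RhyMAX actually applies the scale is not stated in this roadmap; `scaled_dot` assumes one scale per operand block, multiplied in after accumulation, and takes already-decoded element values.

```python
import math

def decode_e8m0(byte):
    """Decode an 8-bit exponent-only scale: 2**(byte - 127); 0xFF is NaN."""
    if byte == 0xFF:
        return math.nan
    return 2.0 ** (byte - 127)

def decode_ue4m3(byte):
    """Decode an unsigned E4M3 scale (assumption: bit 7 is ignored)."""
    bits = byte & 0x7F
    if bits == 0x7F:          # all-ones exponent and mantissa: NaN
        return math.nan
    exp, man = bits >> 3, bits & 0x7
    if exp == 0:              # subnormal: no implicit leading one
        return (man / 8.0) * 2.0 ** -6
    return (1.0 + man / 8.0) * 2.0 ** (exp - 7)

def scaled_dot(a, b, scale_a, scale_b):
    """Dot product of two blocks, with per-block scales applied afterwards."""
    acc = sum(x * y for x, y in zip(a, b))
    return acc * scale_a * scale_b
```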

GEMM Performance test

Compute bound cases

Memory bound cases

  • M=N=64/128/256, K=4096
  • M=64, K=2048, N=8192
  • M=64, K=16x1024, N=7x1024
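
A quick way to sanity-check these shapes is to estimate arithmetic intensity (FLOPs per byte moved). The sketch below assumes each operand is read once and the result written once, with 1-byte inputs and 4-byte outputs; these sizes are placeholders, not XSAI parameters. Whether a shape is actually memory bound depends on the machine's compute-to-bandwidth ratio, which this roadmap does not give.

```python
def gemm_intensity(m, k, n, bytes_per_elem=1, out_bytes=4):
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n], ideal single-pass traffic."""
    flops = 2 * m * k * n                                   # one multiply + one add per MAC
    traffic = (m * k + k * n) * bytes_per_elem + m * n * out_bytes
    return flops / traffic

cases = [
    (64, 4096, 64),
    (128, 4096, 128),
    (256, 4096, 256),
    (64, 2048, 8192),
    (64, 16 * 1024, 7 * 1024),
]
for m, k, n in cases:
    print(f"M={m} K={k} N={n}: {gemm_intensity(m, k, n):.1f} FLOP/byte")
```

With 1-byte inputs the intensity stays below 2·min(M, N), so the M=64 cases cannot exceed 128 FLOP/byte under these assumptions.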

HBL2

  • Create dedicated repo for HBL2
  • Support back pressure with grant buffer: testing
  • Support native Put via A-Channel #13
  • Solve routing problem on shifter of 512B: not started
  • Support CHI
  • Integrate HBL2 with Zhujiang
  • Improve L2-L3 bandwidth

CUTE
