
Commit 752894c
add quamba2
1 parent 31570cf

18 files changed: +166 −6 lines

_bibliography/papers.bib
Lines changed: 16 additions & 0 deletions

@@ -1,6 +1,22 @@
 ---
 ---

+@article{chiang2025quamba2,
+  title={Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models},
+  author={Chiang, Hung-Yueh and Chang, Chi-Chih and Frumkin, Natalia and Wu, Kai-Chiang and Abdelfattah, Mohamed S. and Marculescu, Diana},
+  journal={arXiv preprint arXiv:2503.22879},
+  venue_type={arXiv},
+  year={2025},
+  url={https://arxiv.org/abs/2503.22879},
+  website={https://hychiang.info/projects/quamba2/},
+  code={https://github.com/enyac-group/Quamba},
+  models={https://huggingface.co/ut-enyac},
+  pdf={https://arxiv.org/pdf/2503.22879},
+  preview={quamba2.png},
+  bibtex_show={true},
+  selected={true}
+}
+
 @inproceedings{chiang2025quamba,
   title={Quamba: A Post-Training Quantization Recipe for Selective State Space Models},
   author={Chiang*, Hung-Yueh and Chang*, Chi-Chih and Frumkin, Natalia and Wu, Kai-Chiang and Marculescu, Diana},

_data/coauthors.yml
Lines changed: 3 additions & 0 deletions

@@ -1,6 +1,9 @@
 "Marculescu":
   - firstname: ["Diana"]
     url: http://users.ece.utexas.edu/~dianam/
+"Abdelfattah":
+  - firstname: ["Mohamed S.", "Mohamed"]
+    url: https://www.mohsaied.com/
 "Yang":
   - firstname: ["Yuedong"]
     url: https://radum.ece.utexas.edu/lab-member/yuedong-yang/
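
The coauthors map keys a last name to its accepted first-name spellings and a homepage, which the site's templates can use to turn author strings into links. A minimal sketch of that lookup in Python (the `link_for` helper is hypothetical, for illustration only; it is not part of the site's actual Liquid templates):

```python
# Hypothetical lookup mirroring how a template could use _data/coauthors.yml
# to link author names on publication pages.
coauthors = {
    "Abdelfattah": {"firstname": ["Mohamed S.", "Mohamed"],
                    "url": "https://www.mohsaied.com/"},
    "Marculescu": {"firstname": ["Diana"],
                   "url": "http://users.ece.utexas.edu/~dianam/"},
}

def link_for(author):
    """Map a 'Last, First' author string to a homepage, if known."""
    last, _, first = (p.strip() for p in author.partition(","))
    entry = coauthors.get(last)
    if entry and first in entry["firstname"]:
        return entry["url"]
    return None
```

Listing several first-name variants (e.g. "Mohamed S." and "Mohamed") lets the same entry match however the name appears in a given bib record.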
Lines changed: 7 additions & 0 deletions

@@ -0,0 +1,7 @@
+---
+layout: post
+date: 2025-03-31
+inline: true
+---
+:page_with_curl: **<span style="color:red">Paper Released</span>** <br/>
+Our newest paper, *Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models*, is now available on [arXiv](https://arxiv.org/abs/2503.22879).

_projects/quamba.md
Lines changed: 15 additions & 6 deletions

@@ -5,7 +5,7 @@
 authors: Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin, Kai-Chiang Wu, Diana Marculescu
 description: "Quamba: A Post-Training Quantization Recipe for Mamba"
 img: assets/img/publication_preview/quamba_blog.jpg
-importance: 1
+importance: 8
 category: research
 ---

@@ -15,13 +15,22 @@
   <abbr class="badge" style="background-color:#00369f; margin-left:0.1rem; margin-right:0.1rem; font-size:1.1rem;">ICLR 2025</abbr>
 </div>

-<div class="authors"> <a href="https://hychiang.info">Hung-Yueh Chiang</a><sup><sub>*</sub>1</sup>, <a href="https://ccchang.info/">Chi-Chih Chang</a><sup><sub>*</sub>2</sup>, <a href="https://www.nfrumkin.com/">Natalia Frumkin</a><sup>1</sup>, <a href="https://people.cs.nycu.edu.tw/~kcw/">Kai-Chiang Wu</a><sup>2</sup>, <a href="https://users.ece.utexas.edu/~dianam/">Diana Marculescu</a><sup>1</sup></div>
-<div class="authors"> <sup>1</sup> The University of Texas at Austin, <sup>2</sup>National Yang Ming Chiao Tung University</div>
+<div class="authors">
+  <a href="https://hychiang.info">Hung-Yueh Chiang</a><sup><sub>*</sub>1</sup>,
+  <a href="https://ccchang.info/">Chi-Chih Chang</a><sup><sub>*</sub>2</sup>,
+  <a href="https://www.nfrumkin.com/">Natalia Frumkin</a><sup>1</sup>,
+  <a href="https://people.cs.nycu.edu.tw/~kcw/">Kai-Chiang Wu</a><sup>2</sup>,
+  <a href="https://users.ece.utexas.edu/~dianam/">Diana Marculescu</a><sup>1</sup>
+</div>
+<div class="authors">
+  <sup>1</sup>The University of Texas at Austin,
+  <sup>2</sup>National Yang Ming Chiao Tung University
+</div>
 <div style="text-align: center; font-family: Times;"> <sup>*</sup> Equal contribution</div>
+
 <div style="text-align: center; margin-top:12px;">
-  <a href="https://arxiv.org/abs/2410.13229"><i class="fa fa-file-pdf-o" style="font-size:24px;color"></i><b> Paper</b></a>
-  &nbsp;
-  <a href="https://github.com/enyac-group/Quamba"><i class="fa fa-github" style="font-size:24px;color"></i><b> Code</b></a>
+  <a href="https://arxiv.org/abs/2410.13229"><i class="fa fa-file-pdf-o" style="font-size:24px;color"></i><b> Paper</b></a>&nbsp;
+  <a href="https://github.com/enyac-group/Quamba"><i class="fa fa-github" style="font-size:24px;color"></i><b> Code</b></a>&nbsp;
   <a href="https://huggingface.co/ut-enyac"><span style="font-size: 22px;">&#129303;</span><b> Models</b></a>
 </div>

_projects/quamba2.md
Lines changed: 125 additions & 0 deletions

@@ -0,0 +1,125 @@
+---
+layout: page
+title: Quamba2
+full_title: "Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models"
+authors: Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin, Kai-Chiang Wu, Mohamed S. Abdelfattah, Diana Marculescu
+description: "A Quantization Framework for Selective State Space Models"
+img: assets/img/publication_preview/quamba2_blog.jpg
+importance: 1
+category: research
+---
+
+<style>
+  li {
+    font-size: 1.1rem; /* Adjust as needed */
+  }
+</style>
+
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
+
+<div style="text-align: center; padding-bottom: 1rem;">
+  <!-- <abbr class="badge" style="background-color:#00369f; margin-left:0.1rem; margin-right:0.1rem; font-size:1.1rem;">ICLR 2025</abbr> -->
+  <abbr class="badge" style="background-color:#BF5700; margin-left:0.1rem; margin-right:0.1rem; font-size:1.1rem; width:80px; display:inline-block; text-align:center;">arXiv</abbr>
+</div>
+
+<div class="authors">
+  <a href="https://hychiang.info">Hung-Yueh Chiang</a><sup>1</sup>,
+  <a href="https://ccchang.info/">Chi-Chih Chang</a><sup>2</sup>,
+  <a href="https://www.nfrumkin.com/">Natalia Frumkin</a><sup>1</sup>,
+  <br>
+  <a href="https://people.cs.nycu.edu.tw/~kcw/">Kai-Chiang Wu</a><sup>3</sup>,
+  <a href="https://www.mohsaied.com/">Mohamed S. Abdelfattah</a><sup>2</sup>,
+  <a href="https://users.ece.utexas.edu/~dianam/">Diana Marculescu</a><sup>1</sup>
+</div>
+<div class="authors">
+  <sup>1</sup>The University of Texas at Austin,
+  <sup>2</sup>Cornell University,
+  <sup>3</sup>National Yang Ming Chiao Tung University
+</div>
+<div style="text-align: center; margin-top:12px;">
+  <a href="https://arxiv.org/abs/2503.22879"><i class="fa fa-file-pdf-o" style="font-size:24px;color"></i><b> Paper </b></a> &nbsp;
+  <a href="https://github.com/enyac-group/Quamba"><i class="fa fa-github" style="font-size:24px;color"></i><b> Code </b></a> &nbsp;
+  <a href="https://huggingface.co/ut-enyac"><span style="font-size: 22px;">&#129303;</span><b> Models </b></a>
+</div>
+
+<br>
+<div style="text-align: center;">
+  <p style="font-family: Comic Neue; font-size: 1.4rem;">
+    :small_red_triangle_down: <b>4<span>&#215;</span> memory reduction</b>&nbsp; &nbsp;
+    :rocket: <b>13 tokens per second on Orin Nano 8G</b>
+  </p>
+</div>
+<div class="row mt-3">
+  <div class="col-sm-8 mt-3 mt-md-0 offset-2">
+    {% include figure.html path="assets/img/projects/quamba2/quamba2.png" title="example image" class="img-fluid rounded z-depth-1" %}
+  </div>
+</div>
+<br>
+# 4-bit Mamba1 and Mamba2 blocks
+- **W4A8**, **W4A16**, **W4AX**, and **W8A8** for both **Mamba1** and **Mamba2**
+- **Head-to-toe (H2T)** 4/8-bit quantization from the embedding layer, through the SSM blocks, to the final output layer
+<div class="row">
+  <div class="col-sm mt-3 mt-md-0">
+    {% include gif.html path="assets/img/projects/quamba2/quamba2_supports.jpg" title="example image" class="img-fluid rounded z-depth-1" %}
+  </div>
+</div>
+<br>
+
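
The core idea behind W4 weight formats, storing 4-bit integers alongside floating-point scales, can be illustrated with a generic group-wise symmetric quantizer. This is a minimal sketch of the general post-training quantization recipe, not Quamba2's actual per-channel/per-head scale scheme:

```python
import numpy as np

def quantize_w4(w, group_size=64):
    """Group-wise symmetric 4-bit quantization: each group of weights
    shares one fp scale; codes land in the signed int4 range [-8, 7]."""
    groups = w.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_w4(q, scale):
    # Recover a floating-point approximation of the original weights.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64)).astype(np.float32)
q, s = quantize_w4(w)
w_hat = dequantize_w4(q, s).reshape(w.shape)
```

With this scheme the per-element reconstruction error is bounded by half a scale step; the engineering work in a real framework lies in choosing scale granularity and keeping the fused kernels fast.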
+# Storage reduction
+- Achieves a **4** $$\times$$ memory reduction via Head-to-toe (H2T) 4-bit quantization
+- Enables deploying Mamba2-8B on the **Nano 8G**
+<div class="row mt-3">
+  <div class="col-sm-10 mt-3 mt-md-0 offset-1">
+    {% include figure.html path="assets/img/projects/quamba2/quamba2_size_2.png" title="example image" class="img-fluid rounded z-depth-1" %}
+  </div>
+</div>
+<br>
+
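
The 4× figure follows directly from the bit widths. A worked check of the weight-storage arithmetic for an 8B-parameter model (weights only; activation and SSM-state buffers add some overhead on top):

```python
params = 8e9                      # approximate Mamba2-8B parameter count
fp16_gb = params * 16 / 8 / 1e9   # 16-bit weights -> 16.0 GB
int4_gb = params * 4 / 8 / 1e9    # 4-bit weights  ->  4.0 GB
print(fp16_gb, int4_gb, fp16_gb / int4_gb)  # 16.0 4.0 4.0
```

The FP16 model cannot fit in an 8 GB Orin Nano, while the 4-bit model leaves room for activations and the runtime.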
+# End-to-end latency speedup
+- Speeds up generation by **3** $$\times$$ on the A5000 GPU
+- Runs at **13** tokens/second on the **Nano 8G**
+<div class="row mt-4">
+  <div class="col-sm mt-3 mt-md-0 offset-0">
+    {% include figure.html path="assets/img/projects/quamba2/quamba2_latency_2.jpg" title="example image" class="img-fluid rounded z-depth-1" %}
+  </div>
+</div>
+<br>
+
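
Decode-throughput numbers like the 13 tokens/second above are typically obtained by timing a fixed number of autoregressive steps. A minimal sketch, where the hypothetical `step` callable stands in for one forward pass of the quantized model:

```python
import time

def tokens_per_second(step, n_tokens=64):
    """Wall-clock decode throughput over n_tokens autoregressive steps."""
    start = time.perf_counter()
    for _ in range(n_tokens):
        step()  # one token's forward pass
    return n_tokens / (time.perf_counter() - start)
```

In practice one would also discard a few warm-up iterations and synchronize the device before reading the clock.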
+# Generalization and robustness
+We search for **W4A**$$X$$-mixed configurations (the last row, in red) to improve the generalization and robustness of low bit-width SSMs, which we evaluate on the large multitask MMLU benchmark.
+<div class="row mt-4">
+  <div class="col-sm mt-3 mt-md-0 offset-0">
+    {% include figure.html path="assets/img/projects/quamba2/quamba2_searched_2.jpg" title="example image" class="img-fluid rounded z-depth-1" %}
+  </div>
+</div>
+<br>
+
+# Zero-shot evaluation
+<div class="row mt-4">
+  <div class="col-sm-10 mt-4 mt-md-0 offset-1">
+    {% include figure.html path="assets/img/projects/quamba2/quamba2_main_table.png" title="example image" class="img-fluid rounded z-depth-1" %}
+  </div>
+</div>
+<br>
+
+# Citation
+{% raw %}
+```latex
+@article{chiang2025quamba2,
+  title={Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models},
+  author={Chiang, Hung-Yueh and Chang, Chi-Chih and Frumkin, Natalia and Wu, Kai-Chiang and Abdelfattah, Mohamed S. and Marculescu, Diana},
+  journal={arXiv preprint arXiv:2503.22879},
+  year={2025}
+}
+```
+{% endraw %}
+
+<br>
+# Acknowledgements
+This work was supported in part by the ONR Minerva program, NSF CCF Grant No. 2107085, iMAGiNE - the Intelligent Machine Engineering Consortium at UT Austin, UT Cockrell School of Engineering Doctoral Fellowships, NSF Grant No. 2339084, and Taiwan’s NSTC Grant No. 111-2221-E-A49-148-MY3.
