Commit b9f3a26

add uniql

1 parent 8ae8408 commit b9f3a26

File tree

9 files changed: 138 additions, 2 deletions


_bibliography/papers.bib

Lines changed: 14 additions & 1 deletion
@@ -1,6 +1,19 @@
 ---
 ---
-
+
+@article{chiang2025uniql,
+  title={UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs},
+  author={Chiang, Hung-Yueh and Chang, Chi-Chih and Lu, Yu-Chen and Lin, Chien-Yu and Wu, Kai-Chiang and Abdelfattah, Mohamed S. and Marculescu, Diana},
+  journal={arXiv preprint arXiv:2512.03383},
+  venue_type={arXiv},
+  venue_url={https://arxiv.org/abs/2512.03383},
+  year={2025},
+  pdf={https://arxiv.org/pdf/2512.03383.pdf},
+  preview={uniql_blog.jpg},
+  bibtex_show={true},
+  selected={true},
+}
+
 @inproceedings{chiang2025quamba2,
   title={Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models},
   author={Chiang, Hung-Yueh and Chang, Chi-Chih and Frumkin, Natalia and Wu, Kai-Chiang and Abdelfattah, Mohamed S. and Marculescu, Diana},

_projects/quamba2.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ full_title: "Quamba2: A Robust and Scalable Post-training Quantization Framework
 authors: Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin, Kai-Chiang Wu, Mohamed S. Abdelfattah, Diana Marculescu
 description: "A Quantization Framework for Selective State Space Models"
 img: assets/img/publication_preview/quamba2_blog.jpg
-importance: 1
+importance: 7
 category: research
 ---

_projects/uniql.md

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
---
layout: page
title: UniQL
full_title: "UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs"
authors: Hung-Yueh Chiang, Chi-Chih Chang, Yu-Chen Lu, Chien-Yu Lin, Kai-Chiang Wu, Mohamed S. Abdelfattah, Diana Marculescu
description: "Quantization and Low-rank Compression for Edge LLMs"
img: assets/img/publication_preview/uniql_blog.jpg
importance: 1
category: research
---

<style>
li {
  font-size: 1.1rem; /* adjust as needed */
}
</style>

<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">

<div style="text-align: center; padding-bottom: 1rem;">
  <abbr class="badge" style="background-color:#BF5700; margin-left:0.1rem; margin-right:0.1rem; font-size:1.1rem; width:80px; display:inline-block; text-align:center;">arXiv</abbr>
</div>

<div class="authors">
  <a href="https://hychiang.info">Hung-Yueh Chiang</a><sup>1</sup>,
  <a href="https://ccchang.info/">Chi-Chih Chang</a><sup>2</sup>,
  <a href="https://scholar.google.com/citations?user=iTlHWjwAAAAJ&hl=zh-TW">Yu-Chen Lu</a><sup>3</sup>,
  <a href="https://cylinbao.github.io/">Chien-Yu Lin</a><sup>4</sup>,
  <br>
  <a href="https://people.cs.nycu.edu.tw/~kcw/">Kai-Chiang Wu</a><sup>3</sup>,
  <a href="https://www.mohsaied.com/">Mohamed S. Abdelfattah</a><sup>2</sup>,
  <a href="https://users.ece.utexas.edu/~dianam/">Diana Marculescu</a><sup>1</sup>
</div>
<div class="authors">
  <sup>1</sup>The University of Texas at Austin,
  <sup>2</sup>Cornell University, <br>
  <sup>3</sup>National Yang Ming Chiao Tung University,
  <sup>4</sup>University of Washington
</div>
<div style="text-align: center; margin-top:12px;">
  <a href="https://arxiv.org/abs/2512.03383"><i class="fa fa-file-pdf-o" style="font-size:24px;"></i><b> Paper </b></a> &nbsp;
  <a href="https://github.com/enyac-group/UniQL"><i class="fa fa-github" style="font-size:24px;"></i><b> Code </b></a> &nbsp;
  <a href="https://huggingface.co/ut-enyac"><span style="font-size: 22px;">&#129303;</span><b> Models </b></a>
</div>

<br>
<div style="text-align: center;">
  <p style="font-family: Comic Neue; font-size: 1.4rem;">
  📚 Unified support for Transformers, SSMs, and hybrid models <br>
  🔗 One-pass framework for quantization + structured low-rank pruning <br>
  ⚡ <strong>2.7×–3.4×</strong> latency speedups, <strong>4×–5.7×</strong> memory reductions <br>
  </p>
</div>
<div class="row mt-3">
  <div class="col-sm-12 mt-3 mt-md-0 offset-0">
    {% include figure.html path="assets/img/projects/uniql/uniql.png" title="UniQL overview" class="img-fluid rounded z-depth-1" %}
  </div>
</div>
<br>
# Support for Transformer and Mamba blocks
- Joint weight decomposition (weights in the same group are shown with the same background color).
<div class="row">
  <div class="col-sm mt-3 mt-md-0">
    {% include gif.html path="assets/img/projects/uniql/modular.png" title="Joint weight decomposition for Transformer and Mamba blocks" class="img-fluid rounded z-depth-1" %}
  </div>
</div>
<br>
# Jointly designed quantization and structured pruning
- Fused RoPE kernel to support and accelerate pruned Q and K
- Quantization-aware SVD to reduce quantization error
<div class="row mt-3">
  <div class="col-sm-10 mt-3 mt-md-0 offset-1">
    {% include figure.html path="assets/img/projects/uniql/kernel_svd.png" title="Fused RoPE kernel and quantization-aware SVD" class="img-fluid rounded z-depth-1" %}
  </div>
</div>
<br>
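One way to read "quantization-aware SVD" is that the low-rank branch absorbs part of the rounding error left behind by quantization. A minimal sketch under that assumption (the naive int4 fake-quantizer and the residual-SVD correction are illustrative choices, not this repo's API):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 64))

def quantize_int4(w):
    # Naive symmetric per-tensor int4 fake-quantization (illustrative only).
    scale = np.abs(w).max() / 7.0
    return np.clip(np.round(w / scale), -8, 7) * scale

W_q = quantize_int4(W)

# Decompose the quantization residual so the low-rank factors compensate
# part of the rounding error instead of ignoring it.
E = W - W_q
U, S, Vt = np.linalg.svd(E, full_matrices=False)
rank = 8
A = U[:, :rank] * S[:rank]
B = Vt[:rank, :]

plain_err = np.linalg.norm(W - W_q)
aware_err = np.linalg.norm(W - (W_q + A @ B))
print(f"quantize only: {plain_err:.3f}, with low-rank correction: {aware_err:.3f}")
```

By the Eckart-Young theorem, truncating the residual's SVD gives the best rank-8 correction in Frobenius norm, so the corrected error is strictly smaller whenever the residual has nonzero leading singular values.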
# One-pass framework supporting all pruning rates
- (a) <b>Pseudo-inverse-free</b>, <b>quantization-aware</b>, and <b>state-aware</b> matrix decomposition of the grouped weights yields importance-sorted weights.
- (b) During fine-tuning, we sample global pruning rates and mask out the corresponding weight channels.
- (c) The refined patches are fused into the weights, followed by model quantization for deployment.
- (d) Based on system utilization, we perform <b>on-device adaptive pruning</b> of the quantized model.
<div class="row mt-4">
  <div class="col-sm mt-3 mt-md-0 offset-0">
    {% include figure.html path="assets/img/projects/uniql/one-pass.png" title="One-pass framework" class="img-fluid rounded z-depth-1" %}
  </div>
</div>
<br>
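Steps (a)-(d) can be mimicked in a few lines. This is a hedged sketch, not the released code: the importance-sorted channel layout and the prefix-slice deployment are assumptions read off the bullets above.

```python
import numpy as np

rng = np.random.default_rng(2)
d_out, d_in = 64, 64
# Step (a) is assumed to have sorted output channels by importance,
# so any pruning rate keeps a prefix of the channels.
W = rng.standard_normal((d_out, d_in))

# Step (b): sample a global pruning rate at each fine-tuning step and
# mask out the trailing (least important) channels.
for _ in range(3):
    rate = rng.choice([0.0, 0.25, 0.5])
    keep = int(d_out * (1.0 - rate))
    mask = np.zeros(d_out)
    mask[:keep] = 1.0
    W_masked = W * mask[:, None]   # a fine-tuning step would use W_masked here

# Steps (c)-(d): once patches are fused and the model is quantized, on-device
# adaptive pruning reduces to slicing a channel prefix of the stored weights,
# so one checkpoint serves every pruning rate.
keep = int(d_out * (1.0 - 0.25))
W_deployed = W[:keep, :]
print(W_deployed.shape)   # (48, 64)
```

Because pruning is a prefix slice, switching rates on device needs no re-quantization and no extra checkpoints.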
# Main results
<div class="row mt-4">
  <div class="col-sm-10 mt-4 mt-md-0 offset-1">
    {% include figure.html path="assets/img/projects/uniql/main_results.png" title="Main results" class="img-fluid rounded z-depth-1" %}
  </div>
</div>
<br>
# Citation
{% raw %}
```latex
@article{chiang2025uniql,
  title={UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs},
  author={Chiang, Hung-Yueh and Chang, Chi-Chih and Lu, Yu-Chen and Lin, Chien-Yu and Wu, Kai-Chiang and Abdelfattah, Mohamed S. and Marculescu, Diana},
  journal={arXiv preprint arXiv:2512.03383},
  year={2025},
}
```
{% endraw %}

<br>
# Acknowledgements
This work was supported in part by the ONR Minerva program, NSF CCF Grant No. 2107085, iMAGiNE (the Intelligent Machine Engineering Consortium at UT Austin), UT Cockrell School of Engineering Doctoral Fellowships, NSF CAREER Grant No. 2339084, an Nvidia research gift, and Taiwan's NSTC Grant No. 111-2221-E-A49-148-MY3.
