| 1 | +--- |
| 2 | +layout: page |
| 3 | +title: UniQL |
| 4 | +full_title: "UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs" |
| 5 | +authors: Hung-Yueh Chiang, Chi-Chih Chang, Yu-Chen Lu, Chien-Yu Lin, Kai-Chiang Wu, Mohamed S. Abdelfattah, Diana Marculescu |
| 6 | +description: "Quantization and Low-rank Compression for Edge LLMs" |
| 7 | +img: assets/img/publication_preview/uniql_blog.jpg |
| 8 | +importance: 1 |
| 9 | +category: research |
| 10 | +--- |
| 11 | + |
| 12 | +<style> |
| 13 | +li { |
| 14 | + font-size: 1.1rem; /* Adjust as needed */ |
| 15 | +} |
| 16 | +</style> |
| 17 | + |
| 18 | +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css"> |
| 19 | + |
| 20 | +<div style="text-align: center; padding-bottom: 1rem;"> |
| 21 | +<!-- <abbr class="badge" style="background-color:#00369f; margin-left:0.1rem; margin-right:0.1rem; font-size:1.1rem;">Arxiv</abbr> --> |
| 22 | +<abbr class="badge" style="background-color:#BF5700; margin-left:0.1rem; margin-right:0.1rem; font-size:1.1rem; width:80px; display:inline-block; text-align:center;">Arxiv</abbr> |
| 23 | +</div> |
| 24 | + |
| 25 | +<div class="authors"> |
| 26 | + <a href="https://hychiang.info">Hung-Yueh Chiang</a><sup>1</sup>, |
| 27 | + <a href="https://ccchang.info/">Chi-Chih Chang</a><sup>2</sup>, |
| 28 | + <a href="https://scholar.google.com/citations?user=iTlHWjwAAAAJ&hl=zh-TW">Yu-Chen Lu</a><sup>3</sup>, |
| 29 | + <a href="https://cylinbao.github.io/">Chien-Yu Lin</a><sup>4</sup>, |
| 30 | + <br> |
| 31 | + <a href="https://people.cs.nycu.edu.tw/~kcw/">Kai-Chiang Wu</a><sup>3</sup>, |
| 32 | + <a href="https://www.mohsaied.com/">Mohamed S. Abdelfattah</a><sup>2</sup>, |
| 33 | + <a href="https://users.ece.utexas.edu/~dianam/">Diana Marculescu</a><sup>1</sup> |
| 34 | +</div> |
| 35 | +<div class="authors"> |
| 36 | + <sup>1</sup>The University of Texas at Austin, |
| 37 | + <sup>2</sup>Cornell University, <br> |
| 38 | + <sup>3</sup>National Yang Ming Chiao Tung University, |
| 39 | + <sup>4</sup>University of Washington |
| 40 | +</div> |
| 41 | +<div style="text-align: center; margin-top:12px;"> |
| 42 | + <a href="https://arxiv.org/abs/2512.03383"><i class="fa fa-file-pdf-o" style="font-size:24px;"></i><b> Paper </b></a> |
| 43 | + <a href="https://github.com/enyac-group/UniQL"><i class="fa fa-github" style="font-size:24px;"></i><b> Code </b></a> |
| 44 | + <a href="https://huggingface.co/ut-enyac"><span style="font-size: 22px;">🤗</span><b> Models </b></a> |
| 45 | +</div> |
| 46 | + |
| 47 | + |
| 48 | +<br> |
| 49 | +<div style="text-align: center;"> |
| 50 | +<p style="font-family: Comic Neue; font-size: 1.4rem;"> |
| 51 | + 📚 Unified support for Transformers, SSMs, and hybrid models <br> |
| 52 | + 🔗 One-pass framework for quantization + structured low-rank pruning <br> |
| 53 | + ⚡ <strong>2.7×–3.4×</strong> latency speedups, <strong>4×–5.7×</strong> memory reductions <br> |
| 54 | +</p> |
| 55 | +</div> |
| 56 | +<div class="row mt-3"> |
| 57 | + <div class="col-sm-12 mt-3 mt-md-0 offset-0"> |
| 58 | + {% include figure.html path="assets/img/projects/uniql/uniql.png" title="example image" class="img-fluid rounded z-depth-1" %} |
| 59 | + </div> |
| 60 | +</div> |
| 61 | +<br> |
| 62 | + |
| 63 | +# Support for Transformer and Mamba blocks |
| 64 | +- Joint weight decomposition. (Weights in the same group share a background color.) |
| 65 | +<div class="row"> |
| 66 | + <div class="col-sm mt-3 mt-md-0"> |
| 67 | + {% include gif.html path="assets/img/projects/uniql/modular.png" title="example image" class="img-fluid rounded z-depth-1" %} |
| 68 | + </div> |
| 69 | +</div> |
| 70 | +<br> |
| 71 | + |
| 72 | +# Joint design of quantization and structured pruning |
| 73 | +- Fused RoPE kernel to support and accelerate pruned Q and K |
| 74 | +- Quantization-aware SVD to reduce quantization error |
| 75 | +<div class="row mt-3"> |
| 76 | + <div class="col-sm-10 mt-3 mt-md-0 offset-1"> |
| 77 | + {% include figure.html path="assets/img/projects/uniql/kernel_svd.png" title="example image" class="img-fluid rounded z-depth-1" %} |
| 78 | + </div> |
| 79 | +</div> |
| 80 | +<br> |
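
To see why the decomposition and the quantizer have to be designed together, the sketch below factors a weight with plain SVD, splits the singular values between the two factors in different ways, and measures the reconstruction error after group-wise round-to-nearest quantization. It is an illustration only and not UniQL's quantization-aware SVD: the helper names (`quantize_sym`, `split_factors`, `quantized_error`), the `alpha` split parameter, the 4-bit setting, and the group size of 16 are all assumptions made for this example.

```python
import numpy as np


def quantize_sym(mat, bits=4, group=16):
    """Symmetric round-to-nearest quantization with one scale per group of
    `group` consecutive columns. Returns the dequantized matrix."""
    rows, cols = mat.shape
    assert cols % group == 0, "column count must be divisible by the group size"
    grouped = mat.reshape(rows, cols // group, group)
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(grouped).max(axis=-1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero
    q = np.clip(np.round(grouped / scale), -qmax, qmax)
    return (q * scale).reshape(rows, cols)


def split_factors(weight, alpha):
    """Factor weight = A @ B via SVD, putting s**alpha into A and
    s**(1 - alpha) into B (alpha is a hypothetical knob for this sketch)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u * s ** alpha
    b = (s ** (1.0 - alpha))[:, None] * vt
    return a, b


def quantized_error(weight, alpha, bits=4, group=16):
    """Relative Frobenius error after quantizing both factors."""
    a, b = split_factors(weight, alpha)
    approx = quantize_sym(a, bits, group) @ quantize_sym(b, bits, group)
    return np.linalg.norm(weight - approx) / np.linalg.norm(weight)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((32, 64))
    for alpha in (0.0, 0.5, 1.0):
        print(f"alpha={alpha}: relative error {quantized_error(w, alpha):.4f}")
```

Sweeping `alpha` on a real layer is one simple way to check how sensitive the quantized error is to the scaling of the two factors.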
| 81 | + |
| 82 | +# One-pass framework supporting all pruning rates |
| 83 | +- (a) <b>Pseudo-inverse-free</b>, <b>quantization-aware</b>, and <b>state-aware</b> matrix decomposition methods for the grouped weights to obtain sorted weights |
| 84 | +- (b) During fine-tuning, we sample global pruning rates and mask out the corresponding weight channels |
| 85 | +- (c) The refined patches are fused into the weights, followed by model quantization |
| 86 | +for deployment |
| 87 | +- (d) Based on the system utilization, we perform <b>on-device adaptive pruning</b> of the quantized model. |
| 88 | +<div class="row mt-4"> |
| 89 | + <div class="col-sm mt-3 mt-md-0 offset-0"> |
| 90 | + {% include figure.html path="assets/img/projects/uniql/one-pass.png" title="example image" class="img-fluid rounded z-depth-1" %} |
| 91 | + </div> |
| 92 | +</div> |
| 93 | +<br> |
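
The idea behind steps (a), (b), and (d) can be sketched in a few lines: once the rank channels of a factorized weight are sorted by importance, any pruning rate corresponds to keeping a prefix of those channels, so masking during fine-tuning and slicing on device produce the same output. This is a minimal sketch under stated assumptions: plain SVD stands in for the pseudo-inverse-free, quantization-aware, and state-aware decompositions, quantization is omitted, and the function names (`sorted_factors`, `rank_mask`, `masked_forward`, `sliced_forward`) are hypothetical.

```python
import numpy as np


def sorted_factors(weight):
    """Factor weight (out x in) as A @ B, with rank channels ordered from
    largest to smallest singular value (plain SVD, an assumption here)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    sqrt_s = np.sqrt(s)
    a = u * sqrt_s             # (out, rank)
    b = sqrt_s[:, None] * vt   # (rank, in)
    return a, b


def rank_mask(full_rank, pruning_rate):
    """Boolean mask that keeps the leading channels for a given pruning rate."""
    keep = max(1, int(round(full_rank * (1.0 - pruning_rate))))
    mask = np.zeros(full_rank, dtype=bool)
    mask[:keep] = True
    return mask


def masked_forward(x, a, b, mask):
    """Fine-tuning-style forward pass: pruned rank channels are zeroed out."""
    hidden = (x @ b.T) * mask
    return hidden @ a.T


def sliced_forward(x, a, b, keep):
    """Deployment-style forward pass: pruned rank channels are dropped."""
    return (x @ b[:keep].T) @ a[:, :keep].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((6, 8))
    x = rng.standard_normal((4, 8))
    a, b = sorted_factors(w)
    for rate in (0.0, 0.25, 0.5, 0.75):
        mask = rank_mask(a.shape[1], rate)
        err = np.linalg.norm(x @ w.T - masked_forward(x, a, b, mask))
        print(f"pruning rate {rate}: kept {mask.sum()} channels, error {err:.4f}")
```

Because every pruning rate reuses the same sorted factors, only the number of kept channels has to change at run time in this sketch.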
| 94 | + |
| 95 | + |
| 96 | + |
| 97 | +# Main results |
| 98 | +<div class="row mt-4"> |
| 99 | + <div class="col-sm-10 mt-4 mt-md-0 offset-1"> |
| 100 | + {% include figure.html path="assets/img/projects/uniql/main_results.png" title="example image" class="img-fluid rounded z-depth-1" %} |
| 101 | + </div> |
| 102 | +</div> |
| 103 | +<br> |
| 104 | + |
| 105 | + |
| 106 | + |
| 107 | + |
| 108 | +# Citation |
| 109 | +{% raw %} |
| 110 | +```latex |
| 111 | +@article{chiang2025uniql, |
| 112 | + title={UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs}, |
| 113 | + author={Chiang, Hung-Yueh and Chang, Chi-Chih and Lu, Yu-Chen and Lin, Chien-Yu and Wu, Kai-Chiang and Abdelfattah, Mohamed S. and Marculescu, Diana}, |
| 114 | + journal={arXiv preprint arXiv:2512.03383}, |
| 115 | + year={2025}, |
| 116 | +} |
| 117 | + |
| 118 | +``` |
| 119 | +{% endraw %} |
| 120 | + |
| 121 | +<br> |
| 122 | +# Acknowledgements |
| 123 | +This work was supported in part by the ONR Minerva program, NSF CCF Grant No. 2107085, iMAGiNE - the Intelligent Machine Engineering Consortium at UT Austin, UT Cockrell School of Engineering Doctoral Fellowships, NSF CAREER Grant No. 2339084, an NVIDIA research gift, and Taiwan’s NSTC Grant No. 111-2221-E-A49-148-MY3. |