Author: Shubham Dev
Paper: The Isotropic Tradeoff (Zenodo)
This repository contains the empirical evaluation suite accompanying the paper "The Isotropic Tradeoff: How Rotation-Based Quantization Exchanges Structural Weight Integrity for Euclidean Fidelity." We explore the critical boundary in post-training quantization (PTQ), identifying where global rotation is an engineering success and where it induces measurable signal degradation.
Following community feedback, this repository has been updated to explicitly delineate the boundaries of rotation-based quantization.
1. The KV Cache Distinction (Rotation as a Feature)
Modern SOTA methods (like TurboQuant) utilize global rotation (e.g., Hadamard transforms) to isotropize the KV Cache. For the dense activations of the KV cache, where attention is computed in the rotated space, spreading quantization error as generalized noise across all channels is a highly effective trade-off for memory efficiency. We fully acknowledge the success of rotational quantization in this domain.
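To make the isotropizing effect concrete, here is a minimal NumPy sketch (illustrative only, not code from the paper's notebook or from TurboQuant): a normalized Sylvester-construction Hadamard matrix spreads a single high-magnitude channel evenly across all channels while preserving the vector's norm.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction of an n x n Hadamard matrix (n must be a power of 2),
    # scaled to be orthonormal so it acts as a pure rotation.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(H.shape[0])

# A KV-cache-like activation vector with one dominant outlier channel.
x = np.zeros(8)
x[3] = 10.0

H = hadamard(8)
y = H @ x  # rotated activations: the outlier's energy is spread evenly

print(np.abs(x).max())  # 10.0 -- energy concentrated in one channel
print(np.abs(y).max())  # ~3.54 -- flattened to 10 / sqrt(8) per channel
print(np.allclose(np.linalg.norm(x), np.linalg.norm(y)))  # True: rotation preserves norm
```

The flattened range is what makes low-bit uniform quantization of the rotated tensor so effective: the quantizer's scale no longer has to accommodate a single extreme channel.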
2. The Focus: Model Weight Quantization (Rotation as a Tradeoff)
This research focuses on the application of global rotation to Model Weights (Linear layers). While spreading error is acceptable for transient cache activations, applying global rotation to the static knowledge structures of an LLM introduces a structural cost: high-magnitude weight outliers are smeared into the broader network noise floor.
Recent research (e.g., LLM.int8()) has established that LLMs are governed by highly sparse, high-magnitude outliers that are critical for model reasoning and performance. When global orthogonal rotation is applied to these weights, it acts as a distributional mixer:
- It takes these localized, high-magnitude signals and smears them across all dimensions to minimize global Mean Squared Error (MSE).
- While this optimizes for a "flat" Euclidean reconstruction, it replaces the precise localized geometry of the weights with a uniform noise floor.
- This results in Signal Pollution (measured here as "Induced Neuronal Firings"), where previously silent neurons are forced to fire due to the redistribution of outlier energy.
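The mixing behavior described above can be demonstrated on synthetic data (a sketch under stated assumptions: a random orthogonal matrix stands in for the structured transforms real methods use, and the 1.0 firing threshold mirrors the paper's metric):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512

# A weight row: mostly silent dimensions plus a few high-magnitude outliers.
w = np.zeros(n)
outlier_idx = rng.choice(n, size=5, replace=False)
w[outlier_idx] = rng.uniform(8.0, 12.0, size=5)

# Random orthogonal matrix as a proxy for the global rotation
# (real methods typically use structured transforms such as Hadamard matrices).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
w_rot = Q @ w  # the rotated representation mixes every dimension

silent = np.abs(w) < 1e-9                 # dimensions that carried no signal
induced = np.abs(w_rot[silent]) > 1.0     # silent dims now "firing" after rotation
print(induced.sum())                      # hundreds of newly active dimensions
```

Rotation preserves total energy (it is norm-preserving), so the only way to flatten the outliers is to redistribute their energy into dimensions that previously carried none: exactly the "distributional mixer" behavior described above.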
In our manuscript, we reference AIME25 benchmark results published by Georgi Gerganov in the llama.cpp repository. To clarify, these results are not represented as original experimental data of this paper. They are cited as external, supportive community evidence showing that at aggressive quantization levels (Q4), rotation provides only partial recovery. We hypothesize that this residual degradation is a symptom of the weight saliency degradation our code measures. This distinction is made to clarify the division between our core theoretical/mathematical contribution and contextual third-party benchmarks.
The evaluation notebook (evaluate_isotropic_fallacy.ipynb) tests Qwen/Qwen2.5-1.5B over a 2048-token WikiText sequence.
Rotation successfully optimizes standard geometric metrics, dropping outlier reconstruction error by over 98%, at the cost of baseline noise floor degradation.
| Metric | Naive 3-bit | Global Rotation (3-bit) | Delta |
|---|---|---|---|
| Outlier MSE (Top 1%) | 157.8024 | 2.6782 | -98.3% |
| Noise Floor MSE (Bottom 90%) | 2.2721 | 2.5339 | +11.5% |
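The qualitative shape of this table can be reproduced with a self-contained toy experiment (a sketch, not the notebook's actual pipeline: a symmetric round-to-nearest 3-bit quantizer, a random orthogonal proxy rotation, and synthetic weights with exaggerated ~1% outliers; absolute numbers will differ from the table):

```python
import numpy as np

def quantize(x, bits=3):
    # Symmetric round-to-nearest uniform quantizer over the tensor's own range.
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / levels
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
n = 1024
w = rng.standard_normal(n)
w[rng.choice(n, n // 100, replace=False)] *= 30.0  # ~1% high-magnitude outliers

Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal proxy rotation

w_naive = quantize(w)
w_rot = Q.T @ quantize(Q @ w)  # quantize in the rotated basis, rotate back

order = np.argsort(-np.abs(w))
top = order[: n // 100]        # top-1% outlier dims
bot = order[-int(n * 0.9):]    # bottom-90% noise floor

mse = lambda a, b: ((a - b) ** 2).mean()
results = {
    "naive":   (mse(w[top], w_naive[top]), mse(w[bot], w_naive[bot])),
    "rotated": (mse(w[top], w_rot[top]),   mse(w[bot], w_rot[bot])),
}
print(results)  # rotated: outlier MSE drops sharply, noise-floor MSE rises
```

The mechanism matches the table: rotation shrinks the quantizer's dynamic range, slashing outlier reconstruction error, while rotating the (now uniformly spread) quantization noise back into the original basis raises the error on the previously well-reconstructed noise floor.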
Despite improved Euclidean metrics, rotation generates a surge in false firings (>1.0 magnitude) in previously silent neuronal dimensions due to the "Blender Effect" of global mixing.
| Metric | Naive 3-bit | Global Rotation (3-bit) |
|---|---|---|
| Induced False Firings (>1.0) | 0 | 367,539 |
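A minimal sketch of how such a false-firing count can be measured (again with a random orthogonal proxy rotation, a toy 3-bit quantizer, and synthetic weights; the 1.0 threshold follows the table, everything else is an assumption for illustration):

```python
import numpy as np

def quantize(x, bits=3):
    # Symmetric round-to-nearest uniform quantizer over the tensor's own range.
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / levels
    return np.round(x / scale) * scale

rng = np.random.default_rng(1)
n = 512
w = rng.standard_normal(n) * 0.05          # mostly near-silent dimensions
w[rng.choice(n, 5, replace=False)] = 15.0  # sparse high-magnitude outliers

Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

w_naive = quantize(w)
w_rot = Q.T @ quantize(Q @ w)  # quantize in the rotated basis, rotate back

silent = np.abs(w) < 1.0
naive_firings = (np.abs(w_naive[silent]) > 1.0).sum()
rot_firings = (np.abs(w_rot[silent]) > 1.0).sum()
print(naive_firings)  # naive quantization rounds silent dims to zero: 0
print(rot_firings)    # rotated round-trip pollutes silent dims: > 0
```

Naive quantization is coarse but local: a near-zero weight stays near zero. The rotated round-trip spreads quantization noise isotropically across all dimensions, so some previously silent weights cross the firing threshold.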
To reproduce the evaluation:

```shell
git clone https://github.com/pheonix-delta/llm-isotropic-tradeoff.git
cd llm-isotropic-tradeoff
pip install torch transformers datasets scipy tqdm
jupyter notebook evaluate_isotropic_fallacy.ipynb
```

If you use this work, please cite:

```bibtex
@article{dev2026isotropic,
  title={The Isotropic Tradeoff: Quantization and Outlier Signal Pollution in LLM Weights},
  author={Dev, Shubham},
  journal={arXiv preprint},
  year={2026},
  doi={10.5281/zenodo.19338651},
  url={https://doi.org/10.5281/zenodo.19338651}
}
```