A comprehensive library for matrix-vector multiplication (GEMV) built on lattice quantizers, including nested and hierarchical nested lattice quantizers. It provides efficient quantized matrix-vector operations with support for multiple processing strategies and adaptive quantization.
GEMV-Q provides efficient implementations of:
- Matrix-Vector Multiplication: Efficient GEMV operations using quantized matrices with different lattice quantizers
- Lattice Quantization: Quantizing vectors to points on lattices (D4, A2, E8, Z2, Z3) using latticequant library
- Nested Lattice Quantization: Single-level quantization for matrix compression
- Hierarchical Nested Lattice Quantization: Multi-level quantization for progressive refinement
- Coarse-to-Fine Decoding: Progressive reconstruction from coarse to fine levels
- Adaptive Quantization: Dynamic quantization based on data characteristics
- Multiple Processing Strategies: Column-wise, row-wise, SVD-based, and lookup table approaches
- Monotonic Error Reduction: Error decreases as more quantization levels are used
- Variable Depth Decoding: Support for decoding at different quantization levels
- Adaptive Processing: Dynamic strategy selection based on data characteristics
- Nested Lattice Quantizer (NLQ): Single-level quantization for matrix compression
- Hierarchical Nested Lattice Quantizer (HNLQ): Multi-level quantization for progressive refinement
- Multiple Lattice Types: Support for D4, A2, E8, Z2, Z3 lattices from latticequant library
- D4: 4-dimensional checkerboard lattice
- A2: 2-dimensional hexagonal lattice
- E8: 8-dimensional Gosset lattice
- Z2/Z3: Integer lattices
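The library's `closest_point_Dn` (from latticequant) performs nearest-point search on the D-family lattices. As an illustrative sketch of the underlying Conway–Sloane rounding rule (not the packaged code): round every coordinate to the nearest integer, and if the coordinate sum is odd, re-round the coordinate with the largest rounding error the other way so the result lands in Dₙ.

```python
import numpy as np

def closest_point_Dn(x):
    """Nearest point of the D_n lattice (integer vectors with even
    coordinate sum), via Conway & Sloane's rounding algorithm."""
    f = np.round(x)
    if int(f.sum()) % 2 == 0:
        return f
    # Parity is odd: re-round the worst coordinate the other way.
    i = int(np.argmax(np.abs(x - f)))
    f[i] += 1.0 if x[i] >= f[i] else -1.0
    return f

print(closest_point_Dn(np.array([0.6, 0.1, 0.1, 0.1])))  # [0. 0. 0. 0.]
```

Here naive rounding would give `[1, 0, 0, 0]` (odd sum, not in D4); the parity fix moves the worst coordinate, yielding the true nearest even-sum point.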
- Extensive test suite with categorized tests
- Uniform random variable testing for controlled analysis
- Scaled matrix testing for better hierarchical behavior
- Error type analysis (cumulative vs tile-specific)
- Performance benchmarking and validation
- Layer-wise Histogram Processing: Adaptive quantization based on data distribution
- Dynamic Bit Allocation: Optimal bit rate allocation per column
- Sparsity Exploitation: Efficient handling of sparse vectors
- Rate-Distortion Optimization: Adaptive quantization for optimal performance
# Clone the repository
git clone <repository-url>
cd gemvq
# Install the package in development mode
pip install -e .
# Install dependencies
pip install -r requirements.txt
# For development
pip install -r requirements-dev.txt
After installation, you can import the package directly:
# Clean imports from the installed package
from gemvq import NLQ, HNLQ, get_d4
from gemvq.gemv.columnwise import ColumnwiseMatVecProcessor

import numpy as np
from gemvq.gemv.columnwise import ColumnwiseMatVecProcessor
# Create a matrix and quantize it using D4 lattice
matrix = np.random.uniform(0, 1, (100, 50)) * 256 # Scale by q^M
processor = ColumnwiseMatVecProcessor(matrix, 'D4', M=3)
# Perform matrix-vector multiplication with coarse-to-fine decoding
vector = np.random.uniform(0, 1, 50) * 256
result = processor.multiply_coarse_to_fine(vector, max_level=0) # Use 1 level
print(f"Matrix-vector multiplication result shape: {result.shape}")

# Using nested lattice quantizer (single-level)
from gemvq import NLQ, get_d4, closest_point_Dn
G = get_d4()
nested_quantizer = NLQ(G=G, Q_nn=closest_point_Dn, q=4, beta=0.2, alpha=1/3, eps=1e-8)
# Using hierarchical nested lattice quantizer (multi-level)
from gemvq import HNLQ, closest_point_Dn
hierarchical_quantizer = HNLQ(
G=G, Q_nn=closest_point_Dn, q=4, beta=0.2,
alpha=1/3, eps=1e-8, dither=np.zeros(4), M=3
)
# Encode and decode with progressive refinement
x = np.random.uniform(0, 1, 4) * 64 # Scale by q^M
b_list, T = hierarchical_quantizer.encode(x, with_dither=False)
# Decode at different levels for progressive quality
for level in range(3):
    reconstruction = hierarchical_quantizer.decode_coarse_to_fine(b_list, T, with_dither=False, max_level=level)
    error = np.linalg.norm(reconstruction - x) / np.linalg.norm(x)
    print(f"Level {level}: Error = {error:.6f}")

from gemvq import get_d4
from gemvq.gemv.adaptive_processor import AdaptiveProcessor
# Create adaptive processor with D4 lattice
G = get_d4()
processor = AdaptiveProcessor(G=G, q=4, beta=0.2, M=3)
# Create matrix and vector
matrix = np.random.uniform(0, 1, (100, 50)) * 256
vector = np.random.uniform(0, 1, 50) * 256
# Perform adaptive matrix-vector multiplication
result = processor.adaptive_matvec(matrix, vector, target_rate=2.0)
print(f"Adaptive GEMV result shape: {result.shape}")

gemvq/
├── 📁 gemvq/ # Main package (installable)
│ ├── 📁 quantizers/ # Quantization core
│ │ ├── nlq.py # Single-level quantization
│ │ ├── hnlq.py # Multi-level quantization
│ │ ├── utils.py # Lattice utilities (D4, A2, E8, etc.)
│ │ └── __init__.py
│ │
│ ├── 📁 gemv/ # Matrix-vector multiplication
│ │ ├── 📁 columnwise/ # Column-wise GEMV implementations
│ │ │ ├── columnwise_matvec_processor.py # Main column-wise processor
│ │ │ ├── columnwise_matvec_factory.py # Factory functions
│ │ │ ├── columnwise_processor.py # Column-wise processing logic
│ │ │ ├── column_wise_gemv.py # Column-wise GEMV implementation
│ │ │ ├── simple_columnwise_matvec.py # Simplified column-wise processor
│ │ │ ├── standard_dot_processor.py # Standard dot product
│ │ │ └── __init__.py
│ │ │
│ │ ├── 📁 rowwise/ # Row-wise GEMV implementations
│ │ │ ├── rowwise_processor.py # Row-wise processing logic
│ │ │ ├── row_wise_gemv.py # Row-wise GEMV implementation
│ │ │ └── __init__.py
│ │ │
│ │ ├── 📁 svd/ # SVD-based GEMV
│ │ │ ├── svd_gemv_processor.py # SVD decomposition approach
│ │ │ └── __init__.py
│ │ │
│ │ ├── 📁 utils/ # GEMV utilities
│ │ │ ├── padder.py # Matrix padding utilities
│ │ │ ├── lookup_table_processor.py # Lookup table approach
│ │ │ └── __init__.py
│ │ │
│ │ ├── gemv_processor.py # Base processor interface
│ │ ├── gemv_factory.py # Factory for creating GEMV processors
│ │ ├── adaptive_processor.py # Adaptive processing
│ │ └── __init__.py
│ │
│ └── __init__.py # Main package initialization
│
├── 📁 tests/ # Comprehensive test suite
│ ├── run_all_tests.py # Main test runner with categories
│ ├── test_nested_lattice_quantizer.py # Core quantization tests
│ ├── test_hierarchical_*.py # Hierarchical quantization tests
│ ├── test_columnwise_matvec_options.py # Column-wise GEMV tests
│ ├── test_adaptive_matvec.py # Adaptive GEMV tests
│ ├── test_layer_wise_histogram.py # Layer-wise histogram tests
│ ├── test_d4_lattice_simulation.py # D4 lattice simulation tests
│ ├── test_decoding_parameter.py # Decoding parameter tests
│ └── __init__.py
│
├── 📁 examples/ # Usage examples and demonstrations
│ ├── demo_coarse_to_fine.py # Coarse-to-fine decoding demo
│ ├── example_adaptive_matvec.py # Adaptive GEMV example
│ ├── example_coarse_to_fine.py # Coarse-to-fine example
│ ├── analyze_rate_distortion_results.py # Rate-distortion analysis
│ ├── compare_quantizer_distortion.py # Quantizer comparison
│ ├── estimate_*.py # Estimation examples
│ ├── plot_*.py # Visualization examples
│ ├── demo_hnlq_*.py # HNLQ demonstrations
│ ├── d4_nested_lattice_manim.py # D4 lattice visualization
│ └── __init__.py
│
├── 📁 docs/ # Documentation and analysis
│ ├── 📁 gemv/ # GEMV documentation
│ ├── 📁 lattices/ # Lattice documentation
│ ├── 📁 quantizers/ # Quantizer documentation
│ ├── 📁 notebooks/ # Jupyter notebooks
│ └── 📁 papers/ # Research papers
│
├── requirements.txt # Basic dependencies
├── requirements-dev.txt # Development dependencies
├── setup.py # Package setup
└── README.md # This file
Quantization Layer (gemvq/quantizers/)
- nlq.py: Single-level nested lattice quantization for matrix compression
- hnlq.py: Multi-level hierarchical nested lattice quantization with progressive refinement
- utils.py: Lattice utilities supporting D4, A2, E8, Z2, Z3 lattices with optimized closest point algorithms
GEMV Processing Layer (gemvq/gemv/)
- columnwise/: Column-wise matrix-vector multiplication with multiple strategies
  - columnwise_matvec_processor.py: Main processor with coarse-to-fine decoding
  - columnwise_matvec_factory.py: Factory pattern for processor creation
  - simple_columnwise_matvec.py: Simplified implementation for basic use cases
- rowwise/: Row-wise processing alternative for specific matrix structures
- svd/: SVD-based decomposition for efficient processing of structured matrices
- utils/: Shared utilities including lookup tables and matrix padding
- adaptive_processor.py: Dynamic strategy selection based on data characteristics
Adaptive Layer (gemvq/adaptive/)
- adaptive_matvec.py: Core adaptive matrix-vector multiplication algorithms
- layer_wise_histogram_matvec.py: Histogram-based adaptive quantization
- demo_adaptive_matvec.py: Demonstrations of adaptive processing capabilities
Architecture Benefits:
- Modular Design: Each layer has clear responsibilities and interfaces
- Strategy Pattern: Multiple processing strategies can be easily swapped
- Factory Pattern: Clean object creation with configurable parameters
- Progressive Refinement: Support for coarse-to-fine decoding at multiple levels
- Extensible Framework: Easy to add new lattice types and processing strategies
- Column-wise: Process matrix column by column (default approach)
- Row-wise: Process matrix row by row (alternative approach)
- SVD-based: Use singular value decomposition for efficient processing
- Lookup Table: Pre-computed lookup tables for fast computation
- Adaptive: Dynamic strategy selection based on data characteristics
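The lookup-table strategy can be sketched independently of the library's API: with a finite codebook, every pairwise inner product is computed once up front, so dot products at multiply time reduce to table indexing. The names below (`codebook`, `table`) are illustrative, not library identifiers.

```python
import numpy as np

# Toy lookup-table inner products: precompute once, index at multiply time.
rng = np.random.default_rng(0)
codebook = rng.standard_normal((8, 4))   # 8 codewords of dimension 4
table = codebook @ codebook.T            # 8x8 table of all inner products

# At multiply time, a dot product between encoded vectors is one lookup:
i, j = 2, 5
assert np.isclose(table[i, j], codebook[i] @ codebook[j])
```

In the actual library the table is indexed by lattice codeword pairs, which is what makes quantized GEMV fast.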
- Categorized Tests: Organized by functionality (core, hierarchical, GEMV, etc.)
- Comprehensive Runner: run_all_tests.py with filtering and categorization
- Specialized Tests: Error analysis, distortion comparison, parameter testing
- Usage Examples: Practical demonstrations of library capabilities
- Analysis Tools: Rate-distortion analysis, quantizer comparison
- Visualization: Plotting and result analysis utilities
python -m tests.run_all_tests

# Core functionality
python -m tests.run_all_tests --category "Core Functionality"
# Coarse-to-fine decoding
python -m tests.run_all_tests --category "Coarse-to-Fine Decoding"
# Analysis and debugging
python -m tests.run_all_tests --category "Analysis & Debugging"

# Run specific test modules
python -m tests.test_nested_lattice_quantizer
python -m tests.test_columnwise_matvec_options
python -m tests.test_adaptive_matvec
python -m tests.test_layer_wise_histogram
python -m tests.test_d4_lattice_simulation
python -m tests.test_decoding_parameter
# Run uniform matrix tests
python -m tests.run_all_tests --test "uniform"
# Run coarse-to-fine tests
python -m tests.run_all_tests --test "coarse_to_fine"

python -m tests.run_all_tests --list

# Clean, intuitive imports from the main package
from gemvq import NLQ, HNLQ, get_d4, closest_point_Dn
# Import specific modules directly
from gemvq.gemv.columnwise import ColumnwiseMatVecProcessor
from gemvq.gemv.rowwise import RowwiseGEMVProcessor
from gemvq.gemv.svd import SVDGEMVProcessor
from gemvq.gemv.utils import BlockingStrategy, LookupTableProcessor
from gemvq.gemv.adaptive_processor import AdaptiveProcessor
# Import quantizers directly
from gemvq.quantizers import NLQ, HNLQ, get_d4, get_a2, get_e8
# Run modules directly
python -m gemvq.gemv.columnwise.columnwise_matvec_processor
python -m gemvq.quantizers.utils
python -m tests.test_nested_lattice_quantizer

- columnwise/: All column-wise implementations in one place
- rowwise/: All row-wise implementations in one place
- utils/: Shared utilities and helper functions
- demos/: Demonstration scripts and examples
- base/: Base classes and factory patterns
The library provides efficient matrix-vector multiplication using different lattice quantizers:
- Quantized Matrices: Matrices are quantized using lattice quantizers (nested or hierarchical)
- Multiple Processing Strategies: Column-wise, row-wise, SVD-based, and lookup table approaches
- Progressive Decoding: Can decode matrix-vector products at different quantization levels
- Latticequant Integration: Builds on the latticequant library for core quantization algorithms
- Nested Lattice Quantizer (NLQ): Single-level quantization for matrix compression
- Hierarchical Nested Lattice Quantizer (HNLQ): Multi-level quantization with M levels:
- Level 0: Coarsest approximation (MSB - Most Significant Bits)
- Level M-1: Finest detail (LSB - Least Significant Bits)
- Progressive Reconstruction: Can decode from any level 0 to M-1
- Cumulative Error: Error calculated for complete reconstruction using levels 0 to max_level
- Monotonic Reduction: Error should decrease as more levels are used
- Progressive Quality: Quality improves progressively with each additional level
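The coarse-to-fine behavior can be sketched with a toy scalar analogue: a base-q digit expansion on the integer lattice, where level 0 carries the most significant digit and each further level refines the reconstruction. This is only an illustration of the level structure; the library's HNLQ uses nested lattice codebooks, and `encode_hier`/`decode_coarse_to_fine` below are hypothetical helpers.

```python
import numpy as np

def encode_hier(x, q=4, M=3):
    """Toy M-level encoder: base-q digits from coarsest (MSB) to finest (LSB)."""
    digits, r = [], np.floor(np.asarray(x, float))
    for m in range(M):
        scale = q ** (M - 1 - m)
        d = np.floor(r / scale)     # digit at this level
        digits.append(d)
        r -= d * scale              # residual for the next, finer level
    return digits

def decode_coarse_to_fine(digits, q=4, max_level=0):
    """Reconstruct using levels 0..max_level only."""
    M = len(digits)
    x_hat = sum(digits[m] * q ** (M - 1 - m) for m in range(max_level + 1))
    return x_hat + 0.5 * q ** (M - 1 - max_level)  # midpoint of the finest used cell

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 1000) * 4 ** 3          # scale by q^M, as recommended
digits = encode_hier(x)
errors = [np.linalg.norm(decode_coarse_to_fine(digits, max_level=m) - x)
          for m in range(3)]
assert errors[0] > errors[1] > errors[2]      # monotonic error reduction
```

Each additional level shrinks the quantization cell by a factor of q, which is the mechanism behind the monotonic error reduction claimed above.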
- Dynamic Bit Allocation: Optimal bit rate allocation per column based on importance
- Layer-wise Processing: Adaptive quantization based on data distribution characteristics
- Sparsity Exploitation: Efficient handling of sparse vectors with known patterns
- Rate-Distortion Optimization: Adaptive quantization for optimal performance
For optimal performance, scale input data by q^M:
- Uniform Random Variables: Provide controlled, predictable behavior
- Scale Factor: q^M where q is quantization parameter, M is number of levels
- Better Alignment: Ensures proper alignment with quantization levels
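A minimal numpy illustration of why scaling helps, using plain integer rounding as a stand-in for lattice quantization (not the library's quantizer): unscaled data in [0, 1) straddles only two grid points, while data scaled by q^M spreads across many cells, so the relative error drops sharply.

```python
import numpy as np

rng = np.random.default_rng(0)
q, M = 4, 3
scale = q ** M                     # 64 for q=4, M=3

x = rng.uniform(0, 1, 1000)
# Relative error of rounding to the integer grid, with and without scaling:
err_raw = np.linalg.norm(np.round(x) - x) / np.linalg.norm(x)
xs = x * scale
err_scaled = np.linalg.norm(np.round(xs) - xs) / np.linalg.norm(xs)
assert err_scaled < err_raw        # scaling aligns data with the grid
```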
Comprehensive documentation is available in the docs/ folder:
- Column-wise Processing: Complete guide to column-wise matrix-vector multiplication
- Row-wise Processing: Alternative row-wise processing strategies
- SVD-based Processing: Singular value decomposition approaches
- Lookup Table Processing: Pre-computed table strategies
- D4 Lattice: 4-dimensional checkerboard lattice implementation
- A2 Lattice: 2-dimensional hexagonal lattice
- E8 Lattice: 8-dimensional Gosset lattice
- Z2/Z3 Lattices: Integer lattice implementations
- Nested Lattice Quantization: Single-level quantization guide
- Hierarchical Nested Lattice Quantization: Multi-level quantization guide
- Beta Selection: Parameter optimization for HNLQ
- Improvements: Recent enhancements and optimizations
- Mathematical Framework: Complete mathematical formulation of adaptive GEMV
- Column-wise Interpretation: Matrix-vector multiplication as linear combination of columns
- Adaptive Quantization Strategy: Dynamic bit rate allocation per column
- Sparsity Exploitation: Handling sparse vectors with known patterns
- Implementation Approach: Core components and architecture design
- Performance Characteristics: Computational complexity and memory requirements
- Applications: Neural networks, recommendation systems, signal processing
- Future Extensions: Multi-dimensional sparsity, dynamic adaptation, hardware acceleration
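The column-wise interpretation above is the identity A x = Σⱼ xⱼ A[:, j], which a few lines of numpy confirm:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
x = rng.standard_normal(3)

# GEMV as a linear combination of columns: A @ x == sum_j x[j] * A[:, j]
combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))
assert np.allclose(A @ x, combo)
```

This identity is what lets each column be quantized at its own bit rate: the product is assembled column by column.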
- Each column can have different target bit rates based on importance
- Dynamic rate allocation using energy-based or importance-based methods
- Rate-distortion optimization for overall performance
- Skip processing of zero vector elements for computational efficiency
- Pre-compute lookup tables only for active columns
- Memory and computation savings proportional to sparsity ratio
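A sketch of the sparsity idea (illustrative only; `sparse_columnwise_matvec` is a hypothetical helper, not a library function): columns whose vector entry is zero contribute nothing to the product and are simply skipped.

```python
import numpy as np

def sparse_columnwise_matvec(A, x):
    """Accumulate only the columns with nonzero vector entries."""
    y = np.zeros(A.shape[0])
    for j in np.flatnonzero(x):    # iterate over active columns only
        y += x[j] * A[:, j]
    return y

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 10))
x = np.zeros(10)
x[[1, 4, 7]] = rng.standard_normal(3)   # 70% of entries are zero
assert np.allclose(sparse_columnwise_matvec(A, x), A @ x)
```

With precomputed lookup tables, the same skip applies to table construction, giving the memory savings proportional to the sparsity ratio.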
- Multi-level quantization for better rate-distortion performance
- Successive refinement capability with coarse-to-fine decoding
- Efficient inner product estimation using precomputed tables
- Support for different lattice types (Dₙ, A₂, E₈, Zⁿ)
- Optimized closest point algorithms for each lattice type
- Configurable quantization parameters for different use cases
- Encoding: O(|S| × M × d) where |S| is sparsity, M is hierarchical levels, d is dimension
- Decoding: O(|S| × M × d) for selective column decoding
- MatVec: O(|S| × M²) using precomputed lookup tables
- Encoded Matrix: O(|S| × M × d × log₂(q)) bits
- Lookup Tables: O(Σᵢ q⁽ⁱ⁾^(2M)) entries
- Working Memory: O(d) for intermediate computations
- Adaptive Allocation: Better overall rate-distortion than uniform allocation
- Sparsity Gain: Additional compression proportional to sparsity ratio
- Hierarchical Benefits: Improved performance with increasing M
- Compress weight matrices with different importance levels
- Handle sparse activations efficiently
- Adaptive quantization based on layer sensitivity
- Compress user-item matrices with varying sparsity
- Handle cold-start scenarios with known zero patterns
- Adaptive bit allocation based on popularity
- Compress correlation matrices with known structure
- Handle sparse frequency domain representations
- Adaptive quantization based on signal characteristics
- Compress sparse matrices from finite element methods
- Handle structured sparsity in PDE discretizations
- Adaptive precision based on physical constraints
- Uniform Random Variables: 100% monotonic quantizer performance
- Scaled Matrices: 2.6x better monotonicity than unscaled
- Overall Success Rate: ~74% with proper scaling and uniform variables
- M=2: ~8x compression
- M=3: ~5.3x compression
- M=4: ~4x compression
- Follow the existing code structure
- Add tests for new functionality
- Update documentation
- Ensure all tests pass
- Use uniform random variables for controlled testing
This project is licensed under the MIT License - see the LICENSE file for details.
The MIT License is a permissive license that allows for:
- Commercial use
- Modification
- Distribution
- Private use
The license also limits the authors' liability. See the full license text in the LICENSE file.
If you use this library in your research, please cite:
@misc{gemv-q,
  title={A library for GEMV based on Lattice Quantizers},
  author={Soma S Dhavala}
}
This library is based on the following foundational works:
- I. Kaplan and O. Ordentlich, "High-Rate Nested-Lattice Quantized Matrix Multiplication with Small Lookup Tables", to be presented at ISIT 2025, arXiv:2505.13164, 2025.
- O. Ordentlich and Y. Polyanskiy, "Optimal quantization for matrix multiplication", arXiv preprint arXiv:2410.13780, 2024.
- J. Conway and N. Sloane, "Fast quantizing and decoding algorithms for lattice quantizers and codes", IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 227-232, March 1982, doi: 10.1109/TIT.1982.1056484.
LatticeQuant Repository: The core implementation by Iris Kaplan provides the foundational M-leveled nested-lattice quantizer and Voronoi code quantizer implementations. This repository implements the work described in the papers above and serves as the base for GEMV-Q's quantization capabilities.