A comprehensive library for matrix-vector multiplication (GEMV) built on lattice quantizers, including nested and hierarchical nested lattice quantizers. It provides efficient quantized matrix-vector operations with support for multiple processing strategies and adaptive quantization.
GEMV-Q provides efficient implementations of:
- Matrix-Vector Multiplication: Efficient GEMV operations using quantized matrices with different lattice quantizers
- Lattice Quantization: Quantizing vectors to points on lattices (D4, A2, E8, Z2, Z3) using latticequant library
- Nested Lattice Quantization: Single-level quantization for matrix compression
- Hierarchical Nested Lattice Quantization: Multi-level quantization for progressive refinement
- Coarse-to-Fine Decoding: Progressive reconstruction from coarse to fine levels
- Adaptive Quantization: Dynamic quantization based on data characteristics
- Multiple Processing Strategies: Column-wise, row-wise, SVD-based, and lookup table approaches
- Monotonic Error Reduction: Error decreases as more quantization levels are used
- Variable Depth Decoding: Support for decoding at different quantization levels
- Adaptive Processing: Dynamic strategy selection based on data characteristics
- Nested Lattice Quantizer (NLQ): Single-level quantization for matrix compression
- Hierarchical Nested Lattice Quantizer (HNLQ): Multi-level quantization for progressive refinement
- Multiple Lattice Types: Support for D4, A2, E8, Z2, Z3 lattices from latticequant library
- D4: 4-dimensional checkerboard lattice
- A2: 2-dimensional hexagonal lattice
- E8: 8-dimensional Gosset lattice
- Z2/Z3: Integer lattices
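The library's `closest_point_Dn` (from latticequant) performs nearest-point search on the D-family lattices. As an illustrative sketch of the underlying Conway–Sloane rounding rule (not the packaged code): round every coordinate to the nearest integer, and if the coordinate sum is odd, re-round the coordinate with the largest rounding error the other way so the result lands in Dₙ.

```python
import numpy as np

def closest_point_Dn(x):
    """Nearest point of the D_n lattice (integer vectors with even
    coordinate sum), via Conway & Sloane's rounding algorithm."""
    f = np.round(x)
    if int(f.sum()) % 2 == 0:
        return f
    # Parity is odd: re-round the worst coordinate the other way.
    i = int(np.argmax(np.abs(x - f)))
    f[i] += 1.0 if x[i] >= f[i] else -1.0
    return f

print(closest_point_Dn(np.array([0.6, 0.1, 0.1, 0.1])))  # [0. 0. 0. 0.]
```

Here naive rounding would give `[1, 0, 0, 0]` (odd sum, not in D4); the parity fix moves the worst coordinate, yielding the true nearest even-sum point.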
- Extensive test suite with categorized tests
- Uniform random variable testing for controlled analysis
- Scaled matrix testing for better hierarchical behavior
- Error type analysis (cumulative vs tile-specific)
- Performance benchmarking and validation
- Layer-wise Histogram Processing: Adaptive quantization based on data distribution
- Dynamic Bit Allocation: Optimal bit rate allocation per column
- Sparsity Exploitation: Efficient handling of sparse vectors
- Rate-Distortion Optimization: Adaptive quantization for optimal performance
# Clone the repository
git clone <repository-url>
cd gemvq
# Install the package in development mode
pip install -e .
# Install dependencies
pip install -r requirements.txt
# For development
pip install -r requirements-dev.txt
After installation, you can import the package directly:
# Clean imports from the installed package
from gemvq import NLQ, HNLQ, get_d4
from gemvq.gemv.columnwise import ColumnwiseMatVecProcessor

import numpy as np
from gemvq.gemv.columnwise import ColumnwiseMatVecProcessor
# Create a matrix and quantize it using D4 lattice
matrix = np.random.uniform(0, 1, (100, 50)) * 256 # Scale by q^M
processor = ColumnwiseMatVecProcessor(matrix, 'D4', M=3)
# Perform matrix-vector multiplication with coarse-to-fine decoding
vector = np.random.uniform(0, 1, 50) * 256
result = processor.multiply_coarse_to_fine(vector, max_level=0) # Use 1 level
print(f"Matrix-vector multiplication result shape: {result.shape}")

# Using nested lattice quantizer (single-level)
from gemvq import NLQ, get_d4, closest_point_Dn
G = get_d4()
nested_quantizer = NLQ(G=G, Q_nn=closest_point_Dn, q=4, beta=0.2, alpha=1/3, eps=1e-8)
# Using hierarchical nested lattice quantizer (multi-level)
from gemvq import HNLQ, closest_point_Dn
hierarchical_quantizer = HNLQ(
G=G, Q_nn=closest_point_Dn, q=4, beta=0.2,
alpha=1/3, eps=1e-8, dither=np.zeros(4), M=3
)
# Encode and decode with progressive refinement
x = np.random.uniform(0, 1, 4) * 64 # Scale by q^M
b_list, T = hierarchical_quantizer.encode(x, with_dither=False)
# Decode at different levels for progressive quality
for level in range(3):
    reconstruction = hierarchical_quantizer.decode_coarse_to_fine(b_list, T, with_dither=False, max_level=level)
    error = np.linalg.norm(reconstruction - x) / np.linalg.norm(x)
    print(f"Level {level}: Error = {error:.6f}")

from gemvq import get_d4
from gemvq.gemv.adaptive_processor import AdaptiveProcessor
# Create adaptive processor with D4 lattice
G = get_d4()
processor = AdaptiveProcessor(G=G, q=4, beta=0.2, M=3)
# Create matrix and vector
matrix = np.random.uniform(0, 1, (100, 50)) * 256
vector = np.random.uniform(0, 1, 50) * 256
# Perform adaptive matrix-vector multiplication
result = processor.adaptive_matvec(matrix, vector, target_rate=2.0)
print(f"Adaptive GEMV result shape: {result.shape}")

gemvq/
├── 📁 gemvq/ # Main package (installable)
│ ├── 📁 quantizers/ # Quantization core
│ │ ├── nlq.py # Single-level quantization
│ │ ├── hnlq.py # Multi-level quantization
│ │ ├── utils.py # Lattice utilities (D4, A2, E8, etc.)
│ │ └── __init__.py
│ │
│ ├── 📁 gemv/ # Matrix-vector multiplication
│ │ ├── 📁 columnwise/ # Column-wise GEMV implementations
│ │ │ ├── columnwise_matvec_processor.py # Main column-wise processor
│ │ │ ├── columnwise_matvec_factory.py # Factory functions
│ │ │ ├── columnwise_processor.py # Column-wise processing logic
│ │ │ ├── column_wise_gemv.py # Column-wise GEMV implementation
│ │ │ ├── simple_columnwise_matvec.py # Simplified column-wise processor
│ │ │ ├── standard_dot_processor.py # Standard dot product
│ │ │ └── __init__.py
│ │ │
│ │ ├── 📁 rowwise/ # Row-wise GEMV implementations
│ │ │ ├── rowwise_processor.py # Row-wise processing logic
│ │ │ ├── row_wise_gemv.py # Row-wise GEMV implementation
│ │ │ └── __init__.py
│ │ │
│ │ ├── 📁 svd/ # SVD-based GEMV
│ │ │ ├── svd_gemv_processor.py # SVD decomposition approach
│ │ │ └── __init__.py
│ │ │
│ │ ├── 📁 utils/ # GEMV utilities
│ │ │ ├── padder.py # Matrix padding utilities
│ │ │ ├── lookup_table_processor.py # Lookup table approach
│ │ │ └── __init__.py
│ │ │
│ │ ├── gemv_processor.py # Base processor interface
│ │ ├── gemv_factory.py # Factory for creating GEMV processors
│ │ ├── adaptive_processor.py # Adaptive processing
│ │ └── __init__.py
│ │
│ └── __init__.py # Main package initialization
│
├── 📁 tests/ # Comprehensive test suite
│ ├── run_all_tests.py # Main test runner with categories
│ ├── test_nested_lattice_quantizer.py # Core quantization tests
│ ├── test_hierarchical_*.py # Hierarchical quantization tests
│ ├── test_columnwise_matvec_options.py # Column-wise GEMV tests
│ ├── test_adaptive_matvec.py # Adaptive GEMV tests
│ ├── test_layer_wise_histogram.py # Layer-wise histogram tests
│ ├── test_d4_lattice_simulation.py # D4 lattice simulation tests
│ ├── test_decoding_parameter.py # Decoding parameter tests
│ └── __init__.py
│
├── 📁 examples/ # Usage examples and demonstrations
│ ├── demo_coarse_to_fine.py # Coarse-to-fine decoding demo
│ ├── example_adaptive_matvec.py # Adaptive GEMV example
│ ├── example_coarse_to_fine.py # Coarse-to-fine example
│ ├── analyze_rate_distortion_results.py # Rate-distortion analysis
│ ├── compare_quantizer_distortion.py # Quantizer comparison
│ ├── estimate_*.py # Estimation examples
│ ├── plot_*.py # Visualization examples
│ ├── demo_hnlq_*.py # HNLQ demonstrations
│ ├── d4_nested_lattice_manim.py # D4 lattice visualization
│ └── __init__.py
│
├── 📁 docs/ # Documentation and analysis
│ ├── 📁 gemv/ # GEMV documentation
│ ├── 📁 lattices/ # Lattice documentation
│ ├── 📁 quantizers/ # Quantizer documentation
│ ├── 📁 notebooks/ # Jupyter notebooks
│ └── 📁 papers/ # Research papers
│
├── requirements.txt # Basic dependencies
├── requirements-dev.txt # Development dependencies
├── setup.py # Package setup
└── README.md # This file
Quantization Layer (gemvq/quantizers/)
- nlq.py: Single-level nested lattice quantization for matrix compression
- hnlq.py: Multi-level hierarchical nested lattice quantization with progressive refinement
- utils.py: Lattice utilities supporting D4, A2, E8, Z2, Z3 lattices with optimized closest point algorithms
GEMV Processing Layer (gemvq/gemv/)
- columnwise/: Column-wise matrix-vector multiplication with multiple strategies
  - columnwise_matvec_processor.py: Main processor with coarse-to-fine decoding
  - columnwise_matvec_factory.py: Factory pattern for processor creation
  - simple_columnwise_matvec.py: Simplified implementation for basic use cases
- rowwise/: Row-wise processing alternative for specific matrix structures
- svd/: SVD-based decomposition for efficient processing of structured matrices
- utils/: Shared utilities including lookup tables and matrix padding
- adaptive_processor.py: Dynamic strategy selection based on data characteristics
Adaptive Layer (gemvq/adaptive/)
- adaptive_matvec.py: Core adaptive matrix-vector multiplication algorithms
- layer_wise_histogram_matvec.py: Histogram-based adaptive quantization
- demo_adaptive_matvec.py: Demonstrations of adaptive processing capabilities
Architecture Benefits:
- Modular Design: Each layer has clear responsibilities and interfaces
- Strategy Pattern: Multiple processing strategies can be easily swapped
- Factory Pattern: Clean object creation with configurable parameters
- Progressive Refinement: Support for coarse-to-fine decoding at multiple levels
- Extensible Framework: Easy to add new lattice types and processing strategies
- Column-wise: Process matrix column by column (default approach)
- Row-wise: Process matrix row by row (alternative approach)
- SVD-based: Use singular value decomposition for efficient processing
- Lookup Table: Pre-computed lookup tables for fast computation
- Adaptive: Dynamic strategy selection based on data characteristics
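The lookup-table strategy can be sketched independently of the library's API: with a finite codebook, every pairwise inner product is computed once up front, so dot products at multiply time reduce to table indexing. The names below (`codebook`, `table`) are illustrative, not library identifiers.

```python
import numpy as np

# Toy lookup-table inner products: precompute once, index at multiply time.
rng = np.random.default_rng(0)
codebook = rng.standard_normal((8, 4))   # 8 codewords of dimension 4
table = codebook @ codebook.T            # 8x8 table of all inner products

# At multiply time, a dot product between encoded vectors is one lookup:
i, j = 2, 5
assert np.isclose(table[i, j], codebook[i] @ codebook[j])
```

In the actual library the table is indexed by lattice codeword pairs, which is what makes quantized GEMV fast.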
- Categorized Tests: Organized by functionality (core, hierarchical, GEMV, etc.)
- Comprehensive Runner: run_all_tests.py with filtering and categorization
- Specialized Tests: Error analysis, distortion comparison, parameter testing
- Usage Examples: Practical demonstrations of library capabilities
- Analysis Tools: Rate-distortion analysis, quantizer comparison
- Visualization: Plotting and result analysis utilities
python -m tests.run_all_tests

# Core functionality
python -m tests.run_all_tests --category "Core Functionality"
# Coarse-to-fine decoding
python -m tests.run_all_tests --category "Coarse-to-Fine Decoding"
# Analysis and debugging
python -m tests.run_all_tests --category "Analysis & Debugging"

# Run specific test modules
python -m tests.test_nested_lattice_quantizer
python -m tests.test_columnwise_matvec_options
python -m tests.test_adaptive_matvec
python -m tests.test_layer_wise_histogram
python -m tests.test_d4_lattice_simulation
python -m tests.test_decoding_parameter
# Run uniform matrix tests
python -m tests.run_all_tests --test "uniform"
# Run coarse-to-fine tests
python -m tests.run_all_tests --test "coarse_to_fine"

python -m tests.run_all_tests --list

# Clean, intuitive imports from the main package
from gemvq import NLQ, HNLQ, get_d4, closest_point_Dn
# Import specific modules directly
from gemvq.gemv.columnwise import ColumnwiseMatVecProcessor
from gemvq.gemv.rowwise import RowwiseGEMVProcessor
from gemvq.gemv.svd import SVDGEMVProcessor
from gemvq.gemv.utils import BlockingStrategy, LookupTableProcessor
from gemvq.gemv.adaptive_processor import AdaptiveProcessor
# Import quantizers directly
from gemvq.quantizers import NLQ, HNLQ, get_d4, get_a2, get_e8
# Run modules directly
python -m gemvq.gemv.columnwise.columnwise_matvec_processor
python -m gemvq.quantizers.utils
python -m tests.test_nested_lattice_quantizer

- columnwise/: All column-wise implementations in one place
- rowwise/: All row-wise implementations in one place
- utils/: Shared utilities and helper functions
- demos/: Demonstration scripts and examples
- base/: Base classes and factory patterns
The library provides efficient matrix-vector multiplication using different lattice quantizers:
- Quantized Matrices: Matrices are quantized using lattice quantizers (nested or hierarchical)
- Multiple Processing Strategies: Column-wise, row-wise, SVD-based, and lookup table approaches
- Progressive Decoding: Can decode matrix-vector products at different quantization levels
- Latticequant Integration: Builds on the latticequant library for core quantization algorithms
- Nested Lattice Quantizer (NLQ): Single-level quantization for matrix compression
- Hierarchical Nested Lattice Quantizer (HNLQ): Multi-level quantization with M levels:
- Level 0: Coarsest approximation (MSB - Most Significant Bits)
- Level M-1: Finest detail (LSB - Least Significant Bits)
- Progressive Reconstruction: Can decode from any level 0 to M-1
- Cumulative Error: Error calculated for complete reconstruction using levels 0 to max_level
- Monotonic Reduction: Error should decrease as more levels are used
- Progressive Quality: Quality improves progressively with each additional level
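The coarse-to-fine behavior can be sketched with a toy scalar analogue: a base-q digit expansion on the integer lattice, where level 0 carries the most significant digit and each further level refines the reconstruction. This is only an illustration of the level structure; the library's HNLQ uses nested lattice codebooks, and `encode_hier`/`decode_coarse_to_fine` below are hypothetical helpers.

```python
import numpy as np

def encode_hier(x, q=4, M=3):
    """Toy M-level encoder: base-q digits from coarsest (MSB) to finest (LSB)."""
    digits, r = [], np.floor(np.asarray(x, float))
    for m in range(M):
        scale = q ** (M - 1 - m)
        d = np.floor(r / scale)     # digit at this level
        digits.append(d)
        r -= d * scale              # residual for the next, finer level
    return digits

def decode_coarse_to_fine(digits, q=4, max_level=0):
    """Reconstruct using levels 0..max_level only."""
    M = len(digits)
    x_hat = sum(digits[m] * q ** (M - 1 - m) for m in range(max_level + 1))
    return x_hat + 0.5 * q ** (M - 1 - max_level)  # midpoint of the finest used cell

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 1000) * 4 ** 3          # scale by q^M, as recommended
digits = encode_hier(x)
errors = [np.linalg.norm(decode_coarse_to_fine(digits, max_level=m) - x)
          for m in range(3)]
assert errors[0] > errors[1] > errors[2]      # monotonic error reduction
```

Each additional level shrinks the quantization cell by a factor of q, which is the mechanism behind the monotonic error reduction claimed above.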
- Dynamic Bit Allocation: Optimal bit rate allocation per column based on importance
- Layer-wise Processing: Adaptive quantization based on data distribution characteristics
- Sparsity Exploitation: Efficient handling of sparse vectors with known patterns
- Rate-Distortion Optimization: Adaptive quantization for optimal performance
For optimal performance, scale input data by q^M:
- Uniform Random Variables: Provide controlled, predictable behavior
- Scale Factor: q^M where q is quantization parameter, M is number of levels
- Better Alignment: Ensures proper alignment with quantization levels
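A minimal numpy illustration of why scaling helps, using plain integer rounding as a stand-in for lattice quantization (not the library's quantizer): unscaled data in [0, 1) straddles only two grid points, while data scaled by q^M spreads across many cells, so the relative error drops sharply.

```python
import numpy as np

rng = np.random.default_rng(0)
q, M = 4, 3
scale = q ** M                     # 64 for q=4, M=3

x = rng.uniform(0, 1, 1000)
# Relative error of rounding to the integer grid, with and without scaling:
err_raw = np.linalg.norm(np.round(x) - x) / np.linalg.norm(x)
xs = x * scale
err_scaled = np.linalg.norm(np.round(xs) - xs) / np.linalg.norm(xs)
assert err_scaled < err_raw        # scaling aligns data with the grid
```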
Comprehensive documentation is available in the docs/ folder:
- Column-wise Processing: Complete guide to column-wise matrix-vector multiplication
- Row-wise Processing: Alternative row-wise processing strategies
- SVD-based Processing: Singular value decomposition approaches
- Lookup Table Processing: Pre-computed table strategies
- D4 Lattice: 4-dimensional checkerboard lattice implementation
- A2 Lattice: 2-dimensional hexagonal lattice
- E8 Lattice: 8-dimensional Gosset lattice
- Z2/Z3 Lattices: Integer lattice implementations
- Nested Lattice Quantization: Single-level quantization guide
- Hierarchical Nested Lattice Quantization: Multi-level quantization guide
- Beta Selection: Parameter optimization for HNLQ
- Improvements: Recent enhancements and optimizations
- Mathematical Framework: Complete mathematical formulation of adaptive GEMV
- Column-wise Interpretation: Matrix-vector multiplication as linear combination of columns
- Adaptive Quantization Strategy: Dynamic bit rate allocation per column
- Sparsity Exploitation: Handling sparse vectors with known patterns
- Implementation Approach: Core components and architecture design
- Performance Characteristics: Computational complexity and memory requirements
- Applications: Neural networks, recommendation systems, signal processing
- Future Extensions: Multi-dimensional sparsity, dynamic adaptation, hardware acceleration
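The column-wise interpretation above is the identity A x = Σⱼ xⱼ A[:, j], which a few lines of numpy confirm:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
x = rng.standard_normal(3)

# GEMV as a linear combination of columns: A @ x == sum_j x[j] * A[:, j]
combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))
assert np.allclose(A @ x, combo)
```

This identity is what lets each column be quantized at its own bit rate: the product is assembled column by column.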
- Each column can have different target bit rates based on importance
- Dynamic rate allocation using energy-based or importance-based methods
- Rate-distortion optimization for overall performance
- Skip processing of zero vector elements for computational efficiency
- Pre-compute lookup tables only for active columns
- Memory and computation savings proportional to sparsity ratio
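A sketch of the sparsity idea (illustrative only; `sparse_columnwise_matvec` is a hypothetical helper, not a library function): columns whose vector entry is zero contribute nothing to the product and are simply skipped.

```python
import numpy as np

def sparse_columnwise_matvec(A, x):
    """Accumulate only the columns with nonzero vector entries."""
    y = np.zeros(A.shape[0])
    for j in np.flatnonzero(x):    # iterate over active columns only
        y += x[j] * A[:, j]
    return y

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 10))
x = np.zeros(10)
x[[1, 4, 7]] = rng.standard_normal(3)   # 70% of entries are zero
assert np.allclose(sparse_columnwise_matvec(A, x), A @ x)
```

With precomputed lookup tables, the same skip applies to table construction, giving the memory savings proportional to the sparsity ratio.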
- Multi-level quantization for better rate-distortion performance
- Successive refinement capability with coarse-to-fine decoding
- Efficient inner product estimation using precomputed tables
- Support for different lattice types (Dₙ, A₂, E₈, Zⁿ)
- Optimized closest point algorithms for each lattice type
- Configurable quantization parameters for different use cases
- Encoding: O(|S| × M × d) where |S| is sparsity, M is hierarchical levels, d is dimension
- Decoding: O(|S| × M × d) for selective column decoding
- MatVec: O(|S| × M²) using precomputed lookup tables
- Encoded Matrix: O(|S| × M × d × log₂(q)) bits
- Lookup Tables: O(Σᵢ q⁽ⁱ⁾^(2M)) entries
- Working Memory: O(d) for intermediate computations
- Adaptive Allocation: Better overall rate-distortion than uniform allocation
- Sparsity Gain: Additional compression proportional to sparsity ratio
- Hierarchical Benefits: Improved performance with increasing M
- Compress weight matrices with different importance levels
- Handle sparse activations efficiently
- Adaptive quantization based on layer sensitivity
- Compress user-item matrices with varying sparsity
- Handle cold-start scenarios with known zero patterns
- Adaptive bit allocation based on popularity
- Compress correlation matrices with known structure
- Handle sparse frequency domain representations
- Adaptive quantization based on signal characteristics
- Compress sparse matrices from finite element methods
- Handle structured sparsity in PDE discretizations
- Adaptive precision based on physical constraints
- Uniform Random Variables: 100% monotonic quantizer performance
- Scaled Matrices: 2.6x better monotonicity than unscaled
- Overall Success Rate: ~74% with proper scaling and uniform variables
- M=2: ~8x compression
- M=3: ~5.3x compression
- M=4: ~4x compression
- Follow the existing code structure
- Add tests for new functionality
- Update documentation
- Ensure all tests pass
- Use uniform random variables for controlled testing
This project is licensed under the MIT License - see the LICENSE file for details.
The MIT License is a permissive license that allows for:
- Commercial use
- Modification
- Distribution
- Private use
The license also limits the authors' liability. See the full license text in the LICENSE file.
If you use this library in your research, please cite:
@misc{gemv-q,
  title={A library for GEMV based on Lattice Quantizers},
  author={Soma S Dhavala}
}
This library is based on the following foundational works:
- I. Kaplan and O. Ordentlich, "High-Rate Nested-Lattice Quantized Matrix Multiplication with Small Lookup Tables", to be presented at ISIT 2025, arXiv:2505.13164, 2025.
- O. Ordentlich and Y. Polyanskiy, "Optimal quantization for matrix multiplication", arXiv preprint arXiv:2410.13780, 2024.
- J. Conway and N. Sloane, "Fast quantizing and decoding algorithms for lattice quantizers and codes", IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 227-232, March 1982, doi: 10.1109/TIT.1982.1056484.
LatticeQuant Repository: The core implementation by Iris Kaplan provides the foundational M-leveled nested-lattice quantizer and Voronoi code quantizer implementations. This repository implements the work described in the papers above and serves as the base for GEMV-Q's quantization capabilities.