HypeLoRA: Hypernetwork-Generated LoRA Adapters for Calibrated Language Model Fine-Tuning

Bartosz Trojan & Filip Gębala — Upper-Secondary Schools of Communications in Cracow

Abstract

Modern Transformer-based models frequently suffer from miscalibration, producing overconfident predictions that do not reflect true empirical frequencies. This work investigates the calibration dynamics of LoRA (Low-Rank Adaptation) and a novel hypernetwork-based adaptation framework as parameter-efficient alternatives to full fine-tuning for RoBERTa. Evaluating across the GLUE benchmark, we demonstrate that LoRA-based adaptation consistently matches, and on specific tasks exceeds, the calibration of full fine-tuning while training far fewer parameters. We further explore a dynamic approach in which a shared hypernetwork generates the LoRA factors (A and B) to induce structural coupling across layers. Our results reveal a critical trade-off: constraining the adaptation space (e.g., freezing matrix A) acts as a regularizer that lowers Expected Calibration Error (ECE), but at the cost of reduced downstream task accuracy.

Overview

Standard LoRA adapters are static — once trained, each layer uses a fixed pair of low-rank matrices A and B. This project replaces the per-layer static B matrix (and optionally A) with the output of a compact shared hypernetwork (LoRAHyperNet). The hypernetwork conditions on a learned embedding for each transformer layer and generates coordinated adapter weights across all layers in a single forward pass, while all backbone parameters remain frozen.
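The idea described above can be illustrated with a minimal PyTorch sketch. It is not the repository's implementation: the class names `TinyLoRAHyperNet` and `DynamicLoRALinear`, the two-layer MLP, and all sizes are assumptions chosen for brevity. It shows a shared network turning one learned embedding per layer into that layer's B matrix, with A fixed and the backbone weights frozen.

```python
import torch
import torch.nn as nn


class TinyLoRAHyperNet(nn.Module):
    """Hypothetical stand-in for a LoRA hypernetwork: layer embedding -> B matrix."""

    def __init__(self, num_layers=12, emb_dim=128, hidden_dim=256,
                 out_features=768, rank=8):
        super().__init__()
        self.out_features, self.rank = out_features, rank
        self.layer_emb = nn.Embedding(num_layers, emb_dim)  # one learned embedding per layer
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, out_features * rank),
        )

    def forward(self):
        # Generate B for every layer in one pass: (num_layers, out_features, rank).
        flat = self.mlp(self.layer_emb.weight)
        return flat.view(-1, self.out_features, self.rank)


class DynamicLoRALinear(nn.Module):
    """Frozen linear layer plus a LoRA update whose B factor is passed in at call time."""

    def __init__(self, base: nn.Linear, rank=8, scaling=1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # backbone weights stay frozen
        A = torch.empty(rank, base.in_features)
        nn.init.kaiming_uniform_(A, a=5 ** 0.5)
        self.register_buffer("A", A)  # fixed-A variant: A is a buffer, never trained
        self.scaling = scaling

    def forward(self, x, B):
        # y = W x + scaling * B (A x), with B of shape (out_features, rank)
        return self.base(x) + self.scaling * (x @ self.A.T) @ B.T


# Usage: two toy layers share one hypernetwork.
hypernet = TinyLoRAHyperNet(num_layers=2, emb_dim=8, hidden_dim=32,
                            out_features=16, rank=4)
layers = [DynamicLoRALinear(nn.Linear(16, 16), rank=4) for _ in range(2)]
h = torch.randn(3, 16)
for layer, B in zip(layers, hypernet()):
    h = layer(h, B)
```

In this sketch the only trainable parameters are those of the hypernetwork, so gradients from every adapted layer flow into the same shared weights. Unlike standard LoRA, where B starts at zero, the sketch does nothing to control the hypernetwork's initial output.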

Key ideas

Dynamic weight generation — adapter weights are produced on every forward pass by the hypernetwork, enabling parameter sharing across layers and reducing total adapter parameter count.
Two hypernetwork architectures — a 4-layer MLP (hidden size 2048, GELU) or a 2-layer Transformer encoder (256-dim, 16 heads), both conditioning on 128-dim learned layer embeddings.
Fixed-A vs. generated-A variants — matrix A can be frozen (Kaiming uniform init) while only B is generated, which acts as a regularizer improving calibration at some cost to task performance.
Noise blending — generated matrices can be mixed with the initial random matrices via add / multiply / replace modes; the blending coefficient noise_alpha is linearly annealed to 0 during training.
Calibration-aware evaluation — every evaluation step computes ECE, Classwise-ECE, MCE, ACE, Thresholded ACE, and Brier Score alongside the GLUE task metric.
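The noise-blending schedule mentioned above can be sketched in plain Python. The exact blending formulas are not given here, so everything below is an assumption: `annealed_alpha` decays the coefficient linearly per training step, and `blend_add` illustrates one plausible reading of the add mode (generated value plus `alpha` times the initial random value). Both function names are hypothetical.

```python
def annealed_alpha(initial_alpha: float, step: int, total_steps: int) -> float:
    """Linearly decay the blending weight from initial_alpha (step 0) to 0 (total_steps)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return initial_alpha * (1.0 - progress)


def blend_add(generated, initial, alpha):
    """Assumed 'add' mode: generated + alpha * initial, element by element."""
    return [g + alpha * n for g, n in zip(generated, initial)]


# Usage: halfway through training the initial matrix contributes half as much.
alpha = annealed_alpha(1.0, step=5, total_steps=10)
mixed = blend_add([1.0, 2.0], [10.0, 10.0], alpha)
```

Under this reading, the blend reduces to the purely generated values once `alpha` reaches 0 at the end of training.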

Key Findings

LoRA ≈ Full Fine-Tuning in calibration — LoRA provides calibration comparable to full fine-tuning across most GLUE tasks while being significantly more parameter-efficient.
Hypernetwork does not universally improve calibration — fully generating both A and B via the hypernetwork yields metrics broadly similar to standard LoRA, suggesting that structural coupling across layers alone does not produce systematic confidence correction.
Fixing matrix A regularizes the model — freezing A while generating only B introduces a structured perturbation that modestly lowers ECE. The trade-off is a consistent drop in task performance (MCC on CoLA, accuracy on SST-2).
Extended training degrades calibration — across all methods, longer training progressively overfits the distribution and erodes uncertainty estimates.

Project Structure

.
├── run_experiment.py               # Main training entry point
├── calibration_metrics.py          # ECE, CECE, MCE, ACE, TACE, Brier Score
├── requirements.txt
│
├── models/
│   ├── hypernet.py                 # LoRAHyperNet (MLP) & LoRAHyperNetTransformer
│   ├── dynamic_lora_layer.py       # DynamicLoRALayer – applies hypernet output as LoRA
│   └── get_roberta.py              # Model builders (baseline & hypernet variants)
│
├── data_loading/
│   └── get_datasets.py             # GLUE dataset loading & tokenization
│
├── utils/
│   ├── alpha_callback.py           # Linearly anneals noise_alpha during training
│   ├── batch_generation_trainer.py # Custom Trainer: pre-generates B matrices per batch
│   ├── forward_pass_repetition_data_collator.py  # Gradient accumulation via repeated passes
│   ├── lr_scheduler_callback.py    # LR schedule utilities
│   ├── metrics.py                  # B-matrix statistics (mean / std across layers)
│   ├── metrics_trainer_callback.py # Saves per-epoch metrics to CSV
│   └── one_hot_encoding.py         # One-hot encoder (alternative to learned embeddings)
│
├── params/
│   ├── example_config_hypernet.py  # Template config — hypernet mode
│   ├── example_config_no_hypernet.py # Template config — LoRA / FT baseline
│   ├── ft_baselines/               # Full fine-tuning configs per GLUE task
│   ├── lora_baselines/             # LoRA baseline configs per GLUE task
│   ├── hypernet_mlp/               # MLP hypernet experiments (fixed_A / gen_A)
│   ├── transformer/                # Transformer hypernet experiments
│   └── roberta_large_baselines/    # RoBERTa-large FT & LoRA configs
│
├── pretrained_models/              # Saved checkpoints
└── results/                        # CSV metric logs per run

Installation

python -m venv venv
# Windows
venv\Scripts\activate
# Linux / macOS
source venv/bin/activate

pip install -r requirements.txt

Requirements: Python ≥ 3.9, PyTorch, Transformers, PEFT, Datasets, WandB, scikit-learn, accelerate.

Running Experiments

All experiments are driven by a single entry point that takes a Python config file:

python run_experiment.py --params <path/to/config.py>

The script loads the dataset, builds the model, runs num_runs independent seeds, logs to WandB, and saves metrics to results/.
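How `run_experiment.py` reads the file passed to `--params` is not shown here. One common way to load a Python config file is through `importlib`; the sketch below assumes the config defines a module-level `params` dict, and the helper name `load_params` is hypothetical.

```python
import importlib.util
from pathlib import Path


def load_params(config_path):
    """Import a Python config file and return its module-level `params` dict."""
    path = Path(config_path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load config from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the config file as ordinary Python
    params = getattr(module, "params", None)
    if not isinstance(params, dict):
        raise ValueError(f"{path} does not define a dict named 'params'")
    return params
```

A call such as `load_params("params/lora_baselines/cola.py")` would then return the dict for that run. Because the file is executed as code, only configs from trusted sources should be loaded this way.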

Baselines

# Full fine-tuning
python run_experiment.py --params params/ft_baselines/cola.py

# LoRA baseline
python run_experiment.py --params params/lora_baselines/cola.py

Hypernetwork experiments

# MLP hypernet, fixed A matrix
python run_experiment.py --params params/hypernet_mlp/fixed_A/cola.py

# MLP hypernet, generated A matrix
python run_experiment.py --params params/hypernet_mlp/gen_A/cola.py

# Transformer hypernet
# Transformer hypernet, fixed A matrix
python run_experiment.py --params params/transformer/fixed_A/cola.py

Configuration Reference

Each config file is a Python module that assigns a plain dict to a variable named params. Key parameters:

| Parameter | Description |
| --- | --- |
| `glue_dataset_name` | GLUE task: `cola`, `sst2`, `mrpc`, `qqp`, `mnli`, `qnli`, `rte`, `stsb` |
| `model_name` | HuggingFace model ID or local checkpoint path |
| `use_hypernet` | `True` to use dynamic LoRA via hypernetwork; `False` for baseline |
| `use_peft` | Wrap model with PEFT LoRA config |
| `lora_r` | LoRA rank (default: 8) |
| `lora_alpha` | LoRA scaling factor |
| `layers_to_transform` | Encoder layers to apply LoRA to (default: all 12) |
| `layers_to_use_hypernet` | Subset of layers whose LoRA weights are generated by the hypernet |
| `hypernet_use_transformer` | `True` for Transformer hypernet; `False` for MLP |
| `hypernet_transformer_nhead` | Number of attention heads (Transformer hypernet) |
| `hypernet_transformer_num_layers` | Number of Transformer layers in hypernet |
| `hypernet_hidden_dim` | Hidden dimension of the MLP hypernet |
| `hypernet_embeddings_dim` | Dimension of the learned layer embedding (default: 128) |
| `hypernet_A_matrix` | How matrix A is handled: `"random"`, `"fixed"`, or `"generated"` |
| `hypernet_noise_type_A/B` | Blending mode for A/B: `"replace"`, `"add"`, `"multiply"` |
| `hypernet_noise_alpha` | Initial blending weight; annealed to 0 when `hypernet_reduce_noise_alpha=True` |
| `hypernet_large_model` | `True` for 4-layer MLP; `False` for 2-layer |
| `hypernet_use_batches` | Pre-generate B matrices once per batch |
| `forward_pass_reps` | Repeat forward pass N times per batch |
| `num_runs` | Number of independent runs (different seeds) |
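As an illustration, a config for an MLP hypernetwork run with a fixed A matrix might look like the sketch below. The keys come from the table above; the values are illustrative assumptions rather than the settings used in the reported experiments (in particular `roberta-base`, `lora_alpha = 16`, and `num_runs = 3`), and the dict is not guaranteed to contain every key the script expects.

```python
# Hypothetical example config: keys follow the parameter table, values are illustrative.
params = {
    "glue_dataset_name": "cola",       # GLUE task to train on
    "model_name": "roberta-base",      # assumed backbone checkpoint
    "use_hypernet": True,              # generate LoRA weights with the hypernetwork
    "lora_r": 8,                       # LoRA rank (documented default)
    "lora_alpha": 16,                  # assumed scaling factor
    "hypernet_use_transformer": False, # use the MLP hypernetwork
    "hypernet_large_model": True,      # 4-layer MLP variant
    "hypernet_hidden_dim": 2048,       # MLP hidden size
    "hypernet_embeddings_dim": 128,    # learned layer-embedding size (documented default)
    "hypernet_A_matrix": "fixed",      # freeze A, generate only B
    "hypernet_noise_alpha": 1.0,       # assumed initial blending weight
    "hypernet_reduce_noise_alpha": True,  # anneal the blending weight to 0
    "num_runs": 3,                     # assumed number of seeds
}
```

The template files in `params/` remain the authoritative starting point for a real run.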

Calibration Metrics

After each evaluation step the following metrics are computed and logged to WandB and the results CSV:

| Metric | Formula / Description |
| --- | --- |
| ECE | Weighted mean of per-bin \|accuracy − confidence\| gaps |
| Classwise ECE (CECE) | ECE computed per class, averaged over all classes |
| MCE | Maximum per-bin calibration error (worst-case bin) |
| ACE | ECE with equal-population bins, averaged per class |
| TACE | ACE restricted to predictions above a confidence threshold ε ∈ {0.01, 0.001, 0.0001} |
| Brier Score | Mean squared error between predicted probability vector and one-hot label |
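To make three of these definitions concrete, here is a minimal pure-Python sketch of ECE, MCE, and the Brier Score. It is not the code in `calibration_metrics.py`: the function names, the choice of 10 equal-width bins, and the convention of summing squared errors over classes before averaging over samples are all assumptions.

```python
def ece_mce(confidences, correct, n_bins=10):
    """Return (ECE, MCE) using equal-width confidence bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 falls in the last bin
        bins[idx].append((conf, hit))
    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        gap = abs(accuracy - avg_conf)
        ece += len(members) / n * gap  # weight each gap by the bin's share of samples
        mce = max(mce, gap)            # track the worst bin
    return ece, mce


def brier_score(probs, labels):
    """Mean over samples of the squared distance between probabilities and one-hot labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((p_k - (1.0 if k == y else 0.0)) ** 2 for k, p_k in enumerate(p))
    return total / len(probs)
```

Here `confidences` holds the maximum predicted probability per sample, `correct` holds 1 or 0 depending on whether the prediction was right, and `probs` holds the full probability vector per sample. Some definitions of the Brier Score also divide by the number of classes, which rescales the value without changing the ranking of models.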

Experiment Tracking

All runs are logged to Weights & Biases. Each run is tagged with its mode (hypernet or baseline), the GLUE dataset name, and the run index.

Metrics are also saved locally as CSV files in results/ for offline analysis.

Supported GLUE Tasks

| Task | Metric |
| --- | --- |
| CoLA | Matthews Correlation (MCC) |
| SST-2 | Accuracy |
| MRPC | F1 |
| QQP | F1 / Accuracy |
| MNLI | Accuracy |
| QNLI | Accuracy |
| RTE | Accuracy |
| STS-B | Pearson / Spearman correlation |
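Since MCC is less familiar than accuracy or F1, the following sketch shows how it can be computed for a binary task such as CoLA. It is a standalone illustration with a hypothetical function name, not the evaluation code used in this project; it returns 0.0 when the denominator is zero, which is an assumed convention for the degenerate case.

```python
import math


def matthews_corrcoef_binary(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # degenerate case, e.g. only one class is ever predicted
    return (tp * tn - fp * fn) / denom
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), which makes it more informative than accuracy on a class-imbalanced task.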

Citation

If you use this code, please cite:

@inproceedings{trojan2026hypelora,
  title     = {HypeLoRA: Hypernetwork-Generated LoRA Adapters for Calibrated Language Model Fine-Tuning},
  author    = {Trojan, Bartosz and Gębala, Filip},
  booktitle = {Proceedings of LNCS},
  year      = {2026}
}

Acknowledgements

The authors thank Dr. Kamil Książek, Dr. Tomasz Kuśmierczyk, and Prof. Jacek Tabor of the Jagiellonian University for their guidance and support throughout this work.
