(formerly known as ScandEval)
- Dan Saattrup Smart (@saattrupdan, dan.smart@alexandra.dk)
See the documentation for more information.
EuroEval exposes a comprehensive set of finetuning hyperparameters that can be customised via the CLI or programmatically. All parameters have sensible defaults based on common best practices.
| Parameter | Default | Description |
|---|---|---|
| `--learning-rate` | `2e-5` | Maximum learning rate for training |
| `--warmup-ratio` | `0.01` | Fraction of training steps used for learning rate warmup |
| `--finetuning-batch-size` | `32` | Training batch size per device |
| `--max-steps` | `10000` | Maximum number of training steps |
| Parameter | Default | Description |
|---|---|---|
| `--eval-steps` | `30` | How often to evaluate the model during training |
| `--save-steps` | `30` | How often to save model checkpoints |
| `--save-total-limit` | `1` | Maximum number of checkpoints to keep |
| `--logging-steps` | `30` | How often to log training metrics |
| Parameter | Default | Description |
|---|---|---|
| `--optimizer-name` | `adamw_torch` | Optimizer to use (`adamw_torch`, `adamw_8bit`, `sgd`, etc.) |
| `--weight-decay` | `0.0` | L2 regularisation coefficient |
| `--lr-scheduler-type` | `linear` | Learning rate scheduler type (`linear`, `cosine`, `polynomial`, etc.) |
| Parameter | Default | Description |
|---|---|---|
| `--gradient-accumulation-base` | `32` | Base value for computing gradient accumulation steps (final: base ÷ batch size) |
| `--eval-accumulation-steps` | `32` | Accumulation steps used during evaluation |
| `--early-stopping-patience` | `5` | Number of evaluation steps with no improvement before stopping |
| `--per-device-eval-batch-size` | `None` | Separate evaluation batch size (if `None`, the training batch size is used) |
```bash
# Default hyperparameters
euroeval-benchmark --model bert-base-uncased --language en

# Custom learning rate and batch size
euroeval-benchmark --model bert-base-uncased --learning-rate 5e-5 --finetuning-batch-size 16

# Aggressive early stopping with custom optimizer
euroeval-benchmark --model bert-base-uncased \
    --early-stopping-patience 3 \
    --optimizer-name adamw_8bit \
    --weight-decay 0.01

# Cosine annealing with custom evaluation frequency
euroeval-benchmark --model bert-base-uncased \
    --lr-scheduler-type cosine \
    --eval-steps 50 \
    --logging-steps 50

# Different eval batch size for memory efficiency
euroeval-benchmark --model bert-base-uncased \
    --finetuning-batch-size 32 \
    --per-device-eval-batch-size 64
```

The same hyperparameters can also be set programmatically:

```python
from euroeval import Benchmarker

benchmarker = Benchmarker(
    learning_rate=5e-5,
    warmup_ratio=0.1,
    finetuning_batch_size=16,
    max_steps=5000,
    eval_steps=50,
    save_steps=50,
    optimizer_name="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    early_stopping_patience=3,
)
benchmarker.benchmark(model=["bert-base-uncased"])
```

All datasets used in this project are generated using the scripts located in the `src/scripts` folder. To reproduce a dataset, run the corresponding script with the following command:
```bash
uv run src/scripts/<name-of-script>.py
```

Replace `<name-of-script>` with the specific script you wish to execute, e.g.,

```bash
uv run src/scripts/create_allocine.py
```

A huge thank you to all the contributors who have helped make this project a success!
We welcome contributions to EuroEval! Whether you're fixing bugs, adding features, or contributing new datasets, your help makes this project better for everyone.
- General contributions: Check out our contribution guidelines for information on how to get started.
- Adding datasets: If you're interested in adding a new dataset to EuroEval, we have a dedicated guide with step-by-step instructions.
- Thanks to Google for sponsoring Gemini credits as part of their Google Cloud for Researchers Program.
- Thanks to @Mikeriess for evaluating many of the larger models on the leaderboards.
- Thanks to OpenAI for sponsoring OpenAI credits as part of their Researcher Access Program.
- Thanks to UWV and KU Leuven for sponsoring the Azure OpenAI credits used to evaluate GPT-4-turbo in Dutch.
- Thanks to Miðeind for sponsoring the OpenAI credits used to evaluate GPT-4-turbo in Icelandic and Faroese.
- Thanks to CHC for sponsoring the OpenAI credits used to evaluate GPT-4-turbo in German.
If you want to cite the framework, feel free to use the following:
```bibtex
@article{smart2024encoder,
  title={Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks},
  author={Smart, Dan Saattrup and Enevoldsen, Kenneth and Schneider-Kamp, Peter},
  journal={arXiv preprint arXiv:2406.13469},
  year={2024}
}
```

```bibtex
@inproceedings{smart2023scandeval,
  author = {Smart, Dan Saattrup},
  booktitle = {Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)},
  month = may,
  pages = {185--201},
  title = {{ScandEval: A Benchmark for Scandinavian Natural Language Processing}},
  year = {2023}
}
```