PAN Lab Shared Task @ CLEF 2025
- data/
  - Contains the PAN 2025 data for the style analysis task. Download here.
- ensemble/
  - ensemble-cls.py → Functions to determine the best ensemble on the validation set using fine-tuned models. Supports majority voting, averaged probabilities, and averaged logits methods.
  - models.py → BertStyleNN, BertPairDataset, StyleNN classes (imported in ensemble-cls.py). Compiled from training/bert-training.py and training/ffnn.py.
- logs/
  - bert-trained/ → Training logs for all models used in the final ensemble method (and others).
  - Other log files from baseline/naive experimentation, i.e., using static embeddings without fine-tuning on the style analysis task.
- training/
  - bert-training.py → Fine-tuning code for an encoder plus a binary classification head. Supports most HuggingFace encoder-only models (including the BERT family) as well as many SentenceTransformers models. A comment in the file lists all models that are verified to work.
  - ffnn.py → Defines the FFNN used as the binary classification head.
  - siamese.py → Siamese style network for fine-tuning embeddings. Did not work well (not used).
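The three combination strategies supported by ensemble-cls.py can be sketched as follows. This is a minimal illustration of the general techniques, not the script's actual API; the function names and array shapes are assumptions.

```python
# Minimal sketch of the three ensemble strategies (majority voting,
# averaged probabilities, averaged logits) for binary classification.
# Function names and shapes are illustrative, not ensemble-cls.py's API.
import numpy as np

def majority_vote(preds):
    # preds: (n_models, n_examples) array of 0/1 predictions per model.
    return (np.asarray(preds).mean(axis=0) >= 0.5).astype(int)

def avg_probabilities(probs):
    # probs: (n_models, n_examples) array of P(class=1) per model.
    return (np.asarray(probs).mean(axis=0) >= 0.5).astype(int)

def avg_logits(logits):
    # logits: (n_models, n_examples) raw scores; average first,
    # then apply the sigmoid and threshold at 0.5.
    avg = np.asarray(logits).mean(axis=0)
    return (1 / (1 + np.exp(-avg)) >= 0.5).astype(int)
```

Averaging logits and averaging probabilities can disagree near the decision boundary, since the sigmoid is applied before versus after averaging, which is why all three are compared on the validation set.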
To reproduce our results on the shared task, you can download the fine-tuned model state dictionaries used in this submission to PAN 2025 directly from HuggingFace. You can view all available models here.
# Example: download the state dictionary for fine-tuned roberta-base to the current directory.
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id='denizbt/pan-style-analysis-models', filename='roberta-base.pth', local_dir='.')
or
# Download all files from the repository to the current directory.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id='denizbt/pan-style-analysis-models',
    local_dir='.',
    local_dir_use_symlinks=False
)
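Once a .pth file has been downloaded, restoring the weights follows the standard PyTorch state-dict round trip. The sketch below uses a small stand-in module so it is self-contained; the real model class is BertStyleNN from ensemble/models.py, and its constructor arguments are not shown here.

```python
# Sketch of the save/load round trip our .pth files follow. nn.Linear is
# a stand-in for BertStyleNN (ensemble/models.py); construct the real
# model with the matching encoder before calling load_state_dict.
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
torch.save(model.state_dict(), 'demo.pth')  # what a saved .pth contains

# Restore: build an identically shaped model, load weights onto CPU.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load('demo.pth', map_location='cpu'))
restored.eval()  # switch to inference mode before evaluation
```

For a downloaded checkpoint, replace 'demo.pth' with the file fetched above (e.g. roberta-base.pth) and the stand-in module with a BertStyleNN instance whose encoder matches the checkpoint name.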
To train your own BertStyleNN, you can use our training script: training/bert-training.py. This script allows you to specify the pre-trained encoder for the model as well as many training hyperparameters, including the number of epochs, learning rate, and learning rate scheduler. Please note that not every encoder from HuggingFace is compatible with our script out of the box; a list of pre-tested models can be found in a comment at the top of bert-training.py.
Here's how you can use the script to train a model with bert-base-cased as its encoder:
python3 bert-training.py --model-name="bert-base-cased" --num-epochs=10 --bert-lr=1e-4