PAN Lab Shared Task @ CLEF 2025
- data/
  - Contains the PAN 2025 data for the style analysis task. Download here.
- ensemble/
  - ensemble-cls.py → Functions to determine the best ensemble on the validation set using fine-tuned models. Supports majority voting, averaged probabilities, and averaged logits methods.
  - models.py → BertStyleNN, BertPairDataset, StyleNN classes (imported in ensemble-cls.py). Compiled from training/bert-training.py and training/ffnn.py.
- logs/
  - bert-trained/ → Training logs for all models used in the final ensemble method (and others).
  - Other log files from baseline/naive experimentation, i.e., using static embeddings without fine-tuning on the style analysis task.
- training/
  - bert-training.py → Fine-tuning code for an encoder plus a binary classification head. Supports most HuggingFace encoder-only models (including the BERT family) as well as many SentenceTransformers models. A comment in the file lists all models that are verified to work.
  - ffnn.py → Defines the FFNN used as the binary classification head.
  - siamese.py → Siamese style network for fine-tuning embeddings. Did not work well (not used).
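The three combination strategies supported by ensemble-cls.py can be sketched as follows. This is a minimal illustration of the general techniques, not the script's actual API; the function names and array shapes are assumptions.

```python
# Minimal sketch of the three ensemble strategies (majority voting,
# averaged probabilities, averaged logits) for binary classification.
# Function names and shapes are illustrative, not ensemble-cls.py's API.
import numpy as np

def majority_vote(preds):
    # preds: (n_models, n_examples) array of 0/1 predictions per model.
    return (np.asarray(preds).mean(axis=0) >= 0.5).astype(int)

def avg_probabilities(probs):
    # probs: (n_models, n_examples) array of P(class=1) per model.
    return (np.asarray(probs).mean(axis=0) >= 0.5).astype(int)

def avg_logits(logits):
    # logits: (n_models, n_examples) raw scores; average first,
    # then apply the sigmoid and threshold at 0.5.
    avg = np.asarray(logits).mean(axis=0)
    return (1 / (1 + np.exp(-avg)) >= 0.5).astype(int)
```

Averaging logits and averaging probabilities can disagree near the decision boundary, since the sigmoid is applied before versus after averaging, which is why all three are compared on the validation set.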
To reproduce our results on the shared task, you can download the fine-tuned model state dictionaries used in this submission to PAN 2025 directly from HuggingFace. You can view all available models here.
# Example: download the state dictionary for fine-tuned roberta-base to the current directory.
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id='denizbt/pan-style-analysis-models', filename='roberta-base.pth', local_dir='.')
or
# Download all files from the repository to the current directory.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id='denizbt/pan-style-analysis-models',
    local_dir='.',
    local_dir_use_symlinks=False
)
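Once a .pth file has been downloaded, restoring the weights follows the standard PyTorch state-dict round trip. The sketch below uses a small stand-in module so it is self-contained; the real model class is BertStyleNN from ensemble/models.py, and its constructor arguments are not shown here.

```python
# Sketch of the save/load round trip our .pth files follow. nn.Linear is
# a stand-in for BertStyleNN (ensemble/models.py); construct the real
# model with the matching encoder before calling load_state_dict.
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
torch.save(model.state_dict(), 'demo.pth')  # what a saved .pth contains

# Restore: build an identically shaped model, load weights onto CPU.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load('demo.pth', map_location='cpu'))
restored.eval()  # switch to inference mode before evaluation
```

For a downloaded checkpoint, replace 'demo.pth' with the file fetched above (e.g. roberta-base.pth) and the stand-in module with a BertStyleNN instance whose encoder matches the checkpoint name.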
To train your own BertStyleNN, you can use our training script: training/bert-training.py. This script allows you to specify the pre-trained encoder for the model as well as many training hyperparameters, including the number of epochs, learning rate, and learning rate scheduler. Please note that not every encoder from HuggingFace is compatible with our script out of the box; a list of pre-tested models can be found in a comment at the top of bert-training.py.
Here's how you can use the script to train a model with bert-base-cased as its encoder:
python3 bert-training.py --model-name="bert-base-cased" --num-epochs=10 --bert-lr=1e-4