Information Representation Fairness in Long-Document Embeddings: The Peculiar Interaction of Positional and Language Bias - FairSentenceTransformers and Replication 
This repository accompanies our preprint (under review) and provides the source code for our examination of positional bias in long-document embeddings from multilingual embedding models, together with the fair-sentence-transformers extension.
- Overview
- Fair Sentence Transformers
- Repository Structure
- Reproducing the Experiments
- Citation
- About Impresso
- License
We introduce an inference-time attention calibration method, implemented as an extension of Sentence Transformers called Fair Sentence Transformers. This tool aims to:
- Provide a wrapper class for inference-time calibration techniques that improve fairness in embedding models.
- Support existing and future embedding model releases through generic implementations configurable to each model's attributes.
Install from source with Poetry (a pip package is coming soon):

```bash
poetry install
```

```python
from src.locobench.core.fair_sentence_transformer import FairSentenceTransformer

input_texts = [
    "What is the capital of Switzerland?",
    "How to make an omelette?",
    "Wie viele Einwohner hat Deutschland?",
]

model_name_or_path = "Alibaba-NLP/gte-multilingual-base"
model = FairSentenceTransformer(model_name_or_path)

# Standard SentenceTransformer embeddings
embeddings = model.encode(input_texts)  # shape: (3, 768)

# Fair SentenceTransformer embeddings
fair_embeddings = model.encode_positionally_fair(
    input_texts,
    calib_strength=0.5,
    calib_basket_size=128,
    calib_layers=6,
)  # shape: (3, 768)
```
`encode_positionally_fair` - Inference-time attention calibration to ensure fair representation of input from all positions.
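As a quick sanity check, the calibrated embeddings can be compared against the standard ones. The snippet below is only an illustrative sketch: it reuses `model`, `input_texts`, and the calibration parameters from the example above, and assumes both encoding methods return NumPy arrays (the default behaviour of `SentenceTransformer.encode`).

```python
import numpy as np

# Standard vs. positionally calibrated embeddings for the same inputs
# (model, input_texts and calibration values taken from the example above).
standard = model.encode(input_texts)
fair = model.encode_positionally_fair(
    input_texts,
    calib_strength=0.5,
    calib_basket_size=128,
    calib_layers=6,
)

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Values near 1.0 indicate calibration only mildly shifted the representation.
for text, s, f in zip(input_texts, standard, fair):
    print(f"{cosine(s, f):.3f}  {text}")
```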
Tested Models:
- Alibaba-NLP/gte-multilingual-base
- BAAI/bge-m3
- Qwen3-Embedding Family: 0.6B, 4B, 8B
Extensibility: Our implementation can support additional models; adding one requires a model-specific configuration and a quick test of the new addition. Feel free to open a pull request with your favourite model.
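What such a configuration covers varies by model. As a purely hypothetical illustration (the keys below are invented for exposition and are not the actual schema used in this repository), a new entry would record the architectural attributes that the attention calibration needs to know about:

```python
# Hypothetical illustration only: these keys are NOT the configuration schema of
# fair-sentence-transformers, just the kind of model attributes an inference-time
# attention calibration needs to locate for a new checkpoint.
hypothetical_model_config = {
    "model_name_or_path": "my-org/my-embedding-model",  # hypothetical checkpoint
    "num_hidden_layers": 24,        # encoder depth (bounds calib_layers)
    "pooling_strategy": "mean",     # how token states are pooled into one vector
    "attention_module_name": "attention",  # where attention scores are computed
}
```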
To fully replicate our results, please follow the instructions described in our replication readme (REPL_README.md).
If you use these resources, please cite our paper:
```bibtex
@misc{schuhmacher2026informationrepresentationfairnesslongdocument,
  title={Information Representation Fairness in Long-Document Embeddings: The Peculiar Interaction of Positional and Language Bias},
  author={Elias Schuhmacher and Andrianos Michail and Juri Opitz and Rico Sennrich and Simon Clematide},
  year={2026},
  eprint={2601.16934},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2601.16934},
}
```

Impresso - Media Monitoring of the Past is an interdisciplinary research project that aims to develop and consolidate tools for processing and exploring large collections of media archives across modalities, time, languages, and national borders. The first project (2017-2021) was funded by the Swiss National Science Foundation under grant No. CRSII5_173719 and the second project (2023-2027) by the SNSF under grant No. CRSII5_213585 and the Luxembourg National Research Fund under grant No. 17498891.
Copyright (C) 2026 The Impresso team.
This program is provided as open source under the GNU Affero General Public License v3 or later.
