
finos-labs/ai-evals-framework


AI Evaluation and Benchmarking Framework

A FINOS Labs initiative for building a taxonomy, datasets, and tooling for evaluating AI systems in financial services.

Motivation

AI systems are non-deterministic, and financial tasks rarely have a single "correct" answer. Existing evaluation benchmarks often fail to address the complexity, risks, and compliance needs of financial services.

This project anchors evaluations in financial use cases, creating a taxonomy that links:

Use Cases → Risks → Metrics

By doing so, it bridges technical benchmarking with real business value, helping reduce compliance risk and improve trust in AI deployments.
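As an illustration of how such a taxonomy might be represented, the sketch below models the Use Cases → Risks → Metrics linkage as nested data classes. All names, example entries, and thresholds here are hypothetical and not taken from the framework itself.

```python
from dataclasses import dataclass, field

# Illustrative sketch of a Use Cases -> Risks -> Metrics taxonomy.
# Every name and value below is an assumption, not framework content.

@dataclass
class Metric:
    name: str
    threshold: float  # minimum acceptable score in [0, 1]

@dataclass
class Risk:
    name: str
    metrics: list[Metric] = field(default_factory=list)

@dataclass
class UseCase:
    name: str
    risks: list[Risk] = field(default_factory=list)

summarization = UseCase(
    name="earnings-call summarization",
    risks=[
        Risk(
            name="hallucinated figures",
            metrics=[Metric(name="numeric faithfulness", threshold=0.95)],
        )
    ],
)

# Walking the taxonomy yields the metrics an evaluation must report.
for risk in summarization.risks:
    for metric in risk.metrics:
        print(f"{summarization.name} / {risk.name}: "
              f"{metric.name} >= {metric.threshold}")
```

Keeping the linkage explicit in data (rather than prose) is what lets a use case drive which metrics an evaluation run is obliged to produce.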

Deliverables

For each financial use case, the framework will provide:

  • Test datasets
  • Synthetic data generation pipelines
  • Reference architectures & implementation strategies
  • Metrics & thresholds
  • Evaluation guidelines
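To make the "metrics & thresholds" deliverable concrete, here is a minimal sketch of how per-metric thresholds could gate an evaluation run. The function name, metric names, and scores are all illustrative assumptions.

```python
# Hypothetical sketch: apply per-metric thresholds to evaluation scores.
# Metric names, scores, and thresholds are illustrative only.

def passes_thresholds(scores: dict[str, float],
                      thresholds: dict[str, float]) -> dict[str, bool]:
    """Return pass/fail per metric; a missing score counts as a failure."""
    return {
        metric: scores.get(metric, 0.0) >= minimum
        for metric, minimum in thresholds.items()
    }

thresholds = {"numeric faithfulness": 0.95, "toxicity (inverted)": 0.99}
scores = {"numeric faithfulness": 0.97, "toxicity (inverted)": 0.90}

results = passes_thresholds(scores, thresholds)
print(results)
# {'numeric faithfulness': True, 'toxicity (inverted)': False}
```

Treating thresholds as data per use case keeps the pass/fail policy auditable, which matters in a compliance-driven setting.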

Roadmap

The initiative follows a staged approach:

✅ Gather workshop and tech sprint artefacts (Sept 2025), published as a PDF

🔄 Literature review, infra setup, repo launch (Oct–Nov 2025)

🚧 Publish template repos & examples (Nov 2025)

🧪 Pilot with financial institutions (Q1 2026)

🌍 Expand shared taxonomy across industry (Q2 2026)

Milestones

Each milestone shown in the image below is linked to a GitHub issue.

202509 - FINOS AI Evals

License

Copyright 2025 FINOS

Distributed under the Apache License, Version 2.0.

SPDX-License-Identifier: Apache-2.0
