
finos-labs/ai-evals-framework


AI Evaluation and Benchmarking Framework

A FINOS Labs initiative for building a taxonomy, datasets, and tooling for evaluating AI systems in financial services.

Motivation

AI systems are non-deterministic, and financial tasks rarely have a single "correct" answer. Existing evaluation benchmarks often fail to address the complexity, risks, and compliance needs of financial services.

This project anchors evaluations in financial use cases, creating a taxonomy that links:

Use Cases → Risks → Metrics

By doing so, it bridges technical benchmarking with real business value, helping reduce compliance risk and improve trust in AI deployments.
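As an illustration of how such a taxonomy might be represented, the sketch below models the Use Cases → Risks → Metrics linkage as nested data classes. All names, example entries, and thresholds here are hypothetical and not taken from the framework itself.

```python
from dataclasses import dataclass, field

# Illustrative sketch of a Use Cases -> Risks -> Metrics taxonomy.
# Every name and value below is an assumption, not framework content.

@dataclass
class Metric:
    name: str
    threshold: float  # minimum acceptable score in [0, 1]

@dataclass
class Risk:
    name: str
    metrics: list[Metric] = field(default_factory=list)

@dataclass
class UseCase:
    name: str
    risks: list[Risk] = field(default_factory=list)

summarization = UseCase(
    name="earnings-call summarization",
    risks=[
        Risk(
            name="hallucinated figures",
            metrics=[Metric(name="numeric faithfulness", threshold=0.95)],
        )
    ],
)

# Walking the taxonomy yields the metrics an evaluation must report.
for risk in summarization.risks:
    for metric in risk.metrics:
        print(f"{summarization.name} / {risk.name}: "
              f"{metric.name} >= {metric.threshold}")
```

Keeping the linkage explicit in data (rather than prose) is what lets a use case drive which metrics an evaluation run is obliged to produce.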

Deliverables

For each financial use case, the framework will provide:

  • Test datasets
  • Synthetic data generation pipelines
  • Reference architectures & implementation strategies
  • Metrics & thresholds
  • Evaluation guidelines
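To make the "metrics & thresholds" deliverable concrete, here is a minimal sketch of how per-metric thresholds could gate an evaluation run. The function name, metric names, and scores are all illustrative assumptions.

```python
# Hypothetical sketch: apply per-metric thresholds to evaluation scores.
# Metric names, scores, and thresholds are illustrative only.

def passes_thresholds(scores: dict[str, float],
                      thresholds: dict[str, float]) -> dict[str, bool]:
    """Return pass/fail per metric; a missing score counts as a failure."""
    return {
        metric: scores.get(metric, 0.0) >= minimum
        for metric, minimum in thresholds.items()
    }

thresholds = {"numeric faithfulness": 0.95, "toxicity (inverted)": 0.99}
scores = {"numeric faithfulness": 0.97, "toxicity (inverted)": 0.90}

results = passes_thresholds(scores, thresholds)
print(results)
# {'numeric faithfulness': True, 'toxicity (inverted)': False}
```

Treating thresholds as data per use case keeps the pass/fail policy auditable, which matters in a compliance-driven setting.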

Roadmap

The initiative follows a staged approach:

✅ Gather workshop and tech sprint artefacts (Sept 2025), published as a PDF

🔄 Literature review, infra setup, repo launch (Oct–Nov 2025)

🚧 Publish template repos & examples (Nov 2025)

🧪 Pilot with financial institutions (Q1 2026)

🌍 Expand shared taxonomy across industry (Q2 2026)

Milestones

Each milestone shown in the image below is linked to a GitHub issue.

202509 - FINOS AI Evals

License

Copyright 2025 FINOS

Distributed under the Apache License, Version 2.0.

SPDX-License-Identifier: Apache-2.0
