FinCDM

📊 Overview

FinCDM (Financial Cognitive Diagnosis Model) is an evaluation framework for financial large language models. It moves beyond traditional score-level evaluation by diagnosing models at the knowledge-skill level, identifying which financial skills and knowledge a model possesses or lacks.

This project introduces a new paradigm for financial LLM evaluation by enabling interpretable, skill-aware diagnosis that supports more trustworthy and targeted model development.

📄 Paper

From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models

📖 Paper: Hugging Face Paper Page
📝 arXiv: arXiv:2508.13491

Abstract

We introduce FinCDM, the first cognitive diagnosis evaluation framework tailored for financial LLMs. Unlike existing benchmarks that rely on single aggregate scores, FinCDM evaluates models at the knowledge-skill level, revealing hidden knowledge gaps and identifying under-tested areas, such as tax and regulatory reasoning, that traditional benchmarks often overlook.

📚 Datasets

We provide two comprehensive datasets for financial LLM evaluation:

1. FinCDM-FinEval-KQA

🤗 Hugging Face: https://huggingface.co/datasets/NextGenWhu/FinCDM-FinEval-KQA
Description:
A knowledge–skill annotated extension of the FinEval benchmark, designed to support fine-grained evaluation of financial reasoning capabilities.
Features:
- Fine-grained financial knowledge and skill labels
- Coverage of multiple financial sub-domains
- Expert-validated annotations

2. FinCDM-Fin-KQA

🤗 Hugging Face: https://huggingface.co/datasets/NextGenWhu/FinCDM-Fin-KQA
Description:
A unified cognitively informed financial knowledge–skill evaluation dataset derived from professional certification examinations, integrating materials from both the Certified Public Accountant (CPA) and Chartered Financial Analyst (CFA) curricula.
Composition:
- CFA-KQA
  - 123 evaluation questions
  - File: CFA-KQA.json
  - Focuses on advanced investment analysis, portfolio management, and professional ethics
- CPA-KQA
  - 1,050 training questions
    - File: CPA-KQA-training.json
  - 210 evaluation questions
    - File: CPA-KQA-test.json
  - Covers real-world accounting, auditing, and financial reporting skills
Features:
- Comprehensive coverage of professional-level financial and accounting competencies
- Fine-grained knowledge and skill annotations
- Expert annotations with high inter-annotator agreement
- Designed to support both training and rigorous evaluation settings
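
As a minimal sketch of how the JSON files listed above might be inspected locally, the helper below loads one file and counts questions per label. It assumes each file holds a JSON array of question records; the field name `knowledge` is a hypothetical placeholder, so check the actual keys in the files before relying on it.

```python
import json
from collections import Counter
from pathlib import Path


def load_kqa(path):
    """Load a KQA file, assuming it holds a JSON array of question records."""
    with Path(path).open(encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError(
            f"expected a JSON array in {path}, got {type(records).__name__}"
        )
    return records


def label_counts(records, field="knowledge"):
    """Count questions per label; `field` is a hypothetical key name."""
    return Counter(str(r.get(field, "<missing>")) for r in records)
```

For example, `len(load_kqa("CPA-KQA-test.json"))` would be 210 if the file matches the composition listed above, and `label_counts(...)` gives a quick view of how evenly the labels are covered.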

🚀 Features

- Cognitive diagnosis framework for financial LLMs
- Knowledge-skill level evaluation beyond simple scores
- Two evaluation datasets (FinCDM-FinEval-KQA and FinCDM-Fin-KQA)
- Evaluation scripts and tools (coming soon)
- Model proficiency visualization
- Skill acquisition pattern analysis
- Behavioral cluster identification
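
To illustrate why skill-level evaluation can say more than a single score, here is a small sketch that computes per-skill accuracy from question-to-skill labels. It is a plain accuracy baseline, not the cognitive diagnosis model used by FinCDM, and the skill names, question IDs, and responses are made-up toy data.

```python
from collections import defaultdict


def skill_accuracy(responses, q_matrix):
    """Per-skill accuracy.

    responses: {question_id: 1 if answered correctly else 0}
    q_matrix:  {question_id: [skill, ...]} (skills each question tests)
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for qid, correct in responses.items():
        for skill in q_matrix.get(qid, []):
            totals[skill] += 1
            hits[skill] += correct
    return {skill: hits[skill] / totals[skill] for skill in totals}


# Toy data: two hypothetical models answering the same four questions.
q_matrix = {
    "q1": ["tax"], "q2": ["tax"],
    "q3": ["ethics"], "q4": ["ethics"],
}
model_a = {"q1": 1, "q2": 1, "q3": 0, "q4": 0}
model_b = {"q1": 1, "q2": 0, "q3": 1, "q4": 0}

for name, resp in [("model_a", model_a), ("model_b", model_b)]:
    overall = sum(resp.values()) / len(resp)
    print(name, overall, skill_accuracy(resp, q_matrix))
```

Both toy models score 0.5 overall, yet `model_a` answers every tax question and no ethics question, while `model_b` is at 0.5 on each skill. An aggregate score hides exactly this difference.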

🛠️ Key Innovations

- Knowledge-Skill Level Diagnosis: Unlike traditional benchmarks that provide single scores, FinCDM reveals specific strengths and weaknesses across different financial skills
- Comprehensive Coverage: Tests previously overlooked areas such as:
  - Tax and regulatory reasoning
  - Deferred tax liabilities
  - Lease classification
  - Regulatory ratios
- Model Clustering Analysis: Identifies latent associations between financial concepts and reveals distinct clusters of models with similar skill acquisition patterns
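
The clustering idea can be sketched with a simple single-linkage grouping over per-skill mastery profiles: models whose profiles lie within a distance threshold end up in the same cluster. This is an illustrative stand-in, not the clustering procedure used in the paper, and the profiles below are invented numbers rather than experimental results.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length skill profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_profiles(profiles, threshold):
    """Single-linkage clustering: link models closer than `threshold`."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent pointers to the cluster root (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if distance(profiles[a], profiles[b]) < threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Made-up mastery profiles over (tax, ethics, portfolio); not real results.
profiles = {
    "model_a": [0.9, 0.2, 0.8],
    "model_b": [0.8, 0.3, 0.9],
    "model_c": [0.2, 0.9, 0.3],
    "model_d": [0.3, 0.8, 0.2],
}
clusters = cluster_profiles(profiles, threshold=0.3)
print(clusters)
```

With these toy profiles, `model_a` and `model_b` form one cluster and `model_c` and `model_d` another. The threshold is a free parameter; a library routine such as scikit-learn's clustering estimators would be a natural replacement for real analyses.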

📋 Prerequisites

- Python 3.8+
- Git
- PyTorch >= 1.12.0
- Transformers >= 4.25.0
- NumPy, Pandas, Scikit-learn

💻 Installation

# Clone the repository
git clone https://github.com/WHUNextGen/FinCDM.git
cd FinCDM

# Install dependencies (once available)
pip install -r requirements.txt

📖 Usage

Loading Datasets

from datasets import load_dataset

# Load FinEval-KQA dataset
fineval_data = load_dataset("NextGenWhu/FinCDM-FinEval-KQA")

# Load Fin-KQA dataset (CFA-KQA and CPA-KQA)
fin_kqa_data = load_dataset("NextGenWhu/FinCDM-Fin-KQA")

Running Evaluation

from fincdm import FinCDMEvaluator

# Initialize the evaluator once and reuse it below
evaluator = FinCDMEvaluator(data_root=".")

# Evaluate your model (fill in q_path and a_path for your own files)
results = evaluator.evaluate(
    q_path="",
    a_path="",
)
print(results.metrics)

# Get the knowledge-skill diagnosis and export it as CSV
diagnosis = evaluator.diagnose(results, export_csv="SK_df.csv")
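
Once a diagnosis CSV such as `SK_df.csv` has been exported, a short script can surface the weakest skills. The sketch below uses only the standard library and assumes the file has one row per skill with columns named `skill` and `mastery`. Both column names are hypothetical placeholders, since the export format is not documented here, so adjust them to match the actual header.

```python
import csv


def weakest_skills(csv_path, k=3, skill_col="skill", score_col="mastery"):
    """Return the k lowest-scoring (skill, score) pairs from a diagnosis CSV.

    `skill_col` and `score_col` are assumed column names, not a documented
    FinCDM format.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = [(r[skill_col], float(r[score_col])) for r in csv.DictReader(f)]
    return sorted(rows, key=lambda pair: pair[1])[:k]
```

Calling `weakest_skills("SK_df.csv", k=5)` would then list the five skills with the lowest scores, which is a reasonable starting point for targeted fine-tuning or error analysis.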

📊 Experimental Results

We ran extensive experiments on 30+ LLMs, including:

- Proprietary models (GPT-4, GPT-3.5, Claude)
- Open-source models (LLaMA, Mistral, Qwen)
- Domain-specific models (FinGPT, FinMA, FinQwen)

Key findings:

- FinCDM reveals hidden knowledge gaps in state-of-the-art models
- It identifies behavioral clusters among different model families
- It uncovers specialization strategies in domain-specific models

🤝 Contributing

We welcome contributions! Please feel free to:

1. Fork the repository
2. Create your feature branch (git checkout -b feature/AmazingFeature)
3. Commit your changes (git commit -m 'Add some AmazingFeature')
4. Push to the branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

📝 Citation

If you use FinCDM in your research, please cite our paper:

@article{fincdm2025,
  title={From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models},
  author={Kuang, Ziyan and others},
  journal={arXiv preprint arXiv:2508.13491},
  year={2025}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👥 Team

WHU NextGen Team
Contributors from Wuhan University

🔗 Links

GitHub Repository: https://github.com/WHUNextGen/FinCDM
Paper: https://huggingface.co/papers/2508.13491
FinEval-KQA Dataset: https://huggingface.co/datasets/NextGenWhu/FinCDM-FinEval-KQA
Fin-KQA Dataset: https://huggingface.co/datasets/NextGenWhu/FinCDM-Fin-KQA

📧 Contact

For questions and feedback, please:

- Open an issue on GitHub
- Contact the WHU NextGen Team

⭐ Star this repository if you find it helpful!

🔥 Check out our datasets on Hugging Face for financial LLM evaluation!
