WHUNextGen/FinCDM

FinCDM

📊 Overview

FinCDM (Financial Cognitive Diagnosis Model) is a comprehensive evaluation framework for financial large language models. It moves beyond traditional score-level evaluation by providing knowledge-skill level diagnosis, identifying which financial skills and knowledge a model possesses or lacks.

This project introduces a new paradigm for financial LLM evaluation by enabling interpretable, skill-aware diagnosis that supports more trustworthy and targeted model development.
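To illustrate how a single aggregate score can hide skill-level gaps, here is a minimal Python sketch. It only groups per-question correctness by annotated skill. The question IDs, skill names, and results are invented toy data, and this is not FinCDM's cognitive diagnosis model, which is described in the paper.

```python
from collections import defaultdict


def skill_accuracy(correct, question_skills):
    """Per-skill accuracy: for each skill, the fraction of questions tagged
    with that skill that the model answered correctly."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for qid, is_correct in correct.items():
        for skill in question_skills[qid]:
            totals[skill] += 1
            hits[skill] += int(is_correct)
    return {skill: hits[skill] / totals[skill] for skill in totals}


# Hypothetical toy data: question -> annotated skills, and one model's results.
question_skills = {
    "q1": ["financial_reporting"],
    "q2": ["financial_reporting"],
    "q3": ["financial_reporting", "tax"],
    "q4": ["tax"],
}
correct = {"q1": True, "q2": True, "q3": True, "q4": False}

overall = sum(correct.values()) / len(correct)
per_skill = skill_accuracy(correct, question_skills)
print(f"overall accuracy: {overall:.2f}")  # overall accuracy: 0.75
for skill, acc in sorted(per_skill.items()):
    print(f"{skill}: {acc:.2f}")  # financial_reporting: 1.00, then tax: 0.50
```

The 0.75 aggregate hides that this toy model answers every reporting question correctly but only half of the tax-tagged questions.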

📄 Paper

From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models

Abstract

We introduce FinCDM, the first cognitive diagnosis evaluation framework tailored for financial LLMs. Unlike existing benchmarks that rely on single aggregate scores, FinCDM evaluates models at the knowledge-skill level, revealing hidden knowledge gaps and identifying under-tested areas, such as tax and regulatory reasoning, that traditional benchmarks often overlook.

📚 Datasets

We provide two comprehensive datasets for financial LLM evaluation:

1. FinCDM-FinEval-KQA

  • 🤗 Hugging Face: https://huggingface.co/datasets/NextGenWhu/FinCDM-FinEval-KQA
  • Description:
    A knowledge–skill annotated extension of the FinEval benchmark, designed to support fine-grained evaluation of financial reasoning capabilities.
  • Features:
    • Fine-grained financial knowledge and skill labels
    • Coverage of multiple financial sub-domains
    • Expert-validated annotations

2. FinCDM-Fin-KQA

  • 🤗 Hugging Face: https://huggingface.co/datasets/NextGenWhu/FinCDM-Fin-KQA
  • Description:
    A unified, cognitively informed financial knowledge–skill evaluation dataset derived from professional certification examinations, integrating materials from both the Certified Public Accountant (CPA) and Chartered Financial Analyst (CFA) curricula.
  • Composition:
    • CFA-KQA
      • 123 evaluation questions
      • File: CFA-KQA.json
      • Focuses on advanced investment analysis, portfolio management, and professional ethics
    • CPA-KQA
      • 1,050 training questions
        • File: CPA-KQA-training.json
      • 210 evaluation questions
        • File: CPA-KQA-test.json
      • Covers real-world accounting, auditing, and financial reporting skills
  • Features:
    • Comprehensive coverage of professional-level financial and accounting competencies
    • Fine-grained knowledge and skill annotations
    • Expert annotations with high inter-annotator agreement
    • Designed to support both training and rigorous evaluation settings
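Skill annotations like these are commonly arranged as a Q-matrix, a binary question-by-skill table that cognitive diagnosis models take as input. The sketch below builds one from a list of annotated records. The field names `id` and `skills` and the example records are hypothetical placeholders; check the actual JSON schema of CFA-KQA.json and the CPA-KQA files before adapting it.

```python
def build_q_matrix(records, id_field="id", skill_field="skills"):
    """Return (question_ids, skill_names, q_matrix), where
    q_matrix[i][k] == 1 if question i is annotated with skill k."""
    # A stable, sorted list of every skill that appears in the records.
    skill_names = sorted({s for r in records for s in r[skill_field]})
    index = {s: k for k, s in enumerate(skill_names)}
    question_ids = [r[id_field] for r in records]
    q_matrix = []
    for r in records:
        row = [0] * len(skill_names)
        for s in r[skill_field]:
            row[index[s]] = 1
        q_matrix.append(row)
    return question_ids, skill_names, q_matrix


# Hypothetical records; the real FinCDM field names may differ.
records = [
    {"id": "cpa-001", "skills": ["deferred_tax", "financial_reporting"]},
    {"id": "cpa-002", "skills": ["lease_classification"]},
    {"id": "cpa-003", "skills": ["financial_reporting"]},
]
qids, skills, Q = build_q_matrix(records)
print(skills)
for qid, row in zip(qids, Q):
    print(qid, row)
```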

🚀 Features

  • Cognitive diagnosis framework for financial LLMs
  • Knowledge-skill level evaluation beyond simple scores
  • Two comprehensive evaluation datasets (FinCDM-FinEval-KQA and FinCDM-Fin-KQA)
  • Evaluation scripts and tools (coming soon)
  • Model proficiency visualization
  • Skill acquisition pattern analysis
  • Behavioral cluster identification

🛠️ Key Innovations

  1. Knowledge-Skill Level Diagnosis: Unlike traditional benchmarks that provide single scores, FinCDM reveals specific strengths and weaknesses across different financial skills.
  2. Comprehensive Coverage: Tests previously overlooked areas such as:
    • Tax and regulatory reasoning
    • Deferred tax liabilities
    • Lease classification
    • Regulatory ratios
  3. Model Clustering Analysis: Identifies latent associations between financial concepts and reveals distinct clusters of models with similar skill acquisition patterns.
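The model clustering idea can be sketched by representing each model as a vector of per-skill proficiencies and grouping similar vectors. The sketch below uses a tiny pure-Python k-means; the model names, proficiency values, the choice of k-means, and k=2 are all illustrative assumptions, not the clustering procedure or results reported for FinCDM.

```python
import math


def kmeans(vectors, k, iters=20):
    """Minimal k-means. The first k vectors serve as initial centroids so the
    result is deterministic. Returns one cluster label per input vector."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical per-skill proficiencies in [0, 1]: (tax, financial_reporting, ethics).
profiles = {
    "model_a": [0.90, 0.80, 0.70],
    "model_b": [0.30, 0.90, 0.40],
    "model_c": [0.85, 0.75, 0.80],
    "model_d": [0.25, 0.85, 0.35],
}
names = list(profiles)
labels = kmeans(list(profiles.values()), k=2)

clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
print(clusters)  # {0: ['model_a', 'model_c'], 1: ['model_b', 'model_d']}
```

Seeding with the first k vectors is a simplification, since k-means is sensitive to initialization; in practice a library implementation such as scikit-learn's `KMeans` would be the usual choice, and the pure-Python version only keeps the example dependency-free.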

📋 Prerequisites

  • Python 3.8+
  • Git
  • PyTorch >= 1.12.0
  • Transformers >= 4.25.0
  • NumPy, Pandas, Scikit-learn
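A quick way to confirm the prerequisites above is a small standard-library script. It is a sketch under a few assumptions: packages are looked up by their PyPI distribution names (`torch`, `transformers`, `numpy`, `pandas`, `scikit-learn`), and versions are compared only on their leading numeric components. `importlib.metadata` was added in Python 3.8, so on an older interpreter the script fails at the import, which also signals that Python is too old.

```python
from importlib import metadata

# Minimum versions from the prerequisites list; "0" means any installed version.
REQUIREMENTS = {
    "torch": "1.12.0",
    "transformers": "4.25.0",
    "numpy": "0",
    "pandas": "0",
    "scikit-learn": "0",
}


def release_tuple(version):
    """Approximate a version as an integer tuple, e.g. '2.1.0+cu118' -> (2, 1, 0).
    Local tags after '+' are dropped and parsing stops at the first non-digit,
    so a pre-release such as '1.13.0rc1' is treated as '1.13.0'."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break
    while len(parts) < 3:  # pad so '4.25' compares equal to '4.25.0'
        parts.append(0)
    return tuple(parts)


def check_requirements(requirements=REQUIREMENTS):
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for dist, minimum in requirements.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if release_tuple(installed) < release_tuple(minimum):
            problems.append(f"{dist} {installed} is older than the required {minimum}")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("All listed prerequisites are satisfied.")
```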

💻 Installation

# Clone the repository
git clone https://github.com/WHUNextGen/FinCDM.git
cd FinCDM

# Install dependencies (once available)
pip install -r requirements.txt

📖 Usage

Loading Datasets

from datasets import load_dataset

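# Load the FinCDM-FinEval-KQA dataset
fineval_data = load_dataset("NextGenWhu/FinCDM-FinEval-KQA")

# Load the CPA-KQA splits, which are published inside the FinCDM-Fin-KQA dataset
cpa_data = load_dataset(
    "NextGenWhu/FinCDM-Fin-KQA",
    data_files={"train": "CPA-KQA-training.json", "test": "CPA-KQA-test.json"},
)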

Running Evaluation

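from fincdm import FinCDMEvaluator

# Initialize the evaluator (evaluation tools are listed as coming soon; this API may change)
evaluator = FinCDMEvaluator(data_root=".")

# Evaluate your model with the same evaluator instance
results = evaluator.evaluate(
    q_path="",  # question file path (not given in this README)
    a_path="",  # answer file path (not given in this README)
)
print(results.metrics)

# Get the knowledge-skill diagnosis and export it to CSV
diagnosis = evaluator.diagnose(results, export_csv="SK_df.csv")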
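The `diagnose` step is not yet documented. As a rough illustration of what skill-level diagnosis can look like, the sketch below fits a simple compensatory logistic model, in the spirit of multidimensional item response theory, to a models-by-questions 0/1 response matrix `R` and a questions-by-skills Q-matrix `Q`: P(correct) = sigmoid(sum of the proficiencies of the skills a question requires minus that question's difficulty). This is a swapped-in textbook technique, not FinCDM's diagnosis model; the function names, hyperparameters, toy data, and CSV layout (reusing the `SK_df.csv` file name) are all hypothetical.

```python
import csv
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_skill_proficiency(R, Q, lr=0.1, l2=0.01, epochs=500):
    """Fit P(R[m][j] = 1) = sigmoid(sum_k Q[j][k] * theta[m][k] - b[j]) by
    full-batch gradient ascent on the L2-penalized log-likelihood.
    R: models x questions (0/1); Q: questions x skills (0/1).
    Returns (theta, b): per-model skill proficiencies, question difficulties."""
    n_models, n_questions, n_skills = len(R), len(Q), len(Q[0])
    theta = [[0.0] * n_skills for _ in range(n_models)]
    b = [0.0] * n_questions
    for _ in range(epochs):
        # Start each gradient with the L2 penalty term, which keeps estimates finite.
        g_theta = [[-l2 * t for t in row] for row in theta]
        g_b = [-l2 * bj for bj in b]
        for m in range(n_models):
            for j in range(n_questions):
                logit = sum(Q[j][k] * theta[m][k] for k in range(n_skills)) - b[j]
                err = R[m][j] - sigmoid(logit)  # d(log-likelihood)/d(logit)
                for k in range(n_skills):
                    g_theta[m][k] += err * Q[j][k]
                g_b[j] -= err
        for m in range(n_models):
            for k in range(n_skills):
                theta[m][k] += lr * g_theta[m][k]
        for j in range(n_questions):
            b[j] += lr * g_b[j]
    return theta, b


def export_proficiency_csv(path, model_names, skill_names, theta):
    """Write one row per model and one column per skill."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["model"] + list(skill_names))
        for name, row in zip(model_names, theta):
            writer.writerow([name] + [f"{t:.3f}" for t in row])


# Hypothetical toy data: 4 questions, 2 skills, 2 models.
skills = ["tax", "financial_reporting"]
Q = [[1, 0], [1, 0], [0, 1], [0, 1]]
R = [[1, 1, 0, 0],  # model_x: right on tax, wrong on reporting
     [0, 0, 1, 1]]  # model_y: the reverse
theta, b = fit_skill_proficiency(R, Q)
export_proficiency_csv("SK_df.csv", ["model_x", "model_y"], skills, theta)
```

Positive proficiencies mark skills a model tends to get right after accounting for question difficulty; with real data the Q-matrix would come from the dataset's skill annotations.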

📊 Experimental Results

We conducted extensive experiments on 30+ LLMs, including:

  • Proprietary models (GPT-4, GPT-3.5, Claude)
  • Open-source models (LLaMA, Mistral, Qwen)
  • Domain-specific models (FinGPT, FinMA, FinQwen)

Key findings:

  • Reveals hidden knowledge gaps in state-of-the-art models
  • Identifies behavioral clusters among different model families
  • Uncovers specialization strategies in domain-specific models

🤝 Contributing

We welcome contributions! Please feel free to:

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📝 Citation

If you use FinCDM in your research, please cite our paper:

@article{fincdm2025,
  title={From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models},
  author={Kuang, Ziyan and others},
  journal={arXiv preprint arXiv:2508.13491},
  year={2025}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👥 Team

  • WHU NextGen Team
  • Contributors from Wuhan University

📧 Contact

For questions and feedback, please:

  • Open an issue on GitHub
  • Contact the WHU NextGen Team

Star this repository if you find it helpful!

🔥 Check out our datasets on Hugging Face for financial LLM evaluation!
