3rd Place Solution for the ACM ICAIF 2025 Agentic Retrieval Grand Challenge.
Team members: Chun Chet Ng and Jia Yu Lim.
PRISM (Prompt-Refined In-Context System Modeling) addresses the challenge of financial document retrieval through a two-stage pipeline architecture. The framework integrates three core methodologies: (1) prompt engineering for task-specific optimization (Prompt-Refined), (2) in-context few-shot learning for semantic alignment (In-Context), and (3) multi-agent system modeling for collaborative reasoning (System Modeling). We evaluate both non-agentic and agentic variants on the document and chunk ranking tasks introduced in FinAgentBench.
git clone https://github.com/yourusername/prism_finagentbench.git
cd prism_finagentbench
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your Azure OpenAI credentials
LLM Provider: Configure the Azure OpenAI endpoint and key in the .env file.
This research was supported by Azure credits through the Microsoft for Startups program.
Data Setup: Download the dataset from the Kaggle competition and place the JSONL files in the following structure:
prism_finagentbench/
└── data/
    ├── chunk_ranking_kaggle_dev.jsonl
    ├── chunk_ranking_kaggle_eval.jsonl
    ├── document_ranking_kaggle_dev.jsonl
    └── document_ranking_kaggle_eval.jsonl
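Before running the pipeline, it can help to confirm that the four files are in place and parse cleanly. The following is a minimal sketch and not part of the repository: the helper names `missing_data_files` and `read_jsonl` are hypothetical, and the only assumption about the data is that each file holds one JSON object per line.

```python
import json
from pathlib import Path

# File names taken from the directory layout above.
EXPECTED_FILES = [
    "chunk_ranking_kaggle_dev.jsonl",
    "chunk_ranking_kaggle_eval.jsonl",
    "document_ranking_kaggle_dev.jsonl",
    "document_ranking_kaggle_eval.jsonl",
]


def missing_data_files(data_dir="data"):
    """Return the expected JSONL file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def read_jsonl(path):
    """Load a JSONL file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Calling `missing_data_files()` from the repository root returns an empty list once all four files are present.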
Edit configuration in main.py:
dry_run = False # Set True for testing
agentic_workflow = False # False: non-agentic, True: agentic workflow
agentic_version = 4 # Agentic version (1-4), only used if agentic_workflow=True
agent_concurrency = 2 # Concurrent agents (agentic only)
use_doc_icl = True # Enable ICL for document ranking
use_chunk_icl = True # Enable ICL for chunk ranking
icl_n = 5 # Number of ICL examples
Run the pipeline:
python main.py

| Workflow | Architecture | Speed | Cost | Characteristics |
|---|---|---|---|---|
| Non-Agentic | Split-based processing | Fast | Low | Direct LLM prompting with split-based chunk processing |
| Agentic V1 | Multi-role (4 agents) | Slow | High | Democratic consensus across diverse organizational perspectives |
| Agentic V2 | Three-phase filtering | Medium | Medium-High | Progressive noise reduction through specialized phases |
| Agentic V3 | Adaptive filtering | Medium-Fast | Medium | Confidence-based dynamic filtering with batch support |
| Agentic V4 | Dual-analyst | Fast | Low-Medium | Balanced quantitative-qualitative evaluation |
V1 - Multi-Role Democratic Consensus
- Agents: CEO, Financial Analyst, Operations Manager, Risk Analyst
- Consensus: Arithmetic mean of all agent scores
V2 - Three-Phase Specialized Processing
- Phase 1: Noise removal (keeps 100-200 chunks)
- Phase 2: Candidate selection (keeps 50-100 chunks)
- Phase 3: Deep scoring with 4 specialized agents
- Consensus: Weighted ensemble (Relevance: 0.35, Context: 0.35, Evidence: 0.2, Diversity: -0.15)
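The V2 weighted ensemble can be sketched as a plain weighted sum. This is an illustration only: it takes the listed weights at face value (including the negative sign on Diversity), assumes each of the four agents returns one numeric score per chunk, and uses the hypothetical names `weighted_ensemble` and `rank_chunks`. How the scores are scaled and what the Diversity score measures are not specified here.

```python
# Weights as listed for V2; the negative Diversity weight is taken at face value.
V2_WEIGHTS = {"relevance": 0.35, "context": 0.35, "evidence": 0.2, "diversity": -0.15}


def weighted_ensemble(agent_scores, weights=V2_WEIGHTS):
    """Combine the per-agent scores for one chunk into a single weighted score."""
    missing = set(weights) - set(agent_scores)
    if missing:
        raise KeyError(f"missing agent scores: {sorted(missing)}")
    return sum(weights[name] * agent_scores[name] for name in weights)


def rank_chunks(scores_by_chunk, top_k=5):
    """Return up to top_k chunk ids ordered by descending ensemble score."""
    ranked = sorted(
        scores_by_chunk,
        key=lambda cid: weighted_ensemble(scores_by_chunk[cid]),
        reverse=True,
    )
    return ranked[:top_k]
```

With these weights, a higher Diversity score lowers a chunk's combined score, so two chunks that agree on the other three scores are ordered by the lower Diversity value first.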
V3 - Two-Stage Adaptive Filtering
- Stage 1: Confidence-based adaptive filtering (30-70% retention)
- Stage 2: Parallel deep scoring by 3 analytical agents
- Supports batch processing for cost optimization
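One way to picture the Stage 1 filter is a retention fraction that moves between 30% and 70% depending on how confident the quick filter is. The mapping below is an assumption made for illustration (high confidence keeps fewer chunks, low confidence keeps more, interpolated linearly); the project's actual rule may differ, and `retention_fraction` and `adaptive_filter` are hypothetical names.

```python
def retention_fraction(confidence, min_keep=0.30, max_keep=0.70):
    """Map a filter confidence in [0, 1] to a retention fraction.

    Assumption: high confidence keeps fewer chunks, low confidence keeps more.
    """
    confidence = min(max(confidence, 0.0), 1.0)  # clamp out-of-range values
    return max_keep - confidence * (max_keep - min_keep)


def adaptive_filter(quick_scores, confidence):
    """Keep the highest-scoring chunks, sized by the retention fraction.

    quick_scores maps chunk id -> score from the quick first-stage pass.
    """
    keep = max(1, round(len(quick_scores) * retention_fraction(confidence)))
    ranked = sorted(quick_scores, key=quick_scores.get, reverse=True)
    return ranked[:keep]
```

The surviving chunk ids would then go on to Stage 2 for deep scoring.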
V4 - Dual-Analyst
- Agents: Financial Analyst (quantitative) + Risk Analyst (qualitative)
- Consensus: Equal-weighted averaging
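Both V1 (arithmetic mean over four agents) and V4 (equal-weighted averaging over two analysts) reduce to an unweighted mean of per-agent scores. A minimal sketch, assuming each agent returns a numeric score per chunk id; the function name `mean_consensus` and the choice to keep only chunks scored by every agent are assumptions, not the repository's code.

```python
from statistics import mean


def mean_consensus(scores_by_agent):
    """Average each chunk's score across agents with equal weights.

    scores_by_agent maps agent name -> {chunk_id: score}. Only chunks scored
    by every agent are kept, so each mean is taken over the same set of agents.
    """
    if not scores_by_agent:
        return {}
    per_agent = list(scores_by_agent.values())
    common = set.intersection(*(set(scores) for scores in per_agent))
    return {cid: mean(scores[cid] for scores in per_agent) for cid in common}
```

For V4 the input would hold two entries (Financial Analyst and Risk Analyst); for V1 it would hold four.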
Outputs:
- Submission: ./submission_files/{run_id}_{timestamp}/{run_id}_{timestamp}_kaggle_submission.csv
- LLM Outputs: ./llm_output/doc_output/{run_id}_{timestamp} and ./llm_output/chunk_output/{run_id}_{timestamp}
- Checkpoints: ./checkpoints/ (resume interrupted runs)
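To locate the artifacts of a given run programmatically, the path templates above can be filled in with `pathlib`. This is a small sketch with a hypothetical helper name, `output_paths`; the format of `timestamp` is not documented here, so it is passed through unchanged as a string.

```python
from pathlib import Path


def output_paths(run_id, timestamp, base="."):
    """Build the output locations for one run from the documented templates."""
    tag = f"{run_id}_{timestamp}"
    root = Path(base)
    return {
        "submission": root / "submission_files" / tag / f"{tag}_kaggle_submission.csv",
        "doc_output": root / "llm_output" / "doc_output" / tag,
        "chunk_output": root / "llm_output" / "chunk_output" / tag,
        "checkpoints": root / "checkpoints",
    }
```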
Non-Agentic Workflow: Single-pass LLM evaluation with efficient split-based processing and lower LLM cost.
Document Ranking: Direct LLM prompting with In-Context Learning (ICL) examples → Top-5 document type selection
Chunk Ranking:
- Split chunks into manageable subsets (default: 5 splits)
- Rank chunks within each split using LLM with ICL
- Extract top candidates from each split
- Re-rank all candidates to produce final top-5
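The four chunk-ranking steps above can be sketched as follows. The LLM call is abstracted behind `rank_fn`, a hypothetical callable that takes a list of chunks and returns them best-first; in the real pipeline this would be the ICL-prompted LLM ranking. The names `split_evenly` and `rank_chunks_by_split`, and the `per_split` default, are assumptions for illustration.

```python
def split_evenly(items, n_splits):
    """Partition items into contiguous, near-equal subsets."""
    n_splits = max(1, min(n_splits, len(items)))
    size, extra = divmod(len(items), n_splits)
    splits, start = [], 0
    for i in range(n_splits):
        # The first `extra` splits take one additional item each.
        end = start + size + (1 if i < extra else 0)
        splits.append(items[start:end])
        start = end
    return splits


def rank_chunks_by_split(chunks, rank_fn, n_splits=5, per_split=5, top_k=5):
    """Rank within each split, pool the top candidates, then re-rank the pool."""
    if not chunks:
        return []
    candidates = []
    for subset in split_evenly(chunks, n_splits):
        candidates.extend(rank_fn(subset)[:per_split])
    return rank_fn(candidates)[:top_k]
```

Splitting keeps each prompt small, at the cost of one extra ranking call over the pooled candidates.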
Agentic Workflow: Multi-agent collaboration with LangGraph-orchestrated workflows and higher interpretability through agent reasoning.
Document Ranking: Question analysis → Parallel evaluation by specialized document agents (10-K, 10-Q, 8-K, DEF14A, Earnings) → Weighted consensus → Top-5 selection
Chunk Ranking (Version-Dependent):
- V1: Parallel evaluation by 4 organizational role agents → Cross-agent discussion → Democratic consensus
- V2: Noise removal → Candidate selection → Deep scoring by 4 specialized agents → Weighted ensemble
- V3: Confidence-based quick filtering → Deep scoring by 3 analytical agents → Confidence-weighted aggregation
- V4: Parallel evaluation by 2 specialized analysts → Cross-analyst discussion → Equal-weighted consensus
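The agentic document-ranking flow (parallel evaluation by document-type agents, then a weighted consensus) can be approximated without LangGraph using a thread pool. This is a sketch under stated assumptions, not the project's orchestration code: each agent is a hypothetical callable returning a `(score, confidence)` pair, the consensus weights each score by that confidence (the actual weighting is not specified here), and `max_workers=2` mirrors the `agent_concurrency = 2` setting.

```python
from concurrent.futures import ThreadPoolExecutor

DOC_TYPES = ["10-K", "10-Q", "8-K", "DEF14A", "Earnings"]


def evaluate_in_parallel(question, agents, max_workers=2):
    """Run each document-type agent on the question concurrently.

    agents maps doc type -> callable(question) returning (score, confidence).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {doc_type: pool.submit(fn, question) for doc_type, fn in agents.items()}
        return {doc_type: future.result() for doc_type, future in futures.items()}


def rank_doc_types(results, top_k=5):
    """Order doc types by confidence-weighted score (an assumed weighting)."""
    weighted = {doc_type: score * conf for doc_type, (score, conf) in results.items()}
    return sorted(weighted, key=weighted.get, reverse=True)[:top_k]
```

In practice each callable would wrap an LLM request, so the thread pool mainly bounds how many requests are in flight at once.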
This project is licensed under the AGPL-3.0 License. See LICENSE for details.
Research Collaboration: chunchet.ng [at] ailensgroup [dot] com, alexlow [at] ailensgroup [dot] com
© 2025 AI Lens Sdn. Bhd. (Company No. 1547854-U). Commercial rights reserved.