feat: add Eval System for end-to-end agent evaluation by singhhnitin · Pull Request #6506 · aden-hive/hive

singhhnitin · 2026-03-15T17:11:31Z

Description

Implements the Eval System item from the official roadmap: the first end-to-end evaluation framework for agent graphs in Aden Hive. Adds a new framework.eval module that lets developers benchmark agent quality across content correctness, latency, cost, tool usage, and semantic quality (scored by an LLM-as-judge).
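To illustrate how several scoring dimensions might fold into one number, here is a minimal sketch of a weighted mean. It is not taken from the PR's code: `DimensionScore` and `weighted_total` are hypothetical names, and the assumption that each dimension is normalized to the range 0.0 to 1.0 is mine.

```python
from dataclasses import dataclass


@dataclass
class DimensionScore:
    """Hypothetical container for one scoring dimension (not the PR's class)."""
    name: str      # e.g. "content", "latency", "cost"
    score: float   # assumed to be normalized to [0.0, 1.0]
    weight: float  # relative importance of this dimension


def weighted_total(scores: list[DimensionScore]) -> float:
    """Return the weighted mean of the scores, or 0.0 if the weights sum to zero."""
    total_weight = sum(s.weight for s in scores)
    if total_weight == 0:
        return 0.0
    return sum(s.score * s.weight for s in scores) / total_weight


scores = [
    DimensionScore("content", 1.0, 2.0),
    DimensionScore("latency", 0.5, 1.0),
    DimensionScore("cost", 0.5, 1.0),
]
total = weighted_total(scores)  # (1.0*2 + 0.5*1 + 0.5*1) / 4
```

Dividing by the sum of the weights keeps the result in the same range as the inputs, so the weights do not need to add up to 1.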

Type of Change

New feature (non-breaking change that adds functionality)

Related Issues

Closes the eval-system item from roadmap.md

Changes Made

New core/framework/eval/ module (6 files, zero new dependencies)
EvalCase / EvalSuite — YAML-driven eval definitions with tag filtering and weighted scoring
EvalScorer — multi-dimension scoring: content, latency, cost, tool usage, LLM-as-judge
EvalRunner — async runner with concurrency control via asyncio.Semaphore
EvalReport — aggregate report with JSON and Markdown export
hive eval run and hive eval report CLI commands wired into framework/cli.py
--fail-under flag for CI/CD pass rate gating
Example eval suite at core/framework/eval/basic_agent_eval.yaml
17 unit tests covering all scoring dimensions
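The concurrency control mentioned for EvalRunner can be sketched as follows. This is an illustration of the general `asyncio.Semaphore` pattern, not the PR's implementation: `run_case`, `run_suite`, and `max_concurrency` are hypothetical names, and the agent call is replaced by a placeholder.

```python
import asyncio


async def run_case(case_id: str) -> dict:
    """Stand-in for running one eval case against an agent (hypothetical)."""
    await asyncio.sleep(0)  # placeholder for the real agent call
    return {"case": case_id, "passed": True}


async def run_suite(case_ids: list[str], max_concurrency: int = 4) -> list[dict]:
    """Run all cases with at most max_concurrency of them in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(case_id: str) -> dict:
        # Each task waits here until a semaphore slot is free.
        async with semaphore:
            return await run_case(case_id)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(c) for c in case_ids))


results = asyncio.run(
    run_suite(["greeting", "tool_call", "summary"], max_concurrency=2)
)
```

Bounding concurrency this way keeps a large suite from opening an unbounded number of simultaneous agent or LLM calls, which matters when the provider enforces rate limits.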

Testing

Unit tests pass (cd core && pytest tests/test_eval/ — 17/17 passed)
Lint passes (cd core && ruff check .)
Manual testing performed

Checklist

My code follows the project's style guidelines
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Usage

hive eval run --suite core/framework/eval/basic_agent_eval.yaml --agent exports/my-agent --verbose
hive eval run --suite my_suite.yaml --agent exports/agent --fail-under 0.8 --output-json report.json
hive eval report report.json
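The --fail-under flag gates CI on the pass rate. A rough sketch of that kind of gate is shown below; the report schema is not given in this PR description, so the `passed` key, the helper names, and the choice of a `>=` comparison are all assumptions made for illustration.

```python
def pass_rate(results: list[dict]) -> float:
    """Fraction of cases that passed; 0.0 for an empty result list."""
    if not results:
        return 0.0
    return sum(1 for r in results if r["passed"]) / len(results)


def exit_code(results: list[dict], fail_under: float) -> int:
    """Return 0 when the pass rate meets the threshold, 1 otherwise."""
    return 0 if pass_rate(results) >= fail_under else 1


# Hypothetical per-case results: 3 of 4 cases pass.
results = [
    {"case": "greeting", "passed": True},
    {"case": "tool_call", "passed": True},
    {"case": "summary", "passed": True},
    {"case": "refusal", "passed": False},
]
rate = pass_rate(results)
code = exit_code(results, fail_under=0.8)
```

A nonzero exit code is what lets a CI job fail the build when quality drops below the chosen threshold.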

- New framework.eval module with EvalCase, EvalSuite, EvalScorer, EvalRunner, EvalReport
- Multi-dimension scoring: content, performance, tool usage, LLM-as-judge
- YAML-defined eval suites with tag filtering and weighted scoring
- CLI: hive eval run --suite <yaml> --agent <path>
- CLI: hive eval report <json>
- JSON and Markdown report export
- CI-friendly --fail-under threshold for pass rate gating
- Example eval suite in core/framework/eval/basic_agent_eval.yaml

Closes #eval-system roadmap item

github-actions · 2026-03-15T17:11:39Z

PR Requirements Warning

This PR does not meet the contribution requirements.
If the issue is not fixed within ~24 hours, it may be automatically closed.

Missing: No linked issue found.

To fix:

Create or find an existing issue for this work
Assign yourself to the issue
Re-open this PR and add Fixes #123 in the description

Exception: To bypass this requirement, you can:

Add the micro-fix label or include micro-fix in your PR title for trivial fixes
Add the documentation label or include doc/docs in your PR title for documentation changes

Micro-fix requirements (must meet ALL):

| Qualifies | Disqualifies |
| --- | --- |
| < 20 lines changed | Any functional bug fix |
| Typos & Documentation & Linting | Refactoring for "clean code" |
| No logic/API/DB changes | New features (even tiny ones) |

Why is this required? See #472 for details.

Nitin Singh added 2 commits March 15, 2026 22:31

test: add unit tests for eval system (17 tests, all passing)

2f0ca89

github-actions bot added the pr-requirements-warning PR doesn't follow contribution guidelines. Please fix or it will be auto-closed. label Mar 15, 2026

singhhnitin wants to merge 2 commits into aden-hive:main from singhhnitin:feature/eval-system
