Skip to content

Releases: enkronos/agentevalops

Public Alpha — v0.1.0-alpha.1

14 Mar 18:42
55bd4a2

Choose a tag to compare

AgentEvalOps — Public Alpha

Failure-mode evaluation harness for agent systems.

AgentEvalOps provides structured scenarios and evaluation tooling
to test how agent systems behave under failure conditions.

Key capabilities in this release:

• Scenario specification for agent failure modes
• Scenario validation via JSON schema
• Evaluation runner and scorecard generation
• Pack workflows for reproducible evaluation suites
• CLI interface for validation, inspection, and evaluation
• Schema-validated reports for evaluation results
• CI, tests, and release infrastructure

Example CLI commands:

agentevalops scenario validate
agentevalops scenario inspect
agentevalops evaluate

agentevalops pack validate
agentevalops pack inspect
agentevalops pack evaluate

Project status:

This is an early public alpha release.

APIs, schemas, and CLI behavior may evolve
as the project matures.

Relationship to other projects:

AgentEvalOps is part of a collection of small open infrastructure
modules for reliable and governable agent systems, including:

• SkillPulse — detect and heal broken agent skills
• Agent Contracts — structured task delegation
• CapToken — capability tokens for authorization
• TraceForge — portable execution traces
• GuardMesh — runtime policy enforcement (in development)

Feedback and contributions are welcome.