# Agent Tool Interop Spec

A minimal, framework-agnostic specification for agent tooling primitives.

Inspired by The Tooling Desert — a call for shared standards so the agent ecosystem stops reinventing the wheel.

## The Problem

Every agent framework defines its own tool schema, trace format, and evaluation contract. This makes debuggers, testing frameworks, and monitoring tools framework-specific. An evaluator built for LangChain doesn't work with LlamaIndex. A tracer for Semantic Kernel doesn't work with CrewAI.

## What This Is

Three common abstractions + a reference Python implementation:

| Primitive | What it is |
| --- | --- |
| `ToolSpec` | Framework-agnostic tool definition. Exports to OpenAI and Anthropic formats. |
| `ReasoningTrace` | Structured execution record. Emitted by agents, consumed by debuggers and monitors. |
| `EvalCase` / `EvalResult` | Property-based evaluation. Handles non-determinism — assertions are properties, not exact matches. |

## Install

No dependencies beyond Python 3.10+. Just copy the agentool/ directory into your project, or install:

```shell
pip install agentool   # when published — for now, copy agentool/
```

## Usage

### Define a tool

```python
from agentool import ToolSpec

spec = ToolSpec(
    name="search_web",
    description="Search the web. Use when you need current information not in your training data.",
    parameters={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query"},
            "limit": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
    metadata={"cost_tier": "medium", "side_effects": False},
)

# Export to any framework
spec.to_openai()     # OpenAI function calling format
spec.to_anthropic()  # Anthropic tool use format
spec.to_dict()       # Canonical spec format
```
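To make the export concrete, here is a minimal standalone sketch of the kind of mapping `to_openai()` likely performs: wrapping the canonical name, description, and JSON Schema parameters in OpenAI's function-calling envelope. The function name `to_openai_format` is illustrative, not part of `agentool`, and the exact envelope the library emits may differ.

```python
# Hypothetical sketch of a canonical-spec -> OpenAI function-calling
# conversion. The wrapper shape follows OpenAI's documented tool format;
# to_openai_format is an illustrative name, not an agentool API.

def to_openai_format(name: str, description: str, parameters: dict) -> dict:
    """Wrap a canonical tool definition in OpenAI's function-calling envelope."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,  # JSON Schema object, passed through as-is
        },
    }

tool = to_openai_format(
    "search_web",
    "Search the web.",
    {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]},
)
print(tool["function"]["name"])  # search_web
```

Keeping the parameters as plain JSON Schema is what makes a single canonical definition exportable to multiple providers: each exporter only changes the envelope, never the schema.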

### Record a trace

```python
from agentool import ReasoningTrace, TraceStatus
from agentool.tool import ToolCall, ToolResult

trace = ReasoningTrace(agent_id="my-research-agent", input="What debugging tools exist for agents?")

trace.add_reasoning("I need to search for current agent debugging tools.")

call = ToolCall(name="search_web", arguments={"query": "agent debugging tools 2026"})
trace.add_tool_call(call)
trace.add_tool_result(ToolResult(call_id=call.id, output=[...], duration_ms=1240))

trace.add_reasoning("Found 3 results. Synthesizing an answer.")
trace.finish(output="The main tools are LangSmith, Braintrust, and...", status=TraceStatus.SUCCESS)

print(trace.to_dict())        # Full JSON-serializable trace
print(trace.has_loop())       # Loop detection
print(trace.tool_calls())     # All tool calls
```
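A plausible sketch of what a `has_loop()` check involves, assuming the trace's tool calls are available as a list of dicts: flag the trace when the same tool is invoked with identical arguments too many times in a row. This is an assumption about the heuristic; `agentool`'s actual implementation may use a different detection rule.

```python
# Illustrative loop detector over a list of {"name", "arguments"} dicts.
# Assumption: a "loop" means the same (name, arguments) pair repeating
# consecutively more than `threshold` times.
import json

def has_loop(tool_calls: list[dict], threshold: int = 2) -> bool:
    """Return True if any identical tool call repeats more than `threshold` times in a row."""
    run = 1
    for prev, curr in zip(tool_calls, tool_calls[1:]):
        same = (
            prev["name"] == curr["name"]
            # Canonicalize arguments so key order doesn't matter
            and json.dumps(prev["arguments"], sort_keys=True)
            == json.dumps(curr["arguments"], sort_keys=True)
        )
        run = run + 1 if same else 1
        if run > threshold:
            return True
    return False

calls = [{"name": "search_web", "arguments": {"query": "agent tools"}}] * 3
print(has_loop(calls))  # True
```

Catching this structurally, rather than waiting for a token budget to run out, is one of the main payoffs of a shared trace format: any monitor that speaks the format gets loop detection for free.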

### Write eval cases

```python
from agentool import EvalCase, Assertion
from agentool.eval import AssertionType, evaluate, evaluate_structural

case = EvalCase(
    case_id="search_and_summarize_001",
    description="Agent should search and cite specific tool names",
    input="What new agent debugging tools came out this year?",
    assertions=[
        Assertion(
            type=AssertionType.CONTAINS_TOOL_CALL,
            tool_name="search_web",
            description="Must search the web",
        ),
        Assertion(
            type=AssertionType.TOOL_CALL_COUNT,
            min_calls=1, max_calls=3,
            description="Shouldn't over-call tools",
        ),
        Assertion(
            type=AssertionType.OUTPUT_PROPERTY,
            property="mentions at least 2 specific tool names",
            description="Should cite concrete tools, not just categories",
        ),
        Assertion(
            type=AssertionType.LATENCY,
            max_ms=30000,
            description="Should respond within 30 seconds",
        ),
    ],
    tags=["retrieval", "synthesis"],
)

# Evaluate without an LLM judge (structural assertions only):
structural_results = evaluate_structural(case, trace)

# Evaluate with an LLM judge (all assertions):
def my_judge(output, context, property_desc):
    # Call your LLM here — return (passed, score, reason)
    ...

result = evaluate(case, trace, judge_fn=my_judge, judge_model="gpt-4o-mini")
print(result.passed, result.composite_score)
print(result.failed_assertions)
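The structural assertions need no LLM at all: they are checks over the trace itself. A minimal sketch of two of them, assuming the trace's tool calls are a list of dicts, is below. The `check_*` function names are illustrative, not `agentool` APIs.

```python
# Illustrative structural checks corresponding to CONTAINS_TOOL_CALL and
# TOOL_CALL_COUNT. These run on the trace alone, with no LLM judge.

def check_contains_tool_call(tool_calls: list[dict], tool_name: str) -> bool:
    """CONTAINS_TOOL_CALL: did the agent call the named tool at least once?"""
    return any(c["name"] == tool_name for c in tool_calls)

def check_tool_call_count(tool_calls: list[dict], min_calls: int, max_calls: int) -> bool:
    """TOOL_CALL_COUNT: is the total number of tool calls within bounds?"""
    return min_calls <= len(tool_calls) <= max_calls

trace_calls = [{"name": "search_web", "arguments": {"query": "agent tools"}}]
print(check_contains_tool_call(trace_calls, "search_web"))  # True
print(check_tool_call_count(trace_calls, 1, 3))             # True
```

Splitting assertions this way is what makes `evaluate_structural` useful in CI: the cheap, deterministic checks run on every commit, while `OUTPUT_PROPERTY` assertions that need a judge can run less often.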

## Spec

Full specification in SPEC.md. Covers:

- ToolSpec JSON schema
- ReasoningTrace format and step types
- EvalCase assertion types
- ToolRegistry format (v0.2)

## Compatibility

| Format | ToolSpec | Trace |
| --- | --- | --- |
| OpenAI function calling | `to_openai()` / `from_openai()` | |
| Anthropic tool use | `to_anthropic()` / `from_anthropic()` | |
| OpenTelemetry LLM spans | | planned v0.3 |
| LangSmith traces | | compatible (step types map to span events) |

## Contributing

This spec is intentionally minimal. If you're building agent tooling and want to target this spec, open an issue or PR. The goal is convergence, not lock-in.
