A minimal, framework-agnostic specification for agent tooling primitives.
Inspired by The Tooling Desert — a call for shared standards so the agent ecosystem stops reinventing the wheel.
Every agent framework defines its own tool schema, trace format, and evaluation contract. This makes debuggers, testing frameworks, and monitoring tools framework-specific. An evaluator built for LangChain doesn't work with LlamaIndex. A tracer for Semantic Kernel doesn't work with CrewAI.
Three common abstractions + a reference Python implementation:
| Primitive | What it is |
|---|---|
| ToolSpec | Framework-agnostic tool definition. Exports to OpenAI and Anthropic formats. |
| ReasoningTrace | Structured execution record. Emitted by agents, consumed by debuggers and monitors. |
| EvalCase / EvalResult | Property-based evaluation. Handles non-determinism — assertions are properties, not exact matches. |
No dependencies beyond Python 3.10+. Just copy the agentool/ directory into your project, or install:
```bash
pip install agentool  # when published — for now, copy agentool/
```

Define a tool once, export it anywhere:

```python
from agentool import ToolSpec

spec = ToolSpec(
    name="search_web",
    description="Search the web. Use when you need current information not in your training data.",
    parameters={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query"},
            "limit": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
    metadata={"cost_tier": "medium", "side_effects": False},
)

# Export to any framework
spec.to_openai()     # OpenAI function calling format
spec.to_anthropic()  # Anthropic tool use format
spec.to_dict()       # Canonical spec format
```
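The canonical-to-OpenAI export is essentially a re-nesting of the same fields. A rough sketch of the idea — not agentool's actual implementation — where the envelope follows OpenAI's published function-calling format and `parameters` passes through unchanged as JSON Schema:

```python
def to_openai(spec: dict) -> dict:
    # Wrap the canonical fields in OpenAI's function-calling envelope.
    # "parameters" is already JSON Schema, so it passes through as-is.
    return {
        "type": "function",
        "function": {
            "name": spec["name"],
            "description": spec["description"],
            "parameters": spec["parameters"],
        },
    }

canonical = {
    "name": "search_web",
    "description": "Search the web.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}
print(to_openai(canonical)["function"]["name"])  # search_web
```

Because the canonical form keeps `parameters` as plain JSON Schema, exporters for other frameworks stay similarly thin.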
Record what an agent did in a standard shape:

```python
from agentool import ReasoningTrace, TraceStatus
from agentool.tool import ToolCall, ToolResult

trace = ReasoningTrace(agent_id="my-research-agent", input="What debugging tools exist for agents?")
trace.add_reasoning("I need to search for current agent debugging tools.")

call = ToolCall(name="search_web", arguments={"query": "agent debugging tools 2026"})
trace.add_tool_call(call)
trace.add_tool_result(ToolResult(call_id=call.id, output=[...], duration_ms=1240))

trace.add_reasoning("Found 3 results. Synthesizing an answer.")
trace.finish(output="The main tools are LangSmith, Braintrust, and...", status=TraceStatus.SUCCESS)

print(trace.to_dict())     # Full JSON-serializable trace
print(trace.has_loop())    # Loop detection
print(trace.tool_calls())  # All tool calls
```
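A shared trace format is what makes helpers like `has_loop()` portable across frameworks. The spec doesn't fix the detection algorithm; one plausible heuristic, sketched here purely as an assumption (flag repeated identical name/arguments pairs), is:

```python
import json
from collections import Counter

def has_loop(tool_calls: list[dict], threshold: int = 3) -> bool:
    # Count (tool name, canonicalized arguments) pairs; call it a loop
    # when any identical call repeats `threshold` or more times.
    counts = Counter(
        (call["name"], json.dumps(call["arguments"], sort_keys=True))
        for call in tool_calls
    )
    return any(n >= threshold for n in counts.values())

calls = [{"name": "search_web", "arguments": {"query": "agent tools"}}] * 3
print(has_loop(calls))  # True
```

Canonicalizing arguments with `sort_keys=True` means two calls with the same arguments in a different key order still count as identical.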
Describe expected behavior as properties, then check a trace against them:

```python
from agentool import EvalCase, Assertion
from agentool.eval import AssertionType, evaluate, evaluate_structural

case = EvalCase(
    case_id="search_and_summarize_001",
    description="Agent should search and cite specific tool names",
    input="What new agent debugging tools came out this year?",
    assertions=[
        Assertion(
            type=AssertionType.CONTAINS_TOOL_CALL,
            tool_name="search_web",
            description="Must search the web",
        ),
        Assertion(
            type=AssertionType.TOOL_CALL_COUNT,
            min_calls=1, max_calls=3,
            description="Shouldn't over-call tools",
        ),
        Assertion(
            type=AssertionType.OUTPUT_PROPERTY,
            property="mentions at least 2 specific tool names",
            description="Should cite concrete tools, not just categories",
        ),
        Assertion(
            type=AssertionType.LATENCY,
            max_ms=30000,
            description="Should respond within 30 seconds",
        ),
    ],
    tags=["retrieval", "synthesis"],
)

# Evaluate without an LLM judge (structural assertions only):
structural_results = evaluate_structural(case, trace)

# Evaluate with an LLM judge (all assertions):
def my_judge(output, context, property_desc):
    # Call your LLM here — return (passed, score, reason)
    ...

result = evaluate(case, trace, judge_fn=my_judge, judge_model="gpt-4o-mini")
print(result.passed, result.composite_score)
print(result.failed_assertions)
```

Full specification in SPEC.md. Covers:
- ToolSpec JSON schema
- ReasoningTrace format and step types
- EvalCase assertion types
- ToolRegistry format (v0.2)
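The `judge_fn` in the evaluation example above can be anything matching the `(output, context, property_desc) -> (passed, score, reason)` contract. A toy keyword-matching stand-in for offline tests — `keyword_judge` is hypothetical, not part of agentool — might look like:

```python
def keyword_judge(output: str, context: str, property_desc: str):
    # Toy judge: score by how many longer words from the property
    # description appear in the output. A real judge calls an LLM.
    keywords = [w for w in property_desc.lower().split() if len(w) > 4]
    hits = [w for w in keywords if w in output.lower()]
    score = len(hits) / len(keywords) if keywords else 0.0
    return score > 0.0, score, f"matched {hits}"
```

Swapping a stub like this in for `my_judge` lets `evaluate()` run in CI without network access, at the cost of much weaker `OUTPUT_PROPERTY` checks.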
| Format | ToolSpec | Trace |
|---|---|---|
| OpenAI function calling | to_openai() / from_openai() | — |
| Anthropic tool use | to_anthropic() / from_anthropic() | — |
| OpenTelemetry LLM spans | — | planned v0.3 |
| LangSmith traces | — | compatible (step types map to span events) |
This spec is intentionally minimal. If you're building agent tooling and want to target this spec, open an issue or PR. The goal is convergence, not lock-in.