A minimal, framework-agnostic specification for agent tooling primitives. Any agent framework can implement this. Any tool built to this spec works everywhere.
The agent tooling ecosystem is fragmented. Every framework defines its own tool schema, trace format, and evaluation contract. This makes it impossible to share debuggers, testing frameworks, and monitoring dashboards across frameworks. This spec defines the minimum common abstractions needed for interoperability.
Design principles:
- Minimal. Only what every framework needs. Nothing framework-specific.
- JSON-serializable. All types round-trip through JSON.
- Additive. Frameworks can extend with extra fields; consumers ignore unknown fields.
- No runtime dependency. Just a schema + reference implementation.
A ToolSpec describes a callable tool. All agent frameworks define tools; this is the common form.
```json
{
  "name": "search_web",
  "description": "Search the web and return relevant results. Use when you need current information not in your training data.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "The search query"
      },
      "limit": {
        "type": "integer",
        "description": "Max results to return",
        "default": 5
      }
    },
    "required": ["query"]
  },
  "returns": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "url": {"type": "string"},
        "title": {"type": "string"},
        "snippet": {"type": "string"}
      }
    }
  },
  "metadata": {
    "category": "retrieval",
    "cost_tier": "medium",
    "side_effects": false
  }
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | yes | Unique identifier, snake_case |
| `description` | string | yes | Human + LLM readable. What it does and when to use it. |
| `parameters` | JSON Schema object | yes | Input schema. Same format as OpenAI function calling. |
| `returns` | JSON Schema | no | Output schema. Helps agents understand what they'll receive. |
| `metadata.category` | string | no | `retrieval` \| `compute` \| `io` \| `side_effect` \| `coordination` |
| `metadata.cost_tier` | string | no | `free` \| `cheap` \| `medium` \| `expensive` |
| `metadata.side_effects` | bool | no | `true` if calling this tool changes external state |
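The reference implementation ships a Python dataclass for this shape. A minimal sketch of what it looks like (field names follow the table above; the method names here are illustrative, not necessarily the exact agentool API), demonstrating the JSON round-trip and unknown-field-tolerance design principles:

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Optional

@dataclass
class ToolSpec:
    """Framework-agnostic description of a callable tool."""
    name: str
    description: str
    parameters: dict          # JSON Schema for inputs
    returns: Optional[dict] = None  # JSON Schema for output
    metadata: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Drop None-valued fields so the wire form stays minimal.
        d = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(d)

    @classmethod
    def from_json(cls, raw: str) -> "ToolSpec":
        data = json.loads(raw)
        # Additive principle: consumers silently ignore unknown fields.
        known = set(cls.__dataclass_fields__)
        return cls(**{k: v for k, v in data.items() if k in known})

spec = ToolSpec(
    name="search_web",
    description="Search the web and return relevant results.",
    parameters={"type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"]},
    metadata={"category": "retrieval", "side_effects": False},
)
roundtrip = ToolSpec.from_json(spec.to_json())
assert roundtrip == spec  # round-trips through JSON, per the design principles
```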
A ToolCall records a single invocation of a tool; a ToolResult records its outcome, linked back by `call_id`.

```json
{
  "tool_call": {
    "id": "call_abc123",
    "name": "search_web",
    "arguments": {"query": "agent debugging tools 2026", "limit": 3},
    "timestamp": "2026-03-01T10:00:00Z"
  },
  "tool_result": {
    "call_id": "call_abc123",
    "output": [...],
    "error": null,
    "duration_ms": 1240,
    "cost_tokens": 0
  }
}
```

A ReasoningTrace is a structured record of one agent execution. Frameworks emit traces; debuggers, monitors, and evaluators consume them.
```json
{
  "trace_id": "tr_7f3a9b",
  "agent_id": "my-research-agent",
  "session_id": "sess_abc",
  "started_at": "2026-03-01T10:00:00Z",
  "finished_at": "2026-03-01T10:00:15Z",
  "input": "What are the best agent debugging tools available today?",
  "output": "Based on my research, the leading agent debugging tools are...",
  "status": "success",
  "steps": [
    {
      "step_id": "step_001",
      "type": "reasoning",
      "content": "I need to search for current information about agent debugging tools.",
      "context_tokens": 1200,
      "timestamp": "2026-03-01T10:00:01Z"
    },
    {
      "step_id": "step_002",
      "type": "tool_call",
      "tool_call": {
        "id": "call_abc123",
        "name": "search_web",
        "arguments": {"query": "agent debugging tools 2026"}
      },
      "timestamp": "2026-03-01T10:00:02Z"
    },
    {
      "step_id": "step_003",
      "type": "tool_result",
      "tool_result": {
        "call_id": "call_abc123",
        "output": [...],
        "duration_ms": 1240
      },
      "timestamp": "2026-03-01T10:00:03Z"
    },
    {
      "step_id": "step_004",
      "type": "reasoning",
      "content": "The search returned 3 results. I'll synthesize them into an answer.",
      "context_tokens": 2800,
      "timestamp": "2026-03-01T10:00:04Z"
    }
  ],
  "metrics": {
    "total_tokens": 3200,
    "total_cost_usd": 0.0032,
    "total_duration_ms": 15000,
    "step_count": 4,
    "tool_call_count": 1
  },
  "metadata": {}
}
```

| Type | Fields | Description |
|---|---|---|
| `reasoning` | `content`, `context_tokens` | LLM reasoning / internal monologue |
| `tool_call` | `tool_call` | Agent is calling a tool |
| `tool_result` | `tool_result` | Result received from tool |
| `handoff` | `to_agent`, `message` | Handing off to a sub-agent |
| `memory_read` | `query`, `results` | Reading from memory/retrieval |
| `memory_write` | `content`, `key` | Writing to memory |

Valid `status` values: `success` \| `error` \| `loop_detected` \| `context_overflow` \| `timeout` \| `refusal`
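Because every field above is plain JSON, a trace consumer needs only dict access to work across frameworks. A sketch (the function name is illustrative, not part of the spec) that walks the steps, pairs each tool call with its result by `call_id`, and recomputes the counts found in `metrics`:

```python
from typing import Any

def summarize_trace(trace: dict) -> dict:
    """Walk a ReasoningTrace's steps; pair calls with results, recount steps."""
    calls: dict = {}   # call_id -> tool_call payload, awaiting a result
    paired = []        # (tool_call, tool_result) tuples
    tool_call_count = 0
    for step in trace.get("steps", []):
        kind = step.get("type")
        if kind == "tool_call":
            tool_call_count += 1
            calls[step["tool_call"]["id"]] = step["tool_call"]
        elif kind == "tool_result":
            result = step["tool_result"]
            call = calls.get(result["call_id"])
            if call is not None:
                paired.append((call, result))
    return {
        "step_count": len(trace.get("steps", [])),
        "tool_call_count": tool_call_count,
        "unmatched_calls": tool_call_count - len(paired),
        "pairs": paired,
    }
```

On the example trace above this yields `step_count` 4 and `tool_call_count` 1, matching the `metrics` block, with one matched call/result pair.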
An EvalCase is a test case for an agent. An EvalResult is what a judge produces. The key design choice: assertions are properties, not exact matches, because agents are non-deterministic.

```json
{
  "eval_id": "eval_001",
  "case_id": "case_web_search_001",
  "description": "Agent should search for and summarize recent agent tooling news",
  "input": "What new agent debugging tools came out this year?",
  "context": {},
  "assertions": [
    {
      "type": "contains_tool_call",
      "tool_name": "search_web",
      "description": "Must call search_web at least once"
    },
    {
      "type": "output_property",
      "property": "mentions at least 2 specific tool names",
      "description": "Response should name concrete tools, not just describe categories"
    },
    {
      "type": "output_property",
      "property": "response length is between 100 and 500 words",
      "description": "Appropriate length — not too brief, not too long"
    },
    {
      "type": "no_hallucination",
      "description": "Claims should be supported by retrieved content"
    }
  ],
  "tags": ["retrieval", "synthesis"],
  "difficulty": "medium"
}
```

A corresponding EvalResult:

```json
{
  "eval_id": "eval_001",
  "case_id": "case_web_search_001",
  "trace_id": "tr_7f3a9b",
  "passed": true,
  "assertion_results": [
    {"assertion_type": "contains_tool_call", "passed": true, "score": 1.0, "reason": "search_web called at step_002"},
    {"assertion_type": "output_property", "passed": true, "score": 0.9, "reason": "Named LangSmith, Weights & Biases, and Braintrust"},
    {"assertion_type": "output_property", "passed": true, "score": 1.0, "reason": "Response is 287 words"},
    {"assertion_type": "no_hallucination", "passed": true, "score": 0.8, "reason": "All claims supported by search results"}
  ],
  "composite_score": 0.925,
  "judge_model": "gpt-4o-mini",
  "evaluated_at": "2026-03-01T10:01:00Z"
}
```

| Type | Description |
|---|---|
| `output_property` | LLM-judged property of the output (most flexible) |
| `contains_tool_call` | Tool was called (optionally: with specific args) |
| `tool_call_count` | Number of tool calls within min/max range |
| `no_hallucination` | Output claims are grounded in retrieved context |
| `response_format` | Output matches a JSON schema |
| `context_efficiency` | Token usage is within acceptable bounds |
| `latency` | Duration within acceptable range |
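Some assertion types (`contains_tool_call`, `tool_call_count`, `latency`) are deterministic and need no LLM judge. A sketch of a checker for two of them — note the `max_ms` field is an assumption for illustration, since the spec says only "duration within acceptable range" without naming the threshold fields:

```python
from typing import Any

def check_assertion(assertion: dict, trace: dict) -> dict:
    """Score one deterministic EvalCase assertion against a ReasoningTrace."""
    kind = assertion["type"]
    if kind == "contains_tool_call":
        hits = [s["step_id"] for s in trace["steps"]
                if s["type"] == "tool_call"
                and s["tool_call"]["name"] == assertion["tool_name"]]
        passed = bool(hits)
        reason = (f"{assertion['tool_name']} called at {hits[0]}"
                  if hits else f"{assertion['tool_name']} never called")
    elif kind == "latency":
        ms = trace["metrics"]["total_duration_ms"]
        passed = ms <= assertion["max_ms"]  # "max_ms" is a hypothetical field
        reason = f"trace took {ms} ms"
    else:
        raise ValueError(f"assertion type {kind!r} requires an LLM judge")
    return {"assertion_type": kind, "passed": passed,
            "score": 1.0 if passed else 0.0, "reason": reason}
```

One natural way to derive `composite_score` is the mean of the per-assertion scores: in the EvalResult example above, 0.925 is the mean of 1.0, 0.9, 1.0, and 0.8.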
A ToolRegistry is a discoverable catalog of tools. Agents can query it to find available tools.
```json
{
  "registry_id": "agora-tools-v1",
  "version": "0.1.0",
  "updated_at": "2026-03-01T00:00:00Z",
  "tools": [
    {
      "spec": { ... },
      "endpoint": "https://tools.example.com/search_web",
      "auth": "bearer",
      "provider": "example-corp",
      "tags": ["retrieval", "web"]
    }
  ]
}
```

| Component | Status |
|---|---|
| ToolSpec (Python dataclass) | ✅ Reference implementation in agentool/ |
| ReasoningTrace (Python dataclass) | ✅ Reference implementation in agentool/ |
| EvalCase + EvalResult | ✅ Reference implementation in agentool/ |
| JSON Schema files | ✅ schemas/ directory |
| ToolRegistry | 🔜 Planned for v0.2 |
| OpenTelemetry exporter | 🔜 Planned for v0.3 |
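Although ToolRegistry is still planned, querying one is plain dict filtering. A sketch of client-side discovery over the ToolRegistry JSON shape shown above, filtering by tag and by the ToolSpec `cost_tier` metadata (the function name and defaults are illustrative):

```python
from typing import Any, Optional

def find_tools(registry: dict, tag: Optional[str] = None,
               max_cost_tier: str = "expensive") -> list:
    """Return registry entries matching a tag, at or below a cost tier."""
    tiers = ["free", "cheap", "medium", "expensive"]  # cheapest to priciest
    limit = tiers.index(max_cost_tier)
    out = []
    for entry in registry["tools"]:
        if tag is not None and tag not in entry.get("tags", []):
            continue
        # Missing cost_tier metadata is treated as "free" here by assumption.
        tier = entry["spec"].get("metadata", {}).get("cost_tier", "free")
        if tiers.index(tier) <= limit:
            out.append(entry)
    return out
```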
This spec is designed to be compatible with:
- OpenAI function calling tool format (ToolSpec.parameters field is identical)
- OpenTelemetry semantic conventions for LLM (trace steps map to span events)
- Anthropic tool use format (minor field renaming)
Frameworks that implement this spec can interoperate with any tooling built to it, whether the underlying framework is LangChain, LlamaIndex, Semantic Kernel, CrewAI, or custom.
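To illustrate the compatibility claims above, a sketch converting a ToolSpec dict into the two vendor formats. The target layouts are the publicly documented ones (OpenAI wraps the schema under `function.parameters`; Anthropic renames it to `input_schema`); the wrapper functions themselves are hypothetical:

```python
from typing import Any

def to_openai(spec: dict) -> dict:
    # OpenAI function-calling tools entry: parameters pass through unchanged.
    return {"type": "function",
            "function": {"name": spec["name"],
                         "description": spec["description"],
                         "parameters": spec["parameters"]}}

def to_anthropic(spec: dict) -> dict:
    # Anthropic tool use: same JSON Schema, renamed to input_schema.
    return {"name": spec["name"],
            "description": spec["description"],
            "input_schema": spec["parameters"]}
```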