This repository contains a comprehensive testing framework for Sentry's AI SDK integrations. Sentry's @sentry/node (JavaScript) and sentry-sdk (Python) ship auto-enabled integrations for popular AI SDKs. This project ensures those integrations work correctly across all supported AI SDKs and catches breakages when new AI SDK versions are released.
- Catch integration breakages early - Detect when new AI SDK versions break Sentry instrumentation
- Comprehensive coverage - Test all popular AI SDKs that Sentry supports
- Language parity - Identical test behavior across JavaScript and Python
- Clear error messages - When tests fail, show exactly what's wrong
- Template-based test generation - Nunjucks templates generate runnable test files for each framework
Do NOT add check names to skip.checks in config.json just because they fail. The entire purpose of this framework is to surface failures. If a check fails, it means the Sentry integration has a real issue that needs to be fixed — that is a valid, expected test result. Skipping it hides the problem and makes this tool useless. Failing tests are the output of this project, not a problem to be worked around.
This project uses a template-based test generation approach. Test definitions (TypeScript) combined with framework templates (Nunjucks) generate runnable test files. A span collector HTTP server captures Sentry data for validation.
testing-ai-sdk-integrations/
├── src/ # TypeScript source code (ES modules)
│ ├── cli.ts # CLI entry point
│ ├── orchestrator.ts # Main test coordinator
│ ├── types.ts # Core type definitions
│ ├── validator.ts # Test validation logic
│ ├── setup.ts # Setup utilities
│ ├── concurrency.ts # Parallel execution support
│ ├── test-cases/ # Test definitions
│ │ ├── index.ts # Test registry
│ │ ├── checks.ts # Reusable check functions (LLM/agent/embeddings)
│ │ ├── mcp-checks.ts # MCP-specific check functions
│ │ ├── utils.ts # Test utilities (skip, assertions)
│ │ ├── llm/ # LLM test cases
│ │ │ ├── basic.ts # Basic single completion test
│ │ │ ├── multi-turn.ts # Multi-turn conversation test
│ │ │ ├── basic-error.ts # Error handling test
│ │ │ ├── vision.ts # Vision/image input test
│ │ │ └── long-input.ts # Long input trimming test
│ │ ├── agents/ # Agent test cases
│ │ │ ├── basic.ts # Basic agent (no tools)
│ │ │ ├── tool-call.ts # Agent with tool calling
│ │ │ ├── tool-error.ts # Tool error handling
│ │ │ ├── vision.ts # Vision agent test
│ │ │ └── long-input.ts # Long input agent test
│ │ ├── embeddings/ # Embeddings test cases
│ │ │ └── basic.ts # Basic embedding test
│ │ └── mcp/ # MCP server test cases
│ │ ├── basic-tool.ts # Basic tool call test
│ │ ├── tool-error.ts # Tool error handling test
│ │ ├── multi-tool.ts # Multiple tool calls test
│ │ ├── resource-read.ts # Resource read test
│ │ └── prompt-get.ts # Prompt retrieval test
│ ├── runner/ # Test execution
│ │ ├── runner.ts # Main runner
│ │ ├── javascript-runner.ts # JS (Node + Next.js) execution
│ │ ├── python-runner.ts # Python execution
│ │ ├── browser-runner.ts # Browser execution (Playwright)
│ │ ├── cloudflare-runner.ts # Cloudflare Workers execution (wrangler dev)
│ │ ├── framework-config.ts # Framework configuration types
│ │ ├── framework-discovery.ts # Auto-discovers frameworks
│ │ ├── template-renderer.ts # Nunjucks template rendering
│ │ └── templates/ # Framework templates (see below)
│ ├── span-collector/ # HTTP server to capture Sentry data
│ │ ├── server.ts # Hono HTTP server
│ │ └── store.ts # In-memory span storage
│ └── reporters/ # Test output reporters
│ ├── ctrf-reporter.ts # CTRF JSON report generator
│ └── live-status.ts # Real-time test status display
├── dist/ # Compiled JavaScript output
├── runs/ # Generated test files per run
├── test-results/ # Generated reports
│ ├── ctrf-report-*.json
│ └── test-report-*.html
├── docs/ # Documentation
└── package.json
Templates are organized by category (llm, agents, embeddings), then platform (node, python, browser), then framework name. The framework folder name is the SDK or framework that Sentry instruments (e.g. openai = OpenAI SDK, langchain = LangChain); the fact that a template calls a given provider (e.g. LangChain using OpenAI) is an implementation detail. See src/runner/templates/README.md for the full naming convention and options to reduce confusion.
src/runner/templates/
├── base.node.njk # Base JavaScript (Node) template
├── base.python.njk # Base Python template
├── base.browser.njk # Base JavaScript (browser) template
├── base.nextjs.njk # Base Next.js template
├── base.cloudflare.njk # Base Cloudflare Workers template
├── base.php.njk # Base PHP (Laravel) template
├── llm/                       # Low-level LLM frameworks
│   ├── node/
│   │   ├── anthropic/         # config.json + template.njk
│   │   ├── google-genai/
│   │   ├── langchain/
│   │   └── openai/
│   ├── python/
│   │   ├── anthropic/
│   │   ├── langchain/
│   │   ├── litellm/
│   │   └── openai/
│   ├── browser/
│   │   ├── anthropic/
│   │   ├── google-genai/
│   │   ├── langchain/
│   │   └── openai/
│   ├── cloudflare/
│   │   ├── anthropic/
│   │   ├── google-genai/
│   │   └── openai/
│   └── nextjs/
│       ├── anthropic/
│       ├── google-genai/
│       ├── langchain/
│       └── openai/
├── agents/                    # Agentic frameworks
│   ├── node/
│   │   ├── langgraph/
│   │   ├── mastra/
│   │   └── vercel/
│   ├── browser/
│   │   └── langgraph/
│   ├── python/
│   │   ├── google-genai/
│   │   ├── langgraph/
│   │   ├── openai-agents/
│   │   └── pydantic-ai/
│   ├── cloudflare/
│   │   └── vercel/
│   ├── nextjs/
│   │   ├── mastra/
│   │   └── vercel/
│   └── php/
│       └── laravel/           # config.json + template.njk + agent.php.njk + tool.php.njk
├── embeddings/                # Embedding frameworks
│   ├── node/
│   │   ├── google-genai/
│   │   ├── langchain/
│   │   ├── openai/
│   │   └── vercel/
│   ├── browser/
│   │   ├── google-genai/
│   │   ├── langchain/
│   │   └── openai/
│   ├── cloudflare/
│   │   ├── google-genai/
│   │   ├── openai/
│   │   └── vercel/
│   ├── nextjs/
│   │   ├── google-genai/
│   │   ├── langchain/
│   │   ├── openai/
│   │   └── vercel/
│   ├── python/
│   │   ├── google-genai/
│   │   ├── langchain/
│   │   ├── litellm/
│   │   ├── manual/            # Manual instrumentation (no SDK dependency)
│   │   └── openai/
│   └── php/
│       └── laravel/
└── mcp/                       # MCP server frameworks
    └── python/
        ├── fastmcp/
        └── mcp/
# Install dependencies
npm install
# Build TypeScript
npm run build
# List all discovered frameworks
npm run test list
# Run all tests
npm run test run
# Run tests for a specific framework
npm run test -- --framework openai
# Run tests for a specific platform (node, python, browser, nextjs, php, cloudflare, or js)
npm run test -- --platform python
npm run test -- --platform browser
npm run test -- --platform nextjs
npm run test -- --platform php # PHP platform (Laravel)
npm run test -- --platform cloudflare # Cloudflare Workers platform
npm run test -- --platform js # all JS platforms (node + browser + cloudflare)
# Run tests for a specific type (llm, agents, embeddings, mcp)
npm run test -- --type embeddings
npm run test -- --type mcp
npm run test -- --type llm --platform python
# Filter by framework option (for frameworks with generic options)
npm run test -- --framework mcp --option apiStyle=highlevel
# Run with verbose output
npm run test -- --framework openai --verbose
# Run only streaming tests
npm run test -- --streaming
# Run only sync tests (Python)
npm run test -- --platform python --sync
# Run tests in parallel
npm run test -- -j=4
# Setup only (generate test files without running)
npm run test setup -- --framework openai

Usage:
npm run test [command] [options]
Commands:
run Run tests (default)
setup Setup environments and render templates (no test execution)
list List discovered frameworks
Options:
--framework <name> Filter by framework name
--test <name> Filter by test name
--type <type> Filter by framework type (llm, agents, embeddings, mcp)
--platform <node|python|browser|nextjs|php|cloudflare|js> Filter by platform (js = node + browser + cloudflare)
--sync Run only sync tests (default: both)
--async Run only async tests (default: both)
--streaming Run only streaming tests (default: both)
--blocking Run only blocking (non-streaming) tests (default: both)
--parallel, -j <N> Run up to N tests in parallel (default: 1)
--verbose, -v Show detailed output (test execution logs, etc.)
--live-status Enable live status display (real-time tree view)
--option <key=value> Filter by framework option (repeatable, e.g., --option apiStyle=highlevel)
--open Open HTML report in browser after test run
--sentry-python <path> Use local Sentry Python SDK (editable install)
--sentry-javascript <path> Use local Sentry JavaScript SDK (link)
--sentry-php <path> Use local Sentry PHP SDK (core sentry/sentry-php)
--sentry-laravel <path> Use local Sentry Laravel SDK (composer path repository)
--help, -h Show this help message
- Discovery: framework-discovery.ts scans the templates/ directory for config.json files
- Matrix Generation: Creates the test matrix (framework x test definition x execution modes)
- Template Rendering: Uses Nunjucks to generate runnable test files from templates
- Execution: Runs generated tests with the Sentry DSN pointing to the span collector
- Validation: Runs the check functions from the checks array against captured spans
- Reporting: Generates console output + CTRF JSON + HTML reports
TestDefinition (TypeScript) + Framework Template (Nunjucks)
↓
Template Renderer generates test file
↓
Runner executes test file
↓
Sentry SDK sends spans to Span Collector
↓
Validator runs checks array on captured spans
↓
Reporter outputs results
| Platform | SDK | Category | Type | Streaming | Execution Modes |
|---|---|---|---|---|---|
| JavaScript (Node) | openai | llm | llm-only | both | - |
| JavaScript (Node) | anthropic | llm | llm-only | both | - |
| JavaScript (Node) | google-genai | llm | llm-only | both | - |
| JavaScript (Node) | langchain | llm | llm-only | both | - |
| JavaScript (Node) | vercel | agents | agentic | - | - |
| JavaScript (Node) | langgraph | agents | agentic | - | - |
| JavaScript (Node) | mastra | agents | agentic | - | - |
| Browser | openai | llm | llm-only | both | - |
| Browser | anthropic | llm | llm-only | both | - |
| Browser | google-genai | llm | llm-only | both | - |
| Browser | langchain | llm | llm-only | both | - |
| Browser | langgraph | agents | agentic | both | - |
| Next.js | openai | llm | llm-only | both | - |
| Next.js | anthropic | llm | llm-only | both | - |
| Next.js | google-genai | llm | llm-only | both | - |
| Next.js | langchain | llm | llm-only | both | - |
| Next.js | vercel | agents | agentic | - | - |
| Next.js | mastra | agents | agentic | - | - |
| Python | openai | llm | llm-only | both | sync/async |
| Python | anthropic | llm | llm-only | both | sync/async |
| Python | langchain | llm | llm-only | both | sync/async |
| Python | litellm | llm | llm-only | both | sync/async |
| Python | openai-agents | agents | agentic | - | async |
| Python | langgraph | agents | agentic | - | sync/async |
| Python | pydantic-ai | agents | agentic | - | async |
| Python | google-genai | agents | agentic | - | sync/async |
| PHP (Laravel) | laravel | agents | agentic | - | - |
| Cloudflare Workers | openai | llm | llm-only | both | - |
| Cloudflare Workers | anthropic | llm | llm-only | both | - |
| Cloudflare Workers | google-genai | llm | llm-only | both | - |
| Cloudflare Workers | vercel | agents | agentic | - | - |
| JavaScript (Node) | openai | embeddings | embeddings | - | - |
| JavaScript (Node) | google-genai | embeddings | embeddings | - | - |
| JavaScript (Node) | langchain | embeddings | embeddings | - | - |
| JavaScript (Node) | vercel | embeddings | embeddings | - | - |
| Browser | openai | embeddings | embeddings | - | - |
| Browser | google-genai | embeddings | embeddings | - | - |
| Browser | langchain | embeddings | embeddings | - | - |
| Next.js | openai | embeddings | embeddings | - | - |
| Next.js | google-genai | embeddings | embeddings | - | - |
| Next.js | langchain | embeddings | embeddings | - | - |
| Next.js | vercel | embeddings | embeddings | - | - |
| Python | manual | embeddings | embeddings | - | sync/async |
| Python | openai | embeddings | embeddings | - | sync/async |
| Python | litellm | embeddings | embeddings | - | sync/async |
| Python | langchain | embeddings | embeddings | - | sync/async |
| Python | google-genai | embeddings | embeddings | - | sync/async |
| Cloudflare Workers | openai | embeddings | embeddings | - | - |
| Cloudflare Workers | google-genai | embeddings | embeddings | - | - |
| Cloudflare Workers | vercel | embeddings | embeddings | - | - |
| PHP (Laravel) | laravel | embeddings | embeddings | - | - |
| Python | mcp | mcp | mcp-server | - | async |
| Python | fastmcp | mcp | mcp-server | - | async |
Test cases are TypeScript files in src/test-cases/ that define:
- name: Human-readable test name
- description: What the test validates
- type: "llm", "agent", "embeddings", or "mcp" (determines which frameworks can run it)
- inputs: Test input data (model, messages, or input text)
- checks: Array of check functions that validate captured spans
| Test | Description |
|---|---|
| Basic LLM Test | Single completion with system message |
| Multi Turn LLM Test | Multi-turn conversation (3 turns) |
| Basic Error LLM Test | Tests API error handling |
| Vision LLM Test | Image input processing |
| Long Input LLM Test | Message trimming for large inputs (>20KB) |

| Test | Description |
|---|---|
| Basic Agent Test | Agent without tools (simple completion) |
| Tool Call Agent Test | Agent with successful tool calling |
| Tool Error Agent Test | Agent with tool that raises an exception |
| Vision Agent Test | Agent that processes images |
| Long Input Agent Test | Agent with large input trimming |

| Test | Description |
|---|---|
| Basic Embeddings Test | Single embedding call with text input |

| Test | Description | Transport |
|---|---|---|
| Basic MCP Tool Call Test | Single tool call with parameters | stdio/sse |
| MCP Tool Error Test | Tool that raises an exception | stdio/sse |
| MCP Multiple Tool Calls Test | Multiple tools called in sequence | stdio/sse |
| MCP Resource Read Test | Reading a resource by URI | stdio/sse |
| MCP Prompt Get Test | Retrieving a prompt template | stdio/sse |
Test definitions use an explicit checks array with reusable check functions:
// src/test-cases/llm/basic.ts
import { TestDefinition } from "../../types.js";
import {
checkAISpanCount,
checkChatSpanAttributes,
checkValidTokenUsage,
checkInputMessagesSchema,
} from "../checks.js";
export const basicLLMTest: TestDefinition = {
name: "Basic LLM Test",
description: "Single completion call with system message",
type: "llm",
inputs: [
{
model: "gpt-4o-mini",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "What is the capital of France?" },
],
},
],
checks: [
checkAISpanCount(1),
checkChatSpanAttributes,
checkValidTokenUsage,
checkInputMessagesSchema,
],
};

Reusable check functions are defined in src/test-cases/checks.ts. Each check is an object with a name and fn:
interface Check {
name: string;
fn: (
spans: CapturedSpan[],
config: FrameworkConfig,
testDef: TestDefinition,
) => void;
}

| Check | Description |
|---|---|
| checkAISpanCount(n) | Factory: validates AI span count (exact or min/max) |
| checkChatSpanAttributes | Validates chat/completion spans (model, messages) |
| checkAgentSpanAttributes | Validates agent invocation spans |
| checkToolSpanAttributes | Validates tool execution spans |
| checkValidTokenUsage | Token counts exist and are valid |
| checkInputTokensCached | Cached tokens ≤ input tokens |
| checkOutputTokensReasoning | Reasoning tokens ≤ output tokens |
| checkInputMessagesSchema | Validates that the message schema follows Sentry conventions |
| checkAgentHierarchy | Agent span hierarchy and name propagation |
| checkAvailableTools | Validates gen_ai.request.available_tools |
| checkResponseToolCalls([]) | Factory: validates tool calls in the LLM response |
| checkToolCalls([]) | Factory: validates tool execution spans |
| checkMessageTrimming | Messages are trimmed below 15KB |
| checkBinaryRedaction | Binary content (images) is redacted |
| checkResponseModel | Warns when gen_ai.response.model is missing (warning only) |
| checkEmbeddingSpanAttributes | Validates embedding spans (model, input, description) |
| checkEmbeddingTokenUsage | Embedding token usage (input_tokens, total_tokens) |
| checkMCPSpanCount(n) | Factory: correct number of MCP spans |
| checkMCPToolSpanAttributes | Tool spans have op=mcp.server and the correct description |
| checkMCPToolResult | Tool result content exists, is_error is false |
| checkMCPToolError | Tool result is_error is true, span status=error |
| checkMCPResourceSpanAttributes | Resource spans with URI and protocol |
| checkMCPPromptSpanAttributes | Prompt spans with name and message count |
| checkMCPServerAttributes | Common MCP attributes (transport, session.id) |
| checkMCPMultipleTools(expected) | Factory: validates N tool spans with names |
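As an illustrative sketch of how a custom check could plug into this interface (the span shape and check below are hypothetical stand-ins; the real types live in src/types.ts, and the real fn also receives the framework config and test definition):

```typescript
// Minimal stand-ins for the framework's types; illustrative, not the real
// definitions from src/types.ts.
interface CapturedSpan {
  op: string;
  data?: Record<string, unknown>;
}

interface Check {
  name: string;
  fn: (spans: CapturedSpan[]) => void;
}

// Hypothetical custom check: every gen_ai.* span must carry a request model.
const checkRequestModelPresent: Check = {
  name: "checkRequestModelPresent",
  fn: (spans) => {
    const aiSpans = spans.filter((s) => s.op.startsWith("gen_ai."));
    for (const span of aiSpans) {
      if (!span.data?.["gen_ai.request.model"]) {
        throw new Error(`span "${span.op}" is missing gen_ai.request.model`);
      }
    }
  },
};
```

A check passes by returning normally and fails by throwing; the validator catches the error and reports it against the test.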
The project uses sentry-conventions as a git submodule to dynamically track deprecated GenAI attributes. This ensures the test framework stays aligned with OpenTelemetry standards while maintaining backward compatibility.
- Dynamic Loading: The deprecation loader (src/deprecation/loader.ts) scans the sentry-conventions/model/attributes/gen_ai/ directory at runtime to identify deprecated attributes
- Automatic Fallback: Checks use the new OTEL attributes first, automatically falling back to legacy attributes if the new ones aren't present
- Non-Blocking Warnings: When legacy attributes are detected, deprecation warnings are logged (visible in console output) but tests continue to pass
- Graceful Degradation: If the submodule isn't available, the system continues to work with fallback behavior
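A minimal sketch of the new-then-legacy fallback described above (the function name and return shape are hypothetical, not the real loader API):

```typescript
// Illustrative: prefer the OTEL attribute, fall back to the legacy one with
// a non-blocking deprecation warning, as the bullets above describe.
interface AttrLookup {
  value: unknown;
  usedLegacy: boolean;
}

function readGenAIAttr(
  data: Record<string, unknown>,
  otelKey: string,
  legacyKey: string,
): AttrLookup {
  if (otelKey in data) {
    return { value: data[otelKey], usedLegacy: false };
  }
  if (legacyKey in data) {
    // Legacy attribute found: warn, but do not fail the check.
    console.warn(`Deprecated attribute "${legacyKey}" used; prefer "${otelKey}"`);
    return { value: data[legacyKey], usedLegacy: true };
  }
  return { value: undefined, usedLegacy: false };
}
```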
The following attributes have been migrated to OpenTelemetry standards:
| Legacy Attribute | OTEL Replacement | Status |
|---|---|---|
| gen_ai.request.messages | gen_ai.input.messages | Deprecated |
| gen_ai.response.text | gen_ai.output.messages | Deprecated |
| gen_ai.response.tool_calls | gen_ai.output.messages (embedded) | Deprecated |
| gen_ai.request.available_tools | gen_ai.tool.definitions | Deprecated |
| gen_ai.tool.input | gen_ai.tool.call.arguments | Pending deprecation |
| gen_ai.tool.output | gen_ai.tool.call.result | Pending deprecation |
To pull the latest attribute definitions from sentry-conventions:
npm run update-conventions
npm run build

The loader will automatically detect any new deprecations added to the conventions repository.
Each framework has a config.json file that defines its capabilities:
{
"name": "openai",
"displayName": "OpenAI JavaScript SDK",
"type": "llm-only",
"platform": "node",
"streamingMode": "both",
"dependencies": [{ "package": "openai", "version": "framework" }],
"versions": ["4.96.0"],
"sentryVersions": ["latest"]
}

| Field | Description |
|---|---|
| name | Framework identifier |
| displayName | Human-readable name |
| type | "llm-only", "agentic", "embeddings", or "mcp-server" |
| platform | "node", "python", "browser", "nextjs", "php", or "cloudflare" (the CLI also accepts "js" as a meta-platform for node + browser + cloudflare) |
| streamingMode | "streaming", "blocking", or "both" |
| executionMode | Python only: "sync", "async", or "both" |
| transportMode | MCP only: "stdio", "sse", or "both" |
| dependencies | NPM/uv packages to install |
| versions | Framework versions to test |
| sentryVersions | Sentry SDK versions to test against |
| options | Generic options expanding the test matrix (e.g., { "apiStyle": ["highlevel", "lowlevel"] }) |
| modelOverrides | Overrides model names for request/response validation |
| skip | Tests or checks to skip for this framework |
Frameworks can define options in their config.json to create additional test matrix dimensions. Each option key maps to an array of possible values. The cartesian product of all option values expands the test count.
{
"name": "my-framework",
"options": {
"apiStyle": ["highlevel", "lowlevel"]
}
}

This doubles the test count — each test runs once per apiStyle value. Multiple options multiply further (e.g., 2 x 3 = 6x tests).
- In templates: Resolved option values are available as top-level template variables (e.g., {{ apiStyle }})
- In filenames: Option values are appended to the test filename (e.g., test-basic-...-highlevel.py)
- CLI filtering: Use --option key=value (repeatable) to run only specific option values: npm run test -- --framework my-framework --option apiStyle=highlevel
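The option expansion can be sketched as a cartesian product (illustrative only; the real expansion lives in the discovery/orchestration code):

```typescript
// Expand an options map into one entry per combination of values.
// { apiStyle: ["highlevel", "lowlevel"], transport: ["stdio", "sse"] }
// yields 2 x 2 = 4 combinations.
function expandOptions(
  options: Record<string, string[]>,
): Record<string, string>[] {
  let combos: Record<string, string>[] = [{}];
  for (const [key, values] of Object.entries(options)) {
    combos = combos.flatMap((combo) =>
      values.map((value) => ({ ...combo, [key]: value })),
    );
  }
  return combos;
}
```

Each resulting record is what the framework exposes to templates as resolvedOptions.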
Available in src/test-cases/utils.ts:
| Function | Purpose |
|---|---|
| skip(reason) | Skips the current check with a reason |
| skipIf(cond, reason) | Conditionally skips a check |
| extractGenAISpans() | Filters spans for gen_ai.* operations |
| findAgentSpans() | Finds invoke_agent spans |
| findChatSpans() | Finds chat/completion spans |
| findToolSpans() | Finds tool execution spans |
| findEmbeddingSpans() | Finds embeddings spans |
| extractMCPSpans() | Filters spans for mcp.* operations |
| findMCPToolSpans() | Finds tools/call MCP spans |
| findMCPResourceSpans() | Finds resources/read MCP spans |
| findMCPPromptSpans() | Finds prompts/get MCP spans |
| assertAttributes() | Schema-based attribute validation |
| printSpanSummary() | Debug helper to print captured spans |
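As a rough sketch of the op-prefix filtering the extract/find helpers perform (illustrative only; the real helpers live in src/test-cases/utils.ts and match on the full span shape):

```typescript
// Hypothetical simplification: the span shape and helper below are stand-ins.
interface CapturedSpan {
  op: string;
  description?: string;
}

// Filter captured spans whose op starts with a given prefix,
// in the spirit of extractGenAISpans() / extractMCPSpans().
function spansWithOpPrefix(spans: CapturedSpan[], prefix: string): CapturedSpan[] {
  return spans.filter((s) => s.op.startsWith(prefix));
}

const spans: CapturedSpan[] = [
  { op: "gen_ai.chat", description: "chat gpt-4o-mini" },
  { op: "gen_ai.invoke_agent", description: "invoke_agent weather" },
  { op: "mcp.server", description: "tools/call add" },
];
```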
The assertAttributes function supports:
- true: Attribute must exist (any value)
- false: Attribute must NOT exist
- "pattern*": Wildcard pattern matching
- "exact" / 123: Exact value match
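These four rules can be sketched as a single matcher (a simplified illustration, not the real assertAttributes implementation; among other things it treats the wildcard as a prefix match, which may be narrower than the real behavior):

```typescript
// A rule is true/false (existence), a wildcard string, or an exact value.
type AttrRule = boolean | string | number;

function matchesRule(value: unknown, rule: AttrRule): boolean {
  if (rule === true) return value !== undefined;        // must exist
  if (rule === false) return value === undefined;       // must NOT exist
  if (typeof rule === "string" && rule.endsWith("*")) { // wildcard pattern
    return typeof value === "string" && value.startsWith(rule.slice(0, -1));
  }
  return value === rule;                                // exact match
}
```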
assertAttributes(spans, {
"gen_ai.operation.name": true, // Must exist
"gen_ai.request.model": "gpt-4", // Exact match
"gen_ai.response.model": "gpt-4*", // Pattern match
sensitive_field: false, // Must NOT exist
});

mkdir -p src/runner/templates/{llm|agents|embeddings|mcp}/{node|python|browser|nextjs|php|cloudflare}/your-framework

{
"name": "your-framework",
"displayName": "Your Framework SDK",
"type": "llm-only",
"platform": "node",
"streamingMode": "both",
"dependencies": [{ "package": "your-framework", "version": "framework" }],
"versions": ["1.0.0"],
"sentryVersions": ["latest"]
Templates extend the base template and implement required blocks. Use base.node.njk for Node, base.python.njk for Python, base.browser.njk for browser, base.nextjs.njk for Next.js, or base.cloudflare.njk for Cloudflare Workers.
{% extends "base.node.njk" %}
{% block setup %}
let client;
{% endblock %}
{% block dynamic_imports %}
const SDK = (await import("your-framework")).default;
client = new SDK();
{% endblock %}
{% block test %}
{% for input in inputs %}
const response = await client.complete({
model: "{{ input.model }}",
messages: {{ input.messages | dump }},
});
console.log("Response:", response.content);
{% endfor %}
{% endblock %}

npm run build
npm run test -- --framework your-framework --verbose

// src/test-cases/llm/your-test.ts
import { TestDefinition } from "../../types.js";
import { checkAISpanCount, checkChatSpanAttributes } from "../checks.js";
export const yourTest: TestDefinition = {
name: "Your Test Name",
description: "What this test validates",
type: "llm", // or 'agent'
inputs: [
{
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Test prompt" }],
},
],
checks: [checkAISpanCount({ min: 1 }), checkChatSpanAttributes],
};
export default yourTest;

// src/test-cases/index.ts
import { yourTest } from "./llm/your-test.js";
export const testCases = {
llm: {
// ... existing tests
yourTest: yourTest,
},
};

npm run build
npm run test -- --test "Your Test Name" --verbose

interface TestDefinition {
name: string;
description: string;
type: "llm" | "agent" | "embeddings" | "mcp";
inputs: TestInput[];
agent?: AgentDefinition; // For agent tests
mcpServer?: MCPServerDefinition; // For MCP tests
causeAPIError?: boolean; // Trigger API errors
checks: Check[]; // Array of check functions
}

interface FrameworkConfig {
name: string;
platform: "node" | "python" | "browser" | "nextjs" | "php" | "cloudflare";
type: "llm-only" | "agentic" | "embeddings" | "mcp-server";
version: string;
sentryVersion: string;
templatePath?: string;
executionMode?: "sync" | "async" | "both";
streamingMode?: "streaming" | "blocking" | "both";
transportMode?: "stdio" | "sse" | "both";
options?: Record<string, string[]>; // Generic options expanding test matrix
resolvedOptions?: Record<string, string>; // Single values after matrix expansion
modelOverrides?: { request?: string; response?: string };
skip?: { tests?: string[]; checks?: { [testName: string]: string[] } };
}

interface CapturedSpan {
span_id: string;
trace_id: string;
op: string;
description?: string;
start_timestamp: number;
timestamp: number;
data?: Record<string, any>;
tags?: Record<string, any>;
}

All API keys should be in a root .env file (gitignored):
# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GENAI_API_KEY=...

Use printSpanSummary() in your check functions:
import { printSpanSummary } from "../utils.js";
const debugCheck: Check = {
name: "debugCheck",
fn: (spans) => {
printSpanSummary(spans);
},
};

npm run test -- --framework openai --verbose
npm run test -- --framework openai --live-status
npm run test setup -- --framework openai
# Check runs/ directory for generated test files

Each test validates that Sentry captures:
- Performance tracing - Spans with proper timing and hierarchy
- AI monitoring data - Model name, token counts, operation names
- Error tracking - Exceptions with context (for error tests)
- Message handling - Proper schema, trimming, binary redaction
A test passes when:
- Test code runs without exceptions
- All check functions pass (or are skipped with reason)
- Required spans are captured with correct attributes
Mastra uses its own Sentry integration (@mastra/sentry) rather than @sentry/node. Key differences:
- Uses SentryExporter with Mastra's Observability system
- Attribute names follow the newer OpenTelemetry conventions (gen_ai.input.messages instead of gen_ai.request.messages)
- The template is standalone (it does not extend base.node.njk)
LangGraph browser tests use a single langgraph framework folder with a generic variant option that expands the test matrix. Each variant isolates a specific instrumentation approach:
| Variant | Sentry API Used | Known Issue |
|---|---|---|
| graph | Sentry.instrumentLangGraph() only | No chat spans; streaming produces no invoke_agent span |
| langchain | Sentry.createLangChainCallbackHandler() only | Chat spans missing token usage and input messages; agent spans missing gen_ai.agent.name |
| combined | Both APIs together | Duplicate invoke_agent spans; attribute gaps |
| compiled | instrumentLangGraph() on a compiled graph | Crashes with a TypeError |
| custom-state | instrumentLangGraph() with custom state | recordInputs/recordOutputs silently records nothing |
All variants run both streaming and blocking modes. Use --option variant=<name> to filter:
npm run test -- --framework langgraph --platform browser --option variant=graph

Laravel uses a split-file template setup unique among the platforms:
- Uses composer create-project laravel/laravel for environment setup
- Sentry is integrated via sentry/sentry-laravel (Composer package)
- AI functionality comes from the laravel/ai package
- Templates generate multiple PHP files: agent classes (app/Ai/Agents/), tool classes (app/Ai/Tools/), and artisan commands (app/Console/Commands/)
- Tests are executed via php artisan test:<test-case-id> rather than by running a script file directly
- The PhpRunner handles Composer project creation, dependency installation, and artisan command execution
The mcp framework tests Sentry's MCP server instrumentation using the official mcp Python package. Key differences from LLM/agent frameworks:
- Uses sentry_sdk.integrations.mcp.MCPIntegration instead of auto-enabled AI integrations
- Spans use op: "mcp.server" with mcp.* attributes (not gen_ai.*)
- No LLM API keys needed — tests are self-contained with in-process or local SSE servers
- Supports two transport modes: stdio (in-process via memory streams) and sse (HTTP via uvicorn)
- Uses the generic options system with apiStyle: ["highlevel", "lowlevel"]:
  - highlevel: Uses mcp.server.fastmcp.FastMCP with decorator-based tool/resource/prompt registration
  - lowlevel: Uses mcp.server.lowlevel.Server with manual handler registration (@server.list_tools(), @server.call_tool(), etc.)
- The client uses mcp.client.session.ClientSession for all modes
- In-process (stdio) mode uses anyio.create_memory_object_stream() for client-server communication
- Resolved options (e.g., apiStyle) are exposed as top-level template variables (e.g., {{ apiStyle }})
Cloudflare Workers use @sentry/cloudflare instead of @sentry/node. Key differences:
- Uses the Sentry.withSentry() handler wrapper instead of Sentry.init()
- AI integrations use manual client instrumentation functions (Sentry.instrumentOpenAiClient(client), Sentry.instrumentAnthropicAiClient(client), Sentry.instrumentGoogleGenAIClient(client)) rather than auto-enabled integrations
- Only Sentry.vercelAIIntegration() uses the integration-style API (added to the integrations array)
- API keys are accessed via the env parameter (from .dev.vars) rather than process.env
- The CloudflareRunner manages the wrangler dev lifecycle: it spawns the dev server, waits for ready, sends an HTTP request to trigger the worker, then kills the process
- Generated files include wrangler.json (with the nodejs_compat flag), .dev.vars (secrets), and package.json (with wrangler as a dev dependency)
- Sentry JavaScript SDK: https://github.com/getsentry/sentry-javascript
- Sentry Python SDK: https://github.com/getsentry/sentry-python
- Vercel AI SDK: https://sdk.vercel.ai/docs
- OpenAI Python SDK: https://github.com/openai/openai-python
- Mastra AI Framework: https://mastra.ai/docs
- MCP Python SDK: https://github.com/modelcontextprotocol/python-sdk
- Sentry MCP Integration: https://docs.sentry.io/platforms/python/integrations/mcp/