This repo (hopefully) contains everything needed to test Sentry SDK AI integrations for Python and JavaScript.
Quick start and other goodies can be found in README.md.
The entire repo was made with Claude Code, and all major changes (refactorings, adding SDKs, etc.) should be done by an agent. Most directories contain README.md files that the agent is instructed to read and update when needed. .claude/settings.json blocks the agent from reading this file and your .env.
- Test definitions (TypeScript) + framework templates (Nunjucks) = generated test files
- A span collector HTTP server acts as a mock Sentry endpoint
- Tests make real LLM calls and the Sentry SDK captures spans
- The validator runs check functions against captured spans
- Results are reported as CTRF JSON, HTML, and printed to the console
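Here is a rough sketch of the validator and reporting steps. It is hypothetical TypeScript, not the repo's validator. `CapturedSpan`, `Check`, and `runChecks` are illustrative names, and the span fields and `gen_ai.*` op names are assumptions based on Sentry's AI span conventions. The sketch runs each check against the captured spans and emits a report shaped like CTRF JSON, covering only a subset of that schema.

```typescript
// Hypothetical span shape; the collector's real storage format may differ.
interface CapturedSpan {
  span_id: string;
  op: string;
  data: Record<string, unknown>;
}

// A check returns null when its assertion holds, or an error message otherwise.
interface Check {
  name: string;
  run: (spans: CapturedSpan[]) => string | null;
}

// Subset of the CTRF test-report format.
interface CtrfTest {
  name: string;
  status: "passed" | "failed";
  duration: number;
  message?: string;
}

interface CtrfReport {
  results: {
    tool: { name: string };
    summary: {
      tests: number;
      passed: number;
      failed: number;
      pending: number;
      skipped: number;
      other: number;
      start: number;
      stop: number;
    };
    tests: CtrfTest[];
  };
}

function runChecks(spans: CapturedSpan[], checks: Check[]): CtrfReport {
  const start = Date.now();
  const tests = checks.map((check): CtrfTest => {
    const t0 = Date.now();
    let message: string | null;
    try {
      message = check.run(spans);
    } catch (err) {
      // A check that throws is reported as a failure instead of aborting the run.
      message = `check threw: ${String(err)}`;
    }
    const duration = Date.now() - t0;
    return message === null
      ? { name: check.name, status: "passed", duration }
      : { name: check.name, status: "failed", duration, message };
  });
  const passed = tests.filter((t) => t.status === "passed").length;
  return {
    results: {
      tool: { name: "span-validator" }, // hypothetical tool name
      summary: {
        tests: tests.length,
        passed,
        failed: tests.length - passed,
        pending: 0,
        skipped: 0,
        other: 0,
        start,
        stop: Date.now(),
      },
      tests,
    },
  };
}

// Usage: one check passes, one fails.
const spans: CapturedSpan[] = [
  { span_id: "a", op: "gen_ai.chat", data: { "gen_ai.request.model": "some-model" } },
];
const report = runChecks(spans, [
  { name: "has chat span", run: (s) => (s.some((x) => x.op === "gen_ai.chat") ? null : "no gen_ai.chat span") },
  { name: "has tool span", run: (s) => (s.some((x) => x.op === "gen_ai.execute_tool") ? null : "no gen_ai.execute_tool span") },
]);
console.log(JSON.stringify(report, null, 2));
```

A failing check records its message instead of throwing. One broken integration therefore still produces a complete report.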
Assert that AI integrations:
- correctly initialize
- capture all relevant spans in correct order/hierarchy
- correctly capture available attributes (model, tokens, messages, tool calls)
- properly handle streaming vs blocking modes
- properly handle sync vs async execution (Python)
- trim long messages and redact binary content
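Order and hierarchy assertions like the ones above could be written roughly as follows. This sketch makes three assumptions. First, spans carry `span_id`, `parent_span_id`, `op`, and `start_timestamp`, as in Sentry's span JSON. Second, tool spans are expected to be direct children of an agent span. Third, `checkParentOp` and `checkOrder` are hypothetical helpers, not functions taken from `checks.ts`.

```typescript
// Assumed span fields, mirroring Sentry's span JSON.
interface CapturedSpan {
  span_id: string;
  parent_span_id?: string;
  op: string;
  start_timestamp: number;
}

// Every span with `childOp` must have a direct parent whose op is `parentOp`.
function checkParentOp(spans: CapturedSpan[], childOp: string, parentOp: string): string | null {
  const byId = new Map(spans.map((s): [string, CapturedSpan] => [s.span_id, s]));
  const children = spans.filter((s) => s.op === childOp);
  if (children.length === 0) return `no ${childOp} span captured`;
  for (const child of children) {
    const parent = child.parent_span_id ? byId.get(child.parent_span_id) : undefined;
    if (parent?.op !== parentOp) {
      return `${childOp} span ${child.span_id} is not a child of a ${parentOp} span`;
    }
  }
  return null;
}

// The first occurrence of each expected op, sorted by start time, must match `expected`.
function checkOrder(spans: CapturedSpan[], expected: string[]): string | null {
  const sorted = [...spans].sort((a, b) => a.start_timestamp - b.start_timestamp);
  const seen: string[] = [];
  for (const span of sorted) {
    if (expected.includes(span.op) && !seen.includes(span.op)) seen.push(span.op);
  }
  const ok = seen.length === expected.length && expected.every((op, i) => seen[i] === op);
  return ok ? null : `expected ${expected.join(" -> ")}, got ${seen.join(" -> ")}`;
}

// Usage: an agent span with a chat span and a tool span underneath it.
const spans: CapturedSpan[] = [
  { span_id: "1", op: "gen_ai.invoke_agent", start_timestamp: 0 },
  { span_id: "2", parent_span_id: "1", op: "gen_ai.chat", start_timestamp: 1 },
  { span_id: "3", parent_span_id: "1", op: "gen_ai.execute_tool", start_timestamp: 2 },
];
const hierarchyError = checkParentOp(spans, "gen_ai.execute_tool", "gen_ai.invoke_agent");
const orderError = checkOrder(spans, ["gen_ai.invoke_agent", "gen_ai.chat", "gen_ai.execute_tool"]);
console.log(hierarchyError, orderError);
```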
Assert that:
- captured spans are accepted by Relay
- attributes added or derived during ingestion are present and correct (e.g., model cost, and attributes added by the span buffer)
The test cases are defined once in TypeScript and then rendered for each framework using Nunjucks templates. While the templates differ between JS and Python, they aim to produce equivalent behavior. Framework-specific quirks are handled in templates and the skip configuration.
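A small sketch can illustrate the "define once, skip per framework" idea. The `TestDefinition` and `SkipConfig` shapes and the `plannedTests` helper are all hypothetical, and the Nunjucks rendering step is left out. The point is only that one list of definitions yields a plan for each framework, with quirks expressed as skip entries rather than forked test logic.

```typescript
// Hypothetical test definition, written once and rendered for every framework.
interface TestDefinition {
  name: string;
  type: "llm" | "agent";
  streaming: boolean;
}

// Hypothetical skip configuration: framework -> test name -> reason for skipping.
type SkipConfig = Record<string, Record<string, string>>;

type PlannedTest =
  | { name: string; status: "run" }
  | { name: string; status: "skip"; reason: string };

function plannedTests(framework: string, tests: TestDefinition[], skips: SkipConfig): PlannedTest[] {
  const frameworkSkips = skips[framework] ?? {};
  return tests.map((test): PlannedTest => {
    const reason = frameworkSkips[test.name];
    return reason === undefined
      ? { name: test.name, status: "run" }
      : { name: test.name, status: "skip", reason };
  });
}

// Usage with made-up framework and test names.
const tests: TestDefinition[] = [
  { name: "basic-chat", type: "llm", streaming: false },
  { name: "streaming-chat", type: "llm", streaming: true },
];
const skips: SkipConfig = {
  "example-framework": { "streaming-chat": "streaming is not instrumented yet" },
};
const plan = plannedTests("example-framework", tests, skips);
console.log(plan);
```

Keeping the skip reason next to the entry makes it visible in reports. It also makes it obvious when an agent adds a skip just to get a test passing.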
- Adding a new SDK should be a matter of prompting an agent to do it
- Remind it to stay consistent with the other SDKs
- Double-check that it didn't write BS tests just to make them pass
- Make sure it DID NOT change any check functions or skip configurations to make the tests pass
- Make sure the package versions are pinned in `config.json`
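One quick way to check the pinning point above is a heuristic like the hypothetical `isPinned` helper below. It treats a version as pinned if three conditions hold:

- it starts with a digit;
- it contains no range syntax (npm `^`, `~`, `||`, or pip `>=`, `~=`, `!=`);
- it has no wildcard segment such as `2.x`.

It is not a full semver or PEP 440 parser. It should only be applied to framework and dependency versions, because `sentryVersions` entries may intentionally be `"latest"` and dependencies may use `"framework"`.

```typescript
// Heuristic, not a full semver / PEP 440 parser.
function isPinned(version: string): boolean {
  // Must start with a digit and use only version characters: this rejects
  // "latest", spaces, and range syntax such as ^ ~ < > = ! , and ||.
  if (!/^\d[0-9A-Za-z.+-]*$/.test(version)) return false;
  // Reject wildcard segments such as "2.x".
  return !version.split(".").some((part) => part === "x" || part === "X");
}

// Hypothetical list of framework/dependency versions read from config.json.
const candidates = ["1.54.0", "0.3.0rc1", "^4.2.0", "~=1.0", "2.x", "latest"];
const unpinned = candidates.filter((v) => !isPinned(v));
console.log(unpinned);
```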
- The check functions should be written and double-checked by a human
- The case should be implemented for all SDKs where it makes sense
- Test cases are split by type: `llm` (low-level SDKs) and `agent` (agentic frameworks)
- Use existing check functions from `checks.ts` when possible
- Add new checks to `checks.ts` if they're reusable across tests
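A reusable check in the spirit of the guidance above might look like the following hypothetical sketch. Several parts are assumptions: the factory name `checkModelAndTokens`, the span shape, and the attribute keys `gen_ai.request.model`, `gen_ai.usage.input_tokens`, and `gen_ai.usage.output_tokens`, which follow Sentry's AI span conventions. Compare against `checks.ts` before reusing anything.

```typescript
// Assumed span shape: attributes live in a `data` record, as in Sentry span JSON.
interface CapturedSpan {
  span_id: string;
  op: string;
  data: Record<string, unknown>;
}

type CheckFn = (spans: CapturedSpan[]) => string | null;

// Hypothetical check factory: every span with the given op must carry a
// model name and non-negative integer token counts.
function checkModelAndTokens(op: string): CheckFn {
  return (spans) => {
    const matching = spans.filter((s) => s.op === op);
    if (matching.length === 0) return `no ${op} span captured`;
    for (const span of matching) {
      const model = span.data["gen_ai.request.model"];
      if (typeof model !== "string" || model.length === 0) {
        return `span ${span.span_id} is missing gen_ai.request.model`;
      }
      for (const key of ["gen_ai.usage.input_tokens", "gen_ai.usage.output_tokens"]) {
        const value = span.data[key];
        if (typeof value !== "number" || !Number.isInteger(value) || value < 0) {
          return `span ${span.span_id} has invalid ${key}: ${String(value)}`;
        }
      }
    }
    return null;
  };
}

// Usage: one well-formed chat span, one missing token counts.
const checkChatSpans = checkModelAndTokens("gen_ai.chat");
const ok = checkChatSpans([
  {
    span_id: "a",
    op: "gen_ai.chat",
    data: { "gen_ai.request.model": "some-model", "gen_ai.usage.input_tokens": 12, "gen_ai.usage.output_tokens": 30 },
  },
]);
const missingTokens = checkChatSpans([
  { span_id: "b", op: "gen_ai.chat", data: { "gen_ai.request.model": "some-model" } },
]);
console.log(ok, missingTokens);
```

Parameterizing by op lets the same check serve both `llm` and `agent` test cases. That is the kind of reuse that justifies adding it to `checks.ts`.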
- Every framework has an independent Sentry SDK version specified in `config.json`
- The `sentryVersions` array can include specific versions or `"latest"`
- Framework versions are specified in the `versions` array
- Dependencies use `"framework"` as version to inherit from the framework version
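The way these fields fit together can be sketched as a matrix expansion. The `FrameworkConfig` shape below is a guess at one `config.json` entry; only `versions`, `sentryVersions`, and the `"framework"` placeholder come from the text above. `expandTargets` is hypothetical and assumes every framework version is tested against every listed Sentry version. `"latest"` is passed through unchanged for the installer to resolve.

```typescript
// Hypothetical shape of one framework entry in config.json.
interface FrameworkConfig {
  name: string;
  versions: string[]; // framework versions to test
  sentryVersions: string[]; // specific versions or "latest"
  dependencies?: Record<string, string>; // "framework" = same version as the framework
}

interface TestTarget {
  framework: string;
  frameworkVersion: string;
  sentryVersion: string;
  dependencies: Record<string, string>;
}

// Expand one entry into every (framework version x Sentry version) pair,
// replacing "framework" dependency versions with the framework version.
function expandTargets(config: FrameworkConfig): TestTarget[] {
  const targets: TestTarget[] = [];
  for (const frameworkVersion of config.versions) {
    const dependencies: Record<string, string> = {};
    for (const [pkg, version] of Object.entries(config.dependencies ?? {})) {
      dependencies[pkg] = version === "framework" ? frameworkVersion : version;
    }
    for (const sentryVersion of config.sentryVersions) {
      targets.push({ framework: config.name, frameworkVersion, sentryVersion, dependencies: { ...dependencies } });
    }
  }
  return targets;
}

// Usage with made-up package names and versions.
const targets = expandTargets({
  name: "example-framework",
  versions: ["1.2.0", "1.3.0"],
  sentryVersions: ["2.30.0", "latest"],
  dependencies: { "example-framework-core": "framework", "some-helper": "0.4.1" },
});
console.log(targets);
```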