WIP | Add input and output versioning schema docs#225
WIP | Add input and output versioning schema docs#225nikhilNava wants to merge 1 commit intomainfrom
Conversation
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Snapshot WarningsEnsure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice. Scanned FilesNone |
There was a problem hiding this comment.
Pull request overview
Adds versioned documentation (JSON Schemas + reference docs/examples) for the gen_ai.input.messages and gen_ai.output.messages span attributes in the Observability Runtime docs.
Changes:
- Added JSON Schema definitions for A365 input and output message formats (based on OTel GenAI semconv v1.40.0).
- Added schema versioning/compatibility documentation and a changelog.
- Added example payloads for input/output message attributes.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/Observability/Runtime/docs/schemas/a365-output-messages.json | Defines the output messages JSON schema (roles, parts, finish reasons). |
| src/Observability/Runtime/docs/schemas/a365-input-messages.json | Defines the input messages JSON schema (roles, parts). |
| src/Observability/Runtime/docs/schemas/SCHEMA-VERSION.md | Documents current schema version, OTel baseline/commit, and applicability. |
| src/Observability/Runtime/docs/schemas/EXAMPLES.md | Provides concrete JSON examples for input/output message attributes. |
| "type": { "const": "uri" }, | ||
| "mime_type": { "anyOf": [{ "type": "string" }, { "type": "null" }], "default": null }, | ||
| "modality": { "anyOf": [{ "$ref": "#/$defs/Modality" }, { "type": "string" }] }, | ||
| "uri": { "type": "string", "description": "URI referencing attached data." } |
There was a problem hiding this comment.
The uri field is intended to be a URI, but it’s currently only typed as string. Adding format: "uri" (or uri-reference) would make the schema validate examples/values more effectively.
| "uri": { "type": "string", "description": "URI referencing attached data." } | |
| "uri": { "type": "string", "format": "uri", "description": "URI referencing attached data." } |
| "type": { "const": "blob" }, | ||
| "mime_type": { "anyOf": [{ "type": "string" }, { "type": "null" }], "default": null, "description": "IANA MIME type." }, | ||
| "modality": { "anyOf": [{ "$ref": "#/$defs/Modality" }, { "type": "string" }], "description": "General modality of the data." }, | ||
| "content": { "type": "string", "format": "binary", "description": "Base64-encoded binary data." } |
There was a problem hiding this comment.
BlobPart.content is documented as base64-encoded data, but the schema uses format: "binary", which is not a portable way to express base64 in JSON Schema. Consider using contentEncoding: "base64" (and optionally contentMediaType) so validators can reliably enforce the encoding.
| "content": { "type": "string", "format": "binary", "description": "Base64-encoded binary data." } | |
| "content": { "type": "string", "contentEncoding": "base64", "description": "Base64-encoded binary data." } |
| "type": { "const": "uri" }, | ||
| "mime_type": { "anyOf": [{ "type": "string" }, { "type": "null" }], "default": null }, | ||
| "modality": { "anyOf": [{ "$ref": "#/$defs/Modality" }, { "type": "string" }] }, | ||
| "uri": { "type": "string", "description": "URI referencing attached data." } |
There was a problem hiding this comment.
The uri field is intended to be a URI, but it’s currently only typed as string. Adding format: "uri" (or uri-reference) would make the schema validate examples/values more effectively.
| "uri": { "type": "string", "description": "URI referencing attached data." } | |
| "uri": { "type": "string", "format": "uri", "description": "URI referencing attached data." } |
| { "type": "text", "content": "Summarize the attached document, check the weather in Seattle, and describe the whiteboard photo." }, | ||
| { | ||
| "type": "file", | ||
| "modality": "image", |
There was a problem hiding this comment.
This example sets modality to image while mime_type is application/pdf. That combination is confusing for readers and doesn’t match the modality meaning in the schema description; consider changing modality to a more appropriate value (or update the example to use an actual image MIME type).
| "modality": "image", | |
| "modality": "document", |
| "type": "blob", | ||
| "modality": "audio", | ||
| "mime_type": "audio/wav", | ||
| "content": "/9j/4AAQSkZJRg..." |
There was a problem hiding this comment.
The blob example declares mime_type: audio/wav and modality: audio, but the base64 prefix /9j/ is commonly associated with JPEG data. Consider updating either the MIME/modality or the sample base64 so the example is internally consistent.
| "content": "/9j/4AAQSkZJRg..." | |
| "content": "UklGRiQAAABXQVZFZm10IBAAAAABAAEAIlYAAESsAAACABAAZGF0YQAA..." |
| ### Complex — All Output Part Types | ||
|
|
||
| A response demonstrating every output part type: reasoning, text, tool_call, server_tool_call with response, and a custom generic part. Also shows multiple finish reasons. |
There was a problem hiding this comment.
This section claims the JSON demonstrates “every output part type” and includes “a custom generic part”, but the example shown does not include tool_call_response, file, or any custom GenericPart instance. Either adjust the claim/header, or extend the example to actually include the missing part types.
| ### Complex — All Output Part Types | |
| A response demonstrating every output part type: reasoning, text, tool_call, server_tool_call with response, and a custom generic part. Also shows multiple finish reasons. | |
| ### Complex — Multiple Output Part Types | |
| A response demonstrating several output part types: reasoning, text, server_tool_call with response, and image content as both blob and URI. |
| | A365 Scope | Uses This Schema? | Notes | | ||
| |---|---|---| | ||
| | `InvokeAgentScope` | ✅ | Full conversation context | | ||
| | `InferenceScope` | ✅ | Chat history + model response | | ||
| | `OutputScope` | ✅ | Outgoing messages to user | | ||
| | `ExecuteToolScope` | ❌ | Uses `gen_ai.tool.*` attributes instead | |
There was a problem hiding this comment.
The schema describes gen_ai.input.messages as a JSON array of structured message objects, but the current runtime scopes (e.g., InferenceScope/InvokeAgentScope/OutputScope) set this tag to a comma-separated string via string.Join. Either update this doc to clarify it’s a JSON-encoded value produced by specific instrumentations (and not by the built-in scopes today), or update the scopes/tests to emit JSON conforming to this schema.
| "type": { "const": "blob" }, | ||
| "mime_type": { "anyOf": [{ "type": "string" }, { "type": "null" }], "default": null, "description": "IANA MIME type." }, | ||
| "modality": { "anyOf": [{ "$ref": "#/$defs/Modality" }, { "type": "string" }], "description": "General modality of the data." }, | ||
| "content": { "type": "string", "format": "binary", "description": "Base64-encoded binary data." } |
There was a problem hiding this comment.
BlobPart.content is documented as base64-encoded data, but the schema uses format: "binary", which is not a portable way to express base64 in JSON Schema. Consider using contentEncoding: "base64" (and optionally contentMediaType) so validators can reliably enforce the encoding.
| "content": { "type": "string", "format": "binary", "description": "Base64-encoded binary data." } | |
| "content": { "type": "string", "contentEncoding": "base64", "description": "Base64-encoded binary data." } |
No description provided.