feat(models): add guardrails configuration and redaction support to Bedrock provider#573

Draft
Unshure wants to merge 5 commits into main from agent-tasks/564

Conversation

@Unshure
Member

Unshure commented Feb 25, 2026

Summary

This PR implements proper handling of modelRedactContentEvent in the Agent SDK, enabling guardrail-based content redaction for both user input and assistant output.

Why This Change Is Needed

When Bedrock Guardrails detect content that should be blocked (either inappropriate user input or harmful assistant output), they emit modelRedactContentEvent events with redaction messages. The SDK needs to:

  1. Capture these events during streaming
  2. Replace blocked content with the redaction messages
  3. Maintain consistency in the message history

Public API Changes

StreamAggregatedResult Interface (model.ts)

The result returned by streamAggregated now includes an optional redactionMessage:

interface StreamAggregatedResult {
  message: Message
  stopReason: StopReason
  metadata?: ModelMetadataEvent
  redactionMessage?: string  // For user input redaction
}

ModelStopData Interface (events.ts)

ModelStopData, the stopData payload of AfterModelCallEvent, now includes the redaction message:

interface ModelStopData {
  readonly message: Message
  readonly stopReason: StopReason
  readonly redactionMessage?: string  // For user input redaction
}
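
As a rough illustration of how a consumer might read the new field, the sketch below inspects a ModelStopData value. The Message and StopReason types are simplified stand-ins, and describeStop is a hypothetical helper, not part of the SDK.

```typescript
// Simplified stand-ins for the SDK types; the real Message and StopReason are richer.
type StopReason = string
interface Message {
  role: 'user' | 'assistant'
  text: string
}

// Mirrors the ModelStopData shape from the PR description.
interface ModelStopData {
  readonly message: Message
  readonly stopReason: StopReason
  readonly redactionMessage?: string
}

// Hypothetical helper: summarizes a model call, e.g. for an audit log.
function describeStop(stopData: ModelStopData): string {
  if (stopData.redactionMessage !== undefined) {
    // Only set when the guardrail asked for the user input to be redacted.
    return `user input redacted: ${stopData.redactionMessage}`
  }
  return `completed with stopReason=${stopData.stopReason}`
}
```

Because the field is optional, callers that ignore it should keep working unchanged.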

Architecture

Separation of Concerns

  1. Model Layer (streamAggregated):

    • Handles modelRedactContentEvent events
    • Directly redacts assistant output by updating the message content
    • Returns redactionMessage for agent to handle user input redaction
  2. Agent Layer (_stream):

    • Receives redactionMessage from model via invokeModel
    • Handles user message redaction using _redactLastMessage helper
    • Exposes redactionMessage via AfterModelCallEvent.stopData
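
The split described above can be sketched roughly as follows. This is a simplified, synchronous reduction over an already-collected event list, not the SDK's streaming implementation: the 'textDelta' event, the flat Message shape, and the aggregate function are assumptions made for illustration. Only the redactUserContentMessage and redactAssistantContentMessage field names come from this PR.

```typescript
// Simplified stand-ins; the SDK's real event and message types carry more fields.
interface Message {
  role: 'user' | 'assistant'
  text: string
}
type StreamEvent =
  | { type: 'textDelta'; text: string }
  | {
      type: 'modelRedactContentEvent'
      redactUserContentMessage?: string
      redactAssistantContentMessage?: string
    }

interface AggregatedResult {
  message: Message
  redactionMessage?: string
}

// Hypothetical reduction of the model layer's redaction handling.
function aggregate(events: StreamEvent[]): AggregatedResult {
  let text = ''
  let assistantRedaction: string | undefined
  let redactionMessage: string | undefined
  for (const event of events) {
    if (event.type === 'textDelta') {
      text += event.text
    } else {
      // Assistant output is redacted in the model layer itself.
      if (event.redactAssistantContentMessage !== undefined) {
        assistantRedaction = event.redactAssistantContentMessage
      }
      // User input redaction is only reported; the agent layer applies it.
      if (event.redactUserContentMessage !== undefined) {
        redactionMessage = event.redactUserContentMessage
      }
    }
  }
  const result: AggregatedResult = {
    message: { role: 'assistant', text: assistantRedaction ?? text },
  }
  if (redactionMessage !== undefined) result.redactionMessage = redactionMessage
  return result
}
```

Keeping the user-side redaction out of this function means the model layer never needs write access to the conversation history.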

Implementation Details

  • _redactLastMessage: Redacts the last message in the conversation history (preserves role)
  • Assistant output redaction is handled directly by the model
  • User input redaction info is passed to agent via redactionMessage
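
A minimal sketch of the role-preserving behavior, written as a free function over a plain array. The Message shape is a stand-in, and redactLastMessage is a hypothetical counterpart of the private _redactLastMessage helper. It also assumes the helper runs while the blocked message is still the last entry in the history.

```typescript
// Stand-in for the SDK's Message class; content is reduced to plain strings.
interface Message {
  role: 'user' | 'assistant'
  content: string[]
}

// Hypothetical free-function version of the agent's private helper.
function redactLastMessage(messages: Message[], redactionMessage: string): void {
  const lastIndex = messages.length - 1
  const last = messages[lastIndex]
  if (last === undefined) return // empty history: nothing to redact
  // Replace the content but keep whatever role the message already had.
  messages[lastIndex] = { role: last.role, content: [redactionMessage] }
}
```

Earlier messages are left untouched, so only the most recent turn changes in the history.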

Testing

  • Tests verify redactionMessage is returned correctly for user input redaction
  • Tests verify assistant message is redacted directly by model
  • Tests verify model does NOT modify user messages (agent handles this)

Resolves #564

…edrock provider

- Add GuardrailConfig interface with guardrailIdentifier, guardrailVersion, trace, streamProcessingMode, and redaction options
- Add guardrailConfig to BedrockModelConfig for Bedrock model configuration
- Update _formatRequest to include guardrailConfig in Bedrock API requests
- Add ModelRedactContentEvent and ModelRedactContentEventData to streaming.ts
- Add _hasBlockedGuardrail method for recursive detection of blocked policies
- Add _generateRedactionEvents method for creating redaction events based on config
- Update metadata event handling to emit redaction events when guardrails block content
- Support both streaming and non-streaming modes
- Add comprehensive tests for guardrail configuration, request formatting, blocked detection, and redaction events

Resolves #564
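
The recursive detection mentioned in the commit message could look roughly like the sketch below. It assumes that a policy counts as blocked when some nested object in the guardrail trace has action set to 'BLOCKED' and detected set to true. Both that rule and the hasBlockedGuardrail name are assumptions for illustration, not the PR's actual code.

```typescript
// Hypothetical stand-in for _hasBlockedGuardrail: walks arbitrary nested trace data.
function hasBlockedGuardrail(value: unknown): boolean {
  if (Array.isArray(value)) {
    return value.some((item) => hasBlockedGuardrail(item))
  }
  if (typeof value !== 'object' || value === null) return false
  const record = value as Record<string, unknown>
  // Assumed rule: a policy entry is blocked when both flags are set on the same object.
  if (record['action'] === 'BLOCKED' && record['detected'] === true) return true
  // Otherwise keep searching the nested values.
  return Object.keys(record).some((key) => hasBlockedGuardrail(record[key]))
}
```

Walking the structure generically avoids hard-coding every policy type that a trace may contain.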
github-actions bot added the strands-running label (<strands-managed>: whether or not an agent is currently running) Feb 25, 2026
@github-actions

Assessment: Request Changes

This PR adds a well-structured guardrails configuration API for Bedrock with good test coverage and documentation. The implementation follows existing patterns and handles both streaming and non-streaming modes correctly.

Review Summary
  • Missing Exports: The new public types (ModelRedactContentEvent, ModelRedactContentEventData, GuardrailConfig) need to be exported from src/index.ts for customers to properly type-check redaction events and configure guardrails.

  • API Design: The API follows a clean pattern with sensible defaults (redactInput: true, redactOutput: false). The PR description includes good documentation of use cases and API signatures.
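
To make the defaults concrete, here is a hedged sketch of how redaction events might be derived from the config once a guardrail has blocked content. Only redactInput, redactOutput, and the two event field names come from this PR. The message options, their default texts, and the generateRedactionEvents function are assumptions.

```typescript
// Hypothetical subset of GuardrailConfig; the two message options are assumed names.
interface RedactionOptions {
  redactInput?: boolean
  redactInputMessage?: string
  redactOutput?: boolean
  redactOutputMessage?: string
}

interface RedactContentEventData {
  redactUserContentMessage?: string
  redactAssistantContentMessage?: string
}

// Sketch of what _generateRedactionEvents might do after a block is detected.
function generateRedactionEvents(options: RedactionOptions = {}): RedactContentEventData[] {
  const events: RedactContentEventData[] = []
  // Input redaction defaults to on.
  if (options.redactInput ?? true) {
    events.push({
      redactUserContentMessage: options.redactInputMessage ?? '[User input redacted.]',
    })
  }
  // Output redaction defaults to off.
  if (options.redactOutput ?? false) {
    events.push({
      redactAssistantContentMessage: options.redactOutputMessage ?? '[Assistant output redacted.]',
    })
  }
  return events
}
```

With no options set, this yields a single user-input redaction event, which matches the defaults named in the review.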

Nice work on the comprehensive test coverage! 🎉

github-actions bot removed the strands-running label Feb 25, 2026
@Unshure
Member Author

Unshure commented Feb 25, 2026

/strands How does this pull request update the agent.messages array? If a user or assistant message is redacted, the corresponding message in the messages array should be updated with the new content as well. This can likely be done in model.ts: if a redact event is encountered, it can overwrite the return value from the model.

github-actions bot added the strands-running label Feb 25, 2026
- Update _convert_to_class_event to handle modelRedactContentEvent
- Add redaction handling in streamAggregated to modify messages array
- When redactUserContentMessage is present, redact the last user message
- When redactAssistantContentMessage is present, replace assistant content blocks
- Add comprehensive tests for message redaction in model.test.ts

This completes the guardrails Phase 1 implementation by ensuring that when
guardrails block content, the messages array is updated with the redacted
content so consumers have access to the sanitized messages.
@github-actions

I've implemented the message redaction support in streamAggregated as requested.

Changes Made

src/models/model.ts

  • Added import for ModelRedactContentEvent
  • Updated _convert_to_class_event to handle the new event type
  • Added redaction handling in streamAggregated:
    • User redaction: When redactUserContentMessage is present, finds and replaces the last user message in the messages array with a new Message containing the redaction text
    • Assistant redaction: When redactAssistantContentMessage is present, clears contentBlocks, adds the redaction message, and recreates stoppedMessage with the redacted content

src/models/__tests__/model.test.ts

Added 4 new tests:

  • Redacts last user message when redactUserContentMessage is present
  • Redacts assistant content when redactAssistantContentMessage is present
  • Redacts both user and assistant content when both are present
  • Finds and redacts only the last user message in multi-turn conversation

All 1033 tests pass.

github-actions bot removed the strands-running label Feb 25, 2026
@Unshure
Member Author

Unshure commented Feb 26, 2026

/strands I'm liking this better, but can you update the StreamAggregatedResult to include the redact content, and update the agent class to overwrite the latest message with this redacted content if it's present?

github-actions bot added the strands-running label Feb 26, 2026
github-actions bot added and removed the strands-running label Feb 26, 2026
- Model now only returns redactContent info without modifying messages
- Agent handles both user and assistant message redaction
- Simplifies model layer by keeping all message manipulation in agent
github-actions bot removed the strands-running label Feb 26, 2026
- const { message, stopReason } = yield* this._streamFromModel(this.messages, streamOptions)
+ const { message, stopReason, redactContent } = yield* this._streamFromModel(this.messages, streamOptions)

  const afterModelCallEvent = new AfterModelCallEvent({ agent: this, stopData: { message, stopReason } })
Member Author

Can you add redactContent to AfterModelCallEvent?

Done - added redactionMessage to ModelStopData interface and updated AfterModelCallEvent to include it in stopData.


  // result.done is true, result.value contains the return value
- return result.value
+ return {
Member Author

nit: Can this still just be result.value?

Done - simplified to return result.value.
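
For readers unfamiliar with the pattern in the snippet above: a generator's return value is delivered on the final next() call, where done is true. The sketch below shows this with a synchronous generator for brevity (the SDK's streams are asynchronous). fakeModelStream, drain, and StreamResult are hypothetical names, not SDK APIs.

```typescript
interface StreamResult {
  message: string
  redactionMessage?: string
}

// Hypothetical stand-in for a model stream: yields events, then returns a result.
function* fakeModelStream(): Generator<string, StreamResult, undefined> {
  yield 'delta: Hel'
  yield 'delta: lo'
  return { message: 'Hello', redactionMessage: '[User input redacted.]' }
}

// Manually drives the generator so that its return value can be captured.
function drain<Y, R>(stream: Generator<Y, R, undefined>, onEvent: (event: Y) => void): R {
  let result = stream.next()
  while (!result.done) {
    onEvent(result.value)
    result = stream.next()
  }
  // result.done is true here, so result.value is the generator's return value.
  return result.value
}
```

Returning result.value unchanged means any field added to the result type, such as redactionMessage, flows through without further edits at this call site.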

Comment on lines 721 to 739
/**
 * Redacts the last user message in the conversation history.
 * Called when guardrails block user input and redaction is enabled.
 *
 * @param redactMessage - The redaction message to replace the user content with
 */
private _redactLastUserMessage(redactMessage: string): void {
  // Find and redact the last user message
  for (let i = this.messages.length - 1; i >= 0; i--) {
    const message = this.messages[i]
    if (message && message.role === 'user') {
      this.messages[i] = new Message({
        role: 'user',
        content: [new TextBlock(redactMessage)],
      })
      break
    }
  }
}
Member Author

Can you update this so that it just redacts the last message, not the last user message?

Done - renamed to _redactLastMessage and simplified to redact the last message regardless of role.

Comment on lines 109 to 114
redactUserContentMessage?: string

/**
 * Message to replace assistant output with when output was blocked by guardrails.
 */
redactAssistantContentMessage?: string
Member Author

This can just be redactionMessage. We will only ever return this value if the redactUserContentMessage is set. If redactAssistantContentMessage event is returned from the underlying model, then we just update the message of the StreamAggregatedResult so there is no need to redact.

Done - simplified to just redactionMessage. Assistant redaction is now handled directly in the model by updating the message content.

Unshure marked this pull request as draft February 26, 2026 20:55
@Unshure
Member Author

Unshure commented Feb 26, 2026

/strands

github-actions bot added the strands-running label Feb 26, 2026
- Add redactionMessage to ModelStopData and AfterModelCallEvent
- Simplify _streamFromModel to return result.value directly
- Rename _redactLastUserMessage to _redactLastMessage (redacts any last message)
- Simplify StreamAggregatedResult to use single redactionMessage field
- Model now handles assistant redaction directly, returns redactionMessage for user redaction
github-actions bot removed the strands-running label Feb 26, 2026

Development

Successfully merging this pull request may close these issues.

[V1] Models - Guardrails: Configuration & Redaction (Phase 1)

2 participants