[V1] Models - Guardrails: Latest Message Evaluation (Phase 2) #565

@github-actions

Summary

Add latestMessage guardrail evaluation support to the Bedrock model provider. This feature allows evaluating only the latest user message with guardrails instead of the entire conversation. This is Phase 2 of the guardrails implementation (parent issue: #484).

Usage

import { BedrockModel } from '@strands-agents/sdk/models'

const model = new BedrockModel({
  modelId: 'us.anthropic.claude-sonnet-4-20250514-v1:0',
  guardrailConfig: {
    guardrailIdentifier: 'my-guardrail-id',
    guardrailVersion: '1',
    trace: 'enabled',
    // Only evaluate the latest user message
    latestMessage: true,
  },
})

Background

When latestMessage is enabled, only the most recent user message is sent to guardrails for evaluation instead of the entire conversation. This can:

  • Improve performance in multi-turn conversations
  • Reduce costs (fewer tokens evaluated)
  • Avoid re-evaluating messages that have already been validated

The implementation wraps the latest user message content in guardContent blocks, which signals to Bedrock's guardrails to evaluate only that content.
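To make the wrapping concrete, here is a rough sketch of how a three-turn conversation might look once formatted with latestMessage enabled. The type names (TextBlock, GuardContentBlock, WireMessage) and the sample text are hypothetical simplifications for illustration, not the SDK's or Bedrock's actual definitions.

```typescript
// Hypothetical, simplified block shapes; the real SDK and Bedrock types carry more fields.
type TextBlock = { text: string }
type GuardContentBlock = { guardContent: { text: { text: string } } }
type WireMessage = {
  role: 'user' | 'assistant'
  content: Array<TextBlock | GuardContentBlock>
}

// A three-turn conversation as it might be sent with latestMessage: true.
const wireMessages: WireMessage[] = [
  // Earlier turns are sent as plain blocks.
  { role: 'user', content: [{ text: 'What is the capital of France?' }] },
  { role: 'assistant', content: [{ text: 'Paris.' }] },
  // Only the final user turn is wrapped in guardContent.
  { role: 'user', content: [{ guardContent: { text: { text: 'And of Germany?' } } }] },
]

// Count how many blocks are marked for guardrail evaluation.
let guardedCount = 0
for (const message of wireMessages) {
  for (const block of message.content) {
    if ('guardContent' in block) guardedCount += 1
  }
}
```

In this sketch only one block carries the guardContent marker, regardless of how long the conversation history grows.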

Implementation Requirements

1. Extended GuardrailConfig

Add a latestMessage option to GuardrailConfig (in bedrock.ts):

interface GuardrailConfig {
  // ... existing options from Phase 1 ...
  
  /**
   * Only evaluate the latest user message with guardrails.
   * When true, wraps latest user message content in guardContent blocks.
   * This can improve performance and reduce costs in multi-turn conversations.
   * @default false
   */
  latestMessage?: boolean
}

2. Message Formatting for Latest Message

Update _formatMessages to wrap the latest user message in guardContent blocks when enabled:

private _formatMessages(messages: Message[]): BedrockMessage[] {
  const latestMessage = this._config.guardrailConfig?.latestMessage ?? false
  
  return messages.reduce<BedrockMessage[]>((acc, message, idx) => {
    const content = message.content
      .map((block) => {
        let formattedBlock = this._formatContentBlock(block)
        
        // Wrap in guardContent if this is the last user message and latestMessage is enabled
        if (
          latestMessage &&
          idx === messages.length - 1 &&
          message.role === 'user' &&
          formattedBlock
        ) {
          if ('text' in formattedBlock) {
            formattedBlock = {
              guardContent: {
                text: {
                  text: formattedBlock.text,
                  qualifiers: [] // or appropriate qualifiers
                }
              }
            }
          } else if ('image' in formattedBlock) {
            formattedBlock = {
              guardContent: {
                image: formattedBlock.image
              }
            }
          }
        }
        
        return formattedBlock
      })
      .filter((block) => block !== undefined)

    if (content.length > 0) {
      acc.push({ role: message.role, content })
    }

    return acc
  }, [])
}

3. Considerations

  • Only text and image content blocks should be wrapped in guardContent
  • Other content types (toolUse, toolResult, etc.) should pass through unchanged
  • The wrapping should only apply to the last message when role === 'user'
  • Existing GuardContentBlock in messages should be preserved as-is
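The considerations above can be sketched as a standalone function, separate from _formatMessages, which may make the rules easier to unit test in isolation. Everything here is an assumption for illustration: the Block and Msg types are simplified stand-ins for the SDK's types, and wrapBlock and wrapLatestUserMessage are hypothetical helper names. The sketch also assumes an image payload can be forwarded unchanged into guardContent, which should be checked against the Bedrock API types.

```typescript
// Hypothetical, simplified types (not the SDK's actual definitions).
type ImagePayload = { format: string; source: { bytes: Uint8Array } }
type Block =
  | { text: string }
  | { image: ImagePayload }
  | { toolUse: { toolUseId: string; name: string; input: unknown } }
  | { guardContent: { text: { text: string } } | { image: ImagePayload } }
type Msg = { role: 'user' | 'assistant'; content: Block[] }

// Wrap a single block; only text and image blocks are changed.
function wrapBlock(block: Block): Block {
  // Existing guardContent blocks are preserved as-is.
  if ('guardContent' in block) return block
  if ('text' in block) return { guardContent: { text: { text: block.text } } }
  // Assumption: the image payload is compatible with guardContent unchanged.
  if ('image' in block) return { guardContent: { image: block.image } }
  // toolUse and any other block types pass through unchanged.
  return block
}

// Wrap only the last message, and only when it is a user message.
function wrapLatestUserMessage(messages: Msg[], latestMessage: boolean): Msg[] {
  const lastIdx = messages.length - 1
  return messages.map((message, idx) => {
    const shouldWrap = latestMessage && idx === lastIdx && message.role === 'user'
    return shouldWrap ? { ...message, content: message.content.map(wrapBlock) } : message
  })
}
```

Computing shouldWrap once per message, rather than once per block, keeps the condition in a single place. When the conversation ends with an assistant message, nothing is wrapped.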

Files to Modify

  1. src/models/bedrock.ts

    • Add latestMessage option to GuardrailConfig
    • Update _formatMessages to wrap latest user message content

  2. src/models/__tests__/bedrock.test.ts

    • Tests for latestMessage wrapping text content
    • Tests for latestMessage wrapping image content
    • Tests for latestMessage disabled (default behavior)
    • Tests for non-user messages not being wrapped
    • Tests for multi-turn conversations

Acceptance Criteria

  • latestMessage option added to GuardrailConfig
  • When latestMessage: true, latest user message text is wrapped in guardContent
  • When latestMessage: true, latest user message images are wrapped in guardContent
  • Non-user messages are not wrapped
  • Non-last messages are not wrapped
  • Default behavior (latestMessage: false) unchanged
  • Existing GuardContentBlock in messages preserved
  • Unit tests cover all scenarios
  • TSDoc comments updated

Reference

Dependencies

  • Requires Phase 1 completion: [V1] Models - Guardrails: Configuration & Redaction (Phase 1)
