
Amazon Bedrock AgentCore Samples - [Agent with tools hosted on AWS ECS Fargate] #1191

@darenwkt

Describe the improvement request

Add a new example digital-preservation-agent under 04-infrastructure-as-code/cdk/typescript/ that demonstrates how to build a digital preservation agent using Amazon Bedrock AgentCore with multiple containerized file analysis tools running on ECS Fargate: Apache Tika, Siegfried, DROID, and MediaInfo. An AgentCore Gateway exposes all tools via MCP (Model Context Protocol), Lambda functions bridge tool calls to each service, and an AgentCore Runtime hosts a Strands agent that orchestrates analysis workflows.

This addresses a recurring community need — hosting heavy document processing tools on AWS has been a common question (see StackOverflow: Run Tika Server in AWS Lambda and AWS Post), and internally at AWS we have seen customers adopting this pattern as well. Tools like Tika, DROID (which requires a JVM), and MediaInfo have memory and startup requirements that make them a poor fit for Lambda, but a natural fit for Fargate behind an internal ALB. This example is also extensible — users can replace or add containerized tools of their choice using the same architecture pattern.

What are your suggestions?

The example deploys a complete stack using TypeScript CDK:

  • A VPC with private subnets, NAT gateway, and S3 gateway endpoint
  • Four ECS Fargate services behind an internal ALB with path-based routing:
    • Apache Tika (apache/tika:3.2.3.0-full) — text extraction, metadata extraction, MIME type detection; handles archives (ZIP, TAR) directly
    • Siegfried (ghcr.io/keeps/siegfried:v1.10.1) — file format identification using the PRONOM registry
    • DROID (custom image, eclipse-temurin:17-jre base) — file format profiling via the Digital Record Object Identification tool
    • MediaInfo (custom image, alpine:3.20 base) — technical metadata analysis for audio, video, and image files
  • Six Python Lambda functions bridging AgentCore Gateway tool calls to the Fargate services and S3:
    • tika_handler: single tika_process tool (text, metadata, MIME detection)
    • siegfried_handler: siegfried_identify tool
    • droid_handler: droid_profile tool
    • mediainfo_handler: mediainfo_analyze tool
    • extract_handler: extract_archive tool (ZIP/TAR extraction to S3)
    • s3_report_handler: save_report_to_s3 tool (persist analysis reports)
  • An AgentCore Gateway with six MCP tool targets (one per Lambda)
  • An AgentCore Runtime hosting a Strands agent (Claude 3 Haiku) that orchestrates multi-tool analysis workflows
  • An S3 bucket for document uploads and analysis reports
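
To make the Lambda bridge role concrete, here is a minimal sketch of what a handler like tika_handler could look like. It is not the code from the example. The event fields (bucket, key, action), the ALB_URL environment variable, and the helper names are assumptions for illustration. The paths /tika, /meta, and /detect/stream are standard Tika Server endpoints and line up with the ALB rules listed in this issue.

```python
import os
import urllib.request

# Tika Server endpoints and the Accept header to request for each action.
# The action names ("text", "metadata", "mime") are hypothetical.
TIKA_ENDPOINTS = {
    "text": ("/tika", "text/plain"),
    "metadata": ("/meta", "application/json"),
    "mime": ("/detect/stream", "text/plain"),
}


def build_tika_request(alb_url, action, body):
    """Build the PUT request that forwards a document to Tika via the ALB."""
    if action not in TIKA_ENDPOINTS:
        raise ValueError(f"unsupported action: {action!r}")
    path, accept = TIKA_ENDPOINTS[action]
    return urllib.request.Request(
        url=alb_url.rstrip("/") + path,
        data=body,
        method="PUT",
        headers={"Accept": accept},
    )


def _fetch_from_s3(bucket, key):
    # Imported lazily so the sketch can be exercised without AWS credentials.
    import boto3

    return boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()


def _send(request):
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")


def handler(event, context, fetch=_fetch_from_s3, send=_send):
    """Lambda entry point: read the object from S3 and forward it to Tika.

    fetch and send are injectable so the logic can be tested offline.
    """
    action = event.get("action", "text")
    body = fetch(event["bucket"], event["key"])
    # ALB_URL is an assumed environment variable holding the internal ALB address.
    alb_url = os.environ.get("ALB_URL", "http://localhost:9998")
    request = build_tika_request(alb_url, action, body)
    return {"action": action, "result": send(request)}
```

The other bridge functions could plausibly follow the same shape, differing only in the path they call and how they post-process the response.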

The stack passes cdk-nag AwsSolutionsChecks with VPC flow logs, Container Insights, ALB access logs, S3 server access logs, restricted security groups, and explicit NagSuppressions for CDK-generated wildcard IAM permissions.

ALB Path-Based Routing

Path Pattern                 Target        Port
/tika*, /detect/*, /meta*    Apache Tika   9998
/identify/*                  Siegfried     5138
/api/*                       DROID         8080
/mediainfo/*                 MediaInfo     8081
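
As a quick way to reason about which service a given request path reaches, the routing table can be modeled in a few lines of Python. This is only an approximation of ALB behavior: it uses fnmatchcase for the wildcard matching and assumes the rules are evaluated in the order listed. The function name resolve_target is hypothetical.

```python
from fnmatch import fnmatchcase

# (path patterns, target service, container port), taken from the routing table.
ROUTES = [
    (("/tika*", "/detect/*", "/meta*"), "Apache Tika", 9998),
    (("/identify/*",), "Siegfried", 5138),
    (("/api/*",), "DROID", 8080),
    (("/mediainfo/*",), "MediaInfo", 8081),
]


def resolve_target(path):
    """Return (service, port) for the first rule matching path, else None."""
    for patterns, service, port in ROUTES:
        # fnmatchcase is case-sensitive, like ALB path conditions.
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return service, port
    return None
```

For example, resolve_target("/detect/stream") resolves to the Tika service, while a path that matches no rule returns None and would fall through to the listener's default action.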

Architecture

User
  └──▶ AgentCore Runtime (Strands Agent, Claude 3 Haiku)
          │  MCP
          ▼
        AgentCore Gateway (6 tool targets)
          │
          ▼
        Lambda functions (tool bridges)
          │
          ▼
        Internal ALB (path-based routing)
          ├── /tika*, /detect/*, /meta*  ──▶ ECS Fargate (Apache Tika :9998)
          ├── /identify/*               ──▶ ECS Fargate (Siegfried :5138)
          ├── /api/*                    ──▶ ECS Fargate (DROID :8080)
          └── /mediainfo/*             ──▶ ECS Fargate (MediaInfo :8081)
          │
          ▼
        S3 Bucket (document uploads + reports)

Describe alternatives you've considered

  • Running Tika/DROID directly in Lambda: Not viable due to memory footprint, JVM cold start times, and the 250 MB (unzipped) limit on .zip deployment packages. The StackOverflow thread highlights this exact limitation. DROID requires a full JVM runtime with signature database files.
  • Using EC2 instead of Fargate: Fargate is simpler to manage, avoids instance patching overhead, and lets each service be scaled down to zero tasks when idle by setting its desired count to 0.
  • Using Bedrock Agent action groups instead of AgentCore Gateway: The original Python CDK example uses this approach. AgentCore Gateway with MCP provides a more standardized tool protocol, better observability, and native integration with AgentCore Runtime.
  • Using a Knowledge Base with a custom data source instead of on-demand tools: This would work for pre-indexed content but doesn't support on-demand processing of newly uploaded documents, which is the primary use case here.
  • Single monolithic container with all tools: Would simplify deployment but increases image size, couples tool lifecycles, and prevents independent scaling. Separate containers allow each tool to scale based on its own resource needs.

Screenshots

N/A — this is a CDK infrastructure example deployed via CLI.

Additional context

  • The pattern of "AgentCore Runtime → AgentCore Gateway (MCP) → Lambda → internal ALB → Fargate container" is generalizable beyond digital preservation. Any containerized tool (e.g., LibreOffice for document conversion, Tesseract for OCR, ffmpeg for media processing) could be swapped in using the same architecture.
  • Apache Tika can process archive files (ZIP, TAR, etc.) directly without extraction — it recursively parses all contained files. Siegfried, DROID, and MediaInfo require individual files, so the extract_archive tool is provided for batch processing archived collections.
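
A rough sketch of the extraction step behind a tool like extract_archive, assuming the archive has already been read into memory. It yields each regular file so that a caller can upload the members to S3 and pass them one at a time to Siegfried, DROID, or MediaInfo. The function names are hypothetical, and the path check is a basic guard rather than a complete defense against malicious archives.

```python
import io
import tarfile
import zipfile
from pathlib import PurePosixPath


def _is_safe(name):
    """Reject absolute paths and parent-directory references in member names."""
    path = PurePosixPath(name)
    return not path.is_absolute() and ".." not in path.parts


def iter_archive_members(data):
    """Yield (member_name, member_bytes) for each regular file in a ZIP or TAR."""
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        buffer.seek(0)
        with zipfile.ZipFile(buffer) as archive:
            for info in archive.infolist():
                if not info.is_dir() and _is_safe(info.filename):
                    yield info.filename, archive.read(info)
        return
    buffer.seek(0)
    # "r:*" lets tarfile detect the compression (none, gzip, bz2, xz) itself.
    with tarfile.open(fileobj=buffer, mode="r:*") as archive:
        for member in archive:
            if member.isfile() and _is_safe(member.name):
                yield member.name, archive.extractfile(member).read()
```

Each yielded pair could then be written to S3 under a prefix of the caller's choice before the per-file tools are invoked.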
  • DROID and MediaInfo container images are built with --platform linux/amd64 to ensure compatibility with ECS Fargate when building from Apple Silicon (ARM) machines.
  • Siegfried uses the pre-built ghcr.io/keeps/siegfried:v1.10.1 image directly from GHCR (no custom Dockerfile needed).
  • Hosting Apache Tika on AWS has been a recurring community question: https://stackoverflow.com/questions/45626105/run-tika-server-in-aws-lambda
  • Internally at AWS, we have seen customers using this pattern to integrate heavy document processing tools with Bedrock Agents.
  • The example follows TypeScript CDK conventions with @aws-cdk/aws-bedrock-agentcore-alpha constructs, cdk-nag compliance, and a comprehensive README with deployment and cleanup instructions.
