Amazon Bedrock AgentCore Samples - [Agent with tools hosted on AWS ECS Fargate] #1191
Description
Describe the improvement request
Add a new example digital-preservation-agent under 04-infrastructure-as-code/cdk/typescript/ that demonstrates how to build a digital preservation agent using Amazon Bedrock AgentCore with multiple containerized file analysis tools running on ECS Fargate: Apache Tika, Siegfried, DROID, and MediaInfo. An AgentCore Gateway exposes all tools via MCP (Model Context Protocol), Lambda functions bridge tool calls to each service, and an AgentCore Runtime hosts a Strands agent that orchestrates analysis workflows.
This addresses a recurring community need — hosting heavy document processing tools on AWS has been a common question (see StackOverflow: Run Tika Server in AWS Lambda and AWS Post), and internally at AWS we have seen customers adopting this pattern as well. Tools like Tika, DROID (which requires a JVM), and MediaInfo have memory and startup requirements that make them a poor fit for Lambda, but a natural fit for Fargate behind an internal ALB. This example is also extensible — users can replace or add containerized tools of their choice using the same architecture pattern.
What are your suggestions?
The example deploys a complete stack using TypeScript CDK:
- A VPC with private subnets, NAT gateway, and S3 gateway endpoint
- Four ECS Fargate services behind an internal ALB with path-based routing:
  - Apache Tika (`apache/tika:3.2.3.0-full`): text extraction, metadata extraction, MIME type detection; handles archives (ZIP, TAR) directly
  - Siegfried (`ghcr.io/keeps/siegfried:v1.10.1`): file format identification using the PRONOM registry
  - DROID (custom image, `eclipse-temurin:17-jre` base): file format profiling via the Digital Record Object Identification tool
  - MediaInfo (custom image, `alpine:3.20` base): technical metadata analysis for audio, video, and image files
- Six Python Lambda functions bridging AgentCore Gateway tool calls to the Fargate services and S3:
  - `tika_handler`: single `tika_process` tool (text, metadata, MIME detection)
  - `siegfried_handler`: `siegfried_identify` tool
  - `droid_handler`: `droid_profile` tool
  - `mediainfo_handler`: `mediainfo_analyze` tool
  - `extract_handler`: `extract_archive` tool (ZIP/TAR extraction to S3)
  - `s3_report_handler`: `save_report_to_s3` tool (persist analysis reports)
- An AgentCore Gateway with six MCP tool targets (one per Lambda)
- An AgentCore Runtime hosting a Strands agent (Claude 3 Haiku) that orchestrates multi-tool analysis workflows
- An S3 bucket for document uploads and analysis reports
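To make the Lambda bridge role concrete, here is a minimal sketch of how a handler such as `tika_handler` might forward a tool call to Tika through the internal ALB. It is an illustration under stated assumptions, not the example's actual code: the `ALB_BASE_URL` environment variable, the `mode` and `content` event fields, and the helper name `build_tika_request` are all hypothetical. The `/tika`, `/meta`, and `/detect/stream` paths are standard Tika Server endpoints and fall under the ALB path patterns used by this stack.

```python
import os
import urllib.request

# Tika Server endpoint and Accept header for each processing mode.
# The paths fall under the ALB patterns /tika*, /meta*, and /detect/*.
TIKA_ENDPOINTS = {
    "text": ("/tika", "text/plain"),
    "metadata": ("/meta", "application/json"),
    "detect": ("/detect/stream", "text/plain"),
}


def build_tika_request(base_url: str, mode: str, document: bytes) -> urllib.request.Request:
    """Build the HTTP PUT request that sends a document to Tika Server."""
    if mode not in TIKA_ENDPOINTS:
        raise ValueError(f"unsupported mode: {mode!r}")
    path, accept = TIKA_ENDPOINTS[mode]
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=document,
        method="PUT",
        headers={"Accept": accept},
    )


def lambda_handler(event, context):
    """Hypothetical entry point; the event shape is assumed for illustration."""
    base_url = os.environ["ALB_BASE_URL"]  # assumed variable name
    document = event["content"].encode("utf-8")  # assumed inline payload
    request = build_tika_request(base_url, event.get("mode", "text"), document)
    with urllib.request.urlopen(request, timeout=60) as response:
        status = response.status
        body = response.read().decode("utf-8", errors="replace")
    return {"statusCode": status, "body": body}
```

A real handler would more likely receive an S3 key and fetch the object before forwarding it; inline content is used here only to keep the sketch free of AWS SDK dependencies.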
The stack passes cdk-nag AwsSolutionsChecks with VPC flow logs, Container Insights, ALB access logs, S3 server access logs, restricted security groups, and explicit NagSuppressions for CDK-generated wildcard IAM permissions.
ALB Path-Based Routing
| Path Pattern | Target | Port |
|---|---|---|
| `/tika*`, `/detect/*`, `/meta*` | Apache Tika | 9998 |
| `/identify/*` | Siegfried | 5138 |
| `/api/*` | DROID | 8080 |
| `/mediainfo/*` | MediaInfo | 8081 |
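The routing table can be modeled in a few lines to check which service a given request path would reach. This is only a sketch: it approximates ALB wildcard matching with Python's `fnmatchcase`, the rule order is assumed, and `resolve_target` is a hypothetical helper rather than part of the example.

```python
from fnmatch import fnmatchcase

# (path patterns, target service, container port), copied from the table.
ROUTES = [
    (("/tika*", "/detect/*", "/meta*"), "Apache Tika", 9998),
    (("/identify/*",), "Siegfried", 5138),
    (("/api/*",), "DROID", 8080),
    (("/mediainfo/*",), "MediaInfo", 8081),
]


def resolve_target(path: str):
    """Return (service, port) for the first rule matching path, else None."""
    for patterns, service, port in ROUTES:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return service, port
    return None


print(resolve_target("/detect/stream"))
print(resolve_target("/mediainfo/analyze"))
print(resolve_target("/identify"))
```

Under these glob semantics, `/identify` without a trailing segment matches no rule, because the `/identify/*` pattern requires the slash. Bridge code should therefore call the full path.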
Architecture
User
└──▶ AgentCore Runtime (Strands Agent, Claude 3 Haiku)
│ MCP
▼
AgentCore Gateway (6 tool targets)
│
▼
Lambda functions (tool bridges)
│
▼
Internal ALB (path-based routing)
├── /tika*, /detect/*, /meta* ──▶ ECS Fargate (Apache Tika :9998)
├── /identify/* ──▶ ECS Fargate (Siegfried :5138)
├── /api/* ──▶ ECS Fargate (DROID :8080)
└── /mediainfo/* ──▶ ECS Fargate (MediaInfo :8081)
│
▼
S3 Bucket (document uploads + reports)
Describe alternatives you've considered
- Running Tika/DROID directly in Lambda: Not viable due to memory footprint, JVM cold start times, and the 250 MB (unzipped) limit on .zip deployment packages. The StackOverflow thread highlights this exact limitation. DROID requires a full JVM runtime with signature database files.
- Using EC2 instead of Fargate: Fargate is simpler to manage, avoids instance patching overhead, and its task cost drops to zero when a service's desired count is set to 0 while idle.
- Using Bedrock Agent action groups instead of AgentCore Gateway: The original Python CDK example uses this approach. AgentCore Gateway with MCP provides a more standardized tool protocol, better observability, and native integration with AgentCore Runtime.
- Using a Knowledge Base with a custom data source instead of on-demand tools: This would work for pre-indexed content but doesn't support on-demand processing of newly uploaded documents, which is the primary use case here.
- Single monolithic container with all tools: Would simplify deployment but increases image size, couples tool lifecycles, and prevents independent scaling. Separate containers allow each tool to scale based on its own resource needs.
Screenshots
N/A — this is a CDK infrastructure example deployed via CLI.
Additional context
- The pattern of "AgentCore Runtime → AgentCore Gateway (MCP) → Lambda → internal ALB → Fargate container" is generalizable beyond digital preservation. Any containerized tool (e.g., LibreOffice for document conversion, Tesseract for OCR, ffmpeg for media processing) could be swapped in using the same architecture.
- Apache Tika can process archive files (ZIP, TAR, etc.) directly without extraction; it recursively parses all contained files. Siegfried, DROID, and MediaInfo require individual files, so the `extract_archive` tool is provided for batch processing archived collections.
- DROID and MediaInfo container images are built with `--platform linux/amd64` to ensure compatibility with ECS Fargate when building from Apple Silicon (ARM) machines.
- Siegfried uses the pre-built `ghcr.io/keeps/siegfried:v1.10.1` image directly from GHCR (no custom Dockerfile needed).
- Hosting Apache Tika on AWS has been a recurring community question: https://stackoverflow.com/questions/45626105/run-tika-server-in-aws-lambda
- Internally at AWS, we have seen customers using this pattern to integrate heavy document processing tools with Bedrock Agents.
- The example follows TypeScript CDK conventions with `@aws-cdk/aws-bedrock-agentcore-alpha` constructs, cdk-nag compliance, and a comprehensive README with deployment and cleanup instructions.
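As a rough illustration of the archive extraction step mentioned above, the following sketch unpacks a ZIP or TAR archive in memory and maps each regular file to a prospective S3 object key. The function name `extract_members` and the `extracted/` key prefix are assumptions made for this example; the actual `extract_archive` tool may be structured differently, and the S3 upload itself is omitted.

```python
import io
import tarfile
import zipfile


def extract_members(archive_bytes: bytes, prefix: str = "extracted/") -> dict:
    """Return {object_key: file_bytes} for every regular file in the archive."""
    buffer = io.BytesIO(archive_bytes)
    members = {}
    if zipfile.is_zipfile(buffer):
        buffer.seek(0)
        with zipfile.ZipFile(buffer) as archive:
            for info in archive.infolist():
                if not info.is_dir():  # skip directory entries
                    members[prefix + info.filename] = archive.read(info)
        return members
    # Not a ZIP: try TAR, with transparent gzip/bz2/xz decompression.
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r:*") as archive:
        for member in archive.getmembers():
            if member.isfile():  # skip directories, links, and devices
                members[prefix + member.name] = archive.extractfile(member).read()
    return members
```

Each resulting key and payload could then be written to the bucket (for instance with boto3's `put_object`) so that Siegfried, DROID, and MediaInfo can process the files individually. Production code should also validate member names and cap the total extracted size before uploading.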