Self-Optimizing Multi-Agent AI System on AWS
An AWS-native, serverless multi-agent orchestration framework that dynamically routes heterogeneous tasks to specialized AI agents β optimizing for cost, quality, and latency β with built-in verification, self-correction, and vector-cached memory.
Quick Start β’ Architecture β’ Features β’ Deployment β’ API β’ Dashboard β’ Roadmap
Submit any task β Intelligent routing β Specialized agents β Verified output β Cached for reuse
Most LLM-powered applications send every request to a single expensive model, regardless of task complexity. This leads to:
| Issue | Impact |
|---|---|
| πΈ Wasted spend | Simple summarization tasks hit GPT-4βclass models |
| π Unnecessary latency | Heavyweight models process lightweight tasks |
| π Redundant computation | Identical tasks are reprocessed from scratch |
| β Silent failures | Bad outputs are returned without quality checks |
Agentic Mesh introduces a broker-worker-verifier architecture that:
1. Analyzes the task β understands type & complexity
2. Routes to the best agent β coding, research, or summarization
3. Verifies the output β LLM-as-a-Judge scores accuracy, completeness, relevance
4. Self-corrects failures β re-generates with a more capable model if quality < 7/10
5. Caches successes β embeds results in vector memory for future reuse
Result: Up to 60% cost reduction with higher reliability than single-model architectures.
| Feature | Description |
|---|---|
| π§ Intelligent Broker | Llama 3 8B analyzes task type & complexity in <100ms for pennies |
| π€ Specialized Agents | Dedicated Coder, Researcher, and Summarizer agents with tuned prompts |
| π‘οΈ Bedrock Guardrails | PII anonymization, content filtering, and prompt injection protection |
| π Shadow Verification | Every response is graded on accuracy, completeness, and relevance (1β10) |
| π Self-Correction Loop | Failed outputs are automatically regenerated with enhanced prompts |
| 𧬠Vector Memory | Titan Embeddings + OpenSearch cache solved tasks for instant reuse |
| π Cost Tracking | Per-invocation cost tracking with CloudWatch dashboards |
| π Observability | AWS X-Ray tracing, structured logging, and 6-widget CloudWatch dashboard |
| π₯οΈ Web Dashboard | Dark-mode glassmorphism UI with real-time pipeline visualization |
| π Zero Servers | 100% serverless β API Gateway, Lambda, Step Functions, SQS, DynamoDB |
graph TD
Client["π₯οΈ Client / Dashboard"]
API["π API Gateway"]
SQS["π¬ SQS Queue"]
Orch["βοΈ Orchestrator Lambda"]
SF["π Step Functions"]
Guard["π‘οΈ Guardrail Check"]
Broker["π§ Broker Agent<br/><small>Llama 3 8B</small>"]
Cache["𧬠Vector Memory<br/><small>OpenSearch + Titan Embeddings</small>"]
CacheHit{"Cache Hit?"}
Route{"Route to Worker"}
Coder["π» Coder Agent<br/><small>Claude Sonnet 4.5</small>"]
Researcher["π Research Agent<br/><small>Claude Sonnet 4.5</small>"]
Summarizer["π Summarizer Agent<br/><small>Claude Haiku 4.5</small>"]
Verify["π Verification Agent<br/><small>LLM-as-a-Judge</small>"]
Check{"Quality β₯ 7/10?"}
SelfCorrect["π Self-Correction<br/><small>Escalate to Sonnet 4.5</small>"]
Save["πΎ Save Results"]
DDB["ποΈ DynamoDB"]
CW["π CloudWatch"]
Client -->|POST /task| API
API --> SQS
SQS --> Orch
Orch --> SF
SF --> Guard
Guard -->|Safe| Broker
Guard -->|Blocked| Save
Broker -.->|Embedding lookup| Cache
Broker --> CacheHit
CacheHit -->|Yes| Verify
CacheHit -->|No| Route
Route -->|coding| Coder
Route -->|research| Researcher
Route -->|summarize| Summarizer
Coder --> Verify
Researcher --> Verify
Summarizer --> Verify
Verify --> Check
Check -->|Yes β
| Save
Check -->|No β| SelfCorrect
SelfCorrect --> Save
Save --> DDB
Save -.->|Embed + Index| Cache
Save -.->|Metrics| CW
style Client fill:#6366f1,stroke:#4f46e5,color:#fff
style Broker fill:#8b5cf6,stroke:#7c3aed,color:#fff
style Coder fill:#a78bfa,stroke:#8b5cf6,color:#fff
style Researcher fill:#818cf8,stroke:#6366f1,color:#fff
style Summarizer fill:#c4b5fd,stroke:#a78bfa,color:#000
style Verify fill:#22d3ee,stroke:#06b6d4,color:#000
style SelfCorrect fill:#f43f5e,stroke:#e11d48,color:#fff
style Cache fill:#34d399,stroke:#10b981,color:#000
style Save fill:#fbbf24,stroke:#f59e0b,color:#000
Client submits task
β
βΌ
βββββββββββββββββββ
β 1. GUARDRAIL βββββ PII detected? β Anonymize & continue
β CHECK βββββ Harmful content? β BLOCK task
ββββββββββ¬βββββββββ
β β
Safe
βΌ
βββββββββββββββββββ
β 2. BROKER βββββ Generate task embedding
β AGENT βββββ Search vector cache (similarity β₯ 0.85)
β (Llama 3 8B) βββββ Cache hit + quality β₯ 7? β Use cached answer
β βββββ No cache β Predict type & complexity
ββββββββββ¬βββββββββ
β
ββββββ΄βββββ¬βββββββββββββ
βΌ βΌ βΌ
ββββββββββ ββββββββββββ βββββββββββββ
β CODER β βRESEARCHERβ βSUMMARIZER β
βSonnet β β Sonnet β β Haiku β
β 4.5 β β 4.5 β β 4.5 β
βββββ¬βββββ ββββββ¬ββββββ βββββββ¬ββββββ
ββββββββββββββΌββββββββββββββ
βΌ
βββββββββββββββββββ
β 4. VERIFICATION βββββ Scores: Accuracy, Completeness, Relevance
β (LLM-as-Judge) βββββ Overall score β₯ 7/10 β β
PASS
β βββββ Score < 7/10 β β FAIL β Self-Correct
ββββββββββ¬βββββββββ
β
ββββββ΄ββββ Score < 7?
βΌ βΌ
ββββββββββ ββββββββββββββββ
β PASS β βSELF-CORRECTIONβ
β β β Re-generate β
β β β with enhanced β
β β β prompt + more β
β β β capable model β
βββββ¬βββββ ββββββββ¬ββββββββ
ββββββββββββββββββ
β
βΌ
βββββββββββββββββββ
β 5. SAVE βββββ Store in DynamoDB
β RESULTS βββββ Embed answer β Index in OpenSearch
β βββββ Publish CloudWatch metrics
βββββββββββββββββββ
Each agent is a Lambda function with a carefully tuned system prompt and model selection strategy:
| Agent | Model | Specialty | When Chosen |
|---|---|---|---|
| π» Coder | Claude Sonnet 4.5 | Code generation, debugging, algorithms, reviews | task_type == "coding" |
| π Researcher | Claude Sonnet 4.5 | Analysis, comparison, explanations, concepts | task_type == "research" |
| π Summarizer | Claude Haiku 4.5 | Condensing, key point extraction, reformatting | task_type == "summarize" |
| π§ Broker | Llama 3 8B Instruct | Task classification & routing decisions | Every incoming task |
| π Verifier | Claude Sonnet 4.5 | Quality scoring across 3 dimensions | Every agent response |
The Coder agent uses adaptive model selection based on predicted complexity:
COMPLEXITY_MODELS = {
"low": "claude-sonnet", # Sonnet 4.5 β fast, cost-efficient
"medium": "claude-3.5-sonnet", # Sonnet 4.5 β balanced
"high": "claude-3.5-sonnet", # Sonnet 4.5 β maximum capability
} βββββββββββββββββββββββ
β Incoming Task β
ββββββββββββ¬βββββββββββ
β
ββββββββββββββΌβββββββββββββ
β 1. Generate Embedding β
β (Titan V2) β
ββββββββββββββ¬βββββββββββββ
β
ββββββββββββββΌβββββββββββββ
β 2. Search Vector Cache ββββββ Similarity β₯ 0.85
β (OpenSearch) β AND quality β₯ 7.0
ββββββ¬βββββββββββββββ¬ββββββ β Cache Hit!
β β
No Cache Cache Hit
β β
ββββββββββββΌβββββββ ββββββΌβββββββββββββ
β 3. LLM Predict β β Return Cached β
β Type+Complexityβ β Answer + Agent β
β (Llama 3 8B) β ββββββββββββββββββββ
ββββββββββββ¬βββββββ
β
ββββββββββββΌβββββββββββ
β Route to Worker: β
β coding β Coder β
β research β Rsrch β
β summarize β Summ β
βββββββββββββββββββββββ
Agentic Mesh maintains a semantic memory of all successfully completed tasks:
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β VECTOR MEMORY SYSTEM β
β β
β βββββββββββββ embed() ββββββββββββββββββββββββ β
β β New Task ββββββββββββββΆβ Amazon Titan β β
β β "Write β β Embed Text V2 β β
β β binary β β (1024-dim vectors) β β
β β search" β ββββββββββββ¬ββββββββββββ β
β βββββββββββββ β β
β βΌ β
β ββββββββββββββββββββββββββ β
β β OpenSearch Serverless β β
β β βββββββββββββββββββββ β β
β β Cosine Similarity KNN β β
β β β β
β β Index: task-success- β β
β β cache β β
β β β β
β β Fields: β β
β β β’ task_text β β
β β β’ task_embedding[] β β
β β β’ answer β β
β β β’ agent_used β β
β β β’ quality_score β β
β β β’ model_used β β
β β β’ timestamp β β
β ββββββββββββββββββββββββββ β
β β
β Threshold: similarity β₯ 0.85 AND quality β₯ 7.0 β
β Result: Skip worker + verification β instant answer β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Benefits:
- π Instant responses for previously-solved tasks (~0ms worker latency)
- π° Zero LLM cost on cache hits
- π Improving over time β the more tasks processed, the higher the hit rate
Agentic Mesh employs a multi-layered cost optimization approach:
| Layer | Strategy | Savings |
|---|---|---|
| 1. Smart Routing | Llama 3 8B (~$0.0003/call) routes instead of sending everything to Sonnet | ~70% on routing |
| 2. Model Tiering | Haiku 4.5 for summarization vs Sonnet 4.5 for coding | ~60% per task |
| 3. Complexity Matching | Low-complexity coding tasks use lighter models | ~30% per task |
| 4. Vector Cache | Identical/similar tasks return cached results instantly | 100% on hits |
| 5. Guardrail Blocking | Harmful tasks are blocked before reaching any model | 100% on blocked |
| Model | Input | Output | Use Case |
|---|---|---|---|
| Llama 3 8B Instruct | $0.0003 | $0.0006 | Broker routing decisions |
| Claude Haiku 4.5 | $0.0010 | $0.0050 | Summarization tasks |
| Claude Sonnet 4.5 | $0.0030 | $0.0150 | Coding, research, verification |
| Titan Embed Text V2 | $0.0002 | β | Task embeddings |
The system ships with a pre-built CloudWatch dashboard:
| Widget | Metrics |
|---|---|
| π Routing Distribution | Tasks per agent (coder/researcher/summarizer) |
| π° Cost per Agent | Running cost breakdown by agent type |
| β‘ Latency by Agent | p50/p95 latency for each worker |
| β Verification Scores | Quality score distribution over time |
| π§ Cache Hit Rate | Percentage of tasks served from vector memory |
| π Escalation Rate | How often self-correction is triggered |
A premium glassmorphism dark-mode dashboard is included:
python -m http.server 8080 --directory dashboard
# Open http://localhost:8080Features:
- π¬ Chat-like task submission interface
- π Animated Step Functions pipeline visualization
- π Real-time analytics (agent performance, quality rings, cost breakdown)
- π Filterable task history with detail modals
- π Toast notifications for task events
| Requirement | Version |
|---|---|
| Python | 3.10+ |
| AWS CLI | 2.x (configured with credentials) |
| AWS SAM CLI | 1.x |
| AWS Account | With Bedrock model access enabled |
Before deploying, enable the following models in the AWS Bedrock Console:
- β Meta Llama 3 8B Instruct
- β Anthropic Claude Haiku 4.5
- β Anthropic Claude Sonnet 4.5
- β Amazon Titan Embed Text V2
git clone https://github.com/yourusername/agentic-mesh.git
cd agentic-meshpip install -r requirements.txtsam build# Guided deployment (first time)
sam deploy --guided
# Subsequent deployments
sam deploy --no-confirm-changesetSAM will provision all resources:
- API Gateway (REST)
- 10 Lambda Functions
- Step Functions State Machine
- SQS Queue
- DynamoDB Table
- OpenSearch Serverless Collection
- CloudWatch Dashboard
- IAM Roles & Policies
- Bedrock Guardrail
After deployment, SAM outputs your API URL:
Key ApiEndpoint
Description API Gateway endpoint URL
Value https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/Prod/task
POST /task
Content-Type: application/json
{
"task": "Write a Python function for binary search with error handling",
"type_hint": "coding" // Optional: "coding" | "research" | "summarize" | "auto"
}Response:
{
"task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"status": "QUEUED",
"message": "Task submitted successfully"
}GET /task/{task_id}Response (completed):
{
"task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"status": "SUCCESS",
"task": "Write a Python function for binary search with error handling",
"answer": "def binary_search(arr, target):\n if not arr:\n raise ValueError('Array cannot be empty')\n ...",
"agent": "coder",
"model": "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
"quality_score": 8.5,
"cost_estimate": 0.0042,
"worker_latency_ms": 3200,
"cache_hit": false,
"escalated": false
}# Submit a coding task
curl -X POST https://YOUR_API/Prod/task \
-H "Content-Type: application/json" \
-d '{"task": "Write a Python function for binary search", "type_hint": "coding"}'
# Submit a research task
curl -X POST https://YOUR_API/Prod/task \
-H "Content-Type: application/json" \
-d '{"task": "Explain the differences between REST and GraphQL"}'
# Submit a summarization task
curl -X POST https://YOUR_API/Prod/task \
-H "Content-Type: application/json" \
-d '{"task": "Summarize the key principles of clean code", "type_hint": "summarize"}'
# Get result (poll until status != QUEUED)
curl https://YOUR_API/Prod/task/a1b2c3d4-e5f6-7890-abcd-ef1234567890# Submit task
$response = Invoke-RestMethod -Uri "https://YOUR_API/Prod/task" `
-Method POST -ContentType "application/json" `
-Body '{"task": "Write a merge sort in Python"}'
# Get result
$result = Invoke-RestMethod -Uri "https://YOUR_API/Prod/task/$($response.task_id)"
$result.answer# Clone and install
git clone https://github.com/yourusername/agentic-mesh.git
cd agentic-mesh
pip install -r requirements.txt
# Run tests
pytest tests/ -v
# Run specific test suite
pytest tests/test_cost_tracker.py -v
pytest tests/test_guardrails.py -vpython -m http.server 8080 --directory dashboardOpen http://localhost:8080 in your browser.
# Invoke a single function with a test event
sam local invoke GuardrailFunction --event events/guardrail_test.json
# Start local API for testing
sam local start-apiagentic-mesh/
βββ π template.yaml # SAM infrastructure-as-code (all AWS resources)
βββ π samconfig.toml # SAM deployment configuration
βββ π requirements.txt # Python dependencies
βββ π pyproject.toml # Project metadata
β
βββ π src/
β βββ π handlers/ # Lambda function handlers
β β βββ api_handler.py # REST API (POST /task, GET /task/{id})
β β βββ orchestrator.py # SQS β Step Functions trigger
β β βββ broker.py # π§ Broker Agent (routing decisions)
β β βββ guardrail_handler.py # π‘οΈ Bedrock Guardrail check
β β βββ worker_coder.py # π» Coding specialist
β β βββ worker_researcher.py # π Research specialist
β β βββ worker_summarizer.py # π Summarization specialist
β β βββ verification_agent.py # π LLM-as-a-Judge quality scoring
β β βββ self_correction.py # π Re-generation with enhanced prompts
β β βββ save_results.py # πΎ DynamoDB + Vector cache persistence
β β
β βββ π models/ # Shared model clients
β β βββ bedrock_client.py # Unified Bedrock invocation (Claude, Llama, Titan)
β β βββ cost_tracker.py # Per-model cost calculation + CloudWatch metrics
β β βββ vector_memory.py # OpenSearch Serverless KNN search & indexing
β β
β βββ π guardrails/ # Guardrail configurations
β βββ π observability/ # CloudWatch dashboard definitions
β βββ π state_machine/ # Step Functions ASL definition
β βββ definition.asl.json # Full state machine (13 states)
β
βββ π dashboard/ # Web Dashboard UI
β βββ index.html # Main page
β βββ css/style.css # Glassmorphism dark theme
β βββ js/app.js # API integration + real-time polling
β
βββ π tests/ # Test suite
β βββ test_cost_tracker.py # Cost calculation + model tier tests
β βββ test_guardrails.py # Guardrail behavior tests
β
βββ π events/ # Sample Lambda invocation events
The deployment automatically creates a CloudWatch dashboard at:
https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards:name=AgenticMeshDashboard
| Metric Namespace | Metric Name | Dimensions |
|---|---|---|
AgenticMesh |
TaskRouted |
Agent, CacheHit |
AgenticMesh |
TaskCost |
Agent, Model |
AgenticMesh |
WorkerLatency |
Agent |
AgenticMesh |
VerificationScore |
Agent |
AgenticMesh |
EscalationTriggered |
OriginalAgent |
AgenticMesh |
CacheHitRate |
β |
All Lambda functions are instrumented with AWS X-Ray through Powertools:
from aws_lambda_powertools import Tracer
tracer = Tracer(service="agentic-mesh")
@tracer.capture_lambda_handler
def lambda_handler(event, context):
...| Protection | Description |
|---|---|
| PII Anonymization | Automatically detects and masks personal data (names, emails, SSNs) |
| Content Filtering | Blocks harmful, toxic, or inappropriate content |
| Prompt Injection | Detects and neutralizes prompt injection attempts |
| Topic Blocking | Configurable topic deny-lists |
| Measure | Implementation |
|---|---|
| Least Privilege IAM | Each Lambda has scoped-down permissions |
| Encryption at Rest | DynamoDB + OpenSearch use AWS-managed keys |
| Encryption in Transit | All API calls use HTTPS/TLS 1.2+ |
| VPC Isolation | OpenSearch Serverless runs in managed VPC |
| CORS Protection | API Gateway configured with explicit allow-origins |
| Input Validation | Request body validation before processing |
| Metric | Value |
|---|---|
| Cold Start | ~2-3s (Lambda with Powertools) |
| Warm Latency | ~8-15s end-to-end (including LLM inference) |
| Cache Hit Latency | <1s (skip worker + verification) |
| Concurrent Tasks | Limited by Lambda concurrency (default 1000) |
| SQS Throughput | Up to 3,000 messages/second |
| DynamoDB | On-demand capacity β auto-scales to any load |
| OpenSearch | Serverless β auto-scales compute and storage |
Load Increases β Lambdas scale horizontally (auto)
β SQS absorbs burst traffic
β DynamoDB on-demand scales
β OpenSearch Serverless scales
β No provisioned capacity to manage
β Zero operational overhead
| Task Type | Model | Avg Latency | Avg Cost | Quality Score |
|---|---|---|---|---|
| Coding (simple) | Claude Sonnet 4.5 | ~5s | $0.003 | 8.2/10 |
| Coding (complex) | Claude Sonnet 4.5 | ~12s | $0.008 | 7.8/10 |
| Research | Claude Sonnet 4.5 | ~8s | $0.005 | 8.5/10 |
| Summarization | Claude Haiku 4.5 | ~3s | $0.001 | 8.0/10 |
| Cache Hit | β | <100ms | $0.000 | β₯7.0/10 |
| Broker Routing | Llama 3 8B | <1s | $0.0003 | β |
Benchmarks measured on
us-east-1with warm Lambda invocations. Your results may vary.
| Decision | Choice | Rationale |
|---|---|---|
| Orchestration | Step Functions over SQS choreography | Visual debugging, built-in retry/catch, state management |
| Broker Model | Llama 3 8B over Claude Haiku | 10x cheaper for routing β accuracy is sufficient for classification |
| Vector Store | OpenSearch Serverless over Pinecone | AWS-native, no external dependencies, serverless scaling |
| Queue | SQS over EventBridge | Simple FIFO semantics, built-in DLQ, SAM integration |
| Verification | LLM-as-a-Judge over heuristics | Generalizes across task types, provides natural-language feedback |
| Self-Correction | Single retry with escalation | Prevents infinite loops while improving quality |
| IaC | SAM over CDK/Terraform | Native Lambda support, simpler syntax, faster iterations |
| Dashboard | Vanilla HTML/CSS/JS over React | Zero build step, no node_modules, instant deployment to S3 |
- Core multi-agent orchestration
- Broker routing with Llama 3
- Vector cache with OpenSearch
- Shadow verification (LLM-as-a-Judge)
- Self-correction loop
- Bedrock Guardrails
- CloudWatch dashboard
- Web dashboard UI
- CORS support for dashboard
- π WebSocket streaming (real-time progress updates)
- π Multi-modal support (images + PDFs via Claude Vision)
- π Multi-step task chains (agent collaboration pipelines)
- π A/B model testing (shadow evaluator for model comparison)
- π¬ Conversation memory (multi-turn sessions)
- π SNS/Slack notifications on task completion
- π§ͺ Automated load testing + published benchmarks
- π S3 + CloudFront hosting for dashboard
- π Cognito authentication for API
- π± Mobile-responsive dashboard improvements
Contributions are welcome and greatly appreciated! Here's how to get started:
- Fork the repository
- Create a feature branch:
git checkout -b feature/amazing-feature - Write your code following the existing patterns
- Test your changes:
pytest tests/ -v - Build with SAM:
sam build - Commit with a descriptive message:
git commit -m "feat: add amazing feature" - Push to your branch:
git push origin feature/amazing-feature - Open a Pull Request
- Follow PEP 8 for Python code
- Use type hints where possible
- Add docstrings to all functions
- Include structured logging with
aws_lambda_powertools.Logger - Add
@tracer.capture_lambda_handlerto all Lambda handlers
| Area | Difficulty | Impact |
|---|---|---|
| π§ͺ More test coverage | Easy | High |
| π Documentation improvements | Easy | Medium |
| π New worker agents (e.g., SQL, DevOps) | Medium | High |
| π WebSocket streaming | Medium | High |
| π Multi-modal support | Hard | High |
| π Agent collaboration chains | Hard | Very High |
This project is licensed under the MIT License β see the LICENSE file for details.
MIT License
Copyright (c) 2026 Agentic Mesh Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
- AWS Bedrock β Foundation model hosting
- AWS Lambda Powertools β Structured logging, tracing, and event handling
- OpenSearch β Vector similarity search
- Anthropic Claude β AI models powering the agents
- Meta Llama β Lightweight broker model
If you find this project useful, please consider giving it a β β it helps others discover the project!
Built with π§ by the Agentic Mesh community
