Agent Exchange - Production Deployment Guide

This guide covers deploying Agent Exchange to cloud platforms. Choose your preferred provider:

  • Google Cloud Platform (GCP) - Cloud Run, with MongoDB today and Firestore as the planned production database
  • Amazon Web Services (AWS) - ECS Fargate, DocumentDB

Table of Contents

  1. Overview
  2. GCP Deployment
  3. AWS Deployment
  4. CI/CD Pipelines
  5. Service Configuration
  6. Scaling
  7. Security Best Practices
  8. Troubleshooting
  9. Teardown
  10. Demo Deployment (Local Development)

Overview

Architecture

Agent Exchange consists of 11 microservices:

| Service | Port | Description |
| --- | --- | --- |
| aex-gateway | 8080 | API gateway, routing |
| aex-work-publisher | 8081 | Work specification management |
| aex-bid-gateway | 8082 | Bid submission and storage |
| aex-bid-evaluator | 8083 | Bid evaluation and ranking |
| aex-contract-engine | 8084 | Contract lifecycle management |
| aex-provider-registry | 8085 | Provider registration |
| aex-trust-broker | 8086 | Trust score management |
| aex-identity | 8087 | Tenant and API key management |
| aex-settlement | 8088 | Financial transactions + AP2 integration |
| aex-telemetry | 8089 | Metrics and events |
| aex-credentials-provider | 8090 | NEW: AP2 payment methods management |

Demo Components (Optional)

| Component | Port | Description |
| --- | --- | --- |
| legal-agent-a | 8100 | Budget Legal Agent ($5 + $2/page) |
| legal-agent-b | 8101 | Standard Legal Agent ($15 + $0.50/page) |
| legal-agent-c | 8102 | Premium Legal Agent ($30 + $0.20/page) |
| orchestrator | 8103 | Consumer orchestrator agent |
| payment-legalpay | 8200 | Payment processor (2% fee, 1% reward) |
| payment-contractpay | 8201 | Payment processor (2.5% fee, 3% reward) |
| payment-compliancepay | 8202 | Payment processor (3% fee, 4% reward) |
| demo-ui-nicegui | 8502 | Recommended: Real-time NiceGUI dashboard |
| demo-ui | 8501 | Legacy Mesop dashboard (deprecated) |

Deployment Order

Deploy services in this order due to dependencies:

  1. Database: MongoDB
  2. Infrastructure services: aex-identity, aex-telemetry
  3. Core services: aex-provider-registry, aex-trust-broker, aex-credentials-provider
  4. Business services: aex-bid-gateway, aex-bid-evaluator, aex-work-publisher, aex-contract-engine, aex-settlement
  5. Gateway: aex-gateway
  6. Demo (optional): legal-agents, payment-agents, orchestrator, demo-ui-nicegui
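The ordering above can be sketched as a small shell helper that deploys one service at a time and stops at the first failure. This is a minimal sketch, not part of the repository: `deployment_order`, `deploy_in_order`, and the `DEPLOY_CMD` override are hypothetical names, and the database and demo tiers are assumed to be handled separately.

```shell
# Print the 11 services tier by tier, in the dependency order listed above.
# MongoDB (step 1) is assumed to be provisioned separately.
deployment_order() {
  printf '%s\n' \
    aex-identity aex-telemetry \
    aex-provider-registry aex-trust-broker aex-credentials-provider \
    aex-bid-gateway aex-bid-evaluator aex-work-publisher \
    aex-contract-engine aex-settlement \
    aex-gateway
}

# Deploy each service in turn; abort on the first failure.
# DEPLOY_CMD is a hypothetical override that defaults to the Cloud Run script.
deploy_in_order() {
  env_name="$1"
  deployment_order | while read -r service; do
    # </dev/null keeps the deploy command from consuming the service list.
    ${DEPLOY_CMD:-./hack/deploy/deploy-cloudrun.sh} "$env_name" "$service" </dev/null || exit 1
  done
}
```

Calling `deploy_in_order staging` would then walk the tiers in order. Setting `DEPLOY_CMD=echo` gives a dry run that only prints what would be deployed.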

GCP Deployment

Prerequisites (GCP)

Required Tools

# Google Cloud SDK
curl https://sdk.cloud.google.com | bash
gcloud init

# Docker
# Install from https://docs.docker.com/get-docker/

# Go 1.22+ (for local builds)
# Install from https://go.dev/dl/

Required Permissions

You need the following IAM roles on your GCP project:

  • roles/owner or these specific roles:
    • roles/run.admin
    • roles/artifactregistry.admin
    • roles/iam.serviceAccountAdmin
    • roles/secretmanager.admin
    • roles/datastore.owner
    • roles/logging.admin

Environment Variables

export GCP_PROJECT_ID="your-project-id"
export GCP_REGION="us-central1"
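Because the later commands interpolate these variables directly, it can help to fail early when one is missing. The helper below is a minimal sketch; `require_env` is a hypothetical name and is not provided by the setup scripts.

```shell
# Return non-zero (with a message on stderr) if any named variable
# is unset or empty.
require_env() {
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      return 1
    fi
  done
}

# Example (not run here):
# require_env GCP_PROJECT_ID GCP_REGION || exit 1
```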

Setup (GCP)

1. Create or Select Project

# Create new project
gcloud projects create $GCP_PROJECT_ID --name="Agent Exchange"

# Or select existing project
gcloud config set project $GCP_PROJECT_ID

2. Enable Billing

Ensure billing is enabled in the GCP Console.

3. Run Setup Script

./hack/deploy/setup-gcp.sh

This script will:

  • Enable required APIs
  • Create Artifact Registry repository
  • Create service accounts with proper roles
  • Set up Workload Identity for GitHub Actions
  • Create Firestore database
  • Create initial secrets

4. Manual API Enablement (if needed)

gcloud services enable \
  run.googleapis.com \
  artifactregistry.googleapis.com \
  firestore.googleapis.com \
  secretmanager.googleapis.com \
  cloudresourcemanager.googleapis.com \
  iam.googleapis.com \
  iamcredentials.googleapis.com \
  logging.googleapis.com \
  monitoring.googleapis.com \
  cloudtrace.googleapis.com

Database (GCP)

Current Implementation: MongoDB (via Docker or MongoDB Atlas)

Production Target: Firestore in Native mode (not yet migrated)

Option A: MongoDB (Current)

# For development/staging - MongoDB via Docker or Atlas
# Connection string in environment variable:
export MONGO_URI="mongodb://root:root@localhost:27017/?authSource=admin"
export MONGO_DB="aex"

Option B: Firestore (Future Production)

# Create database (for future Firestore migration)
gcloud firestore databases create \
  --location=$GCP_REGION \
  --type=firestore-native

Collections Structure

| Collection | Description |
| --- | --- |
| work_specs | Work specifications |
| bids | Provider bids |
| contracts | Awarded contracts |
| providers | Registered providers |
| subscriptions | Category subscriptions |
| tenants | Tenant accounts |
| api_keys | API keys |
| trust_records | Provider trust data |
| balances | Account balances |
| transactions | Financial transactions |
| ledger | Ledger entries |

Firestore Indexes (for future migration)

gcloud firestore indexes composite create \
  --collection-group=work_specs \
  --field-config field-path=consumer_id,order=ASCENDING \
  --field-config field-path=created_at,order=DESCENDING

gcloud firestore indexes composite create \
  --collection-group=bids \
  --field-config field-path=work_id,order=ASCENDING \
  --field-config field-path=created_at,order=ASCENDING

Secrets (GCP)

# JWT signing secret
echo -n "$(openssl rand -base64 32)" | \
  gcloud secrets create aex-jwt-secret --data-file=-

# API key salt
echo -n "$(openssl rand -base64 32)" | \
  gcloud secrets create aex-api-key-salt --data-file=-

Deploy (GCP)

Build and Push Images

# Authenticate Docker with Artifact Registry
gcloud auth configure-docker ${GCP_REGION}-docker.pkg.dev

# Build all images
make docker-build

# Tag and push
VERSION="v1.0.0"
REGISTRY="${GCP_REGION}-docker.pkg.dev/${GCP_PROJECT_ID}/aex"

for service in aex-gateway aex-work-publisher aex-bid-gateway aex-bid-evaluator \
               aex-contract-engine aex-provider-registry aex-trust-broker \
               aex-identity aex-settlement aex-telemetry; do
  docker tag agent-exchange/${service}:local ${REGISTRY}/${service}:${VERSION}
  docker push ${REGISTRY}/${service}:${VERSION}
done

Deploy Services

# Deploy to staging
./hack/deploy/deploy-cloudrun.sh staging all

# Deploy to production
./hack/deploy/deploy-cloudrun.sh production all

# Deploy specific service
./hack/deploy/deploy-cloudrun.sh production aex-gateway

Get Service URLs

for service in aex-gateway aex-work-publisher aex-bid-gateway aex-bid-evaluator \
               aex-contract-engine aex-provider-registry aex-trust-broker \
               aex-identity aex-settlement aex-telemetry; do
  URL=$(gcloud run services describe $service --region=$GCP_REGION --format='value(status.url)')
  echo "$service: $URL"
done

Monitoring (GCP)

Cloud Monitoring

View metrics in Cloud Monitoring:

# Key metrics
- run.googleapis.com/request_count
- run.googleapis.com/request_latencies
- run.googleapis.com/container/instance_count
- run.googleapis.com/container/cpu/utilizations
- run.googleapis.com/container/memory/utilizations

Cloud Logging

# All AEX logs
gcloud logging read 'resource.type="cloud_run_revision" AND resource.labels.service_name=~"aex-.*"' \
  --limit=100 \
  --format="table(timestamp,resource.labels.service_name,textPayload)"

# Error logs only
gcloud logging read 'resource.type="cloud_run_revision" AND severity>=ERROR' \
  --limit=50

Rollback (GCP)

# List revisions
gcloud run revisions list --service=SERVICE_NAME --region=$GCP_REGION

# Route traffic to previous revision
gcloud run services update-traffic SERVICE_NAME \
  --to-revisions=REVISION_NAME=100 \
  --region=$GCP_REGION

# Gradual rollout (90/10 split)
gcloud run services update-traffic SERVICE_NAME \
  --to-revisions=NEW_REVISION=10,OLD_REVISION=90 \
  --region=$GCP_REGION
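A gradual rollout usually moves through several splits rather than a single 90/10 step. The sketch below builds the `--to-revisions` value for a given percentage. `traffic_split` is a hypothetical helper, and the stage percentages and pause in the commented example are illustrative assumptions.

```shell
# Build a --to-revisions value that sends `percent` of traffic to the new
# revision and the remainder to the old one.
traffic_split() {
  new_rev="$1"; old_rev="$2"; percent="$3"
  if [ "$percent" -lt 0 ] || [ "$percent" -gt 100 ]; then
    echo "percent must be between 0 and 100" >&2
    return 1
  fi
  rest=$((100 - percent))
  if [ "$rest" -eq 0 ]; then
    echo "${new_rev}=100"
  elif [ "$percent" -eq 0 ]; then
    echo "${old_rev}=100"
  else
    echo "${new_rev}=${percent},${old_rev}=${rest}"
  fi
}

# Example (not run here): step traffic up in stages, pausing between steps.
# for pct in 10 50 100; do
#   gcloud run services update-traffic SERVICE_NAME \
#     --to-revisions="$(traffic_split NEW_REVISION OLD_REVISION "$pct")" \
#     --region="$GCP_REGION"
#   sleep 300
# done
```

Checking error rates between stages, and routing 100% back to the old revision if they rise, turns the same loop into a simple canary.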

AWS Deployment

Prerequisites (AWS)

Required Tools

# AWS CLI v2
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

# Docker
# https://docs.docker.com/engine/install/

# jq (for JSON processing)
sudo apt-get install jq  # Ubuntu/Debian
brew install jq          # macOS

AWS Account Setup

aws configure
# Enter your AWS Access Key ID
# Enter your AWS Secret Access Key
# Enter your preferred region (e.g., us-east-1)
# Enter output format (json)

Verify Access

aws sts get-caller-identity

Setup (AWS)

Environment Variables

export AWS_REGION="us-east-1"
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

Run Setup Script

chmod +x hack/deploy/setup-aws.sh
./hack/deploy/setup-aws.sh all

This creates:

  • ECR repositories for all 10 services
  • VPC with public/private subnets across 2 AZs
  • Security groups for ALB, ECS, and DocumentDB
  • ECS Fargate cluster
  • IAM roles for ECS tasks and GitHub Actions
  • Secrets Manager secrets
  • CloudWatch log groups
  • Application Load Balancer

Individual Setup Commands

./hack/deploy/setup-aws.sh ecr      # Create ECR repositories
./hack/deploy/setup-aws.sh vpc      # Create VPC and networking
./hack/deploy/setup-aws.sh ecs      # Create ECS cluster
./hack/deploy/setup-aws.sh iam      # Create IAM roles
./hack/deploy/setup-aws.sh secrets  # Create secrets
./hack/deploy/setup-aws.sh logs     # Create CloudWatch log groups
./hack/deploy/setup-aws.sh alb      # Create ALB

Database (AWS)

Option A: Amazon DocumentDB (Managed)

VPC_ID=$(aws ec2 describe-vpcs --filters "Name=tag:Name,Values=aex-vpc" \
  --query 'Vpcs[0].VpcId' --output text)

PRIVATE_SUBNETS=$(aws ec2 describe-subnets \
  --filters "Name=tag:Name,Values=aex-private-*" "Name=vpc-id,Values=$VPC_ID" \
  --query 'Subnets[*].SubnetId' --output text | tr '\t' ',')

DOCDB_SG=$(aws ec2 describe-security-groups \
  --filters "Name=group-name,Values=aex-docdb-sg" "Name=vpc-id,Values=$VPC_ID" \
  --query 'SecurityGroups[0].GroupId' --output text)

# Create subnet group
aws docdb create-db-subnet-group \
  --db-subnet-group-name aex-docdb-subnet-group \
  --db-subnet-group-description "Agent Exchange DocumentDB Subnet Group" \
  --subnet-ids ${PRIVATE_SUBNETS//,/ }

# Get password from Secrets Manager
DOCDB_PASSWORD=$(aws secretsmanager get-secret-value \
  --secret-id aex-docdb-password \
  --query SecretString --output text)

# Create DocumentDB cluster
aws docdb create-db-cluster \
  --db-cluster-identifier aex-docdb \
  --engine docdb \
  --master-username aexadmin \
  --master-user-password "$DOCDB_PASSWORD" \
  --vpc-security-group-ids "$DOCDB_SG" \
  --db-subnet-group-name aex-docdb-subnet-group

# Create instance
aws docdb create-db-instance \
  --db-instance-identifier aex-docdb-1 \
  --db-cluster-identifier aex-docdb \
  --db-instance-class db.r5.large \
  --engine docdb

Option B: MongoDB Atlas (External)

  1. Create a MongoDB Atlas cluster
  2. Configure VPC peering with your AWS VPC
  3. Store connection string in Secrets Manager:
aws secretsmanager create-secret \
  --name aex-mongo-uri \
  --secret-string "mongodb+srv://<USERNAME>:<PASSWORD>@<CLUSTER>.mongodb.net/aex"

Deploy (AWS)

Build and Push Images

# Login to ECR
aws ecr get-login-password --region $AWS_REGION | \
  docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com

# Build and push all services
VERSION=v1.0.0 ./hack/deploy/deploy-ecs.sh build

Deploy Services

# Deploy all services to staging
./hack/deploy/deploy-ecs.sh staging all

# Deploy all services to production
./hack/deploy/deploy-ecs.sh production all

# Deploy a single service
./hack/deploy/deploy-ecs.sh staging aex-gateway

Verify Deployment

# Check service status
aws ecs describe-services \
  --cluster aex-cluster \
  --services aex-gateway aex-work-publisher \
  --query 'services[*].{name:serviceName,status:status,running:runningCount,desired:desiredCount}'

# Get ALB URL
ALB_DNS=$(aws elbv2 describe-load-balancers --names "aex-alb" \
  --query 'LoadBalancers[0].DNSName' --output text)
echo "Access at: http://$ALB_DNS"

# Test health endpoint
curl http://$ALB_DNS/health

Monitoring (AWS)

CloudWatch Logs

# View recent logs
aws logs tail /ecs/agent-exchange/aex-gateway --follow

# Search logs
aws logs filter-log-events \
  --log-group-name /ecs/agent-exchange/aex-gateway \
  --filter-pattern "ERROR"

CloudWatch Metrics

| Metric | Description | Threshold |
| --- | --- | --- |
| CPUUtilization | CPU usage percentage | Alert > 80% |
| MemoryUtilization | Memory usage percentage | Alert > 85% |
| RunningTaskCount | Number of running tasks | Alert if 0 |
| HTTPCode_Target_5XX_Count | 5xx errors | Alert > 10/min |
| TargetResponseTime | Response latency | Alert > 2s |
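As an illustration of turning one of these thresholds into an alarm, the sketch below wraps `aws cloudwatch put-metric-alarm` for the CPU row. The function name, the alarm name, and the three 60-second evaluation periods are assumptions chosen for the example; the cluster name `aex-cluster` is taken from the commands elsewhere in this guide.

```shell
# Create a CloudWatch alarm that fires when a service's average CPU
# stays above 80% for three consecutive 60-second periods.
create_cpu_alarm() {
  service="$1"
  aws cloudwatch put-metric-alarm \
    --alarm-name "${service}-cpu-high" \
    --namespace AWS/ECS \
    --metric-name CPUUtilization \
    --dimensions Name=ClusterName,Value=aex-cluster Name=ServiceName,Value="$service" \
    --statistic Average \
    --period 60 \
    --evaluation-periods 3 \
    --threshold 80 \
    --comparison-operator GreaterThanThreshold
}

# Example (not run here):
# create_cpu_alarm aex-gateway
```

As written the alarm only changes state. To be notified, add `--alarm-actions` with the ARN of an SNS topic you have created.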

Rollback (AWS)

# Rollback to previous task definition
aws ecs update-service \
  --cluster aex-cluster \
  --service aex-gateway \
  --task-definition aex-gateway:PREVIOUS_REVISION

# Force new deployment
aws ecs update-service --cluster aex-cluster --service aex-gateway \
  --force-new-deployment

CI/CD Pipelines

GCP (Cloud Run)

  1. Add GitHub Secrets:

    • GCP_PROJECT_ID
    • GCP_REGION
    • GCP_WORKLOAD_IDENTITY_PROVIDER
    • GCP_SERVICE_ACCOUNT
  2. Deploy via tag:

git tag v1.0.0
git push origin v1.0.0

AWS (ECS Fargate)

  1. Add GitHub Secrets:

    • AWS_ROLE_ARN - arn:aws:iam::<account-id>:role/aex-github-actions-role
  2. Add GitHub Variables:

    • AWS_REGION - us-east-1
  3. Configure Environments:

    • aws-staging - For staging deployments
    • aws-production - For production (add required reviewers)
  4. Deploy via tag:

git tag -a v1.0.0 -m "Release v1.0.0"
git push origin v1.0.0
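Since a pushed tag starts a deployment, a quick format check before tagging can prevent accidental releases. The sketch below assumes the pipelines expect tags of the form `vMAJOR.MINOR.PATCH`, as in the examples above; the exact trigger pattern lives in the workflow files and is not confirmed here. `is_release_tag` is a hypothetical helper.

```shell
# Succeed only for tags shaped like v1.2.3.
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

# Example (not run here):
# TAG="v1.0.0"
# is_release_tag "$TAG" && git tag -a "$TAG" -m "Release $TAG" && git push origin "$TAG"
```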

Service Configuration

Resource Allocation

| Service | GCP Memory | GCP CPU (vCPU) | AWS Memory | AWS CPU (units; 1024 = 1 vCPU) |
| --- | --- | --- | --- | --- |
| aex-gateway | 1Gi | 2 | 1024 MB | 512 |
| aex-work-publisher | 512Mi | 1 | 512 MB | 256 |
| aex-bid-gateway | 512Mi | 1 | 512 MB | 256 |
| aex-bid-evaluator | 512Mi | 1 | 512 MB | 256 |
| aex-contract-engine | 512Mi | 1 | 512 MB | 256 |
| aex-provider-registry | 512Mi | 1 | 512 MB | 256 |
| aex-trust-broker | 512Mi | 1 | 512 MB | 256 |
| aex-identity | 512Mi | 1 | 512 MB | 256 |
| aex-settlement | 512Mi | 1 | 512 MB | 256 |
| aex-telemetry | 256Mi | 1 | 512 MB | 256 |

Environment Variables

All Services

| Variable | Description | Default |
| --- | --- | --- |
| PORT | HTTP port | 8080 |
| ENVIRONMENT | Environment name | production |
| LOG_LEVEL | Logging level | info |

Service-Specific

| Service | Variables |
| --- | --- |
| aex-gateway | WORK_PUBLISHER_URL, BID_GATEWAY_URL, PROVIDER_REGISTRY_URL, SETTLEMENT_URL, IDENTITY_URL |
| aex-work-publisher | STORE_TYPE, PROVIDER_REGISTRY_URL |
| aex-bid-gateway | PROVIDER_REGISTRY_URL |
| aex-bid-evaluator | BID_GATEWAY_URL, TRUST_BROKER_URL |
| aex-contract-engine | BID_GATEWAY_URL, SETTLEMENT_URL |
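For a local or Compose-style environment, the gateway's variables might be set as in the sketch below. It assumes each service is reachable at a hostname equal to its service name on the port from the architecture table, which is typical for Docker Compose networking. On Cloud Run or behind an ALB the values would instead be the deployed service URLs. `service_url` is a hypothetical helper.

```shell
# Build an internal URL from a service name and port.
# Assumption: hostname equals service name (Compose-style networking).
service_url() {
  echo "http://$1:$2"
}

# Variables read by aex-gateway; ports come from the architecture table.
export WORK_PUBLISHER_URL="$(service_url aex-work-publisher 8081)"
export BID_GATEWAY_URL="$(service_url aex-bid-gateway 8082)"
export PROVIDER_REGISTRY_URL="$(service_url aex-provider-registry 8085)"
export SETTLEMENT_URL="$(service_url aex-settlement 8088)"
export IDENTITY_URL="$(service_url aex-identity 8087)"
```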

Scaling

GCP Auto-scaling

# Flags for `gcloud run deploy` or `gcloud run services update`

# High-throughput services (gateway)
--concurrency=250

# CPU-intensive services (bid-evaluator)
--concurrency=50

# Cold start optimization
--min-instances=1 --cpu-boost
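The flags above can be applied per service with `gcloud run services update`. The sketch below maps a service name to the concurrency values shown, falling back to 80, which is Cloud Run's default. `concurrency_for` is a hypothetical helper, and whether the deploy script already sets these flags is not confirmed here.

```shell
# Pick a concurrency setting for a service, using the values from this guide.
concurrency_for() {
  case "$1" in
    aex-gateway)       echo 250 ;;  # high-throughput
    aex-bid-evaluator) echo 50 ;;   # CPU-intensive
    *)                 echo 80 ;;   # Cloud Run default
  esac
}

# Example (not run here):
# gcloud run services update aex-gateway \
#   --region="$GCP_REGION" \
#   --concurrency="$(concurrency_for aex-gateway)" \
#   --min-instances=1 --cpu-boost
```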

AWS Auto-scaling

# Register scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/aex-cluster/aex-gateway \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 1 \
  --max-capacity 10

# Create scaling policy (target tracking)
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/aex-cluster/aex-gateway \
  --policy-name cpu-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    },
    "ScaleOutCooldown": 60,
    "ScaleInCooldown": 300
  }'

# Manual scaling
aws ecs update-service \
  --cluster aex-cluster \
  --service aex-gateway \
  --desired-count 5

Security Best Practices

Authentication

  • Disable unauthenticated access in production
  • Use service-to-service authentication
  • Validate API keys against Identity service
  • Implement rate limiting

Network Security

GCP

# VPC Connector for private networking
--vpc-connector=aex-connector
--vpc-egress=private-ranges-only

AWS

  • Services run in private subnets
  • ALB is the only public-facing component
  • Security groups restrict traffic between tiers

Secret Management

  • Never commit secrets to code
  • Use Secret Manager (GCP) or Secrets Manager (AWS)
  • Rotate secrets regularly
  • Use separate secrets per environment
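On GCP, rotating a secret can be sketched as adding a new version to an existing Secret Manager secret. `rotate_secret` is a hypothetical helper. How services pick up the new version (for example, a redeploy) depends on how the secret is mounted, and rotating `aex-jwt-secret` would presumably invalidate tokens signed with the old value.

```shell
# Generate 32 random bytes, base64-encode them, and store the result
# as a new version of the named secret.
rotate_secret() {
  head -c 32 /dev/urandom | base64 | tr -d '\n' | \
    gcloud secrets versions add "$1" --data-file=-
}

# Example (not run here):
# rotate_secret aex-api-key-salt
```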

Troubleshooting

GCP

Service Won't Start

gcloud run services logs read SERVICE_NAME --region=$GCP_REGION
gcloud run revisions describe REVISION_NAME --region=$GCP_REGION

Connection Refused

curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  https://target-service-xxx.run.app/health

AWS

Task Fails to Start

aws ecs describe-tasks \
  --cluster aex-cluster \
  --tasks $(aws ecs list-tasks --cluster aex-cluster --service-name aex-gateway \
    --desired-status STOPPED --query 'taskArns[0]' --output text) \
  --query 'tasks[0].stoppedReason'

Service Unhealthy

aws elbv2 describe-target-health \
  --target-group-arn $(aws elbv2 describe-target-groups \
    --names aex-gateway-tg --query 'TargetGroups[0].TargetGroupArn' --output text)

Database Connection Issues

aws ecs execute-command \
  --cluster aex-cluster \
  --task <task-id> \
  --container aex-gateway \
  --interactive \
  --command "/bin/sh"
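The `<task-id>` placeholder can be filled from `aws ecs list-tasks`, which returns full task ARNs. The sketch below extracts the short ID from an ARN; `task_id_from_arn` is a hypothetical helper. Note that ECS Exec only works if the service was created or updated with execute-command enabled and the Session Manager plugin is installed locally.

```shell
# Extract the short task ID (the last path segment) from an ECS task ARN.
task_id_from_arn() {
  echo "${1##*/}"
}

# Example (not run here): find a running gateway task, then open a shell.
# TASK_ARN=$(aws ecs list-tasks --cluster aex-cluster --service-name aex-gateway \
#   --desired-status RUNNING --query 'taskArns[0]' --output text)
# aws ecs execute-command --cluster aex-cluster \
#   --task "$(task_id_from_arn "$TASK_ARN")" \
#   --container aex-gateway --interactive --command "/bin/sh"
```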

Teardown

Validate Before Teardown

# GCP - validate what will be deleted
./hack/deploy/teardown-gcp.sh validate

# AWS - validate what will be deleted
./hack/deploy/teardown-aws.sh validate

Execute Teardown

# GCP - delete all resources
./hack/deploy/teardown-gcp.sh

# AWS - delete all resources
./hack/deploy/teardown-aws.sh

Make Targets

# GCP
make gcp-teardown

# AWS
make aws-teardown

Demo Deployment (Local Development)

The demo showcases the complete AEX + A2A + AP2 flow with legal agents and payment processors.

Quick Start Demo

cd demo

# Start everything
docker-compose up -d

# Access UI (`open` is macOS-specific; on Linux use xdg-open)
open http://localhost:8502

Step-by-Step Demo (for presentations)

cd demo

# 1. Stop everything and clean up
docker-compose down -v

# 2. Start AEX infrastructure only (no agents)
docker-compose up -d mongo aex-identity aex-provider-registry aex-trust-broker \
  aex-bid-gateway aex-bid-evaluator aex-contract-engine aex-work-publisher \
  aex-settlement aex-credentials-provider aex-telemetry aex-gateway

# 3. Start UI without dependencies
docker-compose up -d --no-deps demo-ui-nicegui

# 4. Verify empty marketplace
curl -s http://localhost:8085/providers | jq '.total'  # Should be 0

# 5. Add agents one by one (during presentation)
docker-compose up -d legal-agent-a      # Budget Legal
docker-compose up -d legal-agent-b      # Standard Legal
docker-compose up -d legal-agent-c      # Premium Legal

# 6. Add payment agents
docker-compose up -d payment-legalpay payment-contractpay payment-compliancepay

# 7. Add orchestrator
docker-compose up -d orchestrator

# 8. Open browser and run the demo
open http://localhost:8502

Demo Verification Commands

# Check AEX health
curl http://localhost:8080/health

# Count registered providers
curl -s http://localhost:8085/providers | jq '.total'

# List provider names
curl -s http://localhost:8085/providers | jq '.providers[].name'

# Check agent card
curl -s http://localhost:8100/.well-known/agent.json | jq '{name, description}'
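Containers can take a few seconds to become healthy after `docker-compose up -d`, so the checks above may fail if run immediately. The sketch below polls a URL until it responds; `wait_for_url` is a hypothetical helper, and the default of 30 attempts two seconds apart is an arbitrary choice.

```shell
# Poll a URL until it returns a successful HTTP status, or give up.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url="$1"; attempts="${2:-30}"; delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors; output is discarded.
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for $url" >&2
  return 1
}

# Example (not run here):
# wait_for_url http://localhost:8080/health && curl -s http://localhost:8085/providers | jq '.total'
```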

Demo UI Features

The NiceGUI demo interface (port 8502) provides:

  • Real-time agent registration display (auto-refreshes every 5 seconds)
  • Work submission form
  • Live bid collection and comparison
  • Contract award with configurable strategies (balanced, lowest price, best quality)
  • A2A execution visualization
  • AP2 payment processing with cashback rewards
  • Settlement summary with ledger updates


Last Updated: 2025-12-30