diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md index b0726d05..f446616a 100644 --- a/docs/DEPLOYMENT.md +++ b/docs/DEPLOYMENT.md @@ -2,7 +2,7 @@ This guide walks you through deploying the Fullstack AgentCore Solution Template (FAST) to AWS. -> **Terraform alternative:** This guide covers CDK deployment. FAST also supports Terraform — see [`infra-terraform/README.md`](../infra-terraform/README.md) for the Terraform deployment guide. We recommend choosing one infrastructure tool and deleting the other directory (`infra-cdk/` or `infra-terraform/`) from your fork to keep things clean. +> **Terraform alternative:** This guide covers CDK deployment. FAST also supports Terraform -- see the [Terraform Deployment Guide](TERRAFORM_DEPLOYMENT.md). We recommend choosing one infrastructure tool and deleting the other directory (`infra-cdk/` or `infra-terraform/`) from your fork to keep things clean. ## Prerequisites diff --git a/docs/TERRAFORM_DEPLOYMENT.md b/docs/TERRAFORM_DEPLOYMENT.md new file mode 100644 index 00000000..d46607f9 --- /dev/null +++ b/docs/TERRAFORM_DEPLOYMENT.md @@ -0,0 +1,335 @@ +# Terraform Deployment Guide + +This guide walks you through deploying the Fullstack AgentCore Solution Template (FAST) to AWS using Terraform. + +> **CDK alternative:** This guide covers Terraform deployment. FAST also supports AWS CDK -- see the [Deployment Guide](DEPLOYMENT.md) for CDK deployment. We recommend choosing one infrastructure tool and deleting the other directory (`infra-cdk/` or `infra-terraform/`) from your fork to keep things clean. 
+ +## Prerequisites + +Before deploying, ensure you have: + +- **Terraform** >= 1.5.0 (see [Install Terraform](https://developer.hashicorp.com/terraform/install)) +- **AWS CLI** configured with credentials (`aws configure`) - see [AWS CLI Configuration guide](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html) +- **Python 3.11+** (for the frontend deployment script) +- **Docker** (only required for `backend_deployment_type = "docker"`) - see [Install Docker Engine](https://docs.docker.com/engine/install/). Verify with `docker ps`. Alternatively, [Finch](https://github.com/runfinch/finch) can be used on Mac. See [below](#docker-cross-platform-build-setup-required-for-non-arm-machines) if you have a non-ARM machine. +- An AWS account with sufficient permissions to create: + - S3 buckets + - Cognito User Pools + - Amplify Hosting projects + - Bedrock AgentCore resources + - IAM roles and policies + +## Configuration + +### 1. Create Configuration File + +```bash +cd infra-terraform +cp terraform.tfvars.example terraform.tfvars +``` + +Edit `terraform.tfvars` to customize your deployment: + +```hcl +stack_name_base = "your-project-name" # Base name for all resources (max 35 chars) + +admin_user_email = "admin@example.com" # Optional: auto-creates user & emails credentials +``` + +**Important**: +- Change `stack_name_base` to a unique name for your project to avoid conflicts +- Maximum length is 35 characters (due to AWS AgentCore runtime naming constraints) + +#### Required Variables + +| Variable | Description | +|----------|-------------| +| `stack_name_base` | Base name for all resources | + +#### Optional Variables + +| Variable | Description | Default | +|----------|-------------|---------| +| `admin_user_email` | Email for Cognito admin user | `null` | +| `backend_pattern` | Agent pattern to deploy | `"strands-single-agent"` | +| `backend_deployment_type` | `"docker"` (ECR container) or `"zip"` (S3 package) | `"docker"` | +| 
`backend_network_mode` | Network mode (PUBLIC/VPC) | `"PUBLIC"` | +| `backend_vpc_id` | VPC ID (required when VPC mode) | `null` | +| `backend_vpc_subnet_ids` | Subnet IDs (required when VPC mode) | `[]` | +| `backend_vpc_security_group_ids` | Security group IDs (optional for VPC mode) | `[]` | + +**Region:** Set via the `AWS_REGION` environment variable or AWS CLI profile (`aws configure`). The Terraform provider uses the standard AWS SDK resolution chain -- no region variable is needed. + +**Tags:** The provider applies default tags (Project, ManagedBy, Repository) to all resources automatically. Add custom tags directly in the provider's `default_tags` block in `main.tf`. + +### Deployment Types + +Set `backend_deployment_type` in `terraform.tfvars` to `"docker"` (default) or `"zip"`. See [Deployment Types](DEPLOYMENT.md#deployment-types) in the main Deployment Guide for guidance on choosing between them. + +**Terraform-specific notes:** +- ZIP mode does not require Docker installed locally (unlike CDK, where Docker is always needed) +- **ZIP packaging includes**: The `patterns//`, `patterns/utils/`, and `tools/` directories are bundled together with dependencies from `requirements.txt` + +### Deployment into existing VPC + +By default, the AgentCore Runtime runs in PUBLIC network mode with internet access. To deploy the runtime into an existing VPC for private network isolation, set `backend_network_mode = "VPC"` and provide your VPC details: + +```hcl +backend_network_mode = "VPC" +backend_vpc_id = "vpc-0abc1234def56789a" +backend_vpc_subnet_ids = ["subnet-aaaa1111bbbb2222c", "subnet-cccc3333dddd4444e"] +backend_vpc_security_group_ids = ["sg-0abc1234def56789a"] # Optional +``` + +The `backend_vpc_id` and `backend_vpc_subnet_ids` fields are required when using VPC mode. The `backend_vpc_security_group_ids` field is optional -- if omitted, a default security group is created with HTTPS (TCP 443) self-referencing ingress and all-traffic egress. 
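+
+As a rough illustration of that default, the sketch below shows what an equivalent security group could look like. The resource names (`agentcore_default`, `https_self`, `all_egress`) are hypothetical, and this is not necessarily how the backend module defines the group; pass your own group via `backend_vpc_security_group_ids` if you need different rules.
+
+```hcl
+# Hypothetical sketch of the default security group described above.
+resource "aws_security_group" "agentcore_default" {
+  name_prefix = "${var.stack_name_base}-runtime-"
+  description = "Default security group for the AgentCore Runtime (sketch)"
+  vpc_id      = var.backend_vpc_id
+}
+
+# HTTPS (TCP 443) ingress, allowed only from members of the same group.
+resource "aws_vpc_security_group_ingress_rule" "https_self" {
+  security_group_id            = aws_security_group.agentcore_default.id
+  referenced_security_group_id = aws_security_group.agentcore_default.id
+  ip_protocol                  = "tcp"
+  from_port                    = 443
+  to_port                      = 443
+}
+
+# All-traffic egress.
+resource "aws_vpc_security_group_egress_rule" "all_egress" {
+  security_group_id = aws_security_group.agentcore_default.id
+  cidr_ipv4         = "0.0.0.0/0"
+  ip_protocol       = "-1"
+}
+```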
+ +For detailed VPC prerequisites -- including required VPC endpoints, subnet requirements, NAT Gateway guidance, and security group configuration -- see [VPC Deployment](DEPLOYMENT.md#vpc-deployment-private-network) in the main Deployment Guide. + +**Important:** AgentCore Runtime availability is limited to specific Availability Zones per region. Verify your subnets are in supported AZs before deploying. See [AWS documentation on supported Availability Zones](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/agentcore-vpc.html#agentcore-supported-azs) for details. + +## Deployment Steps + +### TL;DR +```bash +cd infra-terraform +cp terraform.tfvars.example terraform.tfvars +# Edit terraform.tfvars with your configuration +terraform init +terraform apply +python scripts/deploy-frontend.py +``` + +### 1. Initialize Terraform + +```bash +cd infra-terraform +terraform init +``` + +### 2. Deploy Infrastructure + +Build and deploy the complete stack: + +```bash +terraform apply +``` + +The deployment will: + +1. Create Amplify Hosting app and S3 staging bucket +1. Create a Cognito User Pool with web and machine clients +1. Create AgentCore Memory for persistent conversations +1. Set up OAuth2 Credential Provider for Runtime-to-Gateway authentication +1. Create AgentCore Gateway with Lambda tool targets +1. Build and deploy the AgentCore Runtime (Docker image or ZIP package) +1. Create the Feedback API (API Gateway + Lambda + DynamoDB) +1. Store configuration in SSM Parameters + +- **Docker mode** (default): Automatically builds an ARM64 Docker image, pushes to ECR, and creates the runtime. Requires Docker to be running locally. +- **Zip mode**: Deploys a packager Lambda that bundles your agent code with ARM64 wheels, uploads to S3, and creates the runtime. No Docker required. + +> **Note:** If you provide a pre-built image via `container_uri`, Terraform skips the build and uses your image directly. + +### 3. 
Deploy Frontend + +```bash +# From infra-terraform directory +python scripts/deploy-frontend.py +``` + +This script automatically: + +- Generates a fresh `aws-exports.json` from Terraform outputs (see [below](#understanding-aws-exportsjson) for more information) +- Installs/updates npm dependencies if needed +- Builds the frontend +- Deploys to AWS Amplify Hosting + +You will see the URL for the application in the script's output, which will look similar to this: + +``` +App URL: https://main.d123abc456def7.amplifyapp.com +``` + +A shell alternative is also available for macOS/Linux: +```bash +./scripts/deploy-frontend.sh +``` + +### 4. Create a Cognito User (if necessary) + +**If you provided `admin_user_email` in `terraform.tfvars`:** + +- Check your email for the temporary password +- Sign in and change your password on first login + +**If you didn't provide an email:** + +1. Go to the [AWS Cognito Console](https://console.aws.amazon.com/cognito/) +2. Find your User Pool (named `{stack_name_base}-user-pool`) +3. Click on the User Pool +4. Go to the "Users" tab +5. Click "Create user" +6. Fill in the user details: + - **Email**: Your email address + - **Temporary password**: Create a temporary password + - **Mark email as verified**: Check this box +7. Click "Create user" + +### 5. Access the Application + +1. Open the Amplify Hosting URL in your browser +1. Sign in with the Cognito user you created +1. You'll be prompted to change your temporary password on first login + +## Post-Deployment + +### Updating the Application + +To update the frontend code: + +```bash +# From infra-terraform directory +python scripts/deploy-frontend.py +``` + +To update the backend agent: + +```bash +cd infra-terraform +terraform apply +``` + +Terraform detects code changes automatically and rebuilds/redeploys the runtime. 
After a backend update that replaces the runtime, redeploy the frontend to pick up the new Runtime ARN: + +```bash +python scripts/deploy-frontend.py +``` + +#### Manual Docker Build (Optional) + +If you prefer to build the Docker image separately (e.g., in CI/CD): +```bash +./scripts/build-and-push-image.sh +``` + +**Options:** +```bash +./scripts/build-and-push-image.sh -h # Show help +./scripts/build-and-push-image.sh -p langgraph-single-agent # Use LangGraph pattern +./scripts/build-and-push-image.sh -s my-stack -r us-west-2 # Override stack/region +``` + +### Verify Deployment + +```bash +# Get deployment summary +terraform output deployment_summary + +# Get all outputs +terraform output +``` + +### Test the Agent (Optional) + +```bash +# From infra-terraform directory +pip install boto3 requests colorama # First time only +python scripts/test-agent.py 'Hello, what can you do?' +``` + +### Monitoring and Logs + +- **Frontend logs**: Check Amplify build logs in the AWS Console +- **Backend logs**: Check CloudWatch logs for the AgentCore runtime +- **Feedback API logs**: Check CloudWatch logs for the feedback Lambda + +## Cleanup + +To remove all resources: + +```bash +cd infra-terraform +terraform destroy +``` + +Terraform handles resource dependencies automatically and destroys in the correct order. + +**Warning**: This will delete all data including Cognito users, S3 buckets, DynamoDB tables, and ECR images. + +### Verify Cleanup + +After destroy completes, verify no resources remain: +```bash +aws resourcegroupstaggingapi get-resources --tag-filters Key=Project,Values= +``` + +## Troubleshooting + +### Common Issues + +1. **`terraform apply` fails with Docker errors** + + - Ensure Docker is installed and the daemon is running: `docker ps` + - On Mac, open Docker Desktop or start Finch: `finch vm start` + - On Linux: `sudo systemctl start docker` + - If using `backend_deployment_type = "zip"`, Docker is not required + +2. 
**"Architecture incompatible" or "exec format error" during Docker build** + + - This occurs when deploying from a non-ARM machine without cross-platform build setup + - Follow the [Docker Cross-Platform Build Setup](#docker-cross-platform-build-setup-required-for-non-arm-machines) instructions below + - Ensure you've installed QEMU emulation: `docker run --privileged --rm tonistiigi/binfmt --install all` + - Verify ARM64 support: `docker buildx ls` should show `linux/arm64` in platforms + +3. **Terraform Init Fails** + + Ensure you have the correct provider versions: + ```bash + terraform init -upgrade + ``` + +4. **Authentication errors** + + Verify AWS credentials: + ```bash + aws sts get-caller-identity + ``` + + Also verify you created a Cognito user and that the user's email is verified. + +5. **"Agent Runtime ARN not configured" or 404 errors** + + - Ensure the backend deployed successfully + - Redeploy the frontend to pick up the latest Runtime ARN: + ```bash + python scripts/deploy-frontend.py + ``` + - Verify SSM parameters match Terraform outputs: + ```bash + terraform output runtime_arn + ``` + +6. **Permission errors** + - Verify your AWS credentials have sufficient permissions + - Check IAM roles created by the stack + +### Getting Help + +- Check CloudWatch logs for detailed error messages +- Review `terraform output` for resource identifiers +- Ensure all prerequisites are met + +## Security Considerations + +See [Security Considerations](DEPLOYMENT.md#security-considerations) in the main Deployment Guide. Additionally, consider deploying in [VPC mode](#deployment-into-existing-vpc) for network isolation. + +## Docker Cross-Platform Build Setup (Required for non-ARM machines) + +Bedrock AgentCore Runtime only supports ARM64 architecture. If you're deploying from a non-ARM machine (x86_64/amd64), you need to enable Docker's cross-platform building capabilities. 
See [Docker Cross-Platform Build Setup](DEPLOYMENT.md#docker-cross-platform-build-setup-required-for-non-arm-machines) in the main Deployment Guide for setup instructions. + + +## Understanding aws-exports.json + +The `aws-exports.json` file provides the frontend with Cognito authentication configuration. See [Understanding aws-exports.json](DEPLOYMENT.md#understanding-aws-exports-json) in the main Deployment Guide for details on its purpose and structure. + +For Terraform deployments, the file is generated by `deploy-frontend.py` which fetches configuration from `terraform output -json` (rather than CDK stack outputs). You should not manually edit this file as it's regenerated on each deployment. diff --git a/infra-terraform/.terraform.lock.hcl b/infra-terraform/.terraform.lock.hcl index 90bf94e5..95f2df69 100644 --- a/infra-terraform/.terraform.lock.hcl +++ b/infra-terraform/.terraform.lock.hcl @@ -22,25 +22,25 @@ provider "registry.terraform.io/hashicorp/archive" { } provider "registry.terraform.io/hashicorp/aws" { - version = "6.28.0" - constraints = ">= 5.82.0" + version = "6.35.1" + constraints = ">= 5.82.0, >= 6.35.1" hashes = [ - "h1:RwoFuX1yGMVaKJaUmXDKklEaQ/yUCEdt5k2kz+/g08c=", - "zh:0ba0d5eb6e0c6a933eb2befe3cdbf22b58fbc0337bf138f95bf0e8bb6e6df93e", - "zh:23eacdd4e6db32cf0ff2ce189461bdbb62e46513978d33c5de4decc4670870ec", - "zh:307b06a15fc00a8e6fd243abde2cbe5112e9d40371542665b91bec1018dd6e3c", - "zh:37a02d5b45a9d050b9642c9e2e268297254192280df72f6e46641daca52e40ec", - "zh:3da866639f07d92e734557d673092719c33ede80f4276c835bf7f231a669aa33", - "zh:480060b0ba310d0f6b6a14d60b276698cb103c48fd2f7e2802ae47c963995ec6", - "zh:57796453455c20db80d9168edbf125bf6180e1aae869de1546a2be58e4e405ec", - "zh:69139cba772d4df8de87598d8d8a2b1b4b254866db046c061dccc79edb14e6b9", - "zh:7312763259b859ff911c5452ca8bdf7d0be6231c5ea0de2df8f09d51770900ac", - "zh:8d2d6f4015d3c155d7eb53e36f019a729aefb46ebfe13f3a637327d3a1402ecc", - 
"zh:94ce589275c77308e6253f607de96919b840c2dd36c44aa798f693c9dd81af42", + "h1:xD+5zPhF0ry3sutriARfFVIg5m38VwYt66RveI3aUyI=", + "zh:0a16d1b0ba9379e5c5295e6b3caa42f0b8ba6b9f0a7cc9dbe58c232cf995db2d", + "zh:4b2e69907a1a2c557e45ef590f9fd6187ab5bf90378346ba9f723535e49ce908", + "zh:56bdafda0d629e15dc3dd9275b54f1fb953e2e09a3bc1a34e027da9d03ea4893", + "zh:5b84e933989150249036f84faad221dce0daa9d3043ff24401547e18f00b121e", + "zh:70bac98c27a14cb2cedabd741a1f7f1bab074c127efdcf02b54dbcf0d03db3cc", + "zh:7184f48bd077eaf68e184fd44f97e2d971cb77c59a68aedb95a0f8dc01b134fe", + "zh:7367589ae8b584bfcd83c973f5003e15010a453349c017a0d2cca8772d4fcfd9", + "zh:7ec9699dee49dd31bbc2d0e50fa1fff451eee5c1d9fd59bca7412acb49ce6594", + "zh:92dd139b96977a64af0e976cd06e84921033678ab97550f1b687c0ea54a8e82c", "zh:9b12af85486a96aedd8d7984b0ff811a4b42e3d88dad1a3fb4c0b580d04fa425", - "zh:adaceec6a1bf4f5df1e12bd72cf52b72087c72efed078aef636f8988325b1a8b", - "zh:d37be1ce187d94fd9df7b13a717c219964cd835c946243f096c6b230cdfd7e92", - "zh:fe6205b5ca2ff36e68395cb8d3ae10a3728f405cdbcd46b206a515e1ebcf17a1", + "zh:9f2df575a5b010db60068668c48806595a3d617a2c0305035283fe8b72f07b19", + "zh:a4602b7602c75c8f726bdc7e706dc5c26736e47cc8381be01386aa8d8d998403", + "zh:bc25fefeeee10425df7aebfc21dc6532d19acdf03fa97b9e6d8c113adffd0a1d", + "zh:f445592040b5fc368a12e6edeffc951b2eb41e86413c4074638a13376e25a9cc", + "zh:ff43962a48bd8f85e17188736bbd3c145b6a1320bd8303221f6b4f9ec861e1e6", ] } @@ -65,22 +65,22 @@ provider "registry.terraform.io/hashicorp/null" { } provider "registry.terraform.io/hashicorp/random" { - version = "3.8.0" + version = "3.8.1" constraints = ">= 3.5.0" hashes = [ - "h1:BYpqK2+ZHqNF9sauVugKJSeFWMCx11I/z/1lMplwUC0=", - "zh:0e71891d8f25564e8d0b61654ed2ca52101862b9a2a07d736395193ae07b134b", - "zh:1c56852d094161997df5fd8a6cbca7c6c979b3f8c3c00fbcc374a59305d117b1", - "zh:20698fb8a2eaa7e23c4f8e3d22250368862f578cf618be0281d5b61496cbef13", - "zh:3afbdd5e955f6d0105fed4f6b7fef7ba165cd780569483e688002108cf06586c", - 
"zh:4ce22b96e625dc203ea653d53551d46156dd63ad79e45bcbe0224b2e6357f243", - "zh:4ff84b568ad468d140f8f6201a372c6c4bea17d64527b72e341ae8fafea65b8e", - "zh:54b071cb509203c43e420cc589523709bbc6e65d80c1cd9384f5bd88fd1ff1a2", - "zh:63fc5f9f341a573cd5c8bcfc994a58fa52a5ad88d2cbbd80f5a9f143c5006e75", - "zh:73cb8b39887589914686d14a99b4de6e85e48603f7235d87da5594e3fbb7d8a7", + "h1:u8AKlWVDTH5r9YLSeswoVEjiY72Rt4/ch7U+61ZDkiQ=", + "zh:08dd03b918c7b55713026037c5400c48af5b9f468f483463321bd18e17b907b4", + "zh:0eee654a5542dc1d41920bbf2419032d6f0d5625b03bd81339e5b33394a3e0ae", + "zh:229665ddf060aa0ed315597908483eee5b818a17d09b6417a0f52fd9405c4f57", + "zh:2469d2e48f28076254a2a3fc327f184914566d9e40c5780b8d96ebf7205f8bc0", + "zh:37d7eb334d9561f335e748280f5535a384a88675af9a9eac439d4cfd663bcb66", + "zh:741101426a2f2c52dee37122f0f4a2f2d6af6d852cb1db634480a86398fa3511", "zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3", - "zh:7ee20f28aa6a25539a5b9fc249e751dec5a5b130dcd73c5d05efdf4d5e320454", - "zh:994a83fddab1d44a8f546920ed34e45ea6caefe4f08735bada6c28dc9010e5e4", + "zh:a902473f08ef8df62cfe6116bd6c157070a93f66622384300de235a533e9d4a9", + "zh:b85c511a23e57a2147355932b3b6dce2a11e856b941165793a0c3d7578d94d05", + "zh:c5172226d18eaac95b1daac80172287b69d4ce32750c82ad77fa0768be4ea4b8", + "zh:dab4434dba34aad569b0bc243c2d3f3ff86dd7740def373f2a49816bd2ff819b", + "zh:f49fd62aa8c5525a5c17abd51e27ca5e213881d58882fd42fec4a545b53c9699", ] } diff --git a/infra-terraform/README.md b/infra-terraform/README.md index de28f5e3..eb85e12b 100644 --- a/infra-terraform/README.md +++ b/infra-terraform/README.md @@ -2,7 +2,7 @@ This directory contains Terraform configurations for deploying the Fullstack AgentCore Solution Template (FAST). -> **Note:** All commands and scripts in this README run from the `infra-terraform/` directory. This folder is self-contained and independent from the CDK deployment (`infra-cdk/`). 
+> **Deployment guide:** For step-by-step deployment instructions, see [Terraform Deployment Guide](../docs/TERRAFORM_DEPLOYMENT.md). This README covers module architecture, configuration reference, and developer documentation. ## Architecture @@ -13,108 +13,60 @@ The infrastructure is organized into 3 Terraform modules, mirroring the CDK stac 3. **Backend** (`modules/backend/`) - All AgentCore and API resources: - AgentCore Memory - Persistent memory for agent conversations - M2M Authentication - Cognito resource server and machine client + - OAuth2 Credential Provider - Lambda for Runtime -> Gateway authentication - AgentCore Gateway - MCP gateway with Lambda tool targets - AgentCore Runtime - ECR repository and containerized agent runtime - Feedback API - API Gateway + Lambda + DynamoDB - SSM Parameters and Secrets Manager -## Prerequisites - -1. **Terraform** >= 1.5.0 -2. **AWS CLI** configured with appropriate credentials -3. **Docker** (only required for `deployment_type = "docker"`) - -## Deployment Types - -FAST supports two deployment types for the AgentCore Runtime: - -| | Docker (default) | Zip | -|---|---|---| -| **How it works** | Builds a Docker container image and pushes to ECR | Packages Python code + ARM64 wheels via Lambda and uploads to S3 | -| **Requires Docker** | Yes | No | -| **Best for** | Custom runtime images, complex dependencies | Quick deployment, CI/CD, environments without Docker | - -Set `deployment_type` in your `terraform.tfvars`: -```hcl -deployment_type = "docker" # or "zip" -``` - ## Quick Start ```bash -# Navigate to the terraform directory cd infra-terraform - -# Copy the example variables file cp terraform.tfvars.example terraform.tfvars - # Edit terraform.tfvars with your configuration -# At minimum, set admin_user_email for the Cognito admin user - -# Initialize Terraform terraform init -``` - -### Deploy -```bash terraform apply +python scripts/deploy-frontend.py ``` -- **Docker mode** (default): Builds an ARM64 Docker 
image, pushes to ECR, and creates the runtime. Requires Docker to be running locally. -- **Zip mode**: Deploys a packager Lambda that bundles your agent code with ARM64 wheels, uploads to S3, and creates the runtime. No Docker required. - -> **Note:** If you provide a pre-built image via `container_uri`, Terraform skips the build and uses your image directly. - -### Manual Docker Build (Optional) - -If you prefer to build the Docker image separately (e.g., in CI/CD), you can use the build script: -```bash -./scripts/build-and-push-image.sh -``` - -**Options:** -```bash -./scripts/build-and-push-image.sh -h # Show help -./scripts/build-and-push-image.sh -p langgraph-single-agent # Use LangGraph pattern -./scripts/build-and-push-image.sh -s my-stack -r us-west-2 # Override stack/region -``` - -### (Optional) Verify Deployment -```bash -terraform output deployment_summary -``` +See the [Terraform Deployment Guide](../docs/TERRAFORM_DEPLOYMENT.md) for detailed instructions, VPC deployment, troubleshooting, and cleanup. 
-## Configuration +## Configuration Reference -### Required Variables - -| Variable | Description | Default | -|----------|-------------|---------| -| `stack_name_base` | Base name for all resources | `"fast"` | -| `aws_region` | AWS region for deployment | `"us-east-1"` | - -### Optional Variables +### Variables | Variable | Description | Default | |----------|-------------|---------| +| `stack_name_base` | Base name for all resources (required) | - | | `admin_user_email` | Email for Cognito admin user | `null` | | `backend_pattern` | Agent pattern to deploy | `"strands-single-agent"` | -| `deployment_type` | `"docker"` (ECR container) or `"zip"` (S3 package) | `"docker"` | -| `agent_name` | Name for the agent runtime | `"StrandsAgent"` | -| `network_mode` | Network mode (PUBLIC/PRIVATE) | `"PUBLIC"` | -| `environment` | Environment name for tagging | `"dev"` | -| `memory_event_expiry_days` | Memory event TTL in days | `30` | +| `backend_deployment_type` | `"docker"` (ECR container) or `"zip"` (S3 package) | `"docker"` | +| `backend_network_mode` | Network mode (PUBLIC/VPC) | `"PUBLIC"` | +| `backend_vpc_id` | VPC ID (required when VPC mode) | `null` | +| `backend_vpc_subnet_ids` | Subnet IDs (required when VPC mode) | `[]` | +| `backend_vpc_security_group_ids` | Security group IDs (optional for VPC mode) | `[]` | -### VPC Configuration (Private Mode) +**Region:** Set via the `AWS_REGION` environment variable or AWS CLI profile. No region variable is needed. -For `PRIVATE` network mode, provide VPC details: +**Tags:** Default tags (Project, ManagedBy, Repository) are applied automatically via the provider's `default_tags` block in `main.tf`. 
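+
+A minimal sketch of how extra tags could be layered onto those defaults, assuming you edit the provider block in `main.tf` directly. The `CostCenter` key and its value are placeholders, not part of the template:
+
+```hcl
+provider "aws" {
+  default_tags {
+    # local.common_tags carries Project, ManagedBy, and Repository.
+    tags = merge(local.common_tags, {
+      CostCenter = "example-1234" # hypothetical custom tag
+    })
+  }
+}
+```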
-```hcl -network_mode = "PRIVATE" -vpc_id = "vpc-xxxxxxxx" -private_subnet_ids = ["subnet-xxx", "subnet-yyy"] -security_group_ids = ["sg-xxxxxxxx"] -``` +### CDK config.yaml to Terraform Variable Mapping + +Terraform uses flat variables with a `backend_` prefix to mirror the CDK's nested `config.yaml` structure: + +| CDK config.yaml path | Terraform variable | +|---|---| +| `stack_name_base` | `stack_name_base` | +| `admin_user_email` | `admin_user_email` | +| `backend.pattern` | `backend_pattern` | +| `backend.deployment_type` | `backend_deployment_type` | +| `backend.network_mode` | `backend_network_mode` | +| `backend.vpc.vpc_id` | `backend_vpc_id` | +| `backend.vpc.subnet_ids` | `backend_vpc_subnet_ids` | +| `backend.vpc.security_group_ids` | `backend_vpc_security_group_ids` | + +Values that are hardcoded in CDK (not in `config.yaml`) are defined as module-internal locals in Terraform: agent name (`StrandsAgent`), memory event expiry (30 days), callback URLs, and password minimum length. ## Module Structure @@ -147,6 +99,7 @@ infra-terraform/ ├── artifacts/ # Build artifacts (.gitignored) ├── memory.tf # AgentCore Memory + IAM ├── auth.tf # M2M resource server + machine client + ├── oauth2_provider.tf # OAuth2 provider Lambda + lifecycle management ├── gateway.tf # Gateway + Lambda tool target ├── runtime.tf # ECR/S3 + Agent Runtime (conditional) ├── zip_packager.tf # S3 + Lambda packager (zip mode only) @@ -154,7 +107,7 @@ infra-terraform/ └── ssm.tf # SSM parameters + Secrets Manager ``` -> **Note:** Feedback Lambda source code is shared from `infra-cdk/lambdas/feedback/`. The zip-packager Lambda is Terraform-specific and lives under `infra-terraform/lambdas/`. +> **Note:** Feedback and OAuth2 provider Lambda code is shared from `infra-cdk/lambdas/`. The zip-packager Lambda is Terraform-specific and lives under `infra-terraform/lambdas/`. ## Deployment Order @@ -164,68 +117,28 @@ The modules are deployed in this order: 2. 
**Cognito** - Uses Amplify URL for OAuth callback URLs 3. **Backend** - Depends on Cognito and Amplify URL; internally creates Memory, Auth, Gateway, Runtime, Feedback API, and SSM resources with correct dependency ordering -## Post-Deployment Steps - -### 1. Deploy Frontend - -Two deployment scripts are available: - -**Python (cross-platform - recommended):** -```bash -# From infra-terraform directory -python scripts/deploy-frontend.py - -# Or with options -python scripts/deploy-frontend.py --pattern langgraph-single-agent -``` - -**Shell (macOS/Linux only):** -```bash -# From infra-terraform directory -./scripts/deploy-frontend.sh - -# Or with options -./scripts/deploy-frontend.sh -p langgraph-single-agent -``` - -Both scripts perform the same operations: -- Fetch configuration from Terraform outputs -- Generate `aws-exports.json` for frontend authentication -- Build the Next.js application -- Package and upload to S3 -- Trigger Amplify deployment and monitor status - -### 2. Test the Agent (Optional) - -```bash -# From infra-terraform directory -pip install boto3 requests colorama # First time only -python scripts/test-agent.py 'Hello, what can you do?' -``` - -### 3. 
Verify Deployment - -```bash -# Get deployment summary -terraform output deployment_summary - -# Get all outputs -terraform output -``` - ## Outputs | Output | Description | |--------|-------------| | `amplify_app_url` | Frontend application URL | -| `cognito_hosted_ui_url` | Cognito login page URL | +| `amplify_app_id` | Amplify App ID | +| `amplify_staging_bucket` | S3 bucket for frontend staging deployments | +| `cognito_user_pool_id` | Cognito User Pool ID | +| `cognito_web_client_id` | Cognito Web Client ID (for frontend) | +| `cognito_machine_client_id` | Cognito Machine Client ID (for M2M authentication) | +| `cognito_domain_url` | Cognito domain URL for OAuth | +| `gateway_id` | AgentCore Gateway ID | +| `gateway_arn` | AgentCore Gateway ARN | | `gateway_url` | AgentCore Gateway URL | +| `gateway_target_id` | AgentCore Gateway Target ID | +| `tool_lambda_arn` | Sample tool Lambda function ARN | +| `runtime_id` | AgentCore Runtime ID | | `runtime_arn` | AgentCore Runtime ARN | +| `runtime_role_arn` | AgentCore Runtime execution role ARN | | `memory_arn` | AgentCore Memory ARN | | `feedback_api_url` | Feedback API endpoint | -| `ecr_repository_url` | ECR repository for agent container (docker mode) | -| `agent_code_bucket` | S3 bucket for agent code (zip mode) | -| `deployment_type` | Deployment type used (docker or zip) | +| `ssm_parameter_prefix` | SSM parameter prefix for this deployment | | `deployment_summary` | Combined summary of all resources | ## State Management @@ -270,55 +183,6 @@ See `backend.tf.example` for the full configuration. 
| SSM Parameter | `aws_ssm_parameter` | | Secrets Manager | `aws_secretsmanager_secret` | -## Troubleshooting - -### Terraform Init Fails - -Ensure you have the correct provider versions: -```bash -terraform init -upgrade -``` - -### Authentication Errors - -Verify AWS credentials: -```bash -aws sts get-caller-identity -``` - -### AgentCore Resources Not Found - -AgentCore resources require AWS provider version >= 5.82.0 with the `aws_bedrockagentcore_*` resources. - -If your provider version doesn't support these resources yet, use the AWS CLI: - -```bash -aws bedrock-agentcore create-agent-runtime --cli-input-json file://runtime-config.json -``` - -## Cleanup - -To remove all provisioned resources: - -```bash -terraform destroy -``` - -Terraform handles resource dependencies automatically and destroys in the correct order. - -**Note:** All Cognito users and their data will be permanently deleted. - -### Verify Cleanup - -After destroy completes, verify no resources remain: -```bash -aws resourcegroupstaggingapi get-resources --tag-filters Key=stack,Values= -``` - -### Cost Note - -Ensure `terraform destroy` completes successfully. Orphaned resources (especially AgentCore Runtime, DynamoDB, or API Gateway) may continue incurring charges. - ## Contributing When modifying the Terraform configuration, run `terraform fmt` and `terraform validate` before committing. 
diff --git a/infra-terraform/locals.tf b/infra-terraform/locals.tf index 68253f98..73939f29 100644 --- a/infra-terraform/locals.tf +++ b/infra-terraform/locals.tf @@ -17,16 +17,12 @@ locals { account_id = data.aws_caller_identity.current.account_id region = data.aws_region.current.id - # Common tags applied to all resources - common_tags = merge( - { - Project = var.stack_name_base - Environment = var.environment - ManagedBy = "Terraform" - Repository = "fullstack-agentcore-solution-template" - }, - var.tags - ) + # Common tags applied to all resources via provider default_tags + common_tags = { + Project = var.stack_name_base + ManagedBy = "Terraform" + Repository = "fullstack-agentcore-solution-template" + } # SSM parameter paths ssm_parameter_prefix = "/${var.stack_name_base}" @@ -42,9 +38,4 @@ locals { api_throttling_rate_limit = 100 api_throttling_burst_limit = 200 - # Callback URLs for Cognito (includes Amplify URL when available) - default_callback_urls = concat( - var.callback_urls, - [] # Amplify URL will be added dynamically via amplify_url variable - ) } diff --git a/infra-terraform/main.tf b/infra-terraform/main.tf index 702f9f44..220fe7c2 100644 --- a/infra-terraform/main.tf +++ b/infra-terraform/main.tf @@ -6,8 +6,6 @@ # ============================================================================= provider "aws" { - region = var.aws_region - default_tags { tags = local.common_tags } @@ -36,8 +34,6 @@ module "amplify_hosting" { staging_bucket_expiry_days = local.staging_bucket_expiry_days access_logs_expiry_days = local.access_logs_expiry_days - - tags = local.common_tags } # ============================================================================= @@ -52,16 +48,12 @@ module "amplify_hosting" { module "cognito" { source = "./modules/cognito" - stack_name_base = var.stack_name_base - admin_user_email = var.admin_user_email - callback_urls = local.default_callback_urls - password_minimum_length = var.password_minimum_length + stack_name_base = 
var.stack_name_base + admin_user_email = var.admin_user_email # Use the predictable Amplify URL from the app_url output amplify_url = module.amplify_hosting.app_url - tags = local.common_tags - depends_on = [module.amplify_hosting] } @@ -79,17 +71,15 @@ module "cognito" { module "backend" { source = "./modules/backend" - stack_name_base = var.stack_name_base - backend_pattern = var.backend_pattern - deployment_type = var.deployment_type - agent_name = var.agent_name - network_mode = var.network_mode - memory_event_expiry_days = var.memory_event_expiry_days + stack_name_base = var.stack_name_base + backend_pattern = var.backend_pattern + backend_deployment_type = var.backend_deployment_type + backend_network_mode = var.backend_network_mode - # VPC configuration (for PRIVATE mode) - vpc_id = var.vpc_id - private_subnet_ids = var.private_subnet_ids - security_group_ids = var.security_group_ids + # VPC configuration (for VPC mode) + backend_vpc_id = var.backend_vpc_id + backend_vpc_subnet_ids = var.backend_vpc_subnet_ids + backend_vpc_security_group_ids = var.backend_vpc_security_group_ids # Cognito configuration user_pool_id = module.cognito.user_pool_id @@ -105,7 +95,5 @@ module "backend" { throttling_rate_limit = local.api_throttling_rate_limit throttling_burst_limit = local.api_throttling_burst_limit - tags = local.common_tags - depends_on = [module.cognito, module.amplify_hosting] } diff --git a/infra-terraform/modules/amplify-hosting/main.tf b/infra-terraform/modules/amplify-hosting/main.tf index 2857d82c..914bc815 100644 --- a/infra-terraform/modules/amplify-hosting/main.tf +++ b/infra-terraform/modules/amplify-hosting/main.tf @@ -27,7 +27,7 @@ resource "aws_s3_bucket" "access_logs" { bucket_prefix = "${lower(var.stack_name_base)}-access-logs-" force_destroy = true - tags = var.tags + } resource "aws_s3_bucket_public_access_block" "access_logs" { @@ -60,7 +60,7 @@ resource "aws_s3_bucket" "staging" { bucket_prefix = "${lower(var.stack_name_base)}-staging-" 
force_destroy = true - tags = var.tags + } resource "aws_s3_bucket_versioning" "staging" { @@ -148,7 +148,7 @@ resource "aws_amplify_app" "frontend" { platform = var.platform description = "${var.stack_name_base} - React/Next.js Frontend" - tags = var.tags + } # ============================================================================= @@ -165,5 +165,5 @@ resource "aws_amplify_branch" "main" { # Enable auto-build on push (if using Git integration) enable_auto_build = false - tags = var.tags + } diff --git a/infra-terraform/modules/amplify-hosting/variables.tf b/infra-terraform/modules/amplify-hosting/variables.tf index b9f0cb62..e1dbd142 100644 --- a/infra-terraform/modules/amplify-hosting/variables.tf +++ b/infra-terraform/modules/amplify-hosting/variables.tf @@ -23,9 +23,3 @@ variable "access_logs_expiry_days" { type = number default = 90 } - -variable "tags" { - description = "Tags to apply to all resources." - type = map(string) - default = {} -} diff --git a/infra-terraform/modules/backend/feedback.tf b/infra-terraform/modules/backend/feedback.tf index 0f17a65a..d532b356 100644 --- a/infra-terraform/modules/backend/feedback.tf +++ b/infra-terraform/modules/backend/feedback.tf @@ -51,7 +51,6 @@ resource "aws_dynamodb_table" "feedback" { enabled = true } - tags = var.tags } # ----------------------------------------------------------------------------- @@ -62,7 +61,6 @@ resource "aws_cloudwatch_log_group" "feedback_lambda" { name = "/aws/lambda/${var.stack_name_base}-feedback" retention_in_days = local.log_retention_days - tags = var.tags } # ----------------------------------------------------------------------------- @@ -86,7 +84,6 @@ resource "aws_iam_role" "feedback_lambda" { assume_role_policy = data.aws_iam_policy_document.feedback_lambda_assume_role.json description = "Execution role for feedback Lambda function" - tags = var.tags } data "aws_iam_policy_document" "feedback_lambda_policy" { @@ -182,7 +179,6 @@ resource "aws_lambda_function" "feedback" { 
depends_on = [aws_cloudwatch_log_group.feedback_lambda] - tags = var.tags } # ----------------------------------------------------------------------------- @@ -197,7 +193,6 @@ resource "aws_api_gateway_rest_api" "feedback" { types = ["REGIONAL"] } - tags = var.tags } # ----------------------------------------------------------------------------- @@ -365,7 +360,6 @@ resource "aws_api_gateway_stage" "prod" { }) } - tags = var.tags depends_on = [aws_cloudwatch_log_group.api_gateway_access] } @@ -375,7 +369,6 @@ resource "aws_cloudwatch_log_group" "api_gateway_access" { name = "/aws/apigateway/${var.stack_name_base}-feedback-api/access-logs" retention_in_days = local.log_retention_days - tags = var.tags } # ----------------------------------------------------------------------------- diff --git a/infra-terraform/modules/backend/gateway.tf b/infra-terraform/modules/backend/gateway.tf index 49c55155..53d6a570 100644 --- a/infra-terraform/modules/backend/gateway.tf +++ b/infra-terraform/modules/backend/gateway.tf @@ -14,7 +14,6 @@ resource "aws_cloudwatch_log_group" "tool_lambda" { name = "/aws/lambda/${var.stack_name_base}-sample-tool" retention_in_days = local.log_retention_days - tags = var.tags } # ----------------------------------------------------------------------------- @@ -38,7 +37,6 @@ resource "aws_iam_role" "tool_lambda" { assume_role_policy = data.aws_iam_policy_document.tool_lambda_assume_role.json description = "Execution role for sample tool Lambda" - tags = var.tags } data "aws_iam_policy_document" "tool_lambda_policy" { @@ -81,7 +79,6 @@ resource "aws_lambda_function" "sample_tool" { depends_on = [aws_cloudwatch_log_group.tool_lambda] - tags = var.tags } # ----------------------------------------------------------------------------- @@ -105,7 +102,6 @@ resource "aws_iam_role" "gateway" { assume_role_policy = data.aws_iam_policy_document.gateway_assume_role.json description = "Role for AgentCore Gateway" - tags = var.tags } data "aws_iam_policy_document" 
"gateway_policy" { @@ -211,7 +207,6 @@ resource "aws_bedrockagentcore_gateway" "main" { } } - tags = var.tags depends_on = [time_sleep.gateway_iam_propagation] } diff --git a/infra-terraform/modules/backend/locals.tf b/infra-terraform/modules/backend/locals.tf index 340af47d..40f80f12 100644 --- a/infra-terraform/modules/backend/locals.tf +++ b/infra-terraform/modules/backend/locals.tf @@ -22,8 +22,11 @@ locals { # Stack name for resource naming (underscores for some AWS resources) stack_name_underscore = replace(var.stack_name_base, "-", "_") + # Agent name (matches CDK CfnParameter default in backend-stack.ts) + agent_name = "StrandsAgent" + # Runtime name (underscores required by AgentCore) - runtime_name = "${local.stack_name_underscore}_${var.agent_name}" + runtime_name = "${local.stack_name_underscore}_${local.agent_name}" # Memory name (unique within account/region) # Must match ^[a-zA-Z][a-zA-Z0-9_]{0,47}$ - no hyphens allowed @@ -36,8 +39,8 @@ locals { powertools_layer_arn = "arn:aws:lambda:${local.region}:017000801446:layer:AWSLambdaPowertoolsPythonV3-python313-arm64:18" # Deployment type flags - is_docker = var.deployment_type == "docker" - is_zip = var.deployment_type == "zip" + is_docker = var.backend_deployment_type == "docker" + is_zip = var.backend_deployment_type == "zip" # Project paths (for zip packaging) project_root = "${path.module}/../../.." 
@@ -62,4 +65,7 @@ locals { api_throttling_rate_limit = var.throttling_rate_limit api_throttling_burst_limit = var.throttling_burst_limit api_cache_ttl_seconds = 300 + + # Memory event expiry (hardcoded in CDK backend-stack.ts) + memory_event_expiry_days = 30 } diff --git a/infra-terraform/modules/backend/memory.tf b/infra-terraform/modules/backend/memory.tf index f657085a..ab9fd458 100644 --- a/infra-terraform/modules/backend/memory.tf +++ b/infra-terraform/modules/backend/memory.tf @@ -25,8 +25,6 @@ resource "aws_iam_role" "memory_execution" { name = "${var.stack_name_base}-memory-execution-role" assume_role_policy = data.aws_iam_policy_document.memory_assume_role.json description = "Execution role for AgentCore Memory" - - tags = var.tags } # Attach the AWS managed policy for Bedrock model inference @@ -40,17 +38,13 @@ resource "aws_iam_role_policy_attachment" "memory_bedrock_policy" { # Configured with short-term memory (conversation history) as default resource "aws_bedrockagentcore_memory" "main" { name = local.memory_name - event_expiry_duration = var.memory_event_expiry_days + event_expiry_duration = local.memory_event_expiry_days description = "Short-term memory for ${var.stack_name_base} agent" # Memory execution role for model processing (required for long-term strategies) memory_execution_role_arn = aws_iam_role.memory_execution.arn - tags = merge( - var.tags, - { - Name = "${var.stack_name_base}_Memory" - ManagedBy = "Terraform" - } - ) + tags = { + Name = "${var.stack_name_base}_Memory" + } } diff --git a/infra-terraform/modules/backend/oauth2_provider.tf b/infra-terraform/modules/backend/oauth2_provider.tf new file mode 100644 index 00000000..a5bc5667 --- /dev/null +++ b/infra-terraform/modules/backend/oauth2_provider.tf @@ -0,0 +1,253 @@ +# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
+# SPDX-License-Identifier: Apache-2.0 + +# ============================================================================= +# OAuth2 Credential Provider +# Maps to: backend-stack.ts createOAuth2CredentialProvider() +# ============================================================================= +# Creates Lambda function that manages OAuth2 Credential Provider lifecycle +# for AgentCore Runtime to authenticate with AgentCore Gateway. +# Uses CloudFormation Custom Resource pattern via null_resource invocation. +# +# Background: +# AgentCore doesn't have a native Terraform/CloudFormation resource for OAuth2 +# Credential Provider yet. This Lambda calls the bedrock-agentcore-control API +# directly to create/update/delete the provider. The Custom Resource pattern +# is used to avoid logging sensitive credentials in CloudWatch (client secret +# is read from Secrets Manager at runtime). + +# ----------------------------------------------------------------------------- +# CloudWatch Log Group for OAuth2 Provider Lambda +# ----------------------------------------------------------------------------- + +resource "aws_cloudwatch_log_group" "oauth2_provider" { + name = "/aws/lambda/${var.stack_name_base}-oauth2-provider" + retention_in_days = 7 + +} + +# ----------------------------------------------------------------------------- +# IAM Role for OAuth2 Provider Lambda +# ----------------------------------------------------------------------------- + +data "aws_iam_policy_document" "oauth2_provider_assume_role" { + statement { + effect = "Allow" + actions = ["sts:AssumeRole"] + + principals { + type = "Service" + identifiers = ["lambda.amazonaws.com"] + } + } +} + +resource "aws_iam_role" "oauth2_provider" { + name = "${var.stack_name_base}-oauth2-provider-role" + assume_role_policy = data.aws_iam_policy_document.oauth2_provider_assume_role.json + +} + +# IAM Policy for OAuth2 Provider Lambda +data "aws_iam_policy_document" "oauth2_provider_policy" { + # CloudWatch Logs + 
statement { + sid = "CloudWatchLogsAccess" + effect = "Allow" + actions = [ + "logs:CreateLogStream", + "logs:PutLogEvents" + ] + resources = ["${aws_cloudwatch_log_group.oauth2_provider.arn}:*"] + } + + # Read Machine Client Secret + # Lambda needs to read the machine client secret to register OAuth2 provider + statement { + sid = "ReadMachineClientSecret" + effect = "Allow" + actions = ["secretsmanager:GetSecretValue"] + resources = [aws_secretsmanager_secret.machine_client_secret.arn] + } + + # OAuth2 Credential Provider Operations + # Note: Need both vault-level and nested resource permissions because: + # - CreateOauth2CredentialProvider checks permission on vault itself (token-vault/default) + # - Also checks permission on the nested resource path (token-vault/default/oauth2credentialprovider/*) + statement { + sid = "OAuth2CredentialProviderOperations" + effect = "Allow" + actions = [ + "bedrock-agentcore:CreateOauth2CredentialProvider", + "bedrock-agentcore:GetOauth2CredentialProvider", + "bedrock-agentcore:UpdateOauth2CredentialProvider", + "bedrock-agentcore:DeleteOauth2CredentialProvider" + ] + resources = [ + "arn:aws:bedrock-agentcore:${local.region}:${local.account_id}:token-vault/default", + "arn:aws:bedrock-agentcore:${local.region}:${local.account_id}:token-vault/default/oauth2credentialprovider/*" + ] + } + + # Token Vault Operations + # Note: Need both exact match (default) and wildcard (default/*) because: + # - AWS checks permission on the vault container itself (token-vault/default) + # - AWS also checks permission on resources inside (token-vault/default/*) + statement { + sid = "TokenVaultOperations" + effect = "Allow" + actions = [ + "bedrock-agentcore:CreateTokenVault", + "bedrock-agentcore:GetTokenVault", + "bedrock-agentcore:DeleteTokenVault" + ] + resources = [ + "arn:aws:bedrock-agentcore:${local.region}:${local.account_id}:token-vault/default", + "arn:aws:bedrock-agentcore:${local.region}:${local.account_id}:token-vault/default/*" + ] 
+ } + + # Token Vault Secret Management + # Lambda creates secrets in AgentCore Identity namespace for Token Vault + statement { + sid = "TokenVaultSecretManagement" + effect = "Allow" + actions = [ + "secretsmanager:CreateSecret", + "secretsmanager:DeleteSecret", + "secretsmanager:DescribeSecret", + "secretsmanager:PutSecretValue" + ] + resources = [ + "arn:aws:secretsmanager:${local.region}:${local.account_id}:secret:bedrock-agentcore-identity!default/oauth2/*" + ] + } +} + +resource "aws_iam_role_policy" "oauth2_provider" { + name = "${var.stack_name_base}-oauth2-provider-policy" + role = aws_iam_role.oauth2_provider.id + policy = data.aws_iam_policy_document.oauth2_provider_policy.json +} + +# ----------------------------------------------------------------------------- +# Lambda Function for OAuth2 Provider Lifecycle +# ----------------------------------------------------------------------------- + +# Package the Lambda code +data "archive_file" "oauth2_provider" { + type = "zip" + source_file = "${path.module}/../../../infra-cdk/lambdas/oauth2-provider/index.py" + output_path = "${path.module}/artifacts/oauth2-provider.zip" +} + +resource "aws_lambda_function" "oauth2_provider" { + filename = data.archive_file.oauth2_provider.output_path + function_name = "${var.stack_name_base}-oauth2-provider" + role = aws_iam_role.oauth2_provider.arn + handler = "index.handler" + source_code_hash = data.archive_file.oauth2_provider.output_base64sha256 + runtime = "python3.13" + timeout = 300 # 5 minutes + + + depends_on = [ + aws_cloudwatch_log_group.oauth2_provider, + aws_iam_role_policy.oauth2_provider + ] +} + +# ----------------------------------------------------------------------------- +# Custom Resource Invocation via null_resource +# Simulates CloudFormation Custom Resource by invoking Lambda directly +# ----------------------------------------------------------------------------- + +resource "null_resource" "invoke_oauth2_provider" { + # Recreate when any of 
these values change + triggers = { + provider_name = "${var.stack_name_base}-runtime-gateway-auth" + client_id = aws_cognito_user_pool_client.machine.id + client_secret = aws_secretsmanager_secret_version.machine_client_secret.version_id + discovery_url = local.oidc_discovery_url + function_name = aws_lambda_function.oauth2_provider.function_name + region = local.region + } + + provisioner "local-exec" { + interpreter = ["bash", "-c"] + command = <<-EOT + set -e + + # Build the CloudFormation Custom Resource payload + PAYLOAD=$(cat <<'PAYLOAD_EOF' +{ + "RequestType": "Create", + "ResourceProperties": { + "ProviderName": "${self.triggers.provider_name}", + "ClientSecretArn": "${aws_secretsmanager_secret.machine_client_secret.arn}", + "DiscoveryUrl": "${self.triggers.discovery_url}", + "ClientId": "${self.triggers.client_id}" + } +} +PAYLOAD_EOF +) + + echo "Invoking OAuth2 provider Lambda: ${self.triggers.function_name}" + + # Invoke Lambda (--cli-binary-format raw-in-base64-out ensures JSON payload is accepted) + aws lambda invoke \ + --function-name ${self.triggers.function_name} \ + --cli-binary-format raw-in-base64-out \ + --payload "$PAYLOAD" \ + --region ${self.triggers.region} \ + /tmp/oauth2_provider_response.json + + # Check for errors + if grep -q "FunctionError" /tmp/oauth2_provider_response.json; then + echo "ERROR: OAuth2 provider creation failed" + cat /tmp/oauth2_provider_response.json + exit 1 + fi + + echo "OAuth2 provider created successfully" + cat /tmp/oauth2_provider_response.json + EOT + } + + provisioner "local-exec" { + when = destroy + interpreter = ["bash", "-c"] + command = <<-EOT + # Build the CloudFormation Custom Resource Delete payload + PAYLOAD=$(cat <<'PAYLOAD_EOF' +{ + "RequestType": "Delete", + "PhysicalResourceId": "${self.triggers.provider_name}", + "ResourceProperties": { + "ProviderName": "${self.triggers.provider_name}" + } +} +PAYLOAD_EOF +) + + echo "Deleting OAuth2 provider: ${self.triggers.provider_name}" + + # Invoke 
Lambda for deletion (ignore errors if already deleted) + aws lambda invoke \ + --function-name ${self.triggers.function_name} \ + --cli-binary-format raw-in-base64-out \ + --payload "$PAYLOAD" \ + --region ${self.triggers.region} \ + /tmp/oauth2_provider_delete.json || true + + echo "OAuth2 provider deletion completed" + cat /tmp/oauth2_provider_delete.json || true + EOT + } + + depends_on = [ + aws_lambda_function.oauth2_provider, + aws_cognito_user_pool_client.machine, + aws_secretsmanager_secret_version.machine_client_secret + ] +} diff --git a/infra-terraform/modules/backend/outputs.tf b/infra-terraform/modules/backend/outputs.tf index fab62e66..d8396813 100644 --- a/infra-terraform/modules/backend/outputs.tf +++ b/infra-terraform/modules/backend/outputs.tf @@ -5,11 +5,6 @@ # Memory Outputs # ============================================================================= -output "memory_id" { - description = "AgentCore Memory ID" - value = aws_bedrockagentcore_memory.main.id -} - output "memory_arn" { description = "AgentCore Memory ARN" value = aws_bedrockagentcore_memory.main.arn @@ -63,26 +58,6 @@ output "runtime_role_arn" { value = aws_iam_role.runtime.arn } -output "ecr_repository_url" { - description = "ECR repository URL for agent container (docker mode only)" - value = local.is_docker && var.container_uri == null ? aws_ecr_repository.agent[0].repository_url : null -} - -output "agent_code_bucket" { - description = "S3 bucket for agent code packages (zip mode only)" - value = local.is_zip ? aws_s3_bucket.agent_code[0].id : null -} - -output "agent_code_key" { - description = "S3 object key for agent deployment package (zip mode only)" - value = local.is_zip ? 
"deployment_package.zip" : null -} - -output "deployment_type" { - description = "Deployment type used (docker or zip)" - value = var.deployment_type -} - # ============================================================================= # Feedback API Outputs # ============================================================================= @@ -92,21 +67,6 @@ output "feedback_api_url" { value = "${aws_api_gateway_stage.prod.invoke_url}/feedback" } -output "feedback_api_id" { - description = "Feedback API Gateway REST API ID" - value = aws_api_gateway_rest_api.feedback.id -} - -output "feedback_table_name" { - description = "Feedback DynamoDB table name" - value = aws_dynamodb_table.feedback.name -} - -output "feedback_lambda_arn" { - description = "Feedback Lambda function ARN" - value = aws_lambda_function.feedback.arn -} - # ============================================================================= # Machine Client Outputs # ============================================================================= diff --git a/infra-terraform/modules/backend/runtime.tf b/infra-terraform/modules/backend/runtime.tf index 5f19b4db..20560851 100644 --- a/infra-terraform/modules/backend/runtime.tf +++ b/infra-terraform/modules/backend/runtime.tf @@ -25,8 +25,6 @@ resource "aws_ecr_repository" "agent" { encryption_configuration { encryption_type = "AES256" } - - tags = var.tags } # ECR Lifecycle policy to keep only recent images @@ -146,8 +144,6 @@ resource "aws_iam_role" "runtime" { name = "${var.stack_name_base}-agentcore-runtime-role" assume_role_policy = data.aws_iam_policy_document.runtime_assume_role.json description = "Execution role for AgentCore Runtime" - - tags = var.tags } # ----------------------------------------------------------------------------- @@ -155,7 +151,7 @@ resource "aws_iam_role" "runtime" { # ----------------------------------------------------------------------------- data "aws_iam_policy_document" "runtime_policy" { - # 1. 
ECRImageAccess (docker mode only) + # ECRImageAccess (docker mode only) dynamic "statement" { for_each = local.is_docker ? [1] : [] content { @@ -170,7 +166,7 @@ data "aws_iam_policy_document" "runtime_policy" { } } - # 2. ECRTokenAccess (docker mode only) + # ECRTokenAccess (docker mode only) dynamic "statement" { for_each = local.is_docker ? [1] : [] content { @@ -198,7 +194,7 @@ data "aws_iam_policy_document" "runtime_policy" { } } - # 3. CloudWatchLogsGroupAccess + # CloudWatchLogsGroupAccess statement { sid = "CloudWatchLogsGroupAccess" effect = "Allow" @@ -209,7 +205,7 @@ data "aws_iam_policy_document" "runtime_policy" { resources = ["arn:aws:logs:${local.region}:${local.account_id}:log-group:/aws/bedrock-agentcore/runtimes/*"] } - # 4. CloudWatchLogsDescribeGroups + # CloudWatchLogsDescribeGroups statement { sid = "CloudWatchLogsDescribeGroups" effect = "Allow" @@ -217,7 +213,7 @@ data "aws_iam_policy_document" "runtime_policy" { resources = ["arn:aws:logs:${local.region}:${local.account_id}:log-group:*"] } - # 5. CloudWatchLogsStreamAccess + # CloudWatchLogsStreamAccess statement { sid = "CloudWatchLogsStreamAccess" effect = "Allow" @@ -228,7 +224,7 @@ data "aws_iam_policy_document" "runtime_policy" { resources = ["arn:aws:logs:${local.region}:${local.account_id}:log-group:/aws/bedrock-agentcore/runtimes/*:log-stream:*"] } - # 6. X-Ray Tracing + # X-Ray Tracing statement { sid = "XRayTracing" effect = "Allow" @@ -241,7 +237,7 @@ data "aws_iam_policy_document" "runtime_policy" { resources = ["*"] } - # 7. CloudWatch Metrics + # CloudWatch Metrics statement { sid = "CloudWatchMetrics" effect = "Allow" @@ -255,7 +251,7 @@ data "aws_iam_policy_document" "runtime_policy" { } } - # 8. GetAgentAccessToken + # GetAgentAccessToken statement { sid = "GetAgentAccessToken" effect = "Allow" @@ -270,7 +266,7 @@ data "aws_iam_policy_document" "runtime_policy" { ] } - # 9. 
BedrockModelInvocation + # BedrockModelInvocation statement { sid = "BedrockModelInvocation" effect = "Allow" @@ -284,15 +280,19 @@ data "aws_iam_policy_document" "runtime_policy" { ] } - # 10. SecretsManagerAccess + # SecretsManagerOAuth2Access + # Runtime needs to read OAuth2 credentials from Token Vault secret + # created by AgentCore Identity (not the machine client secret directly) statement { - sid = "SecretsManagerAccess" - effect = "Allow" - actions = ["secretsmanager:GetSecretValue"] - resources = ["arn:aws:secretsmanager:${local.region}:${local.account_id}:secret:/*/machine_client_secret*"] + sid = "SecretsManagerOAuth2Access" + effect = "Allow" + actions = ["secretsmanager:GetSecretValue"] + resources = [ + "arn:aws:secretsmanager:${local.region}:${local.account_id}:secret:bedrock-agentcore-identity!default/oauth2/${var.stack_name_base}-runtime-gateway-auth*" + ] } - # 11. MemoryResourceAccess - references memory resource directly (no variable passing) + # MemoryResourceAccess - references memory resource directly (no variable passing) statement { sid = "MemoryResourceAccess" effect = "Allow" @@ -305,7 +305,7 @@ data "aws_iam_policy_document" "runtime_policy" { resources = [aws_bedrockagentcore_memory.main.arn] } - # 12. SSMParameterAccess + # SSMParameterAccess statement { sid = "SSMParameterAccess" effect = "Allow" @@ -316,7 +316,7 @@ data "aws_iam_policy_document" "runtime_policy" { resources = ["arn:aws:ssm:${local.region}:${local.account_id}:parameter/${var.stack_name_base}/*"] } - # 13. 
CodeInterpreterAccess + # CodeInterpreterAccess statement { sid = "CodeInterpreterAccess" effect = "Allow" @@ -327,6 +327,24 @@ data "aws_iam_policy_document" "runtime_policy" { ] resources = ["arn:aws:bedrock-agentcore:${local.region}:aws:code-interpreter/*"] } + + # OAuth2CredentialProviderAccess + # The @requires_access_token decorator performs a two-stage process: + # GetOauth2CredentialProvider - Looks up provider metadata + # GetResourceOauth2Token - Fetches the actual access token from Token Vault + statement { + sid = "OAuth2CredentialProviderAccess" + effect = "Allow" + actions = [ + "bedrock-agentcore:GetOauth2CredentialProvider", + "bedrock-agentcore:GetResourceOauth2Token" + ] + resources = [ + "arn:aws:bedrock-agentcore:${local.region}:${local.account_id}:oauth2-credential-provider/*", + "arn:aws:bedrock-agentcore:${local.region}:${local.account_id}:token-vault/*", + "arn:aws:bedrock-agentcore:${local.region}:${local.account_id}:workload-identity-directory/*" + ] + } } resource "aws_iam_role_policy" "runtime" { @@ -335,6 +353,53 @@ resource "aws_iam_role_policy" "runtime" { policy = data.aws_iam_policy_document.runtime_policy.json } +# ----------------------------------------------------------------------------- +# Default Security Group (for VPC mode, when none provided) +# ----------------------------------------------------------------------------- + +locals { + # Use user-provided security groups, or fall back to the auto-created default + effective_security_group_ids = ( + var.backend_network_mode == "VPC" && length(var.backend_vpc_security_group_ids) == 0 + ? [aws_security_group.runtime_default[0].id] + : var.backend_vpc_security_group_ids + ) +} + +resource "aws_security_group" "runtime_default" { + count = var.backend_network_mode == "VPC" && length(var.backend_vpc_security_group_ids) == 0 ? 
1 : 0 + + name = "${var.stack_name_base}-agentcore-runtime-sg" + description = "Default security group for AgentCore Runtime VPC deployment" + vpc_id = var.backend_vpc_id + + tags = { + Name = "${var.stack_name_base}-agentcore-runtime-sg" + } +} + +# Self-referencing ingress rule: allows HTTPS traffic between runtime and VPC endpoints +resource "aws_vpc_security_group_ingress_rule" "runtime_default_https" { + count = var.backend_network_mode == "VPC" && length(var.backend_vpc_security_group_ids) == 0 ? 1 : 0 + + security_group_id = aws_security_group.runtime_default[0].id + referenced_security_group_id = aws_security_group.runtime_default[0].id + from_port = 443 + to_port = 443 + ip_protocol = "tcp" + description = "Allow HTTPS from self (VPC endpoint access)" +} + +# Egress rule: allow all outbound traffic (matches CDK allowAllOutbound: true) +resource "aws_vpc_security_group_egress_rule" "runtime_default_all" { + count = var.backend_network_mode == "VPC" && length(var.backend_vpc_security_group_ids) == 0 ? 1 : 0 + + security_group_id = aws_security_group.runtime_default[0].id + cidr_ipv4 = "0.0.0.0/0" + ip_protocol = "-1" + description = "Allow all outbound traffic" +} + # ----------------------------------------------------------------------------- # AgentCore Runtime # ----------------------------------------------------------------------------- @@ -371,14 +436,18 @@ resource "aws_bedrockagentcore_agent_runtime" "main" { } # Network configuration + # PUBLIC: Runtime is accessible over the public internet (default). + # VPC: Runtime is deployed into a user-provided VPC for private network isolation. + # The user must ensure their VPC has the necessary VPC endpoints for AWS services. + # See docs/DEPLOYMENT.md for the full list of required VPC endpoints. 
network_configuration { - network_mode = var.network_mode + network_mode = var.backend_network_mode dynamic "network_mode_config" { - for_each = var.network_mode == "PRIVATE" && length(var.private_subnet_ids) > 0 ? [1] : [] + for_each = var.backend_network_mode == "VPC" ? [1] : [] content { - subnets = var.private_subnet_ids - security_groups = var.security_group_ids + subnets = var.backend_vpc_subnet_ids + security_groups = local.effective_security_group_ids } } } @@ -386,8 +455,8 @@ resource "aws_bedrockagentcore_agent_runtime" "main" { # JWT authorizer configuration (Cognito) authorizer_configuration { custom_jwt_authorizer { - discovery_url = local.oidc_discovery_url - allowed_audience = [var.web_client_id] + discovery_url = local.oidc_discovery_url + allowed_clients = [var.web_client_id] } } @@ -403,16 +472,23 @@ resource "aws_bedrockagentcore_agent_runtime" "main" { # Environment variables for the runtime environment_variables = { - AWS_REGION = local.region - AWS_DEFAULT_REGION = local.region - MEMORY_ID = aws_bedrockagentcore_memory.main.id - STACK_NAME = var.stack_name_base + AWS_REGION = local.region + AWS_DEFAULT_REGION = local.region + MEMORY_ID = aws_bedrockagentcore_memory.main.id + STACK_NAME = var.stack_name_base + GATEWAY_CREDENTIAL_PROVIDER_NAME = "${var.stack_name_base}-runtime-gateway-auth" } - tags = var.tags - # Force runtime replacement when agent code changes (zip or docker) lifecycle { + precondition { + condition = var.backend_network_mode != "VPC" || (var.backend_vpc_id != null && var.backend_vpc_id != "") + error_message = "backend_vpc_id is required when backend_network_mode is 'VPC'." + } + precondition { + condition = var.backend_network_mode != "VPC" || length(var.backend_vpc_subnet_ids) > 0 + error_message = "backend_vpc_subnet_ids must contain at least one subnet ID when backend_network_mode is 'VPC'." 
+ } replace_triggered_by = [ terraform_data.agent_code_hash, terraform_data.docker_image_hash, @@ -422,6 +498,7 @@ resource "aws_bedrockagentcore_agent_runtime" "main" { depends_on = [ aws_iam_role_policy.runtime, null_resource.invoke_zip_packager, - null_resource.docker_build_push + null_resource.docker_build_push, + null_resource.invoke_oauth2_provider # Ensure provider is registered before Runtime starts ] } diff --git a/infra-terraform/modules/backend/ssm.tf b/infra-terraform/modules/backend/ssm.tf index 96d1e025..00587ea6 100644 --- a/infra-terraform/modules/backend/ssm.tf +++ b/infra-terraform/modules/backend/ssm.tf @@ -17,7 +17,6 @@ resource "aws_ssm_parameter" "runtime_arn" { type = "String" value = aws_bedrockagentcore_agent_runtime.main.agent_runtime_arn - tags = var.tags } resource "aws_ssm_parameter" "cognito_user_pool_id" { @@ -26,7 +25,6 @@ resource "aws_ssm_parameter" "cognito_user_pool_id" { type = "String" value = var.user_pool_id - tags = var.tags } resource "aws_ssm_parameter" "cognito_user_pool_client_id" { @@ -35,7 +33,6 @@ resource "aws_ssm_parameter" "cognito_user_pool_client_id" { type = "String" value = var.web_client_id - tags = var.tags } resource "aws_ssm_parameter" "machine_client_id" { @@ -44,7 +41,6 @@ resource "aws_ssm_parameter" "machine_client_id" { type = "String" value = aws_cognito_user_pool_client.machine.id - tags = var.tags } resource "aws_ssm_parameter" "cognito_provider" { @@ -53,7 +49,6 @@ resource "aws_ssm_parameter" "cognito_provider" { type = "String" value = var.cognito_domain_url - tags = var.tags } resource "aws_ssm_parameter" "feedback_api_url" { @@ -62,7 +57,6 @@ resource "aws_ssm_parameter" "feedback_api_url" { type = "String" value = "${aws_api_gateway_stage.prod.invoke_url}/feedback" - tags = var.tags } resource "aws_ssm_parameter" "gateway_url" { @@ -71,7 +65,6 @@ resource "aws_ssm_parameter" "gateway_url" { type = "String" value = aws_bedrockagentcore_gateway.main.gateway_url - tags = var.tags } # Agent Code 
Bucket (zip mode only) - matches CDK's AgentCodeBucketNameParam @@ -83,7 +76,6 @@ resource "aws_ssm_parameter" "agent_code_bucket" { type = "String" value = aws_s3_bucket.agent_code[0].id - tags = var.tags } # ----------------------------------------------------------------------------- @@ -95,7 +87,6 @@ resource "aws_secretsmanager_secret" "machine_client_secret" { name = "${local.ssm_parameter_prefix}/machine_client_secret" description = "Machine Client Secret for M2M authentication" - tags = var.tags } resource "aws_secretsmanager_secret_version" "machine_client_secret" { diff --git a/infra-terraform/modules/backend/variables.tf b/infra-terraform/modules/backend/variables.tf index f77acf0f..f0b75382 100644 --- a/infra-terraform/modules/backend/variables.tf +++ b/infra-terraform/modules/backend/variables.tf @@ -16,54 +16,37 @@ variable "backend_pattern" { default = "strands-single-agent" } -variable "deployment_type" { +variable "backend_deployment_type" { description = "Deployment type: 'docker' (container via ECR) or 'zip' (Python package via S3)." type = string default = "docker" } -variable "agent_name" { - description = "Name for the agent runtime." - type = string - default = "StrandsAgent" -} - -variable "network_mode" { - description = "Network mode for AgentCore resources (PUBLIC or PRIVATE)." +variable "backend_network_mode" { + description = "Network mode for AgentCore Runtime (PUBLIC or VPC)." type = string default = "PUBLIC" } -variable "memory_event_expiry_days" { - description = "Number of days after which memory events expire." - type = number - default = 30 -} - -variable "environment" { - description = "Environment name for tagging." 
- type = string - default = "dev" -} # ============================================================================= -# VPC Configuration (Required if network_mode = PRIVATE) +# VPC Configuration (Required if backend_network_mode = VPC) # ============================================================================= -variable "vpc_id" { - description = "VPC ID for private network mode." +variable "backend_vpc_id" { + description = "VPC ID for VPC network mode. Required when backend_network_mode is 'VPC'." type = string default = null } -variable "private_subnet_ids" { - description = "List of private subnet IDs for private network mode." +variable "backend_vpc_subnet_ids" { + description = "List of subnet IDs for VPC network mode. Required when backend_network_mode is 'VPC'." type = list(string) default = [] } -variable "security_group_ids" { - description = "List of security group IDs for private network mode." +variable "backend_vpc_security_group_ids" { + description = "List of security group IDs for VPC network mode. Optional when backend_network_mode is 'VPC'. If omitted, a default security group is created." type = list(string) default = [] } @@ -129,8 +112,3 @@ variable "throttling_burst_limit" { default = 200 } -variable "tags" { - description = "Tags to apply to all resources." 
- type = map(string) - default = {} -} diff --git a/infra-terraform/modules/backend/zip_packager.tf b/infra-terraform/modules/backend/zip_packager.tf index 19abfe95..07b70dbd 100644 --- a/infra-terraform/modules/backend/zip_packager.tf +++ b/infra-terraform/modules/backend/zip_packager.tf @@ -21,7 +21,6 @@ resource "aws_s3_bucket" "agent_code" { bucket = "${var.stack_name_base}-agent-code-${local.account_id}" force_destroy = true - tags = var.tags } resource "aws_s3_bucket_versioning" "agent_code" { @@ -81,7 +80,6 @@ resource "aws_iam_role" "zip_packager" { name = "${var.stack_name_base}-zip-packager-role" assume_role_policy = data.aws_iam_policy_document.zip_packager_assume_role[0].json - tags = var.tags } data "aws_iam_policy_document" "zip_packager_policy" { @@ -131,7 +129,6 @@ resource "aws_cloudwatch_log_group" "zip_packager" { name = "/aws/lambda/${var.stack_name_base}-zip-packager" retention_in_days = local.log_retention_days - tags = var.tags } # ----------------------------------------------------------------------------- @@ -164,7 +161,6 @@ resource "aws_lambda_function" "zip_packager" { size = 2048 } - tags = var.tags depends_on = [ aws_cloudwatch_log_group.zip_packager[0], diff --git a/infra-terraform/modules/cognito/main.tf b/infra-terraform/modules/cognito/main.tf index a8a909bc..9a17322f 100644 --- a/infra-terraform/modules/cognito/main.tf +++ b/infra-terraform/modules/cognito/main.tf @@ -19,8 +19,14 @@ locals { # Cognito domain prefix (must be globally unique and lowercase) domain_prefix = "${lower(replace(var.stack_name_base, "_", "-"))}-${local.account_id}-${local.region}" + # Callback URLs (hardcoded to match CDK cognito-stack.ts defaults) + default_callback_urls = ["http://localhost:3000", "https://localhost:3000"] + # Combine callback URLs with Amplify URL if provided - all_callback_urls = var.amplify_url != null ? concat(var.callback_urls, [var.amplify_url]) : var.callback_urls + all_callback_urls = var.amplify_url != null ? 
concat(local.default_callback_urls, [var.amplify_url]) : local.default_callback_urls + + # Password minimum length (hardcoded to match CDK cognito-stack.ts) + password_minimum_length = 8 # User invitation email template invitation_email_subject = "Welcome to ${var.stack_name_base}!" @@ -70,7 +76,7 @@ resource "aws_cognito_user_pool" "main" { # Password policy password_policy { - minimum_length = var.password_minimum_length + minimum_length = local.password_minimum_length require_lowercase = true require_uppercase = true require_numbers = true @@ -99,8 +105,6 @@ resource "aws_cognito_user_pool" "main" { # Allow deletion (no protection) deletion_protection = "INACTIVE" - - tags = var.tags } # ============================================================================= diff --git a/infra-terraform/modules/cognito/outputs.tf b/infra-terraform/modules/cognito/outputs.tf index a1438a04..029f2da2 100644 --- a/infra-terraform/modules/cognito/outputs.tf +++ b/infra-terraform/modules/cognito/outputs.tf @@ -43,11 +43,6 @@ output "cognito_domain_url" { value = "${aws_cognito_user_pool_domain.main.domain}.auth.${local.region}.amazoncognito.com" } -output "hosted_ui_url" { - description = "Cognito hosted UI login URL" - value = "https://${aws_cognito_user_pool_domain.main.domain}.auth.${local.region}.amazoncognito.com/login?client_id=${aws_cognito_user_pool_client.web.id}&response_type=code&redirect_uri=${urlencode(local.all_callback_urls[0])}" -} - # ============================================================================= # OIDC Configuration Outputs # ============================================================================= diff --git a/infra-terraform/modules/cognito/variables.tf b/infra-terraform/modules/cognito/variables.tf index edb649b1..31189e18 100644 --- a/infra-terraform/modules/cognito/variables.tf +++ b/infra-terraform/modules/cognito/variables.tf @@ -12,26 +12,8 @@ variable "admin_user_email" { default = null } -variable "callback_urls" { - description = 
"OAuth callback URLs for Cognito." - type = list(string) - default = ["http://localhost:3000", "https://localhost:3000"] -} - variable "amplify_url" { description = "Amplify app URL to add to callback URLs." type = string default = null } - -variable "password_minimum_length" { - description = "Minimum password length for Cognito User Pool." - type = number - default = 8 -} - -variable "tags" { - description = "Tags to apply to all resources." - type = map(string) - default = {} -} diff --git a/infra-terraform/outputs.tf b/infra-terraform/outputs.tf index 15a16a46..1ea240ce 100644 --- a/infra-terraform/outputs.tf +++ b/infra-terraform/outputs.tf @@ -10,11 +10,6 @@ output "cognito_user_pool_id" { value = module.cognito.user_pool_id } -output "cognito_user_pool_arn" { - description = "Cognito User Pool ARN" - value = module.cognito.user_pool_arn -} - output "cognito_web_client_id" { description = "Cognito Web Client ID (for frontend)" value = module.cognito.web_client_id @@ -30,11 +25,6 @@ output "cognito_domain_url" { value = module.cognito.cognito_domain_url } -output "cognito_hosted_ui_url" { - description = "Cognito hosted UI login URL" - value = module.cognito.hosted_ui_url -} - # ============================================================================= # Amplify Outputs # ============================================================================= @@ -58,11 +48,6 @@ output "amplify_staging_bucket" { # AgentCore Memory Outputs # ============================================================================= -output "memory_id" { - description = "AgentCore Memory ID" - value = module.backend.memory_id -} - output "memory_arn" { description = "AgentCore Memory ARN" value = module.backend.memory_arn @@ -116,21 +101,6 @@ output "runtime_role_arn" { value = module.backend.runtime_role_arn } -output "ecr_repository_url" { - description = "ECR repository URL for agent container (docker mode only)" - value = module.backend.ecr_repository_url -} - -output 
"agent_code_bucket" { - description = "S3 bucket for agent code packages (zip mode only)" - value = module.backend.agent_code_bucket -} - -output "deployment_type" { - description = "Deployment type used (docker or zip)" - value = module.backend.deployment_type -} - # ============================================================================= # Feedback API Outputs # ============================================================================= @@ -140,21 +110,6 @@ output "feedback_api_url" { value = module.backend.feedback_api_url } -output "feedback_api_id" { - description = "Feedback API Gateway ID" - value = module.backend.feedback_api_id -} - -output "feedback_table_name" { - description = "Feedback DynamoDB table name" - value = module.backend.feedback_table_name -} - -output "feedback_lambda_arn" { - description = "Feedback Lambda function ARN" - value = module.backend.feedback_lambda_arn -} - # ============================================================================= # SSM Parameter Paths (for reference) # ============================================================================= @@ -174,11 +129,9 @@ output "deployment_summary" { stack_name = var.stack_name_base region = local.region account_id = local.account_id - environment = var.environment - deployment_type = var.deployment_type + deployment_type = var.backend_deployment_type frontend_url = module.amplify_hosting.app_url gateway_url = module.backend.gateway_url api_url = module.backend.feedback_api_url - cognito_login = module.cognito.hosted_ui_url } } diff --git a/infra-terraform/scripts/build-and-push-image.sh b/infra-terraform/scripts/build-and-push-image.sh index 3c193ec4..f4d7a8c1 100755 --- a/infra-terraform/scripts/build-and-push-image.sh +++ b/infra-terraform/scripts/build-and-push-image.sh @@ -92,20 +92,18 @@ if [[ -z "$STACK_NAME" || -z "$REGION" ]]; then STACK_NAME=$(grep -E '^stack_name_base\s*=' "$TFVARS_FILE" | awk -F'"' '{print $2}') fi - if [[ -z "$REGION" ]]; then - 
REGION=$(grep -E '^aws_region\s*=' "$TFVARS_FILE" | awk -F'"' '{print $2}') - fi + # Region is resolved from AWS_REGION env var or AWS CLI profile (not in tfvars) fi fi # Check deployment type - this script is for docker mode only TFVARS_FILE="$TERRAFORM_DIR/terraform.tfvars" if [[ -f "$TFVARS_FILE" ]]; then - DEPLOYMENT_TYPE=$(grep -E '^deployment_type\s*=' "$TFVARS_FILE" | awk -F'"' '{print $2}') + DEPLOYMENT_TYPE=$(grep -E '^backend_deployment_type\s*=' "$TFVARS_FILE" | awk -F'"' '{print $2}') if [[ "$DEPLOYMENT_TYPE" == "zip" ]]; then - echo -e "${YELLOW}========================================${NC}" - echo -e "${YELLOW} deployment_type is set to 'zip' ${NC}" - echo -e "${YELLOW}========================================${NC}" + echo -e "${YELLOW}===========================================${NC}" + echo -e "${YELLOW} backend_deployment_type is set to 'zip' ${NC}" + echo -e "${YELLOW}===========================================${NC}" echo "" echo -e "This script is only needed for ${GREEN}docker${NC} deployment mode." echo -e "With ${GREEN}zip${NC} mode, agent code is packaged automatically during ${BLUE}terraform apply${NC}." @@ -114,8 +112,10 @@ if [[ -f "$TFVARS_FILE" ]]; then fi fi -# Set defaults if still empty -REGION="${REGION:-us-east-1}" +# Resolve region: CLI flag > AWS_REGION env > AWS_DEFAULT_REGION env > AWS CLI config +if [[ -z "$REGION" ]]; then + REGION="${AWS_REGION:-${AWS_DEFAULT_REGION:-$(aws configure get region 2>/dev/null || echo "")}}" +fi # Validate required values if [[ -z "$STACK_NAME" ]]; then @@ -123,6 +123,11 @@ if [[ -z "$STACK_NAME" ]]; then exit 1 fi +if [[ -z "$REGION" ]]; then + echo -e "${RED}Error: AWS region not found. 
Set AWS_REGION environment variable or configure via 'aws configure'.${NC}" + exit 1 +fi + # Get AWS account ID AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text 2>/dev/null) if [[ -z "$AWS_ACCOUNT_ID" ]]; then diff --git a/infra-terraform/terraform.tfvars.example b/infra-terraform/terraform.tfvars.example index bb59c9ff..3fe95485 100644 --- a/infra-terraform/terraform.tfvars.example +++ b/infra-terraform/terraform.tfvars.example @@ -5,6 +5,9 @@ # Terraform Configuration Example # ============================================================================= # Copy this file to terraform.tfvars and customize the values. +# +# Region: Set via AWS_REGION environment variable or AWS CLI profile. +# Tags: Add custom tags via the provider's default_tags block in main.tf. # ============================================================================= # ----------------------------------------------------------------------------- @@ -14,9 +17,6 @@ # Base name for all resources stack_name_base = "fast-demo-stack" -# AWS region for deployment -aws_region = "us-east-1" - # ----------------------------------------------------------------------------- # Optional: Admin User # ----------------------------------------------------------------------------- @@ -35,54 +35,30 @@ backend_pattern = "strands-single-agent" # Deployment type for AgentCore Runtime # "docker" - Container image via ECR (requires Docker + separate build step) -# Deployment: terraform apply → ./scripts/build-and-push-image.sh → terraform apply +# Deployment: terraform apply -> ./scripts/build-and-push-image.sh -> terraform apply # "zip" - Python package via S3 (no Docker required, single-step deploy) # Deployment: terraform apply (packages and deploys automatically) -deployment_type = "docker" - -# Name for the agent runtime -agent_name = "StrandsAgent" +backend_deployment_type = "docker" -# Network mode for AgentCore resources -# PUBLIC: Uses public internet (default) -# PRIVATE: 
Requires VPC configuration below -network_mode = "PUBLIC" +# Network mode for AgentCore Runtime +# PUBLIC: Runtime is accessible over the public internet (default) +# VPC: Runtime is deployed into a user-provided VPC for private network isolation +backend_network_mode = "PUBLIC" # ----------------------------------------------------------------------------- -# VPC Configuration (Required only if network_mode = "PRIVATE") -# ----------------------------------------------------------------------------- -# Uncomment and configure if using PRIVATE network mode - -# vpc_id = "vpc-xxxxxxxxxxxxxxxxx" -# private_subnet_ids = ["subnet-xxxxxxxxxxxxxxxxx", "subnet-yyyyyyyyyyyyyyyyy"] -# security_group_ids = ["sg-xxxxxxxxxxxxxxxxx"] - +# VPC Configuration (Required only if backend_network_mode = "VPC") # ----------------------------------------------------------------------------- -# Cognito Configuration -# ----------------------------------------------------------------------------- - -# OAuth callback URLs (localhost included for development) -callback_urls = ["http://localhost:3000", "https://localhost:3000"] - -# Minimum password length for Cognito User Pool -password_minimum_length = 8 - -# ----------------------------------------------------------------------------- -# Memory Configuration -# ----------------------------------------------------------------------------- - -# Number of days after which memory events expire (7-365) -memory_event_expiry_days = 30 - -# ----------------------------------------------------------------------------- -# Environment and Tagging -# ----------------------------------------------------------------------------- - -# Environment name -environment = "dev" - -# Additional tags to apply to all resources -tags = { - # Team = "YourTeam" - # CostCenter = "12345" -} +# Uncomment and configure if using VPC network mode. +# Your VPC must have the necessary VPC endpoints for AWS services. 
+# See README.md VPC Deployment Mode section for full requirements. +# +# Security groups are optional. If omitted, a default security group is created +# with HTTPS (443) self-referencing ingress and all-traffic egress. +# +# IMPORTANT: AgentCore Runtime is only available in specific Availability Zones +# per region. Ensure your subnets are in supported AZs. See: +# https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/agentcore-vpc.html#agentcore-supported-azs + +# backend_vpc_id = "vpc-xxxxxxxxxxxxxxxxx" +# backend_vpc_subnet_ids = ["subnet-xxxxxxxxxxxxxxxxx", "subnet-yyyyyyyyyyyyyyyyy"] +# backend_vpc_security_group_ids = ["sg-xxxxxxxxxxxxxxxxx"] # Optional diff --git a/infra-terraform/test-scripts/test-oauth2-auth.py b/infra-terraform/test-scripts/test-oauth2-auth.py new file mode 100755 index 00000000..2cbe1cad --- /dev/null +++ b/infra-terraform/test-scripts/test-oauth2-auth.py @@ -0,0 +1,271 @@ +#!/usr/bin/env python3 +""" +Test OAuth2 authentication for Terraform-deployed infrastructure. + +This script verifies that the OAuth2 Credential Provider is working correctly +by testing machine-to-machine authentication with the Gateway. + +USAGE: + python test-oauth2-auth.py + +PREREQUISITES: + 1. Terraform infrastructure must be deployed successfully + - Run: terraform apply + - Verify: terraform output shows gateway_url, cognito_domain_url, etc. + + 2. Required Python packages: + - boto3 (AWS SDK) + - requests (HTTP client) + Install: pip install boto3 requests + + 3. AWS credentials configured: + - AWS CLI configured (aws configure) OR + - Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) OR + - IAM role with permissions to: + - Read Secrets Manager secrets + - Query Terraform state + + 4. Current directory must be the terraform root (infra-terraform/) + - Script reads terraform outputs from current directory + - Run: cd infra-terraform && python test-scripts/test-oauth2-auth.py + +WHAT IT TESTS: + 1. 
Terraform outputs are accessible + - Retrieves: stack name, Cognito domain, machine client ID, gateway URL + + 2. Machine client secret retrieval from Secrets Manager + - Tests: secretsmanager:GetSecretValue permission + - Validates: Secret exists and is readable + + 3. OAuth2 token exchange with Cognito + - Flow: Client Credentials Grant (machine-to-machine) + - Tests: Cognito token endpoint responds correctly + - Validates: Access token is returned + + 4. Gateway authentication with OAuth2 token + - Tests: Gateway accepts Bearer token + - Validates: MCP tools/list request succeeds + - Confirms: OAuth2 Credential Provider integration works + +EXPECTED OUTPUT: + On success: + [PASS] OAuth2 Authentication Test PASSED + [x] OAuth2 token retrieved from Cognito + [x] Gateway authenticated successfully with token + [x] OAuth2 Credential Provider working correctly + + On failure: + [FAIL] Descriptive error message + Exit code: 1 + +TROUBLESHOOTING: + Error: "Failed to get Terraform output" + - Fix: Run from infra-terraform/ directory + - Fix: Ensure terraform apply completed successfully + + Error: "Failed to get secret" + - Fix: Check AWS credentials (aws sts get-caller-identity) + - Fix: Verify IAM permissions for Secrets Manager + + Error: "Failed to get OAuth2 token" + - Fix: Check Cognito User Pool and App Client exist + - Fix: Verify machine client secret is correct + + Error: "Gateway request failed" + - Fix: Verify Gateway URL is accessible + - Fix: Check Runtime is deployed and running + - Fix: Confirm OAuth2 Credential Provider is registered +""" + +import subprocess +import sys + +import boto3 +import requests + + +def run_command(cmd): + """Run shell command and return output.""" + result = subprocess.run(cmd, shell=True, capture_output=True, text=True) + return result.stdout.strip(), result.returncode + + +def get_terraform_output(key): + """Get Terraform output value.""" + output, code = run_command(f"terraform output -raw {key}") + if code != 0: + 
print(f"[FAIL] Failed to get Terraform output for '{key}'") + sys.exit(1) + return output + + +def get_secret(secret_name, region): + """Get secret from AWS Secrets Manager.""" + client = boto3.client("secretsmanager", region_name=region) + try: + response = client.get_secret_value(SecretId=secret_name) + return response["SecretString"] + except Exception as e: + print(f"[FAIL] Failed to get secret '{secret_name}': {e}") + sys.exit(1) + + +def test_oauth2_authentication(): + """ + Test OAuth2 authentication flow. + + This is the main test function that orchestrates the full authentication test: + 1. Get configuration from Terraform outputs + 2. Fetch machine client secret from AWS Secrets Manager + 3. Request OAuth2 token from Cognito (client credentials flow) + 4. Test Gateway with the OAuth2 token (MCP tools/list request) + """ + print("=" * 60) + print("OAuth2 Authentication Integration Test") + print("=" * 60) + print() + + # === PHASE 1: Get Configuration from Terraform === + # Terraform outputs contain all the URLs, IDs, and resource names we need + print("Getting configuration from Terraform...") + stack_name = get_terraform_output("ssm_parameter_prefix").lstrip("/") + region = boto3.session.Session().region_name or "us-east-1" # From AWS_DEFAULT_REGION or the AWS CLI profile; falls back to us-east-1 + cognito_domain = get_terraform_output("cognito_domain_url") + machine_client_id = get_terraform_output("cognito_machine_client_id") + gateway_url = get_terraform_output("gateway_url") + + print(f" Stack: {stack_name}") + print(f" Region: {region}") + print(f" Gateway URL: {gateway_url}") + print() + + # === PHASE 2: Retrieve Machine Client Secret === + # The machine client secret is stored in Secrets Manager (created by Terraform) + # This is the credential used for machine-to-machine authentication + print("Fetching machine client secret from Secrets Manager...") + secret_name = f"/{stack_name}/machine_client_secret" + machine_client_secret = get_secret(secret_name, region) + print(f" Secret retrieved: {secret_name}") + print() + + # === PHASE 3: 
OAuth2 Token Exchange with Cognito === + # Request an access token using the Client Credentials grant type + # This simulates what the Runtime does to authenticate with the Gateway + print("Step 1: Requesting OAuth2 token from Cognito...") + token_url = f"https://{cognito_domain}/oauth2/token" + + token_response = requests.post( + token_url, + data={ + "grant_type": "client_credentials", + "client_id": machine_client_id, + "client_secret": machine_client_secret, + }, + headers={"Content-Type": "application/x-www-form-urlencoded"}, + timeout=30, + ) + + if token_response.status_code != 200: + print(f"[FAIL] Failed to get OAuth2 token: {token_response.status_code}") + print(f" Response: {token_response.text}") + sys.exit(1) + + token_data = token_response.json() + access_token = token_data.get("access_token") + + if not access_token: + print("[FAIL] No access token in response") + print(f" Response: {token_data}") + sys.exit(1) + + print("[PASS] OAuth2 token received successfully") + print(f" Token type: {token_data.get('token_type')}") + print(f" Expires in: {token_data.get('expires_in')} seconds") + print() + + # === PHASE 4: Test Gateway Authentication === + # Send an MCP request to the Gateway using the OAuth2 token + # This validates the entire OAuth2 Credential Provider flow: + # - Gateway receives Bearer token + # - Gateway validates token with Cognito + # - Runtime uses OAuth2 Credential Provider to authenticate with Gateway + print("Step 2: Testing Gateway with OAuth2 token...") + + # Create a test MCP request (tools/list is a simple read-only operation) + mcp_request = { + "jsonrpc": "2.0", + "id": 1, + "method": "tools/list", + } + + gateway_response = requests.post( + gateway_url, + json=mcp_request, + headers={ + "Authorization": f"Bearer {access_token}", + "Content-Type": "application/json", + }, + timeout=30, + ) + + if gateway_response.status_code != 200: + print(f"[FAIL] Gateway request failed: {gateway_response.status_code}") + print(f" Response: 
{gateway_response.text}") + sys.exit(1) + + gateway_data = gateway_response.json() + + if "error" in gateway_data: + print(f"[FAIL] Gateway returned error: {gateway_data['error']}") + sys.exit(1) + + print("[PASS] Gateway authentication successful") + print(f" Available tools: {len(gateway_data.get('result', {}).get('tools', []))}") + + tools = gateway_data.get("result", {}).get("tools", []) + if tools: + print(" Tools:") + for tool in tools: + print(f" - {tool.get('name')}: {tool.get('description', 'N/A')}") + print() + + # Summary + print("=" * 60) + print("[PASS] OAuth2 Authentication Test PASSED") + print("=" * 60) + print() + print("[x] OAuth2 token retrieved from Cognito") + print("[x] Gateway authenticated successfully with token") + print("[x] OAuth2 Credential Provider working correctly") + print() + + +if __name__ == "__main__": + """ + Main entry point for the test script. + + USAGE EXAMPLES: + # Run from terraform directory + cd infra-terraform + python test-scripts/test-oauth2-auth.py + + # Run with verbose AWS debugging (if needed) + export AWS_DEFAULT_REGION=us-east-1 + export BOTO_LOG_LEVEL=DEBUG + python test-scripts/test-oauth2-auth.py + + # Check exit code in scripts + python test-scripts/test-oauth2-auth.py + if [ $? 
-eq 0 ]; then echo "Tests passed"; fi + """ + try: + test_oauth2_authentication() + except KeyboardInterrupt: + print("\n\n[FAIL] Test interrupted by user") + sys.exit(1) + except Exception as e: + print(f"\n\n[FAIL] Test failed with error: {e}") + import traceback + + traceback.print_exc() + sys.exit(1) diff --git a/infra-terraform/variables.tf b/infra-terraform/variables.tf index 2cd79d48..801b88d3 100644 --- a/infra-terraform/variables.tf +++ b/infra-terraform/variables.tf @@ -10,19 +10,8 @@ variable "stack_name_base" { type = string validation { - condition = can(regex("^[a-z][a-z0-9-]{2,62}$", var.stack_name_base)) - error_message = "Stack name must start with a lowercase letter, be 3-63 characters, and contain only lowercase alphanumeric characters and hyphens." - } -} - -variable "aws_region" { - description = "AWS region for deployment." - type = string - default = "us-east-1" - - validation { - condition = can(regex("^[a-z]{2}-[a-z]+-\\d$", var.aws_region)) - error_message = "Must be a valid AWS region (e.g., us-east-1, eu-west-1)." + condition = can(regex("^[a-z][a-z0-9-]{2,34}$", var.stack_name_base)) + error_message = "Stack name must start with a lowercase letter, be 3-35 characters, and contain only lowercase alphanumeric characters and hyphens." } } @@ -56,114 +45,47 @@ variable "backend_pattern" { } } -variable "deployment_type" { +variable "backend_deployment_type" { description = "Deployment type for AgentCore Runtime. 'docker' uses ECR container image (requires Docker + separate build step). 'zip' uses S3 Python package (no Docker required, single-step deploy)." type = string default = "docker" validation { - condition = contains(["docker", "zip"], var.deployment_type) + condition = contains(["docker", "zip"], var.backend_deployment_type) error_message = "Deployment type must be 'docker' or 'zip'." } } -variable "agent_name" { - description = "Name for the agent runtime." 
- type = string - default = "StrandsAgent" - - validation { - condition = can(regex("^[a-zA-Z][a-zA-Z0-9_]{1,62}$", var.agent_name)) - error_message = "Agent name must start with a letter, be 2-63 characters, and contain only alphanumeric characters and underscores." - } -} - -variable "network_mode" { - description = "Network mode for AgentCore resources. PUBLIC uses public internet, PRIVATE requires VPC configuration." +variable "backend_network_mode" { + description = "Network mode for AgentCore Runtime. PUBLIC (default) uses public internet. VPC deploys into a user-provided VPC for private network isolation." type = string default = "PUBLIC" validation { - condition = contains(["PUBLIC", "PRIVATE"], var.network_mode) - error_message = "Network mode must be PUBLIC or PRIVATE." + condition = contains(["PUBLIC", "VPC"], var.backend_network_mode) + error_message = "Network mode must be 'PUBLIC' or 'VPC'." } } # ============================================================================= -# VPC Configuration (Required if network_mode = PRIVATE) +# VPC Configuration (Required if backend_network_mode = VPC) # ============================================================================= -variable "vpc_id" { - description = "VPC ID for private network mode. Required if network_mode is PRIVATE." +variable "backend_vpc_id" { + description = "VPC ID for VPC network mode. Required when backend_network_mode is 'VPC'." type = string default = null } -variable "private_subnet_ids" { - description = "List of private subnet IDs for private network mode. Required if network_mode is PRIVATE." +variable "backend_vpc_subnet_ids" { + description = "List of subnet IDs for VPC network mode. Required when backend_network_mode is 'VPC'. Subnets should be in at least two Availability Zones." type = list(string) default = [] } -variable "security_group_ids" { - description = "List of security group IDs for private network mode. Required if network_mode is PRIVATE." 
+variable "backend_vpc_security_group_ids" { + description = "List of security group IDs for VPC network mode. Optional when backend_network_mode is 'VPC'. If omitted, a default security group is created with HTTPS self-referencing ingress and all-traffic egress." type = list(string) default = [] } -# ============================================================================= -# Cognito Configuration -# ============================================================================= - -variable "callback_urls" { - description = "OAuth callback URLs for Cognito. Defaults include localhost for development." - type = list(string) - default = ["http://localhost:3000", "https://localhost:3000"] -} - -variable "password_minimum_length" { - description = "Minimum password length for Cognito User Pool." - type = number - default = 8 - - validation { - condition = var.password_minimum_length >= 8 && var.password_minimum_length <= 99 - error_message = "Password minimum length must be between 8 and 99." - } -} - -# ============================================================================= -# Memory Configuration -# ============================================================================= - -variable "memory_event_expiry_days" { - description = "Number of days after which memory events expire. Must be between 7 and 365." - type = number - default = 30 - - validation { - condition = var.memory_event_expiry_days >= 7 && var.memory_event_expiry_days <= 365 - error_message = "Memory event expiry must be between 7 and 365 days." - } -} - -# ============================================================================= -# Tagging -# ============================================================================= - -variable "tags" { - description = "Additional tags to apply to all resources." - type = map(string) - default = {} -} - -variable "environment" { - description = "Environment name (e.g., dev, staging, prod)." 
- type = string - default = "dev" - - validation { - condition = contains(["dev", "staging", "prod", "test"], var.environment) - error_message = "Environment must be one of: dev, staging, prod, test." - } -} diff --git a/infra-terraform/versions.tf b/infra-terraform/versions.tf index 6c7c6b2b..89e69d00 100644 --- a/infra-terraform/versions.tf +++ b/infra-terraform/versions.tf @@ -7,7 +7,7 @@ terraform { required_providers { aws = { source = "hashicorp/aws" - version = ">= 5.82.0" + version = ">= 6.35.1" } random = { source = "hashicorp/random"