feat(perforce/p4-code-review): Migrate from ECS to EC2 #852

Open
gabebatista wants to merge 7 commits into main from gabeaws/perforce/swarm-native-ec2

@gabebatista gabebatista commented Jan 6, 2026

Issue number: N/A

Summary

Changes

This PR migrates the P4 Code Review (Helix Swarm) module from a containerized ECS/Fargate deployment to a native EC2 Auto Scaling Group deployment with persistent storage.

Key changes:

  • Packer Template: New AMI builder for Ubuntu 24.04 LTS (officially supported by Perforce until 2029), replacing the unsupported Amazon Linux 2023
  • Terraform Module: Complete infrastructure refactor from ECS to EC2 Auto Scaling with:
    • Persistent EBS volume support with automatic attachment on instance replacement
    • Self-healing infrastructure via Auto Scaling Group
    • Native DEB package installation instead of containers
    • SSM Session Manager access (replacing SSH)
    • Comprehensive user-data script for runtime configuration
  • P4 Server Integration:
    • Dual super user approach: always creates a user named super for Swarm extension compatibility, plus an optional custom super user
    • unlimited_timeout group for service accounts to prevent ticket expiration
    • Fixed Swarm extension configuration by establishing P4 trust and authentication before modifying extension settings
  • Documentation:
    • Enhanced Packer README with architecture details, runtime configuration, and troubleshooting guides
    • Updated p4-code-review module README to reflect EC2 architecture (removed ECS/Fargate/Docker references)
    • Updated variable descriptions in parent perforce module for EC2-based fields
  • Example: Updated to remove deprecated ECS-related variables
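
For reference, a group whose members hold non-expiring tickets is normally expressed in a Perforce group spec by setting Timeout to unlimited. The sketch below is not the code from p4_configure.sh; the helper name render_group_spec is hypothetical and the member names are placeholders.

```shell
# Hypothetical helper (not from this PR): print a Perforce group spec whose
# members get login tickets that never expire.
# Usage: render_group_spec GROUP USER...
render_group_spec() {
    local group_name="$1"
    shift
    printf 'Group:\t%s\n' "$group_name"
    printf 'Timeout:\tunlimited\n'
    printf 'Users:\n'
    local member
    for member in "$@"; do
        printf '\t%s\n' "$member"
    done
}

# Example (assumes an authenticated super user session on the P4 Server):
#   render_group_spec unlimited_timeout super my-custom-super | p4 group -i
```

Generating the spec as text and piping it to p4 group -i keeps the step non-interactive, which suits a Packer or user-data script.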

User experience

In the existing implementation, job and queue data are lost whenever the p4cr container restarts, because there is no Swarm-compatible persistent storage. The new implementation replaces ECS/Fargate with EC2, so an EBS volume can provide persistent storage for /opt/perforce/swarm/data.

Files Changed

| Area | Files | Description |
|---|---|---|
| Packer | assets/packer/perforce/p4-code-review/* | New AMI builder for Ubuntu 24.04 with Swarm |
| Packer | assets/packer/perforce/p4-server/p4_configure.sh | Dual super user + unlimited_timeout group |
| Terraform | modules/perforce/modules/p4-code-review/* | EC2 ASG, EBS, launch template, user-data |
| Terraform | modules/perforce/sg.tf | Security group rules for P4 Server ↔ Swarm |
| Terraform | modules/perforce/variables.tf | Updated p4_code_review_config descriptions |
| Example | modules/perforce/examples/create-resources-complete/main.tf | Removed ECS variables |

How to Test

Prerequisites

  1. AWS account with appropriate permissions
  2. Packer installed (>= 1.8.0)
  3. Terraform installed (>= 1.0)
  4. A Route53 hosted zone for DNS records
  5. An ACM certificate for HTTPS

Step 1: Build the P4 Code Review AMI

cd assets/packer/perforce/p4-code-review
packer init p4_code_review_x86.pkr.hcl
packer build p4_code_review_x86.pkr.hcl

Note the AMI ID from the output (e.g., ami-0abc123def456789).

Step 2: Build the P4 Server AMI (if not already available)

cd assets/packer/perforce/p4-server
packer init p4_al2023_x86.pkr.hcl
packer build p4_al2023_x86.pkr.hcl

Step 3: Deploy the Infrastructure

Use the example configuration or create your own:

cd modules/perforce/examples/create-resources-complete

# Create terraform.tfvars with your values
cat > terraform.tfvars <<TFVARS
project_prefix = "test"
certificate_arn = "arn:aws:acm:us-east-1:ACCOUNT:certificate/CERT-ID"
route53_public_hosted_zone_id = "Z0123456789ABCDEFGHIJ"
fully_qualified_domain_name = "perforce.example.com"
TFVARS

terraform init
terraform apply

Step 4: Verify the Deployment

  1. P4 Server: Connect using P4V or p4 CLI

    p4 -p ssl:perforce.example.com:1666 info
  2. P4 Code Review (Swarm):

    • Navigate to https://review.perforce.example.com
    • Log in with a Perforce user
    • Verify changes are visible and queue workers are running
  3. Verify Swarm Extension (via SSM on P4 Server):

    # Check extension is installed
    p4 extension --list --type extensions
    
    # Verify Swarm-Secure is set correctly
    p4 extension --configure Perforce::helix-swarm -o | grep Swarm-Secure
    # Should show: Swarm-Secure: false
  4. Test Instance Recovery:

    • Terminate the Swarm EC2 instance from AWS Console
    • Verify Auto Scaling Group launches a new instance
    • Confirm Swarm is accessible again with data intact (EBS volume reattaches)
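
One way to make the "data intact" check in step 4 concrete is to write a marker file to the data volume before terminating the instance and look for it on the replacement. This is only a sketch: the helper names and the marker filename are hypothetical, and it assumes you have shell access through SSM and write permission on the data directory.

```shell
# Hypothetical helpers (not part of this PR) for the recovery test.
# DATA_DIR defaults to the Swarm data path named in the PR description.
DATA_DIR="${DATA_DIR:-/opt/perforce/swarm/data}"
MARKER_NAME=".recovery-test-marker"

# Run on the old instance before terminating it: stores a token on the volume.
write_marker() {
    local token="$1"
    printf '%s\n' "$token" > "$DATA_DIR/$MARKER_NAME"
}

# Run on the replacement instance: succeeds only if the same token is present.
check_marker() {
    local token="$1"
    [ -f "$DATA_DIR/$MARKER_NAME" ] &&
        [ "$(cat "$DATA_DIR/$MARKER_NAME")" = "$token" ]
}
```

If check_marker fails on the new instance, the volume was most likely not reattached or was mounted at a different path. Remove the marker file once the test is done.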

Expected Outcomes

  • ✅ P4 Server accepts SSL connections on port 1666
  • ✅ Swarm web UI loads and shows P4 changes
  • ✅ Queue workers are running (check via Swarm API: curl https://review.perforce.example.com/api/v10/queue/workers)
  • ✅ Swarm extension installed on P4 Server with Swarm-Secure: false
  • ✅ Data persists across instance replacement

Checklist

  • I have performed a self-review of this change
  • Changes have been tested
  • Changes are documented

Is this a breaking change?

Yes, this is a breaking change.

Users currently running P4 Code Review using the ECS-based module will need to:

  1. Build new AMIs using the updated Packer template
  2. Migrate their data from ECS containers to the new EBS-backed EC2 instances
  3. Update their Terraform configurations to use the new EC2-based variables

A migration guide and detailed documentation are included in the module README.

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created might not be successful.

@gabebatista gabebatista force-pushed the gabeaws/perforce/swarm-native-ec2 branch from ac4448c to a4511d0 Compare January 8, 2026 16:03

github-actions bot commented Jan 8, 2026

📚 Documentation Preview

Preview deployed successfully!

🔗 Preview URL: https://aws-games.github.io/cloud-game-development-toolkit/preview-pr-852/

🔒 Maintainer Action Required

The preview requires approval before it's accessible. A maintainer must approve the GitHub Pages deployment in the Environments section.

Once approved, the preview will be accessible within 1-2 minutes.

Build Information

  • Status: ✅ Deployed (awaiting approval)
  • Commit: f7e56004a34c9a1129182a0232e01df90f989b3a
  • Workflow Run: #114

This preview will be automatically deleted when the PR is merged or closed.

@gabebatista gabebatista force-pushed the gabeaws/perforce/swarm-native-ec2 branch 2 times, most recently from a22cbad to e4afb3e Compare January 8, 2026 18:08
@gabebatista gabebatista force-pushed the gabeaws/perforce/swarm-native-ec2 branch 2 times, most recently from 84b8793 to 4766a91 Compare January 15, 2026 04:19
@gabebatista gabebatista marked this pull request as ready for review January 15, 2026 14:26
@gabebatista gabebatista requested a review from a team as a code owner January 15, 2026 14:26
@gabebatista gabebatista force-pushed the gabeaws/perforce/swarm-native-ec2 branch from 4766a91 to 180cd6b Compare January 15, 2026 14:55

@hwkiem hwkiem left a comment

Plenty of comments, with some minor changes requested and some clarification needed. Also, can you rename the PR to follow the conventional commit format if the plan is to squash and merge? Or is the plan to rebase and merge?

@gabebatista gabebatista force-pushed the gabeaws/perforce/swarm-native-ec2 branch from 180cd6b to 759ea6e Compare January 23, 2026 15:20
@gabebatista gabebatista changed the title from "p4cr ECS/Fargate to EC2" to "feat(perforce/p4-code-review): Migrate from ECS to EC2" Jan 23, 2026
@gabebatista gabebatista force-pushed the gabeaws/perforce/swarm-native-ec2 branch from 759ea6e to ebfb57c Compare February 10, 2026 16:31

hwkiem commented Feb 10, 2026

⚠️ Potential Data Corruption Risk

I've identified a critical issue in modules/perforce/modules/p4-code-review/user-data.sh.tpl (lines 85-93) that could lead to data corruption.

The Issue

The force detach operation could potentially cause data corruption if the previous instance is still writing to the volume:

aws ec2 detach-volume \
    --region "$REGION" \
    --volume-id "$VOLUME_ID" \
    --force 2>&1 | tee -a /tmp/swarm-setup.log || log "Warning: Force detach may have failed"

Recommended Fix

Add an instance state check before force detaching to ensure the previous instance is truly terminated:

# Check if the attached instance is terminated before force detaching
if [ "$ATTACHED_INSTANCE" != "none" ] && [ "$ATTACHED_INSTANCE" != "null" ]; then
    INSTANCE_STATE=$(aws ec2 describe-instances \
        --region "$REGION" \
        --instance-ids "$ATTACHED_INSTANCE" \
        --query 'Reservations[0].Instances[0].State.Name' \
        --output text 2>/dev/null || echo "unknown")
    
    if [ "$INSTANCE_STATE" = "terminated" ] || [ "$INSTANCE_STATE" = "unknown" ]; then
        log "Previous instance $ATTACHED_INSTANCE is terminated/unknown, safe to force detach"
        aws ec2 detach-volume --region "$REGION" --volume-id "$VOLUME_ID" --force
    else
        log "ERROR: Volume attached to running instance $ATTACHED_INSTANCE (state: $INSTANCE_STATE)"
        exit 1
    fi
fi

This ensures we only force detach from instances that are terminated or whose state can no longer be queried, which guards the Swarm data volume against corruption from an instance that is still writing to it.


hwkiem commented Feb 10, 2026

Only other feedback: let's review the documentation. It built and deployed a preview to GH Pages (see the docs comment near the top of this PR). The architecture diagram is outdated.

@gabebatista gabebatista force-pushed the gabeaws/perforce/swarm-native-ec2 branch 3 times, most recently from e8f2f69 to 533950c Compare February 16, 2026 19:21
@gabebatista gabebatista requested a review from hwkiem February 16, 2026 23:34
@gabebatista gabebatista force-pushed the gabeaws/perforce/swarm-native-ec2 branch from 0b3e700 to 7fe2360 Compare February 20, 2026 17:28
@github-actions

Terraform validation failed for modules/perforce/examples/create-resources-complete

View detailed logs: Workflow run

@github-actions

Terraform validation failed for modules/perforce/modules/p4-code-review

View detailed logs: Workflow run

@github-actions

Terraform validation failed for modules/perforce

View detailed logs: Workflow run

@gabebatista gabebatista force-pushed the gabeaws/perforce/swarm-native-ec2 branch from 345d288 to 2bd2966 Compare February 20, 2026 19:24

@hwkiem hwkiem left a comment

LGTM

Add Packer template and configuration scripts for building P4 Code Review
(Helix Swarm) AMI on Ubuntu 24.04 LTS.

Key components:
- p4_code_review_x86.pkr.hcl: Packer template for x86_64 AMI
- swarm_setup.sh: Initial Swarm package installation
- swarm_configure.sh: Runtime configuration script for:
  - P4 server connection with SSL trust and authentication
  - Redis cache configuration
  - SSO/SAML setup
  - Queue worker configuration
  - Swarm extension Swarm-Secure setting
…caling

Refactor P4 Code Review (Helix Swarm) deployment from ECS Fargate to native
EC2 with Auto Scaling Group for improved performance and simpler operations.

Key changes:
- Replace ECS task definition with EC2 launch template
- Add Auto Scaling Group (min=1, max=1) for automatic instance recovery
- Add persistent EBS volume for Swarm data directory
- Add user-data script for volume attachment and Swarm configuration
- Update security groups for EC2-based deployment
- Add ALB target group for health checks
- Support for custom config.php injection via Secrets Manager

Update the create-resources-complete example to use the new EC2-based
P4 Code Review module configuration, removing deprecated ECS-specific
variables.
… compatibility

Implement a dual super user approach to support Swarm extension installation
while allowing custom super user configuration.

Key changes:
- Always create 'super' user first for Swarm extension compatibility
- Create custom super user (if specified) and grant super privileges
- Add 'unlimited_timeout' group for service integrations
- Both users added to unlimited_timeout group to prevent ticket expiration
- Update variables for super user configuration
… 24.04

Consolidate Swarm authentication to use the super user for both runtime
operations (-u) and admin tasks (-U). This simplifies credential management
and ensures compatibility with all authentication configurations (SSO,
standard password, etc.).

Changes:
- Use super user for both configure-swarm.sh -u and -U parameters
- Ensure super user is standard type (not service account) for p4 protects validation
- Remove unused Swarm user credential variables from Terraform modules
- Pin P4 Code Review AMI to Ubuntu 24.04 LTS (helix-swarm-optional requires ImageMagick 6)
- Update README prerequisites to reflect simplified credential setup

Add wait_for_apt function that polls for the dpkg lock to be released
before running apt-get commands. This fixes intermittent build failures
caused by unattended-upgrades holding the lock after instance boot.
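
A polling helper of the kind this commit describes might look like the sketch below. It is not the PR's actual wait_for_apt; the lock file paths, the retry defaults, and the apt_lock_held helper are assumptions, and it relies on fuser (psmisc) being installed and on the script running as root.

```shell
# Assumed lock probe: fuser exits 0 when any process has one of these files
# open. If fuser is missing, the probe fails and the lock is treated as free.
apt_lock_held() {
    fuser /var/lib/dpkg/lock-frontend /var/lib/dpkg/lock \
        /var/lib/apt/lists/lock >/dev/null 2>&1
}

# Block until no process holds the apt/dpkg locks.
# Usage: wait_for_apt [MAX_ATTEMPTS] [DELAY_SECONDS]
# Returns 1 if the locks are still held after MAX_ATTEMPTS checks.
wait_for_apt() {
    local max_attempts="${1:-60}"
    local delay="${2:-5}"
    local attempt=0
    while apt_lock_held; do
        attempt=$((attempt + 1))
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Timed out waiting for apt/dpkg lock" >&2
            return 1
        fi
        sleep "$delay"
    done
    return 0
}

# Example: wait_for_apt && apt-get update
```

Bounding the wait keeps a stuck unattended-upgrades run from hanging the Packer build indefinitely; the build fails with a clear message instead.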
…ndition

Wait for the previous instance to terminate before detaching the EBS
volume. ASG launches new instances before terminating old ones, which
can cause the new instance to fail when trying to attach the volume
while the old instance is still running or shutting down.

The script now waits up to 5 minutes for the old instance to reach
terminated state before proceeding with force detach.
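
The wait described in this commit can be sketched as a bounded polling loop around the describe-instances query from the review comment above. This is an illustration rather than the code in user-data.sh.tpl: the function names, the REGION variable, and the 30 x 10 second defaults (about 5 minutes) are assumptions.

```shell
# Assumed wrapper: print the EC2 state name for an instance ID, or "unknown"
# if the lookup fails. Expects REGION to be set by the caller.
get_instance_state() {
    aws ec2 describe-instances \
        --region "$REGION" \
        --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].State.Name' \
        --output text 2>/dev/null || echo "unknown"
}

# Poll until the instance reports "terminated".
# Usage: wait_for_terminated INSTANCE_ID [MAX_ATTEMPTS] [INTERVAL_SECONDS]
# Returns 0 once terminated, 1 if every attempt saw some other state.
wait_for_terminated() {
    local instance_id="$1"
    local max_attempts="${2:-30}"
    local interval="${3:-10}"
    local attempt=0
    while [ "$attempt" -lt "$max_attempts" ]; do
        if [ "$(get_instance_state "$instance_id")" = "terminated" ]; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$interval"
    done
    return 1
}

# Example: only force detach after the old instance is confirmed terminated.
#   wait_for_terminated "$ATTACHED_INSTANCE" &&
#       aws ec2 detach-volume --region "$REGION" --volume-id "$VOLUME_ID" --force
```

Unlike the snippet in the review comment, this sketch treats an "unknown" state as not yet safe and keeps waiting, which is the more conservative choice when the lookup fails for transient reasons.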
@hwkiem hwkiem force-pushed the gabeaws/perforce/swarm-native-ec2 branch from 2bd2966 to f7e5600 Compare February 20, 2026 23:09