Docs: Improve modules/perforce/README.md: Add context, architecture decisions, multi-region guidance, and costs #784

@gabebatista

What were you searching in the docs?

Context on why Perforce is used for game development, what the module components are, architecture decisions, cost estimates, and production considerations.

Is this related to an existing documentation section?

modules/perforce/README.md - proposing new sections to add context around the existing technical documentation.

How can we improve?

The Perforce README has good technical documentation but lacks context on WHY Perforce is used for games (vs Git), WHAT each module component does, WHY specific architecture decisions were made, WHAT the costs are, and WHAT to consider for production. Adding these sections helps users make informed decisions and justify the architecture.

Got a suggestion in mind?

1. Add "Why Perforce for Game Development?" Section (After opening paragraph)

## Why Perforce for Game Development?

Perforce Helix Core is widely treated as the industry standard for game asset version control. The main reasons are listed below.

**Compared with Git/GitLab**:
- **Large binary file handling**: Optimized for multi-GB files (textures, models, audio)
- **Exclusive file locking**: Prevents merge conflicts on binary files
- **Selective sync**: Developers download only needed assets, not entire repository
- **Performance at scale**: Handles terabyte-sized repositories with 1000+ users

**Real-World Scale**:
- Major AAA studios use Perforce for 10-50 TB repositories
- Supports 1000+ concurrent users per server
- Handles individual files of 10 GB or more (cinematic videos, high-res textures)

**Best For**:
- Game development teams (any size)
- Projects with large binary assets (3D models, textures, audio, video)
- Teams needing exclusive file locking (binary files that can't be merged)

**Not Ideal For**:
- Text-only codebases (Git is usually a better fit)
- Small teams with only source code (<10 developers, no binary assets)
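
Exclusive locking is usually applied per file type through the server typemap, where the `+l` file-type modifier marks a file as exclusive-open. The following is a minimal sketch, not part of the module: it assumes a depot named `//depot` and a hypothetical list of binary extensions, and it only generates the typemap lines that an administrator would paste into the `p4 typemap` form.

```python
def typemap_entries(extensions, depot="//depot"):
    """Return typemap lines that store matching files as binary with exclusive locking (+l).

    `depot` and the extension list are illustrative assumptions; adjust them to your layout.
    """
    normalized = sorted({ext.lower().lstrip(".") for ext in extensions})
    # "...." is the "..." wildcard followed by the literal "." before the extension.
    return [f"binary+l {depot}/....{ext}" for ext in normalized]


if __name__ == "__main__":
    for line in typemap_entries(["uasset", "umap", ".psd", "fbx"]):
        print(line)
```

Generating the lines from one list keeps the locking policy reviewable in source control before anyone edits the live typemap.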

2. Add "Module Components" Section (After "Why Perforce for Game Development?")

## Module Components

This module is composed of several components that can be deployed independently or together:

### P4 Server (Required)
**What it does**: The core Perforce Helix Core version control server

**Deployed as**: EC2 instance with EFS for metadata

**Storage Options**:
- **EBS volumes**: Default option, good for most deployments
- **FSx ONTAP**: Optional, provides advanced features (deduplication, high IOPS, snapshots) for large-scale deployments

**Use this for**:
- Version control for game assets and source code
- File locking to prevent merge conflicts on binary files
- Workspace management for developers

**Deploy alone if**: You only need basic version control with username/password authentication

### P4 Auth (Optional)
**What it does**: Helix Authentication Service for SSO integration

**Deployed as**: ECS Fargate service

**Use this for**:
- Single sign-on (SSO) with corporate identity providers
- SAML or OIDC authentication (Okta, Azure AD, Auth0, AWS Cognito)
- Centralized user management

**Deploy if**: You want to integrate Perforce authentication with your existing identity provider instead of managing Perforce users separately

### P4 Code Review (Optional)
**What it does**: Helix Swarm for code review workflows

**Deployed as**: ECS Fargate service with RDS database and ALB

**Use this for**:
- Pre-submit code reviews (like GitHub Pull Requests)
- Visual diff viewing in web browser
- Review approval workflows before submitting to mainline

**Deploy if**: You want formal code review processes for your team

### Deployment Combinations

| Configuration | Components | Best For |
|---------------|-----------|----------|
| **Minimal** | P4 Server only (EBS) | Small teams, getting started |
| **High-performance** | P4 Server (FSx ONTAP) | Large repositories, need deduplication |
| **SSO-enabled** | P4 Server + P4 Auth | Teams with existing identity providers |
| **Full-featured** | P4 Server + P4 Auth + P4 Code Review | Teams wanting SSO and code review workflows |
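
The table above can be read as a small decision rule. The sketch below is illustrative only: the `Requirements` fields and the `plan` function are hypothetical names, not inputs of the module, and the mapping simply restates the rows of the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical summary of what a team needs from the deployment."""
    needs_sso: bool = False
    needs_code_review: bool = False


def plan(req):
    """Return the list of components to deploy; P4 Server is always required."""
    components = ["P4 Server"]
    if req.needs_sso:
        components.append("P4 Auth")
    if req.needs_code_review:
        components.append("P4 Code Review")
    return components


if __name__ == "__main__":
    print(plan(Requirements()))
    print(plan(Requirements(needs_sso=True, needs_code_review=True)))
```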

3. Add "Architecture Decisions" Section (After architecture diagram)

## Architecture Decisions

### Why Separate P4 Server, P4 Auth, and P4 Code Review?

**Modularity**: Deploy only what you need
- **P4 Server only**: Minimal version control (no SSO, no code review)
- **P4 Server + P4 Auth**: Add SSO authentication (SAML, OIDC)
- **Full stack**: Add code review with Helix Swarm

**Scalability**: Components scale independently
- **P4 Server**: Vertical scaling (larger EC2 instance for more users)
- **P4 Code Review**: Horizontal scaling (ECS tasks scale based on load)
- **P4 Auth**: Lightweight, minimal resources needed

### Why EFS for Perforce Metadata?

- **Persistence**: Survives EC2 instance termination
- **Snapshots**: EFS backup integration for disaster recovery
- **Performance**: Low-latency access for metadata operations

**Alternative Considered**: EBS volumes

**Why Not Used**: An EBS volume is tied to a single Availability Zone, and snapshots must be configured separately (for example with Amazon Data Lifecycle Manager or AWS Backup)

### Storage Options: EBS vs FSx ONTAP

**EBS Volumes (Default)**:
- **Pros**: Lower cost, simpler setup, good performance for most teams
- **Cons**: Limited to 16 TB per volume, no built-in deduplication
- **Best for**: Teams with <5 TB repositories, standard performance needs

**FSx ONTAP (Optional)**:
- **Pros**: Petabyte-scale, deduplication (saves on similar assets), high IOPS, snapshots
- **Cons**: Higher cost, more complex setup
- **Best for**: Large repositories (>5 TB), need deduplication, high-performance requirements
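
The guidance above reduces to a few thresholds. The sketch below encodes them as stated in this section (5 TB as the rough cut-over, 16 TB as the per-volume EBS limit); the function name and parameters are hypothetical, and the result is a starting point for discussion rather than a sizing tool.

```python
EBS_VOLUME_LIMIT_TB = 16  # per-volume limit quoted in this section
LARGE_REPO_TB = 5         # guideline threshold quoted in this section


def recommend_storage(repo_size_tb, needs_dedup=False, needs_high_iops=False):
    """Return "EBS" or "FSx ONTAP" following the guidance in this section."""
    if repo_size_tb <= 0:
        raise ValueError("repo_size_tb must be positive")
    if repo_size_tb > EBS_VOLUME_LIMIT_TB:
        return "FSx ONTAP"  # a single EBS volume cannot hold the depot
    if needs_dedup or needs_high_iops or repo_size_tb > LARGE_REPO_TB:
        return "FSx ONTAP"
    return "EBS"  # default: lower cost, simpler setup


if __name__ == "__main__":
    print(recommend_storage(2))
    print(recommend_storage(8))
```

A repository of exactly 5 TB falls through to EBS here; the section does not say which side of the boundary it belongs on, so treat that as an arbitrary choice.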

### Why ALB for Helix Swarm Instead of NLB?

- **Path-based routing**: Multiple services behind single ALB (future extensibility)
- **HTTPS termination**: ALB handles SSL/TLS, Swarm container runs HTTP
- **WebSockets**: ALB supports WebSocket for real-time updates in Swarm UI

**Alternative Considered**: NLB

**Why Not Used**: Operates at Layer 4, so it offers no path-based routing or other HTTP-aware features

### Why Private Subnets for P4 Server?

- **Security**: No direct internet exposure for version control server
- **Compliance**: Meets requirements for protecting source code and assets
- **Controlled access**: Users connect via Site-to-Site VPN, Direct Connect, or AWS Client VPN

**Alternative Considered**: Public subnets with security group restrictions

**Why Not Used**: Increases attack surface, violates least-privilege principle

4. Add Cost Estimation Section (Before "Getting Started")

## Cost Considerations

⚠️ **Perforce infrastructure costs vary significantly based on team size, storage option, and repository size.**

### Cost Breakdown (Single Region, us-east-1)

| Component | Configuration | Notes |
|-----------|---------------|-------|
| **EC2 (P4 Server)** | c5.2xlarge or larger, 24/7 | CPU-intensive for metadata operations |
| **Storage (EBS)** | gp3 volumes, scales with repo size | Default option - lower cost |
| **Storage (FSx ONTAP)** | Scales with asset size | Optional - higher cost but advanced features |
| **EFS (Metadata)** | 100 GB+ storage | Perforce metadata and journals |
| **ECS Fargate (Swarm)** | 2 vCPU, 4 GB RAM, 24/7 | Code review service (optional) |
| **Application Load Balancer** | 1 ALB, 24/7 | HTTPS termination for Swarm (if used) |
| **RDS (Swarm DB)** | db.t3.medium or larger | PostgreSQL for Swarm (if used) |
| **Data Transfer** | Variable based on sync patterns | Client syncs |
| **CloudWatch Logs** | 10 GB+ ingested | Log storage |

### Cost Factors

**Primary cost drivers**: 
- **Storage choice**: EBS (lower cost) vs FSx ONTAP (higher cost with advanced features)
- **Repository size**: Costs scale with TB stored
- **Optional components**: P4 Auth and P4 Code Review add infrastructure costs
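
To see how these drivers combine, a rough monthly figure can be built from an hourly compute rate plus a per-GB storage rate. The sketch below is only arithmetic: the function name is hypothetical, and the rates in the example are placeholders, not AWS prices. Look up real rates in the AWS Pricing Calculator before relying on any result.

```python
HOURS_PER_MONTH = 730  # common approximation: 8760 hours per year / 12


def monthly_cost(instance_hourly_usd, storage_gb, storage_usd_per_gb_month,
                 extras_usd=0.0, hours=HOURS_PER_MONTH):
    """Estimate monthly cost as compute + storage + any flat extras (ALB, logs, ...)."""
    compute = instance_hourly_usd * hours
    storage = storage_gb * storage_usd_per_gb_month
    return round(compute + storage + extras_usd, 2)


if __name__ == "__main__":
    # Placeholder rates for illustration only (NOT real AWS prices).
    print(monthly_cost(instance_hourly_usd=1.0, storage_gb=1000,
                       storage_usd_per_gb_month=0.10))
```

Because storage scales linearly with depot size while compute is fixed per instance, storage tends to dominate as the repository grows, which is why the storage choice is listed first.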

### Cost Optimization

1. **Choose appropriate storage**:
   - Start with EBS for most deployments (lower cost)
   - Upgrade to FSx ONTAP only if you need deduplication or >16 TB volumes

2. **Right-size storage**:
   - Audit current depot size before deployment
   - If using FSx ONTAP, enable deduplication (can save on similar assets)

3. **Implement retention policies**:
   - Archive old projects to S3 Glacier for long-term storage
   - Use `p4 obliterate` for truly obsolete data (irreversible)

4. **Monitor unused depots**:
   - Identify depots not accessed recently
   - Consider archiving to reduce active storage costs

5. **Stop non-production servers**:
   - Dev/test Perforce servers can be stopped outside business hours
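
Stopping dev/test servers outside business hours can be automated with a small scheduled job. The sketch below assumes weekday hours of 08:00 to 19:00 in the server's local time (a made-up schedule) and takes an EC2 client as a parameter; with boto3 this would be `boto3.client("ec2")`, whose `start_instances` and `stop_instances` calls accept `InstanceIds`. The function names are hypothetical.

```python
from datetime import datetime


def should_run(now, start_hour=8, end_hour=19):
    """Return True on weekdays (Mon-Fri) when start_hour <= hour < end_hour."""
    return now.weekday() < 5 and start_hour <= now.hour < end_hour


def reconcile(ec2_client, instance_ids, now):
    """Start the instances during business hours, stop them otherwise."""
    if should_run(now):
        ec2_client.start_instances(InstanceIds=instance_ids)
        return "started"
    ec2_client.stop_instances(InstanceIds=instance_ids)
    return "stopped"


if __name__ == "__main__":
    print(should_run(datetime.now()))
```

Only apply this to non-production servers, and shut down the Perforce service cleanly before the instance stops so that no submit is interrupted.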

**Use [AWS Pricing Calculator](https://calculator.aws) for accurate estimates based on your specific requirements**.

5. Add "Production Considerations" Section (Before "Getting Started")

## Production Considerations

When preparing to deploy this module in a production environment, consider the following:

### Security
- Review and restrict network access to P4 server (VPN, Direct Connect, or specific IP ranges)
- Enable MFA for AWS IAM users with access to Perforce infrastructure
- Configure VPC Flow Logs for network traffic auditing
- Implement regular password rotation for Perforce users
- Enable CloudTrail for API activity logging
- Review security groups to ensure least-privilege access
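
One way to review network access is to audit the CIDR ranges proposed for the P4 Server security group before applying them. The sketch below uses only Python's standard `ipaddress` module; the function names and the `/16` cut-off for "too broad" are assumptions chosen for illustration.

```python
import ipaddress


def is_allowed(client_ip, allowed_cidrs):
    """Return True if client_ip falls inside any of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


def overly_broad(allowed_cidrs, min_prefix=16):
    """Return the CIDRs whose prefix is shorter than min_prefix (e.g. 0.0.0.0/0)."""
    return [c for c in allowed_cidrs
            if ipaddress.ip_network(c).prefixlen < min_prefix]


if __name__ == "__main__":
    rules = ["10.0.0.0/16", "0.0.0.0/0"]
    print(is_allowed("10.0.1.5", rules))
    print(overly_broad(rules))
```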

### High Availability & Disaster Recovery
- Configure automated EFS backups for metadata
- Set up automated snapshots for EBS/FSx volumes (depot data)
- Test restoration procedures from backups
- Document checkpoint and journal recovery procedures
- Consider standby replica server for critical deployments
- Plan for EC2 instance failure recovery

### Monitoring & Observability
- Set up CloudWatch alarms for critical metrics (CPU, disk I/O, and memory; memory metrics require the CloudWatch agent on the EC2 instance)
- Configure billing alerts for unexpected cost increases
- Monitor Perforce server logs for errors and performance issues
- Track sync performance and file access patterns
- Set up alerting for failed submits or replication issues
- Monitor FSx/EBS performance metrics
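
As a starting point for the alarms above, the sketch below builds the arguments for a CPU alarm on the P4 Server instance. The alarm name, the 80% threshold, and the three 5-minute evaluation periods are assumptions; the keys match the parameters of the boto3 CloudWatch `put_metric_alarm` call, to which the dictionary can be passed with `**`.

```python
def cpu_alarm_kwargs(instance_id, threshold_percent=80.0, sns_topic_arn=None):
    """Build keyword arguments for CloudWatch put_metric_alarm (CPU on one instance)."""
    kwargs = {
        "AlarmName": f"p4-server-cpu-high-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # seconds per datapoint
        "EvaluationPeriods": 3,   # alarm after 3 consecutive breaching periods
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }
    if sns_topic_arn:
        kwargs["AlarmActions"] = [sns_topic_arn]  # notify this SNS topic on alarm
    return kwargs


if __name__ == "__main__":
    # With boto3 installed and credentials configured, this could be applied as:
    #   boto3.client("cloudwatch").put_metric_alarm(**cpu_alarm_kwargs("i-..."))
    print(cpu_alarm_kwargs("i-0123456789abcdef0"))
```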

### Performance
- Right-size EC2 instance based on user count and workload
- Monitor and optimize Perforce db.* configurables for your workload
- Consider storage performance (IOPS, throughput) for large binary assets
- Review and optimize depot structure and file organization
- Plan for growth in repository size and user count

### Backup & Data Protection
- Establish checkpoint frequency based on change rate
- Store backups in separate AWS region for disaster recovery
- Test backup restoration procedures regularly
- Document data retention policies
- Configure S3 lifecycle policies for archived data
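
For the S3 lifecycle item, a rule that moves archived data to Glacier after a set number of days might look like the sketch below. The prefix, the 30-day delay, and the function name are assumptions; the returned structure follows the `LifecycleConfiguration` argument of the boto3 S3 `put_bucket_lifecycle_configuration` call.

```python
def glacier_lifecycle(prefix, days_until_glacier=30):
    """Build an S3 lifecycle configuration that transitions objects under prefix to Glacier."""
    if days_until_glacier < 0:
        raise ValueError("days_until_glacier must not be negative")
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days_until_glacier}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_glacier, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


if __name__ == "__main__":
    # With boto3, this could be applied to a bucket you own as:
    #   s3.put_bucket_lifecycle_configuration(
    #       Bucket="<your-bucket>", LifecycleConfiguration=glacier_lifecycle("checkpoints/"))
    print(glacier_lifecycle("checkpoints/"))
```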

### Operations
- Document procedures for common operations (adding users, creating depots, managing permissions)
- Establish Perforce server upgrade and patching schedule
- Train operations team on Perforce administration and AWS Console access
- Define incident response procedures for server failures
- Plan for scaling storage as repository grows

Labels: documentation, perforce

Status: Ready