Docs: Improve modules/unreal/horde/README.md: Add context, comparison, architecture decisions, and costs #787

@gabebatista

What were you searching in the docs?

Context on why to use Horde, when to choose Horde vs Jenkins/TeamCity, architecture decisions, cost estimates, and production considerations.

Is this related to an existing documentation section?

modules/unreal/horde/README.md - proposing new sections to add context around the existing technical documentation.

How can we improve?

The Horde README has good technical documentation but lacks context on WHY to use Horde, WHEN to choose it over Jenkins/TeamCity, WHY specific architecture decisions were made, WHAT the costs are, and WHAT to consider for production. Adding these sections helps Unreal teams make informed decisions.

Got a suggestion in mind?

1. Add "Why Horde for Unreal Engine?" Section (After opening paragraph)

## Why Horde for Unreal Engine?

Horde is Epic Games' build orchestration system specifically designed for Unreal Engine projects.

**vs Jenkins/TeamCity (Generic CI/CD)**:
- **Native Unreal integration**: Built by Epic Games for Unreal workflows
- **Build graph visualization**: Visual representation of Unreal build dependencies
- **Unreal-optimized**: Understands Unreal project structure and build processes
- **Epic support**: Direct integration with Epic's tooling and documentation

**Best For**:
- Pure Unreal Engine projects
- Teams following Epic's recommended build practices
- Projects leveraging Unreal Build Graph extensively
- Studios wanting Epic's official build solution

**Consider Jenkins/TeamCity instead if**:
- Studio has multiple projects on different engines (needs one CI/CD system for all projects)
- Existing Jenkins/TeamCity expertise and infrastructure
- Need to build non-game software alongside game (backend services, tools, websites)
- Require extensive customization beyond Unreal workflows

2. Add "Horde vs Jenkins/TeamCity" Comparison (After "Why Horde?")

## Horde vs Jenkins/TeamCity

| Factor | Horde (This Module) | Jenkins/TeamCity Modules |
|--------|---------------------|-------------------------|
| **Unreal integration** | Native (Epic-built) | Requires plugins/configuration |
| **Build graph visualization** | Excellent Unreal-specific views | Basic/generic views |
| **Unreal workflow** | Optimized for UE projects | Generic CI/CD adapted to UE |
| **Learning curve** | Unreal-specific knowledge needed | Industry-standard CI/CD patterns |
| **Flexibility** | Focused on Unreal builds | Supports any build workflow |
| **Community support** | Unreal community | Broad CI/CD community |
| **Best for** | Pure Unreal projects | Studios with diverse build needs |

**Choose Horde if**:
- ✅ Pure Unreal Engine project (building only an Unreal game)
- ✅ Team familiar with Unreal tooling and build graph
- ✅ Want Epic's recommended build solution
- ✅ Need excellent Unreal build visualization

**Choose Jenkins/TeamCity if**:
- ✅ Studio has multiple projects on different engines (Unity project, Unreal project, etc.)
- ✅ Existing Jenkins/TeamCity expertise
- ✅ Need to build non-game software (backend services, tools, websites)
- ✅ Require extensive CI/CD customization beyond game builds

**Note**: You can use both, running Horde for the Unreal project and Jenkins/TeamCity for other projects and services.

3. Add "Architecture Decisions" Section (After architecture diagram)

## Architecture Decisions

### What is Horde?

Horde consists of two main components:

**Horde Server**: 
- Orchestrates builds and manages agent pools
- Provides web UI for build monitoring and configuration
- Stores build metadata and logs
- Deployed as an ECS Fargate service

**Horde Agents**: 
- Execute Unreal build tasks (compile, cook, package)
- Run on EC2 instances with sufficient CPU/memory for Unreal builds
- Auto-scale based on build queue depth
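
Horde's own autoscaling logic is not described here, so the snippet below is only a sketch of what a queue-depth policy computes. The function name, the `jobs_per_agent` ratio, and the min/max bounds are hypothetical; setting `min_agents` to 0 also covers scaling to zero when the queue is empty.

```python
import math


def desired_agent_count(queued_jobs: int, jobs_per_agent: int,
                        min_agents: int, max_agents: int) -> int:
    """Hypothetical queue-depth policy: one agent per `jobs_per_agent` queued jobs.

    The result is clamped to [min_agents, max_agents]; min_agents=0 lets the
    pool scale to zero when nothing is queued.
    """
    if jobs_per_agent <= 0:
        raise ValueError("jobs_per_agent must be positive")
    if min_agents > max_agents:
        raise ValueError("min_agents must not exceed max_agents")
    needed = math.ceil(queued_jobs / jobs_per_agent)
    return max(min_agents, min(max_agents, needed))


if __name__ == "__main__":
    # 7 queued jobs at 2 jobs per agent -> 4 agents.
    print(desired_agent_count(queued_jobs=7, jobs_per_agent=2, min_agents=0, max_agents=10))
    # Empty queue with min_agents=0 -> 0 agents (scale to zero).
    print(desired_agent_count(queued_jobs=0, jobs_per_agent=2, min_agents=0, max_agents=10))
```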

### Why ECS Fargate for Horde Server?
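- **No server management**: AWS handles the underlying compute infrastructure
- **Cost model**: Fargate bills per second for the vCPU and memory the task requests; the server runs 24/7, so this is a steady baseline cost rather than a per-build cost
- **High availability**: ECS automatically replaces failed tasks, and the service can place tasks across multiple Availability Zones

**Alternative Considered**: EC2 instance for the server

**Why Not Used**: Fargate eliminates server management overhead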

### Why EC2 for Horde Agents (Not Fargate)?
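- **Resource requirements**: Unreal builds need powerful instances (c5.4xlarge or larger)
- **Long-running tasks**: Unreal builds often run 30-120 minutes
- **Spot Instance support**: Build agents can run on Spot capacity, which AWS prices at up to 90% below On-Demand, and interrupted work can be retried on another agent

**Alternative Considered**: ECS Fargate for agents

**Why Not Used**: Fargate caps a single task at 16 vCPU and 120 GB of memory, so agents larger than a c5.4xlarge cannot run on it, and Fargate is less cost-effective than EC2 (especially Spot) for long-running, compute-heavy tasks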
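
The Fargate size limit can be checked against candidate agent sizes in a few lines. The limits (16 vCPU, 120 GB memory per task) and the C5 vCPU/memory figures below are AWS's published values at the time of writing; memory is compared as whole GB without GiB conversion, which is close enough for this coarse check.

```python
# Per-task maximums for AWS Fargate (Linux), as published by AWS at the time of writing.
FARGATE_MAX_VCPU = 16
FARGATE_MAX_MEMORY_GB = 120

# (vCPU, memory) for a few C5 sizes, from the EC2 instance type specifications.
C5_SIZES = {
    "c5.4xlarge": (16, 32),
    "c5.9xlarge": (36, 72),
    "c5.12xlarge": (48, 96),
    "c5.18xlarge": (72, 144),
}


def fits_on_fargate(vcpu: int, memory_gb: int) -> bool:
    """True if a single Fargate task could be sized to match this agent."""
    return vcpu <= FARGATE_MAX_VCPU and memory_gb <= FARGATE_MAX_MEMORY_GB


if __name__ == "__main__":
    for name, (vcpu, memory_gb) in C5_SIZES.items():
        verdict = "fits" if fits_on_fargate(vcpu, memory_gb) else "exceeds"
        print(f"{name}: {vcpu} vCPU, {memory_gb} GB -> {verdict} Fargate limits")
```

Only the c5.4xlarge passes, and it sits exactly at the vCPU ceiling, so any agent sized above it needs EC2.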

### Why S3 for Artifact Storage?
- **Scalability**: Handles large build outputs (GB to TB)
- **Durability**: Designed for 99.999999999% (11 nines) durability for build artifacts
- **Cost-effective**: Lower cost than EBS/EFS for large artifacts
- **Integration**: Horde's storage layer supports S3 as a backend for artifacts

**Alternative Considered**: EFS for artifacts

**Why Not Used**: EFS is more expensive for large-scale artifact storage

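### Why Amazon DocumentDB for the Horde Database?
- **Compatibility**: Horde Server stores its metadata in MongoDB, so the managed option on AWS is Amazon DocumentDB (MongoDB-compatible) rather than RDS
- **Managed service**: AWS handles backups, patching, and high availability
- **Performance**: Dedicated database cluster for Horde metadata
- **Multi-AZ**: Replica instances in other Availability Zones enable automatic failover for production deployments

**Alternative Considered**: Database in an ECS container

**Why Not Used**: Not recommended for production; backup and restore are difficult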

4. Add Cost Estimation Section (Before "Getting Started")

## Cost Considerations

⚠️ **Horde infrastructure costs vary significantly based on build frequency and project complexity.**

### Cost Breakdown (us-east-1)

| Component | Configuration | Notes |
|-----------|---------------|-------|
| **ECS Fargate (Server)** | 2 vCPU, 4GB RAM, 24/7 | Horde orchestration server |
| **EC2 Build Agents** | c5.4xlarge or larger, variable | Primary cost driver - scales with build frequency |
| **Database** | db.t3.medium or larger | Horde metadata storage (MongoDB-compatible, e.g. Amazon DocumentDB) |
| **S3 Storage** | Scales with artifact retention | Build outputs and artifacts |
| **Application Load Balancer** | 1 ALB, 24/7 | HTTPS termination for Horde UI |
| **Data Transfer** | Variable | Artifact downloads |
| **CloudWatch Logs** | 10GB+ ingested | Log storage |

### Cost Factors

**Primary cost driver**: Build agent instances (EC2) and build frequency
- **Rare builds** (nightly only): Lower costs, agents idle most of the time
- **Frequent builds** (per-commit): Higher costs, agents constantly utilized

**Unreal build characteristics**:
- Small projects: 15-30 minute builds
- Medium projects: 30-60 minute builds  
- Large (AAA) projects: 60-120+ minute builds

**Agent instance sizing**: Larger instances = faster builds but higher cost per hour
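
Because agent cost scales with build minutes, a rough monthly estimate is builds per day × minutes per build × days, priced at the agent's hourly rate. The sketch below assumes one build per agent, agents billed only while building (scale-to-zero, no idle time), hypothetical build counts, and a placeholder hourly rate; substitute a real rate for your instance type and Region.

```python
def monthly_agent_hours(builds_per_day: float, minutes_per_build: float,
                        days_per_month: int = 30) -> float:
    """Agent-hours per month, assuming each build occupies one agent for its full duration."""
    return builds_per_day * minutes_per_build * days_per_month / 60


def monthly_agent_cost(builds_per_day: float, minutes_per_build: float,
                       hourly_rate: float, days_per_month: int = 30) -> float:
    """Agent cost per month; idle agent time is not included."""
    return monthly_agent_hours(builds_per_day, minutes_per_build, days_per_month) * hourly_rate


if __name__ == "__main__":
    PLACEHOLDER_RATE = 1.0  # hypothetical USD per agent-hour, not a real EC2 price
    nightly = monthly_agent_hours(builds_per_day=1, minutes_per_build=60)
    per_commit = monthly_agent_hours(builds_per_day=40, minutes_per_build=45)
    print(f"Nightly only: {nightly:.0f} agent-hours/month")
    print(f"Per-commit:   {per_commit:.0f} agent-hours/month")
    print(f"Per-commit at placeholder rate: ${monthly_agent_cost(40, 45, PLACEHOLDER_RATE):.2f}")
```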

### Cost Optimization

1. **Use Spot Instances for build agents**:
   - Up to 90% discount vs On-Demand pricing
   - Build agents can tolerate interruptions: Spot gives a two-minute warning, and the interrupted build step can be restarted on a new agent
   - **Potential Savings**: Significant for high-frequency builds

2. **Right-size build agents**:
   - Monitor build times and resource utilization
   - Balance build speed vs cost (faster instance = shorter runtime but higher hourly cost)

3. **Implement S3 lifecycle policies**:
   - Archive old build artifacts to S3 Glacier
   - Delete very old artifacts after the retention period
   - **Potential Savings**: Variable based on artifact volume

4. **Stop agents when not building**:
   - Auto-scale agents to zero during low-activity periods
   - Schedule builds during business hours if possible, so agents can stay at zero overnight and on weekends
   - **Potential Savings**: Significant for sporadic build patterns

5. **Optimize Unreal build graph**:
   - Parallelize build tasks where possible
   - Cache intermediate build outputs
   - Reduce unnecessary build steps
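
Optimizations 1 and 2 both come down to cost per build: hourly rate times build duration. The sketch below uses placeholder rates and durations (not AWS prices or measured Unreal build times) and a hypothetical `rerun_fraction` for work repeated after Spot interruptions.

```python
def cost_per_build(build_minutes: float, hourly_rate: float,
                   spot_discount: float = 0.0, rerun_fraction: float = 0.0) -> float:
    """Estimated agent cost of one build.

    spot_discount:  fraction off the On-Demand rate (0.6 means 60% off).
    rerun_fraction: extra work repeated after Spot interruptions (0.1 means +10% minutes).
    """
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    effective_rate = hourly_rate * (1.0 - spot_discount)
    effective_minutes = build_minutes * (1.0 + rerun_fraction)
    return effective_minutes / 60.0 * effective_rate


if __name__ == "__main__":
    # Placeholder figures: a larger instance at twice the hourly rate builds in 50 min instead of 90.
    smaller = cost_per_build(build_minutes=90, hourly_rate=2.0)
    larger = cost_per_build(build_minutes=50, hourly_rate=4.0)
    smaller_spot = cost_per_build(build_minutes=90, hourly_rate=2.0,
                                  spot_discount=0.6, rerun_fraction=0.1)
    print(f"Smaller, On-Demand: ${smaller:.2f}")
    print(f"Larger, On-Demand:  ${larger:.2f}")
    print(f"Smaller, Spot:      ${smaller_spot:.2f}")
```

With these numbers the larger instance finishes 40 minutes sooner but costs more per build: doubling the hourly rate only pays off when build time more than halves.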

**Use the [AWS Pricing Calculator](https://calculator.aws) for estimates based on your build frequency and project size.**
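
For the S3 lifecycle policy in optimization 3, the rule can be expressed in the `LifecycleConfiguration` shape accepted by the S3 `PutBucketLifecycleConfiguration` API (for example through boto3's `put_bucket_lifecycle_configuration`). The `builds/` prefix, the 30- and 365-day thresholds, and the bucket name are hypothetical; where this module stores Horde artifacts is not specified here.

```python
def artifact_lifecycle_config(prefix: str, glacier_after_days: int,
                              expire_after_days: int) -> dict:
    """Build an S3 lifecycle configuration that archives, then deletes, build artifacts.

    Objects under `prefix` move to S3 Glacier Flexible Retrieval ("GLACIER")
    after `glacier_after_days` and are deleted after `expire_after_days`.
    """
    # Local sanity check: expiring before (or when) archiving would make the transition pointless.
    if expire_after_days <= glacier_after_days:
        raise ValueError("expire_after_days must be greater than glacier_after_days")
    return {
        "Rules": [
            {
                "ID": "archive-then-expire-build-artifacts",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


if __name__ == "__main__":
    config = artifact_lifecycle_config("builds/", glacier_after_days=30, expire_after_days=365)
    print(config)
    # Applying it requires boto3 and AWS credentials (not run here; bucket name is hypothetical):
    # import boto3
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="my-horde-artifacts", LifecycleConfiguration=config)
```

Objects archived to S3 Glacier Flexible Retrieval are billed for a minimum of 90 days, so archiving artifacts that will be deleted soon afterward can cost more than leaving them in S3 Standard.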

5. Add "Production Considerations" Section (Before "Getting Started")

## Production Considerations

When preparing to deploy this module in a production environment, consider the following:

### Security
- Review and restrict access to Horde web UI (ALB security groups)
- Enable MFA for AWS IAM users with access to Horde infrastructure
- Configure VPC Flow Logs for network traffic auditing
- Implement secret rotation policies for database credentials
- Enable CloudTrail for API activity logging
- Review Horde user permissions and authentication settings

### High Availability & Reliability
- Deploy the database in a Multi-AZ configuration for automatic failover
- Enable automated database backups with an appropriate retention period
- Configure auto-scaling policies for build agents across multiple AZs
- Test disaster recovery procedures (restoring Horde server from backup)
- Document runbooks for common failure scenarios (agent failures, server issues)

### Monitoring & Observability
- Set up CloudWatch alarms for critical metrics (agent CPU/memory, build queue depth)
- Configure billing alerts for unexpected cost increases
- Monitor build success/failure rates
- Track build duration trends to identify performance degradation
- Set up alerting for agent connectivity issues
- Monitor database performance metrics (connections, CPU, IOPS)
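
Tracking build duration trends can start as simply as comparing recent builds with a baseline. The sketch assumes per-build durations are already exported (pulling them from Horde or CloudWatch is not covered here); the 20% threshold is an arbitrary example, and the median is used so a single cold-cache build does not trigger an alert.

```python
from statistics import median


def duration_regressed(baseline_minutes: list[float], recent_minutes: list[float],
                       threshold: float = 0.2) -> bool:
    """True if the median recent build is more than `threshold` slower than the baseline median."""
    if not baseline_minutes or not recent_minutes:
        raise ValueError("need at least one baseline and one recent build")
    return median(recent_minutes) > median(baseline_minutes) * (1.0 + threshold)


if __name__ == "__main__":
    baseline = [40, 42, 41, 45, 39]  # median 41 minutes
    print(duration_regressed(baseline, [52, 50, 55]))  # median 52 > 49.2 -> True
    print(duration_regressed(baseline, [43, 42, 44]))  # median 43 -> False
```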

### Performance
- Right-size EC2 agent instances based on actual build times
- Monitor agent utilization and adjust pool sizes
- Review Unreal build graph for optimization opportunities
- Consider dedicated agents for different build types (editor builds vs shipping builds)
- Monitor S3 upload/download speeds for artifact handling
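
Agent utilization is busy agent-hours divided by provisioned agent-hours over a period. The sketch below turns that ratio into a resize hint; the 30% and 85% thresholds and the example pool are hypothetical starting points, not Horde or AWS recommendations.

```python
def agent_utilization(busy_hours: float, provisioned_hours: float) -> float:
    """Fraction of provisioned agent time spent executing build work."""
    if provisioned_hours <= 0:
        raise ValueError("provisioned_hours must be positive")
    return busy_hours / provisioned_hours


def pool_size_hint(utilization: float, low: float = 0.30, high: float = 0.85) -> str:
    """Suggest shrinking an underused pool or growing a saturated one."""
    if utilization < low:
        return "shrink"
    if utilization > high:
        return "grow"
    return "keep"


if __name__ == "__main__":
    # 4 agents running all month (4 * 720 h) but busy for only 600 h in total.
    u = agent_utilization(busy_hours=600, provisioned_hours=4 * 720)
    print(f"utilization={u:.0%} -> {pool_size_hint(u)}")
```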

### Build Agent Management
- Establish policies for agent lifecycle (launch, updates, termination)
- Plan for Unreal Engine version upgrades on agents
- Document custom agent configurations (installed tools, SDKs)
- Consider using custom AMIs for faster agent provisioning
- Test agent replacement procedures

### Operations
- Document procedures for common operations (adding agents, modifying build configs)
- Establish backup and retention policies for build artifacts
- Plan for Horde version upgrades
- Define incident response procedures for build failures
- Train team on Horde administration and troubleshooting
- Establish SLAs for build completion times
