Replies: 9 comments 4 replies
-
Relevant discussion from HashiCorp discuss: https://discuss.hashicorp.com/t/versioning-modules-in-a-monorepo/17650/2
-
Users of the CGD Toolkit currently fork or clone the repository, and either deploy directly or copy and paste module source into their own IaC projects. Changes to module functionality have been minimal thus far, so version bumps have been inconsequential. Should a user wish to avoid a particular change or implementation of a module, they can cherry-pick commits as they see fit. Some of our end users have already told us that this is the primary way they consume the Toolkit: as a managed Git submodule. As such, it is already possible to consume distinct versions of our modules; it's just that we use git to do this as opposed to the Terraform Registry. As for using the Terraform Registry, I'm not sure how I feel about the complexity of splitting each module into a separate repository. If this is something we actually want to undertake, there is significant scoping required, and it takes time and effort away from delivering valuable updates to the project. I'm game for it, but I have a couple of questions we should answer first:
I'm also a bit unclear on the MCP example. That project is a monorepo that contains source for multiple MCP servers. The issue actually lies with Terraform Registry constraints that tie versioning to a single GitHub repo; plenty of package management tools support releasing distinct versions from within a single repository.
-
If a specific module directly depends on a Packer template and/or Ansible playbook (e.g. it is a prerequisite for successfully deploying resources with the module), I believe that should live within the repo for the Terraform module in question. For Helix Core, for example, depending on how much we want to automate, we could even consider doing the following:
However, it is also a viable option to keep these Packer templates within the root toolkit repo so people can consume them as needed, with the module calling this out as a prerequisite. I am unsure how many potential options there could be for the AMI needed for Perforce, but I'm assuming there is likely a fairly generic one that the module can provide as a default, and users can customize it on their own if they want.
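One way to express that default-with-override pattern in the module (a sketch only; the variable, data source label, and AMI name filter below are hypothetical, not the module's actual interface):

```hcl
# Hypothetical sketch: default to an AMI built from the toolkit's Packer
# template, while letting users supply their own AMI ID instead.
variable "custom_ami_id" {
  description = "Optional user-supplied AMI ID; if null, look up the Packer-built default"
  type        = string
  default     = null
}

data "aws_ami" "packer_default" {
  count       = var.custom_ami_id == null ? 1 : 0
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["p4_al2023-*"] # assumed name prefix produced by the Packer template
  }
}

locals {
  helix_core_ami_id = var.custom_ami_id != null ? var.custom_ami_id : data.aws_ami.packer_default[0].id
}
```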
Yeah, that's one option. I see two paths:

Option 1: The toolkit becomes, as you mentioned, a collection of examples that connect multiple modules together into purpose-built solutions for more complex architectures. It can also contain helper resources such as the existing and future Packer templates and Ansible playbooks, which ideally are used in combination with a linked Terraform module, but could really be used with an IaC tool besides Terraform, or even with ClickOps if a user desired (though not recommended).

Option 2: The Cloud Game Development Toolkit itself becomes its own distinct Terraform module which uses the existing submodules (which become true submodules at that point). Conditional logic and optional parameters/variables would allow users to enable/disable certain functionality. The main downsides of this are that the project becomes more of a monolith, the code becomes substantially more complex, state becomes larger and potentially harder to manage for end users, small changes could hit service quota limits since so much infra could be linked, and the blast radius for deployed resources is increased.

I believe option 1 is more viable in the long term.
Yes, I believe the examples in each module would pertain just to the specific technology. This makes the modules versatile, and the toolkit examples are what add the games lens for more complex implementations that use multiple modules together. On Perforce: that's exactly what I was thinking. I believe Helix Core, Helix Swarm, and Helix Authentication Service should be contained within a single module. In terms of the MCP example, I just meant how the README links to the various versioned packages, which, from what I understand, can be separately installed and used as desired. This is what I am envisioning for the toolkit: instead of Python packages, pointing to the related repos on GitHub or modules in the Terraform Registry. I recommend we use the Terraform Registry and a single git repo per module (this also simplifies the automated testing with GitHub Actions). However, as an alternative to the separate git repos/Terraform Registry:
Perhaps users could reference the directory for the modules and the specific git tag. Something like this:

Perforce:

```hcl
module "perforce" {
  source = "git::https://github.com/aws-games/cloud-game-development-toolkit//modules/perforce?ref=v1.0.0"
}
```

TeamCity:

```hcl
module "teamcity" {
  source = "git::https://github.com/aws-games/cloud-game-development-toolkit//modules/teamcity?ref=v1.0.0"
}
```
-
As more use cases and tools are identified and the number of modules grows, I agree that it will likely make sense to split these out into their own module repositories and manage them independently with separate communities of contributors that are supporting each of them. But that is not a near-term milestone in my opinion.
Currently this would involve subscribing to issue notifications and cherry-picking commits as @henrykie described. This is a tradeoff we tolerate for now because we know there will be frequent breaking changes to all of the modules until they reach a stable state (and we exit alpha). There is definitely room for improvement here, and it would be addressed with module-specific versioning.
-
Gotcha, makes sense. If the end goal is to eventually move towards individual git repos for the Terraform modules, I propose that we do the following:

1. Model Toolkit repo structure to align with future state

In the short term, I recommend that we model the toolkit repo structure to align with the future state of the modules eventually being in separate git repos. That way, if/when we transition in the future, the structure of the toolkit repo will largely stay identical. The only thing that would change is that the modules directory would go away, and each module would git (pun intended) its own repo. This will make the potential transition easier, and allow contributors to familiarize themselves with that structure so future contributions would largely stay the same. The only real difference they would experience is working in a separate git repo for each of the modules, instead of within the toolkit repo.

Example directory structure

2. Consolidate the three Perforce modules into a single Perforce module

This is shown in the example directory structure above. I've already been working on this as a test. Essentially, the single Perforce module could be used to deploy Helix Core, Helix Swarm, and Helix Authentication Service. Conditional logic would be used to optionally create infra. See this example main.tf:

```hcl
module "perforce" {
  source = "../../"
  # source  = "aws-games/perforce/aws"
  # version = "v1.0.0"

  # SHARED
  fully_qualified_domain_name = "novekm.people.aws.dev"
  create_vpc                  = true
  vpc_cidr_block              = "10.0.0.0/16"
  public_subnet_cidrs         = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnet_cidrs        = ["10.0.3.0/24", "10.0.4.0/24"]

  # HELIX CORE
  helix_core_config = {
    resource_prefix = "helix-core"

    # Compute
    lookup_existing_ami      = false
    enable_auto_ami_creation = true
    ami_prefix               = "p4_al2023"
    instance_type            = "c6in.large"
    instance_architecture    = "arm64"
    p4_server_type           = "p4d_commit" # required
    unicode                  = false
    selinux                  = false
    case_sensitive           = true
    plaintext                = false

    # Storage
    storage_type         = "EBS"
    depot_volume_size    = 128
    metadata_volume_size = 32
    logs_volume_size     = 32

    # Networking
    lookup_existing_instance_subnet_id = false
    internal                           = false

    # Security & Auth
    create_helix_core_default_role = true
  }

  # HELIX SWARM
  helix_swarm_config = {
    # parameters
  }

  # HELIX AUTHENTICATION SERVICE
  helix_authentication_service_config = {
    # parameters
  }
}
```

In the module, infra is conditionally created based on whether the related component's variable is set. For example:

```hcl
# Only create this resource if var.helix_core_config exists and is not null
resource "aws_volume_attachment" "helix_core_depot_attachment" {
  count = var.helix_core_config != null ? 1 : 0
  # count = var.helix_core_config.lookup_existing_instance_subnet_id == true ? 1 : 0

  device_name = "/dev/sdh"
  volume_id   = aws_ebs_volume.helix_core_depot[0].id
  instance_id = aws_instance.helix_core_instance[0].id
}
```

3. Support and document interim module versioning

For module versioning in the short term, users could continue to do as you mentioned: monitor GitHub issues and cherry-pick commits. I'm not sure how that is typically achieved, but on the Terraform side, I think they should be able to use the core toolkit git tag that is used for the releases (e.g. `v1.1.3-alpha`). For usage, it would look like:

Perforce:

```hcl
module "perforce" {
  source = "git::https://github.com/aws-games/cloud-game-development-toolkit//modules/perforce?ref=v1.1.3-alpha"
}
```

TeamCity:

```hcl
module "teamcity" {
  source = "git::https://github.com/aws-games/cloud-game-development-toolkit//modules/teamcity?ref=v1.1.3-alpha"
}
```

With the examples, we should also start to reference the module sources in this way instead of using a local source (e.g. `source = "../../"`). This will also ensure that any enablement content we create before the transition is easy to update: all that would really change is the module source in the examples, and some links in the docs.

4. Standardize using EC2 Image Builder within all modules that need to manage container images

Outlined in #558. For now this would be another module within the toolkit.
-
I think both of you make a good point. Having the ability to version the individual modules makes a lot of sense to me, and having the modules available in the TF Registry would help discovery. My concerns are around how splitting the codebase across multiple repositories would affect the management of the project as a whole. Given that the project team is very small at the moment, we need to minimize the effort required to keep the project organized. We should hold off on breaking out the modules until we exit alpha. At that point we should have a better handle on the project and will have automated many of the required tasks (which we can then replicate in the module-specific repositories).
-
I think we could pilot this with the EC2 Image Builder module that @kylesomers is building if we wanted to, but where would that live? In
-
Also, I'd like to see us remove the RFC issue type, @gabebatista, and leverage GitHub Discussions to avoid cluttering the backlog. I'll leave it up to you to determine if we close this and move the conversation elsewhere.
-
I agree that it could wait until we are out of alpha. On where to store the pilot EC2 Image Builder module (and the other Terraform modules in general), I'd say it depends on how much autonomy we want to have. I don't have a strong preference on whether the modules should live in the
To help me understand the concerns better, what are the main complexities and additional overhead that you both envision? From what I gather so far, it's:
Am I off base or missing anything? In my mind, most of the above items can be resolved with user training, clear contribution docs, and automation. Also, as discussed earlier, so far there aren't any net-new Terraform modules planned, so once the modules are moved to separate git repos and initially hosted, I don't envision that adding much either; just incremental updates to those modules/repos based on GitHub issues. What I think I'm hearing, though, is that the additional overhead is not really technical overhead, but project management overhead. If so, what are the key areas that become more difficult there? On the end-user value prop of the TF Registry and individual module versioning, I suppose my question is: how do most Terraform developers consume modules, does this differ for game studios, and do we want to align the toolkit with a typical consumption model or not? From what I've seen, Terraform developers generally fall into one of three buckets in terms of module creation and/or consumption:
I'm not sure if this differs for game studios, but in my mind I'm assuming most users would do one of the first two options (option 1 even more so for newer-to-intermediate Terraform developers): either use the modules as-is by pointing to a public location such as the Terraform Registry, with the ability to easily manage versions, or fork one of the module git repos and work from there, potentially using a git module source that references a version based on git tags (e.g. `source = git::https://github.com/novekm/terraform-aws-perforce?ref=v1.0.0`). In the future, do we anticipate most users looking for ready-to-go modules that they can easily reference, or forking the modules and/or cloning the toolkit repo and making extensive changes to them? Either way, at minimum I think it will help end customers if they can reference a module version, even if in the short term that is achieved by using the git tag for the toolkit itself (e.g. `v1.1.3-alpha`).
-
Is this related to an existing feature request or issue?
No response
What part of the Cloud Game Development Toolkit does this RFC relate to?
Modules
Summary
This RFC is to gain feedback on how versioning of the toolkit currently functions, and to discuss the potential to add versioning to the Terraform modules that are within the toolkit.
Use case
The reasoning behind this suggestion is that currently there is no way to reference a specific version of any of the Terraform modules. We have the version of the toolkit, but there is no link between this and any number of changes that could take place to any of the modules.
Proposal
Currently the toolkit is on `v1.1.3-alpha`, but what does this mean for an end user? This project states: "The Cloud Game Development Toolkit (a.k.a. CGD Toolkit) is a collection of templates and configurations for deploying game development infrastructure and tools on AWS." At its core, these templates are largely Terraform modules, since the goal of the project is to make it easier to deploy infrastructure. There are things like Packer templates and Ansible playbooks as well. These can be consumed standalone; however, it's my understanding that the typical expected usage is in tandem with a Terraform module that provisions the actual infrastructure in AWS.
However, currently there is no way for a user of the toolkit to reference a specific version of any of the Terraform modules. For example, assume there is a request to add SES email support for Perforce Helix Swarm. If and when this is added to the Perforce module, how can a user effectively adopt this new enhancement?
In a standard Terraform module, this would be achieved by referencing the specific version of the module that adds the functionality. Those who do not want the new functionality could continue to use their current version of the module. The same applies to Terraform providers, such as the AWS Provider for Terraform. At the moment, when a new release of the toolkit is cut (e.g. `v1.1.4-alpha`), it can include any number of changes to the Terraform modules (as well as the Packer templates, Ansible playbooks, etc.), some of which are potentially destructive to the Terraform modules. The version of the toolkit has no relation to the version of any of the Terraform modules in a way that a Terraform user can reference.
Recommendations
In addition to the version releases for the toolkit itself, I propose that we start to version the Terraform modules that are being developed in the toolkit. This will allow us to more effectively track changes that are made to these modules, align feature requests and the related completed PRs to the versions, and allow end users to reference specific versions of the modules as desired.
I propose that we standardize using the Terraform Registry and individual git repositories to host all of these modules for the following reasons:
1. Easier visibility, lifecycle management, and contributions
The Terraform Registry is the standard place Terraform developers go to find modules they can use. By using the registry, it is easy for developers to pin specific versions of a module using the registry's remote source. For example:
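A sketch of what that could look like once published (the registry namespace and module name below are hypothetical, since nothing is published to the registry yet):

```hcl
module "perforce" {
  # Hypothetical registry address; would require publishing the module first
  source  = "aws-games/perforce/aws"
  version = "1.0.0" # or a constraint such as "~> 1.0" to pick up patch releases
}
```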
This also means that potential users who have a use case for something in the toolkit but are not game studios can make use of the modules we are actively developing. This can increase the number of community contributions we receive and ultimately lead to advancements in the modules with less reliance on internal contributors.
For a data point, the AWS IAM Identity Center Terraform Module has ~50k downloads, and 2 of the most major changes were submitted by external contributors.
2. Enhanced usage tracking and alignment with project roadmap
Another benefit of the Terraform Registry is the usage metrics they provide. This is a clear data point that can be used to determine the most in demand modules, and prioritize adding new functionality based on this.
For example, take two modules released in the AWS-IA GitHub Org:
The AWS IAM Identity Center module was released almost a full calendar year after the AWS Amplify App module, yet it has almost 50x more downloads in half the time. For prioritization, it is clear which would take precedence if both had a feature request.
Example Usage
Out of scope
Anything that is not a Terraform module.
Potential challenges
A potential challenge is the method used for Terraform module versioning. Typically, a Terraform module is contained in a single git repository, which can then be used as the source when publishing the module to the Terraform Registry. To my understanding, one of the main reasons a single Terraform module should live in a single git repo is git tags: these are what are used to tag commits and release new versions of the module. If we continue down the current path of keeping all Terraform modules within the core Cloud Game Development Toolkit repo, I do not think we will be able to add per-module git tags for the modules in subdirectories. And even if we could, a single Terraform module hosted on the Terraform Registry must be contained in a single git repository. As such, I propose that we break out the modules currently in the toolkit into individual standalone git repositories.
While this has the con of increasing the number of GitHub repos we deal with, I believe the pros outweigh the cons, especially for the end users. The toolkit can then reference these individual git repositories for ease of visibility. For an example, see [AWS Labs MCP](https://github.com/awslabs/mcp?tab=readme-ov-file).
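To make the tradeoff concrete, here is a hedged sketch contrasting the two source styles (the registry address is hypothetical; nothing is published there today):

```hcl
# Today: a git source with a subdirectory, pinned to the toolkit-wide tag
module "perforce_git" {
  source = "git::https://github.com/aws-games/cloud-game-development-toolkit//modules/perforce?ref=v1.1.3-alpha"
}

# After splitting into standalone repos: a registry source with per-module versions
module "perforce_registry" {
  source  = "aws-games/perforce/aws" # hypothetical; requires its own repo (e.g. terraform-aws-perforce)
  version = "1.0.0"
}
```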
Dependencies and Integrations
No response
Alternative solutions