This repo shows how we use Terraform at pipetail.
It serves as our internal reference for external Terraform modules and as a "terraform skeleton" to bootstrap new infrastructure. We also use it for educational purposes as a reference in our workshops.
We hope it also helps anyone out there searching for Terraform "best practices" and inspiration for public cloud infrastructure codebases.
You might want to check out 10 most common mistakes using terraform.
Any feedback & contributions are welcome!
For independent Terraform states / environments / etc. we use a folder layout, rather than a single folder with staging.tfvars, prod.tfvars, and plenty of ifs in the configuration.
We prefer boilerplate over complexity here.
The folder layout also makes it easier to use direnv with AWS_PROFILE credentials and other environment variables we might need.
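For illustration, a per-environment `.envrc` might look like this (the profile name, region, and variable are made-up examples, not taken from this repo):

```shell
# .envrc for a hypothetical prod/ folder -- direnv loads it on `cd`
# into the folder and unloads it when you leave.
export AWS_PROFILE=pipetail-prod   # hypothetical profile from ~/.aws/config
export AWS_REGION=eu-west-1
export TF_VAR_environment=prod     # handy for variables shared across commands
```

Run `direnv allow` once per folder to trust the file.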
This repository uses the pre-commit framework (https://pre-commit.com). Please install the framework and then install all the hooks by invoking:
pre-commit install
The configuration is directly in .pre-commit-config.yaml file. It mainly ensures the following:
- `terraform_fmt` - terraform code formatting
- `terraform_docs` - auto-generated module documentation
- `terraform_validate` - terraform syntax validation
- `terraform_tflint` - terraform linting with custom rules
- `terraform_checkov` - security and compliance scanning
- `shellcheck` / `shfmt` - shell script linting and formatting
- `packer_fmt` - packer file formatting
- `check-merge-conflict` - avoids committing merge conflicts by mistake
- `end-of-file-fixer` - convention for end-of-files (empty line at the end of every file)
- `trailing-whitespace` - deletes trailing whitespace at the end of lines
- `check-yaml` - validates YAML syntax
- `pretty-format-json` - consistent JSON formatting
- `detect-private-key` - avoids committing private keys to git
- `check-added-large-files` - avoids committing large files (>4MB) to git
- `check-case-conflict` - catches filename case conflicts across platforms
- `check-executables-have-shebangs` / `check-shebang-scripts-are-executable` - script permission sanity
- `no-commit-to-branch` - prevents accidental direct commits to `master` and `main`
- `check-github-actions` / `check-github-workflows` - validates GitHub Actions workflow schema
- `opa-fmt` - OPA/Rego policy file formatting
- `conftest-verify` - runs unit tests for custom Conftest policies
- `conftest-terraform` - validates Terraform files against custom OPA policies
It is possible to manually run all checks on all files using
pre-commit run --all-files
However, pre-commit hooks are going to run automatically every time you try to git commit. The hook will run only on the files that changed within the commit itself, not on all files.
Please always run (pre-commit does this for you):
terraform fmt -recursive .
in the root of your repo.
This command rewrites Terraform configuration files to a canonical format and style.
It's prettier. It doesn't trigger your colleagues' OCDs anymore. Just do it. No arguing.
We use the terraform dependency lock file to track terraform provider dependencies and verify their checksums.
Unlike in many repos, this file is versioned in git rather than .gitignored.
We lock multiple platforms:
terraform providers lock \
-platform=windows_amd64 \
-platform=darwin_amd64 \
-platform=linux_amd64 \
-platform=darwin_arm64 \
-platform=linux_arm64
The terraform-lock.yaml workflow automatically updates lock files when provider versions change in PRs. It runs terraform providers lock for all platforms and commits the updated lock files back to the PR branch.
We use renovate to manage all our dependencies.
Since we prefer pinning our dependencies to certain versions (as opposed to using something like :latest, etc.), we still need an "upgrade strategy". Instead of manually checking for newer versions, changelogs and creating PRs to upgrade each of the dependencies, we have this automated.
That's where renovate comes into play.
Renovate is configured by renovate.json. Key features of our configuration:
- GitHub Action digest pinning for supply chain security
- Grouped PRs - Terraform providers and modules are grouped to reduce PR noise
- Automerge for low-risk updates (provider patch versions, action digest updates)
- Custom regex managers for tracking EKS/Kubernetes versions in Terraform variables
- Lock file maintenance scheduled weekly to keep dependency metadata fresh
- Separate major/minor/patch updates so breaking changes are clearly visible
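A trimmed sketch of what such a configuration might look like (the presets, groupings, and schedule below are illustrative, not a copy of our renovate.json):

```json
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:recommended", "helpers:pinGitHubActionDigests"],
  "packageRules": [
    {
      "matchManagers": ["terraform"],
      "groupName": "terraform providers and modules"
    },
    {
      "matchManagers": ["terraform"],
      "matchUpdateTypes": ["patch"],
      "automerge": true
    }
  ],
  "lockFileMaintenance": {
    "enabled": true,
    "schedule": ["before 6am on monday"]
  }
}
```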
Renovate scans all files in the default branch, looking for dependencies and their versions. It looks through Terraform files, Dockerfiles, etc., and when it finds that a newer version of something is available, it creates a Pull Request bumping the version, including the changelog, and so on.
We run all GitHub Actions checks to validate, test, and `terraform plan` the changes, and when it is safe to upgrade, we simply merge the PR.
The terraform-lock.yaml workflow automatically updates lock files when Renovate (or any PR) changes provider versions, since Renovate doesn't handle this natively.
This .gitignore is a template we use in all our git repos where terraform is used.
There are several GitHub Actions workflows:
- `precommit.yaml` - checks everything with pre-commit in Pull Requests, since some people might "forget" to use it :))
- `terraform-validate.yaml` - runs `terraform validate` on everything
- `terraform-plan-*.yaml` - runs `terraform plan` for all folders in PRs
- `terraform-apply-*.yaml` - runs `terraform apply` for all approved plans from PRs (approved == merged PR)
- `periodic-terraform-apply-*.yaml` - aka "poor man's gitops": periodically applies what is in the default branch; can also be triggered manually (useful when terraform-apply workflows fail due to issues with previous terraform plans, etc.)
- `terraform-lock.yaml` - automatically updates `.terraform.lock.hcl` files for all platforms when provider versions change in PRs
- `terraform-state-unlock.yaml` - scheduled workflow (daily 2 AM) that detects and removes stale S3 state locks (>4 hours old); also supports manual unlock via workflow_dispatch
- `terraform-drift-detection.yaml` - scheduled workflow (twice daily at 8 AM and 4 PM UTC) that runs `terraform plan` on all environments to detect configuration drift
- `packer-build.yaml` - reusable workflow for building AMIs with Packer
- `packer-wireguard-04.yaml` - builds the WireGuard VPN AMI when Packer files change in example 04
- `update-bottlerocket-ami.yaml` - weekly check for new Bottlerocket AMI releases; creates a PR to update the pinned version
- `scheduled-scale-in.yaml.example` / `scheduled-scale-out.yaml.example` - example workflows for scaling down non-prod resources on evenings/weekends and scaling back up on Monday morning
- `package-lambdas.yaml` - automatically packages Lambda functions when source code changes in PRs; commits updated zip files back to the branch
- `lambda-deploy.yaml` - manual workflow dispatch to build, upload, and deploy Lambda functions to S3 and optionally update the function code
All GitHub Actions are pinned to full commit digests (not tags) for supply chain security.
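For example, a step pinned to a commit digest rather than a tag (the SHA below is a zero-filled placeholder; pin to the real 40-character commit SHA of the release you want):

```yaml
steps:
  - uses: actions/checkout@0000000000000000000000000000000000000000 # v4
```

The trailing comment keeps the human-readable version visible, and Renovate updates both the digest and the comment together.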
The terraform-drift-detection.yaml workflow automatically detects when infrastructure has been modified outside of Terraform (e.g., via AWS Console or CLI). This is important because:
- Unexpected changes: Someone may have made emergency fixes directly in AWS that need to be captured in code
- Security: Unauthorized or accidental changes should be detected and reviewed
- State consistency: Drift can cause future `terraform apply` runs to behave unexpectedly
The workflow uses terraform plan -detailed-exitcode where exit code 2 indicates drift. When drift is detected:
- A summary is posted to the GitHub Actions step summary
- A GitHub issue is automatically created (with label `terraform-drift`) to track resolution
- The issue links to the workflow run for detailed plan output
The workflow can also be triggered manually via workflow_dispatch for on-demand drift checks. Tool versions in CI workflows are explicitly pinned for reproducibility.
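The exit-code handling can be sketched as a small shell helper (a simplified illustration with a hypothetical function name, not the actual workflow script):

```shell
# classify_plan_exit: interpret `terraform plan -detailed-exitcode` results.
# 0 = no changes, 1 = plan failed, 2 = changes present (i.e. drift, when
# run against an untouched default branch).
classify_plan_exit() {
  case "$1" in
    0) echo "no-drift" ;;
    2) echo "drift-detected" ;;
    *) echo "plan-error" ;;
  esac
}

# Hypothetical usage in a workflow step:
#   terraform plan -detailed-exitcode -input=false > /dev/null
#   status=$(classify_plan_exit $?)
```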
The packer-build.yaml is a reusable workflow for building custom AMIs with Packer. It provides:
- OIDC authentication for secure AWS access
- Validation on PRs - runs `packer validate` to catch errors before merge
- Build on merge - builds the AMI when changes are pushed to master
- Bot commit detection - skips builds triggered by automated commits to prevent loops
- Step summary - outputs the built AMI ID to GitHub Actions summary
Example 04 (WireGuard VPN) uses this pattern via packer-wireguard-04.yaml. To add Packer CI for other examples, create a caller workflow that references the reusable workflow with appropriate inputs.
The scheduled-scale-in.yaml.example and scheduled-scale-out.yaml.example workflows demonstrate how to reduce non-production costs by scaling down resources on evenings/weekends and scaling back up before business hours. They use targeted terraform apply with variable overrides to adjust Aurora reader replica counts (or any other autoscaling target) on a cron schedule. Copy and customize for your environments.
The update-bottlerocket-ami.yaml workflow runs weekly (Monday 8 AM UTC) to check for new Bottlerocket AMI releases. It queries the AWS SSM public parameter for the latest AMI ID, compares it against the version pinned in Terraform, and creates a PR with the updated AMI ID when a new version is available. This ensures EKS nodes run on the latest Bottlerocket release with security patches and bug fixes while still going through the standard PR review and terraform plan process.
Lambda functions live in src/<lambda-name>/ directories with an index.mjs (or index.js) entry point. Two workflows handle the build and deploy lifecycle:
- `package-lambdas.yaml` runs automatically on PRs when Lambda source code changes. It uses `scripts/package-lambdas.sh` to create reproducible zip packages (normalized timestamps, deterministic file ordering) and commits the updated `.zip` files back to the PR branch. If the packaging script itself changes, all Lambdas are repackaged.
- `lambda-deploy.yaml` is a manual `workflow_dispatch` workflow for deploying a Lambda to a target environment. It packages the function, uploads the zip to an S3 artifacts bucket, and optionally updates the Lambda function code via the AWS CLI. The S3 bucket and region are configurable per invocation.
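The reproducibility trick can be sketched like this (a hedged illustration with a made-up function name; the real logic lives in scripts/package-lambdas.sh):

```shell
# package_lambda DIR OUT: build a deterministic zip of DIR at OUT.
# Fixed mtimes + sorted file order + `zip -X` (strip extra OS metadata)
# mean unchanged sources always yield a byte-identical archive, so
# repackaging produces no spurious git diffs.
package_lambda() {
  src=$1 out=$2
  ( cd "$src" &&
    find . -type f -exec touch -t 198001010000 {} + &&
    find . -type f | LC_ALL=C sort | zip -X -q "$out" -@ )
}

# Hypothetical usage (OUT should be an absolute path, since we cd):
#   package_lambda src/my-lambda "$PWD/build/my-lambda.zip"
```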
tflint is a pluggable linter for Terraform. We use the tflint-ruleset-aws plugin to catch AWS-specific issues (invalid instance types, missing tags, deprecated resources) before they reach terraform plan. Configuration is in .tflint.hcl files per environment.
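A minimal `.tflint.hcl` enabling the AWS ruleset might look like this (the pinned version is illustrative, not what this repo actually pins):

```hcl
plugin "aws" {
  enabled = true
  version = "0.38.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}
```

Run `tflint --init` once to download the plugin, then `tflint` in the environment folder.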
Checkov is an amazing tool to lint Terraform (and other) resources; we use the unofficial pre-commit hook by antonbabenko.
We use Conftest (built on OPA/Rego) to enforce custom rules on Terraform files that can't be caught by tflint or checkov. Policies live in conftest-policies/.
Current policies:
- JSON policy enforcement (`json_policy.rego`) - Bans `data "aws_iam_policy_document"` data sources and raw JSON heredoc strings in policy fields across all AWS resource types (IAM, S3, SNS, SQS, KMS, ECR, OpenSearch, CloudWatch Logs, etc.). Use `jsonencode()` instead.
- S3 lifecycle rule prefix validation (`s3_lifecycle.rego`) - Prevents placing `prefix` as a top-level key in S3 lifecycle rules instead of inside a `filter` block. The wrong syntax causes the expiration rule to apply to ALL objects in the bucket, not just the intended prefix.
Running manually:
# Test a specific file
conftest test --parser hcl2 --policy conftest-policies/ examples/05-aws-complete/storage.tf
# Run policy unit tests
conftest verify --policy conftest-policies/

Adding new policies:
- Create a `.rego` file in `conftest-policies/` with `deny` rules
- Add tests in a corresponding `_test.rego` file
- Run `conftest verify --policy conftest-policies/` to validate
- Run `opa fmt -w conftest-policies/` to format
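A new policy might look roughly like this (a hypothetical skeleton, not one of the repo's actual policies; the exact input shape depends on how conftest's hcl2 parser flattens blocks, so treat the field access as illustrative):

```rego
# conftest-policies/require_tags.rego -- hypothetical example policy
package main

deny[msg] {
	some type, name
	resource := input.resource[type][name]
	not resource.tags
	msg := sprintf("%s.%s has no tags", [type, name])
}
```

A matching `require_tags_test.rego` would then assert `deny` membership using `with input as` overrides, and `conftest verify` runs those assertions.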
Use `jsonencode()` for all AWS policy fields -- not just IAM, but also S3 bucket policies, KMS key policies, SNS/SQS policies, ECR repository policies, OpenSearch access policies, etc. Do NOT use `data "aws_iam_policy_document"` or raw JSON heredoc strings.
This is enforced by conftest-policies/json_policy.rego across all examples and modules.
Good -- inline `jsonencode()`:

```hcl
resource "aws_iam_policy" "example" {
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject"]
      Resource = ["arn:aws:s3:::bucket/*"]
    }]
  })
}

resource "aws_s3_bucket_policy" "example" {
  bucket = aws_s3_bucket.example.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = "*"
      Action    = ["s3:GetObject"]
      Resource  = ["${aws_s3_bucket.example.arn}/*"]
    }]
  })
}
```

Bad -- `data "aws_iam_policy_document"` data source:

```hcl
data "aws_iam_policy_document" "example" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::bucket/*"]
  }
}
```

Bad -- raw JSON heredoc:

```hcl
resource "aws_iam_policy" "example" {
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject"],
    "Resource": ["arn:aws:s3:::bucket/*"]
  }]
}
EOF
}
```

Rationale: `jsonencode` keeps the policy definition inline with the resource, is more readable, avoids extra data source lookups, and produces cleaner `terraform plan` output.
Example 05 shows production-grade Aurora PostgreSQL using the community terraform-aws-modules/rds-aurora/aws module with KMS encryption, Performance Insights, CloudWatch log exports, 35-day backup retention, and S3 lifecycle tiering. See examples/05-aws-complete/database.tf.
VPC Flow Logs capture network traffic metadata for security analysis, troubleshooting, and compliance. Example 05 enables flow logs using the VPC module's built-in support, sending logs to CloudWatch Logs with KMS encryption and 90-day retention. See examples/05-aws-complete/networking.tf.
We use S3 native locking with `use_lockfile = true` (requires Terraform 1.10+). This eliminates the need for a separate DynamoDB table for state locking.
Example backend configuration:
```hcl
terraform {
  backend "s3" {
    bucket       = "my-terraform-state"
    key          = "infrastructure"
    region       = "eu-west-1"
    use_lockfile = true
    encrypt      = true
  }
}
```

The aws-bootstrap module still supports creating a DynamoDB table for backwards compatibility via `create_dynamodb_table = true`, but this is no longer the default.
The terraform-state-unlock.yaml workflow runs daily to detect and remove stale locks (locks older than 4 hours are considered stale and are automatically removed). Manual unlock is also available via workflow_dispatch for emergency situations.
We use a dedicated migrations.tf file per workspace for all state migrations (moved {}, import {}, removed {} blocks). This replaces manual terraform state mv and terraform import commands, which are error-prone and not reviewable in PRs.
Benefits:
- Migrations are versioned in git and go through the normal PR review process
- They are applied automatically during `terraform apply`; no manual steps
- Applied `moved` blocks are harmless no-ops and serve as a refactoring history
- `import` blocks can be removed after they have been applied to all environments
See examples/05-aws-complete/migrations.tf for patterns including resource renames, module extractions, for_each key changes, and resource type upgrades.
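For illustration, the three block types look like this (the resource addresses and the zone ID are made up):

```hcl
# migrations.tf -- declarative state operations, reviewed like any other code

# rename/move a resource: replaces `terraform state mv`
moved {
  from = aws_s3_bucket.logs
  to   = module.logging.aws_s3_bucket.this
}

# adopt an existing cloud resource: replaces `terraform import`
import {
  to = aws_route53_zone.main
  id = "Z0123456789EXAMPLE"
}

# drop a resource from state without destroying it (Terraform 1.7+)
removed {
  from = aws_cloudwatch_log_group.legacy
  lifecycle {
    destroy = false
  }
}
```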
We keep an `.envrc` in every folder, using includes plus the correct `AWS_PROFILE`.
We use tfenv to manage multiple terraform versions on our local workstations.
What kind of infra would it be if it weren't sprinkled with some shell scripts?
Shellcheck is awesome to lint your scripts. That's why we use it in pre-commit.
Since we specify variables descriptions and types, it is easy to generate terraform documentation for all our modules:
terraform-docs markdown . > README.md
This is useful to others and takes no effort on our side. So far we do this manually; automating it and running it in pre-commit would be far better. Consider this a TODO.
Reusable Terraform modules in modules/:
| Module | Description |
|---|---|
| `aws-bootstrap` | S3 backend + optional DynamoDB table for state management |
| `certificate` | ACM certificate with DNS validation |
| `cloudtrail` | Multi-region CloudTrail with S3 storage, CloudWatch Logs, KMS encryption, and lifecycle rules |
| `cluster-autoscaler` | Kubernetes Cluster Autoscaler with IRSA |
| `eks` | EKS cluster with managed/self-managed node groups |
| `github-oidc` | GitHub Actions OIDC provider + IAM role |
| `kms` | Shared KMS key with key rotation, CloudWatch Logs and CloudTrail encryption |
| `wireguard-ec2` | WireGuard VPN on EC2 with Packer AMI |
Basically just this:
- `snake_case` in terraform resource names (there is no convention for cloud resource names; often we use `camel-case`)
- don't repeat resource types in names; `resource "aws_route_table" "public_route_table"` is ugly and long

These are (partially) enforced by tflint.
special thanks to