The project is organized into several key components:
- `./cmd`: Contains the main CLI tool (Go implementation)
- `./lib`: Common Go libraries and utilities
- `./python-pulumi`: Python package with Pulumi infrastructure-as-code resources
- `./examples`: Example configurations for control rooms and workloads
- `./e2e`: End-to-end tests
- `./docs`: Documentation (see `docs/README.md` for structure)
  - `./docs/cli`: CLI reference documentation
  - `./docs/team-operator`: Team Operator documentation
  - `./docs/guides`: How-to guides for common tasks
  - `./docs/infrastructure`: Infrastructure documentation
- `./Justfile`: Command runner file with various tasks (`just -l` to list commands)
The Team Operator is a Kubernetes operator that manages the deployment and configuration of Posit Team products within a Kubernetes cluster. It is maintained in a separate public repository: posit-dev/team-operator.
PTD consumes the Team Operator via its public Helm chart at oci://ghcr.io/posit-dev/charts/team-operator.
Testing with adhoc images: PR builds from posit-dev/team-operator publish adhoc images to GHCR. To test:
```yaml
# In ptd.yaml cluster spec
adhoc_team_operator_image: "ghcr.io/posit-dev/team-operator:adhoc-{branch}-{version}"
```

The PTD CLI uses Viper for configuration management. Configuration can be set via:
- CLI flags: Highest precedence (e.g., `--targets-config-dir`)
- Environment variables: Second precedence (e.g., `PTD_TARGETS_CONFIG_DIR`)
- Config file: Third precedence (`~/.config/ptd/ptdconfig.yaml`)
- Defaults: Lowest precedence
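The precedence chain can be illustrated with a small stand-in resolver. This is a hypothetical Python sketch of the lookup order only, not the actual Viper implementation (which is Go):

```python
import os

def resolve(key, flags, config, defaults):
    """Resolve one setting the way the precedence list describes:
    CLI flag > PTD_-prefixed env var > config file > default."""
    if key in flags:                    # 1. CLI flag wins
        return flags[key]
    env_key = "PTD_" + key.upper()      # 2. then the PTD_-prefixed env var
    if env_key in os.environ:
        return os.environ[env_key]
    if key in config:                   # 3. then ~/.config/ptd/ptdconfig.yaml
        return config[key]
    return defaults.get(key)            # 4. finally the built-in default

os.environ["PTD_TARGETS_CONFIG_DIR"] = "/from/env"
print(resolve("targets_config_dir", flags={},
              config={"targets_config_dir": "/from/file"}, defaults={}))
# the env var shadows the config file here
```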
PTD expects target configurations in a targets directory. Configure it via:
```yaml
# ~/.config/ptd/ptdconfig.yaml
targets_config_dir: /path/to/your/targets
```

Or via environment variable:

```shell
export PTD_TARGETS_CONFIG_DIR=/path/to/your/targets
```

Or via CLI flag:

```shell
ptd --targets-config-dir /path/to/your/targets ensure workload01
```

The targets configuration directory must contain:
- `__ctrl__/`: Control room configurations
- `__work__/`: Workload configurations
See examples/ for example configurations.
The Go CLI communicates the infrastructure path to Python Pulumi stacks via the PTD_ROOT environment variable:
- Go: Sets `PTD_ROOT` in `lib/pulumi/python.go` when invoking Python
- Python: Reads `PTD_ROOT` in `python-pulumi/src/ptd/paths.py`
- Tests: Python tests must set `PTD_ROOT` via `monkeypatch.setenv()`
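On the Python side, the lookup amounts to reading the variable and failing loudly when it is absent. A minimal sketch (the real lookup lives in `python-pulumi/src/ptd/paths.py`; the function name here is illustrative):

```python
import os
from pathlib import Path

def ptd_root():
    """Return the infrastructure root exported by the Go CLI, or fail clearly."""
    root = os.environ.get("PTD_ROOT")
    if root is None:
        raise RuntimeError(
            "PTD_ROOT is not set; it is normally exported by the Go CLI "
            "(lib/pulumi/python.go) or by monkeypatch.setenv() in tests"
        )
    return Path(root)
```

This is why pytest fixtures must call `monkeypatch.setenv("PTD_ROOT", ...)` before any code touches paths.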
- `just deps`: Install dependencies
- `just check`: Check all (includes linting and formatting)
- `just test`: Test all
- `just build`: Build all
- `just format`: Run automatic formatting
- `just check-python-pulumi`: Check Python Pulumi code
- `just build-cmd`: Build command-line tool
- `just test-cmd`: Test command-line tool
- `just test-e2e`: Run end-to-end tests (requires URL argument)
- `just test-lib`: Test library code
- `just test-python-pulumi`: Test Python Pulumi code
- `just aws-unset`: Unset all AWS environment variables
Always use git worktrees instead of plain branches. This enables concurrent Claude sessions in the same repo.
This repo is expected to live at ptd-workspace/ptd/. The ../.worktrees/ relative path resolves to ptd-workspace/.worktrees/ in that layout.
```shell
# New branch
git worktree add ../.worktrees/ptd-<branch-name> -b <branch-name>

# Existing remote branch
git worktree add ../.worktrees/ptd-<branch-name> <branch-name>
```

Always prefix worktree directories with `ptd-` to avoid collisions with other repos.
- Build the binary — each worktree needs its own `ptd` binary:

  ```shell
  cd ../.worktrees/ptd-<branch-name>
  just build-cmd
  ```

- direnv — if direnv is available, copy `envrc.recommended` to `.envrc` in the worktree, then run `direnv allow`. The file uses `source_up` to inherit workspace vars and overrides `PTD` to point to the worktree.
- For agents without direnv — set env vars explicitly before running `ptd` commands:

  ```shell
  export PTD="$(pwd)"
  export PATH="${PTD}/.local/bin:${PATH}"
  ```
```shell
# From the main checkout
git worktree remove ../.worktrees/ptd-<branch-name>
```

- NEVER use `git checkout -b` for new work — always `git worktree add`
- NEVER put worktrees inside the repo directory — always use `../.worktrees/ptd-<name>`
- ALWAYS rebuild the binary after creating a worktree (`just build-cmd`)
- Branch names: kebab-case, no slashes, no usernames (slashes break worktree directory paths)
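The branch-name rule can be checked mechanically. A hypothetical helper (not part of the repo) that accepts only kebab-case names:

```python
import re

def valid_worktree_branch(name):
    """kebab-case only: lowercase alphanumeric words joined by single hyphens,
    no slashes (slashes break worktree directory paths)."""
    return re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", name) is not None
```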
Pod alerts (PodError, CrashLoopBackoff, DeploymentReplicaMismatch, etc.) are scoped to a minimal namespace allowlist to prevent false alerts from customer-deployed workloads:
Monitored Namespaces:
- Application: `posit-team`, `posit-team-system` (direct customer impact)
- Observability: `alloy`, `mimir`, `loki`, `grafana` (failures cause monitoring blindness)

PromQL Filter: `{namespace=~"posit-team|posit-team-system|alloy|mimir|loki|grafana"}`
Why Infrastructure Namespaces Are Excluded: Infrastructure namespaces (Calico, Traefik, kube-system) are excluded because their failures manifest as application failures, avoiding redundant alerts. For example:
- CNI failure → Network breaks → Application pods fail → Alert fires for application namespace
- Ingress failure → HTTP checks fail → `Healthchecks` alert fires
Alert Configuration: Alert definitions are in python-pulumi/src/ptd/grafana_alerts/*.yaml. All pod-related alerts in pods.yaml include the namespace filter in their PromQL queries.
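Keeping the allowlist in one place and generating the PromQL matcher from it avoids drift between alerts. An illustrative sketch (the real filter is maintained directly in the alert YAML):

```python
MONITORED_NAMESPACES = [
    "posit-team", "posit-team-system",    # application: direct customer impact
    "alloy", "mimir", "loki", "grafana",  # observability: failures cause monitoring blindness
]

def namespace_filter():
    """Build the PromQL label matcher used by pod-level alert queries."""
    return '{namespace=~"' + "|".join(MONITORED_NAMESPACES) + '"}'
```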
When contributing to the project:
- Ensure that Snyk tests pass before merging a PR
- Follow the development workflows described in the repository files
- Use the provided Justfiles for common tasks
- Always run `just format` before committing changes to ensure code style consistency
- LLM coding instructions shared with copilot: .github/copilot/copilot-instructions.md
- Follow the template in .github/pull_request_template.md to format PR descriptions correctly
Brief pointer section:
- Config Flow: How YAML config flows through Go to Python → See `docs/architecture/config-flow.md`
- Step Dependencies: Deployment pipeline ordering and why → See `docs/architecture/step-dependencies.md`
- Pulumi Conventions: Resource naming, Output handling, autoload pattern → See `docs/architecture/pulumi-conventions.md`
- NEVER change the first argument (logical name) to a Pulumi resource constructor without understanding state implications
- Changing `aws.s3.Bucket("my-bucket-name", ...)` to `aws.s3.Bucket("different-name", ...)` causes Pulumi to DELETE the old bucket and CREATE a new one
- This applies to ALL resources: VPCs, RDS instances, S3 buckets, IAM roles, EKS clusters, etc.
- If you need to rename a resource, discuss the state migration strategy first
- Adding/modifying a config option requires changes in BOTH:
  - Go: Struct in `lib/types/workload.go` (with YAML struct tags)
  - Python: Dataclass in `python-pulumi/src/ptd/aws_workload.py` or `python-pulumi/src/ptd/__init__.py`
- Field names must match: Go YAML tags (snake_case) = Python dataclass field names
- There is no automated validation between the two — mismatches fail at runtime
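Because nothing validates the two sides automatically, a runtime guard on the Python side can at least surface unknown keys instead of silently dropping them. A sketch with hypothetical field names (the real dataclasses live in `python-pulumi/src/ptd/`):

```python
from dataclasses import dataclass, fields

@dataclass
class WorkloadConfig:
    # Field names mirror the Go struct's YAML tags (snake_case), e.g. `yaml:"instance_type"`
    instance_type: str
    node_count: int

def from_yaml_dict(data):
    """Fail fast on keys the dataclass doesn't know about."""
    known = {f.name for f in fields(WorkloadConfig)}
    unknown = set(data) - known
    if unknown:
        raise ValueError(f"config keys with no matching dataclass field: {sorted(unknown)}")
    return WorkloadConfig(**data)
```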
- `AWSEKSCluster` uses a builder pattern where `with_*()` methods have ordering dependencies
- Example: `with_node_role()` MUST be called before `with_node_group()` (sets `self.default_node_role`)
- Check method dependencies before reordering calls
AWS:
- IAM roles: `f"{purpose}.{compound_name}.posit.team"`
- S3 buckets: `f"{compound_name}-{purpose}"`
- EKS clusters: `f"default_{compound_name}-control-plane"`
- All naming methods are on the `AWSWorkload` class in `python-pulumi/src/ptd/aws_workload.py`
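The AWS patterns reduce to simple f-strings. A sketch with hypothetical helper names, mirroring the patterns above (the real methods are on `AWSWorkload`):

```python
def iam_role_name(purpose, compound_name):
    return f"{purpose}.{compound_name}.posit.team"

def s3_bucket_name(compound_name, purpose):
    return f"{compound_name}-{purpose}"

def eks_cluster_name(compound_name):
    return f"default_{compound_name}-control-plane"
```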
Azure:
- Resource Groups: `f"rsg-ptd-{sanitized_name}"`
- Key Vault: `f"kv-ptd-{name[:17]}"` (max 24 chars)
- Storage Accounts: `f"stptd{name_no_hyphens[:19]}"` (NO hyphens, max 24 chars)
- VNets: `f"vnet-ptd-{compound_name}"`
- All naming methods are on the `AzureWorkload` class in `python-pulumi/src/ptd/azure_workload.py`
- Azure tags must use `azure_tag_key_format()`, which converts `.` to `/`
Do NOT introduce new naming patterns — follow existing conventions
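The Azure character limits are why the truncation indices look arbitrary: each prefix plus its slice lands exactly at the 24-char cap. A sketch with hypothetical helper names following the patterns above (the real methods are on `AzureWorkload`):

```python
def key_vault_name(name):
    # Key Vault names are capped at 24 chars: "kv-ptd-" (7) + 17 = 24
    return f"kv-ptd-{name[:17]}"

def storage_account_name(name):
    # Storage accounts forbid hyphens: "stptd" (5) + 19 = 24
    name_no_hyphens = name.replace("-", "")
    return f"stptd{name_no_hyphens[:19]}"

def azure_tag_key(key):
    # Azure tag keys cannot contain ".", so convert to "/"
    return key.replace(".", "/")
```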
Go generates `__main__.py` dynamically (see `lib/pulumi/python.go:WriteMainPy`):

```python
import ptd.pulumi_resources.<module>
ptd.pulumi_resources.<module>.<Class>.autoload()
```

- Module: `{cloud}_{target_type}_{step_name}` (e.g., `aws_workload_persistent`)
- Class: `{Cloud}{TargetType}{StepName}` (e.g., `AWSWorkloadPersistent`)
- `__main__.py` is NOT in source control — it's generated at runtime
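The naming convention can be sketched in Python (the real generator is Go, in `lib/pulumi/python.go`; the per-cloud casing map is an assumption based on the `AWSWorkloadPersistent` example):

```python
CLOUD_CASING = {"aws": "AWS", "azure": "Azure"}  # assumed casing per cloud

def camel(s):
    """snake_case -> CamelCase, e.g. "postgres_config" -> "PostgresConfig"."""
    return "".join(part.title() for part in s.split("_"))

def main_py(cloud, target_type, step_name):
    """Render the __main__.py body for a given cloud/target/step triple."""
    module = f"{cloud}_{target_type}_{step_name}"
    cls = f"{CLOUD_CASING[cloud]}{camel(target_type)}{camel(step_name)}"
    return (f"import ptd.pulumi_resources.{module}\n"
            f"ptd.pulumi_resources.{module}.{cls}.autoload()\n")
```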
AWS (EKS):
- Uses builder pattern with `with_*()` methods
- Builder methods have ordering dependencies (e.g., `with_node_role()` must come before `with_node_group()`)
- EKS step is Python-based (`AWSEKSCluster` class)
Azure (AKS):
- AKS step is Go-based (`lib/steps/aks.go`), unlike the Python-based EKS step
- Azure persistent resources use simple `_define_*()` methods (no builder pattern)
- No ordering dependencies between `_define_*()` methods
- Resource properties return `Output[T]`, not plain values
- Use `.apply(lambda x: ...)` to transform; cannot use in f-strings directly
- Combine with `pulumi.Output.all(a, b).apply(lambda args: ...)`
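Why f-strings fail: an `Output` is a promise, not a value, so only `.apply()` can see inside it. A toy stand-in (NOT the real pulumi SDK, which resolves values asynchronously) shows the shape of the API:

```python
class ToyOutput:
    """Teaching stand-in for pulumi.Output: wraps a value, exposes apply/all."""
    def __init__(self, value):
        self._value = value

    def apply(self, fn):
        # Real Outputs resolve later; the callback runs once the value is known.
        return ToyOutput(fn(self._value))

    @staticmethod
    def all(*outputs):
        return ToyOutput([o._value for o in outputs])

bucket = ToyOutput("my-bucket")
region = ToyOutput("us-east-1")
# f"s3://{bucket}" would interpolate the Output object itself, not the string inside it
url = ToyOutput.all(bucket, region).apply(lambda args: f"s3://{args[0]} in {args[1]}")
```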
Steps run sequentially via `ptd ensure`:

1. `bootstrap` (Go) → 2. `persistent` (Python) → 3. `postgres_config` (Python) → 4. `eks`/`aks` → 5. `clusters` → 6. `helm` → 7. `sites` → 8. `persistent_reprise` (Go)
Each step produces outputs consumed by later steps. See docs/architecture/step-dependencies.md.
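The sequential hand-off can be sketched as a pipeline where each step receives the accumulated outputs of its predecessors (hypothetical structure for illustration, not the CLI's actual implementation):

```python
STEPS = ["bootstrap", "persistent", "postgres_config", "eks",
         "clusters", "helm", "sites", "persistent_reprise"]

def ensure(run_step):
    """Run every step in order, feeding each one the outputs produced so far."""
    outputs = {}
    for step in STEPS:
        outputs[step] = run_step(step, dict(outputs))  # later steps see earlier outputs
    return outputs

result = ensure(lambda step, prior: f"{step}:done(after {len(prior)} steps)")
```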
- Use `pulumi.runtime.set_mocks()` for Pulumi resource tests
- For Go→Python integration details, see the "Go→Python Integration" section above
- Tests must set `PTD_ROOT` via `monkeypatch.setenv("PTD_ROOT", ...)`
- See `python-pulumi/tests/` for examples
- Run: `just test-python-pulumi`
- Create `python-pulumi/src/ptd/pulumi_resources/<cloud>_<target_type>_<step_name>.py`
- Define a class inheriting from `pulumi.ComponentResource`
- Implement a `@classmethod autoload(cls)` that reads the stack name and constructs the workload
- Add a corresponding step in `lib/steps/`
- Register the step in `WorkloadSteps` or `ControlRoomSteps` in `lib/steps/steps.go`
These files are large and require careful context management:
AWS:
- `pulumi_resources/aws_eks_cluster.py` (~2580 lines) — EKS cluster provisioning with builder pattern
- `pulumi_resources/aws_workload_persistent.py` (~1454 lines) — VPC, RDS, S3, IAM
- `pulumi_resources/aws_workload_helm.py` (~1390 lines) — Helm chart deployments (AWS)
- `__init__.py` (~1275 lines) — Base types, constants, utility functions
- `aws_workload.py` (~815 lines) — AWS workload config and naming conventions
Azure:
- `pulumi_resources/azure_workload_persistent.py` (~817 lines) — VNet, Postgres, Storage, ACR
- `pulumi_resources/azure_workload_helm.py` (~675 lines) — Helm chart deployments (Azure)
- `azure_workload.py` (~398 lines) — Azure workload config and naming with strict char limits