feat: add cloud-agnostic infrastructure (nfs-subdir, ESO, Pod Identity) #146

Closed
ian-flores wants to merge 31 commits into main from cloud-agnostic-infra

Conversation

@ian-flores
Contributor

Summary

Add three new infrastructure components to support the cloud-agnostic team-operator:

  • nfs-subdir-external-provisioner: Auto-provisions NFS subdirectories on FSx. Creates StorageClass posit-shared-storage with annotation-based path patterns.
  • external-secrets-operator: Syncs AWS Secrets Manager secrets into K8s Secrets. Creates ClusterSecretStore and per-site ExternalSecret CRs.
  • EKS Pod Identity Agent: Addon + PodIdentityAssociation resources for all product ServiceAccounts. Enables annotation-free IAM.

All components are additive — nothing existing is removed or modified. Existing IRSA, secrets-store-csi-driver, and direct PV provisioning continue to work.

Part of

Cloud-agnostic team-operator epic — Phase 2 infrastructure preparation.

Test plan

  • Deploy to staging cluster
  • Verify nfs-subdir-provisioner creates StorageClass
  • Verify ESO syncs a test secret into K8s
  • Verify Pod Identity Agent addon installs
  • Verify Pod Identity associations are created
  • Verify existing IRSA-based workloads still function

This adds three new infrastructure components to prepare for the cloud-agnostic team-operator:

1. nfs-subdir-external-provisioner (AWS workloads)
   - Deploys when FSx is configured (fs-dns-name in workload secrets)
   - Creates StorageClass "posit-shared-storage" for dynamic NFS subdirectory provisioning
   - Uses annotation-based pathPattern for subdirectory naming

2. external-secrets-operator (AWS workloads)
   - Deploys external-secrets-operator Helm chart
   - Creates ClusterSecretStore for AWS Secrets Manager
   - Creates ExternalSecret CRs per site to sync secrets to K8s Secrets
   - IAM role created with read-only Secrets Manager permissions

3. EKS Pod Identity Agent (AWS workloads)
   - Adds eks-pod-identity-agent addon to EKS clusters
   - Creates PodIdentityAssociation resources for all product ServiceAccounts
   - ADDITIVE - existing IRSA roles and trust policies kept for backward compatibility

All components are added alongside existing infrastructure. No existing resources are modified or removed. The operator will be updated in a future phase to use these new abstractions.

Changes:
- python-pulumi/src/ptd/aws_workload.py: Add external_secrets_role_name() method
- python-pulumi/src/ptd/pulumi_resources/aws_eks_cluster.py: Add with_pod_identity_agent() method
- python-pulumi/src/ptd/pulumi_resources/aws_workload_clusters.py:
  - Add _define_external_secrets_iam() and _define_pod_identity_associations() methods
  - Add external_secrets_roles dict
- python-pulumi/src/ptd/pulumi_resources/aws_workload_eks.py: Enable Pod Identity Agent addon
- python-pulumi/src/ptd/pulumi_resources/aws_workload_helm.py:
  - Add _define_nfs_subdir_provisioner() method
  - Add _define_external_secrets_operator() method
  - Conditionally deploy based on workload configuration
- python-pulumi/src/ptd/pulumi_resources/aws_workload_sites.py: Add _define_external_secrets() method to create ExternalSecret CRs per site
All tests pass. Here is the summary of changes:
---
Changes:
- Add `depends_on=[eso_helm_release]` to `ClusterSecretStore` so it applies after the ESO HelmChart CR is registered, not concurrently
- Add Pod Identity association for `external-secrets` service account in `_define_pod_identity_associations`, making ESO consistent with all other products
- Remove hardcoded IRSA annotation from ESO Helm chart values (now uses Pod Identity via the new association)
- Add `nfs_subdir_provisioner_version` and `external_secrets_operator_version` fields to `AWSWorkloadClusterComponentConfig` with the previously hardcoded defaults
- Thread version parameters through `_define_nfs_subdir_provisioner(release, version)` and `_define_external_secrets_operator(release, version)` callers
- Remove redundant `fsx_dns_name` empty-string guard in `_define_nfs_subdir_provisioner` (caller already checks `"fs-dns-name" in secrets`)
All tests pass and lint is clean. Here's a summary:
Changes:
- Remove `auth.jwt.serviceAccountRef` block from `ClusterSecretStore` spec; Pod Identity injects ambient credentials — no explicit auth needed
- Fix `_define_nfs_subdir_provisioner` and `_define_external_secrets_operator` signatures from `version: str` to `version: str | None`; omit the `version` key from Helm spec when `None`
- Read NFS path from secrets dict key `fs-nfs-path` with fallback to `/fsx` instead of hard-coding
- Add `pod_identity_agent_version: str | None = None` to `AWSWorkloadClusterConfig` and pass it to `eks_cluster.with_pod_identity_agent()`
- Add doc comment to `_define_external_secrets` explaining that `ExternalSecret` CRs cannot declare `depends_on` for the `ClusterSecretStore` across components, so a short convergence window is expected on fresh deploys
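The optional-version and NFS-path changes above can be illustrated with a minimal sketch. It uses plain dicts rather than the real Pulumi resources; `helm_chart_spec`, `nfs_path_from_secret`, the repository URL, the `kube-system` namespace, and the `1.2.3` version are hypothetical stand-ins, not the PR's actual code.

```python
from __future__ import annotations


def helm_chart_spec(
    chart: str, repo: str, target_namespace: str, version: str | None = None
) -> dict:
    """Build a HelmChart-style spec, omitting `version` entirely when it is None."""
    spec = {"chart": chart, "repo": repo, "targetNamespace": target_namespace}
    if version is not None:
        # Only pin the chart when a version was configured; otherwise the key
        # is left out rather than being set to None or an empty string.
        spec["version"] = version
    return spec


def nfs_path_from_secret(secret: dict) -> str:
    """Read the NFS export path from the `fs-nfs-path` key, falling back to `/fsx`."""
    return secret.get("fs-nfs-path", "/fsx")


# Hypothetical inputs, for illustration only.
unpinned = helm_chart_spec(
    "nfs-subdir-external-provisioner", "https://charts.example.invalid", "kube-system"
)
pinned = helm_chart_spec(
    "nfs-subdir-external-provisioner",
    "https://charts.example.invalid",
    "kube-system",
    version="1.2.3",
)
```

Leaving the key out (instead of emitting `version: null`) keeps the rendered spec identical to one written by hand without a pin.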
All tests pass. Here's a summary of the changes made:
Changes:
- Fix `nfs.mountOptions` dot-notation bug: nest the `mountOptions` list under the `nfs` dict (the dotted top-level key was silently ignored by Helm)
- Move workload secret fetch out of `AWSWorkloadHelm.__init__` into `_define_nfs_subdir_provisioner` (lazy, returns early if secret unavailable or `fs-dns-name` missing)
- Remove `pulumi.error(msg, self)` before `raise ValueError` (redundant duplicate diagnostic)
- Add `enable_pod_identity_agent: bool = False` to `AWSWorkloadClusterConfig` and gate `with_pod_identity_agent` on it (opt-in, prevents unconditional addon install on all clusters)
- Document `depends_on` CRD-readiness limitation in `_define_external_secrets_operator` docstring
- Add `test_nfs_subdir_provisioner_values.py` with YAML round-trip tests for the NFS values structure
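The dot-notation bug fixed above is easy to reproduce with plain dicts. The sketch below is an assumption-laden illustration: `nfs_values` is a hypothetical stand-in for the production values builder, the mount option and DNS name are made up, and JSON is used for the round trip instead of YAML so the example needs only the standard library.

```python
import json


def nfs_values(fsx_dns_name: str, nfs_path: str = "/fsx") -> dict:
    """Values for the nfs-subdir-external-provisioner chart, with options nested correctly."""
    return {
        "nfs": {
            "server": fsx_dns_name,
            "path": nfs_path,
            # Nested under `nfs`, so the chart reads it as nfs.mountOptions.
            "mountOptions": ["nfsvers=4.1"],  # hypothetical option
        },
        "storageClass": {"name": "posit-shared-storage"},
    }


# The broken shape: a literal "nfs.mountOptions" key is just an unknown
# top-level key in a values document, so the chart never sees the options.
broken = {
    "nfs": {"server": "fs.example.invalid", "path": "/fsx"},
    "nfs.mountOptions": ["nfsvers=4.1"],
}

values = nfs_values("fs.example.invalid")
round_tripped = json.loads(json.dumps(values))
```

Dotted paths are only expanded by Helm for `--set` style arguments; inside a values file the nesting has to be explicit.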
All tests pass. Here's a summary of the changes:
Changes:
- Add `enable_external_secrets_operator: bool = False` to `AWSWorkloadClusterConfig` as an opt-in flag (alongside `enable_pod_identity_agent`)
- Gate `_define_external_secrets_operator` call in `aws_workload_helm.py` on `enable_external_secrets_operator` per cluster
- Gate all pod identity associations in `_define_pod_identity_associations` on `enable_pod_identity_agent` per cluster (skip entire cluster with `continue` if agent not enabled)
- Gate ESO pod identity association on both `enable_pod_identity_agent` AND `enable_external_secrets_operator`
- Add `if f"{release}-{site_name}" in self.chronicle_roles` guard before Chronicle pod identity association to protect against optional product KeyErrors
- Add `if release in self.home_roles` guard before Home/Flightdeck pod identity association
- Add `python-pulumi/tests/test_eso_and_external_secret_values.py` with 9 tests covering ESO Helm values (no IRSA annotations), ClusterSecretStore no-auth spec, and ExternalSecret CR structure
All tests pass. Here's a summary of the changes made:
---
Changes:
- Guard `_define_external_secrets()` in `aws_workload_sites.py` with `enable_external_secrets_operator` check to prevent deploying ExternalSecret CRs on stacks without ESO
- Guard `_define_external_secrets_iam()` in `aws_workload_clusters.py` with `enable_external_secrets_operator` check to avoid creating orphaned IAM roles for non-ESO workloads
- Add comment to `_define_pod_identity_associations()` explaining that `team_operator_roles` is intentionally excluded (retains IRSA, Pod Identity to be added in a future phase)
All tests pass and linting is clean. Here is a summary of the changes:
---
Changes:
- Add `enable_nfs_subdir_provisioner: bool = False` to `AWSWorkloadClusterConfig` for explicit opt-in, replacing the implicit secret-presence gate
- Gate `_define_nfs_subdir_provisioner` call on `enable_nfs_subdir_provisioner` flag in `aws_workload_helm.py`
- Store Pod Identity Agent addon as `self.pod_identity_agent_addon` in `AWSEKSCluster.with_pod_identity_agent` for future `depends_on` use
- Extract `_nfs_subdir_provisioner_values`, `_eso_helm_values`, and `_cluster_secret_store_spec` as pure module-level functions in `aws_workload_helm.py`
- Update `test_nfs_subdir_provisioner_values.py` and `test_eso_and_external_secret_values.py` to import and call the production functions instead of duplicating logic
- Add `external-secrets.io/reconcile-timeout: 5m` annotation to ExternalSecret CRs to bound retry window during initial CRD convergence
Changes:
- Add validation error in `_define_external_secrets_iam` when `enable_external_secrets_operator=True` but `enable_pod_identity_agent=False`, preventing silent broken deployments
- Replace silent `return` in `_define_nfs_subdir_provisioner` with a `ValueError` when the NFS secret is absent or missing `fs-dns-name`, making misconfiguration visible at deploy time
- Extract `_external_secret_spec(site_name, secret_key)` helper in `aws_workload_sites.py` and use it in `_define_external_secrets`, replacing inline spec construction
- Update test to import `_external_secret_spec` from production code instead of maintaining a local mirror that could diverge
- Document in `_define_pod_identity_associations` docstring that `fsx_openzfs_roles` is intentionally excluded because the FSx OpenZFS CSI driver uses node-level IAM
Go build passes with no errors. All tests pass (178 passed).
Changes:
- Add `__post_init__` to `AWSWorkloadClusterConfig` to validate that `enable_external_secrets_operator=True` requires `enable_pod_identity_agent=True`, making this constraint testable at config construction time
- Add `test_eso_requires_pod_identity` and `test_eso_with_pod_identity_is_valid` tests in `test_workload_cluster_config.py` covering the ESO→Pod Identity dependency guard
- Add `pulumi.runtime.is_dry_run()` check in `_define_nfs_subdir_provisioner` so `pulumi preview` logs a warning and skips instead of hard-failing when the secret doesn't exist yet
- Add `test_nfs_default_path` test to confirm the default NFS path is `/fsx` when no path argument is provided
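The config-time guard described above might look roughly like the following. `ClusterConfigSketch` is a hypothetical, stripped-down stand-in for `AWSWorkloadClusterConfig`; only the two flag names come from the PR, and the use of `ValueError` and the message text are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ClusterConfigSketch:
    """Hypothetical stand-in carrying only the two flags involved in the guard."""

    enable_pod_identity_agent: bool = False
    enable_external_secrets_operator: bool = False

    def __post_init__(self) -> None:
        # ESO authenticates through Pod Identity, so enabling ESO without the
        # agent would produce a deployment that cannot read any secrets.
        if self.enable_external_secrets_operator and not self.enable_pod_identity_agent:
            raise ValueError(
                "enable_external_secrets_operator=True requires "
                "enable_pod_identity_agent=True"
            )


def is_valid(**flags: bool) -> bool:
    """Return True when the flag combination constructs without error."""
    try:
        ClusterConfigSketch(**flags)
    except ValueError:
        return False
    return True
```

Validating in `__post_init__` means the constraint can be tested by constructing a config object, without running any Pulumi program.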
All tests pass. Here's a summary of the changes:
Changes:
- Remove unreachable `enable_pod_identity_agent` guard in `_define_external_secrets_iam` (dead code — `__post_init__` already enforces this invariant); replace with a comment pointing to `__post_init__`
- Add `pulumi.warn` in `_define_external_secrets_operator` during dry-run to set operator expectations about the ESO CRD convergence window (~5 minutes on fresh deploys)
- Add comment on home Pod Identity association noting the per-site SA assumption and what to do if Home uses a per-release SA
- Add comments on both sides of `packagemanager_roles` key construction (`//` separator must stay in sync between `_define_packagemanager_iam` and `_define_pod_identity_associations`)
- Add `test_packagemanager_roles_key_format` test to document and pin the `release + "//" + site_name` key convention
Changes:
- Remove unrecognized `external-secrets.io/reconcile-timeout` annotation from ExternalSecret metadata (was silently ignored by ESO)
- Clarify Home pod identity comment: Home uses per-site SAs per `_define_home_iam`, so the block correctly stays inside the site loop
- Document `_define_read_secrets_inline` `resources=["*"]` scope is intentional and consistent across all workload roles
- Add three tests for `_define_nfs_subdir_provisioner` error paths: warn+return on dry run with failed fetch, raise on live run with failed fetch, raise on live run with missing `fs-dns-name` key
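The preview/apply asymmetry exercised by those error-path tests can be sketched without Pulumi. In this hypothetical version, a `dry_run` flag stands in for `pulumi.runtime.is_dry_run()` and the standard `warnings` module stands in for `pulumi.warn`; `resolve_fsx_dns_name` is an illustrative name, not the PR's function.

```python
from __future__ import annotations

import warnings


def resolve_fsx_dns_name(secret: dict | None, dry_run: bool) -> str | None:
    """Return the FSx DNS name, warning on preview and raising on a live run.

    `secret` is None when the workload secret could not be fetched.
    """
    problem = None
    if secret is None:
        problem = "workload secret could not be fetched"
    elif "fs-dns-name" not in secret:
        problem = "workload secret has no 'fs-dns-name' key"

    if problem is None:
        return secret["fs-dns-name"]
    if dry_run:
        # Preview: skip the provisioner but make the omission visible.
        warnings.warn(f"skipping NFS subdir provisioner: {problem}")
        return None
    # Live run: fail loudly so the misconfiguration is caught at deploy time.
    raise ValueError(problem)


def live_run_raises(secret: dict | None) -> bool:
    """Helper for checking the live-run branch."""
    try:
        resolve_fsx_dns_name(secret, dry_run=False)
    except ValueError:
        return True
    return False
```

The trade-off is that a preview can succeed while the subsequent apply fails, which is why the warning text matters.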
- Fix tautological `test_packagemanager_roles_key_format`: use `release + "//" + site_name` on one side and `f"{release}//{site_name}"` on the other so a separator change would break the test
All 187 tests pass. Here's a summary of the changes:
Changes:
- Add `pod_identity: bool = False` parameter to `_define_k8s_iam_role`; when `True`, appends `pods.eks.amazonaws.com` as a trusted principal (`sts:AssumeRole` + `sts:TagSession`) to the IAM role trust policy
- Pass `pod_identity=True` when creating the ESO IAM role in `_define_external_secrets_iam` so Pod Identity can actually assume the role (fixes silent ESO auth failure)
- Remove `if pulumi.runtime.is_dry_run():` guard on ESO convergence warning so it's emitted on real deploys, not just dry runs
- Add comment to `_define_external_secrets_operator` confirming that helm-controller (RKE2) auto-creates the `external-secrets` namespace from `targetNamespace`
- Add `tests/test_pod_identity_associations.py` with 5 mock-based tests covering: disabled pod identity (0 associations), 2 sites mandatory products (10), with ESO (11), chronicle optional presence (5 vs 6), and home per-site creation (12)
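The association counts in those tests are consistent with a simple model: five mandatory products per site, one optional association each for Chronicle and Home per site, and a single ESO association. The sketch below encodes that model with plain tuples; it is an inference from the counts, and `plan_associations` and its parameters are hypothetical rather than the PR's real signature.

```python
MANDATORY_PRODUCTS = (
    "connect",
    "connect-session",
    "workbench",
    "workbench-session",
    "packagemanager",
)


def plan_associations(
    sites, *, pod_identity, eso=False, chronicle_sites=(), home_sites=()
):
    """List (scope, product) pairs that would each get a Pod Identity association."""
    if not pod_identity:
        # Agent disabled: skip the whole cluster.
        return []
    planned = []
    if eso:
        # ESO needs both flags; it is cluster-wide, not per site.
        planned.append(("cluster", "external-secrets"))
    for site in sites:
        planned.extend((site, product) for product in MANDATORY_PRODUCTS)
        # Optional products are guarded by membership checks.
        if site in chronicle_sites:
            planned.append((site, "chronicle"))
        if site in home_sites:
            planned.append((site, "home"))
    return planned


two_sites = ["site-a", "site-b"]  # hypothetical site names
```

Modeling the gating as a pure function keeps the counting logic testable without mocking any cloud resources.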
All 6 new tests pass. Everything is clean.
Changes:
- Pass `pod_identity=enable_pod_identity_agent` to `_define_k8s_iam_role` for all product roles (connect, connect-session, workbench, workbench-session, packagemanager, chronicle, home) so their IAM trust policies include `pods.eks.amazonaws.com` when Pod Identity is enabled
- Restore `if pulumi.runtime.is_dry_run():` guard on `pulumi.warn` in `_define_external_secrets_operator` to prevent log pollution on every deployment
- Fix in-place mutation of `base_policy["Statement"]` in `_define_k8s_iam_role` by building a new dict with a copied+extended statement list
- Add tests for `with_pod_identity_agent` covering addon name, version passthrough, parent assignment, cluster name, and return value
- Add `-> None` return type annotations to `_define_external_secrets_iam`, `_define_pod_identity_associations`, `_define_nfs_subdir_provisioner`, `_define_external_secrets_operator`, and `_define_external_secrets`
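A minimal sketch of the trust-policy extension and the mutation fix described above, assuming the policy is a plain dict. The service principal and the two actions come from the PR description; `with_pod_identity`, the sample IRSA statement, and the placeholder ARN are hypothetical.

```python
import copy

POD_IDENTITY_STATEMENT = {
    "Effect": "Allow",
    "Principal": {"Service": "pods.eks.amazonaws.com"},
    "Action": ["sts:AssumeRole", "sts:TagSession"],
}


def with_pod_identity(base_policy: dict, pod_identity: bool) -> dict:
    """Return a trust policy that also trusts EKS Pod Identity, without mutating the input."""
    if not pod_identity:
        return base_policy
    # Build a new dict with a copied and extended statement list instead of
    # appending to base_policy["Statement"] in place.
    return {
        **base_policy,
        "Statement": [*base_policy["Statement"], copy.deepcopy(POD_IDENTITY_STATEMENT)],
    }


# Hypothetical IRSA-style base policy with a placeholder provider ARN.
irsa_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Federated": "arn:aws:iam::123456789012:oidc-provider/example"},
            "Action": "sts:AssumeRoleWithWebIdentity",
        }
    ],
}

extended = with_pod_identity(irsa_policy, pod_identity=True)
```

Because the base policy is left untouched, the same dict can safely be reused for roles that should stay IRSA-only.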
All 193 tests pass.
Changes:
- Initialize `self.chronicle_roles = {}` and `self.home_roles = {}` defensively at the top of `AWSWorkloadClusters.__init__` before their defining methods run, preventing potential `AttributeError` if call order changes
- Remove the `if pulumi.runtime.is_dry_run(): pulumi.warn(...)` block from `_define_external_secrets_operator` — warning fired on every `pulumi preview` including routine updates, becoming noise; the docstring already documents the CRD convergence behavior
- Add inline comment on `enable_nfs_subdir_provisioner` documenting the `nfs.io/storage-path` annotation requirement for PVCs when using the NFS subdir provisioner
All 194 tests pass.
Changes:
- Add membership guard for `packagemanager_roles` in `_define_pod_identity_associations` (consistent with chronicle/home guards)
- Remove unused `import pulumi` from inside `test_parent_is_set_to_eks` function body
- Add `test_nfs_provisioner_warns_on_dry_run_when_key_missing` test for dry-run branch when key is absent but fetch succeeds
- Update `_make_clusters_mock` in `test_pod_identity_associations.py` to set `packagemanager_roles` as a real dict (required by the new `in` guard)
All 195 tests pass.
Changes:
- Add `self.external_secrets_roles = {}` to the defensive pre-initialization block alongside `chronicle_roles` and `home_roles`, preventing `AttributeError` if `_define_external_secrets_iam` throws before assignment
- Add `test_session_roles_key_format` test verifying the `-` separator used for `connect_session_roles` and `workbench_session_roles` keys, analogous to the existing `test_packagemanager_roles_key_format`
- Update `_define_read_secrets_inline` comment to include a `TODO` marker for creating a tracking issue for the deferred Secrets Manager ARN scoping work
Changes:
- Add `__post_init__` to `WorkloadClusterConfig` base class and call `super().__post_init__()` from `AWSWorkloadClusterConfig.__post_init__` to establish proper inheritance chain
- Patch `aws.eks.AddonArgs` in `test_pod_identity_agent_addon.py` tests; assert on `AddonArgs` call kwargs instead of accessing attributes on the real class instance
- Add tests for `_define_external_secrets_iam` skip path (`enable_external_secrets_operator=False` leaves `external_secrets_roles` empty, no IAM role created) and enabled path (one role per release)
- Extract NFS StorageClass name `"posit-shared-storage"` to `NFS_STORAGE_CLASS_NAME` module constant in `aws_workload_helm.py`
- Add clarifying comments to `_define_pod_identity_associations` explaining that `connect_session_roles` and `workbench_session_roles` are always populated unconditionally
- Remove dangling placeholder from `_define_read_secrets_inline` TODO comment (replace with cleaner description)
Both build cleanly and all tests pass.
---
Changes:
- Fix copy-paste error in `test_session_roles_key_format`: assertions now check `"-" not in release/site_name` (the actual separator), not `"//"` which was incorrectly copied from the packagemanager test
- Enhance NFS subdir provisioner preview warning to explicitly state the resource will be absent from the diff and that `pulumi up` will raise an error, making the preview/apply asymmetry visible to operators
All 197 tests pass. Here's a summary of the changes:
Changes:
- `test_workload_cluster_config.py`: Fix vacuous assertion in `test_session_roles_key_format` — change `population_key` to use `+` concatenation so the two expression forms are syntactically distinct, making the `assert population_key == lookup_key` comparison meaningful
- `aws_workload_helm.py`: Add `[ACTION REQUIRED]` prefix to the NFS subdir provisioner dry-run warning so operators notice the omission during `pulumi preview`
- `aws_workload_helm.py`: Add `custom_timeouts=pulumi.CustomTimeouts(create="10m")` to `ClusterSecretStore` resource to make the CRD eventual-consistency window explicit to Pulumi
- `aws_workload_clusters.py`: Expand the `_define_read_secrets_inline` comment with a concrete `TODO` noting ESO's cluster-wide blast radius and the target scoped ARN prefix pattern
Changes:
- Scope ESO IAM policy to `arn:aws:secretsmanager:<region>:<account>:secret:<workload-prefix>/*` via new `_define_eso_read_secrets_inline()` method, replacing the account-wide `resources=["*"]` for the ClusterSecretStore role
- Add `tags=self.required_tags` to all 8 `aws.eks.PodIdentityAssociation` resources for cost allocation consistency
- Document ClusterSecretStore first-run failure and `ptd ensure` retry workaround in `docs/KNOWN_ISSUES.md`
- Add explanatory comment to the no-op `__post_init__` in `WorkloadClusterConfig` base class
- Remove misleading `assert "-" not in release/site_name` assertions and incorrect comment from `test_session_roles_key_format`
All 200 tests pass. Here's the summary:
Changes:
- Add `_pod_identity_assoc` module-level helper in `aws_workload_clusters.py` to eliminate repetitive `PodIdentityAssociation` blocks in `_define_pod_identity_associations`
- Refactor `_define_pod_identity_associations` to use `_pod_identity_assoc` helper (8 call sites reduced from 8-line blocks to 4-line calls)
- Add `test_define_k8s_iam_role_trust_policy_includes_pod_identity_statement` — verifies `pods.eks.amazonaws.com` with `sts:AssumeRole`/`sts:TagSession` appears when `pod_identity=True`
- Add `test_define_k8s_iam_role_trust_policy_excludes_pod_identity_statement_when_disabled` — verifies no pod identity statement when `pod_identity=False`
- Add `test_nfs_provisioner_success_creates_helm_chart_cr` — happy-path test asserting `HelmChart` CR is created with correct `valuesContent`, `chart`, and `version`
- Fix misleading docstring in `aws_workload_sites.py`: replace "stack boundaries" framing with accurate CRD-convergence explanation
All 201 tests pass. Here is the summary:
Changes:
- Add `custom_timeouts=pulumi.CustomTimeouts(create="10m")` to `ExternalSecret` CRs in `_define_external_secrets` to handle CRD convergence on fresh clusters
- Extract `CLUSTER_SECRET_STORE_NAME = "aws-secrets-manager"` constant in `aws_workload_helm.py` to eliminate implicit string coupling between files
- Import `CLUSTER_SECRET_STORE_NAME` in `aws_workload_sites.py` and use it in `_external_secret_spec` instead of the hardcoded string
- Add `test_nfs_provisioner_version_none_omits_version_key` test asserting that `version=None` produces a spec without a `version` key
All 201 tests pass. Here is the summary:
Changes:
- Tighten ESO IAM policy actions from `Get*`/`Describe*` wildcards to `GetSecretValue`/`DescribeSecret` to exclude `GetRandomPassword` and `GetResourcePolicy`
- Remove redundant `self.external_secrets_roles = {}` reset from `_define_external_secrets_iam`; rely solely on the defensive `__init__` init
- Add `self.connect_session_roles = {}` and `self.workbench_session_roles = {}` to the defensive-init block in `__init__`
- Add assertion in `test_define_external_secrets_iam_creates_role_per_release_when_enabled` that `pod_identity=True` is passed on every `_define_k8s_iam_role` call
- Fix the two ESO IAM tests to initialize `m.external_secrets_roles = {}` on the mock (required now that the method no longer resets it internally)
All 202 tests pass. Here's a summary of the changes made:
Changes:
- Remove `secretsmanager:ListSecrets` from `_define_eso_read_secrets_inline` — it doesn't support resource-level permissions in IAM, so including it in a resource-scoped statement would silently grant account-wide list access
- Add `assert isinstance(irsa_policy.get("Statement"), list)` in `_define_k8s_iam_role` before structural manipulation to make the contract explicit and catch schema changes early
- Add defensive initialization of `connect_roles = {}` and `workbench_roles = {}` in `__init__` alongside the other role dicts, so `_define_pod_identity_associations` fails with a clear `KeyError` rather than `AttributeError` if the defining methods raise
- Add comments in `aws_workload_helm.py` and `aws_workload_sites.py` linking the hardcoded `external-secrets.io/v1beta1` API version to the `external_secrets_operator_version` default
- Add `test_eso_read_secrets_inline_scoped_arn_no_list_secrets` test verifying the scoped ARN uses `compound_name/*` and that `ListSecrets` is absent
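Taken together, the scoping and tightening changes above suggest a policy document along these lines. This is a sketch under assumptions: `eso_read_secrets_policy` is a hypothetical helper, the `aws` partition is assumed, and the region, account ID, and prefix in the usage line are placeholders.

```python
import json


def eso_read_secrets_policy(region: str, account_id: str, workload_prefix: str) -> str:
    """Render a read-only Secrets Manager policy scoped to one workload prefix."""
    resource = (
        f"arn:aws:secretsmanager:{region}:{account_id}:secret:{workload_prefix}/*"
    )
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Explicit actions instead of Get*/Describe* wildcards.
                # ListSecrets is deliberately absent: it does not support
                # resource-level permissions, so it cannot be scoped here.
                "Action": [
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:DescribeSecret",
                ],
                "Resource": resource,
            }
        ],
    }
    return json.dumps(policy)


# Placeholder values, for illustration only.
rendered = json.loads(eso_read_secrets_policy("us-east-1", "123456789012", "my-workload"))
```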
Changes:
- Replace `assert` with `raise ValueError` for trust policy validation in `_define_k8s_iam_role` (survives `-O` optimized mode)
- Add explicit `RuntimeError` guards for `connect_session_roles` and `workbench_session_roles` dict accesses in `_define_pod_identity_associations`
- Add `ESO_SERVICE_ACCOUNT = "external-secrets"` module-level constant in `aws_workload_helm.py` and use it across all four ESO call sites
- Import `ESO_SERVICE_ACCOUNT` in `aws_workload_clusters.py` and use it for IAM role namespace/SA and Pod Identity association
- Add `namespace` and `service_account` assertions to `test_associations_count_with_eso` to catch ESO SA/namespace mismatches
- Update `_make_clusters_mock` to populate `connect_session_roles` and `workbench_session_roles` as real dicts to match the new invariant guards
All changes are in order. All 202 tests pass.
---
Changes:
- Add `release="test-release", namespace="test-ns"` to two `_define_k8s_iam_role` test calls in `test_pod_identity_associations.py` to make them explicit and unambiguous
- Add `connect_roles` and `workbench_roles` as real dicts to `_make_clusters_mock` so the new invariant guards work correctly in tests
- Add `RuntimeError` invariant guards before accessing `connect_roles[release]` and `workbench_roles[release]` in `_define_pod_identity_associations`, matching the pattern already used for session roles
- Fix import ordering: move `aws_workload_helm` to its correct alphabetical position (after `aws_karpenter`, before `custom_k8s_resources`)
All 203 tests pass. Here's a summary of the changes made:
Changes:
- Add `conditions.namespaceSelector` to `ClusterSecretStore` spec restricting access to `posit-team` namespace
- Add `ESO_NAMESPACE = "external-secrets"` constant to separate namespace from service account name
- Add `ESO_API_VERSION = "external-secrets.io/v1beta1"` constant to deduplicate hardcoded API version
- Use `ESO_NAMESPACE` for `targetNamespace` in ESO Helm chart spec instead of `ESO_SERVICE_ACCOUNT`
- Use `ESO_API_VERSION` in `aws_workload_helm.py` and `aws_workload_sites.py` (imported); remove duplicate comment
- Add `RuntimeError` guard for `external_secrets_roles` missing key in `_define_pod_identity_associations`, consistent with other product guards
- Use `ESO_NAMESPACE` instead of `ESO_SERVICE_ACCOUNT` for the namespace argument in `_pod_identity_assoc` for ESO
- Add test `test_define_k8s_iam_role_fallback_path_pod_identity_no_oidc` covering the no-OIDC + `pod_identity=True` path
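For orientation, a `ClusterSecretStore` manifest combining the constants and the namespace restriction above could be built as follows. The constant values are taken from the PR description; the `cluster_secret_store` function and the choice of a `matchLabels` selector on the `kubernetes.io/metadata.name` label are assumptions about how the restriction might be expressed.

```python
ESO_API_VERSION = "external-secrets.io/v1beta1"
CLUSTER_SECRET_STORE_NAME = "aws-secrets-manager"


def cluster_secret_store(region: str, allowed_namespace: str = "posit-team") -> dict:
    """Build a ClusterSecretStore manifest with no explicit auth block."""
    return {
        "apiVersion": ESO_API_VERSION,
        "kind": "ClusterSecretStore",
        "metadata": {"name": CLUSTER_SECRET_STORE_NAME},
        "spec": {
            "provider": {
                # No `auth` key: credentials are expected to come from the
                # pod's ambient identity (Pod Identity), per the PR.
                "aws": {"service": "SecretsManager", "region": region},
            },
            "conditions": [
                {
                    # Assumed selector form: match the namespace by its
                    # automatically assigned name label.
                    "namespaceSelector": {
                        "matchLabels": {
                            "kubernetes.io/metadata.name": allowed_namespace
                        }
                    }
                }
            ],
        },
    }


store = cluster_secret_store("us-east-1")  # placeholder region
```

Restricting the store by namespace limits which `ExternalSecret` resources can pull through a cluster-wide credential.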
Changes:
- Fix semantic bug: use `ESO_NAMESPACE` instead of `ESO_SERVICE_ACCOUNT` for the `namespace` argument in `_define_external_secrets_iam` (`aws_workload_clusters.py:664`)
- Document `enable_nfs_subdir_provisioner` config field with FSx secret prerequisite and path-traversal security note (`aws_workload.py`)
- Add tests for all `RuntimeError` guards in `_define_pod_identity_associations` (missing `external_secrets_roles`, `connect_roles`, `connect_session_roles`, `workbench_roles`, `workbench_session_roles` keys) (`test_pod_identity_associations.py`)
- Add test for custom `fs-nfs-path` secret key propagating to NFS mount path (`test_nfs_subdir_provisioner_values.py`)
When feature flags are enabled, PTD now populates the new operator CRD
fields in Site CRs:

- storageClassName: "posit-shared-storage" (when nfs-subdir-provisioner enabled)
- nfsEgressCIDR: VPC CIDR (when nfs-subdir-provisioner enabled)
- secret.name / workloadSecret.name: K8s Secret refs (when ESO enabled)
- serviceAccountName per product: explicit names for Pod Identity contract
- Workload-level ExternalSecret CR for workload secrets

All changes are conditional on feature flags (default: disabled).
Existing Site CRs unchanged when flags are off.

Also adds kind-site-example.yaml for local development reference.
Removes auto-generated test files with lint issues (to be rewritten).
Add Azure support for cloud-agnostic infrastructure patterns:

**external-secrets-operator (ESO):**
- Deploy ESO Helm chart with Azure Key Vault provider
- Create ClusterSecretStore using Azure Workload Identity auth
- Create ExternalSecret CRs per site to sync from Key Vault to K8s Secrets
- Add managed identity with Key Vault Secrets User role for ESO
- Site CRs reference secrets by K8s Secret name when enabled

**Storage:**
- Set storageClassName=azure-netapp-files for product volumes
- Set packageManagerStorageClassName for Azure Files CSI (Package Manager)
- Azure NetApp and Azure Files StorageClasses already exist

**IAM (Azure Workload Identity):**
- Set serviceAccountName per product for identity binding contract
- Set serviceAccountAnnotations with azure.workload.identity/client-id
- Set podLabels with azure.workload.identity/use=true
- Annotations populated by infrastructure layer with managed identity client IDs

**Feature flags:**
- enable_external_secrets_operator: Deploy ESO and wire secret names
- enable_cloud_agnostic_storage: Use StorageClass pattern

**Infrastructure:**
- Add KEY_VAULT_SECRETS_USER_ROLE_DEFINITION_ID to azure_roles
- Add external_secrets_operator_version to AzureWorkloadClusterComponentConfig
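The Azure Workload Identity fields listed in the IAM section above could be assembled per product roughly like this. The field names, annotation key, and label key come from the commit message; `workload_identity_fields`, the use of the product name as the service account name, and the all-zero client ID are hypothetical.

```python
def workload_identity_fields(product: str, client_id: str) -> dict:
    """Build the per-product identity fields for a Site CR (illustrative shape only)."""
    return {
        # Assumption: the service account is named after the product.
        "serviceAccountName": product,
        "serviceAccountAnnotations": {
            # Binds the service account to a managed identity.
            "azure.workload.identity/client-id": client_id,
        },
        "podLabels": {
            # Opts the pod in to workload identity; label values are strings.
            "azure.workload.identity/use": "true",
        },
    }


# Placeholder client ID, for illustration only.
connect_fields = workload_identity_fields(
    "connect", "00000000-0000-0000-0000-000000000000"
)
```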
@ian-flores
Contributor Author

Closing draft — work preserved in branches. See posit-dev/team-operator#109 for full context and resumption plan.

@ian-flores ian-flores closed this Mar 2, 2026