Skip to content

test: add bare-metal custom S3 dataplane values test case#253

Draft
mhotan wants to merge 1 commit intomainfrom
mike/baremetal-custom-s3-test
Draft

test: add bare-metal custom S3 dataplane values test case#253
mhotan wants to merge 1 commit intomainfrom
mike/baremetal-custom-s3-test

Conversation

@mhotan
Copy link
Contributor

@mhotan mhotan commented Feb 26, 2026

Summary

  • Adds a test case (dataplane.baremetal-custom-s3) for a bare-metal deployment using custom S3-compatible storage (stow provider)
  • Disabled components: prometheus, flytepropeller, fluentbit, namespaces
  • Enabled: executor (replaces propeller), imageBuilder + buildkit, custom namespace_mapping
  • Values are intentionally kept as-is to surface issues found during deployment review

Known issues in these values

See review comments for details. A follow-up PR will address fixes.

Test plan

  • make generate-expected passes
  • make helm-test passes
  • Review generated manifest for correctness

🤖 Generated with Claude Code

Adds a test case for a bare-metal deployment using custom S3-compatible
storage (stow), with prometheus disabled, flytepropeller disabled (executor
mode), and custom namespace mapping. Intentionally preserves values as-is
to surface issues during review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@aviator-app
Copy link
Contributor

aviator-app bot commented Feb 26, 2026

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes.
Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This pull request is currently open (not queued).

How to merge

To merge this PR, comment /aviator merge or add the mergequeue label.


See the real-time status of this PR on the Aviator webapp.
Use the Aviator Chrome Extension to see the status of your PR within GitHub.

Copy link
Contributor Author

@mhotan mhotan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deployment Issue Analysis

Review of the generated manifest at tests/generated/dataplane.baremetal-custom-s3.yaml reveals these issues:


🔴 HIGH — imageBuilder.defaultRepository has https:// protocol prefix

Values line: defaultRepository: "https://ghcr.io/acme-corp/acme/union"

Container registries expect a bare hostname/path, not a URL. BuildKit will fail to push images because https://ghcr.io/... is not a valid image reference.

Fix: defaultRepository: "ghcr.io/acme-corp/acme/union"


🔴 HIGH — storage.region not overridden at top level

The custom stow config has region: RNO2A, but the chart's top-level storage.region defaults to us-east-1. The operator's clusterData.bucketRegion resolves to the top-level value:

bucketRegion: 'us-east-1'   # ← WRONG, should be RNO2A

Fix: Add region: RNO2A under the top-level storage: key.


🔴 HIGH — Duplicate container: key in rendered storage config

The storage template emits container: from storage.bucketName, then the custom provider block re-emits it from storage.custom.container:

storage:
  container: "union"    # from storage.bucketName
  container: union      # from storage.custom  ← DUPLICATE

This is invalid YAML. Most parsers silently take the last value, but strict parsers will reject it.

Fix: Remove container: union from storage.custom (it's already provided by storage.bucketName).


🟡 MEDIUM — prometheus.enabled: false but OpenCost still enabled

OpenCost defaults to enabled: true and is not disabled in these values. The rendered manifest creates a full OpenCost deployment pointing at a non-existent Prometheus service:

- name: PROMETHEUS_SERVER_ENDPOINT
  value: "http://union-operator-prometheus.union.svc:80/prometheus"

OpenCost pods will crashloop.

Fix: Add opencost: { enabled: false }


🟡 MEDIUM — Operator references non-existent Prometheus service

Even with OpenCost fixed, the operator's dependenciesHeartbeat.prometheus.endpoint and the global PROMETHEUS_SERVICE_URL env var still reference http://union-operator-prometheus:80 which does not exist. This affects all 5 deployments (executor, webhook, proxy, operator, syncresources).

Fix: Override config.operator.dependenciesHeartbeat.prometheus to remove/disable, or accept operator will log connection errors.


🟡 MEDIUM — AWS IAM role annotations on a bare-metal cluster

The default userRoleAnnotationKey (eks.amazonaws.com/role-arn) and userRoleAnnotationValue (arn:aws:iam::ACCOUNT_ID:role/flyte_project_role) are rendered into cluster resource sync templates. On bare-metal these are meaningless.

Fix: Add userRoleAnnotationKey: "" and userRoleAnnotationValue: ""


🟢 LOW — monitoring.enabled and cost.enabled default to true

Creates orphaned ServiceMonitor resources with no Prometheus to scrape them.

Fix: Add monitoring: { enabled: false } and cost: { enabled: false }


🟢 LOW — Security: S3 credentials in ConfigMaps

The stow access key and secret key appear in plain text in 4 ConfigMap data entries. ConfigMaps are not encrypted at rest even with K8s secret encryption.


🐛 BUG (chart template) — namespace_config.yaml not processed through tpl

In charts/dataplane/templates/clusterresourcesync/configmap.yaml, the namespace_config.yaml key uses toYaml without tpl, causing Helm backtick escapes to appear literally in the output. Compare with namespace_mapping.yaml which correctly uses tpl. This is a chart bug, not a values bug.

🤖 Generated with Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant