Skip to content

Dataplane - Sane defaults#210

Open
EngHabu wants to merge 58 commits intomainfrom
enghabu/sane-defaults
Open

Dataplane - Sane defaults#210
EngHabu wants to merge 58 commits intomainfrom
enghabu/sane-defaults

Conversation

@EngHabu
Copy link
Contributor

@EngHabu EngHabu commented Jan 26, 2026

Summary

  • Support generating MutatingWebhookConfiguration during helm install with 3 options:
    • "helm" generates certs through helm itself and stores them into a secret
    • "external" assumes the certs have been generated and passed to the helm values
    • "certManager" uses certManager to request and issue certs for the MutatingWebhookConfiguration
  • In low-priv deployments, create a default union service account and a PodTemplate so all tasks run with that service account.
  • Sane defaults for single-namespace / low-privilege mode: Setting low_privilege: true (or namespaces.enabled: false) now auto-injects all required namespace-scoping config across operator, propeller,
    executor, and webhook templates — eliminating ~40 lines of manual overrides from values-low-privilege.yaml
  • Static build-image-config configmap: In single-namespace mode, the build-image-config configmap (normally synced dynamically by the operator) is now created statically via Helm
  • Depot integration: New operator.enableDepot helper auto-enables Depot when imageBuilder is on but buildkit is off; task pod template automatically includes depot-token imagePullSecret
  • Webhook overhaul: Refactored webhook into dedicated templates/webhook/ directory, added Helm-managed certificate generation (self-signed, cert-manager, or external), Helm-managed
    MutatingWebhookConfiguration with namespace scoping in low-priv mode
  • Naming standardization: Renamed resources from flyte-/flytepropeller- to union-* (webhook, service accounts, roles), renamed default service account to union
  • New defaults: Enabled flyteconnector, knative-operator, serving, Depot (buildkit disabled), idl2Executor, default-pod-template-name, cacheservicev2
  • Low-privilege RBAC scoping: Executor ClusterRole/ClusterRoleBinding downgraded to Role/RoleBinding, monitoring/cost resources disabled, priorityClassName stripped
  • New values.v2.yaml: Opt-in values file for v2 mode (disables propeller, enables executor IDL2 + flyteconnector)

Test plan

  • Verify helm template renders correctly for default values
  • Verify helm template with low_privilege: true produces namespace-scoped resources and no cluster-wide permissions
  • Verify build-image-config configmap is created only in single-namespace mode
  • Verify depot-token imagePullSecret appears in task pod template when Depot is enabled
  • Verify webhook certificates render correctly for all providers (helm, certManager, external, legacy)
  • Verify generated test fixtures match expected output
  • Test upgrade path from existing deployments (webhook secret reuse, resource renames)

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
# Conflicts:
#	tests/generated/dataplane.additional-podlabels.yaml
#	tests/generated/dataplane.aws.eks-automode.yaml
#	tests/generated/dataplane.aws.with-ingress.yaml
#	tests/generated/dataplane.aws.yaml
#	tests/generated/dataplane.azure.yaml
#	tests/generated/dataplane.cost.yaml
#	tests/generated/dataplane.dcgm-exporter.yaml
#	tests/generated/dataplane.fully-selfhosted.yaml
#	tests/generated/dataplane.low-priv.yaml
#	tests/generated/dataplane.nodeobserver.yaml
#	tests/generated/dataplane.oci.yaml
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
@aviator-app
Copy link
Contributor

aviator-app bot commented Jan 26, 2026

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes.
Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This pull request is currently open (not queued).

How to merge

To merge this PR, comment /aviator merge or add the mergequeue label.


See the real-time status of this PR on the Aviator webapp.
Use the Aviator Chrome Extension to see the status of your PR within GitHub.

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
# Conflicts:
#	charts/dataplane/templates/_helpers.tpl
#	charts/dataplane/values.yaml
#	tests/generated/dataplane.additional-podlabels.yaml
#	tests/generated/dataplane.aws.eks-automode.yaml
#	tests/generated/dataplane.aws.with-ingress.yaml
#	tests/generated/dataplane.aws.yaml
#	tests/generated/dataplane.azure.yaml
#	tests/generated/dataplane.cost.yaml
#	tests/generated/dataplane.dcgm-exporter.yaml
#	tests/generated/dataplane.fully-selfhosted.yaml
#	tests/generated/dataplane.low-priv.yaml
#	tests/generated/dataplane.nodeobserver.yaml
#	tests/generated/dataplane.oci.yaml
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
@EngHabu EngHabu marked this pull request as ready for review February 6, 2026 21:30
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
Get the webhook secret name
*/}}
{{- define "flytepropellerwebhook.secretName" -}}
flyte-pod-webhook
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
flyte-pod-webhook
union-pod-webhook

or something that doesn't clash with the Flyte OSS one. Whenever a customer needs to run Union and Flyte OSS in the same namespace, this will make deployment fail

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't these be installed in separate namespaces?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can, but we have customers who only have permissions for a single namespace

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, this conflicts with Flyte OSS one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, separate namespaces? the only thing that's cluster-wide is the MutatingWebhookConfiguration object for which I added "-org" in the name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name: union-pod-webhook

@davidmirror-ops
Copy link
Contributor

I just tested this without overriding the default config for webhook certs and got this from a V2 execution that calls a secret

failed to execute node: failed at Node[a0]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [container]: [InternalError] failed to create resource, caused by: Internal error occurred: failed calling webhook "flyte-pod-webhook.flyte.org": failed to call webhook: Post "https://flyte-pod-webhook.union.svc:443/mutate--v1-pod/secrets?timeout=30s": tls: failed to verify certificate: x509: certificate signed by unknown authority

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
# Conflicts:
#	tests/generated/dataplane.additional-podlabels.yaml
#	tests/generated/dataplane.aws.eks-automode.yaml
#	tests/generated/dataplane.aws.with-ingress.yaml
#	tests/generated/dataplane.aws.yaml
#	tests/generated/dataplane.azure.yaml
#	tests/generated/dataplane.cost.yaml
#	tests/generated/dataplane.dcgm-exporter.yaml
#	tests/generated/dataplane.fully-selfhosted.yaml
#	tests/generated/dataplane.low-priv.yaml
#	tests/generated/dataplane.nodeobserver.yaml
#	tests/generated/dataplane.oci.yaml
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
EngHabu added 6 commits March 2, 2026 13:15
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
# Conflicts:
#	tests/generated/dataplane.additional-podlabels.yaml
#	tests/generated/dataplane.aws.eks-automode.yaml
#	tests/generated/dataplane.aws.with-ingress.yaml
#	tests/generated/dataplane.aws.yaml
#	tests/generated/dataplane.azure-custom-storage-prefix.yaml
#	tests/generated/dataplane.azure.yaml
#	tests/generated/dataplane.cost.yaml
#	tests/generated/dataplane.dcgm-exporter.yaml
#	tests/generated/dataplane.fully-selfhosted.yaml
#	tests/generated/dataplane.low-priv.yaml
#	tests/generated/dataplane.nodeobserver.yaml
#	tests/generated/dataplane.oci.yaml
#	tests/values/dataplane.fully-selfhosted.yaml
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
@EngHabu EngHabu changed the title [WIP] Sane defaults Dataplane - Sane defaults Mar 9, 2026
EngHabu added 3 commits March 9, 2026 13:15
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
…ommonServiceAccount` helper and `commonServiceAccount` values to consolidate operator, executor, proxy, webhook, and fluentbit service accounts into a single shared ServiceAccount (`union-system`). Automatically enabled in singleNamespace mode. Separate RBAC role/binding names from ServiceAccount names to allow shared SA with distinct permissions.
# Conflicts:
#	charts/dataplane/templates/operator/configmap.yaml
#	tests/generated/dataplane.additional-podlabels.yaml
#	tests/generated/dataplane.aws.eks-automode.yaml
#	tests/generated/dataplane.aws.with-ingress.yaml
#	tests/generated/dataplane.aws.yaml
#	tests/generated/dataplane.azure-custom-storage-prefix.yaml
#	tests/generated/dataplane.azure.yaml
#	tests/generated/dataplane.cost.yaml
#	tests/generated/dataplane.dcgm-exporter.yaml
#	tests/generated/dataplane.fully-selfhosted.yaml
#	tests/generated/dataplane.low-priv.yaml
#	tests/generated/dataplane.nodeobserver.yaml
#	tests/generated/dataplane.oci.yaml
@EngHabu EngHabu requested a review from jeevb March 10, 2026 15:38
@@ -0,0 +1,12 @@
{{- if include "useCommonServiceAccount" . }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great.
Should we also include buildkit so it uses this SA? Or in the light of exploring alternatives to buildkit that is not needed now?
@EngHabu

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think buildkit doesn't need explicit cloud role (IAM, GKA... etc.), right? I mainly wanted to get rid of having to bind a gazillion KSAs to the same Cloud Role because you have no idea what each one is for...

BTW, this also includes (and enables by default for low-priv) depot.dev.... Let me know if you want to test that..

EngHabu added 6 commits March 10, 2026 12:19
…ildkitUri` is set before enabling Depot for image building.
# Conflicts:
#	tests/generated/dataplane.additional-podlabels.yaml
#	tests/generated/dataplane.aws.eks-automode.yaml
#	tests/generated/dataplane.aws.with-ingress.yaml
#	tests/generated/dataplane.aws.yaml
#	tests/generated/dataplane.azure-custom-storage-prefix.yaml
#	tests/generated/dataplane.azure.yaml
#	tests/generated/dataplane.cost.yaml
#	tests/generated/dataplane.dcgm-exporter.yaml
#	tests/generated/dataplane.fully-selfhosted.yaml
#	tests/generated/dataplane.low-priv.yaml
#	tests/generated/dataplane.nodeobserver.yaml
#	tests/generated/dataplane.oci.yaml
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
# Conflicts:
#	tests/generated/dataplane.additional-podlabels.yaml
#	tests/generated/dataplane.aws.eks-automode.yaml
#	tests/generated/dataplane.aws.with-ingress.yaml
#	tests/generated/dataplane.aws.yaml
#	tests/generated/dataplane.azure-custom-storage-prefix.yaml
#	tests/generated/dataplane.azure.yaml
#	tests/generated/dataplane.cost.yaml
#	tests/generated/dataplane.dcgm-exporter.yaml
#	tests/generated/dataplane.fully-selfhosted.yaml
#	tests/generated/dataplane.low-priv.yaml
#	tests/generated/dataplane.nodeobserver.yaml
#	tests/generated/dataplane.oci.yaml
EngHabu and others added 8 commits March 16, 2026 15:10
Enable action metrics in low-priv environments
# Conflicts:
#	charts/dataplane/Chart.yaml
#	charts/dataplane/templates/_helpers.tpl
#	charts/dataplane/templates/monitoring/prometheusrule.yaml
#	charts/dataplane/templates/monitoring/servicemonitor.yaml
#	charts/dataplane/templates/nodeexecutor/service.yaml
#	charts/dataplane/values.yaml
#	tests/generated/dataplane.additional-podlabels.yaml
#	tests/generated/dataplane.aws.eks-automode.yaml
#	tests/generated/dataplane.aws.with-ingress.yaml
#	tests/generated/dataplane.aws.yaml
#	tests/generated/dataplane.azure-custom-storage-prefix.yaml
#	tests/generated/dataplane.azure.yaml
#	tests/generated/dataplane.cost.yaml
#	tests/generated/dataplane.dcgm-exporter.yaml
#	tests/generated/dataplane.fully-selfhosted.yaml
#	tests/generated/dataplane.low-priv.yaml
#	tests/generated/dataplane.nodeobserver.yaml
#	tests/generated/dataplane.oci.yaml
…unt in imagebuilder templates. Update buildkit deployment and serviceaccount to use the new helper and skip dedicated SA creation when useCommonServiceAccount is enabled.

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
…ildkit-specific SA annotations. Add imagePullSecrets support to commonServiceAccount. Add additionalServiceAccountAnnotations to dataplane values.

Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants