Add selfhosted auth config and buildkit fixes to values files #237

Closed
mhotan wants to merge 9 commits into mike/selfhosted-controlplane from mike/selfhosted-authentication

mhotan (Contributor) commented Feb 18, 2026

Summary

  • Add generic OAuth2/OIDC auth config to selfhosted-intracluster values files (controlplane + dataplane)
    • Auth globals: OIDC_BASE_URL, OIDC_CLIENT_ID, CLI_CLIENT_ID, INTERNAL_CLIENT_ID, AUTH_TOKEN_URL, AUTH_CLIENT_ID
    • Full auth structure with enable: false defaults — Terraform only sets enable flags
    • Flyteadmin OIDC config: appAuth, userAuth, authorizedUris
    • Service-to-service auth: configMap.union.auth (ClientSecret flow)
    • Dataplane auth: config.union.auth, clusterresourcesync, secrets.admin, executor.config.unionAuth
    • Commented ingress auth annotations (nginx auth-url to /me)
  • Fix ingress routing: change connect port to grpc for all services
  • Add connectPort config for cluster, authorizer, and usage services
  • Disable rootless buildkit on all AWS values files (EKS nodes don't set user.max_user_namespaces)

Companion PR: unionai/cloud#14443 (mike/selfhosted-authentication)

Test plan

  • Deployed and tested on mike-test staging environment via ArgoCD
  • All configmaps render correctly (no unresolved Helm templates)
  • Buildkit pod running with non-rootless mode (0 restarts)
  • Executor authenticating successfully with Okta tokens
  • All pods healthy in both union-cp and union namespaces

🤖 Generated with Claude Code

mhotan force-pushed the mike/selfhosted-authentication branch from cfa9770 to e485c78 on February 18, 2026 17:22
mhotan and others added 2 commits February 18, 2026 13:22
- Add connectPort to sharedService config for services that support
  connect-rpc (authorizer, cluster, usage)
- Fix _helpers.tpl to use toYaml for sharedService and sync helpers
- Switch all service/ingress port references from numbers to named
  ports for clarity and connect protocol routing
- Add named ports to cacheservice deployment
- Update generated test fixtures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mhotan force-pushed the mike/selfhosted-controlplane branch from 59097b8 to 0f63396 on February 18, 2026 21:23
mhotan and others added 7 commits February 18, 2026 15:32
Internal service-to-service calls (executions → CloudAdminService) go
through nginx but carry no auth headers, causing 401s. CloudAdminService
only serves static config data (cluster pools, domains, namespace
mappings) and flyteadmin already has disableForGrpc:true, so nginx auth
is unnecessary for these routes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The protectedGrpcRoutes template routed cluster, identity, authorizer,
and usage services to port name "connect" (Service port 83). However, no
service actually exposes a connect container port — the deployment template
only adds it when sharedService.connectPort is set at the service config
root level, which none of these services do. The connect protocol is
served on the gRPC port (8080) by default.

This caused connection refused errors (nginx → pod:83 → no listener),
manifesting as DeadlineExceeded on gRPC calls like
ClusterService/Heartbeat from the operator.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The ingress routes these services to the 'connect' port (83), but the
deployment template only declares a connect container port when
sharedService.connectPort is set at the service root level. Without it,
the K8s Service port 83 has no backing container port and requests fail
with connection refused.

Add sharedService.connectPort: 8081 at both the root level (for the
deployment template container port) and configMap level (for the binary
listener config) for cluster, authorizer, and usage. Also restores the
ingress template to the correct state, where flyteadmin/executions/etc.
use the grpc port and cluster/authorizer/usage use the connect port.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
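
The two-level setting described in the commit above could look roughly like the following values fragment. This is a minimal sketch, not the chart's confirmed layout: the top-level `cluster` key and the exact nesting under `configMap` are assumptions made for illustration. Only `sharedService.connectPort: 8081`, set at both the root and configMap levels, comes from the commit message.

```yaml
# Sketch only. The `cluster` service key and the nesting shown here are
# assumptions; the same pattern would be repeated for authorizer and usage.
cluster:
  # Root level: lets the deployment template declare the connect
  # container port that backs the Service's "connect" port.
  sharedService:
    connectPort: 8081
  # configMap level: tells the service binary which port to listen on
  # for connect traffic.
  configMap:
    sharedService:
      connectPort: 8081
```

Both values need to match: if only the configMap value were set, the binary would listen on a port the deployment template never exposes, which is the connection-refused failure the previous commit describes.
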
Move auth structure from Terraform overrides into values files with
globals for environment-specific values. Auth is disabled by default
(enable: false) and activated by populating globals + setting enable
flags. This makes the values files self-sufficient references for any
OAuth2/OIDC provider.

Controlplane: OIDC globals, service-to-service auth, flyteadmin auth,
executions auth fields, commented ingress auth annotations.
Dataplane: AUTH_CLIENT_ID global, operator auth, clusterresourcesync
auth, secrets.admin, executor unionAuth.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
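
As a rough illustration of the "disabled by default, activated by globals plus enable flags" pattern, a values fragment might be shaped like this. The global names are taken from the PR summary. The `global` key, the placement of `auth`, and the empty-string placeholders are assumptions, not the chart's actual schema.

```yaml
# Sketch only. Key placement is hypothetical; the global names come from
# the PR summary. Values are left empty because they are
# environment-specific.
global:
  OIDC_BASE_URL: ""        # base URL of the OAuth2/OIDC provider
  OIDC_CLIENT_ID: ""
  CLI_CLIENT_ID: ""
  INTERNAL_CLIENT_ID: ""
  AUTH_TOKEN_URL: ""
  AUTH_CLIENT_ID: ""       # listed as the dataplane global in the commit

auth:
  enable: false            # the only flag Terraform is expected to flip
```
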
EKS nodes (managed node groups and Auto Mode) do not set the kernel
sysctl user.max_user_namespaces, which rootless buildkit requires for
user namespace creation. Without it, rootlesskit fails with ENOSPC
("no space left on device"). Use privileged buildkit instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The auth annotations were commented out, forcing Terraform to hardcode
them. Now they're defined as real YAML values that Terraform can
reference from the base values file. When auth is disabled, Terraform
clears them with empty maps so nginx doesn't validate via /me.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
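
For reference, an nginx auth annotation that validates requests against `/me` could be expressed along these lines. The annotation name `nginx.ingress.kubernetes.io/auth-url` is the standard ingress-nginx external-auth annotation. The `ingress.protectedAnnotations` key and the host placeholder are hypothetical and only stand in for wherever the values file actually defines them.

```yaml
# Sketch only. `protectedAnnotations` is a hypothetical key and
# <controlplane-host> is a placeholder, not a value from this PR.
ingress:
  protectedAnnotations:
    nginx.ingress.kubernetes.io/auth-url: "https://<controlplane-host>/me"

# With auth disabled, the override described above would clear the
# annotations with an empty map instead:
#
# ingress:
#   protectedAnnotations: {}
```
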
mhotan force-pushed the mike/selfhosted-authentication branch from e485c78 to f95efd3 on February 18, 2026 23:33
# When enabled, services acquire OAuth2 tokens via client_credentials flow
# and send them on outgoing calls through nginx, which validates via /me.
auth:
  enable: false
Contributor Author comment:

Need to update this to true.

# security:
#   useAuth: true
auth:
  disableForGrpc: true
Contributor Author comment:

Need to switch this to false after confirming that client apps are sending tokens.

mhotan force-pushed the mike/selfhosted-controlplane branch from 0f63396 to adc5dca on February 19, 2026 22:57
mhotan (Contributor Author) commented Feb 19, 2026

Replaced by #244, which is rebased on the clean branch stack: #243 → #236 → #244

mhotan closed this on Feb 19, 2026