Draft
Self-hosted (intra-cluster) deployment docs where both control plane and data plane run in the same Kubernetes cluster. Includes AWS and GCP guides, authentication setup, and deployment glossary. Separated from mike/self-onboarding-doc-updates to hold for product announcement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Same as the selfmanaged version but the self-hosted control plane section uses the future selfhosted variant tag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deploying docs with
| Latest commit: | 5f95295 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://bf758704.docs-dog.pages.dev |
| Branch Preview URL: | https://peeter-selfhosted-docs.docs-dog.pages.dev |
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 492ad73.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add selfhosted image builder documentation

The build-image task must be registered manually in selfhosted deployments before users can build container images via flyte.Image. This doc covers the registration command and prerequisites.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
# Conflicts: # content/deployment/configuration/namespace-mapping.md
* Include build task source code in image builder docs and remove uctl references

  The image builder docs previously required access to the cloud repo to get the task definition file. This inlines the full source code so selfhosted customers can create it directly. Also replaces uctl with the flyte CLI for verification.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Hardcode image prefix, rename UNION_IMAGE_TAG to APP_VERSION

  - Hardcode union_image_name_prefix to public.ecr.aws/g1m2l3c1/imagebuilder-staging
  - Rename UNION_IMAGE_TAG env var to APP_VERSION to align with docs terminology
  - Simplify registration command (no UNION_IMAGE_NAME_PREFIX env var needed)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add selfhosted monitoring guide and update selfmanaged monitoring doc

  Selfhosted monitoring guide (new):
  - Accessing Grafana at /grafana
  - Pre-built CP (62 panels) and DP (37 panels) dashboards
  - Alerting: 16 rules, AlertManager → Grafana, contact point setup
  - Custom dashboards, remote write, BYO Prometheus

  Selfmanaged monitoring doc (updated):
  - Add selfhosted cross-reference callout
  - Add independent resource flags section (serviceMonitors, prometheusRules, dashboards work without monitoring.enabled)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add dashboards and alerting sections to selfmanaged monitoring doc

  Structural parity with selfhosted monitoring guide:
  - Dashboards: pre-built DP dashboard, custom dashboard ConfigMap
  - Alerting: enable flag, 8 DP alert rules table, configuring notifications (AlertManager + Grafana webhook option)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update alerting docs: trimmed alerts, SLOs, and notification setup

  Both selfhosted and selfmanaged docs now have:
  - Operational alerts (3 rules): ServiceDown, HighRestartRate, HandlerPanic
  - SLO-based alerts (3 rules): HighErrorBudgetBurn, ErrorBudgetExhausted, LatencySLOBreach with configurable targets
  - Language positions SLO targets as recommended starting points that operators should tune to their environment
  - Notification setup via AlertManager receivers or Grafana contact points

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Document what SLOs measure and why

  Both selfhosted and selfmanaged docs now explain the four SLO indicators:
  - Service Availability: deployment replica health
  - Success Rate: API/ingress success (CP), execution success V1+V2 (DP)
  - Latency: ingress p99 (CP), propeller round p99 (DP)
  - Error Budget: remaining budget before availability target is breached

  Positions SLO targets as recommended starting points to tune.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  Signed-off-by: Michael Hotan <mike@union.ai>

* Document PrometheusAgent + AMP mode for selfhosted monitoring

  Add section covering agentMode=true with AMP for scalable metrics forwarding. Documents IRSA requirements, Grafana SigV4 config, and the recording rule limitation with pointer to AMP Ruler.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add Managed Prometheus examples section to configuration/monitoring.md

  Sync with selfhosted-deployment/monitoring.md which already has this section. Both docs target the selfmanaged variant and should have matching content.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Regenerate dataplane helm docs for chart version 2026.3.10

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: Michael Hotan <mike@union.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Separated from #839 (mike/self-onboarding-doc-updates) to hold for product announcement.
New pages
- content/deployment/selfhosted-deployment/_index.md: Overview with architecture diagram
- content/deployment/selfhosted-deployment/control-plane-aws.md: AWS control plane deployment
- content/deployment/selfhosted-deployment/control-plane-gcp.md: GCP control plane deployment (Preview)
- content/deployment/selfhosted-deployment/data-plane-aws.md: AWS data plane deployment
- content/deployment/selfhosted-deployment/data-plane-gcp.md: GCP data plane deployment (Preview)
- content/deployment/selfhosted-deployment/authentication.md: OIDC auth with OAuth apps
- content/deployment/glossary.md: Deployment terminology

Test plan
- make dev renders selfmanaged variant correctly

🤖 Generated with Claude Code