
Self-hosted docs holding branch #843

Draft
ppiegaze wants to merge 10 commits into main from peeter/selfhosted-docs

@ppiegaze
Collaborator

Summary

  • Add self-hosted (intra-cluster) deployment guides where both control plane and data plane run in the same Kubernetes cluster
  • Includes AWS and GCP control plane/data plane guides, authentication setup, and deployment glossary
  • Add self-hosted link card to deployment index page

Separated from #839 (mike/self-onboarding-doc-updates) to hold for product announcement.

New pages

  • content/deployment/selfhosted-deployment/_index.md — Overview with architecture diagram
  • content/deployment/selfhosted-deployment/control-plane-aws.md — AWS control plane deployment
  • content/deployment/selfhosted-deployment/control-plane-gcp.md — GCP control plane deployment (Preview)
  • content/deployment/selfhosted-deployment/data-plane-aws.md — AWS data plane deployment
  • content/deployment/selfhosted-deployment/data-plane-gcp.md — GCP data plane deployment (Preview)
  • content/deployment/selfhosted-deployment/authentication.md — OIDC auth with OAuth apps
  • content/deployment/glossary.md — Deployment terminology

Test plan

  • make dev renders selfmanaged variant correctly
  • New selfhosted pages appear in sidebar under Platform deployment
  • GCP pages show Preview notices
  • Glossary page renders correctly
  • Mermaid diagrams and tabbed content render correctly

🤖 Generated with Claude Code

ppiegaze and others added 2 commits March 13, 2026 14:31
Self-hosted (intra-cluster) deployment docs where both control plane
and data plane run in the same Kubernetes cluster. Includes AWS and
GCP guides, authentication setup, and deployment glossary.

Separated from mike/self-onboarding-doc-updates to hold for product
announcement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Same as the selfmanaged version but the self-hosted control plane
section uses the future selfhosted variant tag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages bot commented Mar 13, 2026

Deploying docs with Cloudflare Pages

Latest commit: 5f95295
Status: ✅  Deploy successful!
Preview URL: https://bf758704.docs-dog.pages.dev
Branch Preview URL: https://peeter-selfhosted-docs.docs-dog.pages.dev


ppiegaze and others added 3 commits March 13, 2026 15:37
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ppiegaze changed the title from "docs: self-hosted deployment guides and glossary" to "Self-hosted docs holding branch" on Mar 13, 2026
mhotan and others added 5 commits March 16, 2026 12:06
Add selfhosted image builder documentation

The build-image task must be registered manually in selfhosted
deployments before users can build container images via flyte.Image.
This doc covers the registration command and prerequisites.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
# Conflicts:
#	content/deployment/configuration/namespace-mapping.md
* Include build task source code in image builder docs and remove uctl references

The image builder docs previously required access to the cloud repo to get the
task definition file. This inlines the full source code so selfhosted customers
can create it directly. Also replaces uctl with the flyte CLI for verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Hardcode image prefix, rename UNION_IMAGE_TAG to APP_VERSION

- Hardcode union_image_name_prefix to public.ecr.aws/g1m2l3c1/imagebuilder-staging
- Rename UNION_IMAGE_TAG env var to APP_VERSION to align with docs terminology
- Simplify registration command (no UNION_IMAGE_NAME_PREFIX env var needed)
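
A minimal sketch of how the hardcoded prefix and the renamed variable might combine. Only the prefix value and the `APP_VERSION` name come from this commit message; the helper name `builder_image_ref` and the `prefix:tag` composition are assumptions for illustration, not the actual task source.

```python
import os

# Prefix value as stated in the commit message (hardcoded, so no
# UNION_IMAGE_NAME_PREFIX environment variable is needed).
IMAGE_NAME_PREFIX = "public.ecr.aws/g1m2l3c1/imagebuilder-staging"


def builder_image_ref(env=None):
    """Hypothetical helper: build an image reference from APP_VERSION.

    Assumes the reference has the form "<prefix>:<APP_VERSION>".
    """
    env = os.environ if env is None else env
    version = env.get("APP_VERSION", "").strip()
    if not version:
        raise ValueError("APP_VERSION must be set before registering the task")
    return f"{IMAGE_NAME_PREFIX}:{version}"
```

Failing fast on a missing `APP_VERSION` is a design choice of this sketch: it surfaces a misconfigured environment at registration time rather than at image build time.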

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add selfhosted monitoring guide and update selfmanaged monitoring doc

Selfhosted monitoring guide (new):
- Accessing Grafana at /grafana
- Pre-built CP (62 panels) and DP (37 panels) dashboards
- Alerting: 16 rules, AlertManager → Grafana, contact point setup
- Custom dashboards, remote write, BYO Prometheus

Selfmanaged monitoring doc (updated):
- Add selfhosted cross-reference callout
- Add independent resource flags section (serviceMonitors, prometheusRules,
  dashboards work without monitoring.enabled)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add dashboards and alerting sections to selfmanaged monitoring doc

Structural parity with selfhosted monitoring guide:
- Dashboards: pre-built DP dashboard, custom dashboard ConfigMap
- Alerting: enable flag, 8 DP alert rules table, configuring
  notifications (AlertManager + Grafana webhook option)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update alerting docs: trimmed alerts, SLOs, and notification setup

Both selfhosted and selfmanaged docs now have:
- Operational alerts (3 rules): ServiceDown, HighRestartRate, HandlerPanic
- SLO-based alerts (3 rules): HighErrorBudgetBurn, ErrorBudgetExhausted,
  LatencySLOBreach with configurable targets
- Language positions SLO targets as recommended starting points that
  operators should tune to their environment
- Notification setup via AlertManager receivers or Grafana contact points

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Document what SLOs measure and why

Both selfhosted and selfmanaged docs now explain the four SLO indicators:
- Service Availability: deployment replica health
- Success Rate: API/ingress success (CP), execution success V1+V2 (DP)
- Latency: ingress p99 (CP), propeller round p99 (DP)
- Error Budget: remaining budget before availability target is breached

Positions SLO targets as recommended starting points to tune.
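
As a rough illustration of the Error Budget indicator, the sketch below uses the standard SRE definitions (budget = 1 - target, burn rate = observed error rate / budget). The function names and the request-count inputs are assumptions for illustration; the actual alert rules in the docs may compute these values differently.

```python
def error_budget(slo_target):
    """Allowed failure fraction for a target such as 0.999."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def budget_remaining(slo_target, total, failed):
    """Fraction of the error budget still unspent (negative once exhausted)."""
    if total == 0:
        return 1.0  # no traffic, so nothing has been spent
    allowed_failures = error_budget(slo_target) * total
    return 1.0 - failed / allowed_failures


def burn_rate(slo_target, total, failed):
    """How fast the budget is being spent; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0
    return (failed / total) / error_budget(slo_target)
```

For example, with a 99% target and 1000 requests, 5 failures leave about half the budget, while 20 failures overspend it at roughly twice the sustainable rate. Operators tuning targets can use this arithmetic to check whether a proposed threshold is realistic for their traffic.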

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Michael Hotan <mike@union.ai>

* Document PrometheusAgent + AMP mode for selfhosted monitoring

Add section covering agentMode=true with AMP for scalable metrics
forwarding. Documents IRSA requirements, Grafana SigV4 config, and
the recording rule limitation with pointer to AMP Ruler.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add Managed Prometheus examples section to configuration/monitoring.md

Sync with selfhosted-deployment/monitoring.md which already has this
section. Both docs target the selfmanaged variant and should have
matching content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Regenerate dataplane helm docs for chart version 2026.3.10

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: Michael Hotan <mike@union.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2 participants