diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index b748899a..bfa371d5 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -12,12 +12,16 @@ on: branches: - main paths-ignore: - - '*.md' + - '**/*.md' - 'docs/**' + - '.claude/**' + - 'LICENSE' pull_request: paths-ignore: - - '*.md' + - '**/*.md' - 'docs/**' + - '.claude/**' + - 'LICENSE' permissions: actions: write diff --git a/CLAUDE.md b/CLAUDE.md index 3324b1be..3655e267 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -4,26 +4,34 @@ Kubernetes operator for managing Posit Team deployments. ## Project Structure -- **`api/`**: Kubernetes API/CRD definitions +- **`api/`**: Kubernetes API/CRD definitions (core, product, keycloak, templates) - **`cmd/`**: Main operator entry point - **`internal/`**: Core operator logic and controllers - **`config/`**: Kubernetes manifests and Kustomize configurations - **`dist/chart/`**: Helm chart for deployment -- **`flightdeck/`**: Landing page dashboard component +- **`flightdeck/`**: Landing page dashboard component (separate Go module) - **`client-go/`**: Generated Kubernetes client code -- **`pkg/`**: Shared packages +- **`docs/`**: User and contributor documentation ## Build and Development ```bash -just build # Build operator binary -just test # Run tests -just run # Run operator locally -just format # Format code +just build # Build operator binary to ./bin/team-operator +just test # Run go tests +just run # Run operator locally from source +just deps # Install dependencies +just mgenerate # Regenerate manifests after API changes just helm-lint # Lint Helm chart -just helm-template # Render Helm templates +just helm-template # Render Helm templates locally +just helm-install # Install operator via Helm +just helm-uninstall # Uninstall operator via Helm ``` +## Namespaces + +- **`posit-team-system`**: Where the operator runs +- **`posit-team`**: Where Site CRs and products are deployed + ## Helm Installation ```bash @@ -35,8 +43,8 
@@ helm install team-operator ./dist/chart \ ## Contributing - Use conventional commits (`feat:`, `fix:`, `docs:`, etc.) -- Run `just format` before committing -- Ensure tests pass with `just test` +- Run `just test` before committing +- See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines ## License diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 00000000..d2a03f41 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,380 @@ +# Contributing to Team Operator + +Welcome! We appreciate your interest in contributing to the Team Operator project. This guide will help you get started with development and understand our contribution workflow. + +## Table of Contents + +- [Project Overview](#project-overview) +- [Development Setup](#development-setup) +- [Project Structure](#project-structure) +- [Making Changes](#making-changes) +- [Testing](#testing) +- [Pull Request Process](#pull-request-process) +- [Code Review Guidelines](#code-review-guidelines) +- [Getting Help](#getting-help) + +## Project Overview + +Team Operator is a Kubernetes operator built with [Kubebuilder](https://book.kubebuilder.io/) that automates the deployment, configuration, and management of Posit Team products (Workbench, Connect, Package Manager, and Chronicle) within Kubernetes clusters. + +> **Note**: This repository is under active development and is not yet ready for production use. 
+ +## Development Setup + +### Prerequisites + +Before you begin, ensure you have the following installed: + +- **Go 1.25+** (version specified in `go.mod`) +- **Docker** (for building container images) +- **kubectl** (configured to access a Kubernetes cluster) +- **Kubernetes cluster** (1.29+ for testing; k3d works well for local development) +- **Just** command runner (`brew install just` on macOS, or see [installation guide](https://github.com/casey/just)) +- **Helm** (for chart development and testing) + +On macOS, you may also need `gsed` for some Makefile targets: + +```bash +brew install gnu-sed +``` + +### Cloning the Repository + +```bash +git clone https://github.com/posit-dev/team-operator.git +cd team-operator +``` + +### Installing Dependencies + +```bash +just deps +``` + +### Building the Operator + +```bash +just build +``` + +This compiles the operator binary to `./bin/team-operator`. + +### Running Locally + +To run the operator against your current Kubernetes context: + +```bash +just run +``` + +For development with a local Kubernetes cluster: + +```bash +# Create a k3d cluster +just k3d-up + +# Install CRDs +just crds + +# Run the operator +just run +``` + +### Running Tests + +```bash +just test +``` + +For tests with envtest (Kubebuilder test framework): + +```bash +just mtest +``` + +## Project Structure + +| Directory | Description | +|-----------|-------------| +| `api/` | Kubernetes API/CRD definitions | +| `cmd/` | Main operator entry point | +| `internal/` | Core operator logic and controllers | +| `internal/controller/` | Reconciliation controllers for each resource type | +| `config/` | Kubernetes manifests and Kustomize configurations | +| `dist/chart/` | Helm chart for deployment | +| `flightdeck/` | Landing page dashboard component | +| `client-go/` | Generated Kubernetes client code | +| `pkg/` | Shared packages | +| `openapi/` | Generated OpenAPI specifications | +| `hack/` | Build and development scripts | + +## Making Changes + 
+ +### Branching Strategy + +1. Create a feature branch from `main`: + ```bash + git checkout main + git pull origin main + git checkout -b your-feature-name + ``` + +2. Keep branch names descriptive and use hyphens (avoid slashes): + - Good: `add-workbench-scaling`, `fix-database-connection` + - Avoid: `feature/workbench`, `fix/db` + +### Commit Message Conventions + +We use [Conventional Commits](https://www.conventionalcommits.org/). Each commit message should follow this format: + +``` +<type>(<scope>): <description> + +[optional body] + +[optional footer] +``` + +**Types:** +- `feat:` - New feature +- `fix:` - Bug fix +- `docs:` - Documentation only changes +- `refactor:` - Code change that neither fixes a bug nor adds a feature +- `test:` - Adding or correcting tests +- `chore:` - Changes to the build process or auxiliary tools + +**Examples:** +``` +feat(connect): add support for custom resource limits +fix(workbench): resolve database connection timeout +docs: update installation instructions +refactor(controller): simplify reconciliation logic +``` + +### Code Style Guidelines + +1. **Run formatters before committing:** + ```bash + just format + ``` + Or using make: + ```bash + make fmt + ``` + +2. **Run linting:** + ```bash + go vet ./... + ``` + +3. **Follow existing patterns** - New code should look like it belongs in the codebase. + +4. **Keep functions focused** - Each function should do one thing well. + +5. **Use descriptive names** - Names should reveal intent. + +### Adding New CRD Fields + +When adding new fields to Custom Resource Definitions: + +1. **Update the API types** in `api/`: + - Add the new field with appropriate JSON tags and Kubebuilder annotations + - Include validation rules where appropriate + +2. **Regenerate manifests:** + ```bash + just mgenerate + ``` + +3. **Update controller logic** in `internal/controller/` if needed. + +4. **Add tests** for the new functionality. + +5. **Update Helm chart** in `dist/chart/` if the change affects deployment.
+ +Example of adding a new field: + +```go +// +kubebuilder:validation:Optional +// +kubebuilder:default:=1 +// Replicas is the number of instances to deploy +Replicas *int32 `json:"replicas,omitempty"` +``` + +## Testing + +### Unit Tests + +Run all unit tests: + +```bash +just test +``` + +Or run Go tests directly: + +```bash +go test -v ./... +``` + +### Running Specific Tests + +To run tests for a specific package: + +```bash +go test -v ./internal/controller/... +``` + +To run a specific test: + +```bash +go test -v ./internal/controller/... -run TestReconcile +``` + +### Integration Tests with envtest + +The project uses Kubebuilder's envtest for integration testing: + +```bash +just mtest +``` + +This sets up a local Kubernetes API server for testing without requiring a full cluster. + +### Helm Chart Testing + +```bash +# Lint the chart +just helm-lint + +# Render templates locally (useful for debugging) +just helm-template +``` + +### Coverage Reports + +After running tests, view coverage: + +```bash +go tool cover -func coverage.out +``` + +## Pull Request Process + +### Before Submitting + +1. **Ensure all tests pass:** + ```bash + just test + ``` + +2. **Format your code:** + ```bash + just format + ``` + +3. **Verify no uncommitted changes from generation:** + ```bash + git diff --exit-code + ``` + +4. **Lint the Helm chart:** + ```bash + just helm-lint + ``` + +### PR Description + +Include the following in your PR description: + +1. **Summary** - What does this PR do? +2. **Motivation** - Why is this change needed? +3. **Testing** - How was this tested? +4. **Breaking changes** - Does this introduce any breaking changes? +5. 
**Related issues** - Link to any related GitHub issues + +### CI Checks + +The following checks must pass: + +- **Build** - The operator must compile successfully +- **Unit tests** - All tests must pass +- **Kustomize** - Kustomization must build without errors +- **Helm lint** - Chart must pass linting +- **Helm template** - Templates must render correctly +- **No diff** - Generated files must be committed + +### Review Expectations + +- PRs require at least one approval before merging +- Address all review comments or explain why you disagree +- Keep PRs focused - smaller PRs are easier to review +- Respond to feedback promptly + +## Code Review Guidelines + +We follow specific guidelines for code review. For detailed review standards, see [`.claude/review-guidelines.md`](.claude/review-guidelines.md). + +### Core Principles + +- **Simplicity** - Prefer explicit over clever +- **Maintainability** - Follow existing patterns in the codebase +- **Security** - Extra scrutiny for credential handling, RBAC, and network operations + +### Review Checklist by Area + +**API Changes (`api/`):** +- Kubebuilder annotations are correct +- New fields have sensible defaults +- Validation rules are present +- Breaking changes have migration strategy + +**Controller Changes (`internal/controller/`):** +- Reconciliation is idempotent +- Error handling reports status correctly +- Config flows from Site -> Product correctly +- Both unit and integration tests exist + +**Helm Chart (`dist/chart/`):** +- Values have sensible defaults +- Templates render correctly +- RBAC permissions are minimal +- CRDs are up to date + +**Flightdeck (`flightdeck/`):** +- Go templates render correctly +- Static assets are properly served +- Configuration options are documented + +### What NOT to Comment On + +- Style issues handled by formatters (run `make fmt`) +- Personal preferences without clear benefit +- Theoretical concerns without concrete impact + +## Getting Help + +If you have questions or 
need help: + +1. **Check existing documentation** - README.md, this guide, and inline code comments +2. **Search existing issues** - Your question may have been answered before +3. **Open an issue** - For bugs, feature requests, or questions +4. **Contact Posit** - For production use inquiries, [contact Posit](https://posit.co/schedule-a-call/) + +## Quick Reference + +| Task | Command | +|------|---------| +| Build | `just build` | +| Test | `just test` | +| Run locally | `just run` | +| Format code | `just format` | +| Regenerate manifests | `just mgenerate` | +| Install CRDs | `just crds` | +| Helm lint | `just helm-lint` | +| Helm template | `just helm-template` | +| Helm install | `just helm-install` | +| Create k3d cluster | `just k3d-up` | +| Delete k3d cluster | `just k3d-down` | + +Thank you for contributing to Team Operator! diff --git a/README.md b/README.md index 9fa3e40d..3790e6be 100644 --- a/README.md +++ b/README.md @@ -5,6 +5,21 @@ A Kubernetes operator that manages the deployment and lifecycle of Posit Team products (Workbench, Connect, Package Manager, and Chronicle) within Kubernetes clusters. +## Table of Contents + +- [Overview](#overview) +- [Components](#components) + - [Flightdeck](#flightdeck) +- [Quick Start](#quick-start) + - [Prerequisites](#prerequisites) + - [Installation](#installation) + - [Local Development](#local-development) +- [Configuration](#configuration) +- [Architecture](#architecture) +- [Troubleshooting](#troubleshooting) +- [License](#license) +- [Documentation](docs/README.md) + ## Overview The Team Operator is a Kubernetes controller built using [Kubebuilder](https://book.kubebuilder.io/) that automates the deployment, configuration, and management of Posit Team products. 
It handles: @@ -118,110 +133,61 @@ apiVersion: core.posit.team/v1beta1 kind: Site metadata: name: my-site - namespace: posit-team + namespace: posit-team # Where Site CRs are deployed (operator runs in posit-team-system) spec: + # Required: Base domain for product URLs domain: example.com - # Flightdeck configuration (optional) + # Ingress configuration + ingressClass: traefik + + # Flightdeck landing page (optional) flightdeck: featureEnabler: - showAcademy: false # Hide Academy from landing page + showAcademy: false # Products to deploy workbench: image: ghcr.io/rstudio/rstudio-workbench-daily:latest + replicas: 1 connect: image: ghcr.io/rstudio/rstudio-connect-daily:latest - # ... additional config -``` + replicas: 1 -## Architecture Diagrams - -### Database - -```mermaid -flowchart - subgraph db [Team Operator - Databases] - pub-user(Pub User) - pub-user-->pub-main - pub-user-->pub-metrics - subgraph pub[PublishDB] - pub-main[Main Schema] - pub-metrics[Instrumentation Schema] - end - pkg-user(Pkg User) - pkg-user-->pkg-main - pkg-user-->pkg-metrics - subgraph pkg[PackageDB] - pkg-main[Main Schema] - pkg-metrics[Metrics Schema] - end - dev-user(Dev User) - dev-user-->dev-main - subgraph dev[DevDB] - dev-main[Public Schema] - end - end - -classDef other fill:#FAEEE9,stroke:#ab4d26 + packageManager: + image: ghcr.io/rstudio/rstudio-pm-daily:latest + replicas: 1 ``` -### Publish / Connect -```mermaid -flowchart LR - - subgraph publish [Team Operator - Publish] - subgraph other [Other Resources] - ing(Ingress) - rbac(RBAC) - svc(Service Account) - end - pub-->other - manual(Manual) - pvc(PVC) - pv(PV) - op-->pv - op-->pub - op(Site Controller) - op-->dbcon - secretkey(Secret Key) - pub(Connect Controller) - db("Postgres Db (via CRD)") - license(License) - manual-->license - manual-->clientsecret - subgraph secrets [Secret Manager] - clientsecret(Auth Client Secret) - end - mainDbCon(Main Database Connection) - manual-->mainDbCon - mainDbCon-->dbcon - subgraph dbcon 
[DB Controller] - createdb(Create Databases) - end - license-->pubdeploy - clientsecret-->pubdeploy - cm(Config Maps) - dbsecret(Db Password Secret) - pub-->pvc - pub-->dbsecret - pub-->secretkey - pub-->pubdeploy - pub-->db - pub-->cm - cm-->pubdeploy - secretkey-->pubdeploy - pv-->pvc - db-->pubdeploy - pvc-->pubdeploy - pubdeploy(Connect Pod) - dbsecret-->pubdeploy - end - - - classDef other fill:#FAEEE9,stroke:#ab4d26 +> **Note:** The operator runs in `posit-team-system` namespace, while Site CRs and deployed products live in a separate namespace (typically `posit-team` or a configured `watchNamespace`). See [docs/README.md](docs/README.md) for detailed architecture. + +## Architecture + +The Team Operator uses a hierarchical controller pattern: + ``` +Site CR (single source of truth) + │ + ├── Site Controller + │ ├── Creates Product CRs (Connect, Workbench, PackageManager, Chronicle, Flightdeck) + │ ├── Manages shared storage (PersistentVolumes) + │ └── Coordinates database provisioning + │ + ├── Product Controllers + │ ├── Connect Controller → Pods, Services, Ingress, ConfigMaps + │ ├── Workbench Controller → Pods, Sessions, Job Templates + │ ├── PackageManager Controller → Pods, S3/Azure integration + │ ├── Chronicle Controller → Telemetry service, Sidecar injection + │ └── Flightdeck Controller → Landing page, Product navigation + │ + └── Database Controller + └── PostgreSQL schemas, credentials, migrations +``` + +Each product has dedicated database schemas and isolated credentials. Workbench and Connect support off-host execution where user workloads run in separate Kubernetes Jobs. Chronicle collects telemetry via sidecars injected into product pods. + +For detailed architecture diagrams with component explanations, see the [Architecture Documentation](docs/architecture.md). 
## Troubleshooting diff --git a/api/keycloak/v2alpha1/README.md b/api/keycloak/v2alpha1/README.md index ef328827..1fe1fadc 100644 --- a/api/keycloak/v2alpha1/README.md +++ b/api/keycloak/v2alpha1/README.md @@ -1,7 +1,38 @@ # Keycloak CRDs -The Keycloak operator is written in Java... +This package contains Go struct definitions that mirror the Keycloak Operator's CRD schema. -Although it would be fantastic to autogenerate this class from the CRD that they provide, -generation usually goes the other direction. As a result, we have built up this struct to match -their CRD spec ourselves. +## Background + +The [Keycloak Operator](https://www.keycloak.org/operator/installation) is written in Java and provides its own CRDs for managing Keycloak instances. Team Operator needs to create `Keycloak` custom resources to integrate authentication with Posit Team products. + +## Why Manual Struct Definitions? + +Kubebuilder and controller-runtime typically generate CRDs *from* Go structs. However, the Keycloak Operator works in reverse - it defines CRDs in Java, and we consume them. Since there's no standard tooling to generate Go structs from external CRDs, we've manually built these structs to match the Keycloak Operator's CRD spec. 
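+For a concrete reference point, a minimal `Keycloak` CR of the kind these structs must mirror looks roughly like this (a sketch based on the Keycloak Operator's published examples; exact fields can vary by operator version, so verify against the installed CRD schema):

```yaml
# Minimal Keycloak CR (illustrative; check against the installed operator's CRD)
apiVersion: k8s.keycloak.org/v2alpha1
kind: Keycloak
metadata:
  name: example-keycloak
spec:
  instances: 1
  hostname:
    hostname: auth.example.com
```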
+ +## Files + +- `keycloak_types.go` - Go struct definitions for `Keycloak` and `KeycloakSpec` +- `groupversion_info.go` - API group and version registration +- `zz_generated.deepcopy.go` - Auto-generated deep copy methods + +## Usage + +These types are used by the Site controller to create Keycloak instances when SSO is configured: + +```go +import keycloakv2alpha1 "github.com/posit-dev/team-operator/api/keycloak/v2alpha1" + +keycloak := &keycloakv2alpha1.Keycloak{ + Spec: keycloakv2alpha1.KeycloakSpec{ + Hostname: &keycloakv2alpha1.KeycloakHostnameSpec{ + Hostname: "auth.example.com", + }, + Instances: 1, + }, +} +``` + +## Maintenance + +When upgrading the Keycloak Operator, verify that the struct definitions in `keycloak_types.go` still match their CRD schema. Check the [Keycloak Operator documentation](https://www.keycloak.org/operator/basic-deployment) for schema changes. diff --git a/dist/chart/README.md b/dist/chart/README.md new file mode 100644 index 00000000..00f9ff5e --- /dev/null +++ b/dist/chart/README.md @@ -0,0 +1,398 @@ +# Team Operator Helm Chart + +A Helm chart for deploying the Team Operator, a Kubernetes operator that manages the deployment and lifecycle of Posit Team products (Workbench, Connect, Package Manager, and Chronicle) within Kubernetes clusters. + +> **Warning** +> This operator is under active development and is not yet ready for production use. Please [contact Posit](https://posit.co/schedule-a-call/) before using this operator. + +## Overview + +The Team Operator automates the deployment, configuration, and management of Posit Team products. 
It handles: + +- Multi-product Posit Team deployments through a single `Site` Custom Resource +- Database provisioning and management for each product +- Secure credential management via Kubernetes secrets or AWS Secrets Manager +- License configuration and validation +- Ingress routing and load balancing +- Shared storage configuration across products +- Keycloak integration for authentication +- Off-host execution support for Workbench and Connect + +## Prerequisites + +- Kubernetes 1.29+ +- Helm 3.x +- kubectl configured to access your cluster +- (Optional) cert-manager for TLS certificate management +- (Optional) Prometheus Operator for metrics collection + +## What Gets Installed + +This chart installs the following resources: + +| Resource Type | Description | +|---------------|-------------| +| Deployment | Controller manager that runs the operator | +| ServiceAccount | Identity for the operator pod | +| ClusterRole | Cluster-wide permissions (PersistentVolumes) | +| Role | Namespace-scoped permissions for managed resources | +| ClusterRoleBinding | Binds ClusterRole to ServiceAccount | +| RoleBinding | Binds Role to ServiceAccount | +| Service | Metrics endpoint for the operator | +| CRDs | Custom Resource Definitions for Posit Team resources | + +### Custom Resource Definitions (CRDs) + +The chart installs the following CRDs: + +- `sites.core.posit.team` - Top-level resource for Posit Team deployments +- `workbenches.core.posit.team` - RStudio Workbench instances +- `connects.core.posit.team` - Posit Connect instances +- `packagemanagers.core.posit.team` - Posit Package Manager instances +- `chronicles.core.posit.team` - Chronicle instances +- `flightdecks.core.posit.team` - Landing page dashboard +- `postgresdatabases.core.posit.team` - Database provisioning + +## Installation + +### Basic Installation + +```bash +helm install team-operator ./dist/chart \ + --namespace posit-team-system \ + --create-namespace +``` + +### Installation with Custom Values + 
+```bash +helm install team-operator ./dist/chart \ + --namespace posit-team-system \ + --create-namespace \ + --values my-values.yaml +``` + +### Installation with Inline Overrides + +```bash +helm install team-operator ./dist/chart \ + --namespace posit-team-system \ + --create-namespace \ + --set controllerManager.container.image.repository=posit/team-operator \ + --set controllerManager.container.image.tag=v1.2.0 \ + --set watchNamespace=my-posit-team +``` + +### Upgrading + +```bash +helm upgrade team-operator ./dist/chart \ + --namespace posit-team-system \ + --values my-values.yaml +``` + +### Uninstalling + +```bash +helm uninstall team-operator --namespace posit-team-system +``` + +> **Note**: By default, CRDs are preserved after uninstallation due to the `crd.keep: true` setting. To remove CRDs, set `crd.keep: false` before uninstalling or manually delete them: +> ```bash +> kubectl delete crd sites.core.posit.team workbenches.core.posit.team connects.core.posit.team packagemanagers.core.posit.team chronicles.core.posit.team flightdecks.core.posit.team postgresdatabases.core.posit.team +> ``` + +## Configuration Reference + +### Global Settings + +| Parameter | Description | Default | Required | +|-----------|-------------|---------|----------| +| `watchNamespace` | Namespace where the operator watches for Site CRs | `posit-team` | No | + +### Controller Manager + +| Parameter | Description | Default | Required | +|-----------|-------------|---------|----------| +| `controllerManager.replicas` | Number of operator replicas | `1` | No | +| `controllerManager.serviceAccountName` | Name of the ServiceAccount | `team-operator-controller-manager` | No | +| `controllerManager.terminationGracePeriodSeconds` | Grace period for pod termination | `10` | No | +| `controllerManager.tolerations` | Pod tolerations for scheduling | `[]` | No | +| `controllerManager.nodeSelector` | Node selector for pod placement | `{}` | No | + +### Controller Manager Container + +| 
Parameter | Description | Default | Required | +|-----------|-------------|---------|----------| +| `controllerManager.container.image.repository` | Operator image repository | `posit/team-operator` | No | +| `controllerManager.container.image.tag` | Operator image tag | `latest` | No | +| `controllerManager.container.args` | Container arguments | See values.yaml | No | +| `controllerManager.container.env` | Environment variables | See values.yaml | No | +| `controllerManager.container.resources.limits.cpu` | CPU limit | `500m` | No | +| `controllerManager.container.resources.limits.memory` | Memory limit | `128Mi` | No | +| `controllerManager.container.resources.requests.cpu` | CPU request | `10m` | No | +| `controllerManager.container.resources.requests.memory` | Memory request | `64Mi` | No | + +### Controller Manager Probes + +| Parameter | Description | Default | Required | +|-----------|-------------|---------|----------| +| `controllerManager.container.livenessProbe.initialDelaySeconds` | Initial delay for liveness probe | `15` | No | +| `controllerManager.container.livenessProbe.periodSeconds` | Period for liveness probe | `20` | No | +| `controllerManager.container.livenessProbe.httpGet.path` | Liveness probe path | `/healthz` | No | +| `controllerManager.container.livenessProbe.httpGet.port` | Liveness probe port | `8081` | No | +| `controllerManager.container.readinessProbe.initialDelaySeconds` | Initial delay for readiness probe | `5` | No | +| `controllerManager.container.readinessProbe.periodSeconds` | Period for readiness probe | `10` | No | +| `controllerManager.container.readinessProbe.httpGet.path` | Readiness probe path | `/readyz` | No | +| `controllerManager.container.readinessProbe.httpGet.port` | Readiness probe port | `8081` | No | + +### Controller Manager Security Context + +| Parameter | Description | Default | Required | +|-----------|-------------|---------|----------| +| 
`controllerManager.container.securityContext.allowPrivilegeEscalation` | Allow privilege escalation | `false` | No | +| `controllerManager.container.securityContext.capabilities.drop` | Capabilities to drop | `["ALL"]` | No | +| `controllerManager.securityContext.runAsNonRoot` | Run as non-root user | `true` | No | +| `controllerManager.securityContext.seccompProfile.type` | Seccomp profile type | `RuntimeDefault` | No | + +### Service Account + +| Parameter | Description | Default | Required | +|-----------|-------------|---------|----------| +| `controllerManager.serviceAccount.annotations` | Annotations for the ServiceAccount (e.g., for IAM roles) | `{}` | No | + +### RBAC + +| Parameter | Description | Default | Required | +|-----------|-------------|---------|----------| +| `rbac.enable` | Enable RBAC resources | `true` | No | + +### CRDs + +| Parameter | Description | Default | Required | +|-----------|-------------|---------|----------| +| `crd.enable` | Install CRDs with the chart | `true` | No | +| `crd.keep` | Keep CRDs when chart is uninstalled | `true` | No | + +### Metrics + +| Parameter | Description | Default | Required | +|-----------|-------------|---------|----------| +| `metrics.enable` | Enable metrics endpoint | `true` | No | + +### Prometheus + +| Parameter | Description | Default | Required | +|-----------|-------------|---------|----------| +| `prometheus.enable` | Enable ServiceMonitor for Prometheus | `false` | No | + +### Webhooks + +| Parameter | Description | Default | Required | +|-----------|-------------|---------|----------| +| `webhook.enable` | Enable admission webhooks | `false` | No | + +### Cert-Manager + +| Parameter | Description | Default | Required | +|-----------|-------------|---------|----------| +| `certmanager.enable` | Enable cert-manager for TLS certificates | `false` | No | + +### Network Policies + +| Parameter | Description | Default | Required | +|-----------|-------------|---------|----------| +| 
`networkPolicy.enable` | Enable NetworkPolicies | `false` | No | + +## Examples + +### AWS Deployment with EKS IAM Roles + +For AWS deployments using IAM Roles for Service Accounts (IRSA): + +```yaml +# aws-values.yaml +watchNamespace: posit-team + +controllerManager: + container: + image: + repository: posit/team-operator + tag: v1.2.0 + env: + WATCH_NAMESPACES: "posit-team" + AWS_REGION: "us-east-1" + serviceAccount: + annotations: + eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/team-operator-role" +``` + +```bash +helm install team-operator ./dist/chart \ + --namespace posit-team-system \ + --create-namespace \ + --values aws-values.yaml +``` + +### Azure Deployment with AKS Workload Identity + +For Azure deployments using Workload Identity: + +```yaml +# azure-values.yaml +watchNamespace: posit-team + +controllerManager: + container: + image: + repository: posit/team-operator + tag: v1.2.0 + env: + WATCH_NAMESPACES: "posit-team" + serviceAccount: + annotations: + azure.workload.identity/client-id: "" + pod: + labels: + azure.workload.identity/use: "true" +``` + +```bash +helm install team-operator ./dist/chart \ + --namespace posit-team-system \ + --create-namespace \ + --values azure-values.yaml +``` + +### Custom Resource Limits + +For production deployments with increased resource limits: + +```yaml +# production-values.yaml +controllerManager: + container: + resources: + limits: + cpu: "1" + memory: 512Mi + requests: + cpu: 100m + memory: 128Mi +``` + +### Multi-Namespace Watching + +To watch multiple namespaces for Site CRs: + +```yaml +# multi-namespace-values.yaml +watchNamespace: posit-team + +controllerManager: + container: + env: + WATCH_NAMESPACES: "posit-team,posit-team-staging,posit-team-prod" +``` + +> **Note**: The operator needs appropriate RBAC permissions in each watched namespace. 
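+A grant for one additional watched namespace might look like the following sketch (the RoleBinding targets the chart's default ServiceAccount; the Role name `team-operator-manager-role` is hypothetical, and the Role itself must replicate the chart's namespace-scoped rules in that namespace):

```yaml
# Illustrative RoleBinding for an extra watched namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-operator
  namespace: posit-team-staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: team-operator-manager-role   # hypothetical name; must exist in this namespace
subjects:
  - kind: ServiceAccount
    name: team-operator-controller-manager
    namespace: posit-team-system
```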
+ +### Node Selector and Tolerations + +To schedule the operator on specific nodes: + +```yaml +# node-placement-values.yaml +controllerManager: + nodeSelector: + kubernetes.io/os: linux + node-role.kubernetes.io/control-plane: "" + tolerations: + - key: "node-role.kubernetes.io/control-plane" + operator: "Exists" + effect: "NoSchedule" +``` + +### Enabling Prometheus Metrics + +To enable Prometheus ServiceMonitor with cert-manager for secure metrics: + +```yaml +# prometheus-values.yaml +metrics: + enable: true + +prometheus: + enable: true + +certmanager: + enable: true +``` + +## RBAC Permissions + +The operator requires the following permissions: + +### Cluster-Wide (ClusterRole) + +- **PersistentVolumes**: Full CRUD access for shared storage provisioning + +### Namespace-Scoped (Role) + +- **Core resources**: ConfigMaps, PVCs, Pods, Secrets, ServiceAccounts, Services +- **Apps**: Deployments, StatefulSets, DaemonSets +- **Batch**: Jobs +- **Networking**: Ingresses, NetworkPolicies +- **Policy**: PodDisruptionBudgets +- **RBAC**: Roles, RoleBindings +- **Posit Team CRDs**: Sites, Workbenches, Connects, PackageManagers, Chronicles, Flightdecks, PostgresDatabases +- **Keycloak**: Keycloaks, KeycloakRealmImports (for authentication) +- **Traefik**: Middlewares (for ingress routing) +- **Secrets Store CSI**: SecretProviderClasses (for external secrets) + +## Metrics and Monitoring + +The operator exposes metrics on port 8443 at the `/metrics` endpoint. When `prometheus.enable` is set to `true`, a ServiceMonitor resource is created for automatic scraping by Prometheus Operator. 
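+For reference, the ServiceMonitor created when `prometheus.enable` is `true` has roughly this shape (a sketch only; the chart's actual label selectors, endpoint port name, and TLS configuration may differ):

```yaml
# Illustrative ServiceMonitor for the operator's metrics Service
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: team-operator-metrics
  namespace: posit-team-system
spec:
  selector:
    matchLabels:
      control-plane: controller-manager   # assumed label; check the chart's Service
  endpoints:
    - port: https   # assumed port name for the 8443 metrics endpoint
      path: /metrics
      scheme: https
```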
+ +### Metrics Endpoint Security + +- Without cert-manager: Metrics are served with insecure TLS (for development/testing) +- With cert-manager: Metrics are served with proper TLS certificates + +## Troubleshooting + +### Check Operator Logs + +```bash +kubectl logs -n posit-team-system deployment/team-operator-controller-manager +``` + +### Check Operator Status + +```bash +kubectl get pods -n posit-team-system +kubectl describe deployment -n posit-team-system team-operator-controller-manager +``` + +### Check Site Status + +```bash +kubectl describe site -n posit-team +``` + +### Verify CRDs are Installed + +```bash +kubectl get crds | grep posit.team +``` + +## Links + +- [Source Code](https://github.com/posit-dev/team-operator) +- [Posit Team Documentation](https://docs.posit.co/) + +## License + +MIT License - see [LICENSE](../../../LICENSE) file for details. + +Copyright (c) 2023-2026 Posit Software, PBC diff --git a/docs/README.md b/docs/README.md index c86208f2..f795c674 100644 --- a/docs/README.md +++ b/docs/README.md @@ -26,6 +26,255 @@ Site CRD (single source of truth) The Site controller watches for Site resources and reconciles product-specific Custom Resources for each enabled product. 
+### Overall System Architecture + +```mermaid +flowchart TB + subgraph user [User Interface] + kubectl(kubectl / Helm) + end + + subgraph crd [Custom Resources] + site[Site CRD] + connect_cr[Connect CR] + workbench_cr[Workbench CR] + pm_cr[PackageManager CR] + chronicle_cr[Chronicle CR] + keycloak_cr[Keycloak CR] + flightdeck_cr[Flightdeck CR] + pgdb_cr[PostgresDatabase CR] + end + + subgraph controllers [Controllers] + site_ctrl[Site Controller] + connect_ctrl[Connect Controller] + workbench_ctrl[Workbench Controller] + pm_ctrl[PackageManager Controller] + chronicle_ctrl[Chronicle Controller] + db_ctrl[Database Controller] + flightdeck_ctrl[Flightdeck Controller] + end + + subgraph k8s [Kubernetes Resources] + deployments[Deployments] + services[Services] + ingresses[Ingresses] + configmaps[ConfigMaps] + secrets[Secrets] + pvcs[PVCs] + rbac[RBAC] + end + + %% User creates Site + kubectl --> site + + %% Site controller creates product CRs + site --> site_ctrl + site_ctrl --> connect_cr + site_ctrl --> workbench_cr + site_ctrl --> pm_cr + site_ctrl --> chronicle_cr + site_ctrl --> keycloak_cr + site_ctrl --> flightdeck_cr + site_ctrl --> pgdb_cr + + %% Product controllers watch CRs + connect_cr --> connect_ctrl + workbench_cr --> workbench_ctrl + pm_cr --> pm_ctrl + chronicle_cr --> chronicle_ctrl + pgdb_cr --> db_ctrl + flightdeck_cr --> flightdeck_ctrl + + %% Controllers create K8s resources + connect_ctrl --> k8s + workbench_ctrl --> k8s + pm_ctrl --> k8s + chronicle_ctrl --> k8s + db_ctrl --> k8s + flightdeck_ctrl --> k8s + + classDef crdStyle fill:#E8F5E9,stroke:#388E3C + classDef ctrlStyle fill:#E3F2FD,stroke:#1976D2 + classDef k8sStyle fill:#FFF3E0,stroke:#F57C00 + + class site,connect_cr,workbench_cr,pm_cr,chronicle_cr,keycloak_cr,flightdeck_cr,pgdb_cr crdStyle + class site_ctrl,connect_ctrl,workbench_ctrl,pm_ctrl,chronicle_ctrl,db_ctrl,flightdeck_ctrl ctrlStyle + class deployments,services,ingresses,configmaps,secrets,pvcs,rbac k8sStyle +``` + +### 
Reconciliation Flow + +```mermaid +sequenceDiagram + participant User + participant K8s as Kubernetes API + participant SiteCtrl as Site Controller + participant ProductCR as Product CRs + participant ProductCtrl as Product Controllers + participant Resources as K8s Resources + + User->>K8s: Create/Update Site CR + K8s->>SiteCtrl: Watch event triggered + + rect rgb(227, 242, 253) + Note over SiteCtrl: Site Reconciliation + SiteCtrl->>SiteCtrl: Determine database URL + SiteCtrl->>SiteCtrl: Provision volumes (if needed) + SiteCtrl->>ProductCR: Create/Update Connect CR + SiteCtrl->>ProductCR: Create/Update Workbench CR + SiteCtrl->>ProductCR: Create/Update PackageManager CR + SiteCtrl->>ProductCR: Create/Update Chronicle CR + SiteCtrl->>ProductCR: Create/Update Flightdeck CR + SiteCtrl->>ProductCR: Create/Update Keycloak CR + end + + ProductCR->>ProductCtrl: Watch events triggered + + rect rgb(232, 245, 233) + Note over ProductCtrl: Product Reconciliation + ProductCtrl->>ProductCtrl: Ensure database exists + ProductCtrl->>Resources: Create ConfigMaps + ProductCtrl->>Resources: Create Secrets + ProductCtrl->>Resources: Create PVCs + ProductCtrl->>Resources: Create Deployment + ProductCtrl->>Resources: Create Service + ProductCtrl->>Resources: Create Ingress + ProductCtrl->>Resources: Create RBAC (if off-host) + end + + Resources-->>K8s: Resources created + K8s-->>User: Site ready +``` + +### Workbench Architecture + +```mermaid +flowchart TB + subgraph external [External Access] + user[User Browser] + ingress[Ingress Controller] + end + + subgraph workbench_pod [Workbench Pod] + wb_server[Workbench Server] + launcher[Job Launcher] + end + + subgraph k8s_api [Kubernetes API] + api[API Server] + end + + subgraph sessions [Session Pods] + session1[Session Pod 1
RStudio/VS Code/Jupyter] + session2[Session Pod 2
RStudio/VS Code/Jupyter] + session3[Session Pod N
...] + end + + subgraph storage [Shared Storage] + home_pvc[Home Directory PVC
ReadWriteMany] + shared_pvc[Shared Storage PVC
ReadWriteMany] + end + + subgraph config [Configuration] + cm[ConfigMaps] + templates[Job Templates] + session_cm[Session ConfigMap] + end + + %% User flow + user --> ingress + ingress --> wb_server + + %% Launcher creates sessions + wb_server --> launcher + launcher --> api + api --> session1 + api --> session2 + api --> session3 + + %% Storage connections + wb_server --> home_pvc + session1 --> home_pvc + session2 --> home_pvc + session3 --> home_pvc + session1 --> shared_pvc + session2 --> shared_pvc + session3 --> shared_pvc + + %% Configuration + cm --> wb_server + templates --> launcher + session_cm --> session1 + session_cm --> session2 + session_cm --> session3 + + classDef external fill:#FAEEE9,stroke:#ab4d26 + classDef workbench fill:#E3F2FD,stroke:#1976D2 + classDef session fill:#E8F5E9,stroke:#388E3C + classDef storage fill:#FFF3E0,stroke:#F57C00 + classDef config fill:#F3E5F5,stroke:#7B1FA2 + + class user,ingress external + class wb_server,launcher workbench + class session1,session2,session3 session + class home_pvc,shared_pvc storage + class cm,templates,session_cm config +``` + +### Component Relationships + +```mermaid +flowchart LR + subgraph products [Posit Team Products] + flightdeck[Flightdeck
Landing Page] + workbench[Workbench
Development] + connect[Connect
Publishing] + pm[Package Manager
Packages] + chronicle[Chronicle
Telemetry] + end + + subgraph shared [Shared Infrastructure] + keycloak[Keycloak
Authentication] + postgres[(PostgreSQL
Database)] + storage[(Shared Storage
NFS/EFS/FSx)] + end + + %% Landing page links to products + flightdeck -.-> workbench + flightdeck -.-> connect + flightdeck -.-> pm + + %% Product interactions + workbench -->|Publish content| connect + workbench -->|Fetch packages| pm + connect -->|Fetch packages| pm + + %% Shared infrastructure + workbench --> keycloak + connect --> keycloak + pm --> keycloak + + workbench --> postgres + connect --> postgres + pm --> postgres + chronicle --> postgres + + workbench --> storage + connect --> storage + + %% Chronicle collects from products + chronicle -.->|Collect metrics| workbench + chronicle -.->|Collect metrics| connect + chronicle -.->|Collect metrics| pm + + classDef product fill:#E3F2FD,stroke:#1976D2 + classDef infra fill:#E8F5E9,stroke:#388E3C + + class flightdeck,workbench,connect,pm,chronicle product + class keycloak,postgres,storage infra +``` + ## Key Concepts ### Site CRD @@ -63,10 +312,42 @@ kubectl edit site main -n posit-team ### Check Operator Logs ```bash -kubectl logs -n posit-team deploy/team-operator +# Operator runs in posit-team-system namespace +kubectl logs -n posit-team-system deployment/team-operator-controller-manager ``` +## Namespaces + +Team Operator uses two namespaces: + +| Namespace | Purpose | +|-----------|---------| +| `posit-team-system` | Where the operator controller runs | +| `posit-team` (or configured `watchNamespace`) | Where Site CRs and deployed products live | + ## Related Documentation -- [Site Management Guide](../guides/product-team-site-management.md) - For product teams -- [Adding Config Options](../guides/adding-config-options.md) - For contributors +### Deployment and Operations + +- [Site Management Guide](guides/product-team-site-management.md) - Creating, updating, and managing Site resources +- [Upgrading Guide](guides/upgrading.md) - Upgrade procedures and version migrations +- [Troubleshooting Guide](guides/troubleshooting.md) - Common issues and debugging techniques + +### Product Configuration + +- 
[Workbench Configuration](guides/workbench-configuration.md) - Interactive development environment setup +- [Connect Configuration](guides/connect-configuration.md) - Publishing platform configuration +- [Package Manager Configuration](guides/packagemanager-configuration.md) - Package repository management + +### Authentication and Security + +- [Authentication Setup](guides/authentication-setup.md) - SSO, OAuth, and Keycloak integration + +### Reference + +- [Architecture](architecture.md) - Detailed architecture diagrams with component explanations +- [API Reference](api-reference.md) - Complete CRD field reference for all resources + +### For Contributors + +- [Adding Config Options](guides/adding-config-options.md) - How to extend Site/product configurations diff --git a/docs/api-reference.md b/docs/api-reference.md new file mode 100644 index 00000000..72547d71 --- /dev/null +++ b/docs/api-reference.md @@ -0,0 +1,821 @@ +# Team Operator API Reference + +This document provides a comprehensive reference for the Custom Resource Definitions (CRDs) provided by the Team Operator. + +**API Group:** `team.posit.co/v1beta1` + +## Table of Contents + +- [Site](#site) +- [Connect](#connect) +- [Workbench](#workbench) +- [PackageManager](#packagemanager) +- [Chronicle](#chronicle) +- [PostgresDatabase](#postgresdatabase) +- [Flightdeck](#flightdeck) +- [Shared Types Reference](#shared-types-reference) + - [AuthSpec](#authspec) + - [SecretConfig](#secretconfig) + - [VolumeSource](#volumesource) + - [VolumeSpec](#volumespec) + - [LicenseSpec](#licensespec) + - [SessionConfig](#sessionconfig) + - [SSHKeyConfig](#sshkeyconfig) + +--- + +## Site + +The Site CRD is the primary resource for managing a complete Posit Team deployment. It orchestrates all product components (Connect, Workbench, Package Manager, Chronicle) within a single site. 
+ +**Kind:** `Site` +**Plural:** `sites` +**Scope:** Namespaced + +### Spec Fields + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `.spec.domain` | `string` | **Yes** | The core domain name associated with the Posit Team Site | +| `.spec.awsAccountId` | `string` | No | AWS Account ID used for EKS-to-IAM annotations | +| `.spec.clusterDate` | `string` | No | Cluster date ID (YYYYmmdd) used for EKS-to-IAM annotations | +| `.spec.workloadCompoundName` | `string` | No | Name for the workload | +| `.spec.secretType` | `SiteSecretType` | No | **DEPRECATED** - Type of secret management to use | +| `.spec.ingressClass` | `string` | No | Ingress class for creating ingress routes | +| `.spec.ingressAnnotations` | `map[string]string` | No | Annotations applied to all ingress routes | +| `.spec.imagePullSecrets` | `[]string` | No | Image pull secrets for all image pulls (must exist in namespace) | +| `.spec.volumeSource` | [`VolumeSource`](#volumesource) | No | Definition of where volumes should be created from | +| `.spec.sharedDirectory` | `string` | No | Name of directory mounted into Workbench and Connect at `/mnt/` (no slashes) | +| `.spec.volumeSubdirJobOff` | `bool` | No | Disables VolumeSubdir provisioning Kubernetes job | +| `.spec.extraSiteServiceAccounts` | `[]ServiceAccountConfig` | No | Additional service accounts prefixed by `-` | +| `.spec.secret` | [`SecretConfig`](#secretconfig) | No | Secret management configuration for this Site | +| `.spec.workloadSecret` | [`SecretConfig`](#secretconfig) | No | Managed persistent secret for the entire workload account | +| `.spec.mainDatabaseCredentialSecret` | [`SecretConfig`](#secretconfig) | No | Secret for storing main database credentials | +| `.spec.disablePrePullImages` | `bool` | No | Disables pre-pulling of images | +| `.spec.dropDatabaseOnTeardown` | `bool` | No | Drop database when tearing down the site | +| `.spec.debug` | `bool` | No | Enable debug settings | +| 
`.spec.logFormat` | `LogFormat` | No | Log output format | +| `.spec.networkTrust` | `NetworkTrust` | No | Network trust level (0-100, default: 100) | +| `.spec.packageManagerUrl` | `string` | No | Package Manager URL for Workbench (defaults to local Package Manager) | +| `.spec.efsEnabled` | `bool` | No | Enable EFS for this site (allows workbench sessions to access EFS mount targets) | +| `.spec.vpcCIDR` | `string` | No | VPC CIDR block for EFS network policies | +| `.spec.enableFqdnHealthChecks` | `*bool` | No | Enable FQDN-based health check targets for Grafana Alloy (default: true) | + +#### Product Configuration + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `.spec.flightdeck` | [`InternalFlightdeckSpec`](#internalflightdeckspec) | No | Flightdeck (landing page) configuration | +| `.spec.packageManager` | [`InternalPackageManagerSpec`](#internalpackagemanagerspec) | No | Posit Package Manager configuration | +| `.spec.connect` | [`InternalConnectSpec`](#internalconnectspec) | No | Posit Connect configuration | +| `.spec.workbench` | [`InternalWorkbenchSpec`](#internalworkbenchspec) | No | Posit Workbench configuration | +| `.spec.chronicle` | [`InternalChronicleSpec`](#internalchroniclespec) | No | Posit Chronicle configuration | +| `.spec.keycloak` | `InternalKeycloakSpec` | No | Keycloak configuration | + +### Example Manifest + +```yaml +apiVersion: team.posit.co/v1beta1 +kind: Site +metadata: + name: my-site + namespace: posit-team +spec: + domain: example.posit.team + ingressClass: nginx + ingressAnnotations: + nginx.ingress.kubernetes.io/proxy-body-size: "0" + secret: + type: kubernetes + vaultName: my-site-secrets + volumeSource: + type: nfs + dnsName: nfs.example.com + connect: + license: + type: FILE + existingSecretName: connect-license + image: ghcr.io/rstudio/rstudio-connect:2024.06.0 + replicas: 2 + auth: + type: oidc + clientId: connect-client + issuer: https://auth.example.com + workbench: + license: 
+ type: FILE + existingSecretName: workbench-license + image: ghcr.io/rstudio/rstudio-workbench:2024.04.2 + replicas: 1 + packageManager: + license: + type: FILE + existingSecretName: packagemanager-license + image: ghcr.io/rstudio/rstudio-package-manager:2024.04.4 + replicas: 1 +``` + +--- + +## Connect + +The Connect CRD manages standalone Posit Connect deployments. When using the Site CRD, Connect configuration is typically specified via `.spec.connect` rather than creating a separate Connect resource. + +**Kind:** `Connect` +**Plural:** `connects` +**Short Names:** `con`, `cons` +**Scope:** Namespaced + +### Spec Fields + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `.spec.license` | [`LicenseSpec`](#licensespec) | No | License configuration | +| `.spec.config` | `ConnectConfig` | No | Connect application configuration | +| `.spec.sessionConfig` | [`SessionConfig`](#sessionconfig) | No | Session pod configuration | +| `.spec.volume` | [`VolumeSpec`](#volumespec) | No | Data volume configuration | +| `.spec.secretType` | `SiteSecretType` | No | Secret management type | +| `.spec.auth` | [`AuthSpec`](#authspec) | No | Authentication configuration | +| `.spec.url` | `string` | No | Public URL for Connect | +| `.spec.databaseConfig` | `PostgresDatabaseConfig` | No | PostgreSQL database configuration | +| `.spec.ingressClass` | `string` | No | Ingress class for routing | +| `.spec.ingressAnnotations` | `map[string]string` | No | Ingress annotations | +| `.spec.imagePullSecrets` | `[]string` | No | Image pull secrets | +| `.spec.nodeSelector` | `map[string]string` | No | Node selector for pod scheduling | +| `.spec.addEnv` | `map[string]string` | No | Additional environment variables | +| `.spec.offHostExecution` | `bool` | No | Enable off-host execution (Kubernetes launcher) | +| `.spec.image` | `string` | No | Connect container image | +| `.spec.imagePullPolicy` | `PullPolicy` | No | Image pull policy | +| `.spec.sleep` | 
`bool` | No | Put service to sleep (debugging) | +| `.spec.sessionImage` | `string` | No | Container image for sessions | +| `.spec.awsAccountId` | `string` | No | AWS Account ID for IAM annotations | +| `.spec.clusterDate` | `string` | No | Cluster date ID for IAM annotations | +| `.spec.workloadCompoundName` | `string` | No | Workload name | +| `.spec.chronicleAgentImage` | `string` | No | Chronicle Agent container image | +| `.spec.additionalVolumes` | `[]VolumeSpec` | No | Additional volume definitions | +| `.spec.secret` | [`SecretConfig`](#secretconfig) | No | Secret management configuration | +| `.spec.workloadSecret` | [`SecretConfig`](#secretconfig) | No | Workload secret configuration | +| `.spec.mainDatabaseCredentialSecret` | [`SecretConfig`](#secretconfig) | No | Database credential secret | +| `.spec.debug` | `bool` | No | Enable debug settings | +| `.spec.replicas` | `int` | No | Number of Connect replicas | +| `.spec.dsnSecret` | `string` | No | DSN secret name for sessions | +| `.spec.chronicleSidecarProductApiKeyEnabled` | `bool` | No | Enable Chronicle sidecar API key injection | + +### Status Fields + +| Field | Type | Description | +|-------|------|-------------| +| `.status.keySecretRef` | `SecretReference` | Reference to the key secret | +| `.status.ready` | `bool` | Whether Connect is ready | + +### Example Manifest + +```yaml +apiVersion: team.posit.co/v1beta1 +kind: Connect +metadata: + name: my-connect + namespace: posit-team +spec: + license: + type: FILE + existingSecretName: connect-license + image: ghcr.io/rstudio/rstudio-connect:2024.06.0 + imagePullPolicy: IfNotPresent + replicas: 2 + offHostExecution: true + auth: + type: oidc + clientId: connect-client + issuer: https://auth.example.com + config: + Server: + Address: "https://connect.example.com" + Database: + Provider: postgres + volume: + create: true + size: 100Gi + storageClassName: gp3 +``` + +--- + +## Workbench + +The Workbench CRD manages standalone Posit Workbench 
deployments. When using the Site CRD, Workbench configuration is typically specified via `.spec.workbench` rather than creating a separate Workbench resource. + +**Kind:** `Workbench` +**Plural:** `workbenches` +**Singular:** `workbench` +**Short Names:** `wb`, `wbs` +**Scope:** Namespaced + +### Spec Fields + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `.spec.license` | [`LicenseSpec`](#licensespec) | No | License configuration | +| `.spec.config` | `WorkbenchConfig` | No | Workbench application configuration | +| `.spec.secretConfig` | `WorkbenchSecretConfig` | No | Secret configuration for Workbench | +| `.spec.sessionConfig` | [`SessionConfig`](#sessionconfig) | No | Session pod configuration | +| `.spec.volume` | [`VolumeSpec`](#volumespec) | No | Home directory volume configuration | +| `.spec.secretType` | `SiteSecretType` | No | Secret management type | +| `.spec.auth` | [`AuthSpec`](#authspec) | No | Authentication configuration | +| `.spec.url` | `string` | No | Public URL for Workbench | +| `.spec.parentUrl` | `string` | No | Parent URL for navigation | +| `.spec.nonRoot` | `bool` | No | Enable rootless execution mode | +| `.spec.databaseConfig` | `PostgresDatabaseConfig` | No | PostgreSQL database configuration | +| `.spec.ingressClass` | `string` | No | Ingress class for routing | +| `.spec.ingressAnnotations` | `map[string]string` | No | Ingress annotations | +| `.spec.imagePullSecrets` | `[]string` | No | Image pull secrets | +| `.spec.nodeSelector` | `map[string]string` | No | Node selector for pod scheduling | +| `.spec.tolerations` | `[]Toleration` | No | Pod tolerations | +| `.spec.addEnv` | `map[string]string` | No | Additional environment variables | +| `.spec.offHostExecution` | `bool` | No | Enable off-host execution (Kubernetes launcher) | +| `.spec.image` | `string` | No | Workbench container image | +| `.spec.imagePullPolicy` | `PullPolicy` | No | Image pull policy | +| `.spec.sleep` | `bool` | 
No | Put service to sleep (debugging) | +| `.spec.snowflake` | `SnowflakeConfig` | No | Snowflake integration configuration | +| `.spec.awsAccountId` | `string` | No | AWS Account ID for IAM annotations | +| `.spec.clusterDate` | `string` | No | Cluster date ID for IAM annotations | +| `.spec.workloadCompoundName` | `string` | No | Workload name | +| `.spec.chronicleAgentImage` | `string` | No | Chronicle Agent container image | +| `.spec.additionalVolumes` | `[]VolumeSpec` | No | Additional volume definitions | +| `.spec.secret` | [`SecretConfig`](#secretconfig) | No | Secret management configuration | +| `.spec.workloadSecret` | [`SecretConfig`](#secretconfig) | No | Workload secret configuration | +| `.spec.mainDatabaseCredentialSecret` | [`SecretConfig`](#secretconfig) | No | Database credential secret | +| `.spec.replicas` | `int` | No | Number of Workbench replicas | +| `.spec.dsnSecret` | `string` | No | DSN secret name for sessions | +| `.spec.chronicleSidecarProductApiKeyEnabled` | `bool` | No | Enable Chronicle sidecar API key injection | +| `.spec.authLoginPageHtml` | `string` | No | Custom HTML for login page (max 64KB) | + +### Status Fields + +| Field | Type | Description | +|-------|------|-------------| +| `.status.ready` | `bool` | Whether Workbench is ready | +| `.status.keySecretRef` | `SecretReference` | Reference to the key secret | + +### Example Manifest + +```yaml +apiVersion: team.posit.co/v1beta1 +kind: Workbench +metadata: + name: my-workbench + namespace: posit-team +spec: + license: + type: FILE + existingSecretName: workbench-license + image: ghcr.io/rstudio/rstudio-workbench:2024.04.2 + imagePullPolicy: IfNotPresent + replicas: 1 + offHostExecution: true + auth: + type: oidc + clientId: workbench-client + issuer: https://auth.example.com + config: + WorkbenchIniConfig: + RServer: + adminEnabled: 1 + adminGroup: "workbench-admin" + volume: + create: true + size: 500Gi + storageClassName: gp3 +``` + +--- + +## PackageManager + +The 
PackageManager CRD manages standalone Posit Package Manager deployments. When using the Site CRD, Package Manager configuration is typically specified via `.spec.packageManager` rather than creating a separate PackageManager resource. + +**Kind:** `PackageManager` +**Plural:** `packagemanagers` +**Short Names:** `pm`, `pms` +**Scope:** Namespaced + +### Spec Fields + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `.spec.license` | [`LicenseSpec`](#licensespec) | No | License configuration | +| `.spec.config` | `PackageManagerConfig` | No | Package Manager application configuration | +| `.spec.volume` | [`VolumeSpec`](#volumespec) | No | Data volume configuration | +| `.spec.secretType` | `SiteSecretType` | No | Secret management type | +| `.spec.url` | `string` | No | Public URL for Package Manager | +| `.spec.databaseConfig` | `PostgresDatabaseConfig` | No | PostgreSQL database configuration | +| `.spec.ingressClass` | `string` | No | Ingress class for routing | +| `.spec.ingressAnnotations` | `map[string]string` | No | Ingress annotations | +| `.spec.imagePullSecrets` | `[]string` | No | Image pull secrets | +| `.spec.nodeSelector` | `map[string]string` | No | Node selector for pod scheduling | +| `.spec.addEnv` | `map[string]string` | No | Additional environment variables | +| `.spec.image` | `string` | No | Package Manager container image | +| `.spec.imagePullPolicy` | `PullPolicy` | No | Image pull policy | +| `.spec.sleep` | `bool` | No | Put service to sleep (debugging) | +| `.spec.awsAccountId` | `string` | No | AWS Account ID for IAM annotations | +| `.spec.workloadCompoundName` | `string` | No | Workload name | +| `.spec.clusterDate` | `string` | No | Cluster date ID for IAM annotations | +| `.spec.chronicleAgentImage` | `string` | No | Chronicle Agent container image | +| `.spec.secret` | [`SecretConfig`](#secretconfig) | No | Secret management configuration | +| `.spec.workloadSecret` | 
[`SecretConfig`](#secretconfig) | No | Workload secret configuration | +| `.spec.mainDatabaseCredentialSecret` | [`SecretConfig`](#secretconfig) | No | Database credential secret | +| `.spec.replicas` | `int` | No | Number of Package Manager replicas | +| `.spec.gitSSHKeys` | [`[]SSHKeyConfig`](#sshkeyconfig) | No | SSH key configurations for Git authentication | +| `.spec.azureFiles` | `AzureFilesConfig` | No | Azure Files integration configuration | + +### Status Fields + +| Field | Type | Description | +|-------|------|-------------| +| `.status.keySecretRef` | `SecretReference` | Reference to the key secret | +| `.status.ready` | `bool` | Whether Package Manager is ready | + +### Example Manifest + +```yaml +apiVersion: team.posit.co/v1beta1 +kind: PackageManager +metadata: + name: my-packagemanager + namespace: posit-team +spec: + license: + type: FILE + existingSecretName: packagemanager-license + image: ghcr.io/rstudio/rstudio-package-manager:2024.04.4 + imagePullPolicy: IfNotPresent + replicas: 1 + config: + Server: + Address: ":4242" + DataDir: /data + Database: + Provider: postgres + S3Storage: + Bucket: my-rspm-bucket + Region: us-east-1 +``` + +--- + +## Chronicle + +The Chronicle CRD manages Posit Chronicle deployments for usage tracking and auditing. 
+ +**Kind:** `Chronicle` +**Plural:** `chronicles` +**Short Names:** `pcr`, `chr` +**Scope:** Namespaced + +### Spec Fields + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `.spec.config` | `ChronicleConfig` | No | Chronicle application configuration | +| `.spec.imagePullSecrets` | `[]string` | No | Image pull secrets | +| `.spec.nodeSelector` | `map[string]string` | No | Node selector for pod scheduling | +| `.spec.addEnv` | `map[string]string` | No | Additional environment variables | +| `.spec.image` | `string` | No | Chronicle container image | +| `.spec.awsAccountId` | `string` | No | AWS Account ID for IAM annotations | +| `.spec.clusterDate` | `string` | No | Cluster date ID for IAM annotations | +| `.spec.workloadCompoundName` | `string` | No | Workload name | + +### ChronicleConfig + +| Field | Type | Description | +|-------|------|-------------| +| `.Http` | `ChronicleHttpConfig` | HTTP server configuration | +| `.Metrics` | `ChronicleMetricsConfig` | Prometheus metrics configuration | +| `.Profiling` | `ChronicleProfilingConfig` | Profiling configuration | +| `.S3Storage` | `ChronicleS3StorageConfig` | S3 storage configuration | +| `.LocalStorage` | `ChronicleLocalStorageConfig` | Local storage configuration | +| `.Logging` | `ChronicleLoggingConfig` | Logging configuration | + +### Status Fields + +| Field | Type | Description | +|-------|------|-------------| +| `.status.ready` | `bool` | Whether Chronicle is ready | + +### Example Manifest + +```yaml +apiVersion: team.posit.co/v1beta1 +kind: Chronicle +metadata: + name: my-chronicle + namespace: posit-team +spec: + image: ghcr.io/rstudio/chronicle:2023.10.4 + config: + Http: + Listen: ":8080" + S3Storage: + Enabled: true + Bucket: my-chronicle-bucket + Region: us-east-1 + Metrics: + Enabled: true + Listen: ":9090" +``` + +--- + +## PostgresDatabase + +The PostgresDatabase CRD manages PostgreSQL database provisioning for Posit Team products. 
+ +**Kind:** `PostgresDatabase` +**Plural:** `postgresdatabases` +**Short Names:** `pgdb`, `pgdbs` +**Scope:** Namespaced + +### Spec Fields + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `.spec.url` | `string` | **Yes** | PostgreSQL connection URL (must match `^postgres.+@.+/.+`) | +| `.spec.secretVault` | `string` | **Yes** | Secret ID for retrieving the password | +| `.spec.secretPasswordKey` | `string` | **Yes** | Password key within the SecretVault | +| `.spec.secret` | [`SecretConfig`](#secretconfig) | No | Secret configuration for password retrieval | +| `.spec.workloadSecret` | [`SecretConfig`](#secretconfig) | No | Workload secret configuration | +| `.spec.mainDbCredentialSecret` | [`SecretConfig`](#secretconfig) | No | Main database credential secret | +| `.spec.extensions` | `[]string` | No | PostgreSQL extensions to enable | +| `.spec.schemas` | `[]string` | No | Database schemas to create | +| `.spec.teardown` | `PostgresDatabaseSpecTeardown` | No | Teardown behavior configuration | + +### PostgresDatabaseConfig + +Used by products to configure database connections: + +| Field | Type | Description | +|-------|------|-------------| +| `.host` | `string` | Database host | +| `.sslMode` | `string` | SSL mode for connections | +| `.dropOnTeardown` | `bool` | Drop database on teardown | +| `.schema` | `string` | Default schema | +| `.instrumentationSchema` | `string` | Schema for instrumentation data | + +### Example Manifest + +```yaml +apiVersion: team.posit.co/v1beta1 +kind: PostgresDatabase +metadata: + name: connect-db + namespace: posit-team +spec: + url: "postgres://connect_user@db.example.com/connect_db" + secretVault: my-site-secrets + secretPasswordKey: connect-db-password + extensions: + - pgcrypto + schemas: + - connect + - instrumentation +``` + +--- + +## Flightdeck + +The Flightdeck CRD manages the Posit Team landing page dashboard. 
+ +**Kind:** `Flightdeck` +**Plural:** `flightdecks` +**Scope:** Namespaced + +### Spec Fields + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `.spec.siteName` | `string` | No | Name of the Site that owns this Flightdeck | +| `.spec.image` | `string` | No | Flightdeck container image | +| `.spec.imagePullPolicy` | `PullPolicy` | No | Image pull policy | +| `.spec.port` | `int32` | No | Container listening port (default: 8080) | +| `.spec.replicas` | `int` | No | Number of replicas (default: 1) | +| `.spec.featureEnabler` | `FeatureEnablerConfig` | No | Feature toggles | +| `.spec.domain` | `string` | No | Domain name for ingress | +| `.spec.ingressClass` | `string` | No | Ingress class to use | +| `.spec.ingressAnnotations` | `map[string]string` | No | Ingress annotations | +| `.spec.imagePullSecrets` | `[]string` | No | Image pull secrets | +| `.spec.awsAccountId` | `string` | No | AWS Account ID | +| `.spec.clusterDate` | `string` | No | Cluster date ID | +| `.spec.workloadCompoundName` | `string` | No | Workload name | +| `.spec.logLevel` | `string` | No | Log level (debug, info, warn, error) - default: "info" | +| `.spec.logFormat` | `string` | No | Log format (text, json) - default: "text" | + +### FeatureEnablerConfig + +| Field | Type | Description | +|-------|------|-------------| +| `.showConfig` | `bool` | Enable configuration page (default: false) | +| `.showAcademy` | `bool` | Enable academy page (default: false) | + +### Status Fields + +| Field | Type | Description | +|-------|------|-------------| +| `.status.ready` | `bool` | Whether Flightdeck is ready | + +### Example Manifest + +```yaml +apiVersion: team.posit.co/v1beta1 +kind: Flightdeck +metadata: + name: my-flightdeck + namespace: posit-team +spec: + siteName: my-site + image: docker.io/posit/ptd-flightdeck:latest + replicas: 1 + domain: flightdeck.example.com + ingressClass: nginx + featureEnabler: + showConfig: true +``` + +--- + +## Shared Types 
Reference + +### AuthSpec + +Authentication configuration used by Connect and Workbench. + +| Field | Type | Description | +|-------|------|-------------| +| `.type` | `AuthType` | Authentication type: `password`, `oidc`, or `saml` | +| `.clientId` | `string` | OAuth2/OIDC client ID | +| `.issuer` | `string` | OIDC issuer URL | +| `.groups` | `bool` | Enable group synchronization | +| `.usernameClaim` | `string` | OIDC claim for username | +| `.emailClaim` | `string` | OIDC claim for email | +| `.uniqueIdClaim` | `string` | OIDC claim for unique ID | +| `.groupsClaim` | `string` | OIDC claim for groups | +| `.disableGroupsClaim` | `bool` | Disable groups claim processing | +| `.samlMetadataUrl` | `string` | SAML IdP metadata URL | +| `.samlIdPAttributeProfile` | `string` | SAML IdP attribute profile | +| `.samlUsernameAttribute` | `string` | SAML attribute for username | +| `.samlFirstNameAttribute` | `string` | SAML attribute for first name | +| `.samlLastNameAttribute` | `string` | SAML attribute for last name | +| `.samlEmailAttribute` | `string` | SAML attribute for email | +| `.scopes` | `[]string` | Additional OIDC scopes | +| `.viewerRoleMapping` | `[]string` | Groups mapped to viewer role | +| `.publisherRoleMapping` | `[]string` | Groups mapped to publisher role | +| `.administratorRoleMapping` | `[]string` | Groups mapped to administrator role | + +**AuthType Values:** + +| Value | Description | +|-------|-------------| +| `password` | Local username/password authentication | +| `oidc` | OpenID Connect authentication | +| `saml` | SAML 2.0 authentication | + +### SecretConfig + +Configuration for secret management. 
+ +| Field | Type | Description | +|-------|------|-------------| +| `.vaultName` | `string` | Name of the secret vault/secret | +| `.type` | `SiteSecretType` | Secret management type | + +**SiteSecretType Values:** + +| Value | Description | +|-------|-------------| +| `kubernetes` | Use Kubernetes Secrets | +| `aws` | Use AWS Secrets Manager with CSI driver | +| `test` | Test mode (in-memory) | + +### VolumeSource + +Configuration for the source of persistent volumes. + +| Field | Type | Description | +|-------|------|-------------| +| `.type` | `VolumeSourceType` | Volume source type | +| `.volumeId` | `string` | Volume identifier (e.g., FSx volume ID) | +| `.dnsName` | `string` | DNS name for volume access | + +**VolumeSourceType Values:** + +| Value | Description | +|-------|-------------| +| `fsx-zfs` | Amazon FSx for OpenZFS | +| `nfs` | NFS server | +| `azure-netapp` | Azure NetApp Files | + +### VolumeSpec + +Specification for creating or mounting a PersistentVolumeClaim. + +| Field | Type | Description | +|-------|------|-------------| +| `.create` | `bool` | Whether to create the PVC | +| `.accessModes` | `[]string` | Access modes (when creating) | +| `.volumeName` | `string` | PV name to reference (when creating) | +| `.storageClassName` | `string` | Storage class name (when creating) | +| `.size` | `string` | PVC size (when creating) | +| `.pvcName` | `string` | Existing PVC name (when not creating) | +| `.mountPath` | `string` | Mount path for additional volumes | +| `.readOnly` | `bool` | Mount as read-only (default: false) | + +### LicenseSpec + +Product license configuration. 
+ +| Field | Type | Description | +|-------|------|-------------| +| `.type` | `LicenseType` | License type | +| `.key` | `string` | License key (for KEY type) | +| `.existingSecretName` | `string` | Name of existing secret containing license | +| `.existingSecretKey` | `string` | Key within the secret (default: "license.lic") | + +**LicenseType Values:** + +| Value | Description | +|-------|-------------| +| `KEY` | License key string | +| `FILE` | License file | + +### SessionConfig + +Configuration for session pods (Connect and Workbench). + +| Field | Type | Description | +|-------|------|-------------| +| `.service` | `ServiceConfig` | Service configuration for sessions | +| `.pod` | `PodConfig` | Pod configuration for sessions | +| `.job` | `JobConfig` | Job configuration for sessions | + +**ServiceConfig:** + +| Field | Type | Description | +|-------|------|-------------| +| `.type` | `string` | Kubernetes service type | +| `.annotations` | `map[string]string` | Service annotations | +| `.labels` | `map[string]string` | Service labels | + +**PodConfig:** + +| Field | Type | Description | +|-------|------|-------------| +| `.annotations` | `map[string]string` | Pod annotations | +| `.labels` | `map[string]string` | Pod labels | +| `.serviceAccountName` | `string` | Service account for pods | +| `.volumes` | `[]Volume` | Additional volumes | +| `.volumeMounts` | `[]VolumeMount` | Additional volume mounts | +| `.env` | `[]EnvVar` | Environment variables | +| `.imagePullPolicy` | `PullPolicy` | Image pull policy | +| `.imagePullSecrets` | `[]LocalObjectReference` | Image pull secrets | +| `.initContainers` | `[]Container` | Init containers | +| `.extraContainers` | `[]Container` | Sidecar containers | +| `.containerSecurityContext` | `SecurityContext` | Container security context | +| `.tolerations` | `[]Toleration` | Pod tolerations | +| `.affinity` | `*Affinity` | Pod affinity rules | +| `.nodeSelector` | `map[string]string` | Node selector | +| 
`.priorityClassName` | `string` | Priority class name | +| `.command` | `[]string` | Override container command | + +### SSHKeyConfig + +SSH key configuration for Git authentication in Package Manager. + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `.name` | `string` | **Yes** | Unique identifier (1-63 chars, lowercase alphanumeric with hyphens) | +| `.host` | `string` | **Yes** | Git host domain (e.g., "github.com") | +| `.secretRef` | `SecretReference` | **Yes** | Reference to the SSH key secret | +| `.passphraseSecretRef` | `*SecretReference` | No | Reference to passphrase secret for encrypted keys | + +**SecretReference:** + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `.source` | `string` | **Yes** | Secret source: `aws-secrets-manager`, `kubernetes`, or `azure-key-vault` | +| `.name` | `string` | **Yes** | Secret name in the specified source | +| `.key` | `string` | No | Key within the secret (primarily for Kubernetes secrets) | + +--- + +## Site Internal Specs + +These types are used within the Site CRD for product configuration. 
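The reference tables are easier to read against a concrete manifest. Below is a minimal, illustrative Site sketch showing where the internal product specs sit. The `apiVersion` and `kind` follow the Flightdeck example earlier in this reference, and the top-level field names (`flightdeck`, `workbench`) are assumptions inferred from option paths such as `spec.flightdeck.replicas` documented elsewhere in these docs, not a verified schema.

```yaml
# Illustrative sketch only: field layout inferred from the reference tables.
apiVersion: team.posit.co/v1beta1
kind: Site
metadata:
  name: my-site
  namespace: posit-team
spec:
  flightdeck:
    enabled: true
    replicas: 1
  workbench:
    replicas: 2
    adminGroups:
      - workbench-admin               # default admin group per InternalWorkbenchSpec
    auth:
      type: oidc                      # AuthType: password | oidc | saml
      issuer: https://idp.example.com # placeholder IdP
      clientId: workbench-client      # placeholder client ID
```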
+ +### InternalFlightdeckSpec + +| Field | Type | Description | +|-------|------|-------------| +| `.enabled` | `*bool` | Enable Flightdeck (default: true) | +| `.image` | `string` | Container image | +| `.imagePullPolicy` | `PullPolicy` | Image pull policy | +| `.replicas` | `int` | Number of replicas | +| `.featureEnabler` | `FeatureEnablerConfig` | Feature toggles | +| `.logLevel` | `string` | Log level (default: "info") | +| `.logFormat` | `string` | Log format (default: "text") | + +### InternalPackageManagerSpec + +| Field | Type | Description | +|-------|------|-------------| +| `.license` | `LicenseSpec` | License configuration | +| `.volume` | `*VolumeSpec` | Data volume | +| `.nodeSelector` | `map[string]string` | Node selector | +| `.addEnv` | `map[string]string` | Environment variables | +| `.image` | `string` | Container image | +| `.imagePullPolicy` | `PullPolicy` | Image pull policy | +| `.s3Bucket` | `string` | S3 bucket for package storage | +| `.replicas` | `int` | Number of replicas | +| `.domainPrefix` | `string` | Domain prefix (default: "packagemanager") | +| `.gitSSHKeys` | `[]SSHKeyConfig` | SSH keys for Git authentication | +| `.azureFiles` | `*AzureFilesConfig` | Azure Files configuration | + +### InternalConnectSpec + +| Field | Type | Description | +|-------|------|-------------| +| `.license` | `LicenseSpec` | License configuration | +| `.volume` | `*VolumeSpec` | Data volume | +| `.nodeSelector` | `map[string]string` | Node selector | +| `.auth` | `AuthSpec` | Authentication configuration | +| `.addEnv` | `map[string]string` | Environment variables | +| `.image` | `string` | Container image | +| `.sessionImage` | `string` | Session container image | +| `.imagePullPolicy` | `PullPolicy` | Image pull policy | +| `.databricks` | `*DatabricksConfig` | Databricks integration | +| `.loggedInWarning` | `string` | Warning message for logged-in users | +| `.publicWarning` | `string` | Public warning message | +| `.replicas` | `int` | Number of 
replicas | +| `.experimentalFeatures` | `*InternalConnectExperimentalFeatures` | Experimental features | +| `.domainPrefix` | `string` | Domain prefix (default: "connect") | +| `.gpuSettings` | `*GPUSettings` | GPU resource configuration | +| `.databaseSettings` | `*DatabaseSettings` | Database schema settings | +| `.scheduleConcurrency` | `int` | Schedule concurrency (default: 2) | + +### InternalWorkbenchSpec + +| Field | Type | Description | +|-------|------|-------------| +| `.databricks` | `map[string]DatabricksConfig` | Databricks configurations | +| `.snowflake` | `SnowflakeConfig` | Snowflake configuration | +| `.license` | `LicenseSpec` | License configuration | +| `.volume` | `*VolumeSpec` | Home directory volume | +| `.additionalVolumes` | `[]VolumeSpec` | Additional volumes | +| `.nodeSelector` | `map[string]string` | Node selector | +| `.tolerations` | `[]Toleration` | Pod tolerations | +| `.sessionTolerations` | `[]Toleration` | Session-only tolerations | +| `.createUsersAutomatically` | `bool` | Auto-create users | +| `.adminGroups` | `[]string` | Admin groups (default: ["workbench-admin"]) | +| `.adminSuperuserGroups` | `[]string` | Superuser groups | +| `.addEnv` | `map[string]string` | Environment variables | +| `.auth` | `AuthSpec` | Authentication configuration | +| `.image` | `string` | Container image | +| `.imagePullPolicy` | `PullPolicy` | Image pull policy | +| `.defaultSessionImage` | `string` | Default session image | +| `.extraSessionImages` | `[]string` | Additional session images | +| `.sessionInitContainerImageName` | `string` | Init container image name | +| `.sessionInitContainerImageTag` | `string` | Init container image tag | +| `.replicas` | `int` | Number of replicas | +| `.experimentalFeatures` | `*InternalWorkbenchExperimentalFeatures` | Experimental features | +| `.vsCodeExtensions` | `[]string` | VS Code extensions to install | +| `.vsCodeUserSettings` | `map[string]*JSON` | VS Code user settings | +| `.positronConfig` | 
`PositronConfig` | Positron configuration | +| `.vsCodeConfig` | `VSCodeConfig` | VS Code configuration | +| `.apiSettings` | `ApiSettingsConfig` | API settings | +| `.domainPrefix` | `string` | Domain prefix (default: "workbench") | +| `.authLoginPageHtml` | `string` | Custom login page HTML | +| `.jupyterConfig` | `*WorkbenchJupyterConfig` | Jupyter configuration | + +### InternalChronicleSpec + +| Field | Type | Description | +|-------|------|-------------| +| `.nodeSelector` | `map[string]string` | Node selector | +| `.image` | `string` | Container image | +| `.addEnv` | `map[string]string` | Environment variables | +| `.imagePullPolicy` | `PullPolicy` | Image pull policy | +| `.s3Bucket` | `string` | S3 bucket for storage | +| `.agentImage` | `string` | Agent container image | + +--- + +## Labels Applied by the Operator + +The Team Operator applies the following labels to managed resources: + +| Label | Description | +|-------|-------------| +| `app.kubernetes.io/managed-by: team-operator` | Indicates resource is managed by the operator | +| `app.kubernetes.io/name` | Component type (e.g., "connect", "workbench") | +| `app.kubernetes.io/instance` | Component instance name | +| `posit.team/site` | Site name | +| `posit.team/component` | Component type | diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 00000000..eee51ef8 --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,779 @@ +# Team Operator Architecture + +This document provides detailed architecture diagrams and explanations for the Team Operator and its managed products. 
+ +## Table of Contents + +- [System Overview](#system-overview) +- [Database Architecture](#database-architecture) +- [Connect Architecture](#connect-architecture) +- [Workbench Architecture](#workbench-architecture) +- [Package Manager Architecture](#package-manager-architecture) +- [Flightdeck Architecture](#flightdeck-architecture) +- [Chronicle Architecture](#chronicle-architecture) + +--- + +## System Overview + +The Team Operator follows the Kubernetes operator pattern: a Site Custom Resource (CR) serves as the single source of truth, and controllers reconcile the desired state into running Kubernetes resources. + +``` +User creates Site CR + ↓ +Site Controller reconciles + ↓ +Product CRs created (Connect, Workbench, PackageManager, etc.) + ↓ +Product Controllers reconcile + ↓ +Kubernetes resources created (Deployments, Services, Ingress, etc.) +``` + +### Key Concepts + +| Concept | Description | +|---------|-------------| +| **Site CR** | The top-level resource that defines an entire Posit Team deployment | +| **Product CR** | Child resources (Connect, Workbench, PackageManager) created by the Site controller | +| **Controller** | Watches resources and reconciles them to the desired state | +| **Reconciliation** | The process of comparing desired state (CR spec) with actual state and making corrections | + +--- + +## Database Architecture + +Each Posit Team product requires database storage. The operator provisions separate databases with dedicated users and schemas. 
+ +```mermaid +flowchart TB + subgraph db [Team Operator - Databases] + subgraph pub[PublishDB - Connect] + pub-user(Connect User) + pub-main[Main Schema] + pub-metrics[Instrumentation Schema] + end + pub-user-->pub-main + pub-user-->pub-metrics + + subgraph pkg[PackageDB - Package Manager] + pkg-user(Package Manager User) + pkg-main[Main Schema] + pkg-metrics[Metrics Schema] + end + pkg-user-->pkg-main + pkg-user-->pkg-metrics + + subgraph dev[DevDB - Workbench] + dev-user(Workbench User) + dev-main[Public Schema] + end + dev-user-->dev-main + end + + classDef userNode fill:#FAEEE9,stroke:#ab4d26 + class pub-user,pkg-user,dev-user userNode +``` + +### Component Descriptions + +| Component | Description | +|-----------|-------------| +| **PublishDB** | PostgreSQL database for Connect. Stores published content metadata, user accounts, and access controls. | +| **Main Schema** | Primary data storage for the product (content, users, permissions) | +| **Instrumentation Schema** | Metrics and usage tracking data (Connect and Package Manager only) | +| **PackageDB** | PostgreSQL database for Package Manager. Stores package metadata, repository configurations, and sync state. | +| **Metrics Schema** | Analytics data for package downloads and repository usage | +| **DevDB** | PostgreSQL database for Workbench. Stores user sessions, project metadata, and launcher state. | +| **Public Schema** | Workbench uses a single schema for all data | + +### Database User Isolation + +Each product gets a dedicated database user with access only to its own schemas. This provides: +- **Security isolation**: Products cannot access each other's data +- **Resource tracking**: Database connections can be attributed to specific products +- **Independent credentials**: Rotating one product's credentials doesn't affect others + +--- + +## Connect Architecture + +Posit Connect is a publishing platform for data science content. 
The operator manages its deployment including off-host content execution. + +```mermaid +flowchart TB + subgraph external [External Configuration] + manual(Manual Setup) + license(License) + clientsecret(Auth Client Secret) + mainDbCon(Main DB Connection) + end + + subgraph operator [Team Operator] + site(Site Controller) + dbcon(Database Controller) + connect(Connect Controller) + end + + subgraph k8s [Kubernetes Resources] + subgraph storage [Storage] + pv(PersistentVolume) + pvc(PersistentVolumeClaim) + end + subgraph config [Configuration] + cm(ConfigMaps) + dbsecret(DB Password Secret) + secretkey(Secret Key) + end + subgraph workload [Workload] + pubdeploy(Connect Pod) + ing(Ingress) + svc(Service) + end + end + + %% External to Operator + manual --> license + manual --> clientsecret + manual --> mainDbCon + mainDbCon --> dbcon + + %% Operator flow + site --> pv + site --> connect + site --> dbcon + dbcon --> dbsecret + + %% Connect Controller creates resources + connect --> pvc + connect --> cm + connect --> secretkey + connect --> pubdeploy + connect --> ing + connect --> svc + + %% Resources flow to Pod + pv --> pvc + pvc --> pubdeploy + cm --> pubdeploy + dbsecret --> pubdeploy + secretkey --> pubdeploy + license --> pubdeploy + clientsecret --> pubdeploy + + classDef external fill:#FAEEE9,stroke:#ab4d26 + classDef operator fill:#E3F2FD,stroke:#1976D2 + classDef k8s fill:#E8F5E9,stroke:#388E3C + + class manual,license,clientsecret,mainDbCon external + class site,dbcon,connect operator + class pv,pvc,cm,dbsecret,secretkey,pubdeploy,ing,svc k8s +``` + +### Component Descriptions + +#### External Configuration (Coral) + +| Component | Description | +|-----------|-------------| +| **Manual Setup** | One-time configuration performed by the administrator before deployment | +| **License** | Posit Connect license file or activation key, stored in a Kubernetes Secret or AWS Secrets Manager | +| **Auth Client Secret** | OIDC/SAML client credentials for SSO 
integration (client ID and secret from your IdP) | +| **Main DB Connection** | PostgreSQL connection string for the external database server | + +#### Team Operator (Blue) + +| Component | Description | +|-----------|-------------| +| **Site Controller** | Watches Site CRs and creates product-specific CRs (Connect, Workbench, etc.). Manages shared resources like PersistentVolumes. | +| **Database Controller** | Creates databases and schemas within the PostgreSQL server. Generates credentials and stores them in Secrets. | +| **Connect Controller** | Watches Connect CRs and creates all Kubernetes resources needed to run Connect. | + +#### Kubernetes Resources (Green) + +| Component | Description | +|-----------|-------------| +| **PersistentVolume (PV)** | Cluster-level storage resource representing physical storage (NFS, FSx, Azure NetApp) | +| **PersistentVolumeClaim (PVC)** | Namespace-scoped claim that binds to a PV. Mounted into the Connect pod for content storage. | +| **ConfigMaps** | Connect configuration files (`rstudio-connect.gcfg`) generated from the CR spec | +| **DB Password Secret** | Auto-generated database credentials created by the Database Controller | +| **Secret Key** | Encryption key for Connect's internal data encryption | +| **Connect Pod** | The main Connect server container running the publishing platform | +| **Ingress** | Routes external traffic to the Connect Service based on hostname | +| **Service** | Kubernetes Service providing stable networking for the Connect Pod | + +### Off-Host Execution + +When off-host execution is enabled, Connect runs content (Shiny apps, APIs, reports) in separate Kubernetes Jobs rather than in the main Connect pod. This provides: +- **Resource isolation**: Content processes don't compete with the Connect server +- **Scalability**: Content can scale independently +- **Security**: Content runs with minimal privileges + +See the [Connect Configuration Guide](guides/connect-configuration.md) for details. 
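The Connect inputs shown in the diagram map onto fields from the `InternalConnectSpec`, `LicenseSpec`, and `AuthSpec` reference tables. A hedged sketch of that portion of a Site spec follows; the `spec.connect` path is an assumption inferred from sibling paths like `spec.packageManager.s3Bucket`, and all values are placeholders.

```yaml
# Illustrative only: assumes Connect is configured under spec.connect in the Site CR.
spec:
  connect:
    replicas: 2
    domainPrefix: connect             # default per the reference tables
    license:
      type: FILE                      # LicenseType: KEY | FILE
      existingSecretName: connect-license
    auth:
      type: oidc
      issuer: https://idp.example.com # placeholder IdP
      clientId: connect-client
      administratorRoleMapping:
        - connect-admins              # placeholder IdP group mapped to administrator
```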
+ +--- + +## Workbench Architecture + +Posit Workbench provides IDE environments (RStudio, VS Code, Jupyter) for data scientists. The operator manages both the main server and user session pods. + +```mermaid +flowchart TB + subgraph external [External Configuration] + manual(Manual Setup) + license(License) + clientsecret(Auth Client Secret) + mainDbCon(Main DB Connection) + end + + subgraph operator [Team Operator] + site(Site Controller) + dbcon(Database Controller) + workbench(Workbench Controller) + end + + subgraph k8s [Kubernetes Resources] + subgraph storage [Storage] + pv(PersistentVolume) + pvc(PersistentVolumeClaim) + homepvc(Home Directory PVC) + end + subgraph config [Configuration] + cm(ConfigMaps) + dbsecret(DB Password Secret) + secretkey(Secret Key) + jobtpl(Job Templates) + end + subgraph workload [Workload] + wbdeploy(Workbench Pod) + ing(Ingress) + svc(Service) + end + end + + subgraph sessions [Session Infrastructure] + launcher(Job Launcher) + sessionpod1(Session Pod) + sessionpod2(Session Pod) + end + + %% External to Operator + manual --> license + manual --> clientsecret + manual --> mainDbCon + mainDbCon --> dbcon + + %% Operator flow + site --> pv + site --> workbench + site --> dbcon + dbcon --> dbsecret + + %% Workbench Controller creates resources + workbench --> pvc + workbench --> homepvc + workbench --> cm + workbench --> secretkey + workbench --> jobtpl + workbench --> wbdeploy + workbench --> ing + workbench --> svc + + %% Resources flow to Pod + pv --> pvc + pvc --> wbdeploy + homepvc --> wbdeploy + cm --> wbdeploy + dbsecret --> wbdeploy + secretkey --> wbdeploy + license --> wbdeploy + clientsecret --> wbdeploy + jobtpl --> wbdeploy + + %% Session management + wbdeploy --> launcher + launcher --> sessionpod1 + launcher --> sessionpod2 + homepvc --> sessionpod1 + homepvc --> sessionpod2 + pvc --> sessionpod1 + pvc --> sessionpod2 + + classDef external fill:#FAEEE9,stroke:#ab4d26 + classDef operator fill:#E3F2FD,stroke:#1976D2 + 
classDef k8s fill:#E8F5E9,stroke:#388E3C + classDef session fill:#FFF3E0,stroke:#F57C00 + + class manual,license,clientsecret,mainDbCon external + class site,dbcon,workbench operator + class pv,pvc,homepvc,cm,dbsecret,secretkey,jobtpl,wbdeploy,ing,svc k8s + class launcher,sessionpod1,sessionpod2 session +``` + +### Component Descriptions + +#### External Configuration (Coral) + +Same as Connect - see [Connect Architecture](#component-descriptions) above. + +#### Team Operator (Blue) + +| Component | Description | +|-----------|-------------| +| **Site Controller** | Creates the Workbench CR and manages shared storage volumes | +| **Database Controller** | Provisions the Workbench database (DevDB) for session and project metadata | +| **Workbench Controller** | Creates all Kubernetes resources for Workbench including session templates | + +#### Kubernetes Resources (Green) + +| Component | Description | +|-----------|-------------| +| **PersistentVolume / PVC** | Shared project storage accessible by both the server and all session pods | +| **Home Directory PVC** | User home directories, mounted into session pods at `/home/{username}` | +| **ConfigMaps** | Workbench configuration files including `rserver.conf`, `launcher.conf`, and IDE settings | +| **Job Templates** | Kubernetes Job/Service templates used by the Launcher to create session pods | +| **Workbench Pod** | The main Workbench server handling authentication, the web UI, and session management | +| **Ingress / Service** | Network routing for external access to Workbench | + +#### Session Infrastructure (Orange) + +| Component | Description | +|-----------|-------------| +| **Job Launcher** | Component within Workbench that creates Kubernetes Jobs for user sessions | +| **Session Pod** | Individual IDE sessions (RStudio, VS Code, Jupyter) running as Kubernetes Jobs. Each user session gets its own pod with dedicated resources. | + +### Session Lifecycle + +1. 
User logs into Workbench and requests a new session +2. Job Launcher creates a Kubernetes Job using the configured template +3. Session Pod starts with the selected IDE and mounts user's home directory +4. User works in the session; all files are saved to persistent storage +5. When the session ends, the Job completes and the Pod is cleaned up +6. User's work persists in the Home Directory PVC for the next session + +### Storage Architecture + +Workbench requires careful storage planning: + +| Storage | Purpose | Access Mode | +|---------|---------|-------------| +| **Home Directory PVC** | User home directories with personal files and settings | ReadWriteMany (multiple sessions) | +| **Shared Storage PVC** | Shared project data accessible by all users | ReadWriteMany | +| **Session Scratch** | Temporary storage for session runtime (optional) | ReadWriteOnce per session | + +See the [Workbench Configuration Guide](guides/workbench-configuration.md) for details. + +--- + +## Package Manager Architecture + +Posit Package Manager provides a local repository for R and Python packages. It can mirror public repositories and host private packages. 
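Before the diagram, a hedged sketch of the Package Manager fragment of a Site spec, combining `InternalPackageManagerSpec` and `SSHKeyConfig` fields from the API reference. The `spec.packageManager` path matches the storage option paths documented in this section; the concrete names and key values are placeholders.

```yaml
# Illustrative sketch: names and secret values are placeholders.
spec:
  packageManager:
    replicas: 1
    s3Bucket: my-ppm-packages       # package storage on AWS
    gitSSHKeys:
      - name: github-deploy-key     # required; 1-63 chars, lowercase alphanumeric and hyphens
        host: github.com            # required; Git host domain
        secretRef:
          source: kubernetes        # aws-secrets-manager | kubernetes | azure-key-vault
          name: ppm-github-ssh-key
          key: id_ed25519
```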
+ +```mermaid +flowchart TB + subgraph external [External Configuration] + manual(Manual Setup) + license(License) + clientsecret(Auth Client Secret) + mainDbCon(Main DB Connection) + sshkeys(Git SSH Keys) + end + + subgraph cloudstorage [Cloud Storage] + s3(S3 Bucket) + azfiles(Azure Files) + end + + subgraph operator [Team Operator] + site(Site Controller) + dbcon(Database Controller) + pm(PackageManager Controller) + end + + subgraph k8s [Kubernetes Resources] + subgraph storage [Storage] + pv(PersistentVolume) + pvc(PersistentVolumeClaim) + end + subgraph config [Configuration] + cm(ConfigMaps) + dbsecret(DB Password Secret) + secretkey(Secret Key) + sshsecret(SSH Key Secret) + end + subgraph workload [Workload] + pmdeploy(Package Manager Pod) + ing(Ingress) + svc(Service) + end + end + + %% External to Operator + manual --> license + manual --> clientsecret + manual --> mainDbCon + manual --> sshkeys + mainDbCon --> dbcon + sshkeys --> sshsecret + + %% Operator flow + site --> pv + site --> pm + site --> dbcon + dbcon --> dbsecret + + %% PackageManager Controller creates resources + pm --> pvc + pm --> cm + pm --> secretkey + pm --> pmdeploy + pm --> ing + pm --> svc + + %% Resources flow to Pod + pv --> pvc + pvc --> pmdeploy + cm --> pmdeploy + dbsecret --> pmdeploy + secretkey --> pmdeploy + license --> pmdeploy + clientsecret --> pmdeploy + sshsecret --> pmdeploy + + %% Cloud storage connections + pmdeploy --> s3 + pmdeploy --> azfiles + + classDef external fill:#FAEEE9,stroke:#ab4d26 + classDef operator fill:#E3F2FD,stroke:#1976D2 + classDef k8s fill:#E8F5E9,stroke:#388E3C + classDef cloud fill:#E1F5FE,stroke:#0288D1 + + class manual,license,clientsecret,mainDbCon,sshkeys external + class site,dbcon,pm operator + class pv,pvc,cm,dbsecret,secretkey,sshsecret,pmdeploy,ing,svc k8s + class s3,azfiles cloud +``` + +### Component Descriptions + +#### External Configuration (Coral) + +| Component | Description | +|-----------|-------------| +| **Manual Setup** | 
One-time configuration by the administrator | +| **License** | Posit Package Manager license | +| **Auth Client Secret** | OIDC/SAML credentials for SSO | +| **Main DB Connection** | PostgreSQL connection for package metadata | +| **Git SSH Keys** | SSH keys for accessing private Git repositories when building packages from source | + +#### Cloud Storage (Light Blue) + +| Component | Description | +|-----------|-------------| +| **S3 Bucket** | AWS S3 storage for package binaries (recommended for AWS deployments) | +| **Azure Files** | Azure file storage for package binaries (recommended for Azure deployments) | + +Package Manager can use either cloud storage backend. The choice typically depends on your cloud provider: +- **AWS**: Use S3 for best performance and cost +- **Azure**: Use Azure Files with the CSI driver +- **On-premises**: Use the local PVC for package storage + +#### Team Operator (Blue) + +| Component | Description | +|-----------|-------------| +| **Site Controller** | Creates the PackageManager CR | +| **Database Controller** | Provisions the Package Manager database with main and metrics schemas | +| **PackageManager Controller** | Creates all Kubernetes resources for Package Manager | + +#### Kubernetes Resources (Green) + +| Component | Description | +|-----------|-------------| +| **PersistentVolume / PVC** | Local storage for temporary files and cache (when not using cloud storage) | +| **ConfigMaps** | Package Manager configuration (`rstudio-pm.gcfg`) | +| **SSH Key Secret** | Mounted SSH keys for Git authentication during package builds | +| **Package Manager Pod** | The main server handling package requests, sync operations, and builds | +| **Ingress / Service** | Network routing for package installation requests | + +### Package Storage Options + +| Option | Best For | Configuration | +|--------|----------|---------------| +| **S3** | AWS deployments, large repositories | `spec.packageManager.s3Bucket` | +| **Azure Files** | Azure 
deployments | `spec.packageManager.azureFiles` | +| **Local PVC** | Development, small deployments | Default when no cloud storage configured | + +### Git Builder Integration + +Package Manager can build R packages from Git repositories. This requires: + +1. **SSH Keys**: Private keys with access to your Git repositories +2. **Known Hosts**: SSH host key verification (optional but recommended) +3. **Build Resources**: CPU/memory for compilation + +See the [Package Manager Configuration Guide](guides/packagemanager-configuration.md) for details. + +--- + +## Flightdeck Architecture + +Flightdeck is the landing page and navigation hub for Posit Team deployments. It provides a simple dashboard for users to access the various products. + +```mermaid +flowchart TB + subgraph operator [Team Operator] + site(Site Controller) + flightdeck_ctrl(Flightdeck Controller) + end + + subgraph k8s [Kubernetes Resources] + subgraph config [Configuration] + cm(ConfigMap) + end + subgraph workload [Workload] + fddeploy(Flightdeck Pod) + ing(Ingress) + svc(Service) + end + end + + subgraph products [Product Endpoints] + wb_ing(Workbench Ingress) + conn_ing(Connect Ingress) + pm_ing(Package Manager Ingress) + end + + subgraph users [Users] + browser(Web Browser) + end + + %% Operator flow + site --> flightdeck_ctrl + flightdeck_ctrl --> cm + flightdeck_ctrl --> fddeploy + flightdeck_ctrl --> ing + flightdeck_ctrl --> svc + + %% Config to Pod + cm --> fddeploy + + %% User access + browser --> ing + ing --> svc + svc --> fddeploy + + %% Navigation to products + fddeploy -.-> wb_ing + fddeploy -.-> conn_ing + fddeploy -.-> pm_ing + + classDef operator fill:#E3F2FD,stroke:#1976D2 + classDef k8s fill:#E8F5E9,stroke:#388E3C + classDef product fill:#FFF3E0,stroke:#F57C00 + classDef user fill:#F3E5F5,stroke:#7B1FA2 + + class site,flightdeck_ctrl operator + class cm,fddeploy,ing,svc k8s + class wb_ing,conn_ing,pm_ing product + class browser user +``` + +### Component Descriptions + +#### Team 
Operator (Blue) + +| Component | Description | +|-----------|-------------| +| **Site Controller** | Creates the Flightdeck CR when Flightdeck is enabled in the Site spec | +| **Flightdeck Controller** | Creates all Kubernetes resources needed to run the landing page | + +#### Kubernetes Resources (Green) + +| Component | Description | +|-----------|-------------| +| **ConfigMap** | Configuration for Flightdeck including enabled features and product URLs | +| **Flightdeck Pod** | Static web server serving the landing page HTML/CSS/JS | +| **Ingress** | Routes traffic from the base domain to Flightdeck | +| **Service** | Kubernetes Service for the Flightdeck Pod | + +#### Product Endpoints (Orange) + +| Component | Description | +|-----------|-------------| +| **Workbench Ingress** | Flightdeck links to `workbench.{domain}` | +| **Connect Ingress** | Flightdeck links to `connect.{domain}` | +| **Package Manager Ingress** | Flightdeck links to `packagemanager.{domain}` | + +### Features + +Flightdeck is intentionally simple: + +- **No database**: Serves static content only +- **No authentication**: Relies on product-level authentication +- **Configurable layout**: Shows only enabled products +- **Optional Academy**: Can display a fourth card for Posit Academy + +### Configuration Options + +| Option | Description | +|--------|-------------| +| `spec.flightdeck.replicas` | Number of replicas (default: 1) | +| `spec.flightdeck.featureEnabler.showConfig` | Show configuration page link | +| `spec.flightdeck.featureEnabler.showAcademy` | Show Academy product card | + +--- + +## Chronicle Architecture + +Chronicle is the telemetry and usage tracking service for Posit Team. It collects metrics from Connect and Workbench via sidecar containers. 
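A hedged sketch of enabling Chronicle in a Site spec, using the `spec.chronicle.*` option paths documented in this section; the bucket name is a placeholder.

```yaml
# Illustrative values; only the option paths are taken from the documentation.
spec:
  chronicle:
    enabled: true                  # triggers sidecar injection into Connect/Workbench pods
    s3Bucket: my-telemetry-bucket  # or use localStorage for air-gapped environments
```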
+ +```mermaid +flowchart TB + subgraph operator [Team Operator] + site(Site Controller) + chronicle_ctrl(Chronicle Controller) + connect_ctrl(Connect Controller) + workbench_ctrl(Workbench Controller) + end + + subgraph k8s [Kubernetes Resources] + subgraph config [Configuration] + cm(ConfigMap) + apikey(API Key Secret) + end + subgraph workload [Chronicle Service] + chronicledeploy(Chronicle Pod) + svc(Service) + end + end + + subgraph products [Product Pods with Sidecars] + subgraph connectpod [Connect Pod] + connect_main(Connect Container) + connect_sidecar(Chronicle Sidecar) + end + subgraph workbenchpod [Workbench Pod] + wb_main(Workbench Container) + wb_sidecar(Chronicle Sidecar) + end + end + + subgraph storage [Telemetry Storage] + s3(S3 Bucket) + local(Local Volume) + end + + %% Operator flow + site --> chronicle_ctrl + site --> connect_ctrl + site --> workbench_ctrl + chronicle_ctrl --> cm + chronicle_ctrl --> apikey + chronicle_ctrl --> chronicledeploy + chronicle_ctrl --> svc + + %% Sidecar injection + connect_ctrl --> connect_sidecar + workbench_ctrl --> wb_sidecar + + %% API key distribution + apikey --> connect_sidecar + apikey --> wb_sidecar + + %% Metrics flow + connect_main -.->|metrics| connect_sidecar + wb_main -.->|metrics| wb_sidecar + connect_sidecar -->|send| chronicledeploy + wb_sidecar -->|send| chronicledeploy + + %% Storage + chronicledeploy --> s3 + chronicledeploy --> local + + classDef operator fill:#E3F2FD,stroke:#1976D2 + classDef k8s fill:#E8F5E9,stroke:#388E3C + classDef product fill:#FFF3E0,stroke:#F57C00 + classDef storage fill:#E1F5FE,stroke:#0288D1 + classDef sidecar fill:#FFEBEE,stroke:#C62828 + + class site,chronicle_ctrl,connect_ctrl,workbench_ctrl operator + class cm,apikey,chronicledeploy,svc k8s + class connect_main,wb_main product + class connect_sidecar,wb_sidecar sidecar + class s3,local storage +``` + +### Component Descriptions + +#### Team Operator (Blue) + +| Component | Description | +|-----------|-------------| 
+| **Site Controller** | Creates the Chronicle CR when Chronicle is enabled | +| **Chronicle Controller** | Creates the Chronicle service and manages API keys | +| **Connect Controller** | Injects Chronicle sidecar into Connect pods when enabled | +| **Workbench Controller** | Injects Chronicle sidecar into Workbench pods when enabled | + +#### Kubernetes Resources (Green) + +| Component | Description | +|-----------|-------------| +| **ConfigMap** | Chronicle server configuration | +| **API Key Secret** | Shared secret for sidecar authentication to the Chronicle service | +| **Chronicle Pod** | Central telemetry aggregation service | +| **Service** | Internal endpoint for sidecars to send metrics | + +#### Product Pods (Orange/Red) + +| Component | Description | +|-----------|-------------| +| **Connect/Workbench Container** | Main product container that generates usage metrics | +| **Chronicle Sidecar** | Lightweight agent that collects metrics from the main container and forwards them to the Chronicle service | + +#### Telemetry Storage (Light Blue) + +| Component | Description | +|-----------|-------------| +| **S3 Bucket** | Cloud storage for telemetry data (recommended for production) | +| **Local Volume** | Local storage option for development or air-gapped environments | + +### Data Flow + +1. **Metrics Generation**: Connect and Workbench generate usage metrics (content views, session starts, etc.) +2. **Sidecar Collection**: Chronicle sidecars collect metrics from the product containers +3. **Aggregation**: Sidecars send data to the central Chronicle service +4. **Storage**: Chronicle persists data to S3 or local storage +5. 
**Analysis**: Data can be queried for usage reports and analytics + +### Sidecar Injection + +The Chronicle sidecar is automatically injected into product pods when: +- Chronicle is enabled in the Site spec (`spec.chronicle.enabled: true`) +- The product has Chronicle integration enabled + +The sidecar: +- Runs as a secondary container in the same pod +- Shares the pod's network namespace (can reach localhost) +- Uses the API key secret for authentication +- Has minimal resource requirements (~50Mi memory) + +### Configuration Options + +| Option | Description | +|--------|-------------| +| `spec.chronicle.enabled` | Enable Chronicle telemetry collection | +| `spec.chronicle.image` | Chronicle agent container image | +| `spec.chronicle.s3Bucket` | S3 bucket for telemetry storage | +| `spec.chronicle.localStorage` | Use local volume instead of S3 | + +--- + +## Related Documentation + +- [Site Management Guide](guides/product-team-site-management.md) - Managing Site CRs +- [Connect Configuration](guides/connect-configuration.md) - Detailed Connect setup +- [Workbench Configuration](guides/workbench-configuration.md) - Detailed Workbench setup +- [Package Manager Configuration](guides/packagemanager-configuration.md) - Detailed Package Manager setup +- [API Reference](api-reference.md) - Complete CRD field reference diff --git a/docs/guides/adding-config-options.md b/docs/guides/adding-config-options.md new file mode 100644 index 00000000..b1cf8f19 --- /dev/null +++ b/docs/guides/adding-config-options.md @@ -0,0 +1,529 @@ +# Adding Configuration Options to Team Operator + +This guide walks through the process of adding new configuration options to Posit Team products managed by the Team Operator. It covers the complete flow from Site CRD to product-specific configuration. 
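Concretely, the end goal of this flow is that a user sets one new field on their Site resource and the operator carries it through to the product's configuration. For a `maxConnections` option on Connect (the running example used in this guide), the user-facing result would look like this — note that `maxConnections` is hypothetical until you add it:

```yaml
# Hypothetical example: the maxConnections field does not exist yet;
# this guide walks through adding it end to end.
apiVersion: core.posit.team/v1beta1
kind: Site
metadata:
  name: production
  namespace: posit-team
spec:
  connect:
    maxConnections: 500
```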
+ +## Configuration Architecture Overview + +The Team Operator uses a hierarchical configuration model: + +``` +Site CRD (user-facing) + | + v +Internal{Product}Spec (in site_types.go) + | + v +site_controller_{product}.go (propagation logic) + | + v +Product CR (Connect, Workbench, PackageManager, etc.) + | + v +Product Controller (generates actual config files) +``` + +### Key Concepts + +1. **Site CRD**: The primary user-facing resource. Users configure their entire Posit Team deployment through a single Site resource. + +2. **Internal{Product}Spec**: Nested structs within SiteSpec that contain product-specific configuration at the Site level. + +3. **Product CRs**: Individual Custom Resources (Connect, Workbench, etc.) created by the Site controller. These are implementation details users typically don't interact with directly. + +4. **Propagation**: The Site controller maps Site-level configuration to the appropriate Product CR fields. + +## Step-by-Step: Adding a New Config Option + +### Prerequisites + +Before adding a config option, gather the following: + +| Item | Description | Example | +|------|-------------|---------| +| Product | Which product does this config affect? | Workbench | +| Site Field Name | Go-style field name (PascalCase) | `MaxConnections` | +| Product Config Path | The actual config key the product expects | `Scheduler.MaxConnections` | +| Go Type | string, int, bool, *int, struct, etc. | `*int` | +| Description | What does this config control? | "Maximum concurrent connections" | +| Default Value | What's the default if not specified? | `100` | + +### Step 1: Add Field to Site Types + +**File**: `api/core/v1beta1/site_types.go` + +Find the appropriate `Internal{Product}Spec` struct and add your field. + +```go +type InternalConnectSpec struct { + // ... existing fields ... 
+ + // MaxConnections sets the maximum number of concurrent connections + // Maps to product config: Scheduler.MaxConnections + // +optional + MaxConnections *int `json:"maxConnections,omitempty"` +} +``` + +#### Field Documentation Pattern + +Always include: +1. A description of what the field controls +2. A comment showing the product config path it maps to +3. The `+optional` marker for optional fields +4. Kubebuilder validation markers if applicable + +```go +// Description of the field and what it controls +// Maps to product config: Section.ConfigKey +// +kubebuilder:validation:Minimum=1 +// +kubebuilder:validation:Maximum=1000 +// +optional +FieldName Type `json:"fieldName,omitempty"` +``` + +### Step 2: Add Field to Product Types (If Needed) + +**File**: `api/core/v1beta1/{product}_types.go` + +If the product needs the config in its spec (for the product controller to use), add it to the appropriate config struct. + +```go +// In connect_types.go +type ConnectSchedulerConfig struct { + // ... existing fields ... + + // MaxConnections sets the maximum concurrent connections + // +optional + MaxConnections int `json:"maxConnections,omitempty"` +} +``` + +### Step 3: Add Propagation Logic + +**File**: `internal/controller/core/site_controller_{product}.go` + +In the `reconcile{Product}` function, add logic to map the Site field to the Product CR. + +#### Pattern 1: Simple Value Propagation + +```go +func (r *SiteReconciler) reconcileConnect( + ctx context.Context, + req controllerruntime.Request, + site *v1beta1.Site, + // ... other params +) error { + // ... existing code ... + + targetConnect := v1beta1.Connect{ + // ... existing fields ... + Spec: v1beta1.ConnectSpec{ + Config: v1beta1.ConnectConfig{ + Scheduler: &v1beta1.ConnectSchedulerConfig{ + // ... existing fields ... 
+ }, + }, + }, + } + + // Propagate MaxConnections if set + if site.Spec.Connect.MaxConnections != nil { + targetConnect.Spec.Config.Scheduler.MaxConnections = *site.Spec.Connect.MaxConnections + } + + // ... rest of function +} +``` + +#### Pattern 2: Conditional/Nested Propagation + +For fields that require nil-safety or conditional logic: + +```go +// Ensure parent structs exist before setting +if site.Spec.Workbench.ExperimentalFeatures != nil { + if site.Spec.Workbench.ExperimentalFeatures.WwwThreadPoolSize != nil { + threadPoolSize = *site.Spec.Workbench.ExperimentalFeatures.WwwThreadPoolSize + } +} + +// Apply to target +targetWorkbench.Spec.Config.RServer.WwwThreadPoolSize = threadPoolSize +``` + +#### Pattern 3: Struct Propagation + +For nested configuration objects: + +```go +// Propagate entire settings struct +if site.Spec.Connect.GPUSettings != nil { + if site.Spec.Connect.GPUSettings.NvidiaGPULimit > 0 { + targetConnect.Spec.Config.Scheduler.NvidiaGPULimit = site.Spec.Connect.GPUSettings.NvidiaGPULimit + } + if site.Spec.Connect.GPUSettings.MaxNvidiaGPULimit > 0 { + targetConnect.Spec.Config.Scheduler.MaxNvidiaGPULimit = site.Spec.Connect.GPUSettings.MaxNvidiaGPULimit + } +} +``` + +### Step 4: Update Product Controller (If Needed) + +**File**: `internal/controller/core/{product}_controller.go` + +If the product controller needs to use the new config to generate configuration files, update the relevant config generation logic. + +```go +// Example: Adding to a config file generation +func (r *ConnectReconciler) generateConfig(connect *v1beta1.Connect) string { + cfg := connect.Spec.Config + + // ... existing config generation ... + + if cfg.Scheduler.MaxConnections > 0 { + // Add to generated config + } +} +``` + +### Step 5: Add Tests + +**File**: `internal/controller/core/site_test.go` + +Add tests to verify propagation works correctly. 
+ +```go +func TestSiteReconciler_MaxConnections(t *testing.T) { + siteName := "max-connections-site" + siteNamespace := "posit-team" + site := defaultSite(siteName) + + maxConn := 500 + site.Spec.Connect.MaxConnections = &maxConn + + cli, _, err := runFakeSiteReconciler(t, siteNamespace, siteName, site) + assert.Nil(t, err) + + testConnect := getConnect(t, cli, siteNamespace, siteName) + + assert.Equal(t, 500, testConnect.Spec.Config.Scheduler.MaxConnections) +} + +func TestSiteReconciler_MaxConnections_Default(t *testing.T) { + siteName := "default-connections-site" + siteNamespace := "posit-team" + site := defaultSite(siteName) + // Don't set MaxConnections - test default behavior + + cli, _, err := runFakeSiteReconciler(t, siteNamespace, siteName, site) + assert.Nil(t, err) + + testConnect := getConnect(t, cli, siteNamespace, siteName) + + // Verify default value or zero value behavior + assert.Equal(t, 0, testConnect.Spec.Config.Scheduler.MaxConnections) +} +``` + +### Step 6: Regenerate Code + +After modifying types, regenerate the CRD manifests: + +```bash +just generate +just manifests +``` + +## Common Type Patterns + +### Optional Integer (Pointer) + +Use pointers for optional numeric values where zero is a valid setting: + +```go +// MaxWorkers sets the maximum number of workers +// Maps to product config: Scheduler.MaxWorkers +// +optional +MaxWorkers *int `json:"maxWorkers,omitempty"` +``` + +Propagation: +```go +if site.Spec.Product.MaxWorkers != nil { + target.Spec.Config.Scheduler.MaxWorkers = *site.Spec.Product.MaxWorkers +} +``` + +### Enum (String with Validation) + +```go +// LogLevel sets the logging verbosity +// Maps to product config: Logging.All.LogLevel +// +kubebuilder:validation:Enum=debug;info;warn;error +// +optional +LogLevel string `json:"logLevel,omitempty"` +``` + +### Boolean with Default False + +```go +// EnableFeatureX enables experimental feature X +// Maps to product config: Features.ExperimentalX +// +optional 
+EnableFeatureX bool `json:"enableFeatureX,omitempty"` +``` + +**Note**: With `omitempty`, false values are omitted from JSON. Only propagate when explicitly true: + +```go +if site.Spec.Product.EnableFeatureX { + target.Spec.Config.Features.ExperimentalX = true +} +``` + +### Nested Struct + +```go +// GPUSettings configures GPU resources for sessions +// +optional +GPUSettings *GPUSettings `json:"gpuSettings,omitempty"` + +type GPUSettings struct { + // NvidiaGPULimit sets the default NVIDIA GPU limit + // Maps to product config: Scheduler.NvidiaGPULimit + // +optional + NvidiaGPULimit int `json:"nvidiaGPULimit,omitempty"` + + // MaxNvidiaGPULimit sets the maximum NVIDIA GPU limit + // Maps to product config: Scheduler.MaxNvidiaGPULimit + // +optional + MaxNvidiaGPULimit int `json:"maxNvidiaGPULimit,omitempty"` +} +``` + +### String Slice + +```go +// AdminGroups specifies groups with admin access +// +optional +AdminGroups []string `json:"adminGroups,omitempty"` +``` + +Propagation (joining for product config): +```go +adminGroup := "default-admin" +if len(site.Spec.Workbench.AdminGroups) > 0 { + adminGroup = strings.Join(site.Spec.Workbench.AdminGroups, ",") +} +targetWorkbench.Spec.Config.RServer.AdminGroup = adminGroup +``` + +### Map of Strings + +```go +// AddEnv adds arbitrary environment variables +// +optional +AddEnv map[string]string `json:"addEnv,omitempty"` +``` + +## File Reference + +| Product | Site Types Struct | Product Types File | Controller | +|---------|-------------------|-------------------|------------| +| Connect | `InternalConnectSpec` | `connect_types.go` | `site_controller_connect.go` | +| Workbench | `InternalWorkbenchSpec` | `workbench_types.go` | `site_controller_workbench.go` | +| Package Manager | `InternalPackageManagerSpec` | `packagemanager_types.go` | `site_controller_package_manager.go` | +| Chronicle | `InternalChronicleSpec` | `chronicle_types.go` | `site_controller_chronicle.go` | +| Keycloak | `InternalKeycloakSpec` | 
`keycloak_types.go` | `site_controller_keycloak.go` | +| Flightdeck | `InternalFlightdeckSpec` | `flightdeck_types.go` | `site_controller_flightdeck.go` | + +## Validation Checklist + +Before submitting your PR, verify: + +- [ ] Field has correct JSON tag (camelCase) +- [ ] Field has descriptive comment including product config mapping +- [ ] Field has `+optional` marker if optional +- [ ] Field has kubebuilder validation markers if applicable +- [ ] Propagation logic handles nil/zero values correctly +- [ ] Propagation respects parent struct nil-safety +- [ ] Test covers the happy path (value is propagated) +- [ ] Test covers the default case (value not set) +- [ ] `just generate` and `just manifests` run without errors +- [ ] `just test` passes +- [ ] Product config name matches product documentation + +## Common Pitfalls + +### 1. Forgetting Nil Checks + +**Wrong**: +```go +// Panic if ExperimentalFeatures is nil +threadPoolSize := *site.Spec.Workbench.ExperimentalFeatures.WwwThreadPoolSize +``` + +**Right**: +```go +if site.Spec.Workbench.ExperimentalFeatures != nil && + site.Spec.Workbench.ExperimentalFeatures.WwwThreadPoolSize != nil { + threadPoolSize = *site.Spec.Workbench.ExperimentalFeatures.WwwThreadPoolSize +} +``` + +### 2. Wrong JSON Tag Case + +**Wrong**: +```go +MaxConnections *int `json:"MaxConnections,omitempty"` // PascalCase +``` + +**Right**: +```go +MaxConnections *int `json:"maxConnections,omitempty"` // camelCase +``` + +### 3. Missing omitempty + +Including `omitempty` prevents zero values from appearing in serialized YAML: + +```go +// Without omitempty, "enabled: false" appears in output +Enabled bool `json:"enabled"` + +// With omitempty, false is omitted entirely +Enabled bool `json:"enabled,omitempty"` +``` + +### 4. Product Config Name Mismatch + +Always verify the product config path matches what the product expects: +- Check product admin guides +- Look at existing examples in the codebase +- Test with the actual product + +### 5. 
Not Regenerating CRDs + +After modifying types, always run: +```bash +just generate +just manifests +``` + +### 6. Overwriting Existing Config + +Be careful not to overwrite configuration that may have been set elsewhere: + +**Wrong**: +```go +// Overwrites any existing Scheduler config +targetConnect.Spec.Config.Scheduler = &v1beta1.ConnectSchedulerConfig{ + MaxConnections: maxConn, +} +``` + +**Right**: +```go +// Preserves existing config, only sets the new field +if targetConnect.Spec.Config.Scheduler == nil { + targetConnect.Spec.Config.Scheduler = &v1beta1.ConnectSchedulerConfig{} +} +targetConnect.Spec.Config.Scheduler.MaxConnections = maxConn +``` + +## Finding Product Config Names + +To determine the correct product config path: + +1. **Check product documentation**: Admin guides typically list all configuration options +2. **Look at existing examples**: Search `site_controller_{product}.go` for similar options +3. **Check Helm charts**: Product Helm chart `values.yaml` files show config structure +4. **Ask the product team**: When uncertain, verify with the product team + +## Example: Complete Walkthrough + +Let's add a `SessionTimeout` config to Connect that maps to `Scheduler.SessionTimeout`. + +### 1. Add to site_types.go + +```go +type InternalConnectSpec struct { + // ... existing fields ... + + // SessionTimeout sets the session timeout in seconds + // Maps to product config: Scheduler.SessionTimeout + // +kubebuilder:validation:Minimum=60 + // +optional + SessionTimeout *int `json:"sessionTimeout,omitempty"` +} +``` + +### 2. Add to connect_types.go + +```go +type ConnectSchedulerConfig struct { + // ... existing fields ... + + // SessionTimeout in seconds + SessionTimeout int `json:"sessionTimeout,omitempty"` +} +``` + +### 3. Add propagation in site_controller_connect.go + +```go +func (r *SiteReconciler) reconcileConnect(...) error { + // ... existing setup ... 
+ + targetConnect := v1beta1.Connect{ + Spec: v1beta1.ConnectSpec{ + Config: v1beta1.ConnectConfig{ + Scheduler: &v1beta1.ConnectSchedulerConfig{ + // ... existing fields ... + }, + }, + }, + } + + // Add after targetConnect initialization + if site.Spec.Connect.SessionTimeout != nil { + targetConnect.Spec.Config.Scheduler.SessionTimeout = *site.Spec.Connect.SessionTimeout + } + + // ... rest of function ... +} +``` + +### 4. Add test in site_test.go + +```go +func TestSiteReconciler_ConnectSessionTimeout(t *testing.T) { + siteName := "session-timeout-site" + siteNamespace := "posit-team" + site := defaultSite(siteName) + + timeout := 300 + site.Spec.Connect.SessionTimeout = &timeout + + cli, _, err := runFakeSiteReconciler(t, siteNamespace, siteName, site) + assert.Nil(t, err) + + testConnect := getConnect(t, cli, siteNamespace, siteName) + + assert.Equal(t, 300, testConnect.Spec.Config.Scheduler.SessionTimeout) +} +``` + +### 5. Regenerate and test + +```bash +just generate +just manifests +just test +``` + +## Related Documentation + +- [Kubernetes Custom Resource Definitions](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/) +- [Kubebuilder Markers](https://book.kubebuilder.io/reference/markers.html) +- [Team Operator README](/README.md) diff --git a/docs/guides/authentication-setup.md b/docs/guides/authentication-setup.md new file mode 100644 index 00000000..9506e339 --- /dev/null +++ b/docs/guides/authentication-setup.md @@ -0,0 +1,774 @@ +# Authentication Setup Guide + +This guide provides comprehensive documentation for configuring authentication in Posit Team Operator. Team Operator supports multiple authentication methods for both Posit Connect and Posit Workbench. + +## Table of Contents + +1. [Overview](#overview) +2. [Authentication Types](#authentication-types) +3. [OIDC Configuration](#oidc-configuration) +4. [SAML Configuration](#saml-configuration) +5. 
[Password Authentication](#password-authentication) +6. [Role-Based Access Control](#role-based-access-control) +7. [Keycloak Integration](#keycloak-integration) +8. [Secrets Management](#secrets-management) +9. [Troubleshooting](#troubleshooting) + +## Overview + +Team Operator uses the `AuthSpec` structure to configure authentication for Posit products. Authentication is configured per-product (Connect and Workbench) through the `auth` field in each product's spec. + +### AuthSpec Structure + +The complete `AuthSpec` type definition: + +```go +type AuthSpec struct { + Type AuthType `json:"type,omitempty"` + ClientId string `json:"clientId,omitempty"` + Issuer string `json:"issuer,omitempty"` + Groups bool `json:"groups,omitempty"` + UsernameClaim string `json:"usernameClaim,omitempty"` + EmailClaim string `json:"emailClaim,omitempty"` + UniqueIdClaim string `json:"uniqueIdClaim,omitempty"` + GroupsClaim string `json:"groupsClaim,omitempty"` + DisableGroupsClaim bool `json:"disableGroupsClaim,omitempty"` + SamlMetadataUrl string `json:"samlMetadataUrl,omitempty"` + SamlIdPAttributeProfile string `json:"samlIdPAttributeProfile,omitempty"` + SamlUsernameAttribute string `json:"samlUsernameAttribute,omitempty"` + SamlFirstNameAttribute string `json:"samlFirstNameAttribute,omitempty"` + SamlLastNameAttribute string `json:"samlLastNameAttribute,omitempty"` + SamlEmailAttribute string `json:"samlEmailAttribute,omitempty"` + Scopes []string `json:"scopes,omitempty"` + ViewerRoleMapping []string `json:"viewerRoleMapping,omitempty"` + PublisherRoleMapping []string `json:"publisherRoleMapping,omitempty"` + AdministratorRoleMapping []string `json:"administratorRoleMapping,omitempty"` +} +``` + +## Authentication Types + +Team Operator supports three authentication types: + +| Type | Value | Use Case | +|------|-------|----------| +| Password | `password` | Development, simple deployments | +| OIDC | `oidc` | Enterprise SSO with OAuth2/OpenID Connect | +| SAML | `saml` | 
Enterprise SSO with SAML 2.0 | + +## OIDC Configuration + +OpenID Connect (OIDC) is the recommended authentication method for enterprise deployments. + +### Basic OIDC Configuration + +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Site +metadata: + name: production + namespace: posit-team +spec: + connect: + auth: + type: "oidc" + clientId: "connect-client-id" + issuer: "https://idp.example.com" +``` + +### Required IdP Settings + +Before configuring OIDC in Team Operator, you must configure your Identity Provider: + +1. **Create an OAuth2/OIDC Application** in your IdP +2. **Configure Redirect URIs**: + - Connect: `https://connect.example.com/__login__/callback` + - Workbench: `https://workbench.example.com/oidc/callback` +3. **Note the Client ID** (provided in the spec) +4. **Generate a Client Secret** (stored in secrets) + +### Client Secret Configuration + +The client secret must be stored in your secrets provider: + +**For Kubernetes secrets:** +- Connect: `pub-client-secret` key +- Workbench: `dev-client-secret` key + +**For AWS Secrets Manager:** +- Connect: `pub-client-secret` in your vault +- Workbench: `dev-client-secret` in your vault + +### Claims Mapping + +Configure how OIDC claims map to user attributes: + +```yaml +spec: + connect: + auth: + type: "oidc" + clientId: "connect-client-id" + issuer: "https://idp.example.com" + usernameClaim: "preferred_username" # Claim for username + emailClaim: "email" # Claim for email + uniqueIdClaim: "sub" # Claim for unique identifier +``` + +**Default behavior:** +- If `emailClaim` is set but `uniqueIdClaim` is not, the email claim is used for unique ID +- Default `uniqueIdClaim` is `email` + +### Group Claim Configuration + +Enable group synchronization from your IdP: + +```yaml +spec: + connect: + auth: + type: "oidc" + clientId: "connect-client-id" + issuer: "https://idp.example.com" + groups: true # Enable group auto-provisioning + groupsClaim: "groups" # Claim containing group membership + scopes: + - 
"openid" + - "profile" + - "email" + - "groups" # Scope to request groups +``` + +**Disabling Groups Claim:** + +Some IdPs do not support a groups claim. To explicitly disable it: + +```yaml +spec: + connect: + auth: + type: "oidc" + clientId: "connect-client-id" + issuer: "https://idp.example.com" + groups: true # Still auto-provision groups + disableGroupsClaim: true # But don't try to read from token +``` + +### Custom Scopes + +Override the default OIDC scopes: + +```yaml +spec: + connect: + auth: + type: "oidc" + clientId: "connect-client-id" + issuer: "https://idp.example.com" + scopes: + - "openid" + - "profile" + - "email" + - "offline_access" +``` + +### OIDC Examples by IdP + +#### Okta + +```yaml +spec: + connect: + auth: + type: "oidc" + clientId: "0oaxxxxxxxxx" + issuer: "https://your-org.okta.com" + usernameClaim: "preferred_username" + emailClaim: "email" + groups: true + groupsClaim: "groups" + scopes: + - "openid" + - "profile" + - "email" + - "groups" +``` + +#### Azure AD / Entra ID + +```yaml +spec: + connect: + auth: + type: "oidc" + clientId: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + issuer: "https://login.microsoftonline.com/{tenant-id}/v2.0" + usernameClaim: "preferred_username" + emailClaim: "email" + uniqueIdClaim: "oid" # Azure object ID + groups: true + groupsClaim: "groups" + scopes: + - "openid" + - "profile" + - "email" +``` + +> **Note:** Azure AD requires specific application permissions to include group claims. Configure "Groups claim" in the Token configuration. 
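Whichever IdP you use, it is worth confirming that the tokens it issues actually carry the claims configured above. The sketch below builds a throwaway unsigned JWT purely so the decode step is self-contained; with a real token from your IdP, set `TOKEN` to that value and skip the first two lines.

```shell
# Build a sample unsigned JWT so the example is self-contained; the payload
# mimics the claims an IdP might send. Substitute a real token in practice.
payload='{"email":"user@example.com","groups":["connect-admins"]}'
TOKEN="$(printf '%s' '{"alg":"none"}' | base64 | tr -d '=\n').$(printf '%s' "$payload" | base64 | tr -d '=\n')."

# The claims live in the second dot-separated segment; restore base64 padding.
seg=$(printf '%s' "$TOKEN" | cut -d. -f2)
while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="${seg}="; done

# base64url uses '-' and '_' where standard base64 uses '+' and '/'.
decoded=$(printf '%s' "$seg" | tr '_-' '/+' | base64 -d)
echo "$decoded"
```

If the `groups` key is missing from the decoded payload, the problem is on the IdP side (scopes or token configuration), not in the operator spec.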
+ +#### Auth0 + +```yaml +spec: + connect: + auth: + type: "oidc" + clientId: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" + issuer: "https://your-tenant.auth0.com/" + usernameClaim: "email" + emailClaim: "email" + groups: true + groupsClaim: "https://your-namespace/groups" # Custom claim namespace + scopes: + - "openid" + - "profile" + - "email" +``` + +#### Keycloak + +```yaml +spec: + connect: + auth: + type: "oidc" + clientId: "connect" + issuer: "https://keycloak.example.com/realms/posit" + usernameClaim: "preferred_username" + emailClaim: "email" + groups: true + groupsClaim: "groups" + scopes: + - "openid" + - "profile" + - "email" + - "groups" +``` + +## SAML Configuration + +SAML 2.0 authentication is supported for enterprise environments using SAML-based IdPs. + +### Basic SAML Configuration + +```yaml +spec: + connect: + auth: + type: "saml" + samlMetadataUrl: "https://idp.example.com/saml/metadata" +``` + +> **Required:** `samlMetadataUrl` must be set for SAML authentication. + +### Attribute Profiles + +Team Operator supports two approaches for SAML attribute mapping: + +#### 1. Using IdP Attribute Profiles + +Use a predefined attribute profile that matches your IdP: + +```yaml +spec: + connect: + auth: + type: "saml" + samlMetadataUrl: "https://idp.example.com/saml/metadata" + samlIdPAttributeProfile: "azure" # Options: default, azure, etc. +``` + +Built-in profiles: +- `default` - Standard SAML attributes +- `azure` - Microsoft Azure AD attributes + +#### 2. 
Custom Attribute Mapping + +Specify individual attribute URIs for complete control: + +```yaml +spec: + connect: + auth: + type: "saml" + samlMetadataUrl: "https://idp.example.com/saml/metadata" + samlUsernameAttribute: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name" + samlFirstNameAttribute: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname" + samlLastNameAttribute: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname" + samlEmailAttribute: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress" +``` + +> **Important:** `samlIdPAttributeProfile` and individual attribute mappings are mutually exclusive. The operator will return an error if both are specified. + +### SAML Service Provider (SP) Configuration + +Your IdP needs to be configured with the following Service Provider details: + +**Connect:** +- Entity ID: `https://connect.example.com/__login__` +- ACS URL: `https://connect.example.com/__login__/callback` + +**Workbench:** +- Entity ID: `https://workbench.example.com/saml` +- ACS URL: `https://workbench.example.com/saml/acs` + +### SAML Examples by IdP + +#### Azure AD / Entra ID + +```yaml +spec: + connect: + auth: + type: "saml" + samlMetadataUrl: "https://login.microsoftonline.com/{tenant-id}/federationmetadata/2007-06/federationmetadata.xml" + samlIdPAttributeProfile: "azure" +``` + +Or with custom attributes: + +```yaml +spec: + connect: + auth: + type: "saml" + samlMetadataUrl: "https://login.microsoftonline.com/{tenant-id}/federationmetadata/2007-06/federationmetadata.xml" + samlUsernameAttribute: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/upn" + samlEmailAttribute: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress" + samlFirstNameAttribute: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname" + samlLastNameAttribute: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname" +``` + +#### Okta + +```yaml +spec: + connect: + auth: + type: "saml" + 
samlMetadataUrl: "https://your-org.okta.com/app/xxxxxxxx/sso/saml/metadata" + samlUsernameAttribute: "NameID" + samlEmailAttribute: "email" + samlFirstNameAttribute: "firstName" + samlLastNameAttribute: "lastName" +``` + +### Workbench SAML Configuration + +Workbench SAML uses the `usernameClaim` field for the username attribute: + +```yaml +spec: + workbench: + auth: + type: "saml" + samlMetadataUrl: "https://idp.example.com/saml/metadata" + usernameClaim: "email" # Maps to auth-saml-sp-attribute-username +``` + +## Password Authentication + +Password authentication is the simplest authentication method, suitable for development environments. + +### Configuration + +```yaml +spec: + connect: + auth: + type: "password" + workbench: + auth: + type: "password" +``` + +### When to Use Password Authentication + +- Development and testing environments +- Quick proof-of-concept deployments +- Environments without enterprise SSO requirements + +### Security Considerations + +- Password authentication stores credentials in the product's database +- Not recommended for production environments with security requirements +- Does not provide SSO capabilities +- User management must be done within each product + +## Role-Based Access Control + +Team Operator supports automatic role mapping based on group membership from your IdP. + +### Connect Role Mappings + +Connect supports three roles: +- **Viewer** - Can view published content +- **Publisher** - Can publish and manage content +- **Administrator** - Full administrative access + +Configure role mappings: + +```yaml +spec: + connect: + auth: + type: "oidc" + clientId: "connect-client-id" + issuer: "https://idp.example.com" + groups: true + groupsClaim: "groups" + viewerRoleMapping: + - "connect-viewers" + - "readonly-users" + publisherRoleMapping: + - "connect-publishers" + - "data-scientists" + administratorRoleMapping: + - "connect-admins" + - "platform-admins" +``` + +### How Role Mapping Works + +1. 
When a user logs in, Connect reads their group membership from the `groupsClaim` +2. The user is assigned the highest matching role: + - If any group matches `administratorRoleMapping` -> Administrator + - Else if any group matches `publisherRoleMapping` -> Publisher + - Else if any group matches `viewerRoleMapping` -> Viewer + - Else -> Default role (configured separately) + +### Role Mapping with SAML + +Role mappings work the same way with SAML authentication, provided your IdP sends group membership in the SAML assertion. + +### Workbench Role Mappings + +Workbench uses admin groups for administrative access: + +```yaml +spec: + workbench: + # Admin groups have access to the administrative dashboard + adminGroups: + - "workbench-admin" + - "platform-admins" + # Superuser groups have elevated administrative privileges + adminSuperuserGroups: + - "workbench-superusers" +``` + +### Default User Role + +Set the default role for users who don't match any role mapping: + +```yaml +spec: + connect: + config: + Authorization: + DefaultUserRole: "viewer" # Options: viewer, publisher, administrator +``` + +## Keycloak Integration + +Team Operator can deploy and manage a Keycloak instance for authentication. 
+
+### Enabling Keycloak
+
+```yaml
+apiVersion: core.posit.team/v1beta1
+kind: Site
+metadata:
+  name: production
+  namespace: posit-team
+spec:
+  keycloak:
+    enabled: true
+    image: "quay.io/keycloak/keycloak:latest"
+    imagePullPolicy: IfNotPresent
+```
+
+### Keycloak Features
+
+When enabled, Team Operator:
+- Deploys a Keycloak instance in the namespace
+- Creates a PostgreSQL database for Keycloak
+- Configures ingress routing to `key.<your-domain>`
+- Sets up necessary service accounts and RBAC
+
+### Using Keycloak with Products
+
+Configure products to use the deployed Keycloak:
+
+```yaml
+spec:
+  keycloak:
+    enabled: true
+  connect:
+    auth:
+      type: "oidc"
+      clientId: "connect"
+      issuer: "https://key.example.com/realms/posit"
+      groups: true
+      groupsClaim: "groups"
+```
+
+### Keycloak Realm Configuration
+
+After Keycloak is deployed, you'll need to:
+1. Access the Keycloak admin console at `https://key.<your-domain>`
+2. Create a realm (e.g., "posit")
+3. Create clients for each product
+4. Configure client credentials and redirect URIs
+5. Set up user federation if needed (LDAP, AD, etc.)
+
+## Secrets Management
+
+Authentication requires secrets to be properly configured in your secrets provider.
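The Workbench token values (`dev-admin-token`, `dev-user-token`) are opaque secrets that you generate yourself. One way to produce them — sketched here under the assumption that any sufficiently random string is acceptable, per the `generated-…-token` placeholders — is with `openssl`:

```shell
# Generate random values for the Workbench tokens and emit a Secret
# manifest skeleton; the remaining keys still need your IdP client secrets.
admin_token=$(openssl rand -hex 32)
user_token=$(openssl rand -hex 32)

cat > site-secrets.yaml <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: site-secrets
  namespace: posit-team
type: Opaque
stringData:
  dev-admin-token: "${admin_token}"
  dev-user-token: "${user_token}"
EOF

echo "wrote site-secrets.yaml"
```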
+ +### Kubernetes Secrets + +For `secret.type: kubernetes`, create a secret with the required keys: + +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: site-secrets + namespace: posit-team +type: Opaque +stringData: + # Connect OIDC + pub-client-secret: "your-connect-client-secret" + + # Workbench OIDC + dev-client-secret: "your-workbench-client-secret" + dev-admin-token: "generated-admin-token" + dev-user-token: "generated-user-token" +``` + +### AWS Secrets Manager + +For `secret.type: aws`, store secrets in AWS Secrets Manager: + +| Secret Key | Description | +|------------|-------------| +| `pub-client-secret` | Connect OIDC client secret | +| `dev-client-secret` | Workbench OIDC client secret | +| `dev-admin-token` | Workbench admin authentication token | +| `dev-user-token` | Workbench user authentication token | + +### Secret Structure Reference + +| Product | Auth Type | Secret Key | Purpose | +|---------|-----------|------------|---------| +| Connect | OIDC | `pub-client-secret` | OAuth2 client secret | +| Workbench | OIDC | `dev-client-secret` | OAuth2 client secret | +| Workbench | OIDC | `dev-admin-token` | Admin API token | +| Workbench | OIDC | `dev-user-token` | User API token | + +## Troubleshooting + +### Common OIDC Issues + +#### 1. "Invalid redirect URI" Error + +**Cause:** The redirect URI in the IdP doesn't match what the product sends. + +**Solution:** Verify redirect URIs are configured exactly: +- Connect: `https:///__login__/callback` +- Workbench: `https:///oidc/callback` + +#### 2. Groups Not Syncing + +**Cause:** Groups claim not configured or not included in token. + +**Debug steps:** +1. Check if `groups: true` is set +2. Verify `groupsClaim` matches what your IdP sends +3. Ensure the `groups` scope is requested +4. Check if your IdP requires special configuration for group claims + +**Enable OIDC logging for Connect:** +```yaml +spec: + connect: + debug: true # Enables OAuth2 logging +``` + +#### 3. 
User Identity Issues + +**Cause:** Claims mapping doesn't match IdP token. + +**Solution:** Verify your IdP token contains the expected claims: +```yaml +spec: + connect: + auth: + usernameClaim: "preferred_username" # Must exist in token + emailClaim: "email" # Must exist in token +``` + +### Common SAML Issues + +#### 1. "Metadata URL Not Accessible" + +**Cause:** The SAML metadata URL is unreachable from the cluster. + +**Solutions:** +- Ensure the metadata URL is accessible from pods +- Check network policies allow outbound connections +- Verify DNS resolution works + +#### 2. "IdPAttributeProfile Cannot Be Specified Together..." + +**Cause:** Both `samlIdPAttributeProfile` and individual attributes are set. + +**Solution:** Use one approach: +```yaml +# Option 1: Use profile +samlIdPAttributeProfile: "azure" + +# Option 2: Use individual attributes +samlUsernameAttribute: "..." +samlEmailAttribute: "..." +``` + +#### 3. Attribute Mapping Not Working + +**Debug steps:** +1. Check the SAML assertion from your IdP +2. Verify attribute names match exactly (case-sensitive) +3. Use full URIs for standard attributes: + ```yaml + samlUsernameAttribute: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/upn" + ``` + +### Debugging Token Claims + +To debug OIDC token claims: + +1. **Enable Debug Logging:** + ```yaml + spec: + connect: + debug: true + ``` + +2. **Check Pod Logs:** + ```bash + kubectl logs -n posit-team deploy/-connect -f + ``` + +3. **Decode JWT Tokens:** + Use [jwt.io](https://jwt.io) to inspect tokens and verify claims. + +### Group Membership Issues + +If users aren't getting the correct roles: + +1. **Verify group claim is present:** + - Check the `groupsClaim` field matches your IdP + - Some IdPs use nested claims (e.g., `realm_access.roles`) + +2. **Check group name matching:** + - Group names in role mappings must match exactly + - Group names are case-sensitive + +3. 
**Verify IdP configuration:** + - Ensure groups are included in the token + - Check token size limits (large group lists may be truncated) + +### Workbench-Specific Issues + +#### OIDC Callback URL Issues + +Workbench may include port numbers in redirect URIs. The operator sets a header to prevent this: +```yaml +X-Rstudio-Request: https:// +``` + +If you see port 443 in redirect URIs, ensure Traefik middleware is correctly applied. + +#### User Provisioning + +For Workbench with OIDC/SAML: +```yaml +spec: + workbench: + createUsersAutomatically: true # Create system users on first login +``` + +## Complete Example + +A complete Site configuration with OIDC authentication: + +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Site +metadata: + name: production + namespace: posit-team +spec: + domain: posit.example.com + + secret: + type: "kubernetes" + vaultName: "production-secrets" + + connect: + image: ghcr.io/rstudio/rstudio-connect:ubuntu22-2024.10.0 + auth: + type: "oidc" + clientId: "connect-production" + issuer: "https://login.microsoftonline.com/tenant-id/v2.0" + usernameClaim: "preferred_username" + emailClaim: "email" + uniqueIdClaim: "oid" + groups: true + groupsClaim: "groups" + scopes: + - "openid" + - "profile" + - "email" + viewerRoleMapping: + - "Connect-Viewers" + publisherRoleMapping: + - "Connect-Publishers" + - "Data-Scientists" + administratorRoleMapping: + - "Connect-Admins" + + workbench: + image: ghcr.io/rstudio/rstudio-workbench:jammy-2024.12.0 + createUsersAutomatically: true + auth: + type: "oidc" + clientId: "workbench-production" + issuer: "https://login.microsoftonline.com/tenant-id/v2.0" + usernameClaim: "preferred_username" + scopes: + - "openid" + - "profile" + - "email" + adminGroups: + - "Workbench-Admins" + adminSuperuserGroups: + - "Platform-Admins" +``` + +## Related Documentation + +- [Product Team Site Management](./product-team-site-management.md) - Complete Site configuration guide +- [Posit Connect Admin 
Guide](https://docs.posit.co/connect/admin/) - Connect authentication documentation +- [Posit Workbench Admin Guide](https://docs.posit.co/ide/server-pro/admin/) - Workbench authentication documentation diff --git a/docs/guides/connect-configuration.md b/docs/guides/connect-configuration.md new file mode 100644 index 00000000..bbef8197 --- /dev/null +++ b/docs/guides/connect-configuration.md @@ -0,0 +1,1153 @@ +# Connect Configuration Guide + +This comprehensive guide covers all configuration options for Posit Connect when deployed via Team Operator. + +## Table of Contents + +1. [Overview](#overview) +2. [Basic Configuration](#basic-configuration) +3. [Authentication Configuration](#authentication-configuration) +4. [Database Configuration](#database-configuration) +5. [Off-Host Execution / Kubernetes Launcher](#off-host-execution--kubernetes-launcher) +6. [GPU Support](#gpu-support) +7. [Content Execution Settings](#content-execution-settings) +8. [Chronicle Integration](#chronicle-integration) +9. [SMTP/Email Configuration](#smtpemail-configuration-experimental) +10. [Data Integrations](#data-integrations) +11. [Experimental Features](#experimental-features) +12. [Example Configurations](#example-configurations) +13. [Troubleshooting](#troubleshooting) + +--- + +## Overview + +Posit Connect is a publishing and sharing platform that allows data scientists to share their work with stakeholders. When deployed via Team Operator, Connect runs with off-host execution enabled by default, meaning content executes in isolated Kubernetes Jobs rather than on the Connect server itself. 
+ +### Architecture in Team Operator + +``` +Site CR + | + +-> Connect CR (generated by Site controller) + | + +-> Deployment (Connect server) + | +-> Chronicle sidecar (telemetry) + | + +-> Service (ClusterIP) + +-> Ingress (external access) + +-> ConfigMap (rstudio-connect.gcfg, runtime.yaml) + +-> PVC (data storage) + +-> ServiceAccount (for Kubernetes launcher) + +-> RBAC (roles/rolebindings for session jobs) + +-> Session Jobs (content execution - on demand) +``` + +### Configuration Flow + +Configuration for Connect flows through two paths: + +1. **Site-level configuration** (`spec.connect` in Site CR) - Recommended for most deployments +2. **Direct Connect CR configuration** - For advanced use cases + +When using a Site resource, the Site controller generates and manages the Connect CR. Changes to `site.spec.connect` automatically propagate to the Connect deployment. + +--- + +## Basic Configuration + +### Image Configuration + +```yaml +spec: + connect: + # Container image for Connect server + image: "ghcr.io/posit-dev/connect:ubuntu22-2024.10.0" + + # Image pull policy: Always, IfNotPresent, Never + imagePullPolicy: IfNotPresent + + # Session image for content execution (init container) + sessionImage: "ghcr.io/rstudio/rstudio-connect-content-init:ubuntu2204-2024.06.0" +``` + +**Important:** The `sessionImage` is used as an init container in content execution jobs. It prepares the runtime environment before content runs. + +### Resource Scaling + +```yaml +spec: + connect: + # Number of Connect server replicas + replicas: 2 +``` + +The operator automatically creates a PodDisruptionBudget based on replica count to ensure availability during updates. 
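To illustrate how a disruption budget relates to replica count, the sketch below derives a PDB spec that tolerates one voluntary disruption. This is an assumption for illustration only — the operator computes its own values, which may differ:

```python
def sketch_pdb(replicas: int) -> dict:
    """Hypothetical PodDisruptionBudget spec derived from a replica count.

    ASSUMPTION: minAvailable = replicas - 1 is illustrative, not the
    operator's actual policy.
    """
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "spec": {"minAvailable": max(replicas - 1, 0)},
    }

# With two replicas, one pod must always remain available.
print(sketch_pdb(2)["spec"]["minAvailable"])
```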
+ +### Domain and Ingress + +```yaml +spec: + domain: example.mycompany.com + + connect: + # URL subdomain prefix (default: "connect") + domainPrefix: connect # Results in connect.example.mycompany.com + + # Ingress class for routing + ingressClass: traefik + + # Ingress annotations (applied to Connect ingress) + ingressAnnotations: + traefik.ingress.kubernetes.io/router.middlewares: kube-system-forward-auth@kubernetescrd +``` + +### Node Selection + +```yaml +spec: + connect: + nodeSelector: + node-type: posit-products + workload: connect +``` + +### Additional Environment Variables + +```yaml +spec: + connect: + addEnv: + CONNECT_CUSTOM_VAR: "custom-value" + HTTP_PROXY: "http://proxy.example.com:8080" + HTTPS_PROXY: "http://proxy.example.com:8080" + NO_PROXY: "localhost,127.0.0.1,.cluster.local" +``` + +### Volume Configuration + +Connect requires persistent storage for its data directory (`/var/lib/rstudio-connect`): + +```yaml +spec: + connect: + volume: + # Create a new PVC for Connect + create: true + + # Access modes for the PVC + accessModes: + - ReadWriteMany + + # Storage size + size: "10Gi" + + # Storage class name + storageClassName: "efs-sc" + + # Reference existing PV (optional) + volumeName: "existing-pv-name" +``` + +For existing PVCs: + +```yaml +spec: + connect: + volume: + create: false + pvcName: "existing-connect-pvc" +``` + +### License Configuration + +```yaml +spec: + connect: + license: + # License type: FILE or KEY + type: FILE + + # Reference to existing Kubernetes secret containing license + existingSecretName: "posit-licenses" + existingSecretKey: "connect.lic" +``` + +For inline license key (not recommended for production): + +```yaml +spec: + connect: + license: + type: KEY + key: "XXXX-XXXX-XXXX-XXXX-XXXX-XXXX-XXXX" +``` + +### Warning Messages + +Display warning messages to users on the Connect dashboard: + +```yaml +spec: + connect: + # Warning shown to logged-in users + loggedInWarning: "This is a development environment. 
Data may be deleted." + + # Warning shown to all visitors (including anonymous) + publicWarning: "For authorized use only." +``` + +--- + +## Authentication Configuration + +Connect supports multiple authentication providers. Authentication is configured through the `auth` section. + +### OIDC Authentication (Recommended) + +```yaml +spec: + connect: + auth: + type: "oidc" + + # Required: OAuth2 client ID from your IdP + clientId: "connect-client-id" + + # Required: OpenID Connect issuer URL + issuer: "https://idp.example.com/realms/posit" + + # Enable group synchronization + groups: true + + # Custom OAuth scopes (optional) + scopes: + - "openid" + - "profile" + - "email" + - "groups" + + # Claim mappings (optional - override defaults) + usernameClaim: "preferred_username" + emailClaim: "email" + uniqueIdClaim: "sub" + groupsClaim: "groups" + + # Disable groups claim entirely (when IdP doesn't support it) + disableGroupsClaim: false +``` + +**Client Secret:** The OIDC client secret must be stored in your secret backend: +- For Kubernetes secrets: Key `pub-client-secret` in the secret specified by `spec.secret.vaultName` +- For AWS Secrets Manager: Key `pub-client-secret` in the vault + +### SAML Authentication + +```yaml +spec: + connect: + auth: + type: "saml" + + # Required: IdP metadata URL + samlMetadataUrl: "https://idp.example.com/metadata" + + # Option 1: Use a predefined profile + samlIdPAttributeProfile: "azure" # or "okta", "default" + + # Option 2: Custom attribute mappings (mutually exclusive with profile) + # samlUsernameAttribute: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name" + # samlFirstNameAttribute: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname" + # samlLastNameAttribute: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname" + # samlEmailAttribute: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress" +``` + +**Note:** `samlIdPAttributeProfile` and individual SAML attribute mappings are 
mutually exclusive. The controller will reject configurations that specify both. + +### Password Authentication + +For development or isolated environments: + +```yaml +spec: + connect: + auth: + type: "password" +``` + +### Role Mappings + +Map IdP groups to Connect roles (works with both OIDC and SAML when groups are enabled): + +```yaml +spec: + connect: + auth: + type: "oidc" + groups: true + + # Map groups to Connect roles + viewerRoleMapping: + - "connect-viewers" + - "data-consumers" + + publisherRoleMapping: + - "connect-publishers" + - "data-scientists" + + administratorRoleMapping: + - "connect-admins" + - "platform-admins" +``` + +--- + +## Database Configuration + +Connect requires PostgreSQL for storing application metadata, user information, and content settings. + +### Basic Database Settings + +Database connection is automatically configured based on your Site's database credentials secret. You can customize the schema names: + +```yaml +spec: + connect: + databaseSettings: + # Main database schema (default: "connect") + schema: "connect" + + # Instrumentation/metrics schema (default: "instrumentation") + instrumentationSchema: "connect_instrumentation" +``` + +### Database URL Construction + +The operator constructs database URLs automatically: + +``` +postgresql://user:password@host/connect-db?search_path=connect&sslmode=require +postgresql://user:password@host/connect-db?search_path=connect_instrumentation&sslmode=require +``` + +The password is injected via environment variable `CONNECT_POSTGRES_PASSWORD` from your secret backend. 
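The documented URL format can be reproduced with a small helper, which is handy for sanity-checking schema names locally. This is a sketch of the format shown above only — the function name and parameters are illustrative, and the operator assembles these URLs internally:

```python
from urllib.parse import quote

def connect_db_url(user, password, host, database, schema):
    """Build a URL in the documented format:
    postgresql://user:password@host/db?search_path=<schema>&sslmode=require

    Credentials are percent-encoded so special characters survive.
    """
    return (
        f"postgresql://{quote(user)}:{quote(password)}@{host}/{database}"
        f"?search_path={schema}&sslmode=require"
    )

print(connect_db_url("user", "p@ss", "db.internal", "connect-db", "connect"))
```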
+ +### Secret Backend Configuration + +```yaml +spec: + # For Kubernetes secrets + secret: + type: "kubernetes" + vaultName: "site-secrets" # Must contain key: pub-db-password + + # For AWS Secrets Manager + # secret: + # type: "aws" + # vaultName: "production-site-secrets" + + # Database credentials (separate from site secrets) + mainDatabaseCredentialSecret: + type: "aws" + vaultName: "rds!db-production-id" +``` + +--- + +## Off-Host Execution / Kubernetes Launcher + +Off-host execution is **enabled by default** when Connect is deployed via Team Operator. Content runs in isolated Kubernetes Jobs rather than on the Connect server. + +### How It Works + +1. User requests content (Shiny app, API, report) +2. Connect server creates a Kubernetes Job +3. Job runs with appropriate runtime image (R, Python, Quarto) +4. Content executes in the Job pod +5. Job terminates when content completes or times out + +### Session Configuration + +Customize session pods via `sessionConfig`: + +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Connect +metadata: + name: example +spec: + sessionConfig: + # Service configuration for session networking + service: + type: ClusterIP + labels: + custom-label: value + + # Pod configuration for session jobs + pod: + annotations: + prometheus.io/scrape: "true" + + labels: + session-type: connect-content + + # Service account for session pods + serviceAccountName: "connect-session" + + # Image pull settings + imagePullPolicy: Always + imagePullSecrets: + - name: ghcr-secret + + # Environment variables for sessions + env: + - name: CUSTOM_VAR + value: "session-value" + - name: SECRET_VAR + valueFrom: + secretKeyRef: + name: session-secrets + key: api-key + + # Additional volumes for session pods + volumes: + - name: shared-data + persistentVolumeClaim: + claimName: shared-data-pvc + + # Volume mounts for session containers + volumeMounts: + - name: shared-data + mountPath: /mnt/shared + + # Node selection for sessions + nodeSelector: + 
workload: connect-sessions + + # Tolerations for session pods + tolerations: + - key: "dedicated" + operator: "Equal" + value: "connect-sessions" + effect: "NoSchedule" + + # Affinity rules + affinity: + nodeAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + nodeSelectorTerms: + - matchExpressions: + - key: workload + operator: In + values: + - connect-sessions + + # Priority class for session scheduling + priorityClassName: "connect-sessions" + + # Init containers (in addition to runtime init) + initContainers: + - name: setup-cache + image: busybox + command: ["sh", "-c", "mkdir -p /cache && chmod 777 /cache"] + + # Sidecar containers + extraContainers: + - name: log-shipper + image: fluent/fluent-bit + volumeMounts: + - name: logs + mountPath: /var/log + + # Job-level configuration + job: + labels: + job-type: connect-content +``` + +### Session Images (Runtime Configuration) + +Connect uses runtime images for content execution. The default runtime.yaml includes two image configurations: + +```yaml +# Default runtime images (auto-generated) +name: Kubernetes +images: + - name: ghcr.io/rstudio/content-pro:r4.4.1-py3.12.4-ubuntu2204 + python: + installations: + - path: /opt/python/3.12.4/bin/python3 + version: "3.12.4" + r: + installations: + - path: /opt/R/4.4.1/bin/R + version: "4.4.1" + quarto: + installations: + - path: /opt/quarto/1.4.557/bin/quarto + version: "1.4.557" + + - name: ghcr.io/rstudio/content-pro:r4.2.2-py3.11.3-ubuntu2204 + python: + installations: + - path: /opt/python/3.11.3/bin/python3 + version: "3.11.3" + r: + installations: + - path: /opt/R/4.2.2/bin/R + version: "4.2.2" + quarto: + installations: + - path: /opt/quarto/1.3.340/bin/quarto + version: "1.3.340" +``` + +**Custom Runtime Images:** To add custom runtime images, you need to modify the Connect CR directly (advanced use case) or use the Connect admin interface after deployment. 
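When editing runtime image entries by hand, a simple structural check can catch mistakes before they reach the cluster. The sketch below mirrors the runtime.yaml structure shown above; the `validate_runtime_image` helper is hypothetical, and Connect performs its own validation:

```python
def validate_runtime_image(image: dict) -> list:
    """Check a runtime.yaml image entry (as a parsed dict) for common mistakes."""
    errors = []
    if "name" not in image:
        errors.append("image entry is missing 'name'")
    runtimes = [k for k in ("python", "r", "quarto") if k in image]
    if not runtimes:
        errors.append("image declares no runtimes (python/r/quarto)")
    for rt in runtimes:
        for inst in image[rt].get("installations", []):
            if "path" not in inst or "version" not in inst:
                errors.append(f"{rt} installation missing 'path' or 'version'")
    return errors

entry = {
    "name": "ghcr.io/rstudio/content-pro:r4.4.1-py3.12.4-ubuntu2204",
    "python": {"installations": [{"path": "/opt/python/3.12.4/bin/python3",
                                  "version": "3.12.4"}]},
}
print(validate_runtime_image(entry))  # a well-formed entry yields no errors
```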
+ +### Additional Volumes for Sessions + +Mount additional volumes into session pods: + +```yaml +spec: + connect: + # Site-level additional volumes (applied to main Connect and sessions) + additionalVolumes: + - pvcName: shared-datasets + mountPath: /mnt/datasets + readOnly: true + create: false # Use existing PVC + + - pvcName: project-data + mountPath: /mnt/projects + readOnly: false + create: true # Create new PVC + size: "100Gi" + storageClassName: "efs-sc" + accessModes: + - ReadWriteMany +``` + +--- + +## GPU Support + +Connect supports both NVIDIA and AMD GPUs for content execution. + +### NVIDIA GPU Configuration + +```yaml +spec: + connect: + gpuSettings: + # Default GPU limit for new content + nvidiaGPULimit: 1 + + # Maximum GPU limit users can request + maxNvidiaGPULimit: 4 +``` + +This configures the Connect scheduler to: +- Allow content to request NVIDIA GPUs +- Set default allocation to 1 GPU +- Cap maximum allocation at 4 GPUs + +### AMD GPU Configuration + +```yaml +spec: + connect: + gpuSettings: + # Default AMD GPU limit + amdGPULimit: 1 + + # Maximum AMD GPU limit + maxAMDGPULimit: 2 +``` + +### Combined Configuration + +```yaml +spec: + connect: + gpuSettings: + nvidiaGPULimit: 1 + maxNvidiaGPULimit: 8 + amdGPULimit: 0 + maxAMDGPULimit: 4 +``` + +### Prerequisites for GPU Support + +1. **GPU Nodes:** Your cluster must have nodes with GPUs and appropriate drivers +2. **Device Plugin:** NVIDIA or AMD device plugin must be installed +3. **GPU Images:** Content runtime images must include GPU libraries (CUDA, ROCm) +4. 
**Node Selection:** Configure session tolerations/affinity to schedule on GPU nodes + +Example session configuration for GPU nodes: + +```yaml +spec: + sessionConfig: + pod: + tolerations: + - key: "nvidia.com/gpu" + operator: "Exists" + effect: "NoSchedule" + nodeSelector: + accelerator: nvidia-gpu +``` + +--- + +## Content Execution Settings + +### Scheduler Configuration + +Control resource limits for content execution: + +```yaml +spec: + connect: + # Schedule concurrency - max concurrent scheduled jobs + scheduleConcurrency: 2 # Default: 2 +``` + +The default scheduler configuration includes: + +| Setting | Default Value | Description | +|---------|---------------|-------------| +| MaxCPURequest | 4 | Maximum CPU cores content can request | +| MaxCPULimit | 4 | Maximum CPU cores content can use | +| MaxMemoryRequest | 8GB | Maximum memory content can request | +| MaxMemoryLimit | 8GB | Maximum memory content can use | + +These are set internally by the operator and can be adjusted via Connect admin settings after deployment. + +### Application Settings + +The operator configures sensible defaults for applications: + +```yaml +# Internal configuration (for reference) +Applications: + BundleRetentionLimit: 2 # Keep last 2 bundle versions + PythonEnvironmentReaping: true # Clean up unused Python envs + OAuthIntegrationsEnabled: true # Allow OAuth integrations in content + ScheduleConcurrency: 2 # From scheduleConcurrency setting +``` + +### Authorization Defaults + +```yaml +# Internal authorization defaults +Authorization: + DefaultUserRole: publisher # New users get publisher role + PublishersCanManageVanities: true # Publishers can set custom URLs + ViewersCanOnlySeeThemselves: false # Viewers can see other users +``` + +--- + +## Chronicle Integration + +Chronicle provides telemetry and metrics collection for Connect. When configured, a Chronicle agent sidecar is automatically injected into the Connect deployment. 
+ +### Enabling Chronicle + +Chronicle is enabled when both `chronicle.image` and `chronicle.agentImage` are set at the Site level: + +```yaml +spec: + chronicle: + # Main Chronicle server image + image: "ghcr.io/posit-dev/chronicle:2024.11.0" + + # Chronicle agent image (injected into products) + agentImage: "ghcr.io/posit-dev/chronicle-agent:latest" + + # S3 bucket for telemetry storage + s3Bucket: "my-chronicle-bucket" +``` + +### Chronicle Sidecar Behavior + +The Chronicle agent sidecar: +- Collects metrics from Connect's Prometheus endpoint (`:3232/metrics`) +- Sends telemetry to the Chronicle server +- Runs alongside the main Connect container + +### Product API Key (Experimental) + +For authenticated telemetry: + +```yaml +spec: + connect: + experimentalFeatures: + chronicleSidecarProductApiKeyEnabled: true +``` + +This requires a secret `pub-chronicle-api-key` in your secret backend. + +--- + +## SMTP/Email Configuration (Experimental) + +Enable email notifications from Connect for scheduled reports and user notifications. + +```yaml +spec: + connect: + experimentalFeatures: + # Sender email address (enables SMTP) + mailSender: "connect@example.com" + + # Display name in email subject + mailDisplayName: "Posit Connect" + + # Target email for testing (optional - routes all email here) + mailTarget: "test@example.com" +``` + +### SMTP Secrets + +SMTP credentials must be stored in your secret backend with these keys: + +| Key | Description | +|-----|-------------| +| `pub-smtp-host` | SMTP server hostname | +| `pub-smtp-port` | SMTP server port | +| `pub-smtp-user` | SMTP username | +| `pub-smtp-password` | SMTP password | + +For AWS Secrets Manager, these keys are automatically mapped to the `-connect-smtp` Kubernetes secret. 
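The required keys from the table above can be verified with a short check before enabling SMTP. The helper name here is illustrative:

```python
# SMTP secret keys required by Connect, per the table above.
REQUIRED_SMTP_KEYS = {"pub-smtp-host", "pub-smtp-port",
                      "pub-smtp-user", "pub-smtp-password"}

def missing_smtp_keys(secret_data: dict) -> set:
    """Return the documented SMTP keys absent from a secret's data."""
    return REQUIRED_SMTP_KEYS - secret_data.keys()

# This secret is missing user and password entries:
print(missing_smtp_keys({"pub-smtp-host": "smtp.example.com",
                         "pub-smtp-port": "587"}))
```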
+ +### Email Configuration Notes + +- SMTP is considered **experimental** in Team Operator +- For production email, consider using Connect's built-in email configuration after deployment +- The `mailTarget` setting is useful for testing to prevent accidental emails to real users + +--- + +## Data Integrations + +### Databricks Integration + +Configure Databricks connectivity for content: + +```yaml +spec: + connect: + databricks: + # Human-readable name + name: "Production Databricks" + + # Workspace URL + url: "https://example-workspace.cloud.databricks.com" + + # OAuth client ID for service principal + clientId: "databricks-client-id" + + # Azure tenant ID (for Azure Databricks) + tenantId: "azure-tenant-id" +``` + +### DSN/ODBC Secrets + +Mount ODBC configuration files into Connect sessions for database connectivity: + +```yaml +spec: + connect: + experimentalFeatures: + # Reference to DSN secret key in your secret backend + dsnSecret: "connect-odbc-config" +``` + +The DSN secret should contain an `odbc.ini` file content: + +```ini +[MyDatabase] +Driver = PostgreSQL +Server = db.example.com +Port = 5432 +Database = analytics +``` + +This mounts to `/etc/odbc.ini` in session pods. 
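The `odbc.ini` content shown above is standard INI syntax, so Python's stdlib `configparser` can parse it — useful for verifying a DSN secret's contents before storing it. This sketch is for local validation only; Connect sessions read the mounted file through the ODBC driver manager:

```python
import configparser

# The same odbc.ini content shown above.
odbc_ini = """\
[MyDatabase]
Driver = PostgreSQL
Server = db.example.com
Port = 5432
Database = analytics
"""

parser = configparser.ConfigParser()
parser.read_string(odbc_ini)

# DSN sections are case-sensitive; option names are not.
dsn = parser["MyDatabase"]
print(dsn["Server"], dsn["Port"])
```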
+ +--- + +## Experimental Features + +All experimental features are configured under `experimentalFeatures`: + +```yaml +spec: + connect: + experimentalFeatures: + # SMTP configuration + mailSender: "connect@example.com" + mailDisplayName: "Posit Connect" + mailTarget: "" + + # DSN/ODBC configuration + dsnSecret: "odbc-config-key" + + # Session configuration + sessionEnvVars: + - name: CUSTOM_SESSION_VAR + value: "value" + - name: SECRET_VAR + valueFrom: + secretKeyRef: + name: session-secrets + key: api-key + + sessionImagePullPolicy: Always + + # Override default session service account + sessionServiceAccountName: "custom-connect-session" + + # Chronicle product API key + chronicleSidecarProductApiKeyEnabled: false +``` + +### Session Environment Variables + +Inject custom environment variables into all content execution sessions: + +```yaml +spec: + connect: + experimentalFeatures: + sessionEnvVars: + # Simple value + - name: ENVIRONMENT + value: "production" + + # From ConfigMap + - name: CONFIG_VALUE + valueFrom: + configMapKeyRef: + name: connect-config + key: config-value + + # From Secret + - name: API_KEY + valueFrom: + secretKeyRef: + name: connect-secrets + key: api-key + + # AWS Secrets Manager reference (with secret:// prefix) + - name: DB_PASSWORD + value: "secret://db-password-key" +``` + +**Note:** The `secret://` prefix is special - it triggers the operator to create CSI secret mounts for AWS Secrets Manager values. 
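The `secret://` convention can be illustrated with a sketch that separates plain values from secret references. The helper below is hypothetical — the operator's real handling creates CSI secret mounts rather than returning a dict:

```python
SECRET_PREFIX = "secret://"

def split_session_env(env_vars: list) -> tuple:
    """Separate plain env var entries from secret:// references.

    Entries without a literal "value" (e.g. valueFrom) pass through as plain.
    """
    plain, secret_refs = [], {}
    for var in env_vars:
        value = var.get("value", "")
        if value.startswith(SECRET_PREFIX):
            # Strip the prefix to recover the secret key name.
            secret_refs[var["name"]] = value[len(SECRET_PREFIX):]
        else:
            plain.append(var)
    return plain, secret_refs

env = [
    {"name": "ENVIRONMENT", "value": "production"},
    {"name": "DB_PASSWORD", "value": "secret://db-password-key"},
]
plain, refs = split_session_env(env)
print(refs)  # {'DB_PASSWORD': 'db-password-key'}
```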
+ +--- + +## Example Configurations + +### Minimal Development Configuration + +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Site +metadata: + name: dev + namespace: posit-team +spec: + domain: dev.example.com + + secret: + type: "kubernetes" + vaultName: "dev-secrets" + + mainDatabaseCredentialSecret: + type: "kubernetes" + vaultName: "dev-db-creds" + + connect: + image: ghcr.io/posit-dev/connect:ubuntu22-2024.10.0 + replicas: 1 + + license: + type: FILE + existingSecretName: posit-licenses + existingSecretKey: connect.lic + + auth: + type: "password" +``` + +### Production Configuration with OIDC + +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Site +metadata: + name: production + namespace: posit-team +spec: + domain: posit.example.com + awsAccountId: "123456789012" + + secret: + type: "aws" + vaultName: "production-posit-secrets" + + mainDatabaseCredentialSecret: + type: "aws" + vaultName: "rds!db-production-id" + + ingressClass: traefik + + volumeSource: + type: fsx-zfs + volumeId: fsvol-abcdef123456 + dnsName: fs-abcdef123456.fsx.us-east-1.amazonaws.com + + connect: + image: ghcr.io/posit-dev/connect:ubuntu22-2024.10.0 + replicas: 3 + + license: + type: FILE + existingSecretName: posit-licenses + existingSecretKey: connect.lic + + auth: + type: "oidc" + clientId: "connect-production" + issuer: "https://idp.example.com" + groups: true + viewerRoleMapping: + - "connect-viewers" + publisherRoleMapping: + - "connect-publishers" + administratorRoleMapping: + - "connect-admins" + + nodeSelector: + workload: posit-products + + databaseSettings: + schema: "connect" + instrumentationSchema: "connect_metrics" + + gpuSettings: + nvidiaGPULimit: 1 + maxNvidiaGPULimit: 4 + + scheduleConcurrency: 4 + + databricks: + name: "Production Workspace" + url: "https://workspace.cloud.databricks.com" + clientId: "databricks-oauth-client" + + experimentalFeatures: + mailSender: "connect@example.com" + mailDisplayName: "Posit Connect Production" +``` + +### GPU-Enabled 
Configuration + +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Site +metadata: + name: gpu-workloads + namespace: posit-team +spec: + domain: gpu.example.com + + secret: + type: "kubernetes" + vaultName: "gpu-site-secrets" + + mainDatabaseCredentialSecret: + type: "kubernetes" + vaultName: "gpu-db-creds" + + connect: + image: ghcr.io/posit-dev/connect:ubuntu22-2024.10.0 + replicas: 2 + sessionImage: ghcr.io/rstudio/rstudio-connect-content-init:gpu-ubuntu2204-2024.06.0 + + license: + type: FILE + existingSecretName: posit-licenses + existingSecretKey: connect.lic + + auth: + type: "oidc" + clientId: "connect-gpu" + issuer: "https://idp.example.com" + + gpuSettings: + nvidiaGPULimit: 1 + maxNvidiaGPULimit: 8 + + experimentalFeatures: + sessionEnvVars: + - name: CUDA_VISIBLE_DEVICES + value: "all" +``` + +### Multi-Schema Database Configuration + +```yaml +spec: + connect: + databaseSettings: + # Separate schemas for different concerns + schema: "connect_main" + instrumentationSchema: "connect_telemetry" +``` + +--- + +## Troubleshooting + +### Connect Pod Not Starting + +1. **Check pod status:** + ```bash + kubectl get pods -n posit-team -l app.kubernetes.io/name=connect + kubectl describe pod -n posit-team + ``` + +2. **Check logs:** + ```bash + kubectl logs -n posit-team deploy/-connect -c connect + kubectl logs -n posit-team deploy/-connect -c chronicle + ``` + +3. **Common issues:** + - License not found: Check license secret exists and key is correct + - Database connection failed: Verify database credentials and connectivity + - Volume mount failed: Ensure PVC exists and storage class is available + +### Content Sessions Not Running + +1. **Check session jobs:** + ```bash + kubectl get jobs -n posit-team -l posit.team/component=connect-session + ``` + +2. **Check session pod logs:** + ```bash + kubectl logs -n posit-team job/ + ``` + +3. 
**Common issues:** + - Init container failed: Check session image is accessible + - Runtime not found: Verify runtime.yaml configuration + - Resource limits exceeded: Check scheduler limits + +### Authentication Failures + +1. **For OIDC:** + - Verify client ID and issuer URL + - Check client secret is in the correct secret backend + - Ensure redirect URIs are configured in your IdP + - Enable debug logging: `spec.debug: true` + +2. **For SAML:** + - Verify metadata URL is accessible + - Check attribute mappings match your IdP + - Review Connect logs for SAML assertion errors + +### Database Connection Issues + +1. **Verify database secret:** + ```bash + # For Kubernetes secrets + kubectl get secret -n posit-team -o yaml + + # Check the pub-db-password key exists + ``` + +2. **Test database connectivity:** + ```bash + kubectl exec -it -n posit-team -- \ + psql "postgresql://user@host/db?sslmode=require" + ``` + +3. **Check database URL in Connect logs:** + Look for connection string format errors + +### Chronicle Sidecar Issues + +1. **Verify Chronicle is configured:** + ```bash + kubectl get pods -n posit-team -l app.kubernetes.io/name=connect -o yaml | grep chronicle + ``` + +2. **Check Chronicle agent logs:** + ```bash + kubectl logs -n posit-team deploy/-connect -c chronicle + ``` + +3. **Common issues:** + - Missing `agentImage`: Chronicle sidecar won't be created + - Network policy blocking: Ensure Chronicle server is reachable + +### GPU Sessions Not Scheduling + +1. **Check GPU availability:** + ```bash + kubectl describe nodes | grep -A5 "nvidia.com/gpu" + ``` + +2. **Verify device plugin:** + ```bash + kubectl get pods -n kube-system | grep nvidia + ``` + +3. 
**Check session tolerations match GPU node taints**

### Debug Mode

Enable comprehensive debug logging:

```yaml
spec:
  debug: true  # Site-level debug

  connect:
    # This is set automatically when site.spec.debug is true
```

Debug mode enables:
- ServiceLogLevel: DEBUG
- OAuth2.Logging: true
- TableauIntegration.Logging: true
- ProxyHeaderLogging: true

### Viewing Generated Configuration

To see the actual Connect configuration:

```bash
kubectl get configmap <site-name>-connect -n posit-team -o yaml
```

This shows:
- `rstudio-connect.gcfg`: Main Connect configuration
- `runtime.yaml`: Runtime/session image configuration
- `launcher.kubernetes.profiles.conf`: Launcher profiles

---

## Related Documentation

- [Site Management Guide](product-team-site-management.md) - Overall Site configuration
- [Adding Config Options](adding-config-options.md) - For contributors extending Connect configuration
- [Posit Connect Admin Guide](https://docs.posit.co/connect/admin/) - Official Connect documentation
diff --git a/docs/guides/packagemanager-configuration.md b/docs/guides/packagemanager-configuration.md
new file mode 100644
index 00000000..cd80f40a
--- /dev/null
+++ b/docs/guides/packagemanager-configuration.md
@@ -0,0 +1,844 @@
+# Package Manager Configuration Guide
+
+This guide provides comprehensive documentation for configuring Posit Package Manager within the Team Operator framework.
+
+## Overview
+
+Posit Package Manager (PPM) is a repository management server that provides R and Python packages from CRAN, Bioconductor, and PyPI, as well as internal packages built from Git repositories. In Team Operator, Package Manager is deployed as a child resource of a Site.
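+
+Once deployed, clients consume these repositories through the Package Manager URL on the site domain. As an illustrative sketch only (the `/pypi/latest/simple` path follows the conventional Package Manager repository layout and is an assumption, not operator output):
+
+```ini
+# ~/.pip/pip.conf -- point pip at the site's Package Manager
+# (repository path is an assumption; adjust to your configured repos)
+[global]
+index-url = https://packagemanager.example.mycompany.com/pypi/latest/simple
+```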
+ +### Architecture + +``` +Site CR + └── PackageManager CR (created automatically) + ├── Deployment (rspm container) + ├── Service (ClusterIP) + ├── Ingress + ├── ConfigMap (rstudio-pm.gcfg) + ├── ServiceAccount + ├── PersistentVolumeClaim (optional) + ├── SecretProviderClass (for AWS secrets) + └── PodDisruptionBudget +``` + +When you configure Package Manager in a Site spec, the Site controller creates a `PackageManager` Custom Resource. The PackageManager controller then reconciles all the Kubernetes resources needed to run the service. + +## Basic Configuration + +### Minimal Configuration + +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Site +metadata: + name: example + namespace: posit-team +spec: + domain: example.mycompany.com + packageManager: + image: ghcr.io/posit-dev/package-manager:jammy-2024.08.0 + license: + type: FILE + existingSecretName: license + existingSecretKey: ppm.lic +``` + +### Full Configuration Reference + +```yaml +spec: + packageManager: + # Container image (required) + image: ghcr.io/posit-dev/package-manager:jammy-2024.08.0 + + # Image pull policy: Always, IfNotPresent, Never + imagePullPolicy: IfNotPresent + + # Number of replicas (default: 1) + replicas: 2 + + # URL subdomain prefix (default: packagemanager) + domainPrefix: packagemanager + + # License configuration (required) + license: + type: FILE # FILE or KEY + existingSecretName: license # Name of existing K8s secret + existingSecretKey: ppm.lic # Key within the secret + + # Volume for local package cache + volume: + create: true + size: 10Gi + storageClassName: gp3 + accessModes: + - ReadWriteOnce + + # S3 storage backend (recommended for production) + s3Bucket: my-package-manager-bucket + + # Azure Files storage backend (alternative to S3) + azureFiles: + storageClassName: azure-file + shareSizeGiB: 100 + + # Git SSH keys for private repository access + gitSSHKeys: + - name: github + host: github.com + secretRef: + source: aws-secrets-manager + name: github + - name: 
gitlab + host: gitlab.company.com + secretRef: + source: kubernetes + name: gitlab-ssh + key: private-key + + # Node placement + nodeSelector: + node-type: posit-products + + # Additional environment variables + addEnv: + RSPM_ADDRESS: "https://packagemanager.example.com" +``` + +## License Configuration + +Package Manager requires a valid license. The license can be provided in two ways: + +### File License (Recommended) + +Store your license file in a Kubernetes secret: + +```bash +kubectl create secret generic license \ + --from-file=ppm.lic=/path/to/license.lic \ + -n posit-team +``` + +Then reference it in the Site spec: + +```yaml +spec: + packageManager: + license: + type: FILE + existingSecretName: license + existingSecretKey: ppm.lic +``` + +### Key License + +For license keys (activation keys): + +```yaml +spec: + packageManager: + license: + type: KEY + key: "XXXX-XXXX-XXXX-XXXX" +``` + +### AWS Secrets Manager License + +When using AWS secret management, the license is stored in the site's secret vault: + +```yaml +spec: + secret: + type: aws + vaultName: "my-site-secrets.posit.team" + packageManager: + license: + type: FILE +``` + +The license is expected in the vault under the key `pkg-license`. + +## Database Configuration + +Package Manager uses PostgreSQL for storing metadata and usage metrics. 
The operator automatically provisions two database schemas: + +| Schema | Purpose | +|--------|---------| +| `pm` | Main application data | +| `metrics` | Usage data and analytics | + +### Database Connection + +Database configuration is managed at the Site level and propagated to Package Manager: + +```yaml +spec: + # Database credentials from AWS Secrets Manager + mainDatabaseCredentialSecret: + type: aws + vaultName: "rds!db-example-database-id" + + # Or from Kubernetes secrets + mainDatabaseCredentialSecret: + type: kubernetes + vaultName: my-db-credentials +``` + +### Database URLs + +The operator constructs database URLs automatically using the format: + +``` +postgres://username:password@host/database?search_path=pm&sslmode=require +postgres://username:password@host/database?search_path=metrics&sslmode=require +``` + +### SSL Mode + +SSL mode is configured at the Site level through the database config: + +```yaml +spec: + mainDatabaseCredentialSecret: + type: aws + vaultName: "rds!db-mydb" +``` + +The operator uses `require` SSL mode by default for production deployments. + +## Storage Backends + +Package Manager supports multiple storage backends for package data. 
+ +### S3 Storage (AWS Recommended) + +For production deployments on AWS, S3 storage is recommended: + +```yaml +spec: + packageManager: + s3Bucket: my-ppm-bucket +``` + +This generates the following configuration: + +```ini +[Storage] +Default = S3 + +[S3Storage] +Bucket = my-ppm-bucket +Prefix = /ppm-v0 +``` + +#### IAM Permissions + +The Package Manager service account requires the following S3 permissions: + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": [ + "s3:GetObject", + "s3:PutObject", + "s3:DeleteObject", + "s3:ListBucket" + ], + "Resource": [ + "arn:aws:s3:::my-ppm-bucket", + "arn:aws:s3:::my-ppm-bucket//ppm-v0/*" + ] + } + ] +} +``` + +#### IAM Role Association + +The operator automatically creates a ServiceAccount with the appropriate IAM role annotation: + +```yaml +annotations: + eks.amazonaws.com/role-arn: arn:aws:iam:::role/pkg....posit.team +``` + +### Azure Files Storage + +For Azure deployments, use Azure Files: + +```yaml +spec: + packageManager: + azureFiles: + storageClassName: azure-file + shareSizeGiB: 100 # Minimum 100 GiB required +``` + +This creates a PersistentVolumeClaim with: +- `ReadWriteMany` access mode +- Dynamic provisioning via the Azure Files CSI driver +- Mount path at `/mnt/azure-files` + +#### Azure Files Prerequisites + +1. Create an Azure Storage Account +2. Create a StorageClass that uses the Azure Files CSI driver: + +```yaml +apiVersion: storage.k8s.io/v1 +kind: StorageClass +metadata: + name: azure-file +provisioner: file.csi.azure.com +parameters: + skuName: Premium_LRS + protocol: nfs +reclaimPolicy: Delete +volumeBindingMode: Immediate +allowVolumeExpansion: true +``` + +3. Configure workload identity or storage account credentials for the CSI driver. 
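+
+Putting the pieces together, the claim the operator creates looks roughly like this (an illustrative sketch; the PVC name is an assumption, while the access mode, storage class, and size come from the settings documented above):
+
+```yaml
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: example-packagemanager-azure-files  # assumed naming pattern
+  namespace: posit-team
+spec:
+  accessModes:
+    - ReadWriteMany                # documented access mode for Azure Files
+  storageClassName: azure-file     # from packageManager.azureFiles.storageClassName
+  resources:
+    requests:
+      storage: 100Gi               # from shareSizeGiB (minimum 100)
+```
+
+The resulting volume is mounted into the Package Manager container at `/mnt/azure-files`.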
+ +### Local Volume Storage + +For development or small deployments, use local persistent volume storage: + +```yaml +spec: + packageManager: + volume: + create: true + size: 10Gi + storageClassName: gp3 + accessModes: + - ReadWriteOnce +``` + +## Git Builder Configuration + +Package Manager can build packages from Git repositories. For private repositories, SSH key authentication is required. + +### SSH Key Configuration + +SSH keys are configured in the `gitSSHKeys` array: + +```yaml +spec: + packageManager: + gitSSHKeys: + - name: github # Unique identifier + host: github.com # Git host domain + secretRef: + source: aws-secrets-manager + name: github # Key name in AWS secret + - name: gitlab-internal + host: gitlab.internal.com + secretRef: + source: kubernetes + name: gitlab-ssh-key + key: id_rsa +``` + +### AWS Secrets Manager SSH Keys + +When using AWS Secrets Manager, SSH keys are stored in a dedicated vault: + +**Vault naming convention:** +``` +{workloadCompoundName}-{siteName}-ssh-ppm-keys.posit.team +``` + +Example vault structure: +```json +{ + "github": "-----BEGIN OPENSSH PRIVATE KEY-----\n...\n-----END OPENSSH PRIVATE KEY-----", + "gitlab": "-----BEGIN OPENSSH PRIVATE KEY-----\n...\n-----END OPENSSH PRIVATE KEY-----" +} +``` + +The operator creates: +1. A `SecretProviderClass` for the SSH secrets +2. 
CSI volume mounts for each SSH key at `/mnt/ssh-keys/` + +### Kubernetes Secret SSH Keys + +For Kubernetes-native secrets: + +```bash +# Create the SSH key secret +kubectl create secret generic gitlab-ssh-key \ + --from-file=id_rsa=/path/to/private/key \ + -n posit-team +``` + +```yaml +spec: + packageManager: + gitSSHKeys: + - name: gitlab + host: gitlab.company.com + secretRef: + source: kubernetes + name: gitlab-ssh-key + key: id_rsa +``` + +### Passphrase-Protected Keys + +For SSH keys with passphrases: + +```yaml +spec: + packageManager: + gitSSHKeys: + - name: secure-git + host: secure-git.company.com + secretRef: + source: aws-secrets-manager + name: secure-git + passphraseSecretRef: + source: aws-secrets-manager + name: secure-git-passphrase +``` + +### Git Build Settings + +Enable unsandboxed Git builds (required for many build scenarios): + +```yaml +# This is configured automatically by the operator +# The generated rstudio-pm.gcfg will contain: +[Git] +AllowUnsandboxedGitBuilds = true +``` + +## Package Repository Configuration + +The operator pre-configures default repository names: + +```yaml +# Generated configuration +[Repos] +PyPI = pypi +CRAN = cran +Bioconductor = bioconductor +``` + +### R Version Configuration + +R versions available for building packages: + +```yaml +# Generated configuration (default) +[Server] +RVersion = /opt/R/default +``` + +## Secret Management + +Package Manager secrets are managed differently based on the Site's secret type. + +### AWS Secrets Manager + +When `secret.type: aws`, the following secrets are retrieved from AWS Secrets Manager: + +| Secret Key | Purpose | +|------------|---------| +| `pkg-license` | Package Manager license file | +| `pkg-secret-key` | Encryption key for sensitive data | +| `pkg-db-password` | Database password | + +These are stored in the Site's vault (configured via `secret.vaultName`). 
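+
+For example, a Site vault holding these entries would be shaped like the following (illustrative placeholder values; the keys are the ones listed above):
+
+```json
+{
+  "pkg-license": "<contents of the .lic license file>",
+  "pkg-secret-key": "<encryption key for sensitive data>",
+  "pkg-db-password": "<database password>"
+}
+```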
+ +### Kubernetes Secrets + +When `secret.type: kubernetes`, secrets are retrieved from Kubernetes: + +```yaml +spec: + secret: + type: kubernetes + vaultName: my-site-secrets +``` + +Expected secret keys: +- `pkg-secret-key`: Encryption key +- `pkg-db-password`: Database password + +## Resource Configuration + +### Default Resources + +The operator applies these resource limits by default: + +```yaml +resources: + requests: + cpu: 100m + memory: 2Gi + ephemeral-storage: 500Mi + limits: + cpu: 2000m + memory: 4Gi + ephemeral-storage: 2Gi +``` + +### Pod Disruption Budget + +A PodDisruptionBudget is automatically created to ensure availability during cluster maintenance. For single-replica deployments, `minAvailable: 0`. For multi-replica deployments, the operator calculates an appropriate `minAvailable` value. + +### Affinity + +Pods are scheduled with anti-affinity to distribute replicas across nodes: + +```yaml +affinity: + podAntiAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - weight: 1 + podAffinityTerm: + topologyKey: kubernetes.io/hostname + labelSelector: + matchExpressions: + - key: app.kubernetes.io/instance + operator: In + values: ["-packagemanager"] +``` + +## Example Configurations + +### AWS Production Deployment + +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Site +metadata: + name: production + namespace: posit-team +spec: + domain: posit.example.com + awsAccountId: "123456789012" + clusterDate: "20240101" + workloadCompoundName: my-workload + + secret: + type: aws + vaultName: production-site-secrets.posit.team + + mainDatabaseCredentialSecret: + type: aws + vaultName: "rds!db-production-id" + + ingressClass: traefik + ingressAnnotations: + traefik.ingress.kubernetes.io/router.middlewares: kube-system-forward-auth@kubernetescrd + + packageManager: + image: ghcr.io/posit-dev/package-manager:jammy-2024.08.0 + imagePullPolicy: IfNotPresent + replicas: 2 + + license: + type: FILE + + # S3 for package storage + s3Bucket: 
production-ppm-packages + + # Git SSH keys for private repos + gitSSHKeys: + - name: github + host: github.com + secretRef: + source: aws-secrets-manager + name: github + - name: gitlab + host: gitlab.company.com + secretRef: + source: aws-secrets-manager + name: gitlab + + nodeSelector: + node-type: posit-products + + addEnv: + RSPM_LOG_LEVEL: info +``` + +### Azure Deployment + +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Site +metadata: + name: azure-site + namespace: posit-team +spec: + domain: posit.azurecompany.com + + secret: + type: kubernetes + vaultName: azure-site-secrets + + mainDatabaseCredentialSecret: + type: kubernetes + vaultName: azure-db-creds + + volumeSource: + type: azure-netapp + + packageManager: + image: ghcr.io/posit-dev/package-manager:jammy-2024.08.0 + replicas: 2 + + license: + type: FILE + existingSecretName: ppm-license + existingSecretKey: license.lic + + # Azure Files for package storage + azureFiles: + storageClassName: azure-file-premium + shareSizeGiB: 500 + + # Kubernetes-native SSH keys + gitSSHKeys: + - name: azure-devops + host: ssh.dev.azure.com + secretRef: + source: kubernetes + name: azure-devops-ssh + key: id_rsa +``` + +### Development/Testing Deployment + +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Site +metadata: + name: dev + namespace: posit-team +spec: + domain: dev.example.com + debug: true + + secret: + type: kubernetes + vaultName: dev-secrets + + mainDatabaseCredentialSecret: + type: kubernetes + vaultName: dev-db-creds + + packageManager: + image: ghcr.io/posit-dev/package-manager:jammy-2024.08.0 + replicas: 1 + + license: + type: FILE + existingSecretName: license + existingSecretKey: ppm.lic + + # Local volume for development + volume: + create: true + size: 5Gi + storageClassName: standard +``` + +## Troubleshooting + +### Viewing Package Manager Resources + +```bash +# List Package Manager CRs +kubectl get packagemanagers -n posit-team + +# Describe Package Manager +kubectl describe 
packagemanager <packagemanager-name> -n posit-team

# View Package Manager pods
kubectl get pods -n posit-team -l app.kubernetes.io/name=package-manager

# View Package Manager logs
kubectl logs -n posit-team -l app.kubernetes.io/name=package-manager --tail=100

# View generated configuration
kubectl get configmap <site-name>-packagemanager -n posit-team -o yaml
```

### Common Issues

#### Pod Stuck in CrashLoopBackOff

1. Check logs for startup errors:
   ```bash
   kubectl logs -n posit-team deploy/<site-name>-packagemanager --previous
   ```

2. Verify the license is valid and accessible:
   ```bash
   # For Kubernetes secrets
   kubectl get secret license -n posit-team
   ```

3. Check database connectivity:
   ```bash
   kubectl exec -it deploy/<site-name>-packagemanager -n posit-team -- \
     /bin/bash -c 'echo "SELECT 1" | psql $PACKAGEMANAGER_POSTGRES_URL'
   ```

#### S3 Access Denied

1. Verify the IAM role is correctly associated:
   ```bash
   kubectl get sa <site-name>-packagemanager -n posit-team -o yaml | grep role-arn
   ```

2. Check that the bucket policy allows the role.

3. Verify the bucket prefix is correct (`/ppm-v0`).

#### SSH Keys Not Working

1. Verify the SecretProviderClass exists:
   ```bash
   kubectl get secretproviderclass <site-name>-packagemanager-ssh-secrets -n posit-team
   ```

2. Check that the SSH key is mounted:
   ```bash
   kubectl exec -it deploy/<site-name>-packagemanager -n posit-team -- \
     ls -la /mnt/ssh-keys/
   ```

3. Verify the SSH key permissions and format.

#### Azure Files PVC Pending

1. Check that the StorageClass exists:
   ```bash
   kubectl get sc azure-file
   ```

2. Verify the CSI driver is installed:
   ```bash
   kubectl get pods -n kube-system | grep csi-azurefile
   ```

3. 
Check PVC events: + ```bash + kubectl describe pvc -packagemanager-azure-files -n posit-team + ``` + +### Debug Mode + +Enable debug logging for detailed troubleshooting: + +```yaml +spec: + debug: true # Site-level debug + packageManager: + addEnv: + RSPM_LOG_LEVEL: debug +``` + +The generated config will include: + +```ini +[Debug] +Log = verbose +``` + +### Sleep Mode for Debugging + +For debugging crash loops, enable sleep mode: + +```yaml +# Directly on PackageManager CR (not recommended for production) +apiVersion: core.posit.team/v1beta1 +kind: PackageManager +metadata: + name: example + namespace: posit-team +spec: + sleep: true +``` + +This changes the container command to `sleep infinity`, allowing you to exec into the container for debugging. + +## Configuration Reference + +### Generated Config File (rstudio-pm.gcfg) + +The operator generates a configuration file with these sections: + +```ini +[Server] +Address = https://packagemanager.example.com +RVersion = /opt/R/default +LauncherDir = /var/lib/rstudio-pm/launcher_internal +AccessLogFormat = common + +[Http] +Listen = :4242 + +[Git] +AllowUnsandboxedGitBuilds = true + +[Database] +Provider = postgres + +[Postgres] +URL = postgres://user:pass@host/db?search_path=pm&sslmode=require +UsageDataURL = postgres://user:pass@host/db?search_path=metrics&sslmode=require + +[Metrics] +Enabled = true + +[Repos] +PyPI = pypi +CRAN = cran +Bioconductor = bioconductor + +[Storage] +Default = S3 + +[S3Storage] +Bucket = my-bucket +Prefix = site-name/ppm-v0 + +[Debug] +Log = verbose +``` + +### Environment Variables + +| Variable | Purpose | +|----------|---------| +| `PACKAGEMANAGER_SECRET_KEY` | Encryption key for sensitive data | +| `PACKAGEMANAGER_POSTGRES_PASSWORD` | Database password | +| `PACKAGEMANAGER_POSTGRES_USAGEDATAPASSWORD` | Metrics database password | +| `RSPM_LICENSE_FILE_PATH` | Path to license file | + +### Kubernetes Labels + +All Package Manager resources are labeled with: + +```yaml 
+app.kubernetes.io/managed-by: team-operator
+app.kubernetes.io/name: package-manager
+app.kubernetes.io/instance: <site-name>-packagemanager
+posit.team/site: <site-name>
+posit.team/component: package-manager
+```
+
+## Related Documentation
+
+- [Site Management Guide](product-team-site-management.md) - Complete Site configuration reference
+- [Adding Config Options](adding-config-options.md) - Extending Package Manager configuration
+- [Posit Package Manager Documentation](https://docs.posit.co/rspm/) - Official product documentation
diff --git a/docs/guides/product-team-site-management.md b/docs/guides/product-team-site-management.md
new file mode 100644
index 00000000..0c408f39
--- /dev/null
+++ b/docs/guides/product-team-site-management.md
@@ -0,0 +1,774 @@
+# Site Management Guide
+
+This guide covers the management of Site resources in Team Operator for platform engineers deploying Posit Team.
+
+## Overview
+
+The `Site` Custom Resource Definition (CRD) is the **single source of truth** for a Posit Team deployment. A Site represents a complete deployment environment that includes:
+
+- **Flightdeck** - Landing page dashboard
+- **Connect** - Publishing and sharing platform
+- **Workbench** - Interactive development environment
+- **Package Manager** - Package repository management
+- **Chronicle** - Telemetry and monitoring
+- **Keycloak** - Authentication and identity management (optional)
+
+When you create or update a Site, the Site controller automatically reconciles all child product Custom Resources (Connect, Workbench, Package Manager, Chronicle, Flightdeck) to match your desired configuration.
+
+## Site Lifecycle
+
+### Creating a Site
+
+To create a new Posit Team deployment, apply a Site manifest:
+
+```bash
+kubectl apply -f site.yaml -n posit-team
+```
+
+When a Site is created, the Site controller:
+
+1. Provisions storage volumes (FSx, NFS, or Azure NetApp based on configuration)
+2. Creates subdirectory provisioning jobs for shared storage
+3. 
Reconciles the Flightdeck landing page
+4. Creates Connect, Workbench, Package Manager, and Chronicle CRs
+5. Sets up network policies for product communication
+6. Creates any extra service accounts specified
+
+### Updating Site Configuration
+
+To update a Site:
+
+```bash
+kubectl edit site <site-name> -n posit-team
+```
+
+Or apply an updated manifest:
+
+```bash
+kubectl apply -f site.yaml -n posit-team
+```
+
+The Site controller detects changes and propagates them to child product CRs. Product controllers then reconcile their respective deployments.
+
+**Configuration Flow:**
+```
+Site spec change
+  -> Site controller reconciles
+  -> Product CRs updated (Connect, Workbench, PM, Chronicle, Flightdeck)
+  -> Product controllers reconcile
+  -> Deployments, Services, Ingress updated
+```
+
+### Deleting a Site
+
+When you delete a Site:
+
+```bash
+kubectl delete site <site-name> -n posit-team
+```
+
+The Site controller cleans up:
+
+1. Connect CR and all its resources
+2. Workbench CR and all its resources
+3. Package Manager CR and all its resources
+4. Flightdeck CR and all its resources
+5. Network policies
+
+**Important:** Child resources have owner references to the Site, so Kubernetes garbage collection handles most cleanup automatically.
+
+If `dropDatabaseOnTearDown: true` is set, product databases will be dropped during cleanup.
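+
+Concretely, the owner reference that drives this garbage collection looks like the following on each child CR (an illustrative sketch using the standard Kubernetes `ownerReferences` fields; the child resource name shown is an assumption):
+
+```yaml
+apiVersion: core.posit.team/v1beta1
+kind: PackageManager
+metadata:
+  name: example-site-packagemanager  # assumed naming
+  namespace: posit-team
+  ownerReferences:
+    - apiVersion: core.posit.team/v1beta1
+      kind: Site
+      name: example-site
+      uid: "<site-uid>"              # set by Kubernetes
+      controller: true
+      blockOwnerDeletion: true
+```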
+ +## Site Spec Structure + +### Core Configuration + +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Site +metadata: + name: example-site + namespace: posit-team +spec: + # Required: Base domain for all products + domain: example.mycompany.com + + # AWS-specific configuration (for EKS deployments) + awsAccountId: "123456789012" + clusterDate: "20240101" + workloadCompoundName: "my-workload" + + # Enable debug logging for all products + debug: false + + # Log format: "text" (default) or "json" + logFormat: text + + # Network trust level (0-100, default 100) + networkTrust: 100 +``` + +### Domain Configuration + +Products are accessed via subdomains of the base domain: + +| Product | Default Subdomain | URL Pattern | +|---------|-------------------|-------------| +| Connect | `connect` | `connect.example.mycompany.com` | +| Workbench | `workbench` | `workbench.example.mycompany.com` | +| Package Manager | `packagemanager` | `packagemanager.example.mycompany.com` | + +You can customize prefixes in each product's configuration: + +```yaml +spec: + domain: example.mycompany.com + connect: + domainPrefix: connect # Default + workbench: + domainPrefix: workbench # Default + packageManager: + domainPrefix: packagemanager # Default +``` + +### Ingress Configuration + +```yaml +spec: + # Ingress class for all products + ingressClass: traefik + + # Annotations applied to all ingress resources + ingressAnnotations: + traefik.ingress.kubernetes.io/router.middlewares: kube-system-traefik-forward-auth@kubernetescrd +``` + +### Secret Management + +Team Operator supports multiple secret backends: + +```yaml +spec: + # Site-level secrets configuration + secret: + type: "kubernetes" # or "aws" + vaultName: "site-secrets" + + # Workload-level secrets (for multi-site workloads) + workloadSecret: + type: "kubernetes" + vaultName: "workload-secrets" + + # Database credentials secret + mainDatabaseCredentialSecret: + type: "aws" # AWS Secrets Manager + vaultName: 
"rds!db-example-database-id" +``` + +**Secret Types:** + +| Type | Description | +|------|-------------| +| `kubernetes` | Standard Kubernetes Secrets | +| `aws` | AWS Secrets Manager | + +### Storage Configuration + +#### Volume Source Types + +```yaml +spec: + volumeSource: + # FSx for OpenZFS (AWS) + type: fsx-zfs + volumeId: fsvol-example123456789 + dnsName: fs-example123456789.fsx.us-east-1.amazonaws.com + + # NFS + type: nfs + volumeId: nfs-server-address + dnsName: nfs.example.com + + # Azure NetApp Files + type: azure-netapp +``` + +**Supported Volume Types:** + +| Type | Description | Cloud Provider | +|------|-------------|----------------| +| `fsx-zfs` | FSx for OpenZFS | AWS | +| `nfs` | Generic NFS | Any | +| `azure-netapp` | Azure NetApp Files | Azure | +| `` (empty) | No managed volumes | Any | + +#### Shared Directory + +Configure a shared directory mounted across Workbench and Connect: + +```yaml +spec: + # Creates /mnt/shared in both Workbench and Connect + sharedDirectory: shared +``` + +#### EFS Configuration (AWS) + +```yaml +spec: + efsEnabled: true + vpcCIDR: "10.0.0.0/16" # Required for EFS network policies +``` + +### Product Enablement + +#### Flightdeck (Landing Page) + +```yaml +spec: + flightdeck: + enabled: true # Default: true + image: "docker.io/posit/ptd-flightdeck:latest" + imagePullPolicy: Always + replicas: 1 + logLevel: info # debug, info, warn, error + logFormat: text # text, json + featureEnabler: + showConfig: false + showAcademy: false +``` + +#### Connect + +```yaml +spec: + connect: + image: "ghcr.io/posit-dev/connect:ubuntu22-2024.10.0" + imagePullPolicy: IfNotPresent + replicas: 1 + domainPrefix: connect + + # License configuration + license: + type: FILE + existingSecretName: license + existingSecretKey: pc.lic + + # Volume for Connect data + volume: + create: false + size: 3Gi + + # Authentication + auth: + type: "oidc" # or "password", "saml" + clientId: "connect-client-id" + issuer: "https://idp.example.com" + + # 
Node placement + nodeSelector: + node-type: posit-products + + # Additional environment variables + addEnv: + CUSTOM_VAR: "value" + + # GPU settings for content execution + gpuSettings: + nvidiaGPULimit: 1 + maxNvidiaGPULimit: 4 + + # Database schema settings + databaseSettings: + schema: "connect" + instrumentationSchema: "connect_instrumentation" + + # Content scheduling concurrency + scheduleConcurrency: 2 + + # Databricks integration + databricks: + url: "https://workspace.cloud.databricks.com" + clientId: "databricks-client-id" + + # Experimental features + experimentalFeatures: + mailSender: "connect@example.com" + mailDisplayName: "Posit Connect" + sessionServiceAccountName: "custom-session-sa" +``` + +#### Workbench + +```yaml +spec: + workbench: + image: "ghcr.io/posit-dev/workbench:jammy-2024.12.0" + imagePullPolicy: IfNotPresent + replicas: 1 + domainPrefix: workbench + + # License configuration + license: + type: FILE + existingSecretName: license + existingSecretKey: pw.lic + + # Volume for user home directories + volume: + create: false + size: 3Gi + + # Additional volumes (e.g., project data) + additionalVolumes: + - pvcName: project-data + mountPath: /mnt/projects + readOnly: false + + # Authentication + auth: + type: "oidc" + clientId: "workbench-client-id" + issuer: "https://idp.example.com" + + # Auto-create user accounts + createUsersAutomatically: true + + # Admin groups + adminGroups: + - workbench-admin + adminSuperuserGroups: + - workbench-superadmin + + # Session images + defaultSessionImage: "ghcr.io/posit-dev/workbench-session:jammy-2024.12.0" + extraSessionImages: + - "ghcr.io/posit-dev/workbench-session:gpu-2024.12.0" + + # Node placement + nodeSelector: + node-type: posit-products + tolerations: + - key: "dedicated" + operator: "Equal" + value: "posit" + effect: "NoSchedule" + + # Session-specific tolerations + sessionTolerations: + - key: "dedicated" + operator: "Equal" + value: "workbench-sessions" + effect: "NoSchedule" + + # 
Databricks integration + databricks: + example-workspace: + name: "Example Workspace" + url: "https://example-workspace.cloud.databricks.com" + clientId: "databricks-client-id" + + # Snowflake integration + snowflake: + accountId: "abc12345" + clientId: "snowflake-client-id" + + # VS Code settings + vsCodeExtensions: + - "ms-python.python" + - "quarto.quarto" + + # Positron settings + positronConfig: + enabled: 1 + extensions: + - "posit.positron-r" + + # API settings + apiSettings: + workbenchApiEnabled: 1 + workbenchApiAdminEnabled: 1 + + # Experimental features + experimentalFeatures: + nonRoot: false + privilegedSessions: false + sessionServiceAccountName: "custom-session-sa" + resourceProfiles: + small: + name: "Small" + cpus: "1" + memMb: "2000" + large: + name: "Large" + cpus: "4" + memMb: "8000" +``` + +#### Package Manager + +```yaml +spec: + packageManager: + image: "ghcr.io/posit-dev/package-manager:jammy-2024.08.0" + imagePullPolicy: IfNotPresent + replicas: 1 + domainPrefix: packagemanager + + # License configuration + license: + type: FILE + existingSecretName: license + existingSecretKey: ppm.lic + + # Volume for package cache + volume: + create: false + size: 3Gi + + # S3 storage for packages (recommended for production) + s3Bucket: "my-package-manager-bucket" + + # Azure Files storage (alternative to S3) + azureFiles: + storageClassName: "azure-file" + shareSizeGiB: 100 + + # Git SSH keys for private repositories + gitSSHKeys: + - secretName: git-ssh-key + secretKey: id_rsa +``` + +#### Chronicle (Telemetry) + +```yaml +spec: + chronicle: + image: "ghcr.io/posit-dev/chronicle:2024.11.0" + imagePullPolicy: IfNotPresent + + # S3 storage for telemetry data + s3Bucket: "my-chronicle-bucket" + + # Chronicle agent image (injected into other products) + agentImage: "ghcr.io/posit-dev/chronicle-agent:latest" +``` + +#### Keycloak (Optional IdP) + +```yaml +spec: + keycloak: + enabled: true + image: "quay.io/keycloak/keycloak:latest" + imagePullPolicy: 
IfNotPresent +``` + +### Authentication Configuration + +Team Operator supports multiple authentication methods: + +#### OIDC Authentication + +```yaml +spec: + connect: + auth: + type: "oidc" + clientId: "connect-client-id" + issuer: "https://idp.example.com" + groups: true + usernameClaim: "preferred_username" + emailClaim: "email" + groupsClaim: "groups" + scopes: + - "openid" + - "profile" + - "email" + # Role mappings + viewerRoleMapping: + - "connect-viewers" + publisherRoleMapping: + - "connect-publishers" + administratorRoleMapping: + - "connect-admins" +``` + +#### SAML Authentication + +```yaml +spec: + workbench: + auth: + type: "saml" + samlMetadataUrl: "https://idp.example.com/metadata" + samlIdPAttributeProfile: "azure" # or custom attribute mappings + samlUsernameAttribute: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name" + samlEmailAttribute: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress" +``` + +#### Password Authentication + +```yaml +spec: + connect: + auth: + type: "password" +``` + +### Database Configuration + +All stateful products (Connect, Workbench, Package Manager) use PostgreSQL: + +```yaml +spec: + # Database credentials from AWS Secrets Manager + mainDatabaseCredentialSecret: + type: "aws" + vaultName: "rds!db-example-database-id" + + # Drop databases when Site is deleted (use with caution!) + dropDatabaseOnTearDown: false +``` + +Database URLs are determined automatically from the workload secret configuration. 
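+
+As a mental model, the generated connection URLs follow a per-schema `search_path` pattern, one URL per product schema (illustrative; schema names follow each product's `databaseSettings`, and the exact URLs are assembled by the operator):
+
+```
+postgres://user:password@host/database?search_path=connect&sslmode=require
+postgres://user:password@host/database?search_path=pm&sslmode=require
+```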
+ +### Image Pull Configuration + +```yaml +spec: + # Image pull secrets (must exist in namespace) + imagePullSecrets: + - "regcred" + - "ghcr-secret" + + # Disable pre-pull daemonset + disablePrePullImages: false +``` + +### Extra Service Accounts + +Create additional service accounts for custom workloads: + +```yaml +spec: + extraSiteServiceAccounts: + - nameSuffix: "custom-jobs" + annotations: + eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/CustomJobsRole" +``` + +## Common Site Configurations + +### Minimal Development Site + +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Site +metadata: + name: dev + namespace: posit-team +spec: + domain: dev.example.com + secret: + type: "kubernetes" + vaultName: "dev-secrets" + mainDatabaseCredentialSecret: + type: "kubernetes" + vaultName: "dev-db-creds" + packageManager: + image: ghcr.io/posit-dev/package-manager:jammy-2024.08.0 + license: + type: FILE + existingSecretName: license + existingSecretKey: ppm.lic + connect: + image: ghcr.io/posit-dev/connect:ubuntu22-2024.10.0 + license: + type: FILE + existingSecretName: license + existingSecretKey: pc.lic + auth: + type: "password" + workbench: + image: ghcr.io/posit-dev/workbench:jammy-2024.12.0 + license: + type: FILE + existingSecretName: license + existingSecretKey: pw.lic + auth: + type: "password" +``` + +### Production Site with OIDC and S3 Storage + +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Site +metadata: + name: production + namespace: posit-team +spec: + domain: posit.example.com + awsAccountId: "123456789012" + clusterDate: "20240101" + + secret: + type: "aws" + vaultName: "production-site-secrets" + mainDatabaseCredentialSecret: + type: "aws" + vaultName: "rds!db-production-id" + + volumeSource: + type: fsx-zfs + volumeId: fsvol-abcdef123456 + dnsName: fs-abcdef123456.fsx.us-east-1.amazonaws.com + + sharedDirectory: shared + + ingressClass: traefik + ingressAnnotations: + traefik.ingress.kubernetes.io/router.middlewares: 
kube-system-forward-auth@kubernetescrd + + packageManager: + image: ghcr.io/posit-dev/package-manager:jammy-2024.08.0 + replicas: 2 + s3Bucket: "production-ppm-bucket" + license: + type: FILE + existingSecretName: license + existingSecretKey: ppm.lic + + connect: + image: ghcr.io/posit-dev/connect:ubuntu22-2024.10.0 + replicas: 2 + license: + type: FILE + existingSecretName: license + existingSecretKey: pc.lic + auth: + type: "oidc" + clientId: "connect-prod" + issuer: "https://idp.example.com" + groups: true + + workbench: + image: ghcr.io/posit-dev/workbench:jammy-2024.12.0 + replicas: 2 + license: + type: FILE + existingSecretName: license + existingSecretKey: pw.lic + auth: + type: "oidc" + clientId: "workbench-prod" + issuer: "https://idp.example.com" + createUsersAutomatically: true + adminGroups: + - posit-admins + + chronicle: + image: ghcr.io/posit-dev/chronicle:2024.11.0 + s3Bucket: "production-chronicle-bucket" + + dropDatabaseOnTearDown: false +``` + +## Troubleshooting + +### Viewing Site Status + +```bash +# List all Sites +kubectl get sites -n posit-team + +# Describe a Site +kubectl describe site -n posit-team + +# View Site controller logs +kubectl logs -n posit-team -l app.kubernetes.io/name=team-operator --tail=100 +``` + +### Common Issues + +#### Products Not Deploying + +1. Check Site controller logs for errors: + ```bash + kubectl logs -n posit-team deploy/team-operator | grep -i error + ``` + +2. Verify product CRs were created: + ```bash + kubectl get connect,workbench,packagemanager,chronicle -n posit-team + ``` + +3. Check individual product controller logs if CRs exist but pods are not running. + +#### Database Connection Failures + +1. Verify database credential secret exists and is accessible: + ```bash + # For Kubernetes secrets + kubectl get secret -n posit-team + + # For AWS Secrets Manager, check operator logs for fetch errors + ``` + +2. Ensure database host is reachable from the cluster. + +3. 
Check SSL mode configuration matches your database server.
+
+#### Volume Provisioning Issues
+
+1. For FSx volumes, verify the volume ID and DNS name are correct.
+
+2. Check the subdirectory provisioning job:
+   ```bash
+   kubectl get jobs -n posit-team | grep subdir
+   kubectl logs job/<site>-subdir-creator -n posit-team
+   ```
+
+3. Verify a storage class exists for your volume type.
+
+#### Ingress Not Working
+
+1. Verify the ingress class is correct and the controller is running.
+
+2. Check that ingress resources were created:
+   ```bash
+   kubectl get ingress -n posit-team
+   ```
+
+3. Verify DNS records point to your ingress controller.
+
+#### Authentication Failures
+
+1. For OIDC, verify the client ID and issuer URL are correct.
+
+2. Check that redirect URIs are configured in your IdP.
+
+3. Review product logs for detailed auth error messages:
+   ```bash
+   kubectl logs -n posit-team deploy/<site>-connect
+   ```
+
+### Reconciliation Loop Detection
+
+If you notice constant reconciliation:
+
+1. Check for spec fields that might be mutating:
+   ```bash
+   kubectl get site <name> -o yaml | diff - site.yaml
+   ```
+
+2. Look for validation errors in controller logs.
+
+3. Ensure no external processes are modifying resources.
+
+## Related Documentation
+
+- [Team Operator Overview](../README.md)
+- [Adding Config Options](adding-config-options.md) - For contributors extending Site configuration
diff --git a/docs/guides/troubleshooting.md b/docs/guides/troubleshooting.md
new file mode 100644
index 00000000..d8c3bc77
--- /dev/null
+++ b/docs/guides/troubleshooting.md
@@ -0,0 +1,905 @@
+# Team Operator Troubleshooting Guide
+
+This guide covers common issues and their solutions when running Posit Team products via the Team Operator.
+
+## Table of Contents
+
+1. [General Debugging](#general-debugging)
+2. [Operator Issues](#operator-issues)
+3. [Site Reconciliation Issues](#site-reconciliation-issues)
+4. [Database Issues](#database-issues)
+5.
[Product-Specific Issues](#product-specific-issues) + - [Connect Issues](#connect-issues) + - [Workbench Issues](#workbench-issues) + - [Package Manager Issues](#package-manager-issues) + - [Chronicle Issues](#chronicle-issues) +6. [Networking Issues](#networking-issues) +7. [Storage Issues](#storage-issues) +8. [Authentication Issues](#authentication-issues) +9. [Common Error Messages](#common-error-messages) + +--- + +## General Debugging + +### Checking Operator Logs + +The operator logs are your first stop for diagnosing issues: + +```bash +# View operator logs +kubectl logs -n posit-team-system deployment/team-operator-controller-manager + +# Follow logs in real-time +kubectl logs -n posit-team-system deployment/team-operator-controller-manager -f + +# View logs with timestamps +kubectl logs -n posit-team-system deployment/team-operator-controller-manager --timestamps + +# View last 100 lines +kubectl logs -n posit-team-system deployment/team-operator-controller-manager --tail=100 +``` + +### Viewing CR Status and Conditions + +Check the status of your Custom Resources: + +```bash +# View Site status +kubectl describe site -n posit-team + +# View Connect status +kubectl describe connect -n posit-team + +# View Workbench status +kubectl describe workbench -n posit-team + +# View Package Manager status +kubectl describe packagemanager -n posit-team + +# View PostgresDatabase status +kubectl describe postgresdatabase -n posit-team +``` + +### Common kubectl Commands for Debugging + +```bash +# List all Posit Team resources +kubectl get sites,connects,workbenches,packagemanagers,chronicles -n posit-team + +# List all pods with labels +kubectl get pods -n posit-team --show-labels + +# View pod events +kubectl get events -n posit-team --sort-by='.lastTimestamp' + +# Get all resources managed by the operator +kubectl get all -n posit-team -l app.kubernetes.io/managed-by=team-operator + +# View ConfigMaps +kubectl get configmaps -n posit-team + +# View Secrets (names 
only) +kubectl get secrets -n posit-team + +# View PVCs +kubectl get pvc -n posit-team + +# View Ingresses +kubectl get ingress -n posit-team +``` + +### Enabling Debug Mode + +Enable debug mode at the Site level for verbose logging: + +```yaml +spec: + debug: true +``` + +This enables debug logging for all products deployed by the Site. + +--- + +## Operator Issues + +### Operator Not Starting + +**Symptoms:** +- Team operator pod not running +- CrashLoopBackOff status on operator pod + +**Diagnosis:** +```bash +# Check operator pod status +kubectl get pods -n posit-team-system + +# View operator logs +kubectl logs -n posit-team-system deployment/team-operator-controller-manager --previous + +# Describe the operator pod +kubectl describe pod -n posit-team-system -l control-plane=controller-manager +``` + +**Common Causes and Solutions:** + +| Cause | Solution | +|-------|----------| +| CRD not installed | Run `kubectl apply -f config/crd/bases/` or reinstall via Helm | +| Image pull error | Verify image exists and pull secrets are configured | +| Insufficient resources | Increase memory/CPU limits for operator deployment | +| Invalid configuration | Check operator ConfigMap for syntax errors | + +### Permission Errors (RBAC) + +**Symptoms:** +- Error messages containing `forbidden` or `unauthorized` +- Resources not being created despite no errors in Site spec + +**Diagnosis:** +```bash +# Check operator service account +kubectl get serviceaccount -n posit-team-system + +# View operator RBAC +kubectl get clusterrole team-operator-manager-role -o yaml +kubectl get rolebinding -n posit-team -l app.kubernetes.io/managed-by=team-operator +``` + +**Common Causes and Solutions:** + +| Error Message | Solution | +|---------------|----------| +| `cannot create resource "deployments"` | Ensure RBAC includes apps/deployments verb | +| `cannot create resource "ingresses"` | Add networking.k8s.io/ingresses to RBAC | +| `cannot patch resource "secrets"` | Verify secrets verbs 
include patch | +| `object not managed by team-operator` | Resource was created outside operator; delete and let operator recreate | + +### Leader Election Issues + +**Symptoms:** +- Multiple operator instances running but not reconciling +- Operator logs showing leader election failures + +**Diagnosis:** +```bash +# Check for leader election lease +kubectl get lease -n posit-team-system + +# View leader election status in logs +kubectl logs -n posit-team-system deployment/team-operator-controller-manager | grep -i "leader" +``` + +**Solutions:** +- Ensure only one operator instance is running (check replicas) +- Delete the leader election lease to force re-election: + ```bash + kubectl delete lease team-operator-leader-election -n posit-team-system + ``` + +### CRD Installation Problems + +**Symptoms:** +- `no matches for kind "Site" in version "core.posit.team/v1beta1"` +- Resources not recognized by kubectl + +**Diagnosis:** +```bash +# List installed CRDs +kubectl get crd | grep posit + +# Verify CRD details +kubectl describe crd sites.core.posit.team +``` + +**Solutions:** +- Install CRDs manually: + ```bash + kubectl apply -f config/crd/bases/ + ``` +- Reinstall via Helm with CRD installation enabled: + ```bash + helm upgrade --install team-operator ./dist/chart --set installCRDs=true + ``` + +--- + +## Site Reconciliation Issues + +### Site Stuck in Reconciling + +**Symptoms:** +- Site CR exists but products not being created +- Operator continuously reconciling without progress + +**Diagnosis:** +```bash +# Check Site events +kubectl describe site -n posit-team + +# View operator logs for the site +kubectl logs -n posit-team-system deployment/team-operator-controller-manager | grep +``` + +**Common Causes:** + +| Cause | Symptom | Solution | +|-------|---------|----------| +| Invalid domain | Error in logs about domain parsing | Ensure `spec.domain` is valid DNS name | +| Missing secrets | Secret not found errors | Create required secrets before Site | +| 
Database unreachable | Connection timeout errors | Verify database connectivity and credentials | +| Volume provisioning failed | PVC pending | Check storage class and provisioner | + +### Products Not Being Created + +**Symptoms:** +- Site created but no Connect/Workbench/PackageManager CRs + +**Diagnosis:** +```bash +# Check if product CRs exist +kubectl get connects,workbenches,packagemanagers -n posit-team + +# Check operator logs for specific product +kubectl logs -n posit-team-system deployment/team-operator-controller-manager | grep -i "reconcile" +``` + +**Solutions:** +- Verify product is enabled in Site spec (products are created by default) +- Check for validation errors in operator logs +- Ensure all required fields are populated + +### Status Conditions Not Updating + +**Symptoms:** +- Product status shows `ready: false` despite pods running + +**Diagnosis:** +```bash +# Check product status +kubectl get connect -n posit-team -o jsonpath='{.status}' + +# Check pod readiness +kubectl get pods -n posit-team -l app.kubernetes.io/name=connect +``` + +**Solutions:** +- Status updates occur after successful reconciliation +- Check readiness probes are passing on pods +- Operator may need to be restarted if stuck + +--- + +## Database Issues + +### PostgresDatabase Not Ready + +**Symptoms:** +- PostgresDatabase CR exists but `ready` status is false +- Product pods failing to start due to database errors + +**Diagnosis:** +```bash +# Check PostgresDatabase status +kubectl describe postgresdatabase -n posit-team + +# Check operator logs for database operations +kubectl logs -n posit-team-system deployment/team-operator-controller-manager | grep -i "database\|postgres" +``` + +**Common Causes:** + +| Cause | Solution | +|-------|----------| +| Main database unreachable | Verify `mainDatabaseCredentialSecret` points to valid credentials | +| Invalid database URL | Check database host, port, and SSL mode | +| Role creation failed | Ensure main database user has 
CREATE ROLE permission |
+| Database creation failed | Ensure main database user has CREATE DATABASE permission |
+
+### Connection Failures
+
+**Symptoms:**
+- `error determining database url` in operator logs
+- `postgres database no main database url found`
+
+**Diagnosis:**
+```bash
+# Check database credential secret
+kubectl get secret -n posit-team | grep -i db
+
+# View secret contents (base64 encoded)
+kubectl get secret <secret-name> -n posit-team -o yaml
+```
+
+**Solutions:**
+
+1. **Verify the secret exists with the correct keys:**
+   ```bash
+   kubectl get secret <secret-name> -n posit-team -o jsonpath='{.data}' | jq
+   ```
+
+2. **Test database connectivity from a pod:**
+   ```bash
+   kubectl run -it --rm psql-test --image=postgres:15 --restart=Never -- \
+     psql "postgresql://<user>:<password>@<host>/<database>?sslmode=require"
+   ```
+
+3. **Check the SSL mode configuration:**
+   - Ensure `sslmode` matches your database requirements (require, verify-full, etc.)
+
+### Schema Creation Errors
+
+**Symptoms:**
+- `error with alter schema` or `error creating schema` in logs
+- Product pod starts but database operations fail
+
+**Diagnosis:**
+```bash
+# Check operator logs for schema errors
+kubectl logs -n posit-team-system deployment/team-operator-controller-manager | grep -i "schema"
+```
+
+**Common Error Codes:**
+
+| PostgreSQL Error Code | Meaning | Solution |
+|-----------------------|---------|----------|
+| 3F000 | Schema does not exist | Schema will be created automatically |
+| 42501 | Insufficient privileges | Grant schema permissions to user |
+| 42P04 | Duplicate database | Database already exists (usually OK) |
+
+### Credential Issues
+
+**Symptoms:**
+- `postgres database no spec url credentials found`
+- `postgres database mismatched db host`
+
+**Solutions:**
+
+1. **For AWS Secrets Manager:**
+   ```yaml
+   spec:
+     secret:
+       type: "aws"
+       vaultName: "your-vault-name"
+     mainDatabaseCredentialSecret:
+       type: "aws"
+       vaultName: "rds!db-identifier"
+   ```
+
+2.
**For Kubernetes Secrets:** + ```yaml + apiVersion: v1 + kind: Secret + metadata: + name: site-secrets + stringData: + pub-db-password: "" + dev-db-password: "" + pkg-db-password: "" + ``` + +--- + +## Product-Specific Issues + +### Connect Issues + +#### Connect Not Starting + +**Symptoms:** +- Connect pod in CrashLoopBackOff or Error state +- Container failing readiness probes + +**Diagnosis:** +```bash +# Check Connect pod status +kubectl get pods -n posit-team -l app.kubernetes.io/name=connect + +# View Connect logs +kubectl logs -n posit-team deploy/-connect -c connect + +# Check events +kubectl describe pod -n posit-team -l app.kubernetes.io/name=connect +``` + +**Common Causes:** + +| Symptom | Cause | Solution | +|---------|-------|----------| +| License error in logs | Invalid or missing license | Verify license secret and key | +| Database connection error | Database unreachable or wrong credentials | Check database configuration | +| Permission denied on volume | PVC mounted with wrong permissions | Check storage class and PVC settings | +| Config file not found | ConfigMap not mounted | Verify ConfigMap exists | + +#### Connect Sessions Not Running + +**Symptoms:** +- Content execution fails +- Session jobs not created or failing + +**Diagnosis:** +```bash +# List session jobs +kubectl get jobs -n posit-team -l posit.team/component=connect-session + +# Check session pod logs +kubectl logs -n posit-team job/ + +# View job events +kubectl describe job -n posit-team +``` + +**Common Causes:** + +| Cause | Solution | +|-------|----------| +| Init container failed | Check session image is accessible | +| Runtime image not found | Verify runtime.yaml configuration | +| Service account missing | Check session service account exists | +| RBAC insufficient | Verify session RBAC permissions | + +### Workbench Issues + +#### Workbench Sessions Failing + +**Symptoms:** +- User sessions not starting +- IDE not loading after login + +**Diagnosis:** +```bash +# List 
session pods +kubectl get pods -n posit-team -l posit.team/component=workbench-session + +# View Workbench launcher logs +kubectl logs -n posit-team deploy/-workbench -c workbench | grep -i launcher +``` + +**Common Causes:** + +| Cause | Solution | +|-------|----------| +| Launcher not starting | Check launcher configuration in ConfigMap | +| Session image unavailable | Verify default session image is accessible | +| Volume mount issues | Check PVC and storage class | +| Databricks config error | Move Databricks config from `Config` to `SecretConfig` | + +**Databricks Configuration Error:** +If you see `the Databricks configuration should be in SecretConfig, not Config`, update your configuration: + +```yaml +# Wrong +spec: + workbench: + config: + databricks: {...} # DO NOT use this + +# Correct - configured at Site level +spec: + workbench: + databricks: + myWorkspace: + name: "My Workspace" + url: "https://workspace.cloud.databricks.com" + clientId: "client-id" +``` + +#### Workbench HTML Login Page Too Large + +**Symptoms:** +- Error about `authLoginPageHtml content exceeds maximum size` + +**Solution:** +The custom login HTML is limited to 64KB. Reduce the HTML content size or externalize assets. + +### Package Manager Issues + +#### Package Manager Build Issues + +**Symptoms:** +- Package builds failing +- Git sources not accessible + +**Diagnosis:** +```bash +# Check Package Manager logs +kubectl logs -n posit-team deploy/-packagemanager + +# Check for SSH key issues +kubectl get secretproviderclass -n posit-team | grep ssh +``` + +**Common Causes:** + +| Cause | Solution | +|-------|----------| +| SSH keys not mounted | Verify GitSSHKeys configuration | +| S3 bucket inaccessible | Check IAM role and bucket permissions | +| Azure Files PVC pending | Verify storage class and share size | + +**Azure Files Configuration Error:** +``` +Invalid AzureFiles configuration. Missing StorageClassName or invalid ShareSizeGiB (minimum 100 GiB). 
+``` + +**Solution:** +```yaml +spec: + packageManager: + azureFiles: + storageClassName: "azurefile-csi" + shareSizeGiB: 100 # Minimum 100 GiB required +``` + +### Chronicle Issues + +#### Chronicle Sidecar Problems + +**Symptoms:** +- Metrics not being collected +- Chronicle container not running in product pods + +**Diagnosis:** +```bash +# Check if Chronicle sidecar exists +kubectl get pods -n posit-team -l app.kubernetes.io/name=connect -o jsonpath='{.items[*].spec.containers[*].name}' + +# View Chronicle sidecar logs +kubectl logs -n posit-team deploy/-connect -c chronicle +``` + +**Common Causes:** + +| Cause | Solution | +|-------|----------| +| `agentImage` not set | Configure `spec.chronicle.agentImage` at Site level | +| Chronicle server unreachable | Check Chronicle StatefulSet is running | +| Network policy blocking | Verify network policies allow Chronicle traffic | + +--- + +## Networking Issues + +### Ingress Not Working + +**Symptoms:** +- Product URLs return 404 or 502 +- Cannot access products externally + +**Diagnosis:** +```bash +# Check Ingress resources +kubectl get ingress -n posit-team + +# Describe Ingress +kubectl describe ingress -connect -n posit-team + +# Check Ingress controller logs +kubectl logs -n deploy/ +``` + +**Common Causes:** + +| Cause | Solution | +|-------|----------| +| Wrong IngressClass | Set `spec.ingressClass` to match your controller | +| TLS certificate missing | Configure TLS in Ingress annotations | +| Backend service unavailable | Verify product service and pods are running | +| Middleware error | Check Traefik middleware configuration | + +### TLS/Certificate Problems + +**Symptoms:** +- Certificate errors in browser +- HTTPS not working + +**Solutions:** + +1. **Check certificate secret:** + ```bash + kubectl get secret -n posit-team | grep tls + ``` + +2. **Verify cert-manager (if used):** + ```bash + kubectl get certificate -n posit-team + kubectl describe certificate -n posit-team + ``` + +3. 
**Configure TLS in Ingress:** + ```yaml + spec: + ingressAnnotations: + cert-manager.io/cluster-issuer: "letsencrypt-prod" + ``` + +### Service Discovery Issues + +**Symptoms:** +- Products cannot communicate with each other +- Chronicle cannot reach product metrics endpoints + +**Diagnosis:** +```bash +# Test DNS resolution +kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup ..svc.cluster.local + +# Test service connectivity +kubectl run -it --rm curl-test --image=curlimages/curl --restart=Never -- curl http://..svc.cluster.local +``` + +**Solutions:** +- Ensure services are in the same namespace +- Check network policies allow inter-service communication +- Verify service selectors match pod labels + +--- + +## Storage Issues + +### PVC Not Binding + +**Symptoms:** +- PVC stuck in `Pending` state +- Product pods failing to start due to volume issues + +**Diagnosis:** +```bash +# Check PVC status +kubectl get pvc -n posit-team + +# Describe pending PVC +kubectl describe pvc -n posit-team + +# Check PV availability +kubectl get pv +``` + +**Common Causes:** + +| Cause | Solution | +|-------|----------| +| Storage class not found | Create storage class or use existing one | +| No matching PV | Check storage provisioner is running | +| Access mode mismatch | Verify PVC access modes match PV | +| Capacity insufficient | Increase PV size or reduce request | + +### Volume Mount Failures + +**Symptoms:** +- `MountVolume.SetUp failed` +- Pod stuck in `ContainerCreating` + +**Diagnosis:** +```bash +# Check pod events +kubectl describe pod -n posit-team | grep -A10 Events + +# Check CSI driver status (if using CSI) +kubectl get pods -n kube-system | grep csi +``` + +**Common Causes:** + +| Cause | Solution | +|-------|----------| +| NFS server unreachable | Verify NFS server connectivity | +| FSx volume not found | Check FSx volume ID and DNS name | +| CSI driver not running | Restart CSI driver pods | +| Azure Files secret missing | Create storage 
account credentials secret |
+
+### Permission Issues
+
+**Symptoms:**
+- `permission denied` errors in pod logs
+- Product cannot write to its data directory
+
+**Diagnosis:**
+```bash
+# Check file ownership in the pod
+kubectl exec -it <pod> -n posit-team -- ls -la /var/lib/<product>
+
+# Check the security context
+kubectl get pod <pod> -n posit-team -o jsonpath='{.spec.securityContext}'
+```
+
+**Solutions:**
+
+1. **Set FSGroup in the security context:**
+   ```yaml
+   spec:
+     securityContext:
+       fsGroup: 999
+   ```
+
+2. **Use an init container to fix permissions:**
+   ```yaml
+   initContainers:
+     - name: fix-permissions
+       image: busybox
+       command: ["sh", "-c", "chown -R 999:999 /data"]
+       volumeMounts:
+         - name: data
+           mountPath: /data
+   ```
+
+---
+
+## Authentication Issues
+
+### OIDC Callback Errors
+
+**Symptoms:**
+- `Invalid redirect URI` error from the IdP
+- Login redirects fail
+
+**Diagnosis:**
+```bash
+# Check Connect logs for OAuth errors
+kubectl logs -n posit-team deploy/<site>-connect -c connect | grep -i oauth
+
+# Verify the callback URL in the config
+kubectl get configmap <site>-connect -n posit-team -o yaml | grep -i callback
+```
+
+**Solutions:**
+
+1. **Verify redirect URIs in the IdP:**
+   - Connect: `https://<domain>/__login__/callback`
+   - Workbench: `https://<domain>/oidc/callback`
+
+2. **Check the client ID and issuer:**
+   ```yaml
+   spec:
+     connect:
+       auth:
+         type: "oidc"
+         clientId: "your-client-id"  # Must match IdP
+         issuer: "https://your-idp.com"  # Must be exact
+   ```
+
+3. **Enable debug logging:**
+   ```yaml
+   spec:
+     debug: true
+   ```
+
+### SAML Metadata Issues
+
+**Symptoms:**
+- `SAML authentication requires a metadata URL to be specified`
+- SAML metadata URL not accessible
+
+**Diagnosis:**
+```bash
+# Test metadata URL accessibility
+kubectl run -it --rm curl-test --image=curlimages/curl --restart=Never -- \
+  curl -v <metadata-url>
+```
+
+**Solutions:**
+
+1. **Ensure the metadata URL is correct:**
+   ```yaml
+   spec:
+     connect:
+       auth:
+         type: "saml"
+         samlMetadataUrl: "https://idp.example.com/saml/metadata"
+   ```
+
+2.
**Check network access from cluster:** + - Verify DNS resolution works + - Check firewall rules allow outbound HTTPS + +**Configuration Conflict Error:** +``` +SAML IdPAttributeProfile cannot be specified together with individual SAML attribute mappings +``` + +**Solution:** Use either `samlIdPAttributeProfile` OR individual attributes, not both: +```yaml +# Option 1: Profile +samlIdPAttributeProfile: "azure" + +# Option 2: Individual mappings (mutually exclusive with profile) +# samlUsernameAttribute: "..." +# samlEmailAttribute: "..." +``` + +### Token/Claim Problems + +**Symptoms:** +- Users not getting correct roles +- Groups not syncing from IdP + +**Diagnosis:** +```bash +# Enable debug logging and check logs +kubectl logs -n posit-team deploy/-connect -c connect | grep -i "claim\|group\|role" +``` + +**Solutions:** + +1. **Verify claims configuration:** + ```yaml + spec: + connect: + auth: + usernameClaim: "preferred_username" + emailClaim: "email" + groupsClaim: "groups" + ``` + +2. **Check scopes include groups:** + ```yaml + scopes: + - "openid" + - "profile" + - "email" + - "groups" + ``` + +3. **Disable groups claim if IdP doesn't support it:** + ```yaml + disableGroupsClaim: true + ``` + +4. 
**Debug JWT tokens:** + - Use [jwt.io](https://jwt.io) to inspect tokens + - Verify expected claims are present + +--- + +## Common Error Messages + +| Error Message | Cause | Solution | +|---------------|-------|----------| +| `Site not found; cleaning up resources` | Site CR was deleted | Expected during cleanup; ignore | +| `error determining database url` | Database credentials not found | Check `mainDatabaseCredentialSecret` configuration | +| `postgres database no main database url found` | Main database URL not configured | Configure database secret or check workload secret | +| `postgres database mismatched db host` | Product database host differs from main | Ensure all products use same database host | +| `postgres database no spec url credentials found` | Database password missing | Add password to secret or check secret key name | +| `SAML authentication requires a metadata URL` | Missing SAML metadata URL | Set `samlMetadataUrl` in auth config | +| `SAML IdPAttributeProfile cannot be specified together...` | Conflicting SAML config | Use profile OR individual attributes, not both | +| `object not managed by team-operator` | Resource created outside operator | Delete resource and let operator recreate | +| `mutateFn must set managed-by label` | Internal operator error | Report as bug; check operator version | +| `Invalid AzureFiles configuration` | Missing Azure Files settings | Ensure `storageClassName` set and `shareSizeGiB >= 100` | +| `the Databricks configuration should be in SecretConfig` | Deprecated Databricks location | Move Databricks config to Site `spec.workbench.databricks` | +| `authLoginPageHtml content exceeds maximum size` | Custom HTML too large | Reduce HTML to under 64KB | +| `failed to generate random bytes` | System entropy issue | Check `/dev/urandom` availability | +| `error provisioning SecretProviderClass` | CSI secrets driver issue | Verify secrets-store CSI driver is installed | + +--- + +## Getting Help + +If you continue to 
experience issues: + +1. **Collect diagnostic information:** + ```bash + kubectl get all -n posit-team -o yaml > posit-team-resources.yaml + kubectl logs -n posit-team-system deployment/team-operator-controller-manager > operator.log + kubectl get events -n posit-team --sort-by='.lastTimestamp' > events.txt + ``` + +2. **Check Posit documentation:** + - [Connect Admin Guide](https://docs.posit.co/connect/admin/) + - [Workbench Admin Guide](https://docs.posit.co/ide/server-pro/admin/) + - [Package Manager Admin Guide](https://docs.posit.co/rspm/admin/) + +3. **Contact Posit Support:** + - Include diagnostic files + - Describe the issue and steps to reproduce + - Include operator and product versions + +--- + +## Related Documentation + +- [Site Management Guide](product-team-site-management.md) - Overall Site configuration +- [Authentication Setup](authentication-setup.md) - Detailed auth configuration +- [Connect Configuration](connect-configuration.md) - Connect-specific settings +- [Workbench Configuration](workbench-configuration.md) - Workbench-specific settings +- [Package Manager Configuration](packagemanager-configuration.md) - Package Manager settings diff --git a/docs/guides/upgrading.md b/docs/guides/upgrading.md new file mode 100644 index 00000000..1ccb73e7 --- /dev/null +++ b/docs/guides/upgrading.md @@ -0,0 +1,503 @@ +# Upgrading Team Operator + +This guide provides comprehensive instructions for upgrading the Team Operator, including pre-upgrade preparation, upgrade procedures, version-specific migrations, and troubleshooting. + +## Before Upgrading + +### Backup Procedures + +Before performing any upgrade, create backups of critical resources: + +#### 1. 
Backup Custom Resources + +```bash +# Backup all Site resources +kubectl get sites -A -o yaml > sites-backup.yaml + +# Backup all product resources +kubectl get workbenches -A -o yaml > workbenches-backup.yaml +kubectl get connects -A -o yaml > connects-backup.yaml +kubectl get packagemanagers -A -o yaml > packagemanagers-backup.yaml +kubectl get chronicles -A -o yaml > chronicles-backup.yaml +kubectl get flightdecks -A -o yaml > flightdecks-backup.yaml +kubectl get postgresdatabases -A -o yaml > postgresdatabases-backup.yaml + +# Backup all Posit Team resources at once +kubectl get sites,workbenches,connects,packagemanagers,chronicles,flightdecks,postgresdatabases -A -o yaml > posit-team-resources-backup.yaml +``` + +#### 2. Backup Secrets + +```bash +# Backup secrets in the Posit Team namespace +kubectl get secrets -n posit-team -o yaml > secrets-backup.yaml + +# For sensitive backups, consider encrypting +kubectl get secrets -n posit-team -o yaml | gpg -c > secrets-backup.yaml.gpg +``` + +#### 3. Backup Databases + +If using external databases for products (Connect, Workbench, Package Manager), ensure you have database backups before upgrading. The operator manages `PostgresDatabase` resources that may be affected by schema changes. 
+ +```bash +# List managed databases +kubectl get postgresdatabases -A + +# For each database, create a backup using your database backup procedures +# Example for PostgreSQL: +# pg_dump -h -U -d > database-backup.sql +``` + +### Check Current Version + +Verify your current installation: + +```bash +# Check Helm release version +helm list -n posit-team-system + +# Check operator deployment image +kubectl get deployment team-operator-controller-manager -n posit-team-system -o jsonpath='{.spec.template.spec.containers[0].image}' + +# Check CRD versions +kubectl get crds | grep posit.team +``` + +### Review Changelog + +Always review the [CHANGELOG.md](../../CHANGELOG.md) for breaking changes between your current version and the target version. Pay special attention to: + +- Breaking changes that require configuration updates +- Deprecated fields that need migration +- New required fields + +### Test in Non-Production + +**Critical**: Always test upgrades in a non-production environment first: + +1. Create a staging cluster or namespace that mirrors production +2. Apply the same Site configuration +3. Perform the upgrade +4. Verify all products function correctly +5. 
Test any automated integrations + +## Upgrade Methods + +### Helm Upgrade Procedure + +The recommended method for upgrading is via Helm: + +#### Standard Upgrade + +```bash +# Update Helm repository (if using external repo) +helm repo update + +# View changes before applying +helm diff upgrade team-operator ./dist/chart \ + --namespace posit-team-system \ + --values my-values.yaml + +# Perform the upgrade +helm upgrade team-operator ./dist/chart \ + --namespace posit-team-system \ + --values my-values.yaml +``` + +#### Upgrade with Specific Version + +```bash +helm upgrade team-operator ./dist/chart \ + --namespace posit-team-system \ + --set controllerManager.container.image.tag=v1.2.0 \ + --values my-values.yaml +``` + +#### Upgrade with CRD Updates + +CRDs are automatically updated during Helm upgrade when `crd.enable: true` (default). However, if you've disabled CRD management: + +```bash +# Manually apply CRD updates first +kubectl apply -f dist/chart/templates/crd/ + +# Then upgrade the operator +helm upgrade team-operator ./dist/chart \ + --namespace posit-team-system \ + --values my-values.yaml +``` + +### Kustomize Upgrade Procedure + +If using Kustomize for deployment: + +```bash +# Update the kustomization.yaml to reference the new version +# Then apply: +kubectl apply -k config/default + +# Or for specific overlays: +kubectl apply -k config/overlays/production +``` + +### CRD Upgrade Considerations + +CRDs require special attention during upgrades: + +1. **CRDs Persist Across Helm Uninstall**: By default (`crd.keep: true`), CRDs remain in the cluster even after `helm uninstall`. This prevents accidental data loss but means CRDs must be managed carefully. + +2. **CRD Version Compatibility**: The operator manages CRDs at API version `core.posit.team/v1beta1` (and `keycloak.k8s.keycloak.org/v2alpha1` for Keycloak). Ensure your CRs are compatible with the CRD schema in the new version. + +3. 
**Schema Validation**: After CRD updates, existing CRs are validated against the new schema. Invalid CRs may prevent proper reconciliation. + +```bash +# Verify CRDs are updated +kubectl get crds sites.core.posit.team -o jsonpath='{.metadata.resourceVersion}' + +# Check for validation issues +kubectl get sites -A -o json | jq '.items[] | select(.status.conditions[]?.reason == "InvalidSpec")' +``` + +## Version-Specific Migrations + +### v1.2.0 + +**New Features:** +- Added `CreateOrUpdateResource` helper for improved reconciliation +- Post-mutation label validation for Traefik resources + +**Deprecations:** +- `BasicCreateOrUpdate` function is deprecated in favor of `CreateOrUpdateResource` + +No configuration changes required for users. + +### v1.1.0 + +**New Features:** +- Added `tolerations` and `nodeSelector` support for controller manager + +**Migration:** +If you were using workarounds for pod scheduling, update your values: + +```yaml +controllerManager: + tolerations: + - key: "node-role.kubernetes.io/control-plane" + operator: "Exists" + effect: "NoSchedule" + nodeSelector: + kubernetes.io/os: linux +``` + +### v1.0.4 + +**Bug Fixes:** +- Removed `kustomize-adopt` hook that could fail on tainted clusters + +No migration required. + +### v1.0.0 + +**Initial Release:** +- Migration from `rstudio/ptd` repository + +If upgrading from the legacy `rstudio/ptd` operator, contact Posit support for migration assistance. 
+ +### Known Deprecated Fields + +The following fields are deprecated and will be removed in future versions: + +| CRD | Field | Replacement | Notes | +|-----|-------|-------------|-------| +| Site | `spec.secretType` | `spec.secret.type` | Use the new Secret configuration block | +| Workbench | `spec.config.databricks.conf` | `spec.secretConfig.databricks` | Databricks config moved to SecretConfig | +| PackageManager | `spec.config.CRAN` | N/A | PackageManagerCRANConfig is deprecated | + +**Migration Example - Databricks Configuration:** + +Before (deprecated): +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Workbench +spec: + config: + databricks.conf: + workspace1: + name: "My Workspace" + url: "https://workspace.cloud.databricks.com" +``` + +After (recommended): +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Site +spec: + workbench: + databricks: + workspace1: + name: "My Workspace" + url: "https://workspace.cloud.databricks.com" + clientId: "" +``` + +### Key Migration + +The operator automatically migrates legacy UUID-format and binary-format encryption keys to the new hex256 format. This migration happens transparently during reconciliation. Monitor logs for migration messages: + +```bash +kubectl logs -n posit-team-system deployment/team-operator-controller-manager | grep -i "migrating" +``` + +## Post-Upgrade Verification + +### 1. Check Operator Health + +```bash +# Verify the operator pod is running +kubectl get pods -n posit-team-system -l control-plane=controller-manager + +# Check operator logs for errors +kubectl logs -n posit-team-system deployment/team-operator-controller-manager --tail=100 + +# Verify health endpoints +kubectl exec -n posit-team-system deployment/team-operator-controller-manager -- wget -qO- http://localhost:8081/healthz +kubectl exec -n posit-team-system deployment/team-operator-controller-manager -- wget -qO- http://localhost:8081/readyz +``` + +### 2. 
Verify CRD Versions + +```bash +# List all Posit Team CRDs with versions +kubectl get crds -o custom-columns=NAME:.metadata.name,VERSION:.spec.versions[0].name | grep posit.team + +# Expected output: +# chronicles.core.posit.team v1beta1 +# connects.core.posit.team v1beta1 +# flightdecks.core.posit.team v1beta1 +# packagemanagers.core.posit.team v1beta1 +# postgresdatabases.core.posit.team v1beta1 +# sites.core.posit.team v1beta1 +# workbenches.core.posit.team v1beta1 +``` + +### 3. Test Product Functionality + +```bash +# Check all Sites are reconciling +kubectl get sites -A + +# Check individual product resources +kubectl get workbenches -A +kubectl get connects -A +kubectl get packagemanagers -A + +# Verify deployments are healthy +kubectl get deployments -n posit-team + +# Test product endpoints +curl -I https://workbench. +curl -I https://connect. +curl -I https://packagemanager. +``` + +### 4. Monitor for Issues + +Watch operator logs for the first 15-30 minutes after upgrade: + +```bash +kubectl logs -n posit-team-system deployment/team-operator-controller-manager -f +``` + +Look for: +- Reconciliation errors +- CRD validation failures +- Database connection issues +- Certificate/TLS errors + +## Rollback Procedures + +### Helm Rollback + +If issues occur after upgrade, rollback to the previous release: + +```bash +# List release history +helm history team-operator -n posit-team-system + +# Rollback to previous revision +helm rollback team-operator -n posit-team-system + +# Example: rollback to revision 2 +helm rollback team-operator 2 -n posit-team-system +``` + +### CRD Considerations During Rollback + +**Important**: CRDs are not automatically rolled back with Helm rollback due to the `keep` annotation. If the new CRDs added fields, older operator versions may still work but won't recognize new fields. 
+ +If CRD rollback is necessary: + +```bash +# Save current CRs +kubectl get sites,workbenches,connects,packagemanagers -A -o yaml > pre-rollback-backup.yaml + +# Apply old CRDs (from your backup or previous chart version) +kubectl apply -f old-crds/ + +# Verify CRs are still valid +kubectl get sites -A +``` + +### Data Implications + +Consider these data implications during rollback: + +1. **Database Schema Changes**: If the upgrade included database schema changes, rollback may require database schema rollback as well. + +2. **Secret Format Changes**: The operator's automatic key migration is one-way. Rolled-back operators will still work with migrated keys. + +3. **Configuration Changes**: CRs modified to use new fields will need manual cleanup if rolling back to a version that doesn't support those fields. + +## Zero-Downtime Upgrades + +### Best Practices for Production Upgrades + +1. **Use Maintenance Windows**: Schedule upgrades during low-traffic periods. + +2. **Rolling Update Strategy**: The operator uses a single replica by default. For zero-downtime during operator restarts: + - Products continue running even if the operator is briefly unavailable + - No reconciliation occurs during operator restart (typically < 30 seconds) + +3. **Staged Rollout**: + ```bash + # First, upgrade operator in staging + helm upgrade team-operator ./dist/chart -n posit-team-system-staging + + # Verify staging works + # Then upgrade production + helm upgrade team-operator ./dist/chart -n posit-team-system + ``` + +4. **Health Check Considerations**: + - Liveness probe: `/healthz` (port 8081) + - Readiness probe: `/readyz` (port 8081) + - These ensure the operator is ready before receiving reconciliation requests + +5. 
**Leader Election**: If running multiple operator replicas (not typical), leader election ensures only one active reconciler: + ```yaml + controllerManager: + container: + args: + - "--leader-elect" + ``` + +### Product Availability During Upgrades + +- **Workbench**: Sessions continue running; new sessions may be delayed +- **Connect**: Published content remains accessible +- **Package Manager**: Package downloads continue working +- **Flightdeck**: Landing page remains accessible + +Only reconciliation (applying changes) is affected during operator restart. + +## Troubleshooting Upgrades + +### Common Upgrade Issues + +#### CRD Validation Failures + +**Symptom**: CRs fail validation after CRD update + +```bash +# Check for invalid CRs +kubectl get sites -A 2>&1 | grep -i error + +# View validation errors +kubectl describe site -n +``` + +**Solution**: Update CRs to match new schema requirements or remove deprecated fields. + +#### Webhook Issues + +**Symptom**: Admission webhook errors after upgrade + +```bash +# Check webhook configuration +kubectl get validatingwebhookconfigurations | grep posit +kubectl get mutatingwebhookconfigurations | grep posit + +# If webhooks are causing issues and you need to disable temporarily +kubectl delete validatingwebhookconfigurations +``` + +**Solution**: Ensure cert-manager is properly configured if webhooks are enabled. + +#### Operator Pod CrashLoopBackOff + +**Symptom**: Operator pod fails to start + +```bash +# Check pod events +kubectl describe pod -n posit-team-system -l control-plane=controller-manager + +# Check logs +kubectl logs -n posit-team-system -l control-plane=controller-manager --previous +``` + +**Common Causes**: +- Missing RBAC permissions for new resources +- Invalid environment variables +- Certificate issues + +**Solution**: Check Helm values and ensure all required permissions are granted. 
+ +#### Reconciliation Loops + +**Symptom**: Operator continuously reconciles resources without reaching stable state + +```bash +# Watch operator logs for repeated reconciliation +kubectl logs -n posit-team-system deployment/team-operator-controller-manager -f | grep "Reconciling" +``` + +**Solution**: Check for label/annotation conflicts or resources being modified by multiple controllers. + +#### Database Connection Errors + +**Symptom**: Products fail to start due to database errors + +```bash +# Check database connectivity +kubectl logs -n posit-team | grep -i database +``` + +**Solution**: Verify database credentials in secrets and ensure network policies allow database access. + +### Getting Help + +If you encounter issues not covered in this guide: + +1. **Check Operator Logs**: + ```bash + kubectl logs -n posit-team-system deployment/team-operator-controller-manager --tail=200 + ``` + +2. **Review GitHub Issues**: Check [existing issues](https://github.com/posit-dev/team-operator/issues) + +3. **Contact Support**: [Contact Posit](https://posit.co/schedule-a-call/) for enterprise support + +4. 
**Collect Diagnostic Information**: + ```bash + # Create a diagnostic bundle + kubectl get all -n posit-team-system -o yaml > diag-system.yaml + kubectl get sites,workbenches,connects,packagemanagers -A -o yaml > diag-resources.yaml + kubectl logs -n posit-team-system deployment/team-operator-controller-manager > diag-logs.txt + ``` + +## Related Documentation + +- [Helm Chart README](../../dist/chart/README.md) - Installation and configuration reference +- [Site Management Guide](./product-team-site-management.md) - Managing Posit Team sites +- [CHANGELOG](../../CHANGELOG.md) - Version history and release notes diff --git a/docs/guides/workbench-configuration.md b/docs/guides/workbench-configuration.md new file mode 100644 index 00000000..9915e0e9 --- /dev/null +++ b/docs/guides/workbench-configuration.md @@ -0,0 +1,967 @@ +# Workbench Configuration Guide + +This guide covers comprehensive configuration of Posit Workbench in Team Operator, including all available options, authentication, off-host execution, IDE settings, data integrations, and advanced features. + +## Overview + +Posit Workbench provides an interactive development environment for data science teams. In Team Operator, Workbench runs on Kubernetes with off-host execution enabled by default, meaning user sessions run as separate Kubernetes Jobs rather than on the Workbench server pod itself. + +When configured via a Site resource, Workbench: +- Uses the Kubernetes Job Launcher for session management +- Supports multiple IDEs (RStudio, VS Code, Positron, Jupyter) +- Integrates with Site-level authentication +- Provides load balancing across multiple replicas +- Connects to data platforms like Databricks and Snowflake + +## Table of Contents + +1. [Basic Configuration](#basic-configuration) +2. [Authentication](#authentication) +3. [Off-Host Execution / Kubernetes Launcher](#off-host-execution--kubernetes-launcher) +4. [IDE Configuration](#ide-configuration) +5. 
[Data Integrations](#data-integrations) +6. [Session Customization](#session-customization) +7. [Non-Root Execution Mode](#non-root-execution-mode) +8. [Experimental Features](#experimental-features) +9. [Example Configurations](#example-configurations) +10. [Troubleshooting](#troubleshooting) + +--- + +## Basic Configuration + +### Image and Resources + +Configure the Workbench server image and basic settings: + +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Site +metadata: + name: my-site + namespace: posit-team +spec: + workbench: + # Server image (required) + image: "ghcr.io/posit-dev/workbench:jammy-2024.12.0" + + # Image pull policy + imagePullPolicy: IfNotPresent + + # Number of replicas (enables load balancing when > 1) + replicas: 2 + + # URL prefix for ingress (default: "workbench") + domainPrefix: workbench +``` + +### Licensing + +Workbench requires a valid license. Configure via Kubernetes Secret: + +```yaml +spec: + workbench: + license: + type: FILE + existingSecretName: license + existingSecretKey: pw.lic +``` + +License types: +- `FILE`: License file stored in a Kubernetes Secret +- `KEY`: License key as an environment variable + +### Volume Configuration + +Workbench uses persistent storage for user home directories: + +```yaml +spec: + workbench: + # Primary volume for /home directories + volume: + create: true + size: "100Gi" + accessModes: + - "ReadWriteMany" + storageClassName: "efs-sc" # Optional: use specific storage class + + # Additional volumes (mounted to all sessions) + additionalVolumes: + - pvcName: project-data + mountPath: /mnt/projects + readOnly: false + - pvcName: shared-datasets + mountPath: /mnt/datasets + readOnly: true +``` + +When `replicas > 1`, a shared storage volume is automatically created at `/mnt/shared-storage` for load balancing state. 
+ +### Node Placement + +Control where Workbench server pods are scheduled: + +```yaml +spec: + workbench: + # Node selector for server pods + nodeSelector: + node-type: posit-products + + # Tolerations for server pods + tolerations: + - key: "dedicated" + operator: "Equal" + value: "posit" + effect: "NoSchedule" +``` + +### Environment Variables + +Add custom environment variables to the Workbench server: + +```yaml +spec: + workbench: + addEnv: + R_LIBS_SITE: "/opt/R/libraries" + MY_CUSTOM_VAR: "value" +``` + +--- + +## Authentication + +Workbench integrates with Site-level authentication. Supported methods: + +### OIDC Authentication + +```yaml +spec: + workbench: + auth: + type: "oidc" + clientId: "workbench-client-id" + issuer: "https://idp.example.com" + + # Claim mappings + usernameClaim: "preferred_username" # Optional + + # Request scopes (optional) + scopes: + - "openid" + - "profile" + - "email" +``` + +### SAML Authentication + +```yaml +spec: + workbench: + auth: + type: "saml" + samlMetadataUrl: "https://idp.example.com/metadata" + + # Attribute mappings (optional if using a profile) + samlIdPAttributeProfile: "azure" # Use preset profile + # Or specify custom attributes: + samlUsernameAttribute: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name" + samlEmailAttribute: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress" +``` + +### Password Authentication + +For development environments only: + +```yaml +spec: + workbench: + auth: + type: "password" +``` + +### User Provisioning + +Control automatic user account creation: + +```yaml +spec: + workbench: + # Automatically create user accounts on first login + createUsersAutomatically: true + + # Groups with admin dashboard access + adminGroups: + - workbench-admin + - platform-admins + + # Groups with superuser (full) admin access + adminSuperuserGroups: + - workbench-superadmin +``` + +### Custom Login Page + +Customize the login page with HTML content: + +```yaml +spec: + 
workbench: + authLoginPageHtml: | +
+

Welcome to Data Science Platform

+

Please log in with your corporate credentials.

+
+``` + +The HTML content is mounted at `/etc/rstudio/login.html` and must be less than 64KB. + +--- + +## Off-Host Execution / Kubernetes Launcher + +Off-host execution runs user sessions as Kubernetes Jobs, providing isolation, resource management, and scalability. **This is enabled by default** in Team Operator. + +### How It Works + +1. User requests a session (RStudio, VS Code, Jupyter, etc.) +2. Workbench's Kubernetes Launcher creates a Kubernetes Job +3. The session runs in its own pod with configured resources +4. Session connects back to Workbench for proxying and management + +### Session Images + +Configure the container images available for sessions: + +```yaml +spec: + workbench: + # Default session image + defaultSessionImage: "ghcr.io/posit-dev/workbench-session:jammy-2024.12.0" + + # Additional session images users can select + extraSessionImages: + - "ghcr.io/posit-dev/workbench-session:gpu-2024.12.0" + - "ghcr.io/posit-dev/workbench-session:ml-2024.12.0" + - "custom-registry.io/custom-session:latest" +``` + +### Session Init Containers + +Configure init containers that run before session containers: + +```yaml +spec: + workbench: + # Init container image for sessions + sessionInitContainerImageName: "busybox" + sessionInitContainerImageTag: "latest" +``` + +### Resource Profiles + +Resource profiles define CPU and memory allocations that users can select: + +```yaml +spec: + workbench: + experimentalFeatures: + resourceProfiles: + default: + name: "Small" + cpus: "1" + memMb: "2000" + medium: + name: "Medium" + cpus: "2" + memMb: "4000" + large: + name: "Large" + cpus: "4" + memMb: "8000" + gpu: + name: "GPU Enabled" + cpus: "4" + memMb: "16000" + nvidiaGpus: "1" + placementConstraints: "node-type:gpu" +``` + +**Resource Profile Fields:** + +| Field | Description | +|-------|-------------| +| `name` | Display name in UI | +| `cpus` | CPU limit | +| `cpusRequest` | CPU request (defaults to ratio of limit) | +| `memMb` | Memory limit in MB | +| 
`memMbRequest` | Memory request (defaults to ratio of limit) | +| `nvidiaGpus` | NVIDIA GPU count | +| `amdGpus` | AMD GPU count | +| `placementConstraints` | Node selector as `key:value` pairs | + +### Request Ratios + +Control the ratio of requests to limits for session pods: + +```yaml +spec: + workbench: + experimentalFeatures: + # CPU requests = limits * 0.6 (default) + cpuRequestRatio: "0.6" + + # Memory requests = limits * 0.8 (default) + memoryRequestRatio: "0.8" +``` + +### Session Configuration Details + +Sessions are configured via launcher templates. The operator manages: + +- `job.tpl` - Kubernetes Job template +- `service.tpl` - Service template for session connectivity +- `rstudio-library-templates-data.tpl` - Configuration data injected into templates + +--- + +## IDE Configuration + +### RStudio IDE + +RStudio is enabled by default. Configure via the Workbench spec: + +```yaml +spec: + workbench: + experimentalFeatures: + # First project template path + firstProjectTemplatePath: "/opt/templates/default-project" + + # Session save behavior: "no", "ask", or "yes" + sessionSaveActionDefault: "no" # Recommended for Kubernetes +``` + +### VS Code / Code Server + +```yaml +spec: + workbench: + # VS Code extensions to pre-install + vsCodeExtensions: + - "ms-python.python" + - "quarto.quarto" + - "posit.shiny" + - "REditorSupport.r" + + # VS Code user settings (JSON) + vsCodeUserSettings: + editor.fontSize: + raw: "14" + editor.tabSize: + raw: "2" + + # VS Code-specific settings + vsCodeConfig: + enabled: 1 # 1 = enabled (default) + sessionTimeoutKillHours: 1 + + experimentalFeatures: + # Custom VS Code executable path + vsCodePath: "/opt/code-server/bin/code-server" + + # Extensions directory for shared extensions + vsCodeExtensionsDir: "/mnt/extensions/vscode" +``` + +### Positron IDE + +Positron is Posit's next-generation IDE. 
Enable and configure: + +```yaml +spec: + workbench: + positronConfig: + enabled: 1 + exe: "/opt/positron/bin/positron" + args: "--host=0.0.0.0" + + # Default session image for Positron + defaultSessionContainerImage: "ghcr.io/posit-dev/positron-session:latest" + + # Additional Positron session images + sessionContainerImages: + - "ghcr.io/posit-dev/positron-session:gpu" + + # Session behavior + sessionNoProfile: 1 # Skip .profile loading + userDataDir: "/home/{user}/.positron" + allowFileDownloads: 1 + allowFileUploads: 1 + sessionTimeoutKillHours: 24 + + # Positron extensions + extensions: + - "posit.positron-r" + - "posit.positron-python" + + # User settings (JSON) + userSettings: + editor.fontSize: + raw: "14" +``` + +### Jupyter Notebooks and JupyterLab + +```yaml +spec: + workbench: + jupyterConfig: + # Enable Jupyter Notebook Classic + notebooksEnabled: 1 + + # Enable JupyterLab (default: enabled) + labsEnabled: 1 + + # Custom Jupyter executable + jupyterExe: "/opt/python/bin/jupyter" + + # Version detection (default: "auto") + labVersion: "auto" + notebookVersion: "auto" + + # Idle kernel culling (minutes) + sessionCullMinutes: 120 + + # Shutdown after idle (minutes) + sessionShutdownMinutes: 5 + + # Default session image for Jupyter + defaultSessionContainerImage: "ghcr.io/posit-dev/jupyter-session:latest" +``` + +--- + +## Data Integrations + +### Databricks Integration + +Connect to one or more Databricks workspaces: + +```yaml +spec: + workbench: + databricks: + production: + name: "Production Workspace" + url: "https://production.cloud.databricks.com" + clientId: "databricks-app-client-id" + tenantId: "azure-tenant-id" # For Azure Databricks + + development: + name: "Development Workspace" + url: "https://dev.cloud.databricks.com" + clientId: "databricks-dev-client-id" + + experimentalFeatures: + # Force enable Databricks pane even without managed credentials + databricksForceEnabled: true +``` + +**Note:** Databricks client secrets must be stored in 
the site secret vault with keys like `dev-client-secret-{clientId}`. + +### Snowflake Integration + +```yaml +spec: + workbench: + snowflake: + accountId: "abc12345.us-east-1" + clientId: "snowflake-oauth-client-id" +``` + +The Snowflake client secret must be stored in the site secret vault as `snowflake-client-secret`. + +### DSN / ODBC Configuration + +Mount ODBC data source configurations into sessions: + +```yaml +spec: + workbench: + experimentalFeatures: + # Key in the site secret containing odbc.ini content + dsnSecret: "workbench-odbc-config" +``` + +The DSN file is mounted at `/etc/odbc.ini` in session pods. + +**Example odbc.ini content:** + +```ini +[PostgreSQL] +Driver = PostgreSQL +Server = postgres.example.com +Port = 5432 +Database = analytics + +[Snowflake] +Driver = Snowflake +Server = account.snowflakecomputing.com +Database = ANALYTICS +Schema = PUBLIC +``` + +--- + +## Session Customization + +### Session Tolerations + +Apply tolerations specifically to session pods (not the server): + +```yaml +spec: + workbench: + # Tolerations for Workbench server pods + tolerations: + - key: "dedicated" + operator: "Equal" + value: "posit-products" + effect: "NoSchedule" + + # Tolerations for session pods only + sessionTolerations: + - key: "dedicated" + operator: "Equal" + value: "workbench-sessions" + effect: "NoSchedule" + - key: "nvidia.com/gpu" + operator: "Exists" + effect: "NoSchedule" +``` + +### Session Node Selector + +The server-level `nodeSelector` is inherited by sessions. Sessions use placement constraints from resource profiles for additional targeting. 
+ +### Session Environment Variables + +Inject environment variables into all sessions: + +```yaml +spec: + workbench: + experimentalFeatures: + sessionEnvVars: + - name: "R_LIBS_USER" + value: "~/R/library" + - name: "DATABASE_URL" + valueFrom: + secretKeyRef: + name: db-credentials + key: url +``` + +### Session Service Account + +Specify a custom service account for session pods: + +```yaml +spec: + workbench: + experimentalFeatures: + sessionServiceAccountName: "workbench-session-sa" +``` + +### Session Image Pull Policy + +Control when session images are pulled: + +```yaml +spec: + workbench: + experimentalFeatures: + sessionImagePullPolicy: "Always" # Always, IfNotPresent, Never +``` + +### Launcher Environment (PATH) + +Customize the PATH for launcher sessions: + +```yaml +spec: + workbench: + experimentalFeatures: + launcherEnvPath: "/opt/R/4.3/bin:/opt/python/3.11/bin:/usr/local/bin:/usr/bin:/bin" +``` + +--- + +## Non-Root Execution Mode + +Enable "maximally rootless" execution for enhanced security: + +```yaml +spec: + workbench: + experimentalFeatures: + nonRoot: true +``` + +When enabled: +- Workbench launcher runs with `unprivileged=1` +- Custom supervisord configuration is deployed +- Secure cookie key file is relocated to `/mnt/secure/rstudio/` +- Launcher configuration is managed via mounted ConfigMaps + +**Requirements:** +- Compatible Workbench image version +- Proper file permissions on mounted volumes + +**Limitations:** +- Some features requiring root privileges may not work +- Not all Workbench functionality has been tested in non-root mode + +--- + +## Experimental Features + +The `experimentalFeatures` section contains advanced options. 
These are subject to change: + +```yaml +spec: + workbench: + experimentalFeatures: + # Enable managed credential jobs + enableManagedCredentialJobs: true + + # Non-root operation + nonRoot: false + + # Privileged sessions (for Docker-in-Docker) + privilegedSessions: false + + # Web server thread pool size (default: 16) + wwwThreadPoolSize: 32 + + # Session proxy timeout (default: 30 seconds) + launcherSessionsProxyTimeoutSeconds: 60 + + # Force admin UI even on Kubernetes + forceAdminUiEnabled: true + + # Chronicle sidecar API key injection + chronicleSidecarProductApiKeyEnabled: true +``` + +### Workbench API Settings + +Enable the Workbench REST API: + +```yaml +spec: + workbench: + apiSettings: + workbenchApiEnabled: 1 + workbenchApiAdminEnabled: 1 + workbenchApiSuperAdminEnabled: 1 +``` + +--- + +## Example Configurations + +### Minimal Development Setup + +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Site +metadata: + name: dev + namespace: posit-team +spec: + domain: dev.example.com + + secret: + type: "kubernetes" + vaultName: "dev-secrets" + + workbench: + image: "ghcr.io/posit-dev/workbench:jammy-2024.12.0" + license: + type: FILE + existingSecretName: license + existingSecretKey: pw.lic + auth: + type: "password" + createUsersAutomatically: true +``` + +### Production Multi-IDE Setup + +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Site +metadata: + name: production + namespace: posit-team +spec: + domain: posit.example.com + + secret: + type: "aws" + vaultName: "production-secrets" + + volumeSource: + type: fsx-zfs + volumeId: fsvol-abcdef123456 + dnsName: fs-abcdef123456.fsx.us-east-1.amazonaws.com + + workbench: + image: "ghcr.io/posit-dev/workbench:jammy-2024.12.0" + replicas: 3 + + license: + type: FILE + existingSecretName: license + existingSecretKey: pw.lic + + auth: + type: "oidc" + clientId: "workbench-prod" + issuer: "https://idp.example.com" + + createUsersAutomatically: true + adminGroups: + - platform-admins + 
adminSuperuserGroups: + - workbench-superadmins + + defaultSessionImage: "ghcr.io/posit-dev/workbench-session:jammy-2024.12.0" + extraSessionImages: + - "ghcr.io/posit-dev/workbench-session:gpu-2024.12.0" + + nodeSelector: + node-type: posit-products + + sessionTolerations: + - key: "dedicated" + operator: "Equal" + value: "workbench-sessions" + effect: "NoSchedule" + + vsCodeExtensions: + - "ms-python.python" + - "quarto.quarto" + + positronConfig: + enabled: 1 + + databricks: + workspace: + name: "Analytics Workspace" + url: "https://analytics.cloud.databricks.com" + clientId: "databricks-client" + + experimentalFeatures: + resourceProfiles: + default: + name: "Small (1 CPU, 2GB)" + cpus: "1" + memMb: "2000" + medium: + name: "Medium (2 CPU, 4GB)" + cpus: "2" + memMb: "4000" + large: + name: "Large (4 CPU, 8GB)" + cpus: "4" + memMb: "8000" + gpu: + name: "GPU (4 CPU, 16GB, 1 GPU)" + cpus: "4" + memMb: "16000" + nvidiaGpus: "1" +``` + +### GPU-Enabled Data Science Platform + +```yaml +apiVersion: core.posit.team/v1beta1 +kind: Site +metadata: + name: ml-platform + namespace: posit-team +spec: + domain: ml.example.com + + workbench: + image: "ghcr.io/posit-dev/workbench:jammy-2024.12.0" + replicas: 2 + + defaultSessionImage: "ghcr.io/posit-dev/workbench-session:ml-2024.12.0" + extraSessionImages: + - "ghcr.io/posit-dev/workbench-session:gpu-pytorch" + - "ghcr.io/posit-dev/workbench-session:gpu-tensorflow" + + sessionTolerations: + - key: "nvidia.com/gpu" + operator: "Exists" + effect: "NoSchedule" + + experimentalFeatures: + resourceProfiles: + cpu-small: + name: "CPU Small" + cpus: "2" + memMb: "4000" + cpu-large: + name: "CPU Large" + cpus: "8" + memMb: "32000" + gpu-single: + name: "Single GPU" + cpus: "4" + memMb: "32000" + nvidiaGpus: "1" + placementConstraints: "node-type:gpu" + gpu-multi: + name: "Multi GPU" + cpus: "8" + memMb: "64000" + nvidiaGpus: "4" + placementConstraints: "node-type:gpu-multi" +``` + +--- + +## Troubleshooting + +### Common Issues + 
+#### Sessions Not Starting + +1. **Check launcher logs:** + ```bash + kubectl logs -n posit-team deploy/-workbench | grep -i launcher + ``` + +2. **Verify session service account exists:** + ```bash + kubectl get sa -workbench-session -n posit-team + ``` + +3. **Check for pending session jobs:** + ```bash + kubectl get jobs -n posit-team -l posit.team/component=workbench-session + ``` + +4. **Verify session image is pullable:** + ```bash + kubectl run test --image= --rm -it --command -- echo "Success" + ``` + +#### Authentication Failures + +1. **Check OIDC configuration:** + - Verify issuer URL is accessible from the cluster + - Confirm client ID matches IdP configuration + - Check that redirect URIs are configured in IdP + +2. **View authentication logs:** + ```bash + kubectl logs -n posit-team deploy/-workbench | grep -i auth + ``` + +3. **Verify secrets exist:** + ```bash + kubectl get secret -workbench-config -n posit-team + ``` + +#### Session Resource Issues + +1. **Check resource profile configuration:** + ```bash + kubectl get configmap -workbench -n posit-team -o yaml | grep -A 50 "launcher.kubernetes.resources.conf" + ``` + +2. **Verify nodes have capacity:** + ```bash + kubectl describe nodes | grep -A 10 "Allocated resources" + ``` + +3. **Check session pod events:** + ```bash + kubectl describe pod -n posit-team + ``` + +#### Volume Mount Issues + +1. **Verify PVC exists and is bound:** + ```bash + kubectl get pvc -n posit-team | grep workbench + ``` + +2. **Check volume permissions in session:** + ```bash + kubectl exec -it -n posit-team -- ls -la /home + ``` + +3. 
**Verify storage class supports RWX:** + ```bash + kubectl get storageclass -o yaml + ``` + +### Useful Commands + +```bash +# List all Workbench resources +kubectl get workbench -n posit-team + +# Describe Workbench configuration +kubectl describe workbench -n posit-team + +# View Workbench ConfigMap +kubectl get configmap -workbench -n posit-team -o yaml + +# Check session template ConfigMap +kubectl get configmap -workbench-templates -n posit-team -o yaml + +# List active sessions +kubectl get jobs -n posit-team -l posit.team/component=workbench-session + +# View session logs +kubectl logs job/ -n posit-team + +# Force restart Workbench +kubectl rollout restart deploy/-workbench -n posit-team +``` + +### Log Levels + +Enable debug logging for troubleshooting: + +```yaml +spec: + debug: true + logFormat: json # Optional: use JSON for log aggregation +``` + +Debug logging increases verbosity for: +- Launcher operations +- Authentication flows +- Session lifecycle events +- Database operations + +--- + +## Related Documentation + +- [Site Management Guide](product-team-site-management.md) +- [Adding Config Options](adding-config-options.md) - For contributors extending Workbench configuration +- [Posit Workbench Admin Guide](https://docs.posit.co/ide/server-pro/) +- [Kubernetes Job Launcher Documentation](https://docs.posit.co/ide/server-pro/integration/launcher-kubernetes.html)