diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 00000000..f8356f7b --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,150 @@ +# Contributing to Druid Operator + +First off, thanks for taking the time to contribute to the Druid Operator! πŸŽ‰ + +The following is a set of guidelines for contributing to Druid Operator and its packages. These are mostly guidelines, not rules. Use your best judgment, and feel free to propose changes to this document in a pull request. + +## Table of Contents + +- [Prerequisites](#prerequisites) + - [Windows](#windows) + - [MacOS](#macos) + - [Linux](#linux) +- [Development Setup](#development-setup) + - [1. Fork and Clone](#1-fork-and-clone) + - [2. Create a Local Cluster](#2-create-a-local-cluster) + - [3. Install Dependencies](#3-install-dependencies) + - [4. Run the Operator](#4-run-the-operator) +- [Testing](#testing) + - [Unit Tests](#unit-tests) + - [End-to-End Tests](#end-to-end-tests) +- [Project Structure](#project-structure) + +## Prerequisites + +You will need the following tools installed on your development machine: + +* **Go** (v1.20+) +* **Docker** (v20.10+) +* **Kind** (v0.20+) +* **Kubectl** (latest) +* **Helm** (v3+) +* **Make** + +### Windows + +**Recommended**: Use Docker Desktop with WSL 2 backend. 
+ +Run the following commands in PowerShell (Admin): + +```powershell +# Install core tools +winget install -e --id GoLang.Go +winget install -e --id Docker.DockerDesktop +winget install -e --id Kubernetes.kind +winget install -e --id Kubernetes.kubectl +winget install -e --id Helm.Helm +winget install -e --id GnuWin32.Make +``` + +### MacOS + +Using [Homebrew](https://brew.sh/): + +```bash +brew install go +brew install --cask docker +brew install kind +brew install kubectl +brew install helm +brew install make +``` + +### Linux + +Using `apt` (Ubuntu/Debian) or `brew` (Linuxbrew): + +```bash +# Using Linuxbrew (Recommended for unified versioning) +brew install go kind kubectl helm make + +# OR using apt (Ubuntu) +sudo apt update +sudo apt install -y golang-go make +# For Docker, Kind, Kubectl, and Helm, please refer to their official installation guides +# as apt repositories might lag behind. +``` + +## Development Setup + +### 1. Fork and Clone + +1. Fork the [druid-operator repository](https://github.com/datainfrahq/druid-operator) on GitHub. +2. Clone your fork locally: + +```bash +git clone https://github.com/<your-username>/druid-operator.git +cd druid-operator +``` + +### 2. Create a Local Cluster + +We use **Kind** (Kubernetes in Docker) for local development. + +```bash +kind create cluster --name druid +``` + +### 3. Install Dependencies + +Deploy the Druid Operator using Helm to set up CRDs and basic resources. + +```bash +# Add the DataInfra Helm repo +helm repo add datainfra https://charts.datainfra.io +helm repo update + +# Install the operator (this installs CRDs and the controller) +helm -n druid-operator-system upgrade -i --create-namespace cluster-druid-operator datainfra/druid-operator +``` + +### 4. Run the Operator + +You can run the operator source code locally against your Kind cluster. This is useful for rapid development without building Docker images for every change. 
+ +```bash +# Verify you are pointing to the correct context +kubectl config use-context kind-druid + +# Run the controller locally +make run +``` + +The operator logs will appear in your terminal. + +## Testing + +### Unit Tests + +Run the unit tests to verify your changes. + +```bash +make test +``` + +### End-to-End Tests + +To run the full end-to-end suite (this spins up a Kind cluster and runs validation): + +```bash +make e2e +``` + +## Project Structure + +* `apis/`: Kubernetes API definitions (CRDs). +* `controllers/`: Core controller logic using Kubebuilder. +* `chart/`: Helm chart for the operator. +* `e2e/`: End-to-End test scripts and configurations. +* `docs/`: Documentation files. +* `Makefile`: Build and test automation commands. diff --git a/learning-docs/01-introduction/README.md b/learning-docs/01-introduction/README.md new file mode 100644 index 00000000..c627f1f5 --- /dev/null +++ b/learning-docs/01-introduction/README.md @@ -0,0 +1,114 @@ +# 1. Introduction - What is This Project? + +## The Problem This Solves + +Imagine you want to run Apache Druid (a real-time analytics database) on Kubernetes. Druid is a **distributed system** with multiple components: + +- **Coordinator** - Manages data availability +- **Overlord** - Manages data ingestion tasks +- **Broker** - Handles queries from clients +- **Router** - Routes requests to the right service +- **Historical** - Stores and serves historical data +- **MiddleManager/Indexer** - Handles data ingestion + +To run Druid manually on Kubernetes, you would need to create: +- Multiple StatefulSets (one for each component) +- Multiple ConfigMaps (configuration files) +- Multiple Services (for networking) +- PersistentVolumeClaims (for storage) +- And more... + +This could be **50+ YAML files** that you need to manage, update, and keep in sync! 
+ +## The Solution: An Operator + +This operator lets you define your entire Druid cluster in **ONE simple YAML file**: + +```yaml +apiVersion: druid.apache.org/v1alpha1 +kind: Druid +metadata: + name: my-druid-cluster +spec: + image: apache/druid:25.0.0 + nodes: + brokers: + nodeType: broker + replicas: 2 + historicals: + nodeType: historical + replicas: 3 + # ... other nodes +``` + +The operator then: +1. **Reads** this YAML file +2. **Creates** all necessary Kubernetes resources automatically +3. **Monitors** the cluster continuously +4. **Heals** the cluster if something goes wrong +5. **Updates** the cluster when you change the YAML + +## Key Concepts in This Project + +### 1. Custom Resource Definition (CRD) +A CRD extends Kubernetes with new resource types. This project defines two CRDs: +- `Druid` - Represents a Druid cluster +- `DruidIngestion` - Represents a data ingestion job + +### 2. Custom Resource (CR) +A CR is an instance of a CRD. When you create a YAML file with `kind: Druid`, you're creating a CR. + +### 3. Controller +The controller is the "brain" that watches for CRs and takes action. It runs in a loop: +1. Watch for changes to Druid CRs +2. Compare desired state (what the CR says) vs actual state (what exists in K8s) +3. Take action to make actual state match desired state + +### 4. Reconciliation +The process of making the actual state match the desired state is called "reconciliation." 
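The watch/compare/act loop described above can be sketched with plain Go maps, with no Kubernetes client involved. This is a toy model for intuition only, not the operator's actual code: the `reconcileActions` function, the component names, and the replica counts are all illustrative, and the real controller compares full Kubernetes objects rather than integers.

```go
package main

import (
	"fmt"
	"sort"
)

// reconcileActions is a toy reconciliation step: desired and actual map a
// component name (e.g. "broker") to a replica count, and the result is the
// list of actions needed to make actual match desired.
func reconcileActions(desired, actual map[string]int) []string {
	var actions []string
	for name, want := range desired {
		have, exists := actual[name]
		switch {
		case !exists:
			actions = append(actions, fmt.Sprintf("create %s with %d replicas", name, want))
		case have != want:
			actions = append(actions, fmt.Sprintf("scale %s from %d to %d", name, have, want))
		}
	}
	for name := range actual {
		if _, wanted := desired[name]; !wanted {
			// Present in the cluster but absent from the CR: garbage-collect it.
			actions = append(actions, fmt.Sprintf("delete %s", name))
		}
	}
	sort.Strings(actions) // deterministic output order
	return actions
}

func main() {
	desired := map[string]int{"broker": 2, "historical": 3}
	actual := map[string]int{"broker": 1, "router": 1}
	for _, a := range reconcileActions(desired, actual) {
		fmt.Println(a)
	}
}
```

In the real operator this comparison runs against live StatefulSets, Services, and ConfigMaps, and the reconcile function is re-invoked whenever a watched resource changes, so the loop converges on the desired state over repeated passes.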
+ +## Project Components + +``` +druid-operator/ +β”œβ”€β”€ apis/ # CRD definitions (what a Druid CR looks like) +β”œβ”€β”€ controllers/ # Controller logic (what to do when CR changes) +β”œβ”€β”€ config/ # Kubernetes manifests for deploying the operator +β”œβ”€β”€ chart/ # Helm chart for easy installation +β”œβ”€β”€ examples/ # Example Druid cluster configurations +β”œβ”€β”€ main.go # Entry point - starts the operator +└── docs/ # Documentation +``` + +## How It Works (High Level) + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ Kubernetes Cluster β”‚ +β”‚ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ You β”‚ β”‚ Druid Operator β”‚ β”‚ +β”‚ β”‚ (User) β”‚ β”‚ β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ +β”‚ β”‚ β”‚ β”‚ Controller β”‚ β”‚ β”‚ +β”‚ β”‚ kubectl apply β”‚ β”‚ β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ β”‚ 1. Watch Druid CRs β”‚ β”‚ β”‚ +β”‚ β–Ό β”‚ β”‚ 2. Compare states β”‚ β”‚ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ 3. 
Create/Update/Delete β”‚ β”‚ β”‚ +β”‚ β”‚ Druid CR │◄───────── β”‚ K8s resources β”‚ β”‚ β”‚ +β”‚ β”‚ (YAML) β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ β”‚ +β”‚ β”‚ Operator creates these automatically: β”‚ +β”‚ β–Ό β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ StatefulSets, Services, ConfigMaps, PVCs, etc. β”‚ β”‚ +β”‚ β”‚ (All the resources needed to run Druid) β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +## Next Steps + +Continue to [Prerequisites & Learning Path](../02-prerequisites/README.md) to understand what technologies you need to learn. diff --git a/learning-docs/02-prerequisites/README.md b/learning-docs/02-prerequisites/README.md new file mode 100644 index 00000000..a66d3f9a --- /dev/null +++ b/learning-docs/02-prerequisites/README.md @@ -0,0 +1,161 @@ +# 2. Prerequisites & Learning Path + +## What You Need to Learn + +As a Java developer with basic Kubernetes knowledge, here's your learning roadmap: + +## 1. Go Programming Language (Essential) + +### Why Go for Kubernetes? 
+- Kubernetes itself is written in Go +- All official Kubernetes libraries are in Go +- Go compiles to a single binary (easy to deploy) +- Go has excellent concurrency support (goroutines) +- Go is simpler than Java (no classes, no inheritance) + +### What to Learn in Go + +| Topic | Priority | Time Estimate | +|-------|----------|---------------| +| Basic syntax (variables, functions, loops) | High | 2-3 hours | +| Structs and methods | High | 2 hours | +| Interfaces | High | 2 hours | +| Pointers | High | 1-2 hours | +| Error handling | High | 1 hour | +| Packages and modules | High | 1 hour | +| Goroutines and channels | Medium | 2-3 hours | +| Context package | Medium | 1 hour | + +### Recommended Resources +1. **A Tour of Go** (official): https://go.dev/tour/ +2. **Go by Example**: https://gobyexample.com/ +3. **Effective Go**: https://go.dev/doc/effective_go + +--- + +## 2. Kubernetes Concepts (Essential) + +### Core Concepts You Must Know + +| Concept | What It Is | Why It Matters | +|---------|------------|----------------| +| **Pod** | Smallest deployable unit | Druid nodes run in Pods | +| **Deployment** | Manages stateless Pods | Some Druid nodes use this | +| **StatefulSet** | Manages stateful Pods | Most Druid nodes use this | +| **Service** | Network endpoint for Pods | How Druid nodes communicate | +| **ConfigMap** | Configuration storage | Druid configuration files | +| **Secret** | Sensitive data storage | Passwords, credentials | +| **PersistentVolumeClaim** | Storage request | Druid data storage | +| **Namespace** | Resource isolation | Organize resources | + +### Advanced Concepts for Operators + +| Concept | What It Is | Why It Matters | +|---------|------------|----------------| +| **Custom Resource Definition (CRD)** | Extends K8s API | Defines what "Druid" means | +| **Custom Resource (CR)** | Instance of CRD | Your Druid cluster definition | +| **Controller** | Watches and acts on resources | The operator's brain | +| **Finalizer** | 
Cleanup hook | Clean up when Druid is deleted | +| **Owner Reference** | Parent-child relationship | Automatic garbage collection | + +### Recommended Resources +1. **Kubernetes Basics**: https://kubernetes.io/docs/tutorials/kubernetes-basics/ +2. **Kubernetes Concepts**: https://kubernetes.io/docs/concepts/ + +--- + +## 3. Controller-Runtime Library (Important) + +This is the Go library used to build Kubernetes operators. + +### Key Concepts + +| Concept | Description | +|---------|-------------| +| **Manager** | Runs controllers, handles leader election | +| **Controller** | Watches resources, triggers reconciliation | +| **Reconciler** | Your logic - what to do when resource changes | +| **Client** | Talks to Kubernetes API | +| **Scheme** | Knows about resource types | + +### Recommended Resources +1. **Kubebuilder Book**: https://book.kubebuilder.io/ +2. **Controller-Runtime Docs**: https://pkg.go.dev/sigs.k8s.io/controller-runtime + +--- + +## 4. Apache Druid (Good to Know) + +Understanding Druid helps you understand why the operator is designed this way. + +### Druid Components + +| Component | Role | Stateful? | +|-----------|------|-----------| +| **Coordinator** | Manages data distribution | Yes | +| **Overlord** | Manages ingestion tasks | Yes | +| **Broker** | Query routing | No | +| **Router** | API gateway | No | +| **Historical** | Serves historical data | Yes | +| **MiddleManager** | Runs ingestion tasks | Yes | + +### Recommended Resources +1. **Druid Introduction**: https://druid.apache.org/docs/latest/design/ +2. 
**Druid Architecture**: https://druid.apache.org/docs/latest/design/architecture.html + +--- + +## Learning Order Recommendation + +``` +Week 1: Go Basics +β”œβ”€β”€ Day 1-2: Go syntax, types, functions +β”œβ”€β”€ Day 3-4: Structs, methods, interfaces +└── Day 5-7: Packages, error handling, pointers + +Week 2: Kubernetes Deep Dive +β”œβ”€β”€ Day 1-2: Pods, Deployments, StatefulSets +β”œβ”€β”€ Day 3-4: Services, ConfigMaps, PVCs +└── Day 5-7: CRDs, Controllers concept + +Week 3: Operator Development +β”œβ”€β”€ Day 1-3: Kubebuilder tutorial +β”œβ”€β”€ Day 4-5: Controller-runtime basics +└── Day 6-7: Study this codebase + +Week 4: Apache Druid +β”œβ”€β”€ Day 1-3: Druid architecture +β”œβ”€β”€ Day 4-5: Druid configuration +└── Day 6-7: Run Druid locally +``` + +--- + +## Quick Self-Assessment + +Before diving into the code, make sure you can answer: + +### Go +- [ ] What's the difference between `var x int` and `x := 0`? +- [ ] How do you define a method on a struct? +- [ ] What's an interface in Go? +- [ ] How does error handling work in Go? + +### Kubernetes +- [ ] What's the difference between Deployment and StatefulSet? +- [ ] What is a Service and why do we need it? +- [ ] What is a ConfigMap used for? +- [ ] What happens when you `kubectl apply -f file.yaml`? + +### Operators +- [ ] What is a CRD? +- [ ] What is reconciliation? +- [ ] What is a controller? + +If you can't answer these, spend more time on the prerequisites before diving into the code. + +--- + +## Next Steps + +Continue to [Go vs Java](../03-go-vs-java/README.md) to understand Go from a Java developer's perspective. diff --git a/learning-docs/03-go-vs-java/README.md b/learning-docs/03-go-vs-java/README.md new file mode 100644 index 00000000..7aebbe16 --- /dev/null +++ b/learning-docs/03-go-vs-java/README.md @@ -0,0 +1,405 @@ +# 3. Go vs Java - A Java Developer's Guide to Go + +## Why Go Instead of Java? 
+ +| Aspect | Java | Go | Winner for K8s | +|--------|------|-----|----------------| +| **Startup Time** | Slow (JVM warmup) | Fast (native binary) | Go βœ“ | +| **Memory Usage** | High (JVM overhead) | Low | Go βœ“ | +| **Deployment** | JAR + JVM | Single binary | Go βœ“ | +| **Concurrency** | Threads (heavy) | Goroutines (light) | Go βœ“ | +| **Kubernetes Libraries** | Third-party | Official | Go βœ“ | +| **Learning Curve** | Complex | Simple | Go βœ“ | + +### The Real Reason: Kubernetes is Written in Go +All official Kubernetes libraries, tools, and examples are in Go. Using Go means: +- Better library support +- More examples to learn from +- Easier integration with K8s ecosystem + +--- + +## Syntax Comparison + +### 1. Hello World + +**Java:** +```java +public class Main { + public static void main(String[] args) { + System.out.println("Hello, World!"); + } +} +``` + +**Go:** +```go +package main + +import "fmt" + +func main() { + fmt.Println("Hello, World!") +} +``` + +### 2. Variables + +**Java:** +```java +int x = 10; +String name = "John"; +final int CONSTANT = 100; +``` + +**Go:** +```go +var x int = 10 // Explicit type +x := 10 // Type inference (short declaration) +name := "John" // String +const CONSTANT = 100 // Constant +``` + +### 3. Functions + +**Java:** +```java +public int add(int a, int b) { + return a + b; +} + +// Multiple return values? Use a class or array +public int[] divideAndRemainder(int a, int b) { + return new int[]{a / b, a % b}; +} +``` + +**Go:** +```go +func add(a int, b int) int { + return a + b +} + +// Multiple return values are native! +func divideAndRemainder(a, b int) (int, int) { + return a / b, a % b +} + +// Named return values +func divide(a, b int) (quotient int, remainder int) { + quotient = a / b + remainder = a % b + return // Returns named values +} +``` + +### 4. 
Classes vs Structs + +**Java (Class with methods):** +```java +public class Person { + private String name; + private int age; + + public Person(String name, int age) { + this.name = name; + this.age = age; + } + + public String getName() { + return name; + } + + public void greet() { + System.out.println("Hello, I'm " + name); + } +} + +// Usage +Person p = new Person("John", 30); +p.greet(); +``` + +**Go (Struct with methods):** +```go +type Person struct { + Name string // Uppercase = public + age int // Lowercase = private +} + +// Constructor function (convention, not language feature) +func NewPerson(name string, age int) *Person { + return &Person{ + Name: name, + age: age, + } +} + +// Method on Person (receiver) +func (p *Person) Greet() { + fmt.Println("Hello, I'm", p.Name) +} + +// Usage +p := NewPerson("John", 30) +p.Greet() +``` + +### 5. Interfaces + +**Java:** +```java +public interface Animal { + void speak(); +} + +public class Dog implements Animal { + @Override + public void speak() { + System.out.println("Woof!"); + } +} +``` + +**Go (Implicit interfaces - no "implements" keyword!):** +```go +type Animal interface { + Speak() +} + +type Dog struct{} + +// Dog automatically implements Animal because it has Speak() +func (d Dog) Speak() { + fmt.Println("Woof!") +} + +// Usage +var animal Animal = Dog{} +animal.Speak() +``` + +### 6. 
Error Handling + +**Java (Exceptions):** +```java +public String readFile(String path) throws IOException { + // May throw IOException + return Files.readString(Path.of(path)); +} + +// Usage +try { + String content = readFile("file.txt"); +} catch (IOException e) { + System.err.println("Error: " + e.getMessage()); +} +``` + +**Go (Explicit error returns):** +```go +func readFile(path string) (string, error) { + content, err := os.ReadFile(path) + if err != nil { + return "", err // Return error + } + return string(content), nil // nil = no error +} + +// Usage +content, err := readFile("file.txt") +if err != nil { + fmt.Println("Error:", err) + return +} +fmt.Println(content) +``` + +### 7. Inheritance vs Composition + +**Java (Inheritance):** +```java +public class Animal { + protected String name; + public void eat() { System.out.println("Eating..."); } +} + +public class Dog extends Animal { + public void bark() { System.out.println("Woof!"); } +} +``` + +**Go (Composition - No inheritance!):** +```go +type Animal struct { + Name string +} + +func (a *Animal) Eat() { + fmt.Println("Eating...") +} + +// Dog "embeds" Animal (composition) +type Dog struct { + Animal // Embedded struct +} + +func (d *Dog) Bark() { + fmt.Println("Woof!") +} + +// Usage +dog := Dog{Animal: Animal{Name: "Buddy"}} +dog.Eat() // Works! Promoted from Animal +dog.Bark() // Dog's own method +``` + +### 8. Null vs Nil + +**Java:** +```java +String s = null; +if (s == null) { + System.out.println("s is null"); +} +``` + +**Go:** +```go +var s *string = nil // Pointer to string, nil +if s == nil { + fmt.Println("s is nil") +} + +// Slices, maps, channels can also be nil +var slice []int = nil +var m map[string]int = nil +``` + +### 9. 
Collections + +**Java:** +```java +// List +List<String> list = new ArrayList<>(); +list.add("a"); +list.add("b"); + +// Map +Map<String, Integer> map = new HashMap<>(); +map.put("one", 1); +map.put("two", 2); +``` + +**Go:** +```go +// Slice (like ArrayList) +slice := []string{"a", "b"} +slice = append(slice, "c") + +// Map +m := map[string]int{ + "one": 1, + "two": 2, +} +m["three"] = 3 + +// Check if key exists +value, exists := m["one"] +if exists { + fmt.Println(value) +} +``` + +### 10. Concurrency + +**Java:** +```java +// Thread +Thread t = new Thread(() -> { + System.out.println("Running in thread"); +}); +t.start(); +t.join(); +``` + +**Go:** +```go +// Goroutine (much lighter than threads) +go func() { + fmt.Println("Running in goroutine") +}() + +// Channels for communication +ch := make(chan string) +go func() { + ch <- "Hello from goroutine" +}() +msg := <-ch // Receive from channel +fmt.Println(msg) +``` + +--- + +## Key Differences Summary + +| Feature | Java | Go | +|---------|------|-----| +| OOP | Classes, inheritance | Structs, composition | +| Interfaces | Explicit (`implements`) | Implicit (duck typing) | +| Errors | Exceptions | Return values | +| Null | `null` | `nil` | +| Generics | Yes (since Java 5) | Yes (since Go 1.18) | +| Visibility | `public/private/protected` | Uppercase/lowercase | +| Constructors | Constructor methods | Factory functions | +| Getters/Setters | Common pattern | Not idiomatic | +| Package management | Maven/Gradle | Go modules | + +--- + +## Go Idioms to Learn + +### 1. The Blank Identifier `_` +```go +// Ignore a return value +_, err := someFunction() + +// Import for side effects only +import _ "some/package" +``` + +### 2. Defer +```go +func readFile() { + file, _ := os.Open("file.txt") + defer file.Close() // Will run when function returns + + // ... use file +} +``` + +### 3. 
Type Assertions +```go +var i interface{} = "hello" + +s := i.(string) // Panic if not string +s, ok := i.(string) // Safe - ok is false if not string +``` + +### 4. Type Switch +```go +switch v := i.(type) { +case string: + fmt.Println("string:", v) +case int: + fmt.Println("int:", v) +default: + fmt.Println("unknown type") +} +``` + +--- + +## Next Steps + +Continue to [Kubernetes Fundamentals](../04-kubernetes-fundamentals/README.md) to understand the Kubernetes concepts used in this operator. diff --git a/learning-docs/04-kubernetes-fundamentals/README.md b/learning-docs/04-kubernetes-fundamentals/README.md new file mode 100644 index 00000000..0ecf5ccb --- /dev/null +++ b/learning-docs/04-kubernetes-fundamentals/README.md @@ -0,0 +1,421 @@ +# 4. Kubernetes Fundamentals + +## What is Kubernetes? + +Kubernetes (K8s) is a container orchestration platform. Think of it as: +- **A data center operating system** - manages compute, storage, networking +- **A declarative system** - you describe WHAT you want, K8s figures out HOW +- **Self-healing** - automatically restarts failed containers + +--- + +## Core Concepts + +### 1. Pod - The Smallest Unit + +A Pod is one or more containers that: +- Share the same network (localhost) +- Share the same storage volumes +- Are scheduled together on the same node + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: my-pod +spec: + containers: + - name: my-container + image: nginx:latest + ports: + - containerPort: 80 +``` + +**Key Points:** +- Pods are ephemeral (temporary) - they can be killed and recreated +- Pods get a unique IP address +- Don't create Pods directly - use Deployments or StatefulSets + +--- + +### 2. 
Deployment - Stateless Applications + +A Deployment manages a set of identical Pods: +- Ensures desired number of replicas are running +- Handles rolling updates +- Supports rollback + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: nginx-deployment +spec: + replicas: 3 + selector: + matchLabels: + app: nginx + template: + metadata: + labels: + app: nginx + spec: + containers: + - name: nginx + image: nginx:1.14.2 + ports: + - containerPort: 80 +``` + +**When to use:** +- Stateless applications (web servers, APIs) +- Applications where any Pod can handle any request +- Applications that don't need stable network identity + +--- + +### 3. StatefulSet - Stateful Applications + +A StatefulSet is like a Deployment but for stateful applications: +- Pods get stable, unique network identities (pod-0, pod-1, pod-2) +- Pods get stable storage (PersistentVolumeClaims) +- Pods are created/deleted in order + +```yaml +apiVersion: apps/v1 +kind: StatefulSet +metadata: + name: mysql +spec: + serviceName: mysql + replicas: 3 + selector: + matchLabels: + app: mysql + template: + metadata: + labels: + app: mysql + spec: + containers: + - name: mysql + image: mysql:5.7 + volumeMounts: + - name: data + mountPath: /var/lib/mysql + volumeClaimTemplates: + - metadata: + name: data + spec: + accessModes: ["ReadWriteOnce"] + resources: + requests: + storage: 10Gi +``` + +**When to use:** +- Databases (MySQL, PostgreSQL, Druid!) +- Applications that need stable network identity +- Applications that need persistent storage + +**Why Druid uses StatefulSets:** +- Historical nodes need persistent storage for data +- Nodes need stable identities for cluster coordination + +--- + +### 4. 
Service - Network Access to Pods + +A Service provides a stable network endpoint for Pods: +- Pods come and go, but Service IP stays the same +- Load balances traffic across Pods +- Provides DNS name for discovery + +```yaml +apiVersion: v1 +kind: Service +metadata: + name: my-service +spec: + selector: + app: my-app + ports: + - port: 80 + targetPort: 8080 + type: ClusterIP # Internal only +``` + +**Service Types:** + +| Type | Description | Use Case | +|------|-------------|----------| +| `ClusterIP` | Internal IP only | Internal communication | +| `NodePort` | Exposes on each node's IP | Development/testing | +| `LoadBalancer` | Cloud load balancer | Production external access | +| `Headless` | No cluster IP, DNS only | StatefulSet pod discovery | + +**Headless Service (important for StatefulSets):** +```yaml +apiVersion: v1 +kind: Service +metadata: + name: mysql-headless +spec: + clusterIP: None # This makes it headless! + selector: + app: mysql + ports: + - port: 3306 +``` + +With headless service, you can access individual pods: +- `mysql-0.mysql-headless.namespace.svc.cluster.local` +- `mysql-1.mysql-headless.namespace.svc.cluster.local` + +--- + +### 5. ConfigMap - Configuration Data + +A ConfigMap stores non-sensitive configuration: + +```yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: app-config +data: + database.host: "mysql.default.svc" + database.port: "3306" + config.properties: | + key1=value1 + key2=value2 +``` + +**Using ConfigMap in a Pod:** +```yaml +spec: + containers: + - name: app + env: + - name: DB_HOST + valueFrom: + configMapKeyRef: + name: app-config + key: database.host + volumeMounts: + - name: config + mountPath: /etc/config + volumes: + - name: config + configMap: + name: app-config +``` + +--- + +### 6. 
Secret - Sensitive Data + +A Secret stores sensitive data (passwords, tokens): + +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: db-secret +type: Opaque +data: + username: YWRtaW4= # base64 encoded "admin" + password: cGFzc3dvcmQ= # base64 encoded "password" +``` + +--- + +### 7. PersistentVolumeClaim (PVC) - Storage + +A PVC requests storage from the cluster: + +```yaml +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: my-pvc +spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 10Gi + storageClassName: standard +``` + +**Access Modes:** +- `ReadWriteOnce` (RWO) - Single node read/write +- `ReadOnlyMany` (ROX) - Multiple nodes read-only +- `ReadWriteMany` (RWX) - Multiple nodes read/write + +--- + +### 8. Namespace - Resource Isolation + +Namespaces provide logical isolation: + +```yaml +apiVersion: v1 +kind: Namespace +metadata: + name: production +``` + +```bash +# Create resources in a namespace +kubectl apply -f deployment.yaml -n production + +# List resources in a namespace +kubectl get pods -n production +``` + +--- + +## How Kubernetes Works + +### The Control Loop + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ Kubernetes Control Plane β”‚ +β”‚ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ API Server │◄───│ etcd β”‚ β”‚ Scheduler β”‚ β”‚ +β”‚ β”‚ β”‚ β”‚ (Database) β”‚ β”‚ β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Controller β”‚ ◄── Watches resources, takes action β”‚ +β”‚ β”‚ Manager β”‚ β”‚ +β”‚ 
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ + β”‚ Creates/Updates/Deletes + β–Ό +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ Worker Nodes β”‚ +β”‚ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ kubelet β”‚ β”‚ kubelet β”‚ β”‚ kubelet β”‚ β”‚ +β”‚ β”‚ + Pods β”‚ β”‚ + Pods β”‚ β”‚ + Pods β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +### Declarative vs Imperative + +**Imperative (telling K8s what to DO):** +```bash +kubectl create deployment nginx --image=nginx +kubectl scale deployment nginx --replicas=3 +kubectl set image deployment/nginx nginx=nginx:1.16 +``` + +**Declarative (telling K8s what you WANT):** +```yaml +# deployment.yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: nginx +spec: + replicas: 3 + template: + spec: + containers: + - name: nginx + image: nginx:1.16 +``` +```bash +kubectl apply -f deployment.yaml +``` + +**Declarative is preferred because:** +- Configuration is version controlled +- Easy to reproduce +- Self-documenting +- Operators work declaratively + +--- + +## Labels and Selectors + +Labels are key-value pairs attached to resources: + +```yaml +metadata: + labels: + app: druid + component: broker + environment: production +``` + +Selectors 
find resources by labels: + +```yaml +spec: + selector: + matchLabels: + app: druid + component: broker +``` + +--- + +## Resource Lifecycle + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ Pending │────►│ Running │────►│ Succeeded β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ + β–Ό + β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚ Failed β”‚ + β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +--- + +## Common kubectl Commands + +```bash +# Get resources +kubectl get pods +kubectl get pods -n my-namespace +kubectl get pods -o wide # More details +kubectl get pods -o yaml # Full YAML + +# Describe (detailed info) +kubectl describe pod my-pod + +# Logs +kubectl logs my-pod +kubectl logs my-pod -f # Follow +kubectl logs my-pod -c my-container # Specific container + +# Execute command in pod +kubectl exec -it my-pod -- /bin/bash + +# Apply configuration +kubectl apply -f my-resource.yaml + +# Delete +kubectl delete pod my-pod +kubectl delete -f my-resource.yaml + +# Watch for changes +kubectl get pods -w +``` + +--- + +## Next Steps + +Continue to [What is an Operator?](../05-operators/README.md) to understand the operator pattern. diff --git a/learning-docs/05-operators/README.md b/learning-docs/05-operators/README.md new file mode 100644 index 00000000..0e538ec2 --- /dev/null +++ b/learning-docs/05-operators/README.md @@ -0,0 +1,286 @@ +# 5. What is a Kubernetes Operator? + +## The Problem Operators Solve + +Kubernetes is great at managing stateless applications. 
But complex, stateful applications like databases need: +- **Domain knowledge** - How to properly scale, backup, upgrade +- **Operational procedures** - What to do when things go wrong +- **Configuration management** - Complex interdependent settings + +Traditionally, this required a human operator (SRE/DevOps engineer) who: +1. Understands the application deeply +2. Monitors the application +3. Takes corrective actions when needed +4. Performs upgrades, backups, scaling + +**An Operator automates this human operator's knowledge into software!** + +--- + +## The Operator Pattern + +An Operator is a Kubernetes controller that: +1. **Extends Kubernetes** with Custom Resource Definitions (CRDs) +2. **Watches** for changes to Custom Resources +3. **Reconciles** the actual state to match the desired state +4. **Encodes operational knowledge** in code + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ Operator Pattern β”‚ +β”‚ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Custom β”‚ Watches β”‚ Controller β”‚ β”‚ +β”‚ β”‚ Resource │◄──────────────────►│ (Reconciler) β”‚ β”‚ +β”‚ β”‚ (CR) β”‚ β”‚ β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ β”‚ β”‚ +β”‚ β”‚ Desired State β”‚ Creates/ β”‚ +β”‚ β”‚ β”‚ Updates/ β”‚ +β”‚ β”‚ β”‚ Deletes β”‚ +β”‚ β”‚ β–Ό β”‚ +β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ β”‚ Kubernetes β”‚ β”‚ +β”‚ β”‚ β”‚ Resources β”‚ β”‚ +β”‚ β”‚ β”‚ (Pods, Svcs, β”‚ β”‚ +β”‚ └───────────────────────────►│ ConfigMaps) β”‚ β”‚ +β”‚ Actual State β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ 
+β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +--- + +## Key Components of an Operator + +### 1. Custom Resource Definition (CRD) + +A CRD tells Kubernetes about a new resource type: + +```yaml +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + name: druids.druid.apache.org +spec: + group: druid.apache.org + versions: + - name: v1alpha1 + served: true + storage: true + schema: + openAPIV3Schema: + type: object + properties: + spec: + type: object + properties: + image: + type: string + nodes: + type: object + scope: Namespaced + names: + plural: druids + singular: druid + kind: Druid +``` + +After applying this CRD, Kubernetes understands what a "Druid" is! + +### 2. Custom Resource (CR) + +A CR is an instance of a CRD - your actual configuration: + +```yaml +apiVersion: druid.apache.org/v1alpha1 +kind: Druid +metadata: + name: my-druid-cluster +spec: + image: apache/druid:25.0.0 + nodes: + brokers: + nodeType: broker + replicas: 2 +``` + +### 3. Controller + +The controller watches CRs and takes action: + +```go +// Simplified controller logic +func (r *DruidReconciler) Reconcile(ctx context.Context, req Request) (Result, error) { + // 1. Get the Druid CR + druid := &v1alpha1.Druid{} + err := r.Get(ctx, req.NamespacedName, druid) + + // 2. Create/Update Kubernetes resources based on CR + // - Create ConfigMaps for Druid configuration + // - Create StatefulSets for Druid nodes + // - Create Services for networking + + // 3. 
Return and requeue after some time + return Result{RequeueAfter: 10 * time.Second}, nil +} +``` + +--- + +## The Reconciliation Loop + +The heart of an operator is the **reconciliation loop**: + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ Reconciliation Loop β”‚ +β”‚ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Event β”‚ (CR created, updated, deleted, β”‚ +β”‚ β”‚ Trigger β”‚ or periodic requeue) β”‚ +β”‚ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ β”‚ +β”‚ β–Ό β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Fetch β”‚ Get the CR from Kubernetes API β”‚ +β”‚ β”‚ CR β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ β”‚ +β”‚ β–Ό β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Compare β”‚ Desired State (CR) vs Actual State (K8s) β”‚ +β”‚ β”‚ States β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ β”‚ +β”‚ β–Ό β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Take β”‚ Create, Update, or Delete resources β”‚ +β”‚ β”‚ Action β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ β”‚ +β”‚ β–Ό β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Requeue β”‚ Schedule next reconciliation β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +### Key Principles + +1. **Idempotent** - Running reconcile multiple times has the same effect as running once +2. **Level-triggered** - Reacts to current state, not events +3. **Eventually consistent** - May take multiple reconciliations to reach desired state + +--- + +## What Makes a Good Operator? 
+ +### Level 1: Basic Install +- Automated application provisioning +- Basic configuration + +### Level 2: Seamless Upgrades +- Patch and minor version upgrades +- Rolling updates + +### Level 3: Full Lifecycle +- Backup and restore +- Application-aware scaling + +### Level 4: Deep Insights +- Metrics and alerts +- Log processing + +### Level 5: Auto Pilot +- Auto-scaling +- Auto-healing +- Auto-tuning + +**This Druid Operator is approximately Level 3** - it handles installation, upgrades, scaling, and some self-healing. + +--- + +## Operator Frameworks + +Several frameworks help build operators: + +| Framework | Language | Complexity | Use Case | +|-----------|----------|------------|----------| +| **Kubebuilder** | Go | Medium | Production operators | +| **Operator SDK** | Go/Ansible/Helm | Medium | Various approaches | +| **Metacontroller** | Any (webhooks) | Low | Simple operators | +| **KUDO** | YAML | Low | Stateful apps | + +**This project uses Kubebuilder** - the most common choice for Go operators. + +--- + +## Operator vs Helm Chart + +| Aspect | Helm Chart | Operator | +|--------|------------|----------| +| **What it is** | Package manager | Controller | +| **When it runs** | At install time | Continuously | +| **Day 2 operations** | Manual | Automated | +| **Self-healing** | No | Yes | +| **Upgrades** | Manual | Can be automated | +| **Complexity** | Lower | Higher | + +**Use Helm when:** +- Simple, stateless applications +- One-time deployment +- No special operational needs + +**Use Operator when:** +- Complex, stateful applications +- Need continuous management +- Application-specific operational knowledge + +--- + +## Real-World Example: Druid Operator + +Without operator (manual): +```bash +# Create namespace +kubectl create namespace druid + +# Create ConfigMaps (multiple files) +kubectl apply -f common-config.yaml +kubectl apply -f broker-config.yaml +kubectl apply -f historical-config.yaml +# ... 
more configs + +# Create Services +kubectl apply -f broker-service.yaml +kubectl apply -f historical-service.yaml +# ... more services + +# Create StatefulSets +kubectl apply -f broker-statefulset.yaml +kubectl apply -f historical-statefulset.yaml +# ... more statefulsets + +# Monitor and fix issues manually +# Upgrade manually +# Scale manually +``` + +With operator: +```bash +# Install operator (once) +helm install druid-operator datainfra/druid-operator + +# Deploy Druid cluster +kubectl apply -f my-druid-cluster.yaml + +# That's it! Operator handles everything else. +``` + +--- + +## Next Steps + +Continue to [Custom Resources](../06-custom-resources/README.md) to understand CRDs and CRs in detail. diff --git a/learning-docs/06-custom-resources/README.md b/learning-docs/06-custom-resources/README.md new file mode 100644 index 00000000..ab87b396 --- /dev/null +++ b/learning-docs/06-custom-resources/README.md @@ -0,0 +1,363 @@ +# 6. Custom Resources (CR) and Custom Resource Definitions (CRD) + +## What is a CRD? + +A Custom Resource Definition (CRD) extends the Kubernetes API with new resource types. It's like teaching Kubernetes a new language. + +**Before CRD:** Kubernetes only knows about Pods, Services, Deployments, etc. +**After CRD:** Kubernetes also knows about Druid, DruidIngestion, etc. + +--- + +## CRDs in This Project + +This project defines two CRDs: + +### 1. Druid CRD +Represents a Druid cluster. + +**Location:** `config/crd/bases/druid.apache.org_druids.yaml` + +### 2. DruidIngestion CRD +Represents a data ingestion job. 
+ +**Location:** `config/crd/bases/druid.apache.org_druidingestions.yaml` + +--- + +## Anatomy of a CRD + +```yaml +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + name: druids.druid.apache.org # plural.group +spec: + group: druid.apache.org # API group + names: + kind: Druid # Resource kind + listKind: DruidList # List kind + plural: druids # Plural name (used in URLs) + singular: druid # Singular name + scope: Namespaced # Namespaced or Cluster + versions: + - name: v1alpha1 # Version + served: true # Is this version served? + storage: true # Is this the storage version? + schema: + openAPIV3Schema: # Validation schema + type: object + properties: + spec: + type: object + # ... field definitions +``` + +--- + +## Go Type Definitions + +CRDs are defined in Go code, then generated into YAML. + +**Location:** `apis/druid/v1alpha1/druid_types.go` + +### The Druid Type + +```go +// Druid is the Schema for the druids API +// +kubebuilder:object:root=true +// +kubebuilder:subresource:status +type Druid struct { + metav1.TypeMeta `json:",inline"` + metav1.ObjectMeta `json:"metadata,omitempty"` + + Spec DruidSpec `json:"spec"` + Status DruidClusterStatus `json:"status,omitempty"` +} +``` + +**Key Parts:** + +| Field | Purpose | +|-------|---------| +| `TypeMeta` | API version and kind | +| `ObjectMeta` | Name, namespace, labels, annotations | +| `Spec` | Desired state (what user wants) | +| `Status` | Actual state (what exists) | + +### The DruidSpec Type + +```go +type DruidSpec struct { + // Image for druid + // +required + Image string `json:"image,omitempty"` + + // CommonRuntimeProperties - common configuration + // +required + CommonRuntimeProperties string `json:"common.runtime.properties"` + + // Nodes - list of Druid node types + // +required + Nodes map[string]DruidNodeSpec `json:"nodes"` + + // RollingDeploy - enable rolling updates + // +optional + // +kubebuilder:default:=true + RollingDeploy bool `json:"rollingDeploy"` + + // 
... many more fields +} +``` + +### The DruidNodeSpec Type + +```go +type DruidNodeSpec struct { + // NodeType - broker, historical, coordinator, etc. + // +required + // +kubebuilder:validation:Enum:=historical;overlord;middleManager;indexer;broker;coordinator;router + NodeType string `json:"nodeType"` + + // DruidPort - port for the node + // +required + DruidPort int32 `json:"druid.port"` + + // Replicas - number of replicas + // +optional + Replicas int32 `json:"replicas"` + + // RuntimeProperties - node-specific configuration + // +required + RuntimeProperties string `json:"runtime.properties"` + + // ... many more fields +} +``` + +--- + +## Kubebuilder Markers + +The comments starting with `+kubebuilder:` are special markers: + +| Marker | Purpose | +|--------|---------| +| `+kubebuilder:object:root=true` | This is a root object (has its own API endpoint) | +| `+kubebuilder:subresource:status` | Enable status subresource | +| `+kubebuilder:validation:Enum` | Allowed values | +| `+kubebuilder:default` | Default value | +| `+required` | Field is required | +| `+optional` | Field is optional | + +--- + +## Example Custom Resource + +Here's a complete Druid CR: + +```yaml +apiVersion: druid.apache.org/v1alpha1 +kind: Druid +metadata: + name: tiny-cluster + namespace: druid +spec: + image: apache/druid:25.0.0 + startScript: /druid.sh + + # Common configuration for all nodes + commonConfigMountPath: "/opt/druid/conf/druid/cluster/_common" + + common.runtime.properties: | + druid.zk.service.host=zookeeper:2181 + druid.metadata.storage.type=derby + druid.storage.type=local + druid.extensions.loadList=["druid-kafka-indexing-service"] + + jvm.options: |- + -server + -XX:MaxDirectMemorySize=10240g + -Duser.timezone=UTC + + # Node definitions + nodes: + # Broker nodes + brokers: + nodeType: broker + druid.port: 8088 + replicas: 2 + nodeConfigMountPath: "/opt/druid/conf/druid/cluster/query/broker" + runtime.properties: | + druid.service=druid/broker + 
druid.broker.http.numConnections=5 + extra.jvm.options: |- + -Xmx512M + -Xms512M + + # Historical nodes + historicals: + nodeType: historical + druid.port: 8088 + replicas: 3 + nodeConfigMountPath: "/opt/druid/conf/druid/cluster/data/historical" + runtime.properties: | + druid.service=druid/historical + druid.segmentCache.locations=[{"path":"/druid/data/segments","maxSize":10737418240}] + volumeClaimTemplates: + - metadata: + name: data-volume + spec: + accessModes: ["ReadWriteOnce"] + resources: + requests: + storage: 10Gi + + # Coordinator nodes + coordinators: + nodeType: coordinator + druid.port: 8088 + replicas: 1 + nodeConfigMountPath: "/opt/druid/conf/druid/cluster/master/coordinator-overlord" + runtime.properties: | + druid.service=druid/coordinator + druid.coordinator.asOverlord.enabled=true +``` + +--- + +## DruidIngestion CR + +For data ingestion: + +```yaml +apiVersion: druid.apache.org/v1alpha1 +kind: DruidIngestion +metadata: + name: kafka-ingestion +spec: + suspend: false + druidCluster: tiny-cluster # Reference to Druid CR + ingestion: + type: kafka + nativeSpec: + type: kafka + spec: + dataSchema: + dataSource: my-data + timestampSpec: + column: timestamp + format: auto + ioConfig: + topic: my-topic + consumerProperties: + bootstrap.servers: kafka:9092 +``` + +--- + +## How CRs are Processed + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ CR Processing Flow β”‚ +β”‚ β”‚ +β”‚ 1. User creates CR β”‚ +β”‚ kubectl apply -f druid.yaml β”‚ +β”‚ β”‚ β”‚ +β”‚ β–Ό β”‚ +β”‚ 2. API Server validates CR against CRD schema β”‚ +β”‚ - Checks required fields β”‚ +β”‚ - Validates enum values β”‚ +β”‚ - Applies defaults β”‚ +β”‚ β”‚ β”‚ +β”‚ β–Ό β”‚ +β”‚ 3. CR stored in etcd β”‚ +β”‚ β”‚ β”‚ +β”‚ β–Ό β”‚ +β”‚ 4. Controller receives event β”‚ +β”‚ - Watch detects new/updated CR β”‚ +β”‚ β”‚ β”‚ +β”‚ β–Ό β”‚ +β”‚ 5. 
Reconcile function called β”‚ +β”‚ - Reads CR spec β”‚ +β”‚ - Creates/updates K8s resources β”‚ +β”‚ - Updates CR status β”‚ +β”‚ β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +--- + +## Status Subresource + +The status subresource tracks the actual state: + +```go +type DruidClusterStatus struct { + DruidNodeStatus DruidNodeTypeStatus `json:"druidNodeStatus,omitempty"` + StatefulSets []string `json:"statefulSets,omitempty"` + Deployments []string `json:"deployments,omitempty"` + Services []string `json:"services,omitempty"` + ConfigMaps []string `json:"configMaps,omitempty"` + Pods []string `json:"pods,omitempty"` + PersistentVolumeClaims []string `json:"persistentVolumeClaims,omitempty"` +} +``` + +**Why separate status?** +- Users update `spec` (desired state) +- Controller updates `status` (actual state) +- Prevents conflicts between user and controller + +--- + +## Owner References + +When the operator creates resources, it sets owner references: + +```yaml +apiVersion: apps/v1 +kind: StatefulSet +metadata: + name: druid-tiny-cluster-brokers + ownerReferences: + - apiVersion: druid.apache.org/v1alpha1 + kind: Druid + name: tiny-cluster + uid: abc-123-def + controller: true +``` + +**Benefits:** +- Automatic garbage collection (delete Druid CR β†’ all resources deleted) +- Clear ownership hierarchy +- Prevents orphaned resources + +--- + +## Finalizers + +Finalizers ensure cleanup before deletion: + +```yaml +apiVersion: druid.apache.org/v1alpha1 +kind: Druid +metadata: + name: tiny-cluster + finalizers: + - druid.apache.org/finalizer +``` + +**Flow:** +1. User deletes CR: `kubectl delete druid tiny-cluster` +2. K8s sets `deletionTimestamp` but doesn't delete +3. Controller sees `deletionTimestamp`, runs cleanup +4. Controller removes finalizer +5. 
K8s actually deletes the CR + +--- + +## Next Steps + +Continue to [Project Structure](../07-project-structure/README.md) to understand how the codebase is organized. diff --git a/learning-docs/07-project-structure/README.md b/learning-docs/07-project-structure/README.md new file mode 100644 index 00000000..66369b75 --- /dev/null +++ b/learning-docs/07-project-structure/README.md @@ -0,0 +1,330 @@ +# 7. Project Structure + +## Directory Overview + +``` +druid-operator/ +β”œβ”€β”€ apis/ # API definitions (CRDs in Go) +β”‚ └── druid/ +β”‚ └── v1alpha1/ # Version v1alpha1 +β”‚ β”œβ”€β”€ druid_types.go # Druid CR definition +β”‚ β”œβ”€β”€ druidingestion_types.go # DruidIngestion CR definition +β”‚ β”œβ”€β”€ groupversion_info.go # API group registration +β”‚ └── zz_generated.deepcopy.go # Auto-generated deep copy +β”‚ +β”œβ”€β”€ controllers/ # Controller logic +β”‚ β”œβ”€β”€ druid/ # Druid controller +β”‚ β”‚ β”œβ”€β”€ druid_controller.go # Main reconciler +β”‚ β”‚ β”œβ”€β”€ handler.go # Resource creation logic +β”‚ β”‚ β”œβ”€β”€ finalizers.go # Cleanup logic +β”‚ β”‚ β”œβ”€β”€ status.go # Status updates +β”‚ β”‚ └── ... 
+β”‚ └── ingestion/ # Ingestion controller +β”‚ β”œβ”€β”€ ingestion_controller.go +β”‚ └── reconciler.go +β”‚ +β”œβ”€β”€ config/ # Kubernetes manifests +β”‚ β”œβ”€β”€ crd/ # CRD definitions +β”‚ β”‚ └── bases/ # Generated CRD YAML +β”‚ β”œβ”€β”€ rbac/ # RBAC (permissions) +β”‚ β”œβ”€β”€ manager/ # Operator deployment +β”‚ └── samples/ # Example CRs +β”‚ +β”œβ”€β”€ chart/ # Helm chart +β”‚ β”œβ”€β”€ Chart.yaml +β”‚ β”œβ”€β”€ values.yaml +β”‚ β”œβ”€β”€ templates/ +β”‚ └── crds/ +β”‚ +β”œβ”€β”€ pkg/ # Shared packages +β”‚ β”œβ”€β”€ druidapi/ # Druid API client +β”‚ β”œβ”€β”€ http/ # HTTP utilities +β”‚ └── util/ # General utilities +β”‚ +β”œβ”€β”€ examples/ # Example configurations +β”œβ”€β”€ docs/ # Documentation +β”œβ”€β”€ e2e/ # End-to-end tests +β”œβ”€β”€ main.go # Entry point +β”œβ”€β”€ go.mod # Go module definition +└── Makefile # Build commands +``` + +--- + +## Key Files Explained + +### 1. main.go - Entry Point + +```go +func main() { + // 1. Parse command-line flags + flag.Parse() + + // 2. Create the manager + mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{ + Scheme: scheme, + MetricsBindAddress: metricsAddr, + // ... + }) + + // 3. Register controllers + if err = (druid.NewDruidReconciler(mgr)).SetupWithManager(mgr); err != nil { + // Handle error + } + + if err = (ingestion.NewDruidIngestionReconciler(mgr)).SetupWithManager(mgr); err != nil { + // Handle error + } + + // 4. Start the manager + if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil { + // Handle error + } +} +``` + +**What it does:** +- Creates a controller-runtime Manager +- Registers both controllers (Druid and DruidIngestion) +- Starts the manager (which starts all controllers) + +--- + +### 2. 
apis/druid/v1alpha1/druid_types.go - CR Definition + +```go +// DruidSpec defines the desired state of Druid +type DruidSpec struct { + Image string `json:"image,omitempty"` + CommonRuntimeProperties string `json:"common.runtime.properties"` + Nodes map[string]DruidNodeSpec `json:"nodes"` + RollingDeploy bool `json:"rollingDeploy"` + // ... many more fields +} + +// Druid is the Schema for the druids API +type Druid struct { + metav1.TypeMeta `json:",inline"` + metav1.ObjectMeta `json:"metadata,omitempty"` + Spec DruidSpec `json:"spec"` + Status DruidClusterStatus `json:"status,omitempty"` +} +``` + +**What it does:** +- Defines the structure of a Druid CR +- Used to generate CRD YAML +- Used by controller to work with Druid objects + +--- + +### 3. controllers/druid/druid_controller.go - Main Controller + +```go +// DruidReconciler reconciles a Druid object +type DruidReconciler struct { + client.Client + Log logr.Logger + Scheme *runtime.Scheme + ReconcileWait time.Duration + Recorder record.EventRecorder +} + +// Reconcile is the main reconciliation function +func (r *DruidReconciler) Reconcile(ctx context.Context, request reconcile.Request) (ctrl.Result, error) { + // 1. Fetch the Druid instance + instance := &druidv1alpha1.Druid{} + err := r.Get(ctx, request.NamespacedName, instance) + if err != nil { + if errors.IsNotFound(err) { + return ctrl.Result{}, nil // CR deleted, nothing to do + } + return ctrl.Result{}, err + } + + // 2. Deploy the Druid cluster + if err := deployDruidCluster(ctx, r.Client, instance, emitEvent); err != nil { + return ctrl.Result{}, err + } + + // 3. Requeue after wait time + return ctrl.Result{RequeueAfter: r.ReconcileWait}, nil +} +``` + +**What it does:** +- Watches for Druid CR changes +- Calls `deployDruidCluster` to create/update resources +- Requeues for periodic reconciliation + +--- + +### 4. 
controllers/druid/handler.go - Resource Creation + +```go +func deployDruidCluster(ctx context.Context, sdk client.Client, m *v1alpha1.Druid, emitEvents EventEmitter) error { + // 1. Validate the spec + if err := verifyDruidSpec(m); err != nil { + return nil + } + + // 2. Get node specs in order + allNodeSpecs := getNodeSpecsByOrder(m) + + // 3. Create common ConfigMap + if _, err := sdkCreateOrUpdateAsNeeded(ctx, sdk, + func() (object, error) { return makeCommonConfigMap(ctx, sdk, m, ls) }, + // ... + ); err != nil { + return err + } + + // 4. For each node type, create resources + for _, elem := range allNodeSpecs { + // Create node ConfigMap + // Create Services + // Create StatefulSet or Deployment + // Create PodDisruptionBudget + // Create HPA + // Create Ingress + } + + // 5. Delete unused resources + // 6. Update status + + return nil +} +``` + +**What it does:** +- Creates all Kubernetes resources for a Druid cluster +- Handles create, update, and delete operations +- Manages rolling updates + +--- + +### 5. controllers/ingestion/reconciler.go - Ingestion Logic + +```go +func (r *DruidIngestionReconciler) do(ctx context.Context, di *v1alpha1.DruidIngestion) error { + // 1. Get Druid router service URL + svcName, err := druidapi.GetRouterSvcUrl(di.Namespace, di.Spec.DruidClusterName, r.Client) + + // 2. Create or update ingestion task + _, err = r.CreateOrUpdate(di, svcName, *build, auth) + + // 3. 
Handle finalizers for cleanup + if di.ObjectMeta.DeletionTimestamp.IsZero() { + // Add finalizer if not present + } else { + // Cleanup: shutdown ingestion task + } + + return nil +} +``` + +**What it does:** +- Manages Druid ingestion tasks via Druid's REST API +- Creates/updates/deletes ingestion supervisors +- Handles compaction and rules + +--- + +## Package Dependencies + +``` +main.go + β”‚ + β”œβ”€β”€ apis/druid/v1alpha1 + β”‚ └── Druid, DruidIngestion types + β”‚ + β”œβ”€β”€ controllers/druid + β”‚ β”œβ”€β”€ druid_controller.go (Reconciler) + β”‚ β”œβ”€β”€ handler.go (Resource creation) + β”‚ β”œβ”€β”€ finalizers.go (Cleanup) + β”‚ └── status.go (Status updates) + β”‚ + β”œβ”€β”€ controllers/ingestion + β”‚ β”œβ”€β”€ ingestion_controller.go (Reconciler) + β”‚ └── reconciler.go (Ingestion logic) + β”‚ + └── pkg/ + β”œβ”€β”€ druidapi/ (Druid REST API client) + β”œβ”€β”€ http/ (HTTP utilities) + └── util/ (General utilities) +``` + +--- + +## Configuration Files + +### config/crd/bases/druid.apache.org_druids.yaml +Generated CRD for Druid. Created by running `make manifests`. + +### config/rbac/role.yaml +RBAC permissions the operator needs: +```yaml +rules: +- apiGroups: ["druid.apache.org"] + resources: ["druids"] + verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] +- apiGroups: ["apps"] + resources: ["statefulsets", "deployments"] + verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] +# ... more permissions +``` + +### config/manager/manager.yaml +Deployment for the operator itself. 
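+
+A minimal sketch of what this manifest typically looks like. This is illustrative only: the resource names, namespace, image tag, and resource limits below follow standard Kubebuilder scaffolding conventions (the `druid-operator-system` namespace matches the Helm install above) and are not copied from the repo:
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: controller-manager            # illustrative name
+  namespace: druid-operator-system
+  labels:
+    control-plane: controller-manager
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      control-plane: controller-manager
+  template:
+    metadata:
+      labels:
+        control-plane: controller-manager
+    spec:
+      serviceAccountName: controller-manager   # bound to the RBAC rules in config/rbac
+      containers:
+        - name: manager
+          image: datainfrahq/druid-operator:latest   # illustrative tag
+          args:
+            - --leader-elect   # standard controller-runtime flag for HA deployments
+          resources:
+            limits:
+              cpu: 500m
+              memory: 256Mi
+```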
+ +--- + +## Helm Chart Structure + +``` +chart/ +β”œβ”€β”€ Chart.yaml # Chart metadata +β”œβ”€β”€ values.yaml # Default values +β”œβ”€β”€ crds/ # CRD definitions +β”‚ β”œβ”€β”€ druid.apache.org_druids.yaml +β”‚ └── druid.apache.org_druidingestions.yaml +└── templates/ # Kubernetes manifests + β”œβ”€β”€ deployment.yaml # Operator deployment + β”œβ”€β”€ service.yaml # Operator service + β”œβ”€β”€ rbac_*.yaml # RBAC resources + └── service_account.yaml +``` + +--- + +## Build Commands (Makefile) + +```bash +# Generate CRD manifests from Go types +make manifests + +# Generate deep copy functions +make generate + +# Build the operator binary +make build + +# Build Docker image +make docker-build + +# Run tests +make test + +# Deploy to cluster +make deploy + +# Install CRDs only +make install +``` + +--- + +## Next Steps + +Continue to [The Reconciliation Loop](../08-reconciliation-loop/README.md) to understand the core operator logic in detail. diff --git a/learning-docs/08-reconciliation-loop/README.md b/learning-docs/08-reconciliation-loop/README.md new file mode 100644 index 00000000..c8b961a5 --- /dev/null +++ b/learning-docs/08-reconciliation-loop/README.md @@ -0,0 +1,389 @@ +# 8. The Reconciliation Loop + +## What is Reconciliation? + +Reconciliation is the process of making the **actual state** match the **desired state**. + +``` +Desired State (CR spec) β†’ Reconcile β†’ Actual State (K8s resources) +``` + +The reconciler runs in a loop: +1. **Observe** - Read the current state +2. **Analyze** - Compare with desired state +3. **Act** - Create, update, or delete resources +4. 
**Repeat** - Requeue and do it again + +--- + +## The Reconcile Function + +**Location:** `controllers/druid/druid_controller.go` + +```go +func (r *DruidReconciler) Reconcile(ctx context.Context, request reconcile.Request) (ctrl.Result, error) { + // request contains the name and namespace of the CR that triggered reconciliation + + // Step 1: Fetch the Druid CR + instance := &druidv1alpha1.Druid{} + err := r.Get(ctx, request.NamespacedName, instance) + if err != nil { + if errors.IsNotFound(err) { + // CR was deleted, nothing to do + // Owned resources will be garbage collected automatically + return ctrl.Result{}, nil + } + // Error reading the object - requeue + return ctrl.Result{}, err + } + + // Step 2: Initialize event emitter for logging + var emitEvent EventEmitter = EmitEventFuncs{r.Recorder} + + // Step 3: Deploy/Update the Druid cluster + if err := deployDruidCluster(ctx, r.Client, instance, emitEvent); err != nil { + return ctrl.Result{}, err + } + + // Step 4: Update dynamic configurations + if err := updateDruidDynamicConfigs(ctx, r.Client, instance, emitEvent); err != nil { + return ctrl.Result{}, err + } + + // Step 5: Requeue after wait time (default 10 seconds) + return ctrl.Result{RequeueAfter: r.ReconcileWait}, nil +} +``` + +--- + +## Return Values + +The `Reconcile` function returns `(ctrl.Result, error)`: + +| Return | Meaning | +|--------|---------| +| `Result{}, nil` | Success, don't requeue | +| `Result{Requeue: true}, nil` | Success, requeue immediately | +| `Result{RequeueAfter: 10s}, nil` | Success, requeue after 10 seconds | +| `Result{}, err` | Error, requeue with backoff | + +--- + +## The deployDruidCluster Function + +**Location:** `controllers/druid/handler.go` + +This is where the magic happens: + +```go +func deployDruidCluster(ctx context.Context, sdk client.Client, m *v1alpha1.Druid, emitEvents EventEmitter) error { + + // 1. 
VALIDATE the spec + if err := verifyDruidSpec(m); err != nil { + emitEvents.EmitEventGeneric(m, "DruidOperatorInvalidSpec", "", err) + return nil // Don't retry invalid specs + } + + // 2. GET NODE SPECS in correct order + // Order matters for rolling updates! + allNodeSpecs := getNodeSpecsByOrder(m) + // Returns: historicals β†’ middleManagers β†’ indexers β†’ brokers β†’ coordinators β†’ overlords β†’ routers + + // 3. TRACK created resources + statefulSetNames := make(map[string]bool) + serviceNames := make(map[string]bool) + configMapNames := make(map[string]bool) + // ... more maps + + // 4. CREATE COMMON CONFIGMAP + // Contains common.runtime.properties shared by all nodes + commonConfig, err := makeCommonConfigMap(ctx, sdk, m, ls) + commonConfigSHA, _ := getObjectHash(commonConfig) // For change detection + + sdkCreateOrUpdateAsNeeded(ctx, sdk, + func() (object, error) { return makeCommonConfigMap(ctx, sdk, m, ls) }, + func() object { return &v1.ConfigMap{} }, + // ... + ) + + // 5. HANDLE DELETION + if m.GetDeletionTimestamp() != nil { + return executeFinalizers(ctx, sdk, m, emitEvents) + } + + // 6. UPDATE FINALIZERS + if err := updateFinalizers(ctx, sdk, m, emitEvents); err != nil { + return err + } + + // 7. FOR EACH NODE TYPE + for _, elem := range allNodeSpecs { + key := elem.key + nodeSpec := elem.spec + + // Create unique identifier + nodeSpecUniqueStr := makeNodeSpecificUniqueString(m, key) + // e.g., "druid-tiny-cluster-brokers" + + // Create labels + lm := makeLabelsForNodeSpec(&nodeSpec, m, m.Name, nodeSpecUniqueStr) + + // 7a. Create node ConfigMap + nodeConfig, _ := makeConfigMapForNodeSpec(&nodeSpec, m, lm, nodeSpecUniqueStr) + sdkCreateOrUpdateAsNeeded(ctx, sdk, ...) + + // 7b. Create Services + for _, svc := range services { + sdkCreateOrUpdateAsNeeded(ctx, sdk, + func() (object, error) { return makeService(&svc, &nodeSpec, m, lm, nodeSpecUniqueStr) }, + // ... + ) + } + + // 7c. 
Create StatefulSet or Deployment + if nodeSpec.Kind == "Deployment" { + sdkCreateOrUpdateAsNeeded(ctx, sdk, + func() (object, error) { return makeDeployment(&nodeSpec, m, ...) }, + // ... + ) + } else { + // Default: StatefulSet + sdkCreateOrUpdateAsNeeded(ctx, sdk, + func() (object, error) { return makeStatefulSet(&nodeSpec, m, ...) }, + // ... + ) + } + + // 7d. Create optional resources + // - Ingress + // - PodDisruptionBudget + // - HorizontalPodAutoscaler + // - PersistentVolumeClaims + } + + // 8. DELETE UNUSED RESOURCES + // If a node was removed from spec, delete its resources + deleteUnusedResources(ctx, sdk, m, statefulSetNames, ...) + + // 9. UPDATE STATUS + updatedStatus := v1alpha1.DruidClusterStatus{ + StatefulSets: ..., + Services: ..., + ConfigMaps: ..., + Pods: ..., + } + druidClusterStatusPatcher(ctx, sdk, updatedStatus, m, emitEvents) + + return nil +} +``` + +--- + +## Create or Update Pattern + +The `sdkCreateOrUpdateAsNeeded` function handles idempotent resource management: + +```go +func sdkCreateOrUpdateAsNeeded( + ctx context.Context, + sdk client.Client, + objFn func() (object, error), // Function to create the desired object + emptyObjFn func() object, // Function to create empty object for Get + isEqualFn func(prev, curr object) bool, // Compare function + updaterFn func(prev, curr object), // Update function + drd *v1alpha1.Druid, + names map[string]bool, + emitEvent EventEmitter, +) (DruidNodeStatus, error) { + + // 1. Create the desired object + obj, err := objFn() + names[obj.GetName()] = true // Track this resource + + // 2. Add owner reference (for garbage collection) + addOwnerRefToObject(obj, asOwner(drd)) + + // 3. Add hash annotation (for change detection) + addHashToObject(obj) + + // 4. Try to get existing object + prevObj := emptyObjFn() + err := sdk.Get(ctx, namespacedName, prevObj) + + if err != nil && apierrors.IsNotFound(err) { + // 5a. 
Object doesn't exist - CREATE it + return writers.Create(ctx, sdk, drd, obj, emitEvent) + } + + // 5b. Object exists - check if UPDATE needed + if obj.GetAnnotations()[druidOpResourceHash] != prevObj.GetAnnotations()[druidOpResourceHash] { + // Hash changed - UPDATE + obj.SetResourceVersion(prevObj.GetResourceVersion()) + return writers.Update(ctx, sdk, drd, obj, emitEvent) + } + + // 5c. No change needed + return "", nil +} +``` + +--- + +## Rolling Updates + +The operator supports Druid's recommended rolling update order: + +```go +func getNodeSpecsByOrder(m *v1alpha1.Druid) []keyAndNodeSpec { + // Order defined by Druid documentation: + // https://druid.apache.org/docs/latest/operations/rolling-updates.html + + nodeSpecsByOrder := []string{ + "historicals", // 1. Update historicals first + "middleManagers", // 2. Then middle managers + "indexers", // 3. Then indexers + "brokers", // 4. Then brokers + "coordinators", // 5. Then coordinators + "overlords", // 6. Then overlords + "routers", // 7. Finally routers + } + + // Return specs in this order +} +``` + +During rolling updates: +```go +if m.Spec.RollingDeploy { + // Check if previous node type is fully deployed + done, err := isObjFullyDeployed(ctx, sdk, nodeSpec, ...) + if !done { + return err // Wait for previous to complete + } +} +``` + +--- + +## Finalizers + +Finalizers ensure cleanup when a Druid CR is deleted: + +```go +func executeFinalizers(ctx context.Context, sdk client.Client, m *v1alpha1.Druid, emitEvents EventEmitter) error { + // 1. Check if finalizer exists + if !controllerutil.ContainsFinalizer(m, finalizerName) { + return nil + } + + // 2. Perform cleanup + // - Delete PVCs if configured + // - Any other cleanup needed + + // 3. 
Remove finalizer + controllerutil.RemoveFinalizer(m, finalizerName) + if err := sdk.Update(ctx, m); err != nil { + return err + } + + return nil +} +``` + +--- + +## Status Updates + +The operator updates the CR status to reflect actual state: + +```go +func druidClusterStatusPatcher(ctx context.Context, sdk client.Client, updatedStatus v1alpha1.DruidClusterStatus, m *v1alpha1.Druid, emitEvents EventEmitter) error { + // Get current CR + currentDruid := &v1alpha1.Druid{} + sdk.Get(ctx, types.NamespacedName{Name: m.Name, Namespace: m.Namespace}, currentDruid) + + // Update status + currentDruid.Status = updatedStatus + + // Patch status subresource + return sdk.Status().Update(ctx, currentDruid) +} +``` + +--- + +## Event Flow Diagram + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ Reconciliation Flow β”‚ +β”‚ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Event β”‚ CR created/updated/deleted β”‚ +β”‚ β”‚ (Watch) β”‚ or periodic requeue β”‚ +β”‚ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ β”‚ +β”‚ β–Ό β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Reconcile() β”‚ β”‚ +β”‚ β”‚ β”‚ β”‚ +β”‚ β”‚ 1. Get Druid CR β”‚ β”‚ +β”‚ β”‚ └─► Not found? Return (garbage collection handles rest) β”‚ β”‚ +β”‚ β”‚ β”‚ β”‚ +β”‚ β”‚ 2. 
deployDruidCluster() β”‚ β”‚ +β”‚ β”‚ β”œβ”€β–Ί Validate spec β”‚ β”‚ +β”‚ β”‚ β”œβ”€β–Ί Check deletion timestamp β”‚ β”‚ +β”‚ β”‚ β”‚ └─► If deleting, run finalizers β”‚ β”‚ +β”‚ β”‚ β”œβ”€β–Ί Create common ConfigMap β”‚ β”‚ +β”‚ β”‚ β”œβ”€β–Ί For each node type (in order): β”‚ β”‚ +β”‚ β”‚ β”‚ β”œβ”€β–Ί Create node ConfigMap β”‚ β”‚ +β”‚ β”‚ β”‚ β”œβ”€β–Ί Create Services β”‚ β”‚ +β”‚ β”‚ β”‚ β”œβ”€β–Ί Create StatefulSet/Deployment β”‚ β”‚ +β”‚ β”‚ β”‚ β”œβ”€β–Ί Wait for rollout (if rolling deploy) β”‚ β”‚ +β”‚ β”‚ β”‚ └─► Create optional resources (PDB, HPA, Ingress) β”‚ β”‚ +β”‚ β”‚ β”œβ”€β–Ί Delete unused resources β”‚ β”‚ +β”‚ β”‚ └─► Update status β”‚ β”‚ +β”‚ β”‚ β”‚ β”‚ +β”‚ β”‚ 3. Return Result{RequeueAfter: 10s} β”‚ β”‚ +β”‚ β”‚ β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ β”‚ +β”‚ β”‚ After 10 seconds β”‚ +β”‚ β–Ό β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Requeue β”‚ Start again β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +--- + +## Key Concepts + +### 1. Idempotency +Running reconcile multiple times produces the same result: +- If resource exists and matches, do nothing +- If resource exists but differs, update it +- If resource doesn't exist, create it + +### 2. Level-Triggered vs Edge-Triggered +- **Edge-triggered**: React to events (what changed) +- **Level-triggered**: React to state (what is) + +Operators are level-triggered - they look at current state, not events. + +### 3. 
Eventual Consistency +The system may take multiple reconciliations to reach desired state: +- First reconcile: Create ConfigMaps +- Second reconcile: Create StatefulSets +- Third reconcile: Wait for pods to be ready +- Fourth reconcile: All good, just monitor + +--- + +## Next Steps + +Continue to [Apache Druid Overview](../09-druid-overview/README.md) to understand what Druid is and why it needs an operator. diff --git a/learning-docs/09-druid-overview/README.md b/learning-docs/09-druid-overview/README.md new file mode 100644 index 00000000..d00a9cf6 --- /dev/null +++ b/learning-docs/09-druid-overview/README.md @@ -0,0 +1,329 @@ +# 9. Apache Druid Overview + +## What is Apache Druid? + +Apache Druid is a **real-time analytics database** designed for: +- **Fast queries** on large datasets (sub-second response times) +- **Real-time data ingestion** (streaming from Kafka, Kinesis) +- **High concurrency** (thousands of queries per second) +- **Time-series data** (events with timestamps) + +**Use cases:** +- Clickstream analytics +- Network monitoring +- IoT sensor data +- Business intelligence dashboards + +--- + +## Druid Architecture + +Druid is a **distributed system** with multiple node types: + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ Druid Cluster β”‚ +β”‚ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Master Nodes β”‚ β”‚ +β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ +β”‚ β”‚ β”‚ Coordinator β”‚ β”‚ Overlord β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ - Manages β”‚ β”‚ - Manages β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ segments β”‚ β”‚ ingestion β”‚ β”‚ β”‚ 
+β”‚ β”‚ β”‚ - Balances β”‚ β”‚ tasks β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ data β”‚ β”‚ β”‚ β”‚ β”‚ +β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Query Nodes β”‚ β”‚ +β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ +β”‚ β”‚ β”‚ Broker β”‚ β”‚ Router β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ - Receives β”‚ β”‚ - Routes β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ queries β”‚ β”‚ requests β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ - Merges β”‚ β”‚ - API β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ results β”‚ β”‚ gateway β”‚ β”‚ β”‚ +β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ Data Nodes β”‚ β”‚ +β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ +β”‚ β”‚ β”‚ Historical β”‚ β”‚MiddleManagerβ”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ β”‚ β”‚ /Indexer β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ - Stores β”‚ β”‚ β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ segments β”‚ β”‚ - Ingests β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ - Serves β”‚ β”‚ data β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ queries β”‚ β”‚ - Creates β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ β”‚ β”‚ segments β”‚ 
β”‚ β”‚ +β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ External Dependencies β”‚ β”‚ +β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ +β”‚ β”‚ β”‚ ZooKeeper β”‚ β”‚ Metadata β”‚ β”‚ Deep β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ β”‚ β”‚ Store β”‚ β”‚ Storage β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ - Service β”‚ β”‚ (MySQL/ β”‚ β”‚ (S3/HDFS/ β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ discovery β”‚ β”‚ PostgreSQL)β”‚ β”‚ local) β”‚ β”‚ β”‚ +β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +--- + +## Node Types Explained + +### 1. Coordinator +**Role:** Manages data availability and distribution + +**Responsibilities:** +- Assigns segments to Historical nodes +- Balances data across the cluster +- Manages data retention rules +- Handles segment compaction + +**Stateful:** Yes (needs to track segment assignments) + +### 2. 
Overlord
+**Role:** Manages data ingestion tasks
+
+**Responsibilities:**
+- Accepts ingestion task submissions
+- Distributes tasks to MiddleManagers
+- Monitors task status
+- Handles task failures
+
+**Stateful:** Yes (tracks task state)
+
+**Note:** Often co-located with the Coordinator (`druid.coordinator.asOverlord.enabled=true`)
+
+### 3. Broker
+**Role:** Query router and result merger
+
+**Responsibilities:**
+- Receives queries from clients
+- Determines which nodes have relevant data
+- Fans out queries to Historical/MiddleManager nodes
+- Merges partial results
+
+**Stateful:** No (can be scaled horizontally)
+
+### 4. Router
+**Role:** API gateway
+
+**Responsibilities:**
+- Routes requests to appropriate services
+- Provides unified API endpoint
+- Hosts the web console
+
+**Stateful:** No (can be scaled horizontally)
+
+### 5. Historical
+**Role:** Stores and serves historical data
+
+**Responsibilities:**
+- Downloads segments from deep storage
+- Caches segments locally
+- Serves queries for historical data
+
+**Stateful:** Yes (needs persistent storage for segment cache)
+
+### 6. 
MiddleManager / Indexer +**Role:** Handles data ingestion + +**Responsibilities:** +- Runs ingestion tasks (Peons) +- Creates new segments +- Uploads segments to deep storage + +**Stateful:** Yes (needs storage for intermediate data) + +--- + +## Data Flow + +### Ingestion Flow +``` +Data Source (Kafka/Files) + β”‚ + β–Ό + β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚Overlord β”‚ Accepts ingestion spec + β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ + β”‚ + β–Ό +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚MiddleManager β”‚ Runs ingestion task +β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ + β–Ό +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ Deep Storage β”‚ Stores segments (S3/HDFS) +β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ + β–Ό +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ Coordinator β”‚ Assigns segments to Historicals +β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ + β–Ό +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ Historical β”‚ Downloads and serves segments +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +### Query Flow +``` + Client + β”‚ + β–Ό +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ Router β”‚ Routes to Broker +β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ + β”‚ + β–Ό +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ Broker β”‚ Determines which nodes have data +β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ + β”‚ + β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β–Ό β–Ό β–Ό +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚Historicalβ”‚ β”‚Historicalβ”‚ β”‚MiddleManager β”‚ +β”‚ Node 1 β”‚ β”‚ Node 2 β”‚ β”‚(real-time) β”‚ +β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ β”‚ β”‚ + β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ + β–Ό + β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚ Broker 
β”‚ Merges results + β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ + β”‚ + β–Ό + Client +``` + +--- + +## Why Druid Needs an Operator + +### Complexity +- 6+ different node types +- Each with different configuration +- Complex interdependencies +- Specific startup/shutdown order + +### Stateful Components +- Historical nodes need persistent storage +- Coordinators need stable identity +- Data must survive pod restarts + +### Operational Knowledge +- Rolling updates must follow specific order +- Scaling requires understanding of data distribution +- Failure recovery needs domain knowledge + +### Configuration Management +- Common configuration shared across nodes +- Node-specific configuration +- Runtime properties, JVM options, logging + +--- + +## Druid Configuration in the Operator + +### Common Configuration +Shared by all nodes: +```yaml +common.runtime.properties: | + # ZooKeeper connection + druid.zk.service.host=zookeeper:2181 + + # Metadata store + druid.metadata.storage.type=mysql + druid.metadata.storage.connector.connectURI=jdbc:mysql://mysql:3306/druid + + # Deep storage + druid.storage.type=s3 + druid.storage.bucket=druid-segments + + # Extensions + druid.extensions.loadList=["druid-kafka-indexing-service", "druid-s3-extensions"] +``` + +### Node-Specific Configuration +Each node type has its own: +```yaml +nodes: + brokers: + runtime.properties: | + druid.service=druid/broker + druid.broker.http.numConnections=5 + druid.processing.buffer.sizeBytes=100000000 + extra.jvm.options: |- + -Xmx4g + -Xms4g +``` + +--- + +## Segments: Druid's Data Unit + +Druid stores data in **segments**: +- Immutable chunks of data +- Typically cover a time range (hour, day) +- Stored in deep storage (S3, HDFS) +- Cached locally on Historical nodes + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ Segment Lifecycle β”‚ +β”‚ β”‚ +β”‚ 1. 
MiddleManager creates segment from ingested data β”‚ +β”‚ β”‚ β”‚ +β”‚ β–Ό β”‚ +β”‚ 2. Segment uploaded to Deep Storage (S3) β”‚ +β”‚ β”‚ β”‚ +β”‚ β–Ό β”‚ +β”‚ 3. Coordinator assigns segment to Historical β”‚ +β”‚ β”‚ β”‚ +β”‚ β–Ό β”‚ +β”‚ 4. Historical downloads and caches segment β”‚ +β”‚ β”‚ β”‚ +β”‚ β–Ό β”‚ +β”‚ 5. Historical serves queries from cached segment β”‚ +β”‚ β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +--- + +## Druid Extensions + +Druid is extensible via extensions: + +| Extension | Purpose | +|-----------|---------| +| `druid-kafka-indexing-service` | Kafka ingestion | +| `druid-kinesis-indexing-service` | Kinesis ingestion | +| `druid-s3-extensions` | S3 deep storage | +| `druid-hdfs-storage` | HDFS deep storage | +| `druid-mysql-metadata-storage` | MySQL metadata | +| `druid-postgresql-metadata-storage` | PostgreSQL metadata | + +Configured in common.runtime.properties: +```properties +druid.extensions.loadList=["druid-kafka-indexing-service", "druid-s3-extensions"] +``` + +--- + +## Next Steps + +Continue to [Complete Flow Walkthrough](../10-complete-flow/README.md) to see how everything works together end-to-end. diff --git a/learning-docs/10-complete-flow/README.md b/learning-docs/10-complete-flow/README.md new file mode 100644 index 00000000..468cbd16 --- /dev/null +++ b/learning-docs/10-complete-flow/README.md @@ -0,0 +1,449 @@ +# 10. Complete Flow Walkthrough + +## End-to-End: From YAML to Running Druid Cluster + +Let's trace what happens when you deploy a Druid cluster using this operator. + +--- + +## Step 1: Install the Operator + +```bash +# Add Helm repository +helm repo add datainfra https://charts.datainfra.io + +# Install the operator +helm install druid-operator datainfra/druid-operator -n druid-operator-system --create-namespace +``` + +**What happens:** +1. 
Helm creates the `druid-operator-system` namespace +2. CRDs are installed (`Druid`, `DruidIngestion`) +3. RBAC resources are created (ServiceAccount, Role, RoleBinding) +4. Operator Deployment is created +5. Operator pod starts running + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ druid-operator-system namespace β”‚ +β”‚ β”‚ +β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ +β”‚ β”‚ druid-operator Deployment β”‚ β”‚ +β”‚ β”‚ β”‚ β”‚ +β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ +β”‚ β”‚ β”‚ druid-operator Pod β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ main.go starts: β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ - Creates Manager β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ - Registers DruidReconciler β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ - Registers DruidIngestionReconciler β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ - Starts watching for CRs β”‚ β”‚ β”‚ +β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ +β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ +β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ +β”‚ β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +--- + +## Step 2: Create a Druid CR + +```bash +kubectl apply -f - < + ... 
+ + # Pod configuration + podLabels: + app: druid + podAnnotations: + key: value + securityContext: + runAsUser: 1000 + + # Volumes + volumes: + - name: data + emptyDir: {} + volumeMounts: + - name: data + mountPath: /druid/data +``` + +--- + +## Environment Variables + +| Variable | Default | Description | +|----------|---------|-------------| +| `WATCH_NAMESPACE` | "" | Namespace(s) to watch (empty = all) | +| `DENY_LIST` | "default,kube-system" | Namespaces to ignore | +| `RECONCILE_WAIT` | "10s" | Time between reconciliations | +| `MAX_CONCURRENT_RECONCILES` | "1" | Max parallel reconciliations | + +--- + +## Useful Links + +- [Apache Druid Documentation](https://druid.apache.org/docs/latest/) +- [Kubernetes Documentation](https://kubernetes.io/docs/) +- [Kubebuilder Book](https://book.kubebuilder.io/) +- [Controller-Runtime](https://pkg.go.dev/sigs.k8s.io/controller-runtime) +- [Go Documentation](https://go.dev/doc/)