
Commit f5f702d

Merge pull request #237 from rararulab/issue-228-docs-alignment
docs: align documentation with single-process architecture (#228)
2 parents 80e620d + a9ebba6 commit f5f702d

6 files changed

Lines changed: 56 additions & 90 deletions


docs/design/architecture.md

Lines changed: 16 additions & 23 deletions
@@ -10,9 +10,9 @@
 
 ## Design Principles
 
-1. **Stateless first** — all backend services are stateless. State belongs in dedicated stores (database, cache, object store). Any service instance can be killed and replaced without impact.
+1. **Stateless first** — all backend services are stateless. State belongs in dedicated stores (database, object store). Any service instance can be killed and replaced without impact.
 2. **Pluggable** — cross-cutting concerns (auth, rate limiting) are middleware slots with defined interfaces. Swap implementations without touching business logic.
-3. **Cloud-native** — designed for Kubernetes from day one. Service discovery via DNS, horizontal scaling per component, infrastructure and application layers are cleanly separated.
+3. **Single-process standalone** — all services run in a single process via the `app` binary. The architecture is designed so that individual services *could* be split into separate deployments in the future, but the current deployment model is a single process.
 
 ## System Overview
 
@@ -23,17 +23,16 @@
 
 | Component | Stateful | Description |
 |-----------|----------|-------------|
-| **Gateway** | No | Control-plane entry point. Reverse proxy, rate limiting, auth middleware. See [Gateway](./gateway.md). |
-| **MetaService** | No | System brain. Metadata CRUD, share link lifecycle, connection token signing, service topology awareness (K8s Endpoints API). See [MetaService](./meta-service.md). |
+| **Gateway** | No | Control-plane entry point. Reverse proxy for MetaService and Ingestor, rate limiting, auth middleware. Streamer is accessed directly, not through Gateway — streaming data should not pass through a reverse proxy. See [Gateway](./gateway.md). |
+| **MetaService** | No | System brain. Metadata CRUD, share link lifecycle, connection token signing. See [MetaService](./meta-service.md). |
 | **Ingestor** | No | Receives uploads from clients, processes into HLS. Storage is an internal detail. Stateless — no affinity needed. |
-| **Streamer** | No | Serves video playback. Per-video affinity via consistent hashing for local cache optimization. Client reconnects to a different instance only on failure. |
-| **Client (WASM)** | No | Core logic in Rust → WASM. Client-side load balancing, connection reuse, failover. |
+| **Streamer** | No | Serves video playback. Accessed directly by clients, not proxied through Gateway. |
+| **Client (WASM)** | No | Core logic in Rust → WASM. |
 | **MinIO (S3)** | Yes | Object storage. Internal to Ingestor/Streamer — never exposed to clients. |
 | **Database** | Yes | Metadata persistence. |
-| **MemoryCache** | Yes | Rate limiting state for Gateway and MetaService. Redis, Memcached, or in-process. |
-| **Queue** | Yes | Async job delivery between services (e.g. upload verification). Redis Streams, RabbitMQ, or SQS. |
+| **Queue (in-memory)** | No | In-process async job delivery (e.g. upload-complete notification). Uses an in-memory channel within the single process — not a distributed message broker. |
 
-All backend services are stateless. State belongs in dedicated stores.
+All services run in a single process. Backend services are stateless; persistent state belongs in dedicated stores (database, object storage).
 
 ## User Identity
 
@@ -43,27 +42,21 @@ Ownership checks (e.g. "can this user delete this video?") are enforced by MetaService.
 
 ## Data Plane — Connection Token Flow
 
-Control-plane requests go through Gateway. Data-plane traffic (upload/playback) bypasses Gateway entirely — clients connect directly to Ingestor/Streamer instances.
+Control-plane requests go through Gateway. Data-plane traffic (upload/playback) is served by Ingestor/Streamer directly (all within the same process).
 
-This follows the **Kafka / Redis Cluster pattern**: client fetches a service map, computes routing locally, and connects directly to the target instance.
+In the current single-process deployment, all services share the same address. The connection token flow still applies — it authorizes data-plane access regardless of deployment topology.
 
 ```
 1. Client → Gateway → MetaService: "I want to upload/watch video X"
-2. MetaService returns a service map (list of Ingestor/Streamer external addresses)
-   sourced from K8s Endpoints API, mapped to externally reachable addresses (NodePort)
-3. Client computes target locally:
-   - Upload: round-robin across Ingestors
-   - Playback: consistent hash ring on video_id → pick Streamer (cache affinity)
-4. Client connects directly to the chosen instance with a signed connection token
-5. Upload: client sends file data to Ingestor (storage is an internal detail of Ingestor)
-   Playback: client requests HLS manifest/segments from Streamer (caching is an internal detail)
+2. MetaService returns a connection token authorizing the action
+3. Client connects to Ingestor (upload) or Streamer (playback) with the signed token
+4. Upload: client sends file data to Ingestor
+   Playback: client requests HLS manifest/segments from Streamer
 ```
 
-**On failure**: client refreshes the service map from MetaService, recomputes target, reconnects. Only the failed node's connections are affected — consistent hashing minimizes cache disruption on Streamer topology changes.
+## Scaling (future)
 
-## Scaling
-
-Horizontal scaling is handled by K8s HPA. Each service exposes a `/metrics` endpoint (Prometheus format):
+The current deployment is single-process. The architecture is designed so that services *could* be separated and scaled independently in the future. Potential scale-out dimensions:
 
 | Service | Scale Metric |
 |---------|-------------|

docs/design/gateway.md

Lines changed: 6 additions & 8 deletions
@@ -26,7 +26,8 @@ Gateway performs **prefix-strip-and-forward**. It never parses downstream routes.
 |--------|----------|---------|
 | `/api/meta/*` | MetaService | Video metadata, share links |
 | `/api/ingest/*` | Ingestor | Upload negotiation |
-| `/api/stream/*` | Streamer | Playback negotiation |
+
+Streamer is **not** proxied through Gateway. Video streaming data should not pass through a reverse proxy — clients access Streamer directly. This is by design: Gateway handles control-plane traffic only.
 
 Adding endpoints to any downstream service requires **zero Gateway changes**.
 
@@ -39,19 +40,16 @@ Request → AuthMiddleware → RateLimitMiddleware → ProxyForward → Response
 - **AuthMiddleware** — extracts user identity and injects `X-User-Id` header for downstream services. Currently hardcoded to `bot` (single-user mode). Interface is defined so a real implementation (e.g. JWT verification) can be swapped in without changing Gateway internals or any downstream service.
 - **RateLimitMiddleware** — per-user throttling using a backing cache (Redis / in-process). Only applies to control-plane requests since data-plane traffic never hits Gateway.
 
-## Service Discovery on K8s
+## Service Discovery
 
-No external service registry needed. Gateway resolves upstreams via **K8s Service DNS**:
+In the current single-process deployment, Gateway forwards requests to MetaService and Ingestor in-process. Upstream addresses are configured via environment variables:
 
 ```yaml
 upstreams:
-  meta: "http://meta-service:8080"
-  ingest: "http://ingest-service:8080"
-  stream: "http://stream-service:8080"
+  meta: "http://localhost:8080"
+  ingest: "http://localhost:8080"
 ```
 
-K8s handles DNS resolution and load balancing across pods.
-
 ## Why a Custom Gateway (Not Nginx)
 
 K8s Ingress (Nginx/Envoy) handles infrastructure concerns: TLS termination, external routing, health checks. Gateway handles **business concerns**: auth validation and rate limiting are application logic that belongs in application code, not Nginx config.
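
For illustration, the AuthMiddleware behavior described in this diff (inject a hardcoded `X-User-Id: bot` header for downstream services) could be sketched as an axum middleware roughly like this. The function name and the axum 0.7-style signature are assumptions, not the repo's actual code:

```rust
use axum::{extract::Request, http::HeaderValue, middleware::Next, response::Response};

// Single-user mode: every request is attributed to `bot` so downstream
// services (MetaService, Ingestor) can enforce ownership via `X-User-Id`.
async fn auth_middleware(mut req: Request, next: Next) -> Response {
    req.headers_mut()
        .insert("X-User-Id", HeaderValue::from_static("bot"));
    next.run(req).await
}
```

Mounted with `axum::middleware::from_fn(auth_middleware)`, this slot could later be swapped for a JWT-verifying implementation without changing downstream services.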

docs/design/ingestor.md

Lines changed: 4 additions & 4 deletions
@@ -14,7 +14,7 @@ Three responsibilities:
 
 ![Ingestor Upload Phase](./ingestor-upload-flow.svg)
 
-## Processing Pipeline (async, queue-driven)
+## Processing Pipeline (async, in-memory queue)
 
 ![Ingestor Processing Pipeline](./ingestor-processing-flow.svg)
 
@@ -38,7 +38,7 @@ bucket/
 Processing tasks are persisted in the database with their current state. This ensures:
 
 - **Crash recovery** — on restart, Ingestor picks up incomplete tasks rather than re-processing from scratch
-- **No duplicate work** — each task has a unique ID; the queue delivers at-least-once, Ingestor deduplicates via DB state
+- **No duplicate work** — each task has a unique ID; Ingestor deduplicates via DB state
 
 ### Failure Strategy
 
@@ -54,8 +54,8 @@ After all retries exhausted, the task is marked `failed` and MetaService is notified.
 
 ```
 Owner calls DELETE /videos/:id
-→ MetaService publishes cancellation event to queue
-→ Ingestor consumes event
+→ MetaService publishes cancellation event to in-memory queue
+→ Ingestor consumes event (same process)
 → If transmux is in progress: kill ffmpeg, clean up partial HLS output
 → If task is queued but not started: discard from queue
 → Notify MetaService: status → cancelled
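
A minimal sketch of the in-memory queue this diff refers to, using a `tokio` mpsc channel shared by MetaService and Ingestor inside the one process. The event variants and the channel capacity are illustrative assumptions; the repo's actual job types may differ:

```rust
use tokio::sync::mpsc;

// Illustrative event type for in-process job delivery.
#[derive(Debug)]
enum IngestEvent {
    UploadComplete { video_id: String },
    Cancel { video_id: String },
}

#[tokio::main]
async fn main() {
    // Bounded channel: provides natural backpressure if the worker falls behind.
    let (tx, mut rx) = mpsc::channel::<IngestEvent>(128);

    // Ingestor side: consume events in the same process.
    let worker = tokio::spawn(async move {
        while let Some(event) = rx.recv().await {
            match event {
                IngestEvent::UploadComplete { video_id } => {
                    println!("start transmux for {video_id}");
                }
                IngestEvent::Cancel { video_id } => {
                    println!("cancel processing for {video_id}");
                }
            }
        }
    });

    // MetaService side: publish an event (e.g. on DELETE /videos/:id).
    tx.send(IngestEvent::Cancel { video_id: "v123".into() }).await.unwrap();

    drop(tx); // closing all senders ends the worker loop
    worker.await.unwrap();
}
```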

docs/design/layout.md

Lines changed: 2 additions & 3 deletions
@@ -6,7 +6,7 @@
 crates/
 ├── app/       # The ONLY binary crate. CLI entry point, starts services based on args.
 ├── core/      # Shared library. Domain types, DB repos, storage helpers, HTTP client.
-├── server/    # HTTP server foundation. Metrics (/metrics for k8s HPA), health, graceful shutdown, tracing.
+├── server/    # HTTP server foundation. Health checks, graceful shutdown, tracing.
 ├── meta/      # Library crate. Video metadata CRUD, share links. Exposes Router + State.
 ├── ingestor/  # Library crate. Upload + transmux pipeline. Exposes Router + State.
 ├── streamer/  # Library crate. HLS playback + segment caching. Exposes Router + State.
@@ -19,7 +19,7 @@ crates/
 1. **Single binary** — `app` is the only crate with `main.rs`. It parses CLI args to decide which service(s) to start. No other crate produces a binary.
 2. **Library-first** — Every service crate is a lib crate that exposes its `Router` and `State`. The `app` crate composes them.
 3. **`core` is shared code** — Domain types, database repositories, storage helpers, config. No HTTP server logic.
-4. **`server` is the HTTP foundation** — Wraps axum with cross-cutting concerns: metrics endpoint for k8s HPA, health checks, graceful shutdown, tracing setup. Service crates depend on `server`, not on raw axum server setup.
+4. **`server` is the HTTP foundation** — Wraps axum with cross-cutting concerns: health checks, graceful shutdown, tracing setup. Service crates depend on `server`, not on raw axum server setup.
 5. **Each crate owns its tests** — Every lib crate has its own unit/integration tests. `cargo test -p <crate>` must pass independently.
 
 ## Service Crate Contract
@@ -57,7 +57,6 @@ server.mount("/api/meta", meta_router)
 | HTTP client (`reqwest` singletons) | `core` |
 | Config (`AppConfig`, `paths`) | `core` |
 | HTTP server (`bind`, `run`, graceful shutdown) | `server` |
-| Metrics endpoint (`/metrics`) | `server` |
 | Health check (`/healthz`, `/readyz`) | `server` |
 | Tracing/logging setup | `server` |
 
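
As a hedged sketch of the single-binary composition described above: the `app` crate mounts each library crate's router on one axum server. The stub constructors, mount paths other than `/api/meta` and `/api/ingest`, and the axum `serve` call are illustrative assumptions:

```rust
use axum::Router;

// Stand-ins for the routers exposed by the `meta`, `ingestor`, and `streamer`
// library crates (each real crate exposes its own Router + State).
fn meta_router() -> Router { Router::new() }
fn ingest_router() -> Router { Router::new() }
fn streamer_router() -> Router { Router::new() }

#[tokio::main]
async fn main() {
    // Single process: all services mounted on one Router.
    let app = Router::new()
        .nest("/api/meta", meta_router())     // control plane, behind Gateway middleware
        .nest("/api/ingest", ingest_router()) // control plane, behind Gateway middleware
        .nest("/stream", streamer_router());  // data plane, reached directly by clients

    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```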

docs/design/meta-service.md

Lines changed: 5 additions & 31 deletions
@@ -8,8 +8,8 @@ Four responsibilities:
 
 1. **Metadata management** — video CRUD, share link lifecycle, ownership enforcement
 2. **Connection token** — issues HMAC-signed tokens that authorize clients to access data-plane services
-3. **Topology awareness** — watches K8s Endpoints API to maintain a live service map
-4. **Task lifecycle** — publishes cancellation/deletion events to the queue for Ingestor to consume
+3. **Service address** — provides the service address for clients to connect to data-plane services
+4. **Task lifecycle** — publishes cancellation/deletion events to the in-memory queue for Ingestor to consume (same process)
 
 All mutating operations check ownership via `X-User-Id` (currently hardcoded to `bot` by Gateway). Each video record stores an `owner` field set at creation time.
 
@@ -19,20 +19,11 @@ All mutating operations check ownership via `X-User-Id` (currently hardcoded to `bot` by Gateway).
 
 ![MetaService Playback Interaction](./meta-playback-flow.svg)
 
-Follows the **Kafka / Redis Cluster pattern**: routing decisions live on the client side, MetaService only provides the data.
-
 When a client requests an upload or playback, MetaService returns:
 
-- **Service map** — list of healthy Ingestor/Streamer external addresses (NodePort)
+- **Service address** — the address of the single-process server hosting Ingestor/Streamer
 - **Connection token** — authorizes the client to connect
 
-The client computes the target locally:
-
-| Action | Strategy | Reason |
-|--------|----------|--------|
-| Upload | Round-robin | Stateless, no affinity needed |
-| Playback | Consistent hash ring on `video_id` | Cache affinity — same video hits same Streamer. On node changes, only `1/N` keys re-map |
-
 ## Connection Token
 
 Data-plane services (Ingestor/Streamer) sit outside Gateway's auth middleware. The connection token is the **only** mechanism that authorizes data-plane access. Without a valid token, services reject the connection.
@@ -45,7 +36,7 @@ signature = HMAC-SHA256(payload, server_secret)
 token = base64url(payload + signature)
 ```
 
-`server_secret` is shared between MetaService and data-plane services via K8s Secret. Data-plane services verify tokens locally — no callback to MetaService needed.
+`server_secret` is shared between MetaService and data-plane services via configuration. In the single-process deployment, all services share the same config. Data-plane services verify tokens locally — no callback to MetaService needed.
 
 ### Fields
 
@@ -67,23 +58,6 @@ The token does not encode a target instance. It only authorizes the action — t
 | Short-lived | TTL-based expiry limits replay attacks |
 | Scoped | `action` field prevents cross-purpose reuse |
 
-## Topology Awareness
-
-![MetaService Topology Awareness](./meta-topology-flow.svg)
-
-MetaService watches K8s Endpoints API for Ingestor and Streamer services, giving it a live view of ready pods without polling.
-
-Pods are exposed via NodePort. MetaService maps internal pod IPs to externally reachable `node_ip:node_port` pairs in the service map.
-
-On topology changes (scale up/down, pod failure):
-
-- Service map updates immediately
-- Existing client connections are unaffected
-- New requests or service map refreshes reflect the new topology
-- Consistent hashing minimizes cache disruption on Streamer changes
-
-Only videos previously assigned to failed nodes need to re-map.
-
 ## Share Link
 
 ![MetaService Share Link Flow](./meta-share-flow.svg)
@@ -108,7 +82,7 @@ Share link controls **who can enter**. Connection token controls **how long they
 
 ### Rate Limiting
 
-Rate limit state is stored in MemoryCache (Redis), shared by Gateway and MetaService:
+Rate limit state is stored in-memory, shared by Gateway and MetaService within the single process:
 
 | Layer | Location | Protects against |
 |-------|----------|-----------------|
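
The token construction in this file (HMAC-SHA256 over the payload with `server_secret`, then base64url of payload plus signature) can be sketched in Rust roughly as follows. The `hmac`, `sha2`, and `base64` crates and the function names are assumptions about one reasonable implementation, not the repo's actual code:

```rust
use base64::{engine::general_purpose::URL_SAFE_NO_PAD, Engine as _};
use hmac::{Hmac, Mac};
use sha2::Sha256;

type HmacSha256 = Hmac<Sha256>;

/// token = base64url(payload || HMAC-SHA256(payload, server_secret))
fn sign_token(payload: &[u8], secret: &[u8]) -> String {
    let mut mac = HmacSha256::new_from_slice(secret).expect("HMAC accepts any key length");
    mac.update(payload);
    let mut buf = payload.to_vec();
    buf.extend_from_slice(&mac.finalize().into_bytes());
    URL_SAFE_NO_PAD.encode(buf)
}

/// Data-plane side: verify locally with the shared secret, no callback to
/// MetaService. Returns the payload on success; the MAC check is constant-time.
fn verify_token(token: &str, secret: &[u8]) -> Option<Vec<u8>> {
    let raw = URL_SAFE_NO_PAD.decode(token).ok()?;
    if raw.len() < 32 {
        return None; // SHA-256 tag is 32 bytes
    }
    let (payload, tag) = raw.split_at(raw.len() - 32);
    let mut mac = HmacSha256::new_from_slice(secret).ok()?;
    mac.update(payload);
    mac.verify_slice(tag).ok()?;
    Some(payload.to_vec())
}
```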

docs/design/performance.md

Lines changed: 23 additions & 21 deletions
@@ -46,16 +46,18 @@ Each viewer consumes ~2.5 Mbps. With Streamer cache hits, most traffic is served
 
 ---
 
-## Instance Sizing
+## Capacity per Service (single-process)
 
-| Service | Sizing factor | Light (100 viewers) | Peak (1,000 viewers) |
-|---------|---------------|:-------------------:|:--------------------:|
-| Gateway | ~10K req/s per instance | 1 | 2 |
-| MetaService | ~5K req/s per instance | 1 | 2 |
-| Ingestor | 1 per ~20 concurrent uploads | 1 | 3 |
-| Streamer | 1 per ~200 concurrent streams (NIC bound) | 1 | 5 |
+All services share a single process. The following estimates show per-service capacity within that process:
 
-Gateway and MetaService handle lightweight control-plane requests. Streamer is the primary scale-out target.
+| Service | Capacity factor |
+|---------|----------------|
+| Gateway | ~10K req/s (lightweight control-plane proxy) |
+| MetaService | ~5K req/s (metadata reads/writes) |
+| Ingestor | ~20 concurrent upload negotiations; transmux is CPU-bound |
+| Streamer | ~200 concurrent streams per 1 Gbps NIC capacity |
+
+Gateway and MetaService handle lightweight control-plane requests. Streamer is the primary resource consumer.
 
 ---
 
@@ -76,7 +78,7 @@ Two separate concerns with very different limits:
 - **Upload negotiation** (token validation + presigned URL signing): ~20 concurrent per instance
 - **Transmux processing** (CPU bound): ~5 concurrent per instance
 
-With 3 instances × 5 transmux slots = ~15 videos/min throughput. HPA scales on queue depth; the queue provides natural backpressure (client sees `processing` status until complete).
+In the single-process deployment, transmux concurrency is limited by available CPU cores. The in-memory queue provides natural backpressure (client sees `processing` status until complete).
 
 ### Streamer — Network I/O
 
@@ -88,13 +90,11 @@ With 3 instances × 5 transmux slots = ~15 videos/min throughput. HPA scales on
 
 ### Consistent hash rebalance on scale-up
 
-Adding a Streamer instance remaps `1/N` of cached videos, causing cold-start penalties. Mitigate by scaling during low-traffic periods and using virtual nodes to reduce per-event disruption.
-
 ### Infrastructure
 
-- **MinIO** — disk I/O on write path; use erasure coding across nodes + SSD for hot tier
-- **Database** — ~10K writes/s (Postgres), sufficient for this scale; shard by `video_id` if needed
-- **Queue** — ~100K msg/s (Redis Streams), far exceeds needs
+- **MinIO** — disk I/O on write path; use SSD for hot tier
+- **Database** — ~10K writes/s (Postgres), sufficient for this scale
+- **Queue** — in-memory channel, negligible overhead
 
 ---
 
@@ -114,7 +114,7 @@ Client          Gateway        MetaService     Ingestor
 │◄── 200 OK ────────────────────────────────────────┤
 │                                                    │
 ├── POST /uploaded ──────►├───────────►│            │
-│        ~50ms            │            │── queue ───►│
+│        ~50ms            │            │── notify ──►│
 │◄── 202 Accepted ────────┤            │            │
 │                                                    │
 │                 (async transmux)                   │
@@ -224,12 +224,14 @@ The 1,000-viewer target is well within single-machine capacity. Upgrading to a 2
 
 ---
 
-## Scaling Decision Guide
+## Scaling Decision Guide (future)
+
+The current deployment is single-process. If the architecture were split into separate services, these would be the scaling signals:
 
 | Signal | Action |
 |--------|--------|
-| Playback latency rising, cache miss rate high | Add Streamer instances |
-| Transmux queue depth growing | Add Ingestor instances (HPA on queue depth) |
-| Control-plane P99 > 100 ms | Add MetaService instance or DB read replica |
-| Gateway CPU > 70% | Add Gateway instance or offload TLS to Ingress |
-| MinIO write latency spiking | Add storage nodes or move to SSD |
+| Playback latency rising, cache miss rate high | Scale Streamer |
+| Transmux queue depth growing | Scale Ingestor |
+| Control-plane P99 > 100 ms | Scale MetaService or add DB read replica |
+| Gateway CPU > 70% | Scale Gateway or offload TLS to Ingress |
+| MinIO write latency spiking | Add storage capacity or move to SSD |
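
As a rough sanity check on the Streamer figure in the capacity table above: 200 concurrent streams at the ~2.5 Mbps per viewer cited earlier in this document works out to about 500 Mbps, roughly half of a 1 Gbps NIC, leaving headroom for cache misses, bursts, and control-plane traffic.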
