@@ -17,7 +17,7 @@ description: "Learn why lakehouse architecture combines the best of data lakes a
 ## Prerequisites
 
 - None. Start here.
-- Optional: [Part 2: Getting Started—Setup Guide](02-setup-guide.md) if you want to run services now.
+- Optional: [Part 2: Getting Started - Setup Guide](02-setup-guide.md) if you want to run services now.
 
 ## The Problem We're Solving
 
@@ -26,10 +26,10 @@ Traditional data pipelines have a fundamental problem: **they force you to choos
 Either you have:
 
 - **A Data Lake**: cheap, flexible storage but chaotic and hard to query
-- **A Data Warehouse**: organized, fast queries but rigid and expensive
+- **A Data Warehouse**: organised, fast queries but rigid and expensive
 
 Phlo solves this by combining the best of both worlds into a **lakehouse**.
-If you want hands-on setup next, jump to [Part 2: Getting Started—Setup Guide](02-setup-guide.md).
+If you want hands-on setup next, jump to [Part 2: Getting Started - Setup Guide](02-setup-guide.md).
 
 ## The Three Eras of Data Architecture
 
@@ -43,7 +43,7 @@ If you want hands-on setup next, jump to [Part 2: Getting Started—Setup Guide]
 
 - Store raw data cheaply in object storage
 - Flexible schema
-- Problem: "Swamp" syndrome—data is disorganized, hard to query, poor governance
+- Problem: "Swamp" syndrome - data is disorganised, hard to query, poor governance
 
 ### Era 3: The Data Lakehouse (2020s+)
 
@@ -94,7 +94,7 @@ graph TB
 
 ### 1. Apache Iceberg (Table Format)
 
-Imagine you're storing data in a filing cabinet. A table format is the **file organization system** that lets you:
+Imagine you're storing data in a filing cabinet. A table format is the **file organisation system** that lets you:
 
 ```
 Instead of:
@@ -217,23 +217,58 @@ def publish_marts() -> None:
 
 ## The Data Flow in Phlo
 
-```mermaid
-flowchart TD
-A[Nightscout API] -->|DLT + PyIceberg| B[S3 Staging - MinIO]
-B -->|Merge with dedup| C[raw.glucose_entries]
-C -->|dbt + Trino| D[bronze.stg_entries]
-D --> E[silver.fct_readings]
-E --> F[gold.dim_date]
-F -->|Trino to Postgres| G[marts.mrt_glucose_overview]
-G -->|SQL queries| H[Superset Dashboard]
+```
+1. INGEST
+   ┌─────────────────────┐
+   │ Nightscout API      │
+   │ (glucose data)      │
+   └──────────┬──────────┘
+              │
+              ↓ (DLT + PyIceberg)
+   ┌─────────────────────────────────┐
+   │ S3 Staging (MinIO)              │ ← Temporary parquet files
+   └──────────┬──────────────────────┘
+              │
+              ↓ (Merge with dedup)
+   ┌──────────────────────────────────────┐
+   │ Iceberg Table: raw.glucose_entries   │ ← Immutable, ACID
+   │ Branch: main (production)            │
+   └──────────┬───────────────────────────┘
+
+2. TRANSFORM
+              ↓ (dbt + Trino)
+   ┌──────────────────────────────────────┐
+   │ Iceberg Table: bronze.stg_entries    │ ← Type conversions
+   └──────────┬───────────────────────────┘
+              │
+              ↓
+   ┌──────────────────────────────────────┐
+   │ Iceberg Table: silver.fct_readings   │ ← Business logic
+   └──────────┬───────────────────────────┘
+              │
+              ↓
+   ┌──────────────────────────────────────┐
+   │ Iceberg Table: gold.dim_date         │ ← Dimensions
+   └──────────┬───────────────────────────┘
+
+3. PUBLISH
+              ↓ (Trino → Postgres)
+   ┌──────────────────────────────────────┐
+   │ Postgres: marts.mrt_glucose_overview │ ← Fast for BI
+   └──────────┬───────────────────────────┘
+              │
+              ↓ (SQL queries)
+   ┌──────────────────────────────────────┐
+   │ Superset Dashboard                   │ ← Visualisation
+   └──────────────────────────────────────┘
 ```
 
 ## Why This Matters (Real Benefits)
 
 | Problem        | Traditional               | Phlo Solution                |
 | -------------- | ------------------------- | ---------------------------- |
 | Data costs     | High (warehouse fees)     | Low (S3 storage)             |
-| Query speed    | Fast                      | Fast (Trino optimization)    |
+| Query speed    | Fast                      | Fast (Trino optimisation)    |
 | Schema changes | Painful rewrites          | Easy evolution               |
 | Governance     | Manual processes          | Git-like branching           |
 | Vendor lock-in | Yes (Snowflake, Redshift) | No (open formats)            |
@@ -300,7 +335,7 @@ See [Troubleshooting Guide](../operations/troubleshooting.md) for deeper diagnos
 
 ## See Also
 
-See also: [Part 2: Getting Started—Setup Guide](02-setup-guide.md), [Part 3: Apache Iceberg Explained](03-apache-iceberg-explained.md), [Part 4: Project Nessie Versioning](04-project-nessie-versioning.md). Reference: [Architecture Overview](../reference/architecture.md).
+See also: [Part 2: Getting Started - Setup Guide](02-setup-guide.md), [Part 3: Apache Iceberg Explained](03-apache-iceberg-explained.md), [Part 4: Project Nessie Versioning](04-project-nessie-versioning.md). Reference: [Architecture Overview](../reference/architecture.md).
 
 ## Summary
 
@@ -317,4 +352,4 @@ Ready to build? In Part 2, we'll:
 - Start all services with one command
 - Run your first data pipeline
 
-**Next**: [Part 2: Getting Started—Setup Guide](02-setup-guide.md)
+**Next**: [Part 2: Getting Started - Setup Guide](02-setup-guide.md)