Commit c4e509d

feat: update
1 parent 24afda6 commit c4e509d

17 files changed

Lines changed: 617 additions & 354 deletions

web/content/blog/01-intro-data-lakehouse.md

Lines changed: 52 additions & 17 deletions

@@ -17,7 +17,7 @@ description: "Learn why lakehouse architecture combines the best of data lakes a
 ## Prerequisites
 
 - None. Start here.
-- Optional: [Part 2: Getting Started—Setup Guide](02-setup-guide.md) if you want to run services now.
+- Optional: [Part 2: Getting Started - Setup Guide](02-setup-guide.md) if you want to run services now.
 
 ## The Problem We're Solving
 
@@ -26,10 +26,10 @@ Traditional data pipelines have a fundamental problem: **they force you to choos
 Either you have:
 
 - **A Data Lake**: cheap, flexible storage but chaotic and hard to query
-- **A Data Warehouse**: organized, fast queries but rigid and expensive
+- **A Data Warehouse**: organised, fast queries but rigid and expensive
 
 Phlo solves this by combining the best of both worlds into a **lakehouse**.
-If you want hands-on setup next, jump to [Part 2: Getting Started—Setup Guide](02-setup-guide.md).
+If you want hands-on setup next, jump to [Part 2: Getting Started - Setup Guide](02-setup-guide.md).
 
 ## The Three Eras of Data Architecture
 
@@ -43,7 +43,7 @@ If you want hands-on setup next, jump to [Part 2: Getting Started—Setup Guide]
 
 - Store raw data cheaply in object storage
 - Flexible schema
-- Problem: "Swamp" syndrome—data is disorganized, hard to query, poor governance
+- Problem: "Swamp" syndrome - data is disorganised, hard to query, poor governance
 
 ### Era 3: The Data Lakehouse (2020s+)
 
@@ -94,7 +94,7 @@ graph TB
 
 ### 1. Apache Iceberg (Table Format)
 
-Imagine you're storing data in a filing cabinet. A table format is the **file organization system** that lets you:
+Imagine you're storing data in a filing cabinet. A table format is the **file organisation system** that lets you:
 
 ```
 Instead of:
@@ -217,23 +217,58 @@ def publish_marts() -> None:
 
 ## The Data Flow in Phlo
 
-```mermaid
-flowchart TD
-    A[Nightscout API] -->|DLT + PyIceberg| B[S3 Staging - MinIO]
-    B -->|Merge with dedup| C[raw.glucose_entries]
-    C -->|dbt + Trino| D[bronze.stg_entries]
-    D --> E[silver.fct_readings]
-    E --> F[gold.dim_date]
-    F -->|Trino to Postgres| G[marts.mrt_glucose_overview]
-    G -->|SQL queries| H[Superset Dashboard]
+```
+1. INGEST
+   ┌─────────────────────┐
+   │  Nightscout API     │
+   │  (glucose data)     │
+   └──────────┬──────────┘
+
+              ↓ (DLT + PyIceberg)
+   ┌─────────────────────────────────┐
+   │  S3 Staging (MinIO)             │  ← Temporary parquet files
+   └──────────┬──────────────────────┘
+
+              ↓ (Merge with dedup)
+   ┌──────────────────────────────────────┐
+   │  Iceberg Table: raw.glucose_entries  │  ← Immutable, ACID
+   │  Branch: main (production)           │
+   └──────────┬───────────────────────────┘
+
+2. TRANSFORM
+              ↓ (dbt + Trino)
+   ┌──────────────────────────────────────┐
+   │  Iceberg Table: bronze.stg_entries   │  ← Type conversions
+   └──────────┬───────────────────────────┘
+
+              ↓
+   ┌──────────────────────────────────────┐
+   │  Iceberg Table: silver.fct_readings  │  ← Business logic
+   └──────────┬───────────────────────────┘
+
+              ↓
+   ┌──────────────────────────────────────┐
+   │  Iceberg Table: gold.dim_date        │  ← Dimensions
+   └──────────┬───────────────────────────┘
+
+3. PUBLISH
+              ↓ (Trino → Postgres)
+   ┌───────────────────────────────────────┐
+   │  Postgres: marts.mrt_glucose_overview │  ← Fast for BI
+   └──────────┬────────────────────────────┘
+
+              ↓ (SQL queries)
+   ┌───────────────────────────────────────┐
+   │  Superset Dashboard                   │  ← Visualisation
+   └───────────────────────────────────────┘
 ```
 
 ## Why This Matters (Real Benefits)
 
 | Problem        | Traditional               | Phlo Solution                |
 | -------------- | ------------------------- | ---------------------------- |
 | Data costs     | High (warehouse fees)     | Low (S3 storage)             |
-| Query speed    | Fast                      | Fast (Trino optimization)    |
+| Query speed    | Fast                      | Fast (Trino optimisation)    |
 | Schema changes | Painful rewrites          | Easy evolution               |
 | Governance     | Manual processes          | Git-like branching           |
 | Vendor lock-in | Yes (Snowflake, Redshift) | No (open formats)            |
@@ -300,7 +335,7 @@ See [Troubleshooting Guide](../operations/troubleshooting.md) for deeper diagnos
 
 ## See Also
 
-See also: [Part 2: Getting Started—Setup Guide](02-setup-guide.md), [Part 3: Apache Iceberg Explained](03-apache-iceberg-explained.md), [Part 4: Project Nessie Versioning](04-project-nessie-versioning.md). Reference: [Architecture Overview](../reference/architecture.md).
+See also: [Part 2: Getting Started - Setup Guide](02-setup-guide.md), [Part 3: Apache Iceberg Explained](03-apache-iceberg-explained.md), [Part 4: Project Nessie Versioning](04-project-nessie-versioning.md). Reference: [Architecture Overview](../reference/architecture.md).
 
 ## Summary
 
@@ -317,4 +352,4 @@ Ready to build? In Part 2, we'll:
 - Start all services with one command
 - Run your first data pipeline
 
-**Next**: [Part 2: Getting Started—Setup Guide](02-setup-guide.md)
+**Next**: [Part 2: Getting Started - Setup Guide](02-setup-guide.md)
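Reviewer's note on this file: the new data-flow diagram names a "merge with dedup" step from S3 staging into `raw.glucose_entries`. The semantics of that step can be sketched in plain Python; this is a toy model by the editor, not the actual DLT/PyIceberg implementation, and the `id`/`glucose` field names are invented for illustration:

```python
# Toy model of the "merge with dedup" step: staged rows are merged into
# the existing table, and rows sharing a primary key are deduplicated,
# with the newly staged version winning.
def merge_with_dedup(existing, staged, key="id"):
    merged = {row[key]: row for row in existing}  # current table state
    for row in staged:                            # staged rows win on conflict
        merged[row[key]] = row
    return sorted(merged.values(), key=lambda r: r[key])

table = [{"id": 1, "glucose": 110}, {"id": 2, "glucose": 95}]
staging = [{"id": 2, "glucose": 97}, {"id": 3, "glucose": 102}]
print(merge_with_dedup(table, staging))
```

Re-running an ingest with overlapping data is then idempotent, which is the property the diagram's annotation is getting at.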

web/content/blog/02-setup-guide.md

Lines changed: 25 additions & 18 deletions

@@ -3,14 +3,14 @@ title: "Getting Started with Phlo — Setup Guide"
 description: "Install Phlo, bootstrap a new project, start the service stack, and run your first data pipeline in under 20 minutes."
 ---
 
-# Part 2: Getting Started with Phlo—Setup Guide
+# Part 2: Getting Started with Phlo - Setup Guide
 
 > Prerequisite: Read [Part 1: What is a Data Lakehouse?](01-intro-data-lakehouse.md) for core concepts.
 
 ## What You'll Learn
 
 - Bootstrap a new Phlo project with `phlo init`
-- Initialize and start the service stack
+- Initialise and start the service stack
 - Ingest sample data and materialize assets
 - Verify results in Dagster and Observatory
 
@@ -54,7 +54,7 @@ description: "Install Phlo, bootstrap a new project, start the service stack, an
 
 If you have less than 4GB RAM, you can start a minimal setup (Postgres + MinIO only) and add services gradually.
 
-## Step 1: Initialize Your Project
+## Step 1: Initialise Your Project
 
 ```bash
 # Create a new Phlo project
@@ -69,7 +69,7 @@ cd my-lakehouse
 ```
 
 
-Then initialize infra (generates `.phlo/.env` and `.phlo/.env.local`):
+Then initialise infra (generates `.phlo/.env` and `.phlo/.env.local`):
 
 ```bash
 phlo services init
@@ -281,12 +281,16 @@ Open **Dagster** at http://localhost:3000
 
 You should see the asset graph:
 
-```mermaid
-flowchart TD
-    A[glucose_entries] --> B[stg_glucose_entries]
-    B --> C[fct_glucose_readings]
-    C --> D[fct_daily_glucose_metrics]
-    D --> E[postgres_marts]
+```
+glucose_entries
+   ↓
+stg_glucose_entries (dbt)
+   ↓
+fct_glucose_readings (dbt)
+   ↓
+fct_daily_glucose_metrics
+   ↓
+postgres_marts
 ```
 
 Click on `glucose_entries` → Click **Materialize this asset**
@@ -340,11 +344,14 @@ This will:
 
 Watch it propagate through the graph:
 
-```mermaid
-flowchart TD
-    A[glucose_entries ✓] --> B[stg_glucose_entries ⏳]
-    B --> C[fct_glucose_readings ⏳]
-    C --> D[postgres_marts ⏳]
+```
+glucose_entries [SUCCESS]
+   ↓
+stg_glucose_entries ⏳ (running)
+   ↓
+fct_glucose_readings ⏳ (waiting)
+   ↓
+postgres_marts ⏳ (waiting)
 ```
 
 ### 5d: Check Results
@@ -436,7 +443,7 @@ LIMIT 24
 5. Click **Update Chart**
 6. Click **Save Chart**
 
-Congratulations! You've visualized real glucose data from a lakehouse.
+Congratulations! You've visualised real glucose data from a lakehouse.
 
 ## Hands-On Exercise: Re-run the Pipeline
 
@@ -544,9 +551,9 @@ You've successfully:
 - Ran transformations
 - Created a dashboard
 
-In Part 3, we'll dive deep into **Apache Iceberg**—the magic that makes this lakehouse work.
+In Part 3, we'll cover **Apache Iceberg** and how it manages table metadata, snapshots, and schema changes.
 
 ## Next Steps
 
-- Continue with [Part 3: Apache Iceberg—The Table Format That Changed Everything](03-apache-iceberg-explained.md).
+- Continue with [Part 3: Apache Iceberg - The Table Format That Changed Everything](03-apache-iceberg-explained.md).
- Jump to ingestion specifics in [Part 5: Data Ingestion](05-data-ingestion.md).
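Reviewer's note on this file: the ASCII asset graphs added here are linear chains, and the "running/waiting" propagation in step 5b is just a walk of the dependency graph in topological order. A short sketch with the stdlib, using the asset names from the diff (the scheduling itself is Dagster's job; this only models the ordering):

```python
from graphlib import TopologicalSorter

# Dependency graph from the asset diagram: each asset maps to the set
# of assets it depends on (its upstream).
deps = {
    "stg_glucose_entries": {"glucose_entries"},
    "fct_glucose_readings": {"stg_glucose_entries"},
    "fct_daily_glucose_metrics": {"fct_glucose_readings"},
    "postgres_marts": {"fct_daily_glucose_metrics"},
}

# A materialisation run processes assets upstream-first, which is why a
# change to glucose_entries propagates all the way to postgres_marts.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

For this chain the order is fully determined; in a wider DAG, `static_order` would still guarantee only that every asset appears after its dependencies.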

web/content/blog/03-apache-iceberg-explained.md

Lines changed: 8 additions & 8 deletions

@@ -3,7 +3,7 @@ title: "Apache Iceberg — The Table Format That Changed Everything"
 description: "Understand how Apache Iceberg enables ACID transactions, time travel, and schema evolution on your data lake."
 ---
 
-# Part 3: Apache Iceberg—The Table Format That Changed Everything
+# Part 3: Apache Iceberg - The Table Format That Changed Everything
 
 > Prerequisite: Read [Part 1: What is a Data Lakehouse?](01-intro-data-lakehouse.md) for lakehouse context.
 
@@ -17,9 +17,9 @@ description: "Understand how Apache Iceberg enables ACID transactions, time trav
 ## Prerequisites
 
 - [Part 1: What is a Data Lakehouse?](01-intro-data-lakehouse.md)
-- Optional: [Part 2: Getting Started—Setup Guide](02-setup-guide.md) to run the examples locally.
+- Optional: [Part 2: Getting Started - Setup Guide](02-setup-guide.md) to run the examples locally.
 
-In Part 1, we mentioned Iceberg as the magic ingredient. Let's understand _why_ it's such a game-changer.
+In Part 1, we introduced Iceberg as the table layer. Here's why it matters in day-to-day pipelines.
 For Git-like versioning on top of Iceberg, see [Part 4: Project Nessie Versioning](04-project-nessie-versioning.md).
 
 ## The Problem With Traditional Parquet
@@ -43,7 +43,7 @@ s3://lake/glucose-data/
 - Schema changes require rewriting all files
 - Queries must scan ALL files (no partition pruning)
 - Concurrent writes = conflicting files
-- No time travel—data is gone when you delete it
+- No time travel - data is gone when you delete it
 
 ## What Iceberg Provides
 
@@ -90,7 +90,7 @@ flowchart TB
 
 ### 1. Snapshots (Immutable Versions)
 
-Each write creates a new **snapshot**—a complete, immutable view of the table at that moment:
+Each write creates a new **snapshot** - a complete, immutable view of the table at that moment:
 
 ```python
 # In Python, using PyIceberg
@@ -137,7 +137,7 @@ Snapshot 1234567892:
 └── data/year=2024/month=10/day=02/00004.parquet → rows 101-200
 ```
 
-Why manifests? Query optimization:
+Why manifests? Query optimisation:
 
 - Scanner reads manifest, not S3 listing
 - Knows exact file count before scanning
@@ -233,7 +233,7 @@ FROM iceberg.raw.glucose_entries; -- Current
 - 🐛 Data quality issue today? Check what you ingested yesterday
 - Audit trail: see exactly what changed and when
 - Reproducibility: re-run yesterday's analysis with yesterday's data
-- ↩️ No "undo" button needed—just query the previous snapshot
+- No "undo" button needed - just query the previous snapshot
 
 ## ACID Transactions
 
@@ -500,5 +500,5 @@ Phlo uses Iceberg to ensure:
 
 ## Next Steps
 
-- Continue with [Part 4: Project Nessie—Git-Like Versioning for Data](04-project-nessie-versioning.md).
+- Continue with [Part 4: Project Nessie - Git-Like Versioning for Data](04-project-nessie-versioning.md).
- See how Iceberg powers dbt models in [Part 6: dbt Transformations](06-dbt-transformations.md).
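Reviewer's note on this file: several hunks touch the snapshot model (the PyIceberg example, the manifest listing, time travel). The core idea, that every write produces a new immutable version while old versions stay queryable, fits in a few lines of Python. This is the editor's toy model of the concept, not PyIceberg's actual API:

```python
# Toy model of Iceberg-style snapshots: each append creates a new,
# immutable snapshot of the whole table; earlier snapshots remain
# readable, which is what makes time travel possible.
class Table:
    def __init__(self):
        self.snapshots = []  # list of immutable row tuples, oldest first

    def append(self, rows):
        current = self.snapshots[-1] if self.snapshots else ()
        self.snapshots.append(current + tuple(rows))  # new snapshot

    def scan(self, snapshot_id=None):
        # Default scan reads the latest snapshot; passing an id "time
        # travels" to an earlier version.
        if not self.snapshots:
            return ()
        idx = len(self.snapshots) - 1 if snapshot_id is None else snapshot_id
        return self.snapshots[idx]

t = Table()
t.append([("2024-10-01", 110)])
t.append([("2024-10-02", 95)])
print(t.scan())               # current view: both rows
print(t.scan(snapshot_id=0))  # time travel: first snapshot only
```

Real Iceberg avoids copying data by having each snapshot point at manifests of shared data files; the toy copies tuples only to keep the immutability visible.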

web/content/blog/04-project-nessie-versioning.md

Lines changed: 32 additions & 23 deletions

@@ -3,7 +3,7 @@ title: "Project Nessie — Git-Like Versioning for Data"
 description: "Add branching, merging, and tagging to your data catalog with Project Nessie for safe experimentation and auditable changes."
 ---
 
-# Part 4: Project Nessie—Git-Like Versioning for Data
+# Part 4: Project Nessie - Git-Like Versioning for Data
 
 > Prerequisite: Read [Part 3: Apache Iceberg Explained](03-apache-iceberg-explained.md) for table format basics.
 
@@ -17,7 +17,7 @@ description: "Add branching, merging, and tagging to your data catalog with Proj
 ## Prerequisites
 
 - [Part 3: Apache Iceberg Explained](03-apache-iceberg-explained.md)
-- Optional: [Part 2: Getting Started—Setup Guide](02-setup-guide.md) to run commands locally.
+- Optional: [Part 2: Getting Started - Setup Guide](02-setup-guide.md) to run commands locally.
 
 Iceberg gave us time travel. Now let's add **branching**, **merging**, and **tags** to our data with Project Nessie.
 For governance workflows that build on Nessie history, see [Part 10: Metadata & Governance](10-metadata-governance.md).
@@ -38,9 +38,12 @@ git merge # Promote to main
 
 Nessie brings this same workflow to **data**:
 
-```mermaid
-flowchart LR
-    A[main branch - stable, validated] -->|merge when ready| B[dev branch - experimental, testing]
+```
+main branch (production)        dev branch (development)
+      │                               │
+      │ ← stable, validated           │ ← experimental, testing
+      │                               │
+      └──────── merge when ready ─────┘
 ```
 
 ### Nessie Branching Flow (Diagram)
@@ -65,23 +68,29 @@ sequenceDiagram
 
 Without versioning, data work looks like:
 
-```mermaid
-flowchart TD
-    A[Production Data] --> B[Dev transforms it]
-    B --> C[Oops! Broke something]
-    C --> D[Production Data CORRUPTED]
-    D --> E[Lost today's data!]
+```
+Production Data
+      ↓
+(Dev transforms it)
+      ↓
+(Oops! Broke something)
+      ↓
+Production Data is CORRUPTED
+      ↓
+(Back up from last night? Lost today's data!)
 ```
 
 With Nessie:
 
-```mermaid
-flowchart TD
-    A[main - production] --> B[dev branch]
-    B --> C[Test transformations]
-    C --> D[Validate quality]
-    D -->|If good, merge| A
-    D -->|If bad, delete branch| E[main unchanged]
+```
+main (production)
+  │
+  ├─ dev (development)
+  │    └─ (Test transformations)
+  │         └─ (Validate quality)
+  │              └─ (If bad, delete branch, main unchanged)
+  │
+  └─ (If good, merge dev → main atomically)
 ```
 
 ## Core Nessie Concepts
@@ -132,7 +141,7 @@ main:
 dev (branched from Commit B):
 ├── Commit B': Quality fixes (inherited)
 ├── Commit D: New transformations
-└── Commit E: Schema optimizations (HEAD)
+└── Commit E: Schema optimisations (HEAD)
 
 Merge dev → main:
 ├── Commit A: Initial data load
@@ -404,7 +413,7 @@ SELECT
 In dbt, select the target to use the appropriate catalog:
 
 ```yaml
-# workflows/transforms/dbt/profiles.yml
+# workflows/transforms/dbt/profiles/profiles.yml
 
 phlo:
   outputs:
@@ -600,8 +609,8 @@ $ dbt run --select fct_glucose_readings
 $ phlo materialize fct_glucose_readings --partition 2024-01-15
 
 # 4. Validate changes
-$ phlo contract validate glucose_readings
-$ phlo quality run silver.fct_glucose_readings
+$ phlo validate-schema workflows/schemas/glucose.py
+$ phlo catalog describe silver.fct_glucose_readings
 
 # 5. Compare to main
 $ phlo branch diff main feature/add-a1c-calculation
@@ -801,4 +810,4 @@ See also: [Part 3: Apache Iceberg Explained](03-apache-iceberg-explained.md), [P
 - Learn how data is transformed on top of Nessie in [Part 6: dbt Transformations](06-dbt-transformations.md).
 - See governance workflows that build on branches in [Part 10: Metadata & Governance](10-metadata-governance.md).
 
-**Next**: [Part 5: Data Ingestion—Getting Data Into the Lakehouse](05-data-ingestion.md)
+**Next**: [Part 5: Data Ingestion - Getting Data Into the Lakehouse](05-data-ingestion.md)
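Reviewer's note on this file: the reworked diagrams describe the branch-and-merge semantics the post attributes to Nessie, namely that dev work is invisible to main until a merge atomically moves main's pointer. That reference-swap idea can be modelled in a few lines; this is an editor's toy sketch with an invented API, not the Nessie client (real Nessie versions commits of table metadata, not strings):

```python
# Toy model of Nessie-style branching: branches are named pointers to
# immutable commit chains, and a merge atomically moves a pointer.
class Catalog:
    def __init__(self):
        self.branches = {"main": ()}  # branch name -> tuple of commits

    def create_branch(self, name, source="main"):
        # Cheap: the new branch shares the source's history.
        self.branches[name] = self.branches[source]

    def commit(self, branch, change):
        # Commits are append-only; other branches are unaffected.
        self.branches[branch] = self.branches[branch] + (change,)

    def merge(self, source, target="main"):
        # An atomic pointer swap: target now sees all of source's work.
        self.branches[target] = self.branches[source]

cat = Catalog()
cat.commit("main", "load glucose_entries")
cat.create_branch("dev")
cat.commit("dev", "add fct_readings")  # main is untouched here
print(cat.branches["main"])
cat.merge("dev")                       # promote dev to main
print(cat.branches["main"])
```

The "if bad, delete branch, main unchanged" path in the diagram falls out for free: dropping the `dev` pointer discards its commits without ever touching `main`.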
