Get Phlo running and see your first data pipeline in action in under 10 minutes.
You'll build a simple glucose data ingestion pipeline that:
- Fetches data from the Nightscout API
- Validates it with Pandera schemas
- Stores it in an Apache Iceberg table
- Appears in the Dagster UI
You'll need:
- Docker and Docker Compose
- About 10 minutes
- A text editor
```bash
git clone https://github.com/iamgp/phlo.git
cd phlo

# Initialize infrastructure (generates .phlo/.env and .phlo/.env.local)
phlo services init

# Edit .phlo/.env.local if needed (defaults work for local development)

# Start core services
phlo services start --profile core --profile query
```

Wait for services to start (~60 seconds).
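Rather than guessing at the timing, you can poll the Dagster UI until it answers. This is a stdlib-only sketch, not part of Phlo; the port `10006` is the default used in this guide:

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll `url` until it returns any HTTP response or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True  # server answered
        except urllib.error.HTTPError:
            return True  # server is up, even if it returned an error status
        except (urllib.error.URLError, OSError):
            time.sleep(interval)  # not listening yet; retry
    return False


# Example: wait_for_http("http://localhost:10006")
```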
```bash
# Open the Dagster UI
open http://localhost:10006
```

You'll see the existing glucose ingestion asset already defined!
Open `workflows/ingestion/nightscout/glucose.py`:

```python
from dlt.sources.rest_api import rest_api

from phlo_dlt.decorator import phlo_ingestion
from workflows.schemas.glucose import RawGlucoseEntries


@phlo_ingestion(
    table_name="glucose_entries",
    unique_key="_id",
    validation_schema=RawGlucoseEntries,
    group="nightscout",
    cron="0 */1 * * *",
    freshness_hours=(1, 24),
)
def glucose_entries(partition_date: str):
    """Ingest Nightscout glucose entries."""
    start_time_iso = f"{partition_date}T00:00:00.000Z"
    end_time_iso = f"{partition_date}T23:59:59.999Z"

    source = rest_api(
        client={
            "base_url": "https://gwp-diabetes.fly.dev/api/v1",
        },
        resources=[
            {
                "name": "entries",
                "endpoint": {
                    "path": "entries.json",
                    "params": {
                        "count": 10000,
                        "find[dateString][$gte]": start_time_iso,
                        "find[dateString][$lt]": end_time_iso,
                    },
                },
            }
        ],
    )
    return source
```

Notice: only ~60 lines! The `@phlo_ingestion` decorator handles:
- DLT pipeline setup
- Pandera validation
- Iceberg table creation
- Merge with deduplication
- Timing instrumentation
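The merge-with-deduplication step can be pictured with a plain-Python sketch. This is an illustration only: `merge_dedupe` is a hypothetical helper, and the real decorator merges into an Iceberg table rather than a Python list, but the keying behavior is the same idea:

```python
def merge_dedupe(existing: list[dict], incoming: list[dict], unique_key: str) -> list[dict]:
    """Upsert `incoming` rows into `existing`, keeping one row per unique key.

    Later rows win on key collision, which makes re-running the same
    partition idempotent: no duplicates accumulate.
    """
    by_key = {row[unique_key]: row for row in existing}
    for row in incoming:
        by_key[row[unique_key]] = row  # new data overwrites old on the same key
    return list(by_key.values())


# Re-ingesting overlapping records does not create duplicates:
table = merge_dedupe([], [{"_id": "a", "sgv": 110}], unique_key="_id")
table = merge_dedupe(table, [{"_id": "a", "sgv": 112}, {"_id": "b", "sgv": 99}], unique_key="_id")
# table now has two rows; the "_id" == "a" row holds the latest value
```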
```bash
# Materialize for today's date
phlo materialize dlt_glucose_entries

# Or in Dagster UI:
# Navigate to Assets → dlt_glucose_entries → Materialize
```

Watch the execution in the Dagster UI. You'll see:
- DLT fetching data from API
- Pandera validation
- Staging to parquet
- Merge to Iceberg
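Under the hood, the asset receives the partition key as a plain `YYYY-MM-DD` string and derives the day window it queries. A sketch (the helper name `day_window` is hypothetical; the format strings match the glucose asset above):

```python
from datetime import date


def day_window(partition_date: str) -> tuple[str, str]:
    """Start and end ISO timestamps for one day, matching the asset's query params."""
    date.fromisoformat(partition_date)  # validate the partition key format
    return (f"{partition_date}T00:00:00.000Z", f"{partition_date}T23:59:59.999Z")


start, end = day_window("2024-06-01")
# → ("2024-06-01T00:00:00.000Z", "2024-06-01T23:59:59.999Z")
```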
```bash
# Connect to Trino
phlo trino --catalog iceberg_dev --schema raw
```

```sql
-- Query your data
SELECT _id, sgv, dateString
FROM glucose_entries
ORDER BY dateString DESC
LIMIT 10;
```

Or query the same table from DuckDB:

```python
import duckdb

conn = duckdb.connect()

# Install Iceberg extension
conn.execute("INSTALL iceberg;")
conn.execute("LOAD iceberg;")

# Query Iceberg table (adapt path to your MinIO endpoint)
result = conn.execute("""
    SELECT _id, sgv, dateString
    FROM iceberg_scan('s3://warehouse/raw/glucose_entries', ...)
    LIMIT 10
""").df()
print(result)
```

In 10 minutes, you:
- Started Phlo's lakehouse platform
- Explored an ingestion asset
- Materialized data to Iceberg
- Queried with SQL engines
Key Concepts:
- Decorator-driven: Minimal boilerplate with `@phlo_ingestion`
- Schema-first: Pandera validates data quality
- Iceberg tables: ACID transactions, time travel, schema evolution
- Multi-engine: Query with Trino, DuckDB, Spark
- Define a schema in `workflows/schemas/mydata.py`:

```python
import pandera as pa
from pandera.typing import Series


class RawWeatherData(pa.DataFrameModel):
    city_name: Series[str] = pa.Field(description="City name")
    temperature: Series[float] = pa.Field(ge=-50, le=50)
    timestamp: Series[str] = pa.Field(description="ISO 8601 timestamp")

    class Config:
        strict = True
        coerce = True
```

- Create the asset in `workflows/ingestion/weather/observations.py`:
```python
from dlt.sources.rest_api import rest_api

from phlo_dlt.decorator import phlo_ingestion
from workflows.schemas.mydata import RawWeatherData


@phlo_ingestion(
    table_name="weather_observations",
    unique_key="timestamp",
    validation_schema=RawWeatherData,
    group="weather",
    cron="0 */1 * * *",
    freshness_hours=(1, 24),
)
def weather_observations(partition_date: str):
    """Ingest weather observations."""
    source = rest_api(
        client={
            "base_url": "https://api.openweathermap.org/data/3.0",
            "auth": {"token": "YOUR_API_KEY"},
        },
        resources=[
            {
                "name": "observations",
                "endpoint": {
                    "path": "onecall/timemachine",
                    "params": {"dt": partition_date},
                },
            }
        ],
    )
    return source
```

- No manual registration is needed. Phlo discovers assets under `workflows/`.
- Restart Dagster:

```bash
phlo services restart --service dagster
```

- Materialize in the UI!
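Before wiring the schema into a pipeline, it can help to see what `strict` and `coerce` mean in plain terms. This stdlib-only sketch mimics (it does not replace) what Pandera does for `RawWeatherData`: unknown columns are rejected, values are coerced to the declared type, and the range check (`ge=-50, le=50`) is enforced; `validate_row` is a hypothetical helper:

```python
def validate_row(row: dict) -> dict:
    """Mimic strict+coerce validation for the RawWeatherData schema (illustration only)."""
    expected = {"city_name": str, "temperature": float, "timestamp": str}

    # strict=True: no columns beyond the declared ones
    extra = set(row) - set(expected)
    if extra:
        raise ValueError(f"unexpected columns: {extra}")

    # coerce=True: cast each value to its declared type
    coerced = {name: typ(row[name]) for name, typ in expected.items()}

    # Field check: temperature must lie in [-50, 50]
    if not -50 <= coerced["temperature"] <= 50:
        raise ValueError(f"temperature out of range: {coerced['temperature']}")
    return coerced


row = validate_row({"city_name": "Oslo", "temperature": "12.5", "timestamp": "2024-01-01T00:00:00Z"})
# temperature was coerced from the string "12.5" to the float 12.5
```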
Follow the comprehensive tutorial:
- Workflow Development Guide (42KB, 10-step tutorial)
This covers:
- Bronze/Silver/Gold layers with dbt
- Data quality checks
- Publishing to Postgres
- Scheduling and automation
- Time Travel: Query historical snapshots
- Git-like Branching: Nessie for data versioning
- Data Catalog: OpenMetadata integration
- Observability: Grafana dashboards
- Concepts: Core Concepts - Understand lakehouse fundamentals
- Complete Tutorial: Workflow Development Guide - Build full pipeline
- Best Practices: Best Practices Guide - Production patterns
- Architecture: Architecture - System design
- Troubleshooting: Troubleshooting Guide - Common issues
**Services won't start**

```bash
# Check Docker is running
docker ps

# Check logs
phlo services logs -f dagster

# Restart services
phlo services stop
phlo services start --profile core --profile query
```

**Asset not showing in UI**

```bash
# Restart Dagster
phlo services restart --service dagster
```

- Ensure the asset file lives under `workflows/` and imports cleanly.

**Validation failed**

- Check that the schema matches your data types.
- Common issue: a timestamp column typed as datetime instead of string.
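One way to fix the datetime-vs-string mismatch is to serialize timestamps to ISO 8601 strings before validation. A stdlib sketch (the helper `to_iso8601` and the column name are illustrative, not part of Phlo):

```python
from datetime import datetime, timezone


def to_iso8601(value) -> str:
    """Render datetimes (or pass through other values) as ISO 8601 text, as a str-typed schema expects."""
    if isinstance(value, datetime):
        return value.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
    return str(value)


rows = [
    {"timestamp": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)},  # datetime object
    {"timestamp": "2024-01-02T00:00:00Z"},                            # already a string
]
rows = [{**r, "timestamp": to_iso8601(r["timestamp"])} for r in rows]
# → every timestamp value is now a plain ISO 8601 string
```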
- Review the Pandera schema in `workflows/schemas/`.

**Permission denied in MinIO**

- Check that `.phlo/.env.local` has the correct MinIO credentials.
- Default: `MINIO_ROOT_USER=minioadmin`, `MINIO_ROOT_PASSWORD=minioadmin`.

74% less boilerplate vs manual Dagster/Iceberg/DLT integration:
| Operation | Manual Code | With Phlo | Reduction |
|---|---|---|---|
| DLT setup | ~50 lines | 0 lines | 100% |
| Iceberg schema | ~40 lines | 0 lines (auto-generated) | 100% |
| Merge logic | ~60 lines | 0 lines | 100% |
| Error handling | ~40 lines | 0 lines | 100% |
| Timing/logging | ~30 lines | 0 lines | 100% |
| Total | ~270 lines | ~60 lines | 74% |
Unique Features:
- Git-like branching for data (Nessie)
- Time travel queries (Iceberg)
- Schema auto-generation (Pandera → PyIceberg)
- Idempotent ingestion (deduplication built-in)
- Multi-engine analytics (Trino, DuckDB, Spark)
- Documentation: docs/index.md
- GitHub Issues: Report bugs and request features
- GitHub Discussions: Ask questions and share ideas
Next: Complete Tutorial | Best Practices