Semantic Layer for FAIR Data on Lakehouses
A semantic control plane that implements FAIR Data principles on data lakehouses. The system uses LinkML as its core modeling language and connects business concepts to physical data through a chain of mappings:
concept → ontology → logical → physical
We don't access data. We help users find, understand, and govern it.
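
The mapping chain can be pictured as a series of lookups that stops at the first missing link. The following is a minimal sketch, not the project's implementation: the dictionaries, the `Binding` class, and the logical name `hr.employees` are hypothetical stand-ins for mappings that would be derived from LinkML schemas and SSSOM or LinkML-Map files. The `Worker` class and the `unity://hr/employees` URI are borrowed from the walkthrough in this README.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping tables; a real deployment would derive these from
# LinkML schemas and SSSOM / LinkML-Map mappings rather than hard-code them.
CONCEPT_TO_CLASS = {"employee": "Worker", "manager": "Manager"}
CLASS_TO_LOGICAL = {"Worker": "hr.employees"}
LOGICAL_TO_PHYSICAL = {"hr.employees": "unity://hr/employees"}


@dataclass
class Binding:
    """Records how far along the chain a concept could be resolved."""
    concept: str
    ontology_class: Optional[str] = None
    logical: Optional[str] = None
    physical: Optional[str] = None

    @property
    def bound(self) -> bool:
        # A concept counts as bound only when it reaches a physical asset.
        return self.physical is not None


def resolve(concept: str) -> Binding:
    """Walk concept -> ontology -> logical -> physical, stopping at the first gap."""
    binding = Binding(concept=concept)
    binding.ontology_class = CONCEPT_TO_CLASS.get(concept)
    if binding.ontology_class is None:
        return binding
    binding.logical = CLASS_TO_LOGICAL.get(binding.ontology_class)
    if binding.logical is None:
        return binding
    binding.physical = LOGICAL_TO_PHYSICAL.get(binding.logical)
    return binding
```

Returning a partially filled `Binding` instead of raising keeps the gap visible: a caller can tell whether a term failed at the concept level or only lacks a physical binding.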
User: "Show org chart for engineering"
↓
EXTRACT: [org chart, engineering]
↓
RESOLVE: org chart → ✗ unbound
↓
EXPAND: LLM: "org chart needs employee, manager, reports_to"
↓
RESOLVE: employee → Worker ✓, manager → Manager ✓
↓
BIND: Worker → unity://hr/employees
↓
COVER: self-join via manager_id
↓
GENERATE: WITH RECURSIVE hierarchy AS (...)
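
To make the last two steps concrete, here is a hedged sketch of what a generated recursive query over a `manager_id` self-join might look like. It runs against an in-memory SQLite database purely so the query shape can be checked; SQLite stands in for the lakehouse engine, and the `employees` table layout, the `department` column, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical shape of the generated query: start at the department's root
# (no manager), then repeatedly self-join on manager_id to walk downwards.
ORG_CHART_SQL = """
WITH RECURSIVE hierarchy AS (
    SELECT id, name, manager_id, 0 AS depth
    FROM employees
    WHERE manager_id IS NULL AND department = ?
    UNION ALL
    SELECT e.id, e.name, e.manager_id, h.depth + 1
    FROM employees AS e
    JOIN hierarchy AS h ON e.manager_id = h.id
    WHERE e.department = ?
)
SELECT name, depth FROM hierarchy ORDER BY depth, name
"""


def demo_connection() -> sqlite3.Connection:
    """Build a toy in-memory table; names and rows are made up for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees ("
        "id INTEGER PRIMARY KEY, name TEXT, department TEXT, "
        "manager_id INTEGER REFERENCES employees(id))"
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?, ?)",
        [
            (1, "Ada", "engineering", None),
            (2, "Grace", "engineering", 1),
            (3, "Linus", "engineering", 2),
            (4, "Joan", "sales", None),
        ],
    )
    return conn


def org_chart(conn: sqlite3.Connection, department: str) -> list:
    """Return (name, depth) rows for one department's reporting tree."""
    return conn.execute(ORG_CHART_SQL, (department, department)).fetchall()
```

Calling `org_chart(demo_connection(), "engineering")` returns the three engineering rows ordered by depth, with the sales row excluded.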
| Layer | Role | Technology |
|---|---|---|
| Skeleton | What CAN exist (authoritative) | LinkML, SKOS |
| Flesh | What DOES activate (fuzzy) | HNSW vectors (usearch) |
Neither layer is subordinate to the other; they are complementary:
- Flesh without skeleton: no auditability, drifts
- Skeleton without flesh: brittle, no fuzzy matching
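
One way to read the two bullets above is that a fuzzy match proposes a term and the skeleton decides whether that term is allowed. The sketch below illustrates the idea under loud assumptions: a brute-force cosine scan replaces the HNSW index (usearch), the three-dimensional vectors are invented rather than produced by sentence-transformers, and the `SKELETON` set, the `Contractor` entry, and the `activate` function are hypothetical.

```python
import math
from typing import Optional, Sequence

# Skeleton: the authoritative set of terms (would come from LinkML / SKOS).
SKELETON = {"Worker", "Manager"}

# Flesh: toy embeddings, invented for this example. "Contractor" is a stale
# vector with no skeleton entry, to show what the gate protects against.
FLESH = {
    "Worker": (0.9, 0.1, 0.0),
    "Manager": (0.1, 0.9, 0.0),
    "Contractor": (0.7, 0.0, 0.7),
}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def activate(query_vec: Sequence[float], threshold: float = 0.8) -> Optional[str]:
    """Return the best fuzzy match that the skeleton also recognises, else None."""
    ranked = sorted(FLESH, key=lambda t: cosine(query_vec, FLESH[t]), reverse=True)
    for term in ranked:
        if cosine(query_vec, FLESH[term]) < threshold:
            break  # everything after this point is too dissimilar
        if term in SKELETON:
            return term  # fuzzy hit confirmed by the authoritative layer
    return None
```

A query that lands exactly on the stale `Contractor` vector returns `None` rather than an unaudited term, which is the drift the skeleton is meant to catch.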
| Plane | Question | Standards |
|---|---|---|
| Inventory | What exists? | DCAT, dprod |
| Semantics | What does it mean? | SKOS, LinkML |
| Evidence | Is it true? | PROV, DQV |
| Governance | Can I use it? | ODRL, DCON |
            Data Products
           (dprod / DCAT)
              /      \
             /        \
    Data Contracts   Usage Policies
        (DCON)           (ODRL)
# Run tests
uv run pytest tests/ -v
# Demo (needs OPENAI_API_KEY)
export OPENAI_API_KEY=sk-...
uv run python scripts/orchestrator_demo.py "show org chart for engineering"

| Document | Purpose |
|---|---|
| LLM.md | Shared technical briefing (Claude/Gemini/Codex) |
| AGENTS.md | Governance, personas, decision authority |
| CLAUDE.md | Claude wrapper (loads LLM.md) |
| GEMINI.md | Gemini wrapper (loads LLM.md) |
| CODEX.md | Codex wrapper (loads LLM.md) |
| docs/vision.md | Vision, direction, justification, roadmap |
| docs/architecture.md | Components, mapping chain, service architecture |
| docs/dcon-evolution.md | History of experiments and key learnings |
| Component | Technology |
|---|---|
| Schemas | LinkML (YAML) |
| Vocabulary | SKOS concepts |
| Vector Index | usearch (HNSW) |
| Embeddings | sentence-transformers |
| Graph | NetworkX |
| Mappings | LinkML-Map, SSSOM |
| LLM | OpenAI / Anthropic |
- Ask first, don't guess: disambiguate instead of assuming
- LinkML is canonical: other formats are derived or imported
- Standards over invention: DCAT, DCON, ODRL, SKOS, PROV
- Components over monoliths: interface contracts, not prescribed implementations
- Skeleton is authoritative: when it contains an error, edit it immediately
- HNSW is cheap, LLM is expensive: use vectors first
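
The "vectors first" and "ask first" principles can be combined in a tiered resolver: try the cheap index, call the LLM only on a miss, and hand anything still unresolved back to the user. This is a sketch under stated assumptions, not the project's orchestrator: a plain dictionary stands in for the vector index, `fake_llm_expand` returns a canned answer instead of calling OpenAI or Anthropic, and every name here is hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical stand-ins: a dict plays the vector index, a canned function
# plays the LLM. llm_calls records each (expensive) expansion request.
INDEX: Dict[str, str] = {"employee": "Worker", "manager": "Manager"}
llm_calls: List[str] = []


def vector_lookup(term: str) -> Optional[str]:
    """Cheap tier: return the matched class name, or None on a miss."""
    return INDEX.get(term)


def fake_llm_expand(term: str) -> List[str]:
    """Expensive tier (stubbed): break an unbound term into sub-concepts."""
    llm_calls.append(term)
    canned = {"org chart": ["employee", "manager", "reports_to"]}
    return canned.get(term, [])


def resolve_terms(
    terms: Iterable[str],
    lookup: Callable[[str], Optional[str]] = vector_lookup,
    expand: Callable[[str], List[str]] = fake_llm_expand,
) -> Tuple[Dict[str, str], List[str]]:
    """Resolve terms with the index first; expand via the LLM only on a miss."""
    resolved: Dict[str, str] = {}
    unresolved: List[str] = []
    for term in terms:
        hit = lookup(term)
        if hit is not None:
            resolved[term] = hit
            continue
        subterms = expand(term)  # one LLM call per missed term
        if not subterms:
            unresolved.append(term)
            continue
        for sub in subterms:
            sub_hit = lookup(sub)
            if sub_hit is not None:
                resolved[sub] = sub_hit
            else:
                unresolved.append(sub)
    return resolved, unresolved
```

The `unresolved` list is what would be shown to the user as a disambiguation question instead of being guessed at.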
Phase: Resetting direction. The prior NL→SQL pipeline (88% exit accuracy, 56 tests) is archived as a proof of concept.
New focus: a semantic layer for FAIR data on lakehouses, built on the LinkML mapping chain.
See docs/vision.md for the roadmap.