A system that transforms unstructured information into structured, evolving knowledge assets using LLMs.
Knowledge Compiler is a deterministic pipeline that converts raw content (articles, notes, PDFs) into structured Markdown-based knowledge units.
Instead of ephemeral AI outputs, this system produces persistent, versioned, and continuously improving knowledge.
The system is designed as a headless knowledge engine, with tools like Obsidian acting as the visualization layer.
- Structured over freeform — every output follows a strict schema
- Deterministic pipelines — no ad-hoc prompting
- Markdown as source of truth — portable, versionable, and human-readable
- Composable system — simple primitives over complex infrastructure
- Incremental refinement — knowledge improves over time
Input Sources → LLM Processing → Structured Markdown → Retrieval & Refinement
- **Input Layer**: raw text, PDFs, notes, articles
- **Processing Layer**: LLM transforms input into structured knowledge units
- **Storage Layer**: Markdown files (`/knowledge`) as the source of truth
- **Consumption Layer**: Obsidian or any Markdown-compatible viewer
- **Refinement Layer** (future): improves, links, and updates existing knowledge
```
knowledge-compiler/
│
├── knowledge/    # Compiled knowledge (Markdown files / Obsidian vault)
│   ├── backend/
│
├── pipelines/    # LLM processing logic
├── schemas/      # Knowledge contracts (zod / types)
├── scripts/      # CLI / execution scripts
│
├── README.md
```
Each knowledge unit follows a strict structure:
```markdown
---
id: rate-limiting
title: Rate Limiting
tags: [backend, distributed-systems]
created_at: YYYY-MM-DD
updated_at: YYYY-MM-DD
source: article | pdf | manual
---

## Summary
...

## Key Concepts
...

## Deep Dive
...

## Related
- [[Token Bucket]]
- [[Leaky Bucket]]

## Open Questions
...
```

The system treats each Markdown file as a node in a knowledge graph:
- Files → nodes
- `[[links]]` → edges
- Tags → semantic grouping
This enables:
- Graph-based navigation (via Obsidian)
- Context-aware refinement
- Future semantic retrieval
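For illustration, extracting graph edges from a unit's `[[links]]` can be a simple regex scan (a sketch; `extractEdges` is a hypothetical helper, not part of the repo's current code):

```typescript
// Extract outgoing [[wiki links]] from a knowledge unit's Markdown body.
// Each captured target becomes an edge in the knowledge graph.
function extractEdges(markdown: string): string[] {
  const edges: string[] = [];
  const linkPattern = /\[\[([^\]]+)\]\]/g; // matches [[Target Note]]
  for (const match of markdown.matchAll(linkPattern)) {
    edges.push(match[1]);
  }
  return edges;
}

// Example: the "Related" section of rate-limiting.md
console.log(extractEdges("- [[Token Bucket]]\n- [[Leaky Bucket]]"));
// → [ 'Token Bucket', 'Leaky Bucket' ]
```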
- Read raw input
- Send to LLM with structured prompt
- Validate output format
- Save as Markdown file
- (Future) Refine and link with existing knowledge
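The steps above can be sketched in TypeScript, with the LLM call stubbed out (function names and the section check here are illustrative, not the repo's actual API):

```typescript
import * as fs from "node:fs";

// Stand-in type for the real OpenAI call.
type LLM = (prompt: string) => string;

// Minimal structural check; the real contract lives in schemas/.
const REQUIRED_SECTIONS = ["## Summary", "## Key Concepts", "## Deep Dive"];

function compile(inputPath: string, outputPath: string, callLLM: LLM): void {
  const raw = fs.readFileSync(inputPath, "utf8");                 // 1. read raw input
  const unit = callLLM(`Compile into a knowledge unit:\n${raw}`); // 2. structured prompt
  for (const section of REQUIRED_SECTIONS) {                      // 3. validate output format
    if (!unit.includes(section)) {
      throw new Error(`LLM output missing ${section}`);
    }
  }
  fs.writeFileSync(outputPath, unit);                             // 4. save as Markdown
}
```

Keeping the LLM behind a function type makes the pipeline deterministic to test: swap in a stub and the rest of the flow is plain file I/O plus validation.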
Install dependencies:

```shell
npm install
```

Set your API key:

```shell
OPENAI_API_KEY=your_api_key
```

Run the pipeline:

```shell
npm run generate
```

Example:

```shell
echo "Rate limiting prevents abuse in distributed systems..." > input.txt
npm run generate
```

Output:

```
/knowledge/backend/rate-limiting.md
```
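The output path appears to be derived from the unit's title and tag; a hypothetical sketch of that mapping (the repo's actual naming logic may differ):

```typescript
// Derive a knowledge-unit path from a title and its primary tag.
// Hypothetical helper mirroring the rate-limiting example above.
function unitPath(title: string, tag: string): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // runs of non-alphanumerics → single hyphen
    .replace(/^-|-$/g, "");      // trim stray leading/trailing hyphens
  return `/knowledge/${tag}/${slug}.md`;
}

console.log(unitPath("Rate Limiting", "backend")); // → /knowledge/backend/rate-limiting.md
```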
- Schema validation with zod
- Auto-linking between knowledge units
- Incremental refinement pipeline
- Full-text search
- Semantic search (optional)
- Version diffing and rollback
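The planned zod validation would enforce the frontmatter contract shown earlier. A dependency-free sketch of the same check (field names follow the example unit; date fields omitted for brevity):

```typescript
// Validate a knowledge unit's frontmatter against the contract.
// Hand-rolled for illustration; the repo plans zod for this.
interface Frontmatter {
  id: string;
  title: string;
  tags: string[];
  source: "article" | "pdf" | "manual";
}

function validateFrontmatter(data: Record<string, unknown>): Frontmatter {
  const sources = ["article", "pdf", "manual"];
  if (typeof data.id !== "string") throw new Error("id must be a string");
  if (typeof data.title !== "string") throw new Error("title must be a string");
  if (!Array.isArray(data.tags) || !data.tags.every((t) => typeof t === "string")) {
    throw new Error("tags must be a string array");
  }
  if (typeof data.source !== "string" || !sources.includes(data.source)) {
    throw new Error("source must be article | pdf | manual");
  }
  return data as unknown as Frontmatter;
}
```

Failing fast here is what keeps the Markdown store trustworthy: anything the LLM emits that breaks the contract never reaches `/knowledge`.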
This project intentionally avoids:
- Heavy RAG pipelines
- Vector databases (early stage)
- Over-engineered abstractions
The focus is on clarity, determinism, and long-term knowledge quality.