diff --git a/README.md b/README.md
index 0863e29..d026ec4 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,3 @@
-
-
-
@@ -11,7 +8,7 @@
A type-safe query language for Tree-sitter.
- Query in, typed data out.
+ Powered by the arborium grammar collection.
@@ -26,202 +23,124 @@
- ⚠️ ALPHA STAGE: not for production use ⚠️
+
+ ⚠️ Beta: not for production use ⚠️
+
-
-
-## The problem
-
-Tree-sitter solved parsing. It powers syntax highlighting and code navigation at GitHub, drives the editing experience in Zed, Helix, and Neovim. It gives you a fast, accurate, incremental syntax tree for virtually any language.
-
-The hard problem now is what comes _after_ parsing: extracting structured data from the tree:
-
-```typescript
-function extractFunction(node: SyntaxNode): FunctionInfo | null {
- if (node.type !== "function_declaration") {
- return null;
- }
- const name = node.childForFieldName("name");
- const body = node.childForFieldName("body");
- if (!name || !body) {
- return null;
- }
- return {
- name: name.text,
- body,
- };
-}
-```
-
-Every extraction requires a new function, each one a potential source of bugs that won't surface until production.
-
-## The solution
-
-Plotnik extends Tree-sitter queries with type annotations:
-
-```clojure
-(function_declaration
- name: (identifier) @name :: string
- body: (statement_block) @body
-) @func :: FunctionInfo
-```
-
-The query describes structure, and Plotnik infers the output type:
-
-```typescript
-interface FunctionInfo {
- name: string;
- body: SyntaxNode;
-}
-```
-
-This structure is guaranteed by the query engine. No defensive programming needed.
-
-## But what about Tree-sitter queries?
-
-Tree-sitter already has queries:
-
-```clojure
-(function_declaration
- name: (identifier) @name
- body: (statement_block) @body)
-```
-
-The result is a flat capture list:
-```typescript
-query.matches(tree.rootNode);
-// → [{ captures: [{ name: "name", node }, { name: "body", node }] }, ...]
-```
-
-The assembly layer is up to you:
-
-```typescript
-const name = match.captures.find((c) => c.name === "name")?.node;
-const body = match.captures.find((c) => c.name === "body")?.node;
-if (!name || !body) throw new Error("Missing capture");
-return { name: name.text, body };
-```
-
-This means string-based lookup, null checks, and manual type definitions kept in sync by convention.
-
-Tree-sitter queries are designed for matching. Plotnik adds the typing layer: the query _is_ the type definition.
-
-## Why Plotnik?
+Tree-sitter gives you the syntax tree. Extracting structured data from it still means writing imperative navigation code, adding null checks, and maintaining type definitions by hand. Plotnik makes extraction declarative: write a pattern, get typed data. The query is the type definition.
-| Hand-written extraction | Plotnik |
-| -------------------------- | ---------------------------- |
-| Manual navigation | Declarative pattern matching |
-| Runtime type errors | Compile-time type inference |
-| Repetitive extraction code | Single-query extraction |
-| Ad-hoc data structures | Generated structs/interfaces |
+## Features
-Plotnik extends Tree-sitter's query syntax with:
+- [x] Static type inference from query structure
+- [x] Named expressions for composition and reuse
+- [x] Recursion for nested structures
+- [x] Tagged unions (discriminated unions)
+- [x] TypeScript type generation
+- [x] CLI: `exec` for matches, `infer` for types, `ast`/`trace`/`dump` for debug
+- [ ] Grammar verification (validate queries against Tree-sitter node types)
+- [ ] Compile-time queries via proc-macro
+- [ ] LSP server
+- [ ] Editor extensions
-- **Named expressions** for composition and reuse
-- **Recursion** for arbitrarily nested structures
-- **Type annotations** for precise output shapes
-- **Alternations**: untagged for simplicity, tagged for precision (discriminated unions)
+## Example
-## Use cases
+Extract function signatures from Rust. `Type` references itself to handle nested generics like `Option<Vec<String>>`.
-- **Scripting:** Count patterns, extract metrics, audit dependencies
-- **Custom linters:** Encode your business rules and architecture constraints
-- **LLM Pipelines:** Extract signatures and types as structured data for RAG
-- **Code Intelligence:** Outline views, navigation, symbol extraction across grammars
-
-## Language design
-
-Start simple—extract all function names from a file:
+`query.ptk`:
```clojure
-Functions = (program
- {(function_declaration name: (identifier) @name :: string)}* @functions)
-```
+Type = [
+ Simple: [(type_identifier) (primitive_type)] @name :: string
+ Generic: (generic_type
+ type: (type_identifier) @name :: string
+ type_arguments: (type_arguments (Type)* @args))
+]
-Plotnik infers the output type:
+Func = (function_item
+ name: (identifier) @name :: string
+ parameters: (parameters
+ (parameter
+ pattern: (identifier) @param :: string
+ type: (Type) @type
+ )* @params))
-```typescript
-type Functions = {
- functions: { name: string }[];
-};
+Funcs = (source_file (Func)* @funcs)
```
-Scale up to tagged unions for richer structure:
-
-```clojure
-Statement = [
- Assign: (assignment_expression
- left: (identifier) @target :: string
- right: (Expression) @value)
- Call: (call_expression
- function: (identifier) @func :: string
- arguments: (arguments (Expression)* @args))
-]
+`lib.rs`:
-Expression = [
- Ident: (identifier) @name :: string
- Num: (number) @value :: string
-]
+```rust
+fn get(key: Option<Vec<String>>) {}
-TopDefinitions = (program (Statement)+ @statements)
+fn set(key: String, val: i32) {}
```
-This produces:
+Plotnik infers TypeScript types from the query structure. `Type` is recursive: `args: Type[]`.
-```typescript
-type Statement =
- | { $tag: "Assign"; $data: { target: string; value: Expression } }
- | { $tag: "Call"; $data: { func: string; args: Expression[] } };
+```sh
+❯ plotnik infer query.ptk -l rust
+export type Type =
+ | { $tag: "Simple"; $data: { name: string } }
+ | { $tag: "Generic"; $data: { name: string; args: Type[] } };
-type Expression =
- | { $tag: "Ident"; $data: { name: string } }
- | { $tag: "Num"; $data: { value: string } };
+export interface Func {
+ name: string;
+ params: { param: string; type: Type }[];
+}
-type TopDefinitions = {
- statements: [Statement, ...Statement[]];
-};
+export interface Funcs {
+ funcs: Func[];
+}
```
-Then process the results:
-
-```typescript
-for (const stmt of result.statements) {
- switch (stmt.$tag) {
- case "Assign":
- console.log(`Assignment to ${stmt.$data.target}`);
- break;
- case "Call":
- console.log(
- `Call to ${stmt.$data.func} with ${stmt.$data.args.length} args`,
- );
- break;
- }
+Run the query against `lib.rs` to extract structured JSON:
+
+```sh
+❯ plotnik exec query.ptk lib.rs
+{
+ "funcs": [
+ {
+ "name": "get",
+ "params": [{
+ "param": "key",
+ "type": {
+ "$tag": "Generic",
+ "$data": {
+ "name": "Option",
+ "args": [{
+ "$tag": "Generic",
+ "$data": {
+ "name": "Vec",
+ "args": [{ "$tag": "Simple", "$data": { "name": "String" } }]
+ }
+ }]
+ }
+ }
+ }]
+ },
+ {
+ "name": "set",
+ "params": [
+ { "param": "key", "type": { "$tag": "Simple", "$data": { "name": "String" } } },
+ { "param": "val", "type": { "$tag": "Simple", "$data": { "name": "i32" } } }
+ ]
+ }
+ ]
}
```
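
The tagged-union output can be consumed with an ordinary `switch` on `$tag`. The following is a minimal sketch, assuming the JSON printed by `plotnik exec` has been parsed into the inferred `Funcs` type. The helpers `renderType` and `renderSignature` are hypothetical and not part of Plotnik. The sample data is inlined from the output above.

```typescript
// Types copied from the `plotnik infer` output shown earlier.
type Type =
  | { $tag: "Simple"; $data: { name: string } }
  | { $tag: "Generic"; $data: { name: string; args: Type[] } };

interface Func {
  name: string;
  params: { param: string; type: Type }[];
}

interface Funcs {
  funcs: Func[];
}

// Hypothetical helper: render a Type back into Rust-like syntax.
// The recursion mirrors the recursive `args: Type[]` field.
function renderType(t: Type): string {
  switch (t.$tag) {
    case "Simple":
      return t.$data.name;
    case "Generic":
      return `${t.$data.name}<${t.$data.args.map((a) => renderType(a)).join(", ")}>`;
  }
}

// Hypothetical helper: build a one-line signature for a function.
function renderSignature(f: Func): string {
  const params = f.params.map((p) => `${p.param}: ${renderType(p.type)}`);
  return `fn ${f.name}(${params.join(", ")})`;
}

// In practice this would come from JSON.parse on the `plotnik exec` output;
// it is inlined here so the sketch is self-contained.
const result: Funcs = {
  funcs: [
    {
      name: "get",
      params: [
        {
          param: "key",
          type: {
            $tag: "Generic",
            $data: {
              name: "Option",
              args: [
                {
                  $tag: "Generic",
                  $data: {
                    name: "Vec",
                    args: [{ $tag: "Simple", $data: { name: "String" } }],
                  },
                },
              ],
            },
          },
        },
      ],
    },
    {
      name: "set",
      params: [
        { param: "key", type: { $tag: "Simple", $data: { name: "String" } } },
        { param: "val", type: { $tag: "Simple", $data: { name: "i32" } } },
      ],
    },
  ],
};

const signatures: string[] = result.funcs.map(renderSignature);
console.log(signatures.join("\n"));
```

Because `$tag` is a discriminant, the compiler narrows `$data` in each `case`, so no null checks or string-based capture lookups are needed.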
-For the detailed specification, see the [Language Reference](docs/lang-reference.md).
+## Why
-## Documentation
+Pattern matching over syntax trees is powerful, but Tree-sitter queries produce flat capture lists. You still need to assemble the results, handle missing captures, and define types by hand. Plotnik closes this gap: the query describes the structure, and the engine guarantees that the output matches it.
-- [CLI Guide](docs/cli.md) — Command-line tool usage
-- [Language Reference](docs/lang-reference.md) — Complete syntax and semantics
-- [Type System](docs/type-system.md) — How output types are inferred from queries
-- [Runtime Engine](docs/runtime-engine.md) — VM execution model (for contributors)
-
-## Supported Languages
-
-Plotnik bundles 15 languages out of the box: Bash, C, C++, CSS, Go, HTML, Java, JavaScript, JSON, Python, Rust, TOML, TSX, TypeScript, and YAML. The underlying [arborium](https://github.com/bearcove/arborium) collection includes 60+ permissively-licensed grammars—additional languages can be enabled as needed.
-
-## Status
-
-**Working now:** Parser with error recovery, type inference, query execution, CLI tools (`check`, `dump`, `infer`, `exec`, `trace`, `tree`, `langs`).
-
-**Next up:** CLI distribution (Homebrew, npm), language bindings (TypeScript/WASM, Python), LSP server, editor extensions.
+## Documentation
-⚠️ Alpha stage—API may change. Not for production use.
+- [CLI Guide](docs/cli.md)
+- [Language Reference](docs/lang-reference.md)
+- [Type System](docs/type-system.md)
## Acknowledgments