From e0959ab2ffd88e5066521dc4e5614bbc37aba021 Mon Sep 17 00:00:00 2001
From: Sergei Zharinov
Date: Wed, 7 Jan 2026 16:30:04 -0300
Subject: [PATCH] Revise README for clarity and project updates

Updated README to reflect project status and features.
---
 README.md | 259 +++++++++++++++++++-----------------------------------
 1 file changed, 89 insertions(+), 170 deletions(-)

diff --git a/README.md b/README.md
index 0863e291..d026ec4b 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,3 @@
-
-
-

The logo: a curled wood shaving on a workbench

@@ -11,7 +8,7 @@

A type-safe query language for Tree-sitter.
- Query in, typed data out.
+ Powered by the arborium grammar collection.


@@ -26,202 +23,124 @@

- ⚠️ ALPHA STAGE: not for production use ⚠️
+
+ ⚠️ Beta: not for production use ⚠️
+


-
-
-## The problem
-
-Tree-sitter solved parsing. It powers syntax highlighting and code navigation at GitHub, drives the editing experience in Zed, Helix, and Neovim. It gives you a fast, accurate, incremental syntax tree for virtually any language.
-
-The hard problem now is what comes _after_ parsing: extracting structured data from the tree:
-
-```typescript
-function extractFunction(node: SyntaxNode): FunctionInfo | null {
-  if (node.type !== "function_declaration") {
-    return null;
-  }
-  const name = node.childForFieldName("name");
-  const body = node.childForFieldName("body");
-  if (!name || !body) {
-    return null;
-  }
-  return {
-    name: name.text,
-    body,
-  };
-}
-```
-
-Every extraction requires a new function, each one a potential source of bugs that won't surface until production.
-
-## The solution
-
-Plotnik extends Tree-sitter queries with type annotations:
-
-```clojure
-(function_declaration
-  name: (identifier) @name :: string
-  body: (statement_block) @body
-) @func :: FunctionInfo
-```
-
-The query describes structure, and Plotnik infers the output type:
-
-```typescript
-interface FunctionInfo {
-  name: string;
-  body: SyntaxNode;
-}
-```
-
-This structure is guaranteed by the query engine. No defensive programming needed.
-
-## But what about Tree-sitter queries?
-
-Tree-sitter already has queries:
-
-```clojure
-(function_declaration
-  name: (identifier) @name
-  body: (statement_block) @body)
-```
-
-The result is a flat capture list:
-```typescript
-query.matches(tree.rootNode);
-// → [{ captures: [{ name: "name", node }, { name: "body", node }] }, ...]
-```
-
-The assembly layer is up to you:
-
-```typescript
-const name = match.captures.find((c) => c.name === "name")?.node;
-const body = match.captures.find((c) => c.name === "body")?.node;
-if (!name || !body) throw new Error("Missing capture");
-return { name: name.text, body };
-```
-
-This means string-based lookup, null checks, and manual type definitions kept in sync by convention.
-
-Tree-sitter queries are designed for matching. Plotnik adds the typing layer: the query _is_ the type definition.
-
-## Why Plotnik?
+Tree-sitter gives you the syntax tree. Extracting structured data from it still means writing imperative navigation code, null checks, and maintaining type definitions by hand. Plotnik makes extraction declarative: write a pattern, get typed data. The query is the type definition.

-| Hand-written extraction    | Plotnik                      |
-| -------------------------- | ---------------------------- |
-| Manual navigation          | Declarative pattern matching |
-| Runtime type errors        | Compile-time type inference  |
-| Repetitive extraction code | Single-query extraction      |
-| Ad-hoc data structures     | Generated structs/interfaces |
+## Features

-Plotnik extends Tree-sitter's query syntax with:
+- [x] Static type inference from query structure
+- [x] Named expressions for composition and reuse
+- [x] Recursion for nested structures
+- [x] Tagged unions (discriminated unions)
+- [x] TypeScript type generation
+- [x] CLI: `exec` for matches, `infer` for types, `ast`/`trace`/`dump` for debug
+- [ ] Grammar verification (validate queries against tree-sitter node types)
+- [ ] Compile-time queries via proc-macro
+- [ ] LSP server
+- [ ] Editor extensions

-- **Named expressions** for composition and reuse
-- **Recursion** for arbitrarily nested structures
-- **Type annotations** for precise output shapes
-- **Alternations**: untagged for simplicity, tagged for precision (discriminated unions)
+## Example

-## Use cases
+Extract function signatures from Rust. `Type` references itself to handle nested generics like `Option<Vec<String>>`.
-- **Scripting:** Count patterns, extract metrics, audit dependencies
-- **Custom linters:** Encode your business rules and architecture constraints
-- **LLM Pipelines:** Extract signatures and types as structured data for RAG
-- **Code Intelligence:** Outline views, navigation, symbol extraction across grammars
-
-## Language design
-
-Start simple—extract all function names from a file:
+`query.ptk`:

 ```clojure
-Functions = (program
-  {(function_declaration name: (identifier) @name :: string)}* @functions)
-```
+Type = [
+  Simple: [(type_identifier) (primitive_type)] @name :: string
+  Generic: (generic_type
+    type: (type_identifier) @name :: string
+    type_arguments: (type_arguments (Type)* @args))
+]

-Plotnik infers the output type:
+Func = (function_item
+  name: (identifier) @name :: string
+  parameters: (parameters
+    (parameter
+      pattern: (identifier) @param :: string
+      type: (Type) @type
+    )* @params))

-```typescript
-type Functions = {
-  functions: { name: string }[];
-};
+Funcs = (source_file (Func)* @funcs)
 ```

-Scale up to tagged unions for richer structure:
-
-```clojure
-Statement = [
-  Assign: (assignment_expression
-    left: (identifier) @target :: string
-    right: (Expression) @value)
-  Call: (call_expression
-    function: (identifier) @func :: string
-    arguments: (arguments (Expression)* @args))
-]
+`lib.rs`:

-Expression = [
-  Ident: (identifier) @name :: string
-  Num: (number) @value :: string
-]
+```rust
+fn get(key: Option<Vec<String>>) {}

-TopDefinitions = (program (Statement)+ @statements)
+fn set(key: String, val: i32) {}
 ```

-This produces:
+Plotnik infers TypeScript types from the query structure. `Type` is recursive: `args: Type[]`.
-```typescript
-type Statement =
-  | { $tag: "Assign"; $data: { target: string; value: Expression } }
-  | { $tag: "Call"; $data: { func: string; args: Expression[] } };
+```sh
+❯ plotnik infer query.ptk -l rust
+export type Type =
+  | { $tag: "Simple"; $data: { name: string } }
+  | { $tag: "Generic"; $data: { name: string; args: Type[] } };

-type Expression =
-  | { $tag: "Ident"; $data: { name: string } }
-  | { $tag: "Num"; $data: { value: string } };
+export interface Func {
+  name: string;
+  params: { param: string; type: Type }[];
+}

-type TopDefinitions = {
-  statements: [Statement, ...Statement[]];
-};
+export interface Funcs {
+  funcs: Func[];
+}
 ```

-Then process the results:
-
-```typescript
-for (const stmt of result.statements) {
-  switch (stmt.$tag) {
-    case "Assign":
-      console.log(`Assignment to ${stmt.$data.target}`);
-      break;
-    case "Call":
-      console.log(
-        `Call to ${stmt.$data.func} with ${stmt.$data.args.length} args`,
-      );
-      break;
-  }
+Run the query against `lib.rs` to extract structured JSON:
+
+```sh
+❯ plotnik exec query.ptk lib.rs
+{
+  "funcs": [
+    {
+      "name": "get",
+      "params": [{
+        "param": "key",
+        "type": {
+          "$tag": "Generic",
+          "$data": {
+            "name": "Option",
+            "args": [{
+              "$tag": "Generic",
+              "$data": {
+                "name": "Vec",
+                "args": [{ "$tag": "Simple", "$data": { "name": "String" } }]
+              }
+            }]
+          }
+        }
+      }]
+    },
+    {
+      "name": "set",
+      "params": [
+        { "param": "key", "type": { "$tag": "Simple", "$data": { "name": "String" } } },
+        { "param": "val", "type": { "$tag": "Simple", "$data": { "name": "i32" } } }
+      ]
+    }
+  ]
 }
 ```

-For the detailed specification, see the [Language Reference](docs/lang-reference.md).
+## Why

-## Documentation
+Pattern matching over syntax trees is powerful, but tree-sitter queries produce flat capture lists. You still need to assemble the results, handle missing captures, and define types by hand. Plotnik closes this gap: the query describes structure, the engine guarantees it.
-- [CLI Guide](docs/cli.md) — Command-line tool usage
-- [Language Reference](docs/lang-reference.md) — Complete syntax and semantics
-- [Type System](docs/type-system.md) — How output types are inferred from queries
-- [Runtime Engine](docs/runtime-engine.md) — VM execution model (for contributors)
-
-## Supported Languages
-
-Plotnik bundles 15 languages out of the box: Bash, C, C++, CSS, Go, HTML, Java, JavaScript, JSON, Python, Rust, TOML, TSX, TypeScript, and YAML. The underlying [arborium](https://github.com/bearcove/arborium) collection includes 60+ permissively-licensed grammars—additional languages can be enabled as needed.
-
-## Status
-
-**Working now:** Parser with error recovery, type inference, query execution, CLI tools (`check`, `dump`, `infer`, `exec`, `trace`, `tree`, `langs`).
-
-**Next up:** CLI distribution (Homebrew, npm), language bindings (TypeScript/WASM, Python), LSP server, editor extensions.
+## Documentation

-⚠️ Alpha stage—API may change. Not for production use.
+- [CLI Guide](docs/cli.md)
+- [Language Reference](docs/lang-reference.md)
+- [Type System](docs/type-system.md)

 ## Acknowledgments