From 5166961e2f8f57ede058a133ad96c459874a8496 Mon Sep 17 00:00:00 2001
From: Sergei Zharinov
Plotnik
+Plotnik
- Typed query language for tree-sitter
+ Typed query language for Tree-sitter. Your queries return typed structs.
+ Captures become fields, quantifiers become arrays, alternations become unions.
@@ -21,34 +24,220 @@
-For more details, see [reference](docs/REFERENCE.md).
+## The problem
+
+Tree-sitter solved parsing. It powers syntax highlighting and code navigation at GitHub and drives the editing experience in Zed, Helix, and Neovim. It gives you a fast, accurate, incremental syntax tree for virtually any language.
+
+The hard problem now is what comes _after_ parsing, namely extracting meaning from the tree:
+
+```typescript
+function extractFunction(node: SyntaxNode): FunctionInfo | null {
+ if (node.type !== "function_declaration") {
+ return null;
+ }
+ const name = node.childForFieldName("name");
+ const body = node.childForFieldName("body");
+ if (!name || !body) {
+ return null;
+ }
+ return {
+ name: name.text,
+ body,
+ };
+}
+```
+
+Every extraction requires a new function, each one a potential source of bugs that won't surface until production.
+
+## The solution
+
+Plotnik extends Tree-sitter queries with type annotations:
+
+```clojure
+(function_declaration
+ name: (identifier) @name :: string
+ body: (statement_block) @body
+) @func :: FunctionInfo
+```
+
+The query describes structure, and Plotnik infers the output type:
+
+```typescript
+interface FunctionInfo {
+ name: string;
+ body: SyntaxNode;
+}
+```
+
+This structure is guaranteed by the query engine. No defensive programming needed.
+
+## But what about Tree-sitter queries?
+
+Tree-sitter already has queries:
+
+```scheme
+(function_declaration
+ name: (identifier) @name
+ body: (statement_block) @body)
+```
+
+The result is a flat capture list:
+
+```typescript
+query.matches(tree.rootNode);
+// → [{ captures: [{ name: "name", node }, { name: "body", node }] }, ...]
+```
+
+The assembly layer is up to you:
+
+```typescript
+const name = match.captures.find((c) => c.name === "name")?.node;
+const body = match.captures.find((c) => c.name === "body")?.node;
+if (!name || !body) throw new Error("Missing capture");
+return { name: name.text, body };
+```
+
+This means string-based lookup, null checks, and manual type definitions kept in sync by convention.
+
+Tree-sitter queries are designed for matching. Plotnik adds the typing layer: the query _is_ the type definition.
+
+## Why Plotnik?
+
+| Hand-written extraction | Plotnik |
+| -------------------------- | ---------------------------- |
+| Manual navigation | Declarative pattern matching |
+| Runtime type errors | Compile-time type inference |
+| Repetitive extraction code | Single-query extraction |
+| Ad-hoc data structures | Generated structs/interfaces |
+
+Plotnik extends Tree-sitter's query syntax with:
+
+- **Named expressions** for composition and reuse
+- **Recursion** for arbitrarily nested structures
+- **Type annotations** for precise output shapes
+- **Tagged alternations** for discriminated unions
+
+## Use cases
+
+- **Scripting:** Count patterns, extract metrics, audit dependencies
+- **Custom linters:** Encode your business rules and architecture constraints
+- **LLM pipelines:** Extract signatures and types as structured data for RAG
+- **Code intelligence:** Outline views, navigation, symbol extraction across grammars
+
+## Language design
+
+Plotnik builds on Tree-sitter's query syntax, extending it with the features needed for typed extraction:
+
+```clojure
+Statement = [
+ Assign: (assignment_expression
+ left: (identifier) @target :: string
+ right: (Expression) @value)
+ Call: (call_expression
+ function: (identifier) @func :: string
+ arguments: (arguments (Expression)* @args))
+]
+
+Expression = [
+ Ident: (identifier) @name :: string
+ Num: (number) @value :: string
+]
+
+TopDefinitions = (program (Statement)+ @statements)
+```
+
+This produces:
+
+```typescript
+type Statement =
+ | { tag: "Assign"; target: string; value: Expression }
+ | { tag: "Call"; func: string; args: Expression[] };
+
+type Expression =
+ | { tag: "Ident"; name: string }
+ | { tag: "Num"; value: string };
+
+type TopDefinitions = {
+ statements: [Statement, ...Statement[]];
+};
+```
+
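Note how the quantifiers map to types in this output: `(Expression)* @args` becomes `Expression[]`, while `(Statement)+ @statements` becomes the non-empty tuple `[Statement, ...Statement[]]`. A minimal sketch of what that buys the consumer, assuming a hypothetical `NonEmpty` alias and `first` helper that are illustrative only and not part of Plotnik:

```typescript
// Hypothetical alias for the shape a `+` quantifier produces in the output above.
type NonEmpty<T> = [T, ...T[]];

// The tuple type guarantees that index 0 exists, so the return type is T
// rather than T | undefined, even with `noUncheckedIndexedAccess` enabled.
function first<T>(items: NonEmpty<T>): T {
  return items[0];
}

const names: NonEmpty<string> = ["a", "b"];
console.log(first(names));

// const empty: NonEmpty<string> = []; // rejected by the type checker
```

A plain `T[]`, as produced by `*`, gives no such guarantee, so the caller still has to handle the empty case.
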
+Then process the results:
+
+```typescript
+for (const stmt of result.statements) {
+ switch (stmt.tag) {
+ case "Assign":
+ console.log(`Assignment to ${stmt.target}`);
+ break;
+ case "Call":
+ console.log(`Call to ${stmt.func} with ${stmt.args.length} args`);
+ break;
+ }
+}
+```
+
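A `switch` like the one above silently skips any variant added to `Statement` later. One common TypeScript guard is a `never`-based exhaustiveness check. The sketch below copies the generated types from this section and adds a hypothetical `describe` helper, which is an assumption for illustration and not part of Plotnik:

```typescript
// Types copied from the generated output shown earlier in this section.
type Expression =
  | { tag: "Ident"; name: string }
  | { tag: "Num"; value: string };

type Statement =
  | { tag: "Assign"; target: string; value: Expression }
  | { tag: "Call"; func: string; args: Expression[] };

// Hypothetical helper, for illustration only.
function describe(stmt: Statement): string {
  switch (stmt.tag) {
    case "Assign":
      return `Assignment to ${stmt.target}`;
    case "Call":
      return `Call to ${stmt.func} with ${stmt.args.length} args`;
    default: {
      // If a variant is left unhandled, `stmt` is no longer `never` here
      // and this assignment fails to compile.
      const unreachable: never = stmt;
      throw new Error(`Unhandled statement: ${JSON.stringify(unreachable)}`);
    }
  }
}

const sample: Statement = {
  tag: "Call",
  func: "print",
  args: [{ tag: "Num", value: "1" }],
};
console.log(describe(sample));
```

Because the union is discriminated by `tag`, the compiler narrows `stmt` in each branch, so `stmt.target` and `stmt.args` are only accessible where they exist.
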
+For the detailed specification, see the [Language Reference](docs/REFERENCE.md).
+
+## Supported languages
+
+Plotnik ships with schema support for 26 languages:
+
+> Bash, C, C++, C#, CSS, Elixir, Go, Haskell, HCL, HTML, Java, JavaScript, JSON, Kotlin, Lua, Nix, PHP, Python, Ruby, Rust, Scala, Solidity, Swift, TypeScript, TSX, YAML
+
+Additional languages and dynamic loading are planned.
+
+## Roadmap
+
+### Ignition: the parser ✓
+
+The foundation is largely complete: a resilient parser that recovers from errors and keeps going. Semantic validation is the one remaining item.
+
+- [x] Resilient lexer ([`logos`](https://github.com/maciejhirsz/logos)) and parser ([`rowan`](https://github.com/rust-analyzer/rowan)) with error recovery
+- [x] Typed AST layer over concrete syntax tree
+- [x] Rich diagnostics with spans, colored output, related locations, and suggested fixes
+- [x] Name resolution with two-pass symbol table construction
+- [x] Recursion validation via Tarjan SCC analysis (escape path detection)
+- [x] Shape cardinality inference (One vs Many) for field constraint validation
+- [x] Alternation validation (mixed tagged/untagged detection)
+- [ ] Semantic validation: capture naming rules, type annotation consistency
+
+### Liftoff: type inference
+
+The schema infrastructure is built. Type inference is next.
+
+- [x] `node-types.json` parsing and schema representation (`plotnik-core`)
+- [x] Proc macro for compile-time schema embedding (`plotnik-macros`)
+- [x] 26 languages bundled with static node type tables (`plotnik-langs`)
+- [ ] Query validation against language schemas (node types, fields, children)
+- [ ] Full type inference: query → output shape → generated structs
+
+### Acceleration: query engine
-## Roadmap 🚀
+- [ ] Thompson NFA construction for query IR
+- [ ] Runtime execution with backtracking cursor walker
+- [ ] Advanced validation powered by `grammar.json` (production rules, precedence)
+- [ ] Match result API with typed accessors
-**Ignition** _(the parser)_
+### Orbit: developer experience
-- [x] Resuilient query language parser
-- [x] Basic error messages
-- [x] Name resolution
-- [x] Recursion validator
-- [ ] Semantic analyzer
+The CLI foundation exists. The full developer experience is ahead.
-**Liftoff** _(type inference)_
+- [x] CLI framework with `debug`, `docs`, `langs` commands
+- [x] Query inspection: AST dump, symbol table, cardinalities, spans
+- [x] Source inspection: Tree-sitter parse tree visualization
+- [ ] CLI distribution: Homebrew, cargo-binstall, npm wrapper
+- [ ] Compiled queries via Rust proc macros (zero-cost: query → native code)
+- [ ] Language bindings: TypeScript (WASM), Python, Ruby
+- [ ] LSP server: diagnostics, completions, hover, go-to-definition
+- [ ] Editor extensions: VS Code, Zed, Neovim
-- [ ] Basic validation against `node-types.json` schemas
-- [ ] Type inference of the query result shape
+## Acknowledgments
-**Acceleration** _(query engine)_
+[Max Brunsfeld](https://github.com/maxbrunsfeld) created Tree-sitter; [Amaan Qureshi](https://github.com/amaanq) and other contributors maintain the parser ecosystem that makes this project possible.
-- [ ] Thompson construction of query IR
-- [ ] Runtime execution engine
-- [ ] Advanced validation powered by `grammar.json` files
+## License
-**Orbit** _(the tooling)_
+This project is licensed under the [MIT license].
-- [ ] The CLI app available via installers
-- [ ] Compiled queries (using procedural macros)
-- [ ] Enhanced error messages
-- [ ] Bindings (TypeScript, Python, Ruby)
-- [ ] LSP server
-- [ ] Editor support (VSCode, Zed, Neovim)
+[MIT license]: LICENSE.md