237 changes: 213 additions & 24 deletions README.md
@@ -5,12 +5,15 @@
<img width="400" alt="The logo: a curled wood shaving on a workbench" src="https://github.com/user-attachments/assets/1fcef0a9-20f8-4500-960b-f31db3e9fd94" />
</p>

<h1><p align="center">Plotnik</p></h1>

<p align="center">
Typed query language for <a href="https://tree-sitter.github.io/">tree-sitter</a>
Typed query language for <a href="https://tree-sitter.github.io/">Tree-sitter</a>. Your queries return typed structs.<br/>
Captures become fields, quantifiers become arrays, alternations become unions.
</p>

<br/>

<p align="center">
<a href="https://github.com/plotnik-lang/plotnik/actions/workflows/stable.yml"><img src="https://github.com/plotnik-lang/plotnik/actions/workflows/stable.yml/badge.svg" alt="stable"></a>
<a href="https://github.com/plotnik-lang/plotnik/actions/workflows/nightly.yml"><img src="https://github.com/plotnik-lang/plotnik/actions/workflows/nightly.yml/badge.svg" alt="nightly"></a>
@@ -21,34 +24,220 @@
<br/>
<br/>

For more details, see [reference](docs/REFERENCE.md).
## The problem

Tree-sitter solved parsing. It powers syntax highlighting and code navigation at GitHub and drives the editing experience in Zed, Helix, and Neovim. It gives you a fast, accurate, incremental syntax tree for virtually any language.

The hard problem now is what comes _after_ parsing: extracting meaning from the tree. Today that takes hand-written traversal code:

```typescript
function extractFunction(node: SyntaxNode): FunctionInfo | null {
  if (node.type !== "function_declaration") {
    return null;
  }
  const name = node.childForFieldName("name");
  const body = node.childForFieldName("body");
  if (!name || !body) {
    return null;
  }
  return {
    name: name.text,
    body,
  };
}
```

Every extraction requires a new function, each one a potential source of bugs that won't surface until production.

## The solution

Plotnik extends Tree-sitter queries with type annotations:

```clojure
(function_declaration
  name: (identifier) @name :: string
  body: (statement_block) @body
) @func :: FunctionInfo
```

The query describes structure, and Plotnik infers the output type:

```typescript
interface FunctionInfo {
  name: string;
  body: SyntaxNode;
}
```

This structure is guaranteed by the query engine. No defensive programming needed.
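As a rough illustration of what that means for calling code, the sketch below consumes a value of the inferred `FunctionInfo` shape. The `SyntaxNode` stand-in, the `describeFunction` helper, and the hand-built `sample` value are assumptions made for this example; how Plotnik delivers results at runtime is not specified here.

```typescript
// Minimal stand-in for Tree-sitter's SyntaxNode; only the fields used here.
interface SyntaxNode {
  type: string;
  text: string;
}

// The inferred output type shown above.
interface FunctionInfo {
  name: string;
  body: SyntaxNode;
}

// Hypothetical consumer: both fields are required by the type, so there are
// no null checks and no string-keyed capture lookups.
function describeFunction(info: FunctionInfo): string {
  return `${info.name} (${info.body.text.length} chars in ${info.body.type})`;
}

// Hand-built sample value standing in for a real query result.
const sample: FunctionInfo = {
  name: "greet",
  body: { type: "statement_block", text: "{ return 1; }" },
};

console.log(describeFunction(sample));
```

Compare this with `extractFunction` in the previous section: the null-returning branches are gone because the type has no optional fields.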

## But what about Tree-sitter queries?

Tree-sitter already has queries:

```scheme
(function_declaration
  name: (identifier) @name
  body: (statement_block) @body)
```

The result is a flat capture list:

```typescript
query.matches(tree.rootNode);
// → [{ captures: [{ name: "name", node }, { name: "body", node }] }, ...]
```

The assembly layer is up to you:

```typescript
const name = match.captures.find((c) => c.name === "name")?.node;
const body = match.captures.find((c) => c.name === "body")?.node;
if (!name || !body) throw new Error("Missing capture");
return { name: name.text, body };
```

This means string-based lookup, null checks, and manual type definitions kept in sync by convention.

Tree-sitter queries are designed for matching. Plotnik adds the typing layer: the query _is_ the type definition.
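The sketch below shows how the hand-written assembly layer can drift from the query. The `SyntaxNode`, `Capture`, and `QueryMatch` interfaces are minimal stand-ins rather than the real binding types, and `requireCapture` and `toFunctionInfo` are hypothetical helpers. The sample match simulates a query whose `@body` capture was renamed to `@block`: the code still type-checks, and the mismatch only shows up at runtime.

```typescript
// Minimal stand-ins for the binding types used in this README.
interface SyntaxNode {
  type: string;
  text: string;
}
interface Capture {
  name: string;
  node: SyntaxNode;
}
interface QueryMatch {
  captures: Capture[];
}

// Hand-maintained result type: nothing ties these field names to the query.
interface FunctionInfo {
  name: string;
  body: SyntaxNode;
}

// Hypothetical helper: string-based lookup that can only fail at runtime.
function requireCapture(match: QueryMatch, name: string): SyntaxNode {
  const capture = match.captures.find((c) => c.name === name);
  if (!capture) throw new Error(`Missing capture: @${name}`);
  return capture.node;
}

function toFunctionInfo(match: QueryMatch): FunctionInfo {
  return {
    name: requireCapture(match, "name").text,
    body: requireCapture(match, "body"),
  };
}

// Simulates a match from a query where @body was renamed to @block.
const renamed: QueryMatch = {
  captures: [
    { name: "name", node: { type: "identifier", text: "greet" } },
    { name: "block", node: { type: "statement_block", text: "{}" } },
  ],
};

let failure = "";
try {
  toFunctionInfo(renamed);
} catch (e) {
  failure = (e as Error).message;
}
console.log(failure);
```

The compiler has no way to connect the string `"body"` to the capture names in the query text, which is the gap a typed query is meant to close.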

## Why Plotnik?

| Hand-written extraction | Plotnik |
| -------------------------- | ---------------------------- |
| Manual navigation | Declarative pattern matching |
| Runtime type errors | Compile-time type inference |
| Repetitive extraction code | Single-query extraction |
| Ad-hoc data structures | Generated structs/interfaces |

Plotnik extends Tree-sitter's query syntax with:

- **Named expressions** for composition and reuse
- **Recursion** for arbitrarily nested structures
- **Type annotations** for precise output shapes
- **Tagged alternations** for discriminated unions

## Use cases

- **Scripting:** Count patterns, extract metrics, audit dependencies
- **Custom linters:** Encode your business rules and architecture constraints
- **LLM pipelines:** Extract signatures and types as structured data for RAG
- **Code intelligence:** Outline views, navigation, symbol extraction across grammars

## Language design

Plotnik builds on Tree-sitter's query syntax, extending it with the features needed for typed extraction:

```clojure
Statement = [
  Assign: (assignment_expression
    left: (identifier) @target :: string
    right: (Expression) @value)
  Call: (call_expression
    function: (identifier) @func :: string
    arguments: (arguments (Expression)* @args))
]

Expression = [
  Ident: (identifier) @name :: string
  Num: (number) @value :: string
]

TopDefinitions = (program (Statement)+ @statements)
```

This produces:

```typescript
type Statement =
  | { tag: "Assign"; target: string; value: Expression }
  | { tag: "Call"; func: string; args: Expression[] };

type Expression =
  | { tag: "Ident"; name: string }
  | { tag: "Num"; value: string };

type TopDefinitions = {
  statements: [Statement, ...Statement[]];
};
```

Then process the results:

```typescript
for (const stmt of result.statements) {
  switch (stmt.tag) {
    case "Assign":
      console.log(`Assignment to ${stmt.target}`);
      break;
    case "Call":
      console.log(`Call to ${stmt.func} with ${stmt.args.length} args`);
      break;
  }
}
```
```
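Because the generated types are discriminated unions, the TypeScript compiler can also check that a `switch` covers every variant. The sketch below repeats the generated types so it stands alone; `assertNever`, `summarize`, and the sample statements are illustrative names and values, not part of Plotnik.

```typescript
type Expression =
  | { tag: "Ident"; name: string }
  | { tag: "Num"; value: string };

type Statement =
  | { tag: "Assign"; target: string; value: Expression }
  | { tag: "Call"; func: string; args: Expression[] };

// Hypothetical helper: the call in `default` compiles only while every
// variant of the union is handled by an earlier case.
function assertNever(value: never): never {
  throw new Error(`Unhandled variant: ${JSON.stringify(value)}`);
}

function summarize(stmt: Statement): string {
  switch (stmt.tag) {
    case "Assign":
      return `Assignment to ${stmt.target}`;
    case "Call":
      return `Call to ${stmt.func} with ${stmt.args.length} args`;
    default:
      return assertNever(stmt);
  }
}

// Hand-built sample values standing in for real query results.
const samples: Statement[] = [
  { tag: "Assign", target: "x", value: { tag: "Num", value: "1" } },
  { tag: "Call", func: "print", args: [{ tag: "Ident", name: "x" }] },
];

const summaries = samples.map(summarize);
console.log(summaries.join("\n"));
```

If a new variant were later added to `Statement`, the `assertNever(stmt)` call would stop compiling until a matching `case` is written.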

For the detailed specification, see the [Language Reference](docs/REFERENCE.md).

## Supported languages

Plotnik ships with schema support for 26 languages:

> Bash, C, C++, C#, CSS, Elixir, Go, Haskell, HCL, HTML, Java, JavaScript, JSON, Kotlin, Lua, Nix, PHP, Python, Ruby, Rust, Scala, Solidity, Swift, TypeScript, TSX, YAML

Additional languages and dynamic loading are planned.

## Roadmap

### Ignition: the parser ✓

The foundation is complete: a resilient parser that recovers from errors and keeps going.

- [x] Resilient lexer ([`logos`](https://github.com/maciejhirsz/logos)) and parser ([`rowan`](https://github.com/rust-analyzer/rowan)) with error recovery
- [x] Typed AST layer over concrete syntax tree
- [x] Rich diagnostics with spans, colored output, related locations, and suggested fixes
- [x] Name resolution with two-pass symbol table construction
- [x] Recursion validation via Tarjan SCC analysis (escape path detection)
- [x] Shape cardinality inference (One vs Many) for field constraint validation
- [x] Alternation validation (mixed tagged/untagged detection)
- [ ] Semantic validation: capture naming rules, type annotation consistency

### Liftoff: type inference

The schema infrastructure is built. Type inference is next.

- [x] `node-types.json` parsing and schema representation (`plotnik-core`)
- [x] Proc macro for compile-time schema embedding (`plotnik-macros`)
- [x] 26 languages bundled with static node type tables (`plotnik-langs`)
- [ ] Query validation against language schemas (node types, fields, children)
- [ ] Full type inference: query → output shape → generated structs

### Acceleration: query engine

## Roadmap 🚀
- [ ] Thompson NFA construction for query IR
- [ ] Runtime execution with backtracking cursor walker
- [ ] Advanced validation powered by `grammar.json` (production rules, precedence)
- [ ] Match result API with typed accessors

**Ignition** _(the parser)_
### Orbit: developer experience

- [x] Resilient query language parser
- [x] Basic error messages
- [x] Name resolution
- [x] Recursion validator
- [ ] Semantic analyzer
The CLI foundation exists. The full developer experience is ahead.

**Liftoff** _(type inference)_
- [x] CLI framework with `debug`, `docs`, `langs` commands
- [x] Query inspection: AST dump, symbol table, cardinalities, spans
- [x] Source inspection: Tree-sitter parse tree visualization
- [ ] CLI distribution: Homebrew, cargo-binstall, npm wrapper
- [ ] Compiled queries via Rust proc macros (zero-cost: query → native code)
- [ ] Language bindings: TypeScript (WASM), Python, Ruby
- [ ] LSP server: diagnostics, completions, hover, go-to-definition
- [ ] Editor extensions: VS Code, Zed, Neovim

- [ ] Basic validation against `node-types.json` schemas
- [ ] Type inference of the query result shape
## Acknowledgments

**Acceleration** _(query engine)_
[Max Brunsfeld](https://github.com/maxbrunsfeld) created Tree-sitter; [Amaan Qureshi](https://github.com/amaanq) and other contributors maintain the parser ecosystem that makes this project possible.

- [ ] Thompson construction of query IR
- [ ] Runtime execution engine
- [ ] Advanced validation powered by `grammar.json` files
## License

**Orbit** _(the tooling)_
This project is licensed under the [MIT license].

- [ ] The CLI app available via installers
- [ ] Compiled queries (using procedural macros)
- [ ] Enhanced error messages
- [ ] Bindings (TypeScript, Python, Ruby)
- [ ] LSP server
- [ ] Editor support (VSCode, Zed, Neovim)
[MIT license]: LICENSE.md