259 changes: 89 additions & 170 deletions README.md
@@ -1,6 +1,3 @@
<br/>
<br/>

<p align="center">
<img width="400" alt="The logo: a curled wood shaving on a workbench" src="https://github.com/user-attachments/assets/8f1162aa-5769-415d-babe-56b962256747" />
</p>
@@ -11,7 +8,7 @@

<p align="center">
A type-safe query language for <a href="https://tree-sitter.github.io">Tree-sitter</a>.<br/>
Query in, typed data out.
Powered by the <a href="https://github.com/bearcove/arborium">arborium</a> grammar collection.
</p>

<br/>
@@ -26,202 +23,124 @@
<br/>

<p align="center">
⚠️ <a href="#status">ALPHA STAGE</a>: not for production use ⚠️<br/>
<sub>
⚠️ Beta: not for production use ⚠️<br/>
</sub>
</p>

<br/>
<br/>

## The problem

Tree-sitter solved parsing. It powers syntax highlighting and code navigation at GitHub, and drives the editing experience in Zed, Helix, and Neovim. It gives you a fast, accurate, incremental syntax tree for virtually any language.

The hard problem now is what comes _after_ parsing, namely extracting structured data from the tree:

```typescript
function extractFunction(node: SyntaxNode): FunctionInfo | null {
  if (node.type !== "function_declaration") {
    return null;
  }
  const name = node.childForFieldName("name");
  const body = node.childForFieldName("body");
  if (!name || !body) {
    return null;
  }
  return {
    name: name.text,
    body,
  };
}
```

Every extraction requires a new function, each one a potential source of bugs that won't surface until production.

## The solution

Plotnik extends Tree-sitter queries with type annotations:

```clojure
(function_declaration
name: (identifier) @name :: string
body: (statement_block) @body
) @func :: FunctionInfo
```

The query describes structure, and Plotnik infers the output type:

```typescript
interface FunctionInfo {
name: string;
body: SyntaxNode;
}
```

This structure is guaranteed by the query engine. No defensive programming needed.

## But what about Tree-sitter queries?

Tree-sitter already has queries:

```clojure
(function_declaration
name: (identifier) @name
body: (statement_block) @body)
```

The result is a flat capture list:

```typescript
query.matches(tree.rootNode);
// → [{ captures: [{ name: "name", node }, { name: "body", node }] }, ...]
```

The assembly layer is up to you:

```typescript
const name = match.captures.find((c) => c.name === "name")?.node;
const body = match.captures.find((c) => c.name === "body")?.node;
if (!name || !body) throw new Error("Missing capture");
return { name: name.text, body };
```

This means string-based lookup, null checks, and manual type definitions kept in sync by convention.

Tree-sitter queries are designed for matching. Plotnik adds the typing layer: the query _is_ the type definition.

## Why Plotnik?
Tree-sitter gives you the syntax tree. Extracting structured data from it still means writing imperative navigation code, null checks, and maintaining type definitions by hand. Plotnik makes extraction declarative: write a pattern, get typed data. The query is the type definition.

| Hand-written extraction | Plotnik |
| -------------------------- | ---------------------------- |
| Manual navigation | Declarative pattern matching |
| Runtime type errors | Compile-time type inference |
| Repetitive extraction code | Single-query extraction |
| Ad-hoc data structures | Generated structs/interfaces |
## Features

Plotnik extends Tree-sitter's query syntax with:
- [x] Static type inference from query structure
- [x] Named expressions for composition and reuse
- [x] Recursion for nested structures
- [x] Tagged unions (discriminated unions)
- [x] TypeScript type generation
- [x] CLI: `exec` for matches, `infer` for types, `ast`/`trace`/`dump` for debug
- [ ] Grammar verification (validate queries against tree-sitter node types)
- [ ] Compile-time queries via proc-macro
- [ ] LSP server
- [ ] Editor extensions

- **Named expressions** for composition and reuse
- **Recursion** for arbitrarily nested structures
- **Type annotations** for precise output shapes
- **Alternations**: untagged for simplicity, tagged for precision (discriminated unions)
## Example

## Use cases
Extract function signatures from Rust. `Type` references itself to handle nested generics like `Option<Vec<String>>`.

- **Scripting:** Count patterns, extract metrics, audit dependencies
- **Custom linters:** Encode your business rules and architecture constraints
- **LLM Pipelines:** Extract signatures and types as structured data for RAG
- **Code Intelligence:** Outline views, navigation, symbol extraction across grammars

## Language design

Start simple—extract all function names from a file:
`query.ptk`:

```clojure
Functions = (program
{(function_declaration name: (identifier) @name :: string)}* @functions)
```
Type = [
Simple: [(type_identifier) (primitive_type)] @name :: string
Generic: (generic_type
type: (type_identifier) @name :: string
type_arguments: (type_arguments (Type)* @args))
]

Plotnik infers the output type:
Func = (function_item
name: (identifier) @name :: string
parameters: (parameters
(parameter
pattern: (identifier) @param :: string
type: (Type) @type
)* @params))

```typescript
type Functions = {
functions: { name: string }[];
};
Funcs = (source_file (Func)* @funcs)
```

Scale up to tagged unions for richer structure:

```clojure
Statement = [
Assign: (assignment_expression
left: (identifier) @target :: string
right: (Expression) @value)
Call: (call_expression
function: (identifier) @func :: string
arguments: (arguments (Expression)* @args))
]
`lib.rs`:

Expression = [
Ident: (identifier) @name :: string
Num: (number) @value :: string
]
```rust
fn get(key: Option<Vec<String>>) {}

TopDefinitions = (program (Statement)+ @statements)
fn set(key: String, val: i32) {}
```

This produces:
Plotnik infers TypeScript types from the query structure. `Type` is recursive: `args: Type[]`.

```typescript
type Statement =
| { $tag: "Assign"; $data: { target: string; value: Expression } }
| { $tag: "Call"; $data: { func: string; args: Expression[] } };
```sh
❯ plotnik infer query.ptk -l rust
export type Type =
| { $tag: "Simple"; $data: { name: string } }
| { $tag: "Generic"; $data: { name: string; args: Type[] } };

type Expression =
| { $tag: "Ident"; $data: { name: string } }
| { $tag: "Num"; $data: { value: string } };
export interface Func {
name: string;
params: { param: string; type: Type }[];
}

type TopDefinitions = {
statements: [Statement, ...Statement[]];
};
export interface Funcs {
funcs: Func[];
}
```

Then process the results:

```typescript
for (const stmt of result.statements) {
switch (stmt.$tag) {
case "Assign":
console.log(`Assignment to ${stmt.$data.target}`);
break;
case "Call":
console.log(
`Call to ${stmt.$data.func} with ${stmt.$data.args.length} args`,
);
break;
}
Run the query against `lib.rs` to extract structured JSON:

```sh
❯ plotnik exec query.ptk lib.rs
{
  "funcs": [
    {
      "name": "get",
      "params": [{
        "param": "key",
        "type": {
          "$tag": "Generic",
          "$data": {
            "name": "Option",
            "args": [{
              "$tag": "Generic",
              "$data": {
                "name": "Vec",
                "args": [{ "$tag": "Simple", "$data": { "name": "String" } }]
              }
            }]
          }
        }
      }]
    },
    {
      "name": "set",
      "params": [
        { "param": "key", "type": { "$tag": "Simple", "$data": { "name": "String" } } },
        { "param": "val", "type": { "$tag": "Simple", "$data": { "name": "i32" } } }
      ]
    }
  ]
}
```
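
As a rough sketch of what consuming this output could look like, the snippet below walks the recursive `Type` union and renders each function back into a signature string. The type declarations are copied from the `plotnik infer` output above; `renderType` and `renderSignature` are hypothetical helpers written for this illustration and are not part of Plotnik, and the sample value is a hand-copied fragment of the JSON rather than something loaded through a Plotnik API.

```typescript
// Types as printed by `plotnik infer query.ptk -l rust` above.
type Type =
  | { $tag: "Simple"; $data: { name: string } }
  | { $tag: "Generic"; $data: { name: string; args: Type[] } };

interface Func {
  name: string;
  params: { param: string; type: Type }[];
}

// Hypothetical helper: turn a Type back into Rust-like syntax.
// Switching on `$tag` narrows `$data` to the matching variant.
function renderType(t: Type): string {
  switch (t.$tag) {
    case "Simple":
      return t.$data.name;
    case "Generic":
      // Recurse into the arguments, mirroring the recursive query.
      return `${t.$data.name}<${t.$data.args.map(renderType).join(", ")}>`;
  }
}

// Hypothetical helper: render one extracted function as a signature.
function renderSignature(f: Func): string {
  const params = f.params
    .map((p) => `${p.param}: ${renderType(p.type)}`)
    .join(", ");
  return `fn ${f.name}(${params})`;
}

// The first entry of the JSON output above, copied by hand.
const get: Func = {
  name: "get",
  params: [
    {
      param: "key",
      type: {
        $tag: "Generic",
        $data: {
          name: "Option",
          args: [
            {
              $tag: "Generic",
              $data: {
                name: "Vec",
                args: [{ $tag: "Simple", $data: { name: "String" } }],
              },
            },
          ],
        },
      },
    },
  ],
};

console.log(renderSignature(get)); // prints "fn get(key: Option<Vec<String>>)"
```

Because the union is discriminated on `$tag`, the compiler checks that both variants are handled, so adding a third variant to the query would surface as a type error here instead of a missed case at runtime.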

For the detailed specification, see the [Language Reference](docs/lang-reference.md).
## Why

## Documentation
Pattern matching over syntax trees is powerful, but tree-sitter queries produce flat capture lists. You still need to assemble the results, handle missing captures, and define types by hand. Plotnik closes this gap: the query describes structure, the engine guarantees it.

- [CLI Guide](docs/cli.md) — Command-line tool usage
- [Language Reference](docs/lang-reference.md) — Complete syntax and semantics
- [Type System](docs/type-system.md) — How output types are inferred from queries
- [Runtime Engine](docs/runtime-engine.md) — VM execution model (for contributors)

## Supported Languages

Plotnik bundles 15 languages out of the box: Bash, C, C++, CSS, Go, HTML, Java, JavaScript, JSON, Python, Rust, TOML, TSX, TypeScript, and YAML. The underlying [arborium](https://github.com/bearcove/arborium) collection includes 60+ permissively-licensed grammars—additional languages can be enabled as needed.

## Status

**Working now:** Parser with error recovery, type inference, query execution, CLI tools (`check`, `dump`, `infer`, `exec`, `trace`, `tree`, `langs`).

**Next up:** CLI distribution (Homebrew, npm), language bindings (TypeScript/WASM, Python), LSP server, editor extensions.
## Documentation

⚠️ Alpha stage—API may change. Not for production use.
- [CLI Guide](docs/cli.md)
- [Language Reference](docs/lang-reference.md)
- [Type System](docs/type-system.md)

## Acknowledgments
