290 changes: 145 additions & 145 deletions AGENTS.md
@@ -1,69 +1,138 @@
# plotnik

Query language for tree-sitter AST with named subqueries, recursion, and type inference. See [docs/REFERENCE.md](docs/REFERENCE.md) for spec.

Lexer (logos) + parser (rowan) are resilient: collect errors, don't fail-fast.
# Ethos

- `AGENTS.md` (this file) is our constitution. You're welcome to propose useful amendments.
- We implement a resilient parser that provides user-friendly error messages.
- We call error messages "diagnostics" to avoid confusion with other errors (see `diagnostics/` folder).
- We strive to achieve excellent stability by enforcing invariants in the code:
- `assert!` and `.expect()` for simple cases
- `invariants.rs` otherwise, to skip the coverage of unreachable code

# Plotnik Query Language

Plotnik is a strongly-typed, whitespace-delimited pattern matching language for syntax trees (similar to Tree-sitter but stricter).

## Grammar Synopsis

- **Root**: List of definitions (`Def = expr`).
- **Nodes**: `(kind child1 child2)` or `(kind)`.
- **Strings**: `"literal"`, `'literal'`.
- **Wildcards**: `_` (matches any node).
- **Sequences**: `{ expr1 expr2 }`.
- **Alternations**: `[ expr1 expr2 ]` (untagged) OR `[ Label: expr1 Label: expr2 ]` (tagged).
- **References**: `(DefName)` (Must be PascalCase, no children).

## Modifiers & Constraints

| Feature | Syntax | Constraint |
| :------------- | :--------------- | :----------------------------------------------------- |
| **Field** | `name: expr` | `expr` must match exactly **one** node (no multi-seq). |
| **Negation** | `!name` | Asserts field `name` is absent. |
| **Capture** | `expr @name` | `snake_case`. Suffix. |
| **Type** | `expr ::Type` | `PascalCase` or `::string`. Suffix. |
| **Quantifier** | `*`, `+`, `?` | Greedy. Suffix. |
| **Non-Greedy** | `*?`, `+?`, `??` | Suffix. |
| **Anchor** | `.` | Immediate child anchor. |

## CRITICAL RULES (Strict Enforcement)

1. **CASING MATTERS**:
- **Definitions/Refs**: `PascalCase` (e.g., `MethodDecl`, `(MethodDecl)`).
- **Node Kinds**: `snake_case` (e.g., `(identifier)`).
- **Fields/Captures**: `snake_case` (e.g., `name:`, `@val`).
- **Branch Labels**: `PascalCase` (e.g., `[ Ok: (true) Err: (false) ]`).
2. **NO MIXED ALTS**: Alternations must be ALL labeled or ALL unlabeled.
3. **REFS HAVE NO CHILDREN**:
- Does not work: `(MyDef child)`

## Examples

```plotnik
// Definition
Function = (function_definition
name: (identifier) @name
parameters: (parameters {
(identifier)*
})
body: (Block)
)

// Reference usage
Block = (block {
[
Stmt: (Statement)
Expr: (Expression)
]*
})

// Alternation with labels
Boolean = [
True: "true"
False: "false"
]
```

## Project Structure
# Plotnik Query Data Model and Type Inference

1. **Flat Scoping (Golden Rule)**
- Query nesting doesn't create data nesting
- `(A (B (C @val)))` → `{ val: Node }`. Intermediate nodes are ignored.
- **New Scope** is created _only_ by capturing a container: `{...} @name` or `[...] @name`.

2. **Field Generation**
- Only explicit `@capture` creates a field.
- `key: (pattern)` is a structural constraint, **NOT** an extraction. It has nothing to do with tree-sitter fields.

3. **Cardinality**
- `(x) @k` → `k: T` (Required)
- `(x)? @k` → `k: T?` (Optional)
- `(x)* @k` → `k: T[]` (List)
- `(x)+ @k` → `k: [T, ...T[]]` (Non-empty List)

4. **Types**
- `(some_node) @x` (default) → `Node` (AST reference).
- `{...} @x` → receives some synthetic name based on the type of parent scope and capture name
- `Query = { (foo) @foo (bar) @bar (baz) @baz } @qux`:
- `@foo`, `@bar`, `@baz`: `Node`
- `@qux`: `struct QueryQux { foo: Node, bar: Node, baz: Node }`
- entry point: `struct Query { qux : QueryQux }`
- `@x :: string` → `string` (extracts source text).
- `@x :: Type` → `Type` (assigns nominal type to the structure).

5. **Alternations**
- Tagged: `[ L1: (a) @x L2: (b) @y ]`
→ Discriminated Union: `{ tag: "L1", x: Node } | { tag: "L2", y: Node }`.
- Untagged: `[ (a) @x (b) @x ]`
→ Merged Struct: `{ x: Node }`. Captures must be type-compatible across branches.
- Mixed: `[ (a) @x (b) ]` (invalid): a diagnostic is reported, but the type is inferred as for the untagged case
→ Merged Struct: `{ x: Node }`. Captures must be type-compatible across branches.
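To make the cardinality and alternation rules above concrete, here is one way the inferred shapes could be written as Rust types. This is a sketch under assumptions, not the output plotnik actually generates: `Node` is a stand-in for an AST reference, and every other name (`Cardinalities`, `Tagged`, `Untagged`, `captured_node`, `non_empty_len`) is hypothetical.

```rust
/// Stand-in for an AST node reference (assumption: the real type is not shown here).
#[derive(Debug, Clone, PartialEq)]
struct Node(u32);

/// One field per cardinality rule.
#[derive(Debug, PartialEq)]
struct Cardinalities {
    required: Node,               // `(x) @k`  → `k: T`
    optional: Option<Node>,       // `(x)? @k` → `k: T?`
    list: Vec<Node>,              // `(x)* @k` → `k: T[]`
    non_empty: (Node, Vec<Node>), // `(x)+ @k` → `k: [T, ...T[]]`
}

/// Tagged alternation `[ L1: (a) @x L2: (b) @y ]` as a discriminated union.
#[derive(Debug, PartialEq)]
enum Tagged {
    L1 { x: Node },
    L2 { y: Node },
}

/// Untagged alternation `[ (a) @x (b) @x ]` as a merged struct.
#[derive(Debug, PartialEq)]
struct Untagged {
    x: Node,
}

/// Returns the node carried by whichever branch matched.
fn captured_node(result: &Tagged) -> &Node {
    match result {
        Tagged::L1 { x } => x,
        Tagged::L2 { y } => y,
    }
}

/// Length of a non-empty list: the guaranteed head plus the tail.
fn non_empty_len(list: &(Node, Vec<Node>)) -> usize {
    1 + list.1.len()
}
```

Encoding the non-empty list as a head plus a tail makes the "at least one element" guarantee visible in the type, which mirrors the `[T, ...T[]]` notation.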

# Project Structure

```
crates/
plotnik-lib/ # Core library
src/
diagnostics/ # Diagnostic infrastructure
mod.rs # Diagnostics struct, DiagnosticBuilder
message.rs # DiagnosticMessage, Severity, Fix
printer.rs # DiagnosticsPrinter for rendering
parser/ # Syntax infrastructure
lexer.rs # Token definitions (logos)
cst.rs # SyntaxKind enum
ast.rs # Typed AST wrappers over CST
core.rs # Parser infrastructure
grammar.rs # Grammar rules
invariants.rs # Parser invariant checks
mod.rs # Re-exports, Parse struct, parse()
tests/ # Parser tests (snapshots)
*_tests.rs # Test files (lexer_tests, ast_tests, cst_tests)
query/ # Query processing
mod.rs # Query struct, new(), pipeline
dump.rs # dump_* debug output methods (test-only)
printer.rs # QueryPrinter for AST output
invariants.rs # Query invariant checks
alt_kinds.rs # Alternation validation
symbol_table.rs # Name resolution, symbol table
recursion.rs # Escape analysis (recursion validation)
shapes.rs # Shape inference
*_tests.rs # Test files per module
lib.rs # Re-exports Query, Diagnostics, Error
plotnik-cli/ # CLI tool
src/commands/ # Subcommands (debug, docs, langs)
plotnik-langs/ # Tree-sitter language bindings
plotnik-core/ # Common code
plotnik-lib/ # Plotnik as library
src/
diagnostics/ # Diagnostics (user-friendly errors)
parser/ # Syntactic parsing of the query
query/ # Analysis and representation of the parsed query
plotnik-langs/ # Tree-sitter language bindings (wrapped)
plotnik-macros/ # Proc macros of the project
docs/
adr/ # Architecture Decision Records (ADRs)
REFERENCE.md # Language specification
```

## Pipeline

```rust
parser::parse() // Parse → CST
alt_kinds::validate() // Validate alternation kinds
symbol_table::resolve() // Resolve names → SymbolTable
recursion::validate() // Validate recursion termination
shapes::infer() // Infer and validate shape cardinalities
```

Module = "what", function = "action".

## CLI
# CLI

Run: `cargo run -p plotnik-cli -- <command>`

- `debug` — Inspect queries/sources
- `docs [topic]` — Print docs (reference, examples)
- `debug` — Inspect queries and source file ASTs
- Example: `cargo run -p plotnik-cli -- debug -q '(foo) @bar'`
- `langs` — List supported languages

### debug options

Inputs: `-q/--query <Q>`, `--query-file <F>`, `--source <S>`, `-s/--source-file <F>`, `-l/--lang <L>`

Output (inferred from input): `--only-symbols`, `--cst`, `--raw`, `--spans`, `--cardinalities`
@@ -76,79 +145,44 @@ cargo run -p plotnik-cli -- debug -s app.ts --raw
cargo run -p plotnik-cli -- debug -q '(function_declaration) @fn' -s app.ts -l typescript
```

## Syntax

Grammar: `(type)`, `[a b]` (alt), `{a b}` (seq), `_` (wildcard), `@name`, `::Type`, `field:`, `*+?`, `"lit"`/`'lit'`, `(a/b)` (supertype), `(ERROR)`, `Name = expr` (def), `[A: ... B: ...]` (tagged alt)

SyntaxKind: `Root`, `Tree`, `Ref`, `Str`, `Field`, `Capture`, `Type`, `Quantifier`, `Seq`, `Alt`, `Branch`, `Wildcard`, `Anchor`, `NegatedField`, `Def`

Expr = `Tree | Ref | Str | Alt | Seq | Capture | Quantifier | Field | Wildcard`. Quantifier/Capture wrap their target. `Anchor` and `NegatedField` are predicates (not expressions).

## Diagnostics

`Diagnostics` struct collects errors/warnings across passes. Access per-pass or combined:

```rust
query.parse_diagnostics() // Parse errors
query.alt_kind_diagnostics() // Alternation validation
query.resolve_diagnostics() // Name resolution
query.ref_cycle_diagnostics() // Recursion validation
query.shape_diagnostics() // Shape cardinality validation
query.all_diagnostics() // All combined
query.diagnostics() // Alias for all_diagnostics()
```

Render: `query.render_diagnostics()` or `query.render_diagnostics_colored(bool)`.

Check validity: `query.is_valid()` returns false if any pass has errors (warnings allowed).
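The "collect, don't fail fast" approach and the validity rule can be illustrated with a minimal, self-contained sketch. It is not the plotnik implementation: the types below only borrow the names `Diagnostics` and `Severity` from this document, and `report`, `has_errors`, and the free function `is_valid` are assumptions made for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Severity {
    Error,
    Warning,
}

#[derive(Debug)]
struct Diagnostic {
    severity: Severity,
    message: String,
}

/// Collects diagnostics for one pass instead of stopping at the first problem.
#[derive(Debug, Default)]
struct Diagnostics {
    items: Vec<Diagnostic>,
}

impl Diagnostics {
    /// Records a diagnostic and lets the pass continue.
    fn report(&mut self, severity: Severity, message: &str) {
        self.items.push(Diagnostic {
            severity,
            message: message.to_string(),
        });
    }

    /// True if at least one collected diagnostic is an error.
    fn has_errors(&self) -> bool {
        self.items.iter().any(|d| d.severity == Severity::Error)
    }
}

/// Valid only if no pass reported an error; warnings are allowed.
fn is_valid(passes: &[&Diagnostics]) -> bool {
    passes.iter().all(|pass| !pass.has_errors())
}
```

With this shape, a pass that only emits warnings leaves the overall result valid, while a single error in any pass makes it invalid.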

## Constraints
# Coding rules

- Defs must be named except last (entry point)
- Fields: `field: expr` — no sequences as direct values
- Alternations: same-name captures need same type; use `@x :: T` for merged structs; tagged alts for discriminated unions
- `.` anchor = strict adjacency; without = scanning
- Names: `Upper` = user-defined, `lower` = tree-sitter nodes
- Captures: snake_case only, no dots
- Avoid nesting logic: prefer early exit in functions (return) and loops (continue/break)
- Write code comments for seniors, not for juniors
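As an illustration of the early-exit rule, the sketch below uses `return` and `continue` to keep the main path at one indentation level. The function `sum_even` is a made-up example, not code from this repository.

```rust
/// Sums the even integers in `tokens`, skipping blank tokens.
/// Returns `None` for empty input or on the first token that is not an integer.
fn sum_even(tokens: &[&str]) -> Option<i64> {
    // Early return instead of wrapping the whole body in an `if`.
    if tokens.is_empty() {
        return None;
    }

    let mut total = 0;
    for token in tokens {
        let token = token.trim();
        // Skip blanks early.
        if token.is_empty() {
            continue;
        }
        // Bail out on invalid input early.
        let Ok(value) = token.parse::<i64>() else {
            return None;
        };
        // Skip odd numbers early.
        if value % 2 != 0 {
            continue;
        }
        total += value;
    }

    Some(total)
}
```

Each guard handles one uninteresting case and leaves, so the final `total += value` line is reached only when every precondition holds.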

## Data Model
# Testing rules

- Nesting in query ≠ nesting in output: `(a (b @b))` → `{b: Node}`
- New scopes only from captured `{...}@s` or `[...]@c`
- `?`/`*`/`+` = optional/list/non-empty list

## AST Layer (`parser/ast.rs`)

Types: `Root`, `Def`, `Tree`, `Ref`, `Str`, `Alt`, `Branch`, `Seq`, `Capture`, `Type`, `Quantifier`, `Field`, `NegatedField`, `Wildcard`, `Anchor`, `Expr`

Use `Option<T>` for casts, not `TryFrom`. Use `QueryPrinter` from `query/printer.rs` for output.

## Testing

Uses `insta` for snapshot testing.

### File organization
## File organization

- Code lives in `foo.rs`, tests live in `foo_tests.rs`
- Test module included via `#[cfg(test)] mod foo_tests;` in parent

### Test structure
## CLI commands

- IMPORTANT: the `debug` command is the first tool you should use to test your changes
- Run tests: `cargo nextest run --hide-progress-bar --status-level none --failure-output final`
- We use snapshot testing (`insta`) heavily
- Accept snapshots: `cargo insta accept`

## Test structure

- Separate AAA (Arrange-Act-Assert) parts by blank lines
- Input: string → Output: snapshot of string
- Exception: when the test is 3 or fewer lines total
- Desired structure: input is string, output is string (snapshot of something)
- Single-line input: plain string literal
- Multi-line input: `indoc!` macro
- Never write expected snapshot content manually — always `@""`
- IMPORTANT: never write snapshots manually — always use `@""` and then `cargo insta accept`

```rust
#[test]
fn valid_query() {
let input = indoc! {r#"
(function_declaration
(function_declaration
name: (identifier) @name)
"#};

let query = Query::try_from(input).unwrap();

assert!(query.is_valid());
insta::assert_snapshot!(query.dump_ast(), @"");
}
@@ -168,14 +202,7 @@ fn error_case() {
}
```

### Workflow

```sh
cargo test --workspace
cargo insta accept
```

### Patterns by test type
## Patterns by test type

- Valid parsing: `assert!(query.is_valid())` + snapshot `dump_*()` output
- Error recovery: `assert!(!query.is_valid())` + snapshot `dump_diagnostics()` only
@@ -188,39 +215,12 @@ Uses `cargo-llvm-cov`, already installed.
Find uncovered lines per file:

```sh
cargo llvm-cov --package plotnik-lib --text --show-missing-lines 2>/dev/null | grep '\.rs: [0-9]\+\(, [0-9]\+\)\*\?'
cargo llvm-cov --package plotnik-lib --text --show-missing-lines 2>/dev/null | grep '\.rs: [0-9]'
```

## Invariants

Two-tier resilience strategy:

1. Parser: resilient, collects errors, continues parsing
2. Our code: strict invariants, maximal coverage in tests, panic on violations

Invariant checks live in dedicated modules named `invariants.rs`.
They are excluded from test coverage because they're unreachable.
They usually wrap a specific assert.
This was done because of a limitation of inline coverage exclusion in Rust.
But it seems useful to extract such invariant check helpers anyway:

- if it just performs an assertion and doesn't return a value, its name starts with `assert_`
- if it returns a value, its name consists of `ensure_` and some statement about the return value
Find any of such files for more examples.

## Not implemented

- Semantic validation: casing rules

## Deferred

- Predicates (`#match?` etc.) — runtime filters, not structural

## Rules
### `invariants.rs`

- Update AGENTS.md when changes add useful context
- Check diagnostics after changes
- Follow rnix-parser/taplo patterns
- Span-based tokens, no text in intermediate structures
- Don't put AI slop comments in the code
- IMPORTANT: Avoid nesting logic, prefer early exit code flow in functions (return) and loops (continue/break)
- Contains functions and `impl` blocks for invariant check functionality
- Each function panics on invariant violation; it may or may not return a value
- When returning a value, the name is `ensure_something(...)`, where something is related to the return value
- When there is no return value, the name is `assert_something(...)`, where something is related to the function arguments
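A minimal sketch of the two naming patterns, assuming nothing about the real contents of `invariants.rs`: both `ensure_some` and `assert_nonempty` are hypothetical helpers written for this example.

```rust
/// `ensure_*` pattern (hypothetical helper): panics on violation,
/// otherwise returns the checked value.
fn ensure_some<T>(value: Option<T>, what: &str) -> T {
    match value {
        Some(v) => v,
        None => panic!("invariant violated: {} must be present", what),
    }
}

/// `assert_*` pattern (hypothetical helper): panics on violation
/// and returns nothing.
fn assert_nonempty<T>(items: &[T], what: &str) {
    assert!(
        !items.is_empty(),
        "invariant violated: {} must not be empty",
        what
    );
}
```

A caller would write something like `let child = ensure_some(maybe_child, "child");`, which keeps the unreachable panic branch inside the helper rather than at the call site.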