diff --git a/AGENTS.md b/AGENTS.md
index 365da2b5..d8f09af7 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,69 +1,138 @@
-# plotnik
-
-Query language for tree-sitter AST with named subqueries, recursion, and type inference. See [docs/REFERENCE.md](docs/REFERENCE.md) for spec.
-
-Lexer (logos) + parser (rowan) are resilient: collect errors, don't fail-fast.
+# Ethos
+
+- `AGENTS.md` (this file) is our constitution. You're welcome to propose useful amendments.
+- We implement resilient parser, provides user-friendly error messages.
+- We call error messages "diagnostics" to avoid confusion with other errors (see `diagnostics/` folder).
+- We strive to achieve excellent stability by enforcing invariants in the code:
+  - `assert!` and `.expect()` for simple cases
+  - `invariants.rs` otherwise, to skip the coverage of unreachable code
+
+# Plotnik Query Language
+
+Plotnik is a strongly-typed, whitespace-delimited pattern matching language for syntax trees (similar to Tree-sitter but stricter).
+
+## Grammar Synopsis
+
+- **Root**: List of definitions (`Def = expr`).
+- **Nodes**: `(kind child1 child2)` or `(kind)`.
+- **Strings**: `"literal"`, `'literal'`.
+- **Wildcards**: `_` (matches any node).
+- **Sequences**: `{ expr1 expr2 }`.
+- **Alternations**: `[ expr1 expr2 ]` (untagged) OR `[ Label: expr1 Label: expr2 ]` (tagged).
+- **References**: `(DefName)` (Must be PascalCase, no children).
+
+## Modifiers & Constraints
+
+| Feature | Syntax | Constraint |
+| :------------- | :--------------- | :----------------------------------------------------- |
+| **Field** | `name: expr` | `expr` must match exactly **one** node (no multi-seq). |
+| **Negation** | `!name` | Asserts field `name` is absent. |
+| **Capture** | `expr @name` | `snake_case`. Suffix. |
+| **Type** | `expr ::Type` | `PascalCase` or `::string`. Suffix. |
+| **Quantifier** | `*`, `+`, `?` | Greedy. Suffix. |
+| **Non-Greedy** | `*?`, `+?`, `??` | Suffix. |
+| **Anchor** | `.` | Immediate child anchor. |
+
+## CRITICAL RULES (Strict Enforcement)
+
+1. **CASING MATTERS**:
+   - **Definitions/Refs**: `PascalCase` (e.g., `MethodDecl`, `(MethodDecl)`).
+   - **Node Kinds**: `snake_case` (e.g., `(identifier)`).
+   - **Fields/Captures**: `snake_case` (e.g., `name:`, `@val`).
+   - **Branch Labels**: `PascalCase` (e.g., `[ Ok: (true) Err: (false) ]`).
+2. **NO MIXED ALTS**: Alternations must be ALL labeled or ALL unlabeled.
+3. **REFS HAVE NO CHILDREN**:
+   - Does not work: `(MyDef child)`
+
+## Examples
+
+```plotnik
+// Definition
+Function = (function_definition
+  name: (identifier) @name
+  parameters: (parameters {
+    (identifier)*
+  })
+  body: (Block)
+)
+
+// Reference usage
+Block = (block {
+  [
+    Stmt: (Statement)
+    Expr: (Expression)
+  ]*
+})
+
+// Alternation with labels
+Boolean = [
+  True: "true"
+  False: "false"
+]
+```

-## Project Structure
+# Plotnik Query Data Model and Type Inference
+
+1. **Flat Scoping (Golden Rule)**
+   - Query nesting doesn't create data nesting
+   - `(A (B (C @val)))` → `{ val: Node }`. Intermediate nodes are ignored.
+   - **New Scope** is created _only_ by capturing a container: `{...} @name` or `[...] @name`.
+
+2. **Field Generation**
+   - Only explicit `@capture` creates a field.
+   - `key: (pattern)` is a structural constraint, **NOT** an extraction. It has nothing to do with tree-sitter fields.
+
+3. **Cardinality**
+   - `(x) @k` → `k: T` (Required)
+   - `(x)? @k` → `k: T?` (Optional)
+   - `(x)* @k` → `k: T[]` (List)
+   - `(x)+ @k` → `k: [T, ...T[]]` (Non-empty List)
+
+4. **Types**
+   - `(some_node) @x` (default) → `Node` (AST reference).
+   - `{...} @x` → receives some synthetic name based on the type of parent scope and capture name
+   - `Query = { (foo) @foo (bar) @bar (baz) @baz } @qux`:
+     - `@foo`, `@bar`, `@baz`: `Node` for
+     - `@qux`: `struct QueryQux { foo: Node, bar: Node, baz: Node }`
+     - entry point: `struct Query { qux : QueryQux }`
+   - `@x :: string` → `string` (extracts source text).
+   - `@x :: Type` → `Type` (assigns nominal type to the structure).
+
+5. **Alternations**
+   - Tagged: `[ L1: (a) @x L2: (b) @y ]`
+     → Discriminated Union: `{ tag: "L1", x: Node } | { tag: "L2", y: Node }`.
+   - Untagged: `[ (a) @x (b) @x ]`
+     → Merged Struct: `{ x: Node }`. Captures must be type-compatible across branches.
+   - Mixed: `[ (a) @x (b) ]` (invalid) - the diagnostics will be reported, but we infer as for untagged
+     → Merged Struct: `{ x: Node }`. Captures must be type-compatible across branches.
+
+# Project Structure

 ```
 crates/
-  plotnik-lib/ # Core library
-    src/
-      diagnostics/ # Diagnostic infrastructure
-        mod.rs # Diagnostics struct, DiagnosticBuilder
-        message.rs # DiagnosticMessage, Severity, Fix
-        printer.rs # DiagnosticsPrinter for rendering
-      parser/ # Syntax infrastructure
-        lexer.rs # Token definitions (logos)
-        cst.rs # SyntaxKind enum
-        ast.rs # Typed AST wrappers over CST
-        core.rs # Parser infrastructure
-        grammar.rs # Grammar rules
-        invariants.rs # Parser invariant checks
-        mod.rs # Re-exports, Parse struct, parse()
-        tests/ # Parser tests (snapshots)
-        *_tests.rs # Test files (lexer_tests, ast_tests, cst_tests)
-      query/ # Query processing
-        mod.rs # Query struct, new(), pipeline
-        dump.rs # dump_* debug output methods (test-only)
-        printer.rs # QueryPrinter for AST output
-        invariants.rs # Query invariant checks
-        alt_kinds.rs # Alternation validation
-        symbol_table.rs # Name resolution, symbol table
-        recursion.rs # Escape analysis (recursion validation)
-        shapes.rs # Shape inference
-        *_tests.rs # Test files per module
-      lib.rs # Re-exports Query, Diagnostics, Error
   plotnik-cli/ # CLI tool
     src/commands/ # Subcommands (debug, docs, langs)
-  plotnik-langs/ # Tree-sitter language bindings
+  plotnik-core/ # Common code
+  plotnik-lib/ # Plotnik as library
+    src/
+      diagnostics/ # Diagnostics (user-friendly errors)
+      parser/ # Syntactic parsing of the query
+      query/ # Analysis and representation of the parsed query
+  plotnik-langs/ # Tree-sitter language bindings (wrapped)
+  plotnik-macros/ # Proc macros of the project
 docs/
+  adr/ # Architecture Decision Records (ADRs)
   REFERENCE.md # Language specification
 ```

-## Pipeline
-
-```rust
-parser::parse() // Parse → CST
-alt_kinds::validate() // Validate alternation kinds
-symbol_table::resolve() // Resolve names → SymbolTable
-recursion::validate() // Validate recursion termination
-shapes::infer() // Infer and validate shape cardinalities
-```
-
-Module = "what", function = "action".
-
-## CLI
+# CLI

 Run: `cargo run -p plotnik-cli -- `

-- `debug` — Inspect queries/sources
-- `docs [topic]` — Print docs (reference, examples)
+- `debug` — Inspect queries and source file ASTs
+  - Example: `cargo run -p plotnik-cli -- debug -q '(foo) @bar'`
 - `langs` — List supported languages

-### debug options
-
 Inputs: `-q/--query `, `--query-file `, `--source `, `-s/--source-file `, `-l/--lang `

 Output (inferred from input): `--only-symbols`, `--cst`, `--raw`, `--spans`, `--cardinalities`
@@ -76,79 +145,44 @@ cargo run -p plotnik-cli -- debug -s app.ts --raw
 cargo run -p plotnik-cli -- debug -q '(function_declaration) @fn' -s app.ts -l typescript
 ```

-## Syntax
-
-Grammar: `(type)`, `[a b]` (alt), `{a b}` (seq), `_` (wildcard), `@name`, `::Type`, `field:`, `*+?`, `"lit"`/`'lit'`, `(a/b)` (supertype), `(ERROR)`, `Name = expr` (def), `[A: ... B: ...]` (tagged alt)
-
-SyntaxKind: `Root`, `Tree`, `Ref`, `Str`, `Field`, `Capture`, `Type`, `Quantifier`, `Seq`, `Alt`, `Branch`, `Wildcard`, `Anchor`, `NegatedField`, `Def`
-
-Expr = `Tree | Ref | Str | Alt | Seq | Capture | Quantifier | Field | Wildcard`. Quantifier/Capture wrap their target. `Anchor` and `NegatedField` are predicates (not expressions).
-
-## Diagnostics
-
-`Diagnostics` struct collects errors/warnings across passes. Access per-pass or combined:
-
-```rust
-query.parse_diagnostics() // Parse errors
-query.alt_kind_diagnostics() // Alternation validation
-query.resolve_diagnostics() // Name resolution
-query.ref_cycle_diagnostics() // Recursion validation
-query.shape_diagnostics() // Shape cardinality validation
-query.all_diagnostics() // All combined
-query.diagnostics() // Alias for all_diagnostics()
-```
-
-Render: `query.render_diagnostics()` or `query.render_diagnostics_colored(bool)`.
-
-Check validity: `query.is_valid()` returns false if any pass has errors (warnings allowed).
-
-## Constraints
+# Coding rules

-- Defs must be named except last (entry point)
-- Fields: `field: expr` — no sequences as direct values
-- Alternations: same-name captures need same type; use `@x :: T` for merged structs; tagged alts for discriminated unions
-- `.` anchor = strict adjacency; without = scanning
-- Names: `Upper` = user-defined, `lower` = tree-sitter nodes
-- Captures: snake_case only, no dots
+- Avoid nesting logic: prefer early exit in functions (return) and loops (continue/break)
+- Write code comments for seniors, not for juniors

-## Data Model
+# Testing rules

-- Nesting in query ≠ nesting in output: `(a (b @b))` → `{b: Node}`
-- New scopes only from captured `{...}@s` or `[...]@c`
-- `?`/`*`/`+` = optional/list/non-empty list
-
-## AST Layer (`parser/ast.rs`)
-
-Types: `Root`, `Def`, `Tree`, `Ref`, `Str`, `Alt`, `Branch`, `Seq`, `Capture`, `Type`, `Quantifier`, `Field`, `NegatedField`, `Wildcard`, `Anchor`, `Expr`
-
-Use `Option` for casts, not `TryFrom`. Use `QueryPrinter` from `query/printer.rs` for output.
-
-## Testing
-
-Uses `insta` for snapshot testing.
-
-### File organization
+## File organization

 - Code lives in `foo.rs`, tests live in `foo_tests.rs`
 - Test module included via `#[cfg(test)] mod foo_tests;` in parent

-### Test structure
+## CLI commands
+
+- IMPORTANT: the `debug` is your first tool you should use to test your changes
+- Run tests: `cargo nextest run --hide-progress-bar --status-level none --failure-output final`
+- We use snapshot testing (`insta`) heavily
+  - Accept snapshots: `cargo insta accept`
+
+## Test structure

 - Separate AAA (Arrange-Act-Assert) parts by blank lines
-- Input: string → Output: snapshot of string
+  - Exception: when the test is 3 or less lines total
+- Desired structure: input is string, output is string (snapshot of something)
 - Single-line input: plain string literal
 - Multi-line input: `indoc!` macro
-- Never write expected snapshot content manually — always `@""`
+- IMPORTANT: never write snapshots manually — always use `@""` and then `cargo insta accept`

 ```rust
 #[test]
 fn valid_query() {
     let input = indoc! {r#"
-        (function_declaration
+    (function_declaration
         name: (identifier) @name)
     "#};

     let query = Query::try_from(input).unwrap();
+
     assert!(query.is_valid());
     insta::assert_snapshot!(query.dump_ast(), @"");
 }
@@ -168,14 +202,7 @@ fn error_case() {
 }
 ```

-### Workflow
-
-```sh
-cargo test --workspace
-cargo insta accept
-```
-
-### Patterns by test type
+## Patterns by test type

 - Valid parsing: `assert!(query.is_valid())` + snapshot `dump_*()` output
 - Error recovery: `assert!(!query.is_valid())` + snapshot `dump_diagnostics()` only
@@ -188,39 +215,12 @@ Uses `cargo-llvm-cov`, already installed.
 Find uncovered lines per file:

 ```sh
-cargo llvm-cov --package plotnik-lib --text --show-missing-lines 2>/dev/null | grep '\.rs: [0-9]\+\(, [0-9]\+\)\*\?'
+cargo llvm-cov --package plotnik-lib --text --show-missing-lines 2>/dev/null | grep '\.rs: [0-9]'
 ```

-## Invariants
-
-Two-tier resilience strategy:
-
-1. Parser: resilient, collects errors, continues parsing
-2. Our code: strict invariants, maximal coverage in tests, panic on violations
-
-Invariant checks live in dedicated modules named `invariants.rs`.
-They are excluded from test coverage because they're unreachable.
-They usually wrap a specific assert.
-It was done due to limitation of inline coverage exclusion in Rust.
-But it seems to be useful to extract such invariant check helpers anyways:
-
-- if it just performs assertion and doesn't return value, it starts with `assert_`
-- if it returns value, it's name consists of' `ensure_` and some statement about return value
-  Find any of such files for more examples.
-
-## Not implemented
-
-- Semantic validation: casing rules
-
-## Deferred
-
-- Predicates (`#match?` etc.) — runtime filters, not structural
-
-## Rules
+### `invariants.rs`

-- Update AGENTS.md when changes add useful context
-- Check diagnostics after changes
-- Follow rnix-parser/taplo patterns
-- Span-based tokens, no text in intermediate structures
-- Don't put AI slop comments in the code
-- IMPORTANT: Avoid nesting logic, prefer early exit code flow in functions (return) and loops (continue/break)
+- Contains functions and `impl` blocks for invariant check functionality
+- Each function panics on invariant violation, it may or may not return the value
+- When returning value, the name is `ensure_something(...)`, where something is related to return value
+- When there is no return value, the name is `assert_something(...)` and something is related to function arguments
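
Review note: the `invariants.rs` naming rules added at the end of this diff could be illustrated roughly as follows. This is only a sketch; `assert_span_ordered` and `ensure_first_child` are hypothetical helpers invented for illustration and are not taken from the plotnik sources.

```rust
// Hypothetical helpers following the `assert_*` / `ensure_*` naming rules.
// Nothing here is copied from the actual plotnik `invariants.rs` files.

/// No return value, so the name is `assert_<something about the arguments>`.
/// Panics if the span bounds are out of order.
pub fn assert_span_ordered(start: usize, end: usize) {
    assert!(
        start <= end,
        "invariant violated: span start {start} is after end {end}"
    );
}

/// Returns a value, so the name is `ensure_<something about the result>`.
/// Panics if `children` is empty; otherwise returns the first element.
pub fn ensure_first_child<T>(children: &[T]) -> &T {
    children
        .first()
        .expect("invariant violated: expected at least one child")
}
```

Keeping the panic inside a dedicated helper means the call site stays a single line, and the unreachable failure branch lives in a file that can be excluded from coverage as a whole.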