diff --git a/ANNOTATIONS.md b/ANNOTATIONS.md new file mode 100644 index 0000000..2ef9745 --- /dev/null +++ b/ANNOTATIONS.md @@ -0,0 +1,178 @@ +# `#pragma pagurus` Annotations Reference + +## Overview + +pagurus uses pragma annotations to specify lifetime constraints, drop functions, and borrow management. These annotations guide the borrow checker's analysis and enable move semantics for resource-owning types. + +## 1. Borrow release + +Explicitly release a borrow before its natural scope end (mirrors Rust `drop(borrow)`): + +```c +#pragma pagurus borrow_end(p) +``` + +This tells the checker that the borrow through pointer `p` ends at this point, even though `p` may still be in scope. Useful for manually resolving borrow conflicts. + +## 2. Function lifetime annotations + +### Concise syntax (preferred) + +Place immediately before a function declaration. Space-separated tokens represent pointer parameter lifetimes; comma-grouped tokens represent struct parameter lifetimes. + +#### Simple pointer parameters + +Both params constrain the return (lifetime 'a for all): + +```c +#pragma pagurus 'a 'a -> 'a +char *func1(char *a, char *b); +``` + +Only the first param constrains the return (lifetime 'a): + +```c +#pragma pagurus 'a 'b -> 'a +int *get_x(int *x, int *y); +``` + +#### Struct parameters with multiple lifetimes + +Comma-grouped syntax `'b,'a` indicates a struct parameter instantiated with lifetimes 'b (first) and 'a (second) from the struct's declaration: + +```c +#pragma pagurus 'b,'a -> 'b +char *get_data(struct Buf buf); +``` + +The checker resolves which field is returned by matching lifetime 'b to the struct's field annotations (see section 3). + +### Legacy syntax + +Backward compatible with earlier versions: + +```c +#pragma pagurus lifetime(get_ptr, 0) +``` + +This annotation indicates that the return value's lifetime is tied to parameter 0. + +## 3. 
Struct and field lifetime annotations + +Declare the struct's lifetime parameters and each field's lifetime: + +```c +#pragma pagurus 'a 'b +struct Buf { + #pragma pagurus 'a + const char *data; /* lifetime 'a */ + #pragma pagurus 'b + char *scratch; /* lifetime 'b */ +}; +``` + +**Validation rules:** +- Field with undeclared lifetime → error: "unknown lifetime 'e' in field ..." +- Declared lifetime parameter with no field → error: "declared but not bound 'c' ..." + +All declared lifetimes must be used, and all field lifetimes must be declared. + +## 4. Drop function annotation + +Designates the next function declaration as the required destructor for a C struct or typedef type: + +```c +#pragma pagurus drop(TypeName) +void destructor_fn(TypeName value); +``` + +**Requirements:** +- `TypeName` may be a struct tag name OR a typedef name +- The drop handler must: + - Return `void` + - Take exactly one parameter of type `TypeName` (written `struct TypeName` when a tag name is used) +- Invalid signatures are reported as errors and not registered + +### Examples + +Named struct (tag name): + +```c +#pragma pagurus drop(StrStruct) +void str_free(struct StrStruct s); +``` + +Using a typedef name: + +```c +typedef struct StrStruct { ... 
} StrType; +#pragma pagurus drop(StrType) +void str_type_free(StrType s); +``` + +### Effect on error checking + +Any local variable of a drop-annotated type that goes out of scope without the drop function being called triggers **E020[missing-drop]**: + +| Plugin flags | E020 severity | Effect | +|---|---|---| +| `-fplugin=` only | **error** | No automatic injection; the drop call must be written explicitly in source | +| `-fplugin= -fpass-plugin=` | **warning** | IR pass (`PagurusDropPass`) auto-injects the drop call into the binary | + +### Move semantics + +Drop-annotated types have **move semantics** — assignment transfers ownership: + +```c +#pragma pagurus drop(StrStruct) +void str_free(struct StrStruct s); + +void example(void) { + struct StrStruct a = str_with_cap(64); + struct StrStruct b = a; // ownership moves to b — a is now Moved + str_free(a); // error: E019[use-after-move] + str_free(b); // OK +} +``` + +Structs without drop annotations are **copy-able** and can be freely assigned without transferring ownership. + +## 5. Lifetime elision (rule 1 — extended) + +For function **declarations** with exactly one "significant" parameter and a pointer return type, the checker automatically infers the return constraint without requiring an explicit annotation. 
+ +A parameter is "significant" when it is: +- A pointer parameter (`T *`), **or** +- A struct parameter passed by value where the struct has exactly one declared lifetime parameter + +### Examples + +Automatically inferred (no annotation needed): + +```c +char *first_char(char *s); +``` + +Redundant annotation (emits warning: "can be elided"): + +```c +#pragma pagurus 'a -> 'a +char *dup(char *s); +``` + +Single-lifetime struct (annotation can be elided): + +```c +#pragma pagurus 'b +struct Buf { #pragma pagurus 'b char *scratch; }; + +#pragma pagurus 'a -> 'a // Warning: can be elided +char *get_data(struct Buf buf); +``` + +The single-lifetime struct annotation itself also warns: +"explicit lifetime declaration for struct 'Buf' can be elided". + +## Error code reference + +For detailed information about specific error codes (E001–E021), see the main [README.md](README.md#checks). diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md new file mode 100644 index 0000000..7675717 --- /dev/null +++ b/ARCHITECTURE.md @@ -0,0 +1,103 @@ +# pagurus Architecture + +## Overview + +``` +your_file.c + │ + ├─ clang -fplugin=./pagurus_plugin.so + │ └─ C++ ASTPlugin + │ • FunctionSummaryVisitor (pass 1) + │ – Fixpoint return-alias summaries (direct, transitive, + │ pointer-arithmetic, conditional) + │ • FunctionEffectVisitor (pass 1b) + │ – Callee-first topological order via clang::CallGraph + │ – Per-param effects: frees / mutBorrows / sharedBorrows + │ • PagurusVisitor (pass 2, RecursiveASTVisitor) + │ – E001–E021 diagnostics with NLL loan release + │ – Conditional/loop loan propagation + │ – Inter-procedural loans via pass-1/1b summaries + │ – Drop semantics: E019/E020/E021 for #pragma pagurus drop(T) + │ • Source-to-source transformation (compile mode, default) + │ – Writes .pagurus.c + │ – Strips #pragma pagurus lines + │ – Injects missing drop calls at scope-exit / early-return + │ • Dry-run mode (-plugin-arg-pagurus dry-run) + │ – Reports what would change; emits E020; no file 
written + │ + └─ clang -fpass-plugin=./pagurus_plugin.so + └─ PagurusDropPass (module pass, PipelineStartEP) + • Scans all functions for allocas of drop-annotated struct types + • Injects drop-function calls at llvm.lifetime.end / ret + • DominatorTree: no double-drop injection + └─ PagurusIRPass + • AliasAnalysis + MemorySSA + DominatorTree + • IR-E001/E001b: load/store after free + • IR-E002: double-free + • IR-E015/loop: concurrent borrow sites, MemoryPhi + • IR-E018: AtomicRMW/CmpXchg on borrowed alloca +``` + +## Why C++ for this plugin? + +| Capability | C++ plugin | Rust (clang-sys) | +|---|---|---| +| Parse overhead | Zero (accesses host ASTContext directly) | Double-parse via libclang subprocess | +| `RecursiveASTVisitor` | ✅ full typed traversal | ❌ C API cursor walk only | +| `LiveVariables` / CFG | ✅ direct | ❌ not exposed in C API | +| `DiagnosticsEngine` | ✅ inline with source | Limited | +| LLVM IR pass | ✅ `AliasAnalysis`, `MemorySSA`, `DominatorTree` | Stub only | +| `-fplugin=` registration | ✅ `FrontendPluginRegistry::Add` | ❌ requires C++ template static initializer | + +## Analysis Levels + +pagurus operates at two levels simultaneously: + +| Entry point | Load flag | Analysis | +|---|---|---| +| Clang AST plugin | `-fplugin=./pagurus_plugin.so` | RecursiveASTVisitor, precise column diagnostics | +| LLVM IR pass | `-fpass-plugin=./pagurus_plugin.so` | AliasAnalysis + MemorySSA + DominatorTree | + +### AST level (`-fplugin=`) + +The AST-level plugin provides: + +- **E001–E021 checks:** Covers use-after-free, double-free, null-deref, memory leaks, borrow conflicts, move semantics, and drop semantics +- **NLL (Non-Lexical Lifetimes):** Precise borrow tracking with loan release at last use +- **CFG-aware:** Conditional and loop loan propagation +- **Inter-procedural analysis:** Function summaries for return-alias and parameter effects +- **Source transformation:** Compile mode (default) strips pragmas and injects drop calls + +### LLVM IR level 
(`-fpass-plugin=`) + +The IR-level pass catches patterns invisible at the AST level: + +- **Bit-cast / type-pun aliases:** Detects aliasing through union type punning or explicit bitcast +- **GEP pointer arithmetic:** Tracks pointer arithmetic via GetElementPtr instructions +- **Loop-carried borrows:** Uses MemorySSA φ-nodes to detect borrows across loop back-edges +- **Atomic instruction races:** Detects AtomicRMW/CmpXchg on borrowed memory +- **Drop injection:** Automatically inserts drop calls at `llvm.lifetime.end` intrinsics + +IR checks use: +- `AliasAnalysis::isMustAlias` for precise aliasing +- `MemorySSA` for def-def pairs and loop-carried dependencies +- `DominatorTree` for double-drop prevention and post-dominance + +## Compile mode vs Dry-run mode + +| Mode | Flag | E020 | Output file | +|---|---|---|---| +| Compile (default) | *(none)* | suppressed (auto-injected) | `.pagurus.c` written | +| Dry-run | `-Xclang -plugin-arg-pagurus -Xclang dry-run` | reported | none | + +**Compile mode** is the default behavior: +- Strips all `#pragma pagurus` annotations +- Injects missing drop calls at scope-exit +- Produces plain C code in `.pagurus.c` +- E020 is suppressed because the transformation fixes it + +**Dry-run mode** is an inspector mode: +- Reports all diagnostics including E020 +- Prints a textual report of what changes would be made +- Does not write any output file +- Useful for code review and understanding required changes diff --git a/BUILDING.md b/BUILDING.md new file mode 100644 index 0000000..a38ca8b --- /dev/null +++ b/BUILDING.md @@ -0,0 +1,192 @@ +# Building pagurus + +## Prerequisites + +### Ubuntu 24.04 + +LLVM 14–18 are in the standard `universe` repository: + +```bash +# Replace 18 with 14, 15, 16, or 17 for older LLVM versions. 
+sudo apt install clang-18 llvm-18-dev libclang-18-dev cmake +``` + +### Ubuntu 22.04 + +LLVM 11–14 are in the standard `universe` repository; LLVM 15–18 +require the [official LLVM apt repository](https://apt.llvm.org/): + +```bash +LLVM_VER=18 # 11–14 need no PPA; 15–18 require the lines below +wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key \ + | sudo tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc >/dev/null +echo "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-${LLVM_VER} main" \ + | sudo tee /etc/apt/sources.list.d/llvm-${LLVM_VER}.list +sudo apt-get update +sudo apt install clang-${LLVM_VER} llvm-${LLVM_VER}-dev libclang-${LLVM_VER}-dev cmake +``` + +**Supported Clang/LLVM versions:** 11 through 18 (tested on Ubuntu 22.04 and 24.04, +x86_64 and arm64). LLVM 11–13 are only tested on Ubuntu 22.04; Ubuntu 24.04 +(noble) does not carry those versions in its standard repositories. + +## Build commands + +```bash +mkdir build && cd build + +LLVM_VER=18 # or 11–17 + +cmake .. \ + -DLLVM_DIR=$(llvm-config-${LLVM_VER} --cmakedir) \ + -DClang_DIR=/usr/lib/llvm-${LLVM_VER}/lib/cmake/clang + +make -j$(nproc) +# → build/pagurus_plugin.so +``` + +## AST-only build (without LLVM IR pass) + +To build **without** the LLVM IR pass (omits `llvm/IR` and `llvm/Analysis` headers, +reduces runtime symbol dependencies to Clang AST symbols only): + +```bash +cmake .. -DPAGURUS_WITH_IR_PASS=OFF \ + -DLLVM_DIR=$(llvm-config-${LLVM_VER} --cmakedir) \ + -DClang_DIR=/usr/lib/llvm-${LLVM_VER}/lib/cmake/clang +make -j$(nproc) +``` + +This AST-only build still covers all E001–E019 checks and can be built with +only `libclang-dev` (no `llvm-dev` needed): + +```bash +sudo apt install clang-18 libclang-18-dev cmake +``` + +## MLIR Upgrade Path + +An optional MLIR-backed analysis tier can be enabled when `libmlir-dev` is +available. It is **not** activated by default because `libmlir-18-dev` is not +shipped in the standard `llvm-dev` package. 
+ +### Version matching: libmlir-*-dev, LLVM, and Clang + +MLIR, LLVM, and Clang are all part of the [LLVM monorepo](https://github.com/llvm/llvm-project) +and share the **same version number** in every release. This means: + +- `libmlir-N-dev` must match the LLVM version (`llvm-N-dev`) **and** the Clang version (`clang-N`). +- There is no separate MLIR version independent of LLVM or Clang — they are always equal. + +| LLVM / Clang version | Required MLIR package | MLIR cmake dir | Ubuntu 22.04 | Ubuntu 24.04 | +|---|---|---|---|---| +| 14 | `libmlir-14-dev` | `/usr/lib/llvm-14/lib/cmake/mlir` | ✅ standard `universe` | ✅ standard `universe` | +| 15 | `libmlir-15-dev` | `/usr/lib/llvm-15/lib/cmake/mlir` | ✅ [LLVM PPA](https://apt.llvm.org) | ❌ not available | +| 16 | `libmlir-16-dev` | `/usr/lib/llvm-16/lib/cmake/mlir` | ✅ [LLVM PPA](https://apt.llvm.org) | ✅ standard `universe` | +| 17 | `libmlir-17-dev` | `/usr/lib/llvm-17/lib/cmake/mlir` | ✅ [LLVM PPA](https://apt.llvm.org) | ✅ standard `universe` | +| 18 | `libmlir-18-dev` | `/usr/lib/llvm-18/lib/cmake/mlir` | ✅ [LLVM PPA](https://apt.llvm.org) | ✅ standard `universe` | + +> **Note:** LLVM 11–13 do not ship `libmlir-N-dev` in Ubuntu repositories; the MLIR tier +> requires LLVM ≥ 14. + +### Installation and build + +Replace **`N`** with your LLVM/Clang version number (e.g. 
`14`, `16`, `17`, or `18`) +in every command below — `N` is a literal placeholder and cannot be used as-is in the shell: + +```bash +# Ubuntu 22.04: add LLVM PPA first for LLVM ≥ 15 (skip on Ubuntu 24.04) +# Example for N=18: replace "jammy-18" and "llvm-18.list" accordingly +wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key \ + | sudo tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc >/dev/null +echo "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-N main" \ + | sudo tee /etc/apt/sources.list.d/llvm-N.list +sudo apt-get update + +# Install — all five packages must use the same version number N +sudo apt-get install clang-N llvm-N-dev libclang-N-dev libmlir-N-dev mlir-N-tools + +cmake .. -DPAGURUS_WITH_MLIR=ON \ + -DLLVM_DIR=$(llvm-config-N --cmakedir) \ + -DClang_DIR=/usr/lib/llvm-N/lib/cmake/clang \ + -DMLIR_DIR=/usr/lib/llvm-N/lib/cmake/mlir +make -j$(nproc) +``` + +Concrete example for LLVM 18 on Ubuntu 22.04: + +```bash +wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key \ + | sudo tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc >/dev/null +echo "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-18 main" \ + | sudo tee /etc/apt/sources.list.d/llvm-18.list +sudo apt-get update +sudo apt-get install clang-18 llvm-18-dev libclang-18-dev libmlir-18-dev mlir-18-tools + +cmake .. 
-DPAGURUS_WITH_MLIR=ON \ + -DLLVM_DIR=$(llvm-config-18 --cmakedir) \ + -DClang_DIR=/usr/lib/llvm-18/lib/cmake/clang \ + -DMLIR_DIR=/usr/lib/llvm-18/lib/cmake/mlir +make -j$(nproc) +``` + +### Analysis tier comparison + +| Capability | AST only (`-fplugin=`) | LLVM IR pass (`-fpass-plugin=`) | MLIR (`-DPAGURUS_WITH_MLIR=ON`) | +|---|---|---|---| +| **Alias analysis** | Symbolic (variable names) | `BasicAA + TBAA + ScopedNoAlias` on raw pointer bytes | `mlir::AliasAnalysis` — dialect-type-aware; understands `memref`/`tensor` regions structurally | +| **Memory ownership model** | Implicit (tracks `malloc`/`free` calls) | Implicit (dominance of `free()` instruction) | Explicit: `memref.alloc` / `memref.dealloc` carry region/lifetime metadata; aliasing proved structurally, not heuristically | +| **Ownership transfer** | Move semantics via pragma annotation | Not tracked at IR level | `mlir::bufferization::OwnershipInterface` — flow-sensitive ownership transfer without pointer arithmetic | +| **Side-effect tracking** | Conservative (function call = opaque) | Not tracked beyond `free()` | `mlir::MemoryEffects::Effect` annotates every op as `ReadEffect` / `WriteEffect` / `AllocationEffect` / `FreeEffect` — enables precise borrow propagation through side-effect-free calls | +| **Array bounds (E011)** | Constant index only | Constant index only (GEP operands) | Affine maps in the `affine` dialect cover non-constant symbolic indices (e.g. 
`a[i+n]`, loop bounds) | +| **Loop-carried borrows** | ❌ (CFG not modelled) | ✅ via `MemorySSA` MemPhi nodes | ✅ via structured region semantics (no raw SSA φ needed) | +| **GEP / bitcast aliases** | ❌ | ✅ (IR-E015b / IR-E015c) | ✅ (dialect types eliminate most GEP ambiguity) | +| **Atomic races (E018)** | AST-level only (`volatile`/`_Atomic`) | ✅ `AtomicRMWInst` / `CmpXchgInst` on borrowed alloca | ✅ `mlir::MemoryEffects` classifies atomic ops uniformly | +| **Drop injection** | Source rewrite (compile mode) | ✅ IR-level at `llvm.lifetime.end` | ✅ Lower-cost: `memref.dealloc` insertion at dialect level | +| **False negatives** | Higher (bit-cast aliases invisible) | Lower (covers GEP, bitcast, loop MemPhi) | Lowest (structural type information eliminates most remaining ambiguity) | +| **Extra dependency** | None | None | `libmlir-N-dev` (same N as LLVM/Clang; not in standard `llvm-dev`) | + +In short: the **LLVM IR pass** catches patterns invisible at the AST level +(GEP, bitcast, loop-carried borrows). **MLIR** additionally eliminates the +remaining false negatives that stem from LLVM IR's flat pointer model, and +extends array-bounds checking to non-constant indices — at the cost of an +extra build dependency. + +The `#ifdef PAGURUS_WITH_MLIR` code paths are already in place inside +`src/pagurus_plugin.cpp` and activate when `-DPAGURUS_WITH_MLIR=ON` is passed +to CMake. + +## Runtime dependencies + +`pagurus_plugin.so` is designed to be loaded into an already-running `clang` or +`opt` process, so it does **not** carry `libLLVM` or `libclang-cpp` in its ELF +`NEEDED` list. All LLVM and Clang symbols are resolved at `dlopen`-time from the +host executable. 
+ +| Library | Version floor | Why | +|---|---|---| +| `libstdc++.so.6` | GLIBCXX ≥ 3.4.29 (GCC 11) | C++ standard library | +| `libgcc_s.so.1` | any | GCC exception-handling support | +| `libc.so.6` | glibc ≥ 2.32 | C library (`__libc_single_threaded`) | +| `clang-N` (host) | same major version as the build | ~90 LLVM/Clang symbols resolved from the host process at load time | + +**Ubuntu summary:** +- Ubuntu 22.04 (glibc 2.35, GCC 12 → GLIBCXX 3.4.30): ✅ +- Ubuntu 24.04 (glibc 2.39, GCC 14 → GLIBCXX 3.4.33): ✅ + +The `clang-N` package (without any `-dev` suffix) is the **only runtime +requirement** — dev packages (`llvm-N-dev`, `libclang-N-dev`) are needed only +at **build time**. + +> **ABI constraint:** the plugin must be loaded into the same Clang major version +> it was compiled against. A plugin built with `LLVM_VER=18` will not load +> correctly under `clang-17` or `clang-14`. + +Full dependency inspection commands: +```bash +# Direct ELF dependencies: +readelf -d pagurus_plugin.so | grep NEEDED + +# All symbols resolved from the host clang at load time: +nm -D pagurus_plugin.so | grep ' U ' +``` diff --git a/INTEGRATION.md b/INTEGRATION.md new file mode 100644 index 0000000..77a831c --- /dev/null +++ b/INTEGRATION.md @@ -0,0 +1,137 @@ +# Multi-file Project Integration + +## Overview + +Pagurus analyses one translation unit per `clang` invocation. All +intra-file checks (E001–E021) and inter-procedural summaries that are +visible through headers work without any extra setup. Two tools make +it straightforward to run the checker over an entire Makefile-based +project. + +## `pagurus-check` — standalone script + +`pagurus-check` (located at the root of this repository) runs the +plugin on many source files in a single command and aggregates the +results: + +```bash +# Check two files; include/ is on the search path. 
+./pagurus-check --plugin=./build/pagurus_plugin.so \ + --cflags="-Iinclude" \ + src/main.c src/widget.c + +# Scan every .c file under src/ with 4 parallel jobs. +./pagurus-check --plugin=./build/pagurus_plugin.so \ + --cflags="-Iinclude -DNDEBUG" \ + --jobs=4 --dir=src + +# Use a compilation database generated by `bear -- make`. +# Per-file include paths and defines are extracted automatically. +bear -- make +./pagurus-check --plugin=./build/pagurus_plugin.so \ + --compile-db=compile_commands.json + +# Dry-run: report all diagnostics without writing .pagurus.c files. +./pagurus-check --plugin=./build/pagurus_plugin.so \ + --dry-run --dir=src +``` + +`pagurus-check` exits with code **0** if all files pass (no E0xx +diagnostics) and **1** if any file has errors. + +### Option reference + +``` +Usage: pagurus-check [OPTIONS] [FILE...] + pagurus-check [OPTIONS] --dir=DIR + pagurus-check [OPTIONS] --compile-db=compile_commands.json + + -p PATH, --plugin=PATH pagurus_plugin.so [./build/pagurus_plugin.so] + -C CMD, --clang=CMD Clang executable [clang] + -f FLAGS,--cflags=FLAGS Extra clang flags (e.g. "-DFOO -Iinclude") + -d DIR, --dir=DIR Scan DIR recursively for *.c files + -b FILE, --compile-db=FILE JSON compilation database + -j N, --jobs=N Parallel jobs [1] + --dry-run Dry-run: report but don't write .pagurus.c + --ir-pass Enable LLVM IR analysis (-fpass-plugin=) + -h, --help Show help +``` + +Environment variables `PAGURUS_PLUGIN` and `PAGURUS_CLANG` provide +defaults for `--plugin` and `--clang`. + +## `pagurus.mk` — Makefile include + +Drop `pagurus.mk` (also at the repository root) into an existing +Makefile project: + +```makefile +# myproject/Makefile +CC = clang +CFLAGS = -Wall -Iinclude +SOURCES = src/main.c src/widget.c src/util.c + +# Pagurus integration: define PAGURUS_PLUGIN before the include. 
+PAGURUS_PLUGIN = /path/to/build/pagurus_plugin.so +include /path/to/pagurus.mk +``` + +### Available targets + +```bash +make pagurus-check # compile mode — borrow-check all PAGURUS_SOURCES +make pagurus-dry-run # dry-run mode — inspect only, no .pagurus.c written +make pagurus-clean # remove *.pagurus.c artefacts +``` + +### Configurable variables + +Set these variables before the `include` line: + +| Variable | Default | Description | +|---|---|---| +| `PAGURUS_PLUGIN` | `/build/pagurus_plugin.so` | Plugin path | +| `PAGURUS_CLANG` | `clang` | Clang executable | +| `PAGURUS_SOURCES` | `$(SOURCES)` if defined, else `*.c` | Files to check | +| `PAGURUS_CFLAGS` | `$(CFLAGS)` | Extra flags passed to clang | +| `PAGURUS_JOBS` | `1` | Parallel jobs | +| `PAGURUS_IR_PASS` | `0` | Set to `1` to enable `-fpass-plugin=` | + +## Generating a compilation database from a Makefile + +When source files are compiled with different flags (include paths, +defines, or per-file options), the most accurate approach is to +generate a [compilation database](https://clang.llvm.org/docs/JSONCompilationDatabase.html) +and pass it to `pagurus-check`: + +```bash +# Install bear (Ubuntu) +sudo apt install bear + +# Capture compile commands from a regular make run +bear -- make + +# Run pagurus on every file with the exact flags used during compilation +./pagurus-check --plugin=./build/pagurus_plugin.so \ + --compile-db=compile_commands.json +``` + +`bear` works with any build system that invokes a C compiler, including +hand-written Makefiles, autotools, and meson. 
+ +## Scope of per-TU analysis + +| Check | Works within one TU | Notes | +|---|---|---| +| E001–E018 (AST checks) | ✅ | Full NLL + CFG within the TU | +| Inter-procedural summaries | ✅ when callee definition is visible | Header-only inlines or same-TU functions get full summaries; opaque `extern` calls get conservative assumptions | +| E019–E021 (drop semantics) | ✅ | Drop annotations propagate via headers | +| IR-E001/E002/E015/E018 | ✅ (requires `-fpass-plugin=`) | One IR module per `clang -c` invocation | + +## Best practices + +1. **Use compilation databases** for projects with per-file build flags +2. **Enable parallel jobs** (`--jobs=N` or `PAGURUS_JOBS=N`) for faster checking +3. **Run in CI/CD** — `pagurus-check` exit code integrates with most CI systems +4. **Start with dry-run** to understand required changes before transforming source +5. **Propagate drop annotations via headers** for consistent ownership semantics across TUs diff --git a/README.md b/README.md index 0584c24..d83f5c7 100644 --- a/README.md +++ b/README.md @@ -2,19 +2,23 @@ -A Clang/LLVM plugin that implements Rust-style borrow checking for C, -operating at two levels simultaneously: +A Clang/LLVM plugin that implements Rust-style borrow checking for C, catching memory safety bugs at compile time. -| Entry point | Load flag | Analysis | -|---|---|---| -| Clang AST plugin | `-fplugin=./pagurus_plugin.so` | RecursiveASTVisitor, precise column diagnostics | -| LLVM IR pass | `-fpass-plugin=./pagurus_plugin.so` | AliasAnalysis + MemorySSA + DominatorTree | +## What does it check? 
+ +pagurus performs comprehensive memory safety analysis covering: ---- +- **Memory safety:** use-after-free, double-free, null-deref, memory leaks +- **Borrow conflicts:** double mutable borrow, simultaneous mutable and shared borrow +- **Lifetime violations:** returning pointers to local variables, use-while-borrowed +- **Array bounds:** constant index out of bounds +- **Dangerous functions:** unsafe functions like `gets`, `strcpy`, format string vulnerabilities +- **Move semantics:** use-after-move for resource-owning types +- **Drop semantics:** missing destructor calls, manual memory management -## Checks +### Complete check reference -### AST level (`-fplugin=`) +#### AST level (`-fplugin=`) | Rule | Name | Description | |---|---|---| @@ -33,11 +37,11 @@ operating at two levels simultaneously: | E016 | `use-while-mut-borrowed` | Read of a mutably-borrowed variable | | E017 | `two-phase-conflict` | `&x` and `x` in the same call (C §6.5.2.2p10) | | E018 | `volatile-borrow` / `atomic-borrow` | Mutable borrow of `volatile` or `_Atomic` variable | -| E019 | `use-after-move` | Drop-annotated struct used after its value was moved to another variable (error) | -| E020 | `missing-drop` | Drop-annotated struct or typedef goes out of scope without its drop function being called; **error** without `-fpass-plugin=`, **warning** with `-fpass-plugin=` (IR pass auto-injects); drop handler must return `void` and take one parameter of type `T` | -| E021 | `drop-necessary` | Struct or typedef has pointer fields but no `#pragma pagurus drop(T)` annotation; **error** at scope exit when any pointer field was NOT freed — only registering a drop handler or freeing ALL pointer fields before scope exit is accepted | +| E019 | `use-after-move` | Drop-annotated struct used after its value was moved | +| E020 | `missing-drop` | Drop-annotated struct goes out of scope without drop function call | +| E021 | `drop-necessary` | Struct with pointer fields but no drop annotation | -### LLVM IR 
level (`-fpass-plugin=`) +#### LLVM IR level (`-fpass-plugin=`) | Rule | Name | Analysis method | |---|---|---| @@ -48,201 +52,27 @@ operating at two levels simultaneously: | IR-E015b | `borrow-conflict (GEP)` | GEP pointer arithmetic alias | | IR-E015c | `borrow-conflict (bitcast)` | Union type-pun / bitcast alias | | IR-E015 (loop) | `loop-carried borrow` | `MemorySSA` MemPhi nodes across back-edges | -| IR-E018 | `atomic-borrow-conflict` | `AtomicRMWInst`/`AtomicCmpXchgInst` on a borrowed alloca | -| IR-Drop | `drop-injection` | Automatically inserts drop-function calls at scope-exit for structs annotated with `#pragma pagurus drop(T)`; double-drop prevented via DominatorTree | - -IR checks catch patterns invisible at the AST level: bit-cast / type-pun -aliases, GEP pointer arithmetic, loop-carried borrows (SSA φ-nodes), and -atomic instruction races on borrowed memory. +| IR-E018 | `atomic-borrow-conflict` | `AtomicRMWInst`/`AtomicCmpXchgInst` on borrowed alloca | +| IR-Drop | `drop-injection` | Automatically inserts drop calls at scope-exit | ---- +**IR checks catch patterns invisible at AST level:** bit-cast/type-pun aliases, GEP pointer arithmetic, loop-carried borrows (SSA φ-nodes), and atomic instruction races on borrowed memory. -## Build +## Quick start -### Prerequisites - -**Ubuntu 24.04** — LLVM 14–18 are in the standard `universe` repository: +### Build ```bash -# Replace 18 with 14, 15, 16, or 17 for older LLVM versions. 
+# Ubuntu 24.04 sudo apt install clang-18 llvm-18-dev libclang-18-dev cmake -``` - -**Ubuntu 22.04** — LLVM 11–14 are in the standard `universe` repository; LLVM 15–18 -require the [official LLVM apt repository](https://apt.llvm.org/): - -```bash -LLVM_VER=18 # 11–14 need no PPA; 15–18 require the lines below -wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key \ - | sudo tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc >/dev/null -echo "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-${LLVM_VER} main" \ - | sudo tee /etc/apt/sources.list.d/llvm-${LLVM_VER}.list -sudo apt-get update -sudo apt install clang-${LLVM_VER} llvm-${LLVM_VER}-dev libclang-${LLVM_VER}-dev cmake -``` - -Supported Clang/LLVM versions: **11 through 18** (tested on Ubuntu 22.04 and 24.04, -x86_64 and arm64). LLVM 11–13 are only tested on Ubuntu 22.04; Ubuntu 24.04 -(noble) does not carry those versions in its standard repositories. -### Build commands - -```bash mkdir build && cd build - -LLVM_VER=18 # or 11–17 - cmake .. \ - -DLLVM_DIR=$(llvm-config-${LLVM_VER} --cmakedir) \ - -DClang_DIR=/usr/lib/llvm-${LLVM_VER}/lib/cmake/clang - -make -j$(nproc) -# → build/pagurus_plugin.so -``` - -To build **without** the LLVM IR pass (omits `llvm/IR` and `llvm/Analysis` headers, -reduces runtime symbol dependencies to Clang AST symbols only): - -```bash -cmake .. -DPAGURUS_WITH_IR_PASS=OFF \ - -DLLVM_DIR=$(llvm-config-${LLVM_VER} --cmakedir) \ - -DClang_DIR=/usr/lib/llvm-${LLVM_VER}/lib/cmake/clang -make -j$(nproc) -``` - -This AST-only build still covers all E001–E019 checks and can be built with -only `libclang-dev` (no `llvm-dev` needed): - -```bash -sudo apt install clang-18 libclang-18-dev cmake -``` - -### MLIR Upgrade Path - -An optional MLIR-backed analysis tier can be enabled when `libmlir-dev` is -available. It is **not** activated by default because `libmlir-18-dev` is not -shipped in the standard `llvm-dev` package. 
- -#### Version matching: libmlir-*-dev, LLVM, and Clang - -MLIR, LLVM, and Clang are all part of the [LLVM monorepo](https://github.com/llvm/llvm-project) -and share the **same version number** in every release. This means: - -- `libmlir-N-dev` must match the LLVM version (`llvm-N-dev`) **and** the Clang version (`clang-N`). -- There is no separate MLIR version independent of LLVM or Clang — they are always equal. - -| LLVM / Clang version | Required MLIR package | MLIR cmake dir | Ubuntu 22.04 | Ubuntu 24.04 | -|---|---|---|---|---| -| 14 | `libmlir-14-dev` | `/usr/lib/llvm-14/lib/cmake/mlir` | ✅ standard `universe` | ✅ standard `universe` | -| 15 | `libmlir-15-dev` | `/usr/lib/llvm-15/lib/cmake/mlir` | ✅ [LLVM PPA](https://apt.llvm.org) | ❌ not available | -| 16 | `libmlir-16-dev` | `/usr/lib/llvm-16/lib/cmake/mlir` | ✅ [LLVM PPA](https://apt.llvm.org) | ✅ standard `universe` | -| 17 | `libmlir-17-dev` | `/usr/lib/llvm-17/lib/cmake/mlir` | ✅ [LLVM PPA](https://apt.llvm.org) | ✅ standard `universe` | -| 18 | `libmlir-18-dev` | `/usr/lib/llvm-18/lib/cmake/mlir` | ✅ [LLVM PPA](https://apt.llvm.org) | ✅ standard `universe` | - -> **Note:** LLVM 11–13 do not ship `libmlir-N-dev` in Ubuntu repositories; the MLIR tier -> requires LLVM ≥ 14. - -#### Installation and build - -Replace **`N`** with your LLVM/Clang version number (e.g. 
`14`, `16`, `17`, or `18`) -in every command below — `N` is a literal placeholder and cannot be used as-is in the shell: - -```bash -# Ubuntu 22.04: add LLVM PPA first for LLVM ≥ 15 (skip on Ubuntu 24.04) -# Example for N=18: replace "jammy-18" and "llvm-18.list" accordingly -wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key \ - | sudo tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc >/dev/null -echo "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-N main" \ - | sudo tee /etc/apt/sources.list.d/llvm-N.list -sudo apt-get update - -# Install — all four packages must use the same version number N -sudo apt-get install clang-N llvm-N-dev libclang-N-dev libmlir-N-dev mlir-N-tools - -cmake .. -DPAGURUS_WITH_MLIR=ON \ - -DLLVM_DIR=$(llvm-config-N --cmakedir) \ - -DClang_DIR=/usr/lib/llvm-N/lib/cmake/clang \ - -DMLIR_DIR=/usr/lib/llvm-N/lib/cmake/mlir + -DLLVM_DIR=$(llvm-config-18 --cmakedir) \ + -DClang_DIR=/usr/lib/llvm-18/lib/cmake/clang make -j$(nproc) ``` -Concrete example for LLVM 18 on Ubuntu 22.04: - -```bash -wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key \ - | sudo tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc >/dev/null -echo "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-18 main" \ - | sudo tee /etc/apt/sources.list.d/llvm-18.list -sudo apt-get update -sudo apt-get install clang-18 llvm-18-dev libclang-18-dev libmlir-18-dev mlir-18-tools - -cmake .. 
-DPAGURUS_WITH_MLIR=ON \ - -DLLVM_DIR=$(llvm-config-18 --cmakedir) \ - -DClang_DIR=/usr/lib/llvm-18/lib/cmake/clang \ - -DMLIR_DIR=/usr/lib/llvm-18/lib/cmake/mlir -make -j$(nproc) -``` - -#### Analysis tier comparison - -| Capability | AST only (`-fplugin=`) | LLVM IR pass (`-fpass-plugin=`) | MLIR (`-DPAGURUS_WITH_MLIR=ON`) | -|---|---|---|---| -| **Alias analysis** | Symbolic (variable names) | `BasicAA + TBAA + ScopedNoAlias` on raw pointer bytes | `mlir::AliasAnalysis` — dialect-type-aware; understands `memref`/`tensor` regions structurally | -| **Memory ownership model** | Implicit (tracks `malloc`/`free` calls) | Implicit (dominance of `free()` instruction) | Explicit: `memref.alloc` / `memref.dealloc` carry region/lifetime metadata; aliasing proved structurally, not heuristically | -| **Ownership transfer** | Move semantics via pragma annotation | Not tracked at IR level | `mlir::bufferization::OwnershipInterface` — flow-sensitive ownership transfer without pointer arithmetic | -| **Side-effect tracking** | Conservative (function call = opaque) | Not tracked beyond `free()` | `mlir::MemoryEffects::Effect` annotates every op as `ReadEffect` / `WriteEffect` / `AllocationEffect` / `FreeEffect` — enables precise borrow propagation through side-effect-free calls | -| **Array bounds (E011)** | Constant index only | Constant index only (GEP operands) | Affine maps in the `affine` dialect cover non-constant symbolic indices (e.g. 
`a[i+n]`, loop bounds) | -| **Loop-carried borrows** | ❌ (CFG not modelled) | ✅ via `MemorySSA` MemPhi nodes | ✅ via structured region semantics (no raw SSA φ needed) | -| **GEP / bitcast aliases** | ❌ | ✅ (IR-E015b / IR-E015c) | ✅ (dialect types eliminate most GEP ambiguity) | -| **Atomic races (E018)** | AST-level only (`volatile`/`_Atomic`) | ✅ `AtomicRMWInst` / `CmpXchgInst` on borrowed alloca | ✅ `mlir::MemoryEffects` classifies atomic ops uniformly | -| **Drop injection** | Source rewrite (compile mode) | ✅ IR-level at `llvm.lifetime.end` | ✅ Lower-cost: `memref.dealloc` insertion at dialect level | -| **False negatives** | Higher (bit-cast aliases invisible) | Lower (covers GEP, bitcast, loop MemPhi) | Lowest (structural type information eliminates most remaining ambiguity) | -| **Extra dependency** | None | None | `libmlir-N-dev` (same N as LLVM/Clang; not in standard `llvm-dev`) | - -In short: the **LLVM IR pass** catches patterns invisible at the AST level -(GEP, bitcast, loop-carried borrows). **MLIR** additionally eliminates the -remaining false negatives that stem from LLVM IR's flat pointer model, and -extends array-bounds checking to non-constant indices — at the cost of an -extra build dependency. - -The `#ifdef PAGURUS_WITH_MLIR` code paths are already in place inside -`src/pagurus_plugin.cpp` and activate when `-DPAGURUS_WITH_MLIR=ON` is passed -to CMake. - -### Runtime dependencies - -`pagurus_plugin.so` is designed to be loaded into an already-running `clang` or -`opt` process, so it does **not** carry `libLLVM` or `libclang-cpp` in its ELF -`NEEDED` list. All LLVM and Clang symbols are resolved at `dlopen`-time from the -host executable. 
- -| Library | Version floor | Why | -|---|---|---| -| `libstdc++.so.6` | GLIBCXX ≥ 3.4.29 (GCC 11) | C++ standard library | -| `libgcc_s.so.1` | any | GCC exception-handling support | -| `libc.so.6` | glibc ≥ 2.32 | C library (`__libc_single_threaded`) | -| `clang-N` (host) | same major version as the build | ~90 LLVM/Clang symbols resolved from the host process at load time | - -**Ubuntu summary:** -- Ubuntu 22.04 (glibc 2.35, GCC 12 → GLIBCXX 3.4.30): ✅ -- Ubuntu 24.04 (glibc 2.39, GCC 14 → GLIBCXX 3.4.33): ✅ - -The `clang-N` package (without any `-dev` suffix) is the **only runtime -requirement** — dev packages (`llvm-N-dev`, `libclang-N-dev`) are needed only -at **build time**. - -> **ABI constraint:** the plugin must be loaded into the same Clang major version -> it was compiled against. A plugin built with `-DLLVM_VER=18` will not load -> correctly under `clang-17` or `clang-14`. - -Full dependency inspection commands: -```bash -# Direct ELF dependencies: -readelf -d pagurus_plugin.so | grep NEEDED - -# All symbols resolved from the host clang at load time: -nm -D pagurus_plugin.so | grep ' U ' -``` +See [BUILDING.md](BUILDING.md) for detailed build instructions, MLIR integration, and runtime dependencies. ### Usage @@ -258,28 +88,22 @@ clang -fplugin=./build/pagurus_plugin.so \ #### Compile mode (default) -Every invocation of `-fplugin=` runs in **compile mode** by default: the -plugin rewrites the source to produce `.pagurus.c` that: +Every invocation runs in **compile mode** by default: the plugin rewrites the source to produce `.pagurus.c` that: -1. Has every `#pragma pagurus …` line removed (plain C, no pagurus-isms). +1. Has every `#pragma pagurus …` line removed (plain C, no pagurus-isms) 2. Has drop-function calls injected at scope-exit / early-return points - where they were missing (E020 is suppressed because the transformation - fixes those automatically). -All other errors (E001–E019, E021) still fire and cause build failure. 
+All other errors (E001–E019, E021) still fire and cause build failure. E020 is suppressed because the transformation fixes it automatically. ```bash # Writes your_file.pagurus.c — pagurus annotations stripped, -# missing drop calls injected. E020 is not reported (fixed by injection). +# missing drop calls injected. E020 is not reported (fixed by injection). clang -fplugin=./build/pagurus_plugin.so -c your_file.c ``` #### Dry-run mode -Pass `dry-run` as a plugin argument to enable **dry-run / inspector mode**. -All diagnostics (E001–E021, including E020) are reported, a textual report of -what changes *would* be made is printed to stderr, and **no output file is -written**. +Pass `dry-run` as a plugin argument to enable **inspector mode**. All diagnostics (E001–E021, including E020) are reported, a textual report of what changes *would* be made is printed to stderr, and **no output file is written**. ```bash clang -fplugin=./build/pagurus_plugin.so \ @@ -296,449 +120,79 @@ pagurus [dry-run]: changes that would be applied: warning: E020[missing-drop]: `b` goes out of scope without calling `buf_free` ``` -| Mode | Flag | E020 | Output file | -|---|---|---|---| -| Compile (default) | *(none)* | suppressed (auto-injected) | `.pagurus.c` written | -| Dry-run | `-Xclang -plugin-arg-pagurus -Xclang dry-run` | reported | none | +## Using `#pragma pagurus` annotations -### `#pragma pagurus` annotations +pagurus supports pragma-based annotations for lifetime constraints and drop functions: ```c -// 1. Explicitly release a borrow (mirrors Rust `drop(borrow)`): +// Explicitly release a borrow (mirrors Rust `drop(borrow)`): #pragma pagurus borrow_end(p) -// 2. Concise lifetime annotation (preferred) — placed immediately before a -// function declaration. -// -// Space-separated: each token is a simple pointer parameter lifetime. 
-// Both params constrain the return (lifetime 'a for all): -#pragma pagurus 'a 'a -> 'a -char *func1(char *a, char *b); - -// Only the first param constrains the return (lifetime 'a): +// Function lifetime annotation — return tied to first parameter: #pragma pagurus 'a 'b -> 'a -int *get_x(int *x, int *y); - -// Comma-grouped: 'b,'a is a struct parameter instantiated with lifetimes -// 'b (1st) and 'a (2nd) from the struct's declaration. The checker -// resolves the return field from the struct pragma annotations (see -// form 4). E003 fires if the struct exits scope while result is live. -#pragma pagurus 'b,'a -> 'b -char *get_data(struct Buf buf); - -// 3. Legacy syntax (backward compatible): -#pragma pagurus lifetime(get_ptr, 0) - -// 4. Struct / field lifetime annotations — declare the struct's lifetime -// parameters and each field's lifetime. Validated by the checker: -// * field with undeclared lifetime → error "unknown lifetime 'e' ..." -// * declared param with no field → error "declared but not bound 'c' ..." -#pragma pagurus 'a 'b -struct Buf { - #pragma pagurus 'a - const char *data; /* lifetime 'a */ - #pragma pagurus 'b - char *scratch; /* lifetime 'b */ -}; +int *get_x(int *x, int *y); -// 5. Drop function annotation — designates the next function declaration as -// the required destructor for a C struct or typedef type. Any local -// variable of that type that goes out of scope without the drop function -// being called is reported as E020[missing-drop]. When -fpass-plugin= is -// also active, the drop call is injected automatically into the binary. -// -// TypeName may be a struct tag name OR a typedef name. -// The drop handler must: (a) return void, and (b) take exactly one parameter -// of type T. An invalid signature is reported as an error and the -// annotation is not registered. 
-// -// Named struct (tag name): +// Drop function annotation — required destructor: #pragma pagurus drop(StrStruct) void str_free(struct StrStruct s); -// Or using a typedef name (equivalent): -// typedef struct StrStruct { ... } StrType; -// #pragma pagurus drop(StrType) -// void str_type_free(StrType s); -``` - -### Lifetime elision (rule 1 — extended) - -For function **declarations** with exactly one "significant" parameter and a -pointer return type, the checker automatically infers the return constraint -without requiring an explicit `#pragma pagurus` annotation. A parameter is -"significant" when it is: - -* a pointer parameter (`T *`), **or** -* a struct parameter passed by value where the struct's `#pragma pagurus` has - exactly one declared lifetime parameter. - -When an explicit annotation is present but redundant (matches what would be -inferred), a warning is emitted: -`explicit lifetime annotation for 'fn' can be elided`. - -```c -// Warning: "can be elided" — single pointer param, rule 1. -#pragma pagurus 'a -> 'a -char *dup(char *s); - -// No annotation needed — automatically inferred: -char *first_char(char *s); - -// Warning: struct Buf has exactly one lifetime → annotation can be elided. -#pragma pagurus 'b -struct Buf { #pragma pagurus 'b char *scratch; }; - -#pragma pagurus 'a -> 'a // Warning: can be elided -char *get_data(struct Buf buf); -``` - -The single-lifetime struct annotation itself also warns: -`explicit lifetime declaration for struct 'Buf' can be elided`. - -### E018: volatile-borrow / atomic-borrow - -Taking a mutable (`T *`) borrow of a `volatile` or `_Atomic` variable is -unsafe: the compiler may reorder, cache, or eliminate accesses through the -non-volatile pointer, silently defeating the `volatile`/`_Atomic` contract. 
-The checker reports E018 for any such mutable borrow: - -```c -// E018: mutable borrow of volatile variable -void volatile_borrow(void) { - volatile int x = 10; - int *p = &x; // error: E018[volatile-borrow] - *p = 20; -} - -// E018: mutable borrow of _Atomic variable -void atomic_borrow(void) { - _Atomic int a = 0; - int *p = &a; // error: E018[atomic-borrow] - *p = 1; -} - -// OK: a read-only (const) borrow of volatile preserves the contract -void const_borrow_of_volatile(void) { - volatile int x = 10; - const int *p = &x; // fine: shared (read-only) borrow - (void)*p; -} -``` - -### E019: use-after-move - -Move semantics apply **only** to structs annotated with `#pragma pagurus drop(T)`. -When such a struct is assigned to another variable (or passed by value to a -non-drop function), ownership is transferred — the source variable becomes -`Moved` and any subsequent read triggers E019. Structs without a drop -annotation are **copy-able** and can be freely assigned/passed without -transferring ownership. 
- -```c -#pragma pagurus drop(StrStruct) -void str_free(struct StrStruct s); - -void example(void) { - struct StrStruct a = str_with_cap(64); - struct StrStruct b = a; // ownership moves to b — a is now Moved - str_free(a); // error: E019[use-after-move] - str_free(b); // OK -} - -// Copy-able struct (no pointer fields) — no move semantics: -struct Point { int x; int y; }; -void copy_example(void) { - struct Point p1 = { 1, 2 }; - struct Point p2 = p1; // OK: copy, not move - (void)p1; // OK: p1 is still valid -} -``` - -Calling the drop function itself also consumes the struct (marking it `Moved`), -so a double-drop is caught as E019: - -```c -str_free(s); // first call: s is consumed -str_free(s); // error: E019[use-after-move] — s already dropped -``` - -### E020: missing-drop - -When a local variable of a drop-annotated type (struct tag or typedef) goes out -of scope without its drop function being called, the severity depends on whether -the IR pass is active: - -| Plugin flags | E020 severity | Effect | -|---|---|---| -| `-fplugin=` only | **error** | No automatic injection; the drop call must be written explicitly in source | -| `-fplugin= -fpass-plugin=` | **warning** | IR pass (`PagurusDropPass`) auto-injects the drop call into the binary | - -The `#pragma pagurus drop(T)` annotation accepts both **struct tag names** and -**typedef names** for `T`. The drop handler must return `void` and take exactly -one parameter of type `T`; an invalid signature is reported as an error and the -annotation is not registered. 
- -```c -// Using a struct tag: -#pragma pagurus drop(StrStruct) -void str_free(struct StrStruct s); // OK: void return, one StrStruct param - -// Using a typedef name: -typedef struct { size_t len; char *str; } StrType; -#pragma pagurus drop(StrType) -void str_type_free(StrType s); // OK: void return, one StrType param - -// Invalid signatures — rejected with an error: -// #pragma pagurus drop(StrType) -// int bad_return(StrType s); // error: must return void -// -// #pragma pagurus drop(StrType) -// void bad_params(StrType s, int n);// error: must take exactly one parameter -// -// #pragma pagurus drop(StrType) -// void bad_type(int n); // error: parameter must be of type StrType - -void bad(void) { - struct StrStruct s = str_with_cap(64); -} -// Without -fpass-plugin=: error: E020[missing-drop]: `s` goes out of scope -// without calling `str_free` — call `str_free` explicitly, or use -// -fpass-plugin=pagurus_plugin.so for automatic injection -// -// With -fpass-plugin=: warning: E020[missing-drop]: `s` goes out of scope -// without calling `str_free` — drop will be injected automatically by the IR pass -``` - -`PagurusDropPass` uses `llvm.lifetime.end` intrinsics (scope-accurate) and -`DominatorTree` analysis for double-drop prevention: if you already call the -drop function yourself, no second call is injected. - -### E021: drop-necessary - -Any user-defined struct with at least one **raw non-const pointer field** (`T *`, -`char *`, etc.) is considered "drop-necessary" because it likely owns heap memory. -E021 is **always an error**. A drop-necessary struct going out of scope is accepted -only in one of two ways: - -| Acceptable outcome | Method | -|---|---| -| Registered drop handler | `#pragma pagurus drop(T)` before the function declaration | -| Manual field cleanup | Every pointer field explicitly `free()`d before scope exit | - -If **any** pointer field was not freed by scope exit → E021 error. 
- -```c -struct Buffer { // drop-necessary: has `char *data` but no drop annotation - char *data; - size_t len; +// Struct lifetime parameters: +#pragma pagurus 'a 'b +struct Buf { + #pragma pagurus 'a + const char *data; + #pragma pagurus 'b + char *scratch; }; - -// ERROR: data is never freed — memory leak -void error_example(void) { - struct Buffer b = { (char *)malloc(64), 64 }; - // error: E021[drop-necessary]: `b` (type `Buffer`) goes out of scope with - // unfreed pointer field(s): `data` — register `#pragma pagurus drop(Buffer)` - // or free every pointer field -} - -// OK: all pointer fields freed before scope exit -void ok_manual(void) { - struct Buffer b = { (char *)malloc(64), 64 }; - free(b.data); // every pointer field freed — no E021 -} ``` -To silence E021 permanently, register a drop function: - -```c -#pragma pagurus drop(Buffer) -void buffer_free(struct Buffer b); -``` - -Structs with **only value-type fields** (integers, floats, nested copy-able -structs) are automatically classified as **copy-able** and never receive E021. - ---- +See [ANNOTATIONS.md](ANNOTATIONS.md) for the complete reference. -## Multi-file project integration +## Multi-file projects -Pagurus analyses one translation unit per `clang` invocation. All -intra-file checks (E001–E021) and inter-procedural summaries that are -visible through headers work without any extra setup. Two tools make -it straightforward to run the checker over an entire Makefile-based -project. - -### `pagurus-check` — standalone script - -`pagurus-check` (located at the root of this repository) runs the -plugin on many source files in a single command and aggregates the -results: +Run pagurus across an entire codebase using the included tools: ```bash -# Check two files; include/ is on the search path. 
+# Check all files under src/ with 4 parallel jobs ./pagurus-check --plugin=./build/pagurus_plugin.so \ --cflags="-Iinclude" \ - src/main.c src/widget.c - -# Scan every .c file under src/ with 4 parallel jobs. -./pagurus-check --plugin=./build/pagurus_plugin.so \ - --cflags="-Iinclude -DNDEBUG" \ --jobs=4 --dir=src -# Use a compilation database generated by `bear make`. -# Per-file include paths and defines are extracted automatically. +# Or use a compilation database bear make ./pagurus-check --plugin=./build/pagurus_plugin.so \ --compile-db=compile_commands.json - -# Dry-run: report all diagnostics without writing .pagurus.c files. -./pagurus-check --plugin=./build/pagurus_plugin.so \ - --dry-run --dir=src -``` - -`pagurus-check` exits with code **0** if all files pass (no E0xx -diagnostics) and **1** if any file has errors. - -Full option reference: - -``` -Usage: pagurus-check [OPTIONS] [FILE...] - pagurus-check [OPTIONS] --dir=DIR - pagurus-check [OPTIONS] --compile-db=compile_commands.json - - -p PATH, --plugin=PATH pagurus_plugin.so [./build/pagurus_plugin.so] - -C CMD, --clang=CMD Clang executable [clang] - -f FLAGS,--cflags=FLAGS Extra clang flags (e.g. "-DFOO -Iinclude") - -d DIR, --dir=DIR Scan DIR recursively for *.c files - -b FILE, --compile-db=FILE JSON compilation database - -j N, --jobs=N Parallel jobs [1] - --dry-run Dry-run: report but don't write .pagurus.c - --ir-pass Enable LLVM IR analysis (-fpass-plugin=) - -h, --help Show help ``` -Environment variables `PAGURUS_PLUGIN` and `PAGURUS_CLANG` provide -defaults for `--plugin` and `--clang`. - -### `pagurus.mk` — Makefile include - -Drop `pagurus.mk` (also at the repository root) into an existing -Makefile project: +Integrate into Makefiles: ```makefile # myproject/Makefile -CC = clang -CFLAGS = -Wall -Iinclude -SOURCES = src/main.c src/widget.c src/util.c - -# Pagurus integration: define PAGURUS_PLUGIN before the include. 
PAGURUS_PLUGIN = /path/to/build/pagurus_plugin.so include /path/to/pagurus.mk -``` - -Available targets: - -```bash -make pagurus-check # compile mode — borrow-check all PAGURUS_SOURCES -make pagurus-dry-run # dry-run mode — inspect only, no .pagurus.c written -make pagurus-clean # remove *.pagurus.c artefacts -``` - -Configurable variables (set them before the `include` line): - -| Variable | Default | Description | -|---|---|---| -| `PAGURUS_PLUGIN` | `/build/pagurus_plugin.so` | Plugin path | -| `PAGURUS_CLANG` | `clang` | Clang executable | -| `PAGURUS_SOURCES` | `$(SOURCES)` if defined, else `*.c` | Files to check | -| `PAGURUS_CFLAGS` | `$(CFLAGS)` | Extra flags passed to clang | -| `PAGURUS_JOBS` | `1` | Parallel jobs | -| `PAGURUS_IR_PASS` | `0` | Set to `1` to enable `-fpass-plugin=` | - -### Generating a compilation database from a Makefile - -When source files are compiled with different flags (include paths, -defines, or per-file options), the most accurate approach is to -generate a [compilation database](https://clang.llvm.org/docs/JSONCompilationDatabase.html) -and pass it to `pagurus-check`: - -```bash -# Install bear (Ubuntu) -sudo apt install bear - -# Capture compile commands from a regular make run -bear -- make -# Run pagurus on every file with the exact flags used during compilation -./pagurus-check --plugin=./build/pagurus_plugin.so \ - --compile-db=compile_commands.json +# Then run: +# make pagurus-check ``` -`bear` works with any build system that invokes a C compiler, including -hand-written Makefiles, autotools, and meson. +See [INTEGRATION.md](INTEGRATION.md) for the complete integration guide.
-### Scope of per-TU analysis +## Key features -| Check | Works within one TU | Notes | -|---|---|---| -| E001–E018 (AST checks) | ✅ | Full NLL + CFG within the TU | -| Inter-procedural summaries | ✅ when callee definition is visible | Header-only inlines or same-TU functions get full summaries; opaque `extern` calls get conservative assumptions | -| E019–E021 (drop semantics) | ✅ | Drop annotations propagate via headers | -| IR-E001/E002/E015/E018 | ✅ (requires `-fpass-plugin=`) | One IR module per `clang -c` invocation | +- **Non-lexical lifetimes (NLL):** Precise borrow tracking with loan release at last use +- **Control flow analysis:** Conditional and loop-aware borrow propagation +- **Inter-procedural:** Function summaries for return-alias and parameter effects +- **Move semantics:** Rust-style ownership transfer for drop-annotated types +- **Drop injection:** Automatic RAII-style cleanup at IR level with `-fpass-plugin=` +- **Source transformation:** Produces plain C code without pagurus annotations +- **Two-tier analysis:** AST for precision + IR for patterns invisible at source level ---- +## Documentation -## Architecture - -``` -your_file.c - │ - ├─ clang -fplugin=./pagurus_plugin.so - │ └─ C++ ASTPlugin - │ • FunctionSummaryVisitor (pass 1) - │ – Fixpoint return-alias summaries (direct, transitive, - │ pointer-arithmetic, conditional) - │ • FunctionEffectVisitor (pass 1b) - │ – Callee-first topological order via clang::CallGraph - │ – Per-param effects: frees / mutBorrows / sharedBorrows - │ • PagurusVisitor (pass 2, RecursiveASTVisitor) - │ – E001–E021 diagnostics with NLL loan release - │ – Conditional/loop loan propagation - │ – Inter-procedural loans via pass-1/1b summaries - │ – Drop semantics: E019/E020/E021 for #pragma pagurus drop(T) - │ • Source-to-source transformation (compile mode, default) - │ – Writes .pagurus.c - │ – Strips #pragma pagurus lines - │ – Injects missing drop calls at scope-exit / early-return - │ • Dry-run mode 
(-plugin-arg-pagurus dry-run) - │ – Reports what would change; emits E020; no file written - │ - └─ clang -fpass-plugin=./pagurus_plugin.so - └─ PagurusDropPass (module pass, PipelineStartEP) - • Scans all functions for allocas of drop-annotated struct types - • Injects drop-function calls at llvm.lifetime.end / ret - • DominatorTree: no double-drop injection - └─ PagurusIRPass - • AliasAnalysis + MemorySSA + DominatorTree - • IR-E001/E001b: load/store after free - • IR-E002: double-free - • IR-E015/loop: concurrent borrow sites, MemoryPhi - • IR-E018: AtomicRMW/CmpXchg on borrowed alloca -``` - -### Why C++ for this plugin? - -| Capability | C++ plugin | Rust (clang-sys) | -|---|---|---| -| Parse overhead | Zero (accesses host ASTContext directly) | Double-parse via libclang subprocess | -| `RecursiveASTVisitor` | ✅ full typed traversal | ❌ C API cursor walk only | -| `LiveVariables` / CFG | ✅ direct | ❌ not exposed in C API | -| `DiagnosticsEngine` | ✅ inline with source | Limited | -| LLVM IR pass | ✅ `AliasAnalysis`, `MemorySSA`, `DominatorTree` | Stub only | -| `-fplugin=` registration | ✅ `FrontendPluginRegistry::Add` | ❌ requires C++ template static initializer | - ---- +- [BUILDING.md](BUILDING.md) — Build instructions, MLIR integration, runtime dependencies +- [ANNOTATIONS.md](ANNOTATIONS.md) — Complete `#pragma pagurus` reference +- [INTEGRATION.md](INTEGRATION.md) — Multi-file project integration with `pagurus-check` and `pagurus.mk` +- [ARCHITECTURE.md](ARCHITECTURE.md) — Technical architecture and implementation details ## License