Merged
178 changes: 178 additions & 0 deletions ANNOTATIONS.md
@@ -0,0 +1,178 @@
# `#pragma pagurus` Annotations Reference

## Overview

pagurus uses pragma annotations to specify lifetime constraints, drop functions, and borrow management. These annotations guide the borrow checker's analysis and enable move semantics for resource-owning types.

## 1. Borrow release

Explicitly release a borrow before the end of its natural scope (mirrors Rust's `drop(borrow)`):

```c
#pragma pagurus borrow_end(p)
```

This tells the checker that the borrow through pointer `p` ends at this point, even though `p` may still be in scope. Useful for manually resolving borrow conflicts.
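
As a minimal sketch of where the pragma sits, the following assumes the checker would otherwise treat the loan through `p` as live for as long as `p` is in scope. The names `bump` and `borrow_end_demo` are hypothetical, and whether the second borrow would actually be flagged without the pragma depends on the checker's loan-release rules.

```c
/* Hypothetical helper: increments the pointee through a mutable borrow. */
static void bump(int *p) { *p += 1; }

int borrow_end_demo(void) {
    int value = 41;
    int *p = &value;   /* borrow of `value` through p begins */
    bump(p);

    /* Tell the checker the borrow through p is over; p is still in scope. */
#pragma pagurus borrow_end(p)

    int *q = &value;   /* assumed to be accepted: no live loan through p */
    return *q;
}
```

Because `p` is not dereferenced after the pragma, the code stays valid C whether or not the plugin is loaded; a compiler without the plugin simply ignores the unknown pragma.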

## 2. Function lifetime annotations

### Concise syntax (preferred)

Place immediately before a function declaration. Each space-separated token gives the lifetime of one pointer parameter, in parameter order; a comma-grouped token gives the lifetimes of one struct parameter. The token after `->` is the lifetime of the return value.

#### Simple pointer parameters

Both params constrain the return (lifetime 'a for all):

```c
#pragma pagurus 'a 'a -> 'a
char *func1(char *a, char *b);
```

Only the first param constrains the return (lifetime 'a):

```c
#pragma pagurus 'a 'b -> 'a
int *get_x(int *x, int *y);
```
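
A caller-side sketch may help show why the distinction matters. It assumes the checker uses the `'a 'b -> 'a` annotation to conclude that the result borrows only from the first argument; `use_get_x` and the body of `get_x` are hypothetical.

```c
#pragma pagurus 'a 'b -> 'a
int *get_x(int *x, int *y);

/* Hypothetical body: the result aliases x and never y. */
int *get_x(int *x, int *y) {
    (void)y;
    return x;
}

int use_get_x(void) {
    int x = 7;
    int *r;
    {
        int y = 99;
        r = get_x(&x, &y);  /* per the annotation, r borrows from x only */
    }                       /* y ends here; r still points at the live x */
    return *r;
}
```

Had the function been annotated `'a 'a -> 'a`, the result would also be constrained by `y`, so the read of `*r` after the inner block would presumably be reported.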

#### Struct parameters with multiple lifetimes

Comma-grouped syntax `'b,'a` indicates a struct parameter instantiated with lifetimes 'b (first) and 'a (second) from the struct's declaration:

```c
#pragma pagurus 'b,'a -> 'b
char *get_data(struct Buf buf);
```

The checker resolves which field is returned by matching lifetime 'b to the struct's field annotations (see section 3).

### Legacy syntax

Backward compatible with earlier versions:

```c
#pragma pagurus lifetime(get_ptr, 0)
```

This annotation indicates that the return value's lifetime is tied to parameter 0.

## 3. Struct and field lifetime annotations

Declare the struct's lifetime parameters and each field's lifetime:

```c
#pragma pagurus 'a 'b
struct Buf {
#pragma pagurus 'a
    const char *data;    /* lifetime 'a */
#pragma pagurus 'b
    char *scratch;       /* lifetime 'b */
};
```

**Validation rules:**
- Field with undeclared lifetime → error: "unknown lifetime 'e' in field ..."
- Declared lifetime parameter with no field → error: "declared but not bound 'c' ..."

All declared lifetimes must be used, and all field lifetimes must be declared.
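
The struct and function annotations can be combined as in the sketch below. `struct Span`, `span_text`, and `span_first` are hypothetical names. The function annotation lists the lifetimes in the same order as the struct declares them, so the result is tied to the `'a` field under either a positional or a by-name reading of the comma-grouped syntax.

```c
/* Hypothetical struct with two lifetime parameters, one per field. */
#pragma pagurus 'a 'b
struct Span {
#pragma pagurus 'a
    const char *text;   /* lifetime 'a */
#pragma pagurus 'b
    char *scratch;      /* lifetime 'b */
};

/* The struct argument carries 'a and 'b; the result is tied to 'a,
   which is the lifetime of the `text` field. */
#pragma pagurus 'a,'b -> 'a
const char *span_text(struct Span s);

const char *span_text(struct Span s) { return s.text; }

char span_first(void) {
    static const char msg[] = "hello";
    char tmp[8] = {0};
    struct Span s = { msg, tmp };
    return span_text(s)[0];
}
```

Both validation rules are satisfied here: every declared lifetime is bound to a field, and every field lifetime is declared.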

## 4. Drop function annotation

Designates the next function declaration as the required destructor for a C struct or typedef type:

```c
#pragma pagurus drop(TypeName)
void destructor_fn(TypeName value);
```

**Requirements:**
- `TypeName` may be a struct tag name OR a typedef name
- The drop handler must:
  - Return `void`
  - Take exactly one parameter of type `TypeName`
- Invalid signatures are reported as errors and not registered

### Examples

Named struct (tag name):

```c
#pragma pagurus drop(StrStruct)
void str_free(struct StrStruct s);
```

Using a typedef name:

```c
typedef struct StrStruct { ... } StrType;
#pragma pagurus drop(StrType)
void str_type_free(StrType s);
```
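
For a fuller picture, here is a self-contained sketch of a drop-annotated type with its constructor, destructor, and one caller. The field layout of `StrStruct` and the body of `str_with_cap` are assumptions made for illustration, and `drop_demo` is a hypothetical name.

```c
#include <stdlib.h>

/* Hypothetical layout: a heap buffer and its capacity. */
struct StrStruct {
    char  *ptr;
    size_t cap;
};

#pragma pagurus drop(StrStruct)
void str_free(struct StrStruct s);

/* Allocates a buffer; cap is 0 if the allocation fails. */
struct StrStruct str_with_cap(size_t cap) {
    struct StrStruct s;
    s.ptr = malloc(cap);
    s.cap = s.ptr ? cap : 0;
    return s;
}

/* Matches the required shape: returns void, takes one StrStruct by value. */
void str_free(struct StrStruct s) { free(s.ptr); }

size_t drop_demo(void) {
    struct StrStruct s = str_with_cap(64);
    size_t cap = s.cap;
    str_free(s);   /* explicit drop before s leaves scope */
    return cap;
}
```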

### Effect on error checking

Any local variable of a drop-annotated type that goes out of scope without the drop function being called triggers **E020[missing-drop]**:

| Plugin flags | E020 severity | Effect |
|---|---|---|
| `-fplugin=` only | **error** | No automatic injection; the drop call must be written explicitly in source |
| `-fplugin= -fpass-plugin=` | **warning** | IR pass (`PagurusDropPass`) auto-injects the drop call into the binary |
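
When only `-fplugin=` is used, each exit path has to drop the value by hand. The sketch below shows that shape for a function with an early return. `struct Handle`, `handle_open`, `handle_close`, `process`, and the `open_handles` counter are all hypothetical and exist only to make the example checkable.

```c
/* Hypothetical resource type used only for this sketch. */
struct Handle { int id; };

static int open_handles = 0;   /* counts handles not yet dropped */

#pragma pagurus drop(Handle)
void handle_close(struct Handle h);

struct Handle handle_open(int id) {
    struct Handle h = { id };
    open_handles++;
    return h;
}

void handle_close(struct Handle h) {
    (void)h;
    open_handles--;
}

int process(int fail) {
    struct Handle h = handle_open(1);
    if (fail) {
        handle_close(h);   /* drop on the early-return path */
        return -1;
    }
    handle_close(h);       /* drop at normal scope exit */
    return 0;
}

int handles_still_open(void) { return open_handles; }
```

Removing either `handle_close` call would leave one path on which `h` goes out of scope undropped, which is the situation E020 describes.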

### Move semantics

Drop-annotated types have **move semantics** — assignment transfers ownership:

```c
#pragma pagurus drop(StrStruct)
void str_free(struct StrStruct s);

void example(void) {
    struct StrStruct a = str_with_cap(64);
    struct StrStruct b = a;  // ownership moves to b; a is now Moved
    str_free(a);             // error: E019[use-after-move]
    str_free(b);             // OK
}
```

Structs without drop annotations are **copy-able** and can be freely assigned without transferring ownership.
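
A corrected version of the move example, next to a copyable struct, might look like the sketch below. The `StrStruct` layout, the body of `str_with_cap`, `struct Point`, `move_ok`, and `copy_ok` are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical layout: a heap buffer and its capacity. */
struct StrStruct { char *ptr; size_t cap; };

#pragma pagurus drop(StrStruct)
void str_free(struct StrStruct s);

void str_free(struct StrStruct s) { free(s.ptr); }

struct StrStruct str_with_cap(size_t cap) {
    struct StrStruct s;
    s.ptr = malloc(cap);
    s.cap = s.ptr ? cap : 0;
    return s;
}

/* No drop annotation: a plain copyable struct. */
struct Point { int x; int y; };

size_t move_ok(void) {
    struct StrStruct a = str_with_cap(64);
    struct StrStruct b = a;   /* ownership moves to b */
    size_t cap = b.cap;       /* only b is used from here on */
    str_free(b);              /* exactly one drop, on the current owner */
    return cap;
}

int copy_ok(void) {
    struct Point p = { 1, 2 };
    struct Point q = p;       /* copy: p remains usable */
    q.x = 10;
    return p.x + q.x;
}
```

In `move_ok`, `a` is never touched after the assignment, so there is neither a use-after-move nor a second drop of the same buffer.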

## 5. Lifetime elision (rule 1 — extended)

For function **declarations** with exactly one "significant" parameter and a pointer return type, the checker automatically infers the return constraint without requiring an explicit annotation.

A parameter is "significant" when it is:
- A pointer parameter (`T *`), **or**
- A struct parameter passed by value where the struct has exactly one declared lifetime parameter

### Examples

Automatically inferred (no annotation needed):

```c
char *first_char(char *s);
```

Redundant annotation (emits warning: "can be elided"):

```c
#pragma pagurus 'a -> 'a
char *dup(char *s);
```

Single-lifetime struct (annotation can be elided):

```c
#pragma pagurus 'b
struct Buf {
#pragma pagurus 'b
    char *scratch;
};

#pragma pagurus 'a -> 'a // Warning: can be elided
char *get_data(struct Buf buf);
```

The single-lifetime struct annotation itself also warns:
"explicit lifetime declaration for struct 'Buf' can be elided".

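The contrast between an elidable and a non-elidable declaration can be sketched as follows. `skip_spaces` and `find_any` are hypothetical functions. The second has two pointer parameters, so it does not have exactly one significant parameter and presumably needs an explicit annotation.

```c
#include <stddef.h>

/* One pointer parameter and a pointer return: the constraint is inferred. */
char *skip_spaces(char *s);

/* Two pointer parameters: rule 1 does not apply, so annotate explicitly. */
#pragma pagurus 'a 'b -> 'a
char *find_any(char *haystack, const char *set);

/* Returns a pointer to the first non-space character of s. */
char *skip_spaces(char *s) {
    while (*s == ' ')
        s++;
    return s;
}

/* Returns a pointer into haystack at the first character that also
   occurs in set, or NULL if there is none. */
char *find_any(char *haystack, const char *set) {
    for (; *haystack; haystack++)
        for (const char *c = set; *c; c++)
            if (*haystack == *c)
                return haystack;
    return NULL;
}
```
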
## Error code reference

For detailed information about specific error codes (E001–E021), see the main [README.md](README.md#checks).
103 changes: 103 additions & 0 deletions ARCHITECTURE.md
@@ -0,0 +1,103 @@
# pagurus Architecture

## Overview

```
your_file.c
├─ clang -fplugin=./pagurus_plugin.so
│ └─ C++ ASTPlugin
│ • FunctionSummaryVisitor (pass 1)
│ – Fixpoint return-alias summaries (direct, transitive,
│ pointer-arithmetic, conditional)
│ • FunctionEffectVisitor (pass 1b)
│ – Callee-first topological order via clang::CallGraph
│ – Per-param effects: frees / mutBorrows / sharedBorrows
│ • PagurusVisitor (pass 2, RecursiveASTVisitor<T>)
│ – E001–E021 diagnostics with NLL loan release
│ – Conditional/loop loan propagation
│ – Inter-procedural loans via pass-1/1b summaries
│ – Drop semantics: E019/E020/E021 for #pragma pagurus drop(T)
│ • Source-to-source transformation (compile mode, default)
│ – Writes <input>.pagurus.c
│ – Strips #pragma pagurus lines
│ – Injects missing drop calls at scope-exit / early-return
│ • Dry-run mode (-plugin-arg-pagurus dry-run)
│ – Reports what would change; emits E020; no file written
└─ clang -fpass-plugin=./pagurus_plugin.so
└─ PagurusDropPass (module pass, PipelineStartEP)
• Scans all functions for allocas of drop-annotated struct types
• Injects drop-function calls at llvm.lifetime.end / ret
• DominatorTree: no double-drop injection
└─ PagurusIRPass
• AliasAnalysis + MemorySSA + DominatorTree
• IR-E001/E001b: load/store after free
• IR-E002: double-free
• IR-E015/loop: concurrent borrow sites, MemoryPhi
• IR-E018: AtomicRMW/CmpXchg on borrowed alloca
```

## Why C++ for this plugin?

| Capability | C++ plugin | Rust (clang-sys) |
|---|---|---|
| Parse overhead | Zero (accesses host ASTContext directly) | Double-parse via libclang subprocess |
| `RecursiveASTVisitor<T>` | ✅ full typed traversal | ❌ C API cursor walk only |
| `LiveVariables` / CFG | ✅ direct | ❌ not exposed in C API |
| `DiagnosticsEngine` | ✅ inline with source | Limited |
| LLVM IR pass | ✅ `AliasAnalysis`, `MemorySSA`, `DominatorTree` | Stub only |
| `-fplugin=` registration | ✅ `FrontendPluginRegistry::Add<T>` | ❌ requires C++ template static initializer |

## Analysis Levels

pagurus operates at two levels simultaneously:

| Entry point | Load flag | Analysis |
|---|---|---|
| Clang AST plugin | `-fplugin=./pagurus_plugin.so` | RecursiveASTVisitor, precise column diagnostics |
| LLVM IR pass | `-fpass-plugin=./pagurus_plugin.so` | AliasAnalysis + MemorySSA + DominatorTree |

### AST level (`-fplugin=`)

The AST-level plugin provides:

- **E001–E021 checks:** Covers use-after-free, double-free, null-deref, memory leaks, borrow conflicts, move semantics, and drop semantics
- **NLL (Non-Lexical Lifetimes):** Precise borrow tracking with loan release at last use
- **CFG-aware:** Conditional and loop loan propagation
- **Inter-procedural analysis:** Function summaries for return-alias and parameter effects
- **Source transformation:** Compile mode (default) strips pragmas and injects drop calls
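
The four return-alias kinds named in the overview diagram (direct, transitive, pointer-arithmetic, conditional) correspond to function shapes like the ones below. This is only an illustration of the patterns; the function names are hypothetical, and how the plugin represents each summary internally is not shown in this document.

```c
/* direct: returns its parameter unchanged */
int *alias_direct(int *p) { return p; }

/* transitive: returns another function's alias of p */
int *alias_transitive(int *p) { return alias_direct(p); }

/* pointer-arithmetic: returns an offset into the same object */
int *alias_offset(int *p) { return p + 1; }

/* conditional: may alias either parameter, depending on pick_p */
int *alias_either(int *p, int *q, int pick_p) { return pick_p ? p : q; }
```

In each case the returned pointer refers to storage owned by a caller-supplied argument, which is the relationship a return-alias summary would need to record for the call site.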

### LLVM IR level (`-fpass-plugin=`)

The IR-level pass catches patterns invisible at the AST level:

- **Bit-cast / type-pun aliases:** Detects aliasing through union type punning or explicit bitcast
- **GEP pointer arithmetic:** Tracks pointer arithmetic via GetElementPtr instructions
- **Loop-carried borrows:** Uses MemorySSA φ-nodes to detect borrows across loop back-edges
- **Atomic instruction races:** Detects AtomicRMW/CmpXchg on borrowed memory
- **Drop injection:** Automatically inserts drop calls at `llvm.lifetime.end` intrinsics

IR checks use:
- `AliasAnalysis::isMustAlias` for precise aliasing
- `MemorySSA` for def-def pairs and loop-carried dependencies
- `DominatorTree` for double-drop prevention and post-dominance

## Compile mode vs Dry-run mode

| Mode | Flag | E020 | Output file |
|---|---|---|---|
| Compile (default) | *(none)* | suppressed (auto-injected) | `<file>.pagurus.c` written |
| Dry-run | `-Xclang -plugin-arg-pagurus -Xclang dry-run` | reported | none |

**Compile mode** is the default behavior:
- Strips all `#pragma pagurus` annotations
- Injects missing drop calls at scope-exit
- Produces plain C code in `<input>.pagurus.c`
- E020 is suppressed because the transformation fixes it

**Dry-run mode** is an inspection mode:
- Reports all diagnostics including E020
- Prints a textual report of what changes would be made
- Does not write any output file
- Useful for code review and understanding required changes