pagurus Architecture

Overview

your_file.c
    │
    ├─ clang -fplugin=./pagurus_plugin.so
    │         └─ C++ ASTPlugin
    │              • FunctionSummaryVisitor (pass 1)
    │                  – Fixpoint return-alias summaries (direct, transitive,
    │                    pointer-arithmetic, conditional)
    │              • FunctionEffectVisitor (pass 1b)
    │                  – Callee-first topological order via clang::CallGraph
    │                  – Per-param effects: frees / mutBorrows / sharedBorrows
    │              • PagurusVisitor (pass 2, RecursiveASTVisitor<T>)
    │                  – E001–E021 diagnostics with NLL loan release
    │                  – Conditional/loop loan propagation
    │                  – Inter-procedural loans via pass-1/1b summaries
    │                  – Drop semantics: E019/E020/E021 for #pragma pagurus drop(T)
    │              • Source-to-source transformation (compile mode, default)
    │                  – Writes <input>.pagurus.c
    │                  – Strips #pragma pagurus lines
    │                  – Injects missing drop calls at scope-exit / early-return
    │              • Dry-run mode  (-plugin-arg-pagurus dry-run)
    │                  – Reports what would change; emits E020; no file written
    │
    └─ clang -fpass-plugin=./pagurus_plugin.so
              └─ PagurusDropPass (module pass, PipelineStartEP)
                   • Scans all functions for allocas of drop-annotated struct types
                   • Injects drop-function calls at llvm.lifetime.end / ret
                   • DominatorTree: no double-drop injection
              └─ PagurusIRPass
                   • AliasAnalysis + MemorySSA + DominatorTree
                   • IR-E001/E001b: load/store after free
                   • IR-E002: double-free
                   • IR-E015/loop: concurrent borrow sites, MemoryPhi
                   • IR-E018: AtomicRMW/CmpXchg on borrowed alloca
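The fixpoint return-alias summaries of pass 1 can be illustrated with a small standalone model. This is a sketch under stated assumptions, not pagurus's implementation: `ReturnExpr`, `Program`, `Summaries`, and `computeReturnAliases` are hypothetical names, and only the direct and transitive cases are modelled (pointer arithmetic and conditionals are left out).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one `return` statement: it either returns one of the
// function's own parameters, or the result of a call that receives some of
// those parameters as arguments.
struct ReturnExpr {
    int directParam = -1;         // parameter index returned directly, or -1
    std::string callee;           // non-empty if a call result is returned
    std::vector<int> calleeArgs;  // caller parameter passed in each callee slot (-1 = unrelated)
};

using Program = std::map<std::string, std::vector<ReturnExpr>>;
using Summaries = std::map<std::string, std::set<int>>;  // function -> params its result may alias

// Repeat until no summary grows. Sets only gain elements and are bounded by
// the parameter count, so the loop terminates.
Summaries computeReturnAliases(const Program& program) {
    Summaries summary;
    for (const auto& entry : program) summary[entry.first];  // start from empty sets
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& entry : program) {
            std::set<int>& mine = summary[entry.first];
            for (const ReturnExpr& r : entry.second) {
                if (r.directParam >= 0 && mine.insert(r.directParam).second)
                    changed = true;
                if (r.callee.empty()) continue;
                auto it = summary.find(r.callee);
                if (it == summary.end()) continue;  // unknown callee: no alias information
                const std::set<int> calleeAliases = it->second;  // copy: callee may be this function
                for (int calleeParam : calleeAliases) {
                    assert(static_cast<std::size_t>(calleeParam) < r.calleeArgs.size());
                    int callerParam = r.calleeArgs[calleeParam];
                    if (callerParam >= 0 && mine.insert(callerParam).second)
                        changed = true;
                }
            }
        }
    }
    return summary;
}
```

If a wrapper that returns `identity(q)` is visited before `identity` itself, the first round learns nothing about the wrapper; the second round picks up the alias once `identity` has a summary. That ordering sensitivity is why the sketch iterates to a fixpoint rather than making a single pass.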

Why C++ for this plugin?

| Capability | C++ plugin | Rust (clang-sys) |
|---|---|---|
| Parse overhead | Zero (accesses host ASTContext directly) | Double-parse via libclang subprocess |
| RecursiveASTVisitor<T> | ✅ full typed traversal | ❌ C API cursor walk only |
| LiveVariables / CFG | ✅ direct | ❌ not exposed in C API |
| DiagnosticsEngine | ✅ inline with source | Limited |
| LLVM IR pass | AliasAnalysis, MemorySSA, DominatorTree | Stub only |
| -fplugin= registration | FrontendPluginRegistry::Add<T> | ❌ requires C++ template static initializer |

Analysis Levels

pagurus operates at two levels, both provided by the same pagurus_plugin.so and selected by the load flag:

| Entry point | Load flag | Analysis |
|---|---|---|
| Clang AST plugin | -fplugin=./pagurus_plugin.so | RecursiveASTVisitor, precise column diagnostics |
| LLVM IR pass | -fpass-plugin=./pagurus_plugin.so | AliasAnalysis + MemorySSA + DominatorTree |

AST level (-fplugin=)

The AST-level plugin provides:

  • E001–E021 checks: Cover use-after-free, double-free, null dereference, memory leaks, borrow conflicts, move semantics, and drop semantics
  • NLL (Non-Lexical Lifetimes): Precise borrow tracking with loan release at last use
  • CFG-aware: Conditional and loop loan propagation
  • Inter-procedural analysis: Function summaries for return-alias and parameter effects
  • Source transformation: Compile mode (default) strips pragmas and injects drop calls
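To make "loan release at last use" concrete, here is a minimal standalone sketch of the idea on a straight-line function body. It is an illustration under assumptions, not the PagurusVisitor logic: the `Event` model, `lastUse`, and `findConflicts` are hypothetical, each reference name is assumed to be bound exactly once, and branches and loops are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Kind { BorrowShared, BorrowMut, Use };

// Hypothetical event in a straight-line body.
struct Event {
    Kind kind;
    std::string ref;     // reference variable being created or used
    std::string target;  // borrowed variable (empty for Use)
};

// Index of the last event at or after `from` that mentions `ref`.
std::size_t lastUse(const std::vector<Event>& body, const std::string& ref, std::size_t from) {
    std::size_t last = from;
    for (std::size_t i = from; i < body.size(); ++i)
        if (body[i].ref == ref) last = i;
    return last;
}

// Returns the indices of borrow events that overlap a still-live loan on the
// same target where at least one of the two loans is mutable.
std::vector<std::size_t> findConflicts(const std::vector<Event>& body) {
    std::vector<std::size_t> conflicts;
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (body[i].kind == Kind::Use) continue;
        for (std::size_t j = 0; j < i; ++j) {
            const Event& earlier = body[j];
            if (earlier.kind == Kind::Use || earlier.target != body[i].target) continue;
            bool eitherMut = earlier.kind == Kind::BorrowMut || body[i].kind == Kind::BorrowMut;
            // The earlier loan is live only up to its reference's last use.
            bool stillLive = lastUse(body, earlier.ref, j) > i;
            if (eitherMut && stillLive) {
                conflicts.push_back(i);
                break;
            }
        }
    }
    return conflicts;
}
```

Under a purely lexical rule, a mutable borrow would block every later borrow of the same variable until the end of its scope. Here the loan ends at the reference's final use, so a second borrow placed after that point is accepted, while one placed before it is reported.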

LLVM IR level (-fpass-plugin=)

The IR-level pass catches patterns invisible at the AST level:

  • Bit-cast / type-pun aliases: Detects aliasing through union type punning or explicit bitcast
  • GEP pointer arithmetic: Tracks pointer arithmetic via GetElementPtr instructions
  • Loop-carried borrows: Uses MemorySSA φ-nodes to detect borrows across loop back-edges
  • Atomic instruction races: Detects AtomicRMW/CmpXchg on borrowed memory
  • Drop injection: Automatically inserts drop calls at llvm.lifetime.end intrinsics

IR checks use:

  • AliasAnalysis::isMustAlias for precise aliasing
  • MemorySSA for def-def pairs and loop-carried dependencies
  • DominatorTree for double-drop prevention and post-dominance
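One plausible reading of the dominance-based double-drop check is sketched below on a toy control-flow graph. This is an assumption-laden illustration rather than the pass's code: `Cfg`, `computeDominators`, and `shouldInjectDrop` are hypothetical names, dominators are computed with the textbook iterative set algorithm instead of LLVM's DominatorTree, every block is assumed reachable from block 0, and ordering inside a single block is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// preds[b] lists the predecessors of block b; block 0 is the entry block.
using Cfg = std::vector<std::vector<std::size_t>>;
using DomSets = std::vector<std::set<std::size_t>>;

// Iterative dominator sets: dom[b] = {b} plus the intersection of dom[p]
// over all predecessors p of b.
DomSets computeDominators(const Cfg& preds) {
    const std::size_t n = preds.size();
    assert(n > 0);
    std::set<std::size_t> all;
    for (std::size_t b = 0; b < n; ++b) all.insert(b);
    DomSets dom(n, all);
    dom[0].clear();
    dom[0].insert(0);  // the entry block is dominated only by itself
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t b = 1; b < n; ++b) {
            std::set<std::size_t> next = all;
            for (std::size_t p : preds[b]) {
                std::set<std::size_t> kept;
                for (std::size_t d : next)
                    if (dom[p].count(d)) kept.insert(d);
                next = kept;
            }
            next.insert(b);
            if (next != dom[b]) {
                dom[b] = next;
                changed = true;
            }
        }
    }
    return dom;
}

// A drop is injected in `block` only if no block that already contains a
// drop dominates it; otherwise every path to `block` has dropped already.
bool shouldInjectDrop(const DomSets& dom, const std::set<std::size_t>& blocksWithDrop,
                      std::size_t block) {
    for (std::size_t d : blocksWithDrop)
        if (dom[block].count(d)) return false;
    return true;
}
```

On a diamond-shaped graph (0 branches to 1 and 2, which both join at 3), a drop in block 1 does not dominate block 3, so block 3 would still get a drop; a drop in block 0 dominates everything and suppresses further injection.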

Compile mode vs Dry-run mode

| Mode | Flag | E020 | Output file |
|---|---|---|---|
| Compile (default) | (none) | suppressed (auto-injected) | <file>.pagurus.c written |
| Dry-run | -Xclang -plugin-arg-pagurus -Xclang dry-run | reported | none |

Compile mode is the default behavior:

  • Strips all #pragma pagurus annotations
  • Injects missing drop calls at scope exit and early return
  • Produces plain C code in <input>.pagurus.c
  • E020 is suppressed because the transformation fixes it

Dry-run mode is an inspection mode that changes nothing on disk:

  • Reports all diagnostics including E020
  • Prints a textual report of what changes would be made
  • Does not write any output file
  • Useful for code review and understanding required changes