Building pagurus

Prerequisites

Ubuntu 24.04

LLVM 14–18 are in the standard universe repository:

# Replace 18 with 14, 15, 16, or 17 for older LLVM versions.
sudo apt install clang-18 llvm-18-dev libclang-18-dev cmake

Ubuntu 22.04

LLVM 11–14 are in the standard universe repository; LLVM 15–18 require the official LLVM apt repository:

LLVM_VER=18   # 11–14 need no PPA; 15–18 require the lines below
wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key \
  | sudo tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc >/dev/null
echo "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-${LLVM_VER} main" \
  | sudo tee /etc/apt/sources.list.d/llvm-${LLVM_VER}.list
sudo apt-get update
sudo apt install clang-${LLVM_VER} llvm-${LLVM_VER}-dev libclang-${LLVM_VER}-dev cmake

Supported Clang/LLVM versions: 11 through 18 (tested on Ubuntu 22.04 and 24.04, x86_64 and arm64). LLVM 11–13 are only tested on Ubuntu 22.04; Ubuntu 24.04 (noble) does not carry those versions in its standard repositories.
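
The availability rules above can be restated as a small lookup, for example inside a provisioning script. This is only a sketch: the function name `llvm_source` and its output strings are invented for illustration, and the mapping does nothing more than repeat the version ranges listed in this section.

```shell
# Hypothetical helper (not part of pagurus): report where an LLVM major
# version comes from on a given Ubuntu release, per the ranges stated above.
# usage: llvm_source <jammy|noble> <llvm-major>
llvm_source() {
  case "$1:$2" in
    jammy:1[1-4]) echo "universe" ;;       # Ubuntu 22.04, LLVM 11-14
    jammy:1[5-8]) echo "apt.llvm.org" ;;   # Ubuntu 22.04, LLVM 15-18
    noble:1[4-8]) echo "universe" ;;       # Ubuntu 24.04, LLVM 14-18
    *)            echo "unsupported" ;;    # outside the tested matrix
  esac
}
```

A script could call `llvm_source jammy "$LLVM_VER"` and add the apt.llvm.org repository only when the answer requires it.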

Build commands

mkdir build && cd build

LLVM_VER=18   # or 11–17

cmake .. \
  -DLLVM_DIR=$(llvm-config-${LLVM_VER} --cmakedir) \
  -DClang_DIR=/usr/lib/llvm-${LLVM_VER}/lib/cmake/clang

make -j$(nproc)
# → build/pagurus_plugin.so
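
If the configure step is scripted, the two `-D` arguments can be produced by one function so the version number appears in a single place. The sketch below is an illustration, not part of the project: `pagurus_cmake_args` is a made-up name, and the fallback path `/usr/lib/llvm-N/lib/cmake/llvm` assumes the usual Debian/Ubuntu layout and should be verified on your system.

```shell
# Hypothetical helper: print the CMake arguments for one LLVM major version.
# Prefers llvm-config-N when it is on PATH; otherwise falls back to the
# conventional Debian/Ubuntu directory (an assumption, not checked here).
pagurus_cmake_args() {
  ver="$1"
  if command -v "llvm-config-${ver}" >/dev/null 2>&1; then
    llvm_dir="$("llvm-config-${ver}" --cmakedir)"
  else
    llvm_dir="/usr/lib/llvm-${ver}/lib/cmake/llvm"
  fi
  # One argument per line so callers can inspect or split the output.
  printf '%s\n' "-DLLVM_DIR=${llvm_dir}" \
                "-DClang_DIR=/usr/lib/llvm-${ver}/lib/cmake/clang"
}
```

From the build directory this could be used as `cmake .. $(pagurus_cmake_args 18)`, which relies on shell word splitting and therefore on the paths containing no spaces.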

AST-only build (without LLVM IR pass)

To build without the LLVM IR pass, set PAGURUS_WITH_IR_PASS=OFF. This omits the llvm/IR and llvm/Analysis headers and reduces the plugin's runtime symbol dependencies to Clang AST symbols only:

cmake .. -DPAGURUS_WITH_IR_PASS=OFF \
         -DLLVM_DIR=$(llvm-config-${LLVM_VER} --cmakedir) \
         -DClang_DIR=/usr/lib/llvm-${LLVM_VER}/lib/cmake/clang
make -j$(nproc)

This AST-only build still covers all E001–E019 checks and can be built with only libclang-N-dev (no llvm-N-dev needed). For example, with LLVM 18:

sudo apt install clang-18 libclang-18-dev cmake

MLIR Upgrade Path

An optional MLIR-backed analysis tier can be enabled when libmlir-N-dev is installed. It is off by default because the MLIR libraries ship in a separate package (libmlir-N-dev), not in llvm-N-dev.

Version matching: libmlir-*-dev, LLVM, and Clang

MLIR, LLVM, and Clang are all part of the LLVM monorepo and share the same version number in every release. This means:

  • libmlir-N-dev must match the LLVM version (llvm-N-dev) and the Clang version (clang-N).
  • There is no separate MLIR version independent of LLVM or Clang — they are always equal.

| LLVM / Clang version | Required MLIR package | MLIR cmake dir | Ubuntu 22.04 | Ubuntu 24.04 |
|---|---|---|---|---|
| 14 | libmlir-14-dev | /usr/lib/llvm-14/lib/cmake/mlir | ✅ standard universe | ✅ standard universe |
| 15 | libmlir-15-dev | /usr/lib/llvm-15/lib/cmake/mlir | LLVM PPA | ❌ not available |
| 16 | libmlir-16-dev | /usr/lib/llvm-16/lib/cmake/mlir | LLVM PPA | ✅ standard universe |
| 17 | libmlir-17-dev | /usr/lib/llvm-17/lib/cmake/mlir | LLVM PPA | ✅ standard universe |
| 18 | libmlir-18-dev | /usr/lib/llvm-18/lib/cmake/mlir | LLVM PPA | ✅ standard universe |

Note: LLVM 11–13 do not ship libmlir-N-dev in Ubuntu repositories; the MLIR tier requires LLVM ≥ 14.

Installation and build

Replace N with your LLVM/Clang version number (e.g. 14, 16, 17, or 18) in every command below — N is a literal placeholder and cannot be used as-is in the shell:

# Ubuntu 22.04: add LLVM PPA first for LLVM ≥ 15  (skip on Ubuntu 24.04)
# Example for N=18: replace "jammy-18" and "llvm-18.list" accordingly
wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key \
  | sudo tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc >/dev/null
echo "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-N main" \
  | sudo tee /etc/apt/sources.list.d/llvm-N.list
sudo apt-get update

# Install: all five packages must use the same version number N
sudo apt-get install clang-N llvm-N-dev libclang-N-dev libmlir-N-dev mlir-N-tools

cmake .. -DPAGURUS_WITH_MLIR=ON \
         -DLLVM_DIR=$(llvm-config-N --cmakedir) \
         -DClang_DIR=/usr/lib/llvm-N/lib/cmake/clang \
         -DMLIR_DIR=/usr/lib/llvm-N/lib/cmake/mlir
make -j$(nproc)

Concrete example for LLVM 18 on Ubuntu 22.04:

wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key \
  | sudo tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc >/dev/null
echo "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-18 main" \
  | sudo tee /etc/apt/sources.list.d/llvm-18.list
sudo apt-get update
sudo apt-get install clang-18 llvm-18-dev libclang-18-dev libmlir-18-dev mlir-18-tools

cmake .. -DPAGURUS_WITH_MLIR=ON \
         -DLLVM_DIR=$(llvm-config-18 --cmakedir) \
         -DClang_DIR=/usr/lib/llvm-18/lib/cmake/clang \
         -DMLIR_DIR=/usr/lib/llvm-18/lib/cmake/mlir
make -j$(nproc)
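
Because the MLIR build points CMake at three separate package directories, a quick pre-flight check can show which one is missing before configuring. The following is a sketch under assumptions: `missing_cmake_dirs` is a hypothetical name, and it assumes the `lib/cmake/{llvm,clang,mlir}` layout under the install prefix that the commands above imply.

```shell
# Hypothetical pre-flight check: print each CMake package directory that is
# missing under an LLVM install prefix (for example /usr/lib/llvm-18).
# Returns non-zero if at least one directory is missing.
missing_cmake_dirs() {
  prefix="$1"
  rc=0
  for pkg in llvm clang mlir; do
    if [ ! -d "${prefix}/lib/cmake/${pkg}" ]; then
      echo "${pkg}"
      rc=1
    fi
  done
  return "$rc"
}
```

For instance, `missing_cmake_dirs /usr/lib/llvm-18` printing `mlir` would suggest that libmlir-18-dev has not been installed yet.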

Analysis tier comparison

| Capability | AST only (-fplugin=) | LLVM IR pass (-fpass-plugin=) | MLIR (-DPAGURUS_WITH_MLIR=ON) |
|---|---|---|---|
| Alias analysis | Symbolic (variable names) | BasicAA + TBAA + ScopedNoAlias on raw pointer bytes | mlir::AliasAnalysis: dialect-type-aware; understands memref/tensor regions structurally |
| Memory ownership model | Implicit (tracks malloc/free calls) | Implicit (dominance of free() instruction) | Explicit: memref.alloc / memref.dealloc carry region/lifetime metadata; aliasing proved structurally, not heuristically |
| Ownership transfer | Move semantics via pragma annotation | Not tracked at IR level | mlir::bufferization::OwnershipInterface: flow-sensitive ownership transfer without pointer arithmetic |
| Side-effect tracking | Conservative (function call = opaque) | Not tracked beyond free() | mlir::MemoryEffects::Effect annotates every op as ReadEffect / WriteEffect / AllocationEffect / FreeEffect, which enables precise borrow propagation through side-effect-free calls |
| Array bounds (E011) | Constant index only | Constant index only (GEP operands) | Affine maps in the affine dialect cover non-constant symbolic indices (e.g. a[i+n], loop bounds) |
| Loop-carried borrows | ❌ (CFG not modelled) | ✅ via MemorySSA MemPhi nodes | ✅ via structured region semantics (no raw SSA φ needed) |
| GEP / bitcast aliases | ❌ | ✅ (IR-E015b / IR-E015c) | ✅ (dialect types eliminate most GEP ambiguity) |
| Atomic races (E018) | AST-level only (volatile/_Atomic) | AtomicRMWInst / CmpXchgInst on borrowed alloca | mlir::MemoryEffects classifies atomic ops uniformly |
| Drop injection | Source rewrite (compile mode) | ✅ IR-level at llvm.lifetime.end | ✅ Lower-cost: memref.dealloc insertion at dialect level |
| False negatives | Higher (bit-cast aliases invisible) | Lower (covers GEP, bitcast, loop MemPhi) | Lowest (structural type information eliminates most remaining ambiguity) |
| Extra dependency | None | None | libmlir-N-dev (same N as LLVM/Clang; not in standard llvm-dev) |

In short: the LLVM IR pass catches patterns invisible at the AST level (GEP, bitcast, loop-carried borrows). MLIR additionally eliminates the remaining false negatives that stem from LLVM IR's flat pointer model, and extends array-bounds checking to non-constant indices — at the cost of an extra build dependency.

The #ifdef PAGURUS_WITH_MLIR code paths are already in place inside src/pagurus_plugin.cpp and activate when -DPAGURUS_WITH_MLIR=ON is passed to CMake.

Runtime dependencies

pagurus_plugin.so is designed to be loaded into an already-running clang or opt process, so it does not carry libLLVM or libclang-cpp in its ELF NEEDED list. All LLVM and Clang symbols are resolved at dlopen-time from the host executable.

| Library | Version floor | Why |
|---|---|---|
| libstdc++.so.6 | GLIBCXX ≥ 3.4.29 (GCC 11) | C++ standard library |
| libgcc_s.so.1 | any | GCC exception-handling support |
| libc.so.6 | glibc ≥ 2.32 | C library (__libc_single_threaded) |
| clang-N (host) | same major version as the build | ~90 LLVM/Clang symbols resolved from the host process at load time |

Ubuntu summary:

  • Ubuntu 22.04 (glibc 2.35, GCC 12 → GLIBCXX 3.4.30): ✅
  • Ubuntu 24.04 (glibc 2.39, GCC 14 → GLIBCXX 3.4.33): ✅
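
On other distributions the same floors can be checked with a small version comparison. This is a minimal sketch, not something the project ships: `version_ge` is a hypothetical helper that handles up to three numeric dot-separated fields, which is enough for values such as 2.32 or 3.4.29.

```shell
# Hypothetical helper: succeed when dotted version $1 is >= $2.
# Handles up to three numeric fields; missing fields count as 0.
version_ge() {
  IFS=. read -r a1 a2 a3 <<EOF
$1
EOF
  IFS=. read -r b1 b2 b3 <<EOF
$2
EOF
  [ "${a1:-0}" -gt "${b1:-0}" ] && return 0
  [ "${a1:-0}" -lt "${b1:-0}" ] && return 1
  [ "${a2:-0}" -gt "${b2:-0}" ] && return 0
  [ "${a2:-0}" -lt "${b2:-0}" ] && return 1
  [ "${a3:-0}" -ge "${b3:-0}" ]
}

# Example (glibc systems only): getconf GNU_LIBC_VERSION prints e.g. "glibc 2.35".
# version_ge "$(getconf GNU_LIBC_VERSION | cut -d' ' -f2)" 2.32 && echo "glibc OK"
```

The commented example is left inactive because `getconf GNU_LIBC_VERSION` is only meaningful on glibc-based systems.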

The clang-N package (without any -dev suffix) is the only runtime requirement — dev packages (llvm-N-dev, libclang-N-dev) are needed only at build time.

ABI constraint: the plugin must be loaded into the same Clang major version it was compiled against. A plugin built with LLVM_VER=18 (that is, configured against the LLVM 18 CMake packages) will not load correctly under clang-17 or clang-14.
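
A wrapper script can guard against a mismatched host before loading the plugin. The sketch below assumes the usual `clang version X.Y.Z` wording in the first line of `clang --version`; both function names are hypothetical.

```shell
# Hypothetical helper: read `clang --version` output on stdin and print the
# major version number found in the "clang version X.Y.Z" line.
clang_major() {
  sed -n 's/.*clang version \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Succeeds only when the host clang's major version equals $1
# (the LLVM version the plugin was built against).
same_clang_major() {
  [ "$(clang_major)" = "$1" ]
}
```

For example, `clang-17 --version | same_clang_major 18 || echo "plugin/host version mismatch"` would print the warning for a plugin built against LLVM 18.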

Full dependency inspection commands:

# Direct ELF dependencies:
readelf -d pagurus_plugin.so | grep NEEDED

# All symbols resolved from the host clang at load time:
nm -D pagurus_plugin.so | grep ' U '
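
The claim that libLLVM and libclang-cpp are absent from the NEEDED list can be turned into an automated check. This is a sketch that filters `readelf -d` output on stdin; `unexpected_needed` is a hypothetical name, not a script shipped with the project.

```shell
# Hypothetical check: read `readelf -d` output on stdin and print every NEEDED
# entry that names libLLVM or libclang-cpp.
# Exit status 0 means no such entry was found; 1 means at least one was.
unexpected_needed() {
  ! grep 'NEEDED' | grep -E 'libLLVM|libclang-cpp'
}
```

Usage could look like `readelf -d pagurus_plugin.so | unexpected_needed || echo "plugin links LLVM/Clang directly"`.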