Architecture

gus edited this page Feb 27, 2026 · 1 revision

Aguara is a single Go binary with no runtime dependencies. It uses a pipeline architecture with multiple analysis engines.

Directory Structure

aguara.go              Public API: Scan, ScanContent, Discover, ListRules, ExplainRule
options.go             Functional options for the public API
discover/              MCP client discovery: 17 clients, config parsers, auto-detection
cmd/aguara/            CLI entry point (Cobra)
  commands/            Subcommands: scan, discover, list-rules, explain, version, init
internal/
  engine/
    pattern/           Layer 1: regex/contains matcher + base64/hex decoder + code block awareness
    nlp/               Layer 2: goldmark AST walker, keyword classifier, injection detector
    rugpull/           Rug-pull detection analyzer
    toxicflow/         Taint tracking: source -> sink flow analysis
  rules/               Rule engine: YAML loader, compiler, self-tester
    builtin/           148 embedded rules across 12 YAML files (go:embed)
  scanner/             Orchestrator: file discovery, parallel analysis, result aggregation
  meta/                Post-processing: dedup, scoring, cross-finding correlation
  output/              Formatters: terminal (ANSI), JSON, SARIF, Markdown
  config/              .aguara.yml loader
  state/               Persistence for incremental scanning
  types/               Shared types (Finding, Severity, ScanResult)

Analysis Pipeline

Input (file/directory/content)
  │
  ├── File Discovery (glob, git-changed, ignore patterns)
  │
  ├── Target Loading (read files, build Target structs)
  │
  ├── Parallel Analysis (N workers)
  │   ├── Layer 1: Pattern Matcher
  │   │   ├── Regex matching (compiled rules)
  │   │   ├── Contains matching (literal strings)
  │   │   ├── Base64/Hex decoding (finds obfuscated payloads)
  │   │   └── Code block awareness (skips false positives in docs)
  │   │
  │   ├── Layer 2: NLP Analyzer
  │   │   ├── Goldmark AST parsing (Markdown structure)
  │   │   ├── Heading-body mismatch detection
  │   │   ├── Authority claim detection
  │   │   ├── Hidden instruction detection (HTML comments)
  │   │   ├── Code block language mismatch
  │   │   └── Override + dangerous operation combo
  │   │
  │   ├── Toxic Flow Analyzer
  │   │   ├── Source identification (user input, env vars, API responses)
  │   │   ├── Sink identification (exec, eval, shell, network)
  │   │   └── Taint propagation tracking
  │   │
  │   └── Rug-Pull Analyzer
  │       └── Detects patterns that change behavior after initial trust
  │
  ├── Post-Processing (meta package)
  │   ├── Deduplication (same finding, different patterns)
  │   ├── Scoring (combine findings for risk assessment)
  │   └── Cross-finding correlation (e.g., cred access + network = CRITICAL)
  │
  └── Output (terminal, JSON, SARIF, Markdown)
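The post-processing stage can be illustrated with a small sketch. This is not Aguara's meta package: the Finding fields, the category names "credential-access" and "network", and the choice of (rule ID, line) as the deduplication key are all assumptions made for the example.

```go
package main

import "fmt"

// Severity and Finding are simplified stand-ins for the shared types;
// the field set here is an assumption for illustration.
type Severity int

const (
	Low Severity = iota
	Medium
	High
	Critical
)

type Finding struct {
	RuleID   string
	Category string
	Line     int
	Severity Severity
}

// dedup keeps the first finding for each (rule, line) pair, so a rule whose
// several patterns hit the same line is reported once.
func dedup(fs []Finding) []Finding {
	type key struct {
		rule string
		line int
	}
	seen := make(map[key]bool)
	out := make([]Finding, 0, len(fs))
	for _, f := range fs {
		k := key{f.RuleID, f.Line}
		if seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, f)
	}
	return out
}

// correlate raises credential-access and network findings to Critical when
// both categories appear in the same result set. The input is not modified.
func correlate(fs []Finding) []Finding {
	var cred, network bool
	for _, f := range fs {
		switch f.Category {
		case "credential-access":
			cred = true
		case "network":
			network = true
		}
	}
	out := make([]Finding, len(fs))
	copy(out, fs)
	if !(cred && network) {
		return out
	}
	for i := range out {
		if out[i].Category == "credential-access" || out[i].Category == "network" {
			out[i].Severity = Critical
		}
	}
	return out
}

func main() {
	raw := []Finding{
		{RuleID: "EX_CRED", Category: "credential-access", Line: 3, Severity: High},
		{RuleID: "EX_CRED", Category: "credential-access", Line: 3, Severity: High},
		{RuleID: "EX_NET", Category: "network", Line: 9, Severity: Medium},
	}
	for _, f := range correlate(dedup(raw)) {
		fmt.Println(f.RuleID, f.Line, f.Severity == Critical)
	}
}
```

Running deduplication before correlation keeps the correlation step from counting the same hit twice.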

Key Design Decisions

Single binary, no dependencies

Aguara compiles to a static binary with all rules embedded via go:embed. It needs no Python, Node.js, or other runtime, and it works offline.

Deterministic output

Same input always produces the same output. No randomness, no LLM calls, no network access. This makes Aguara suitable for CI and reproducible audits.
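Parallel workers and deterministic output can coexist if results are stored by input position rather than by completion order. The sketch below shows one common way to do that; it is an assumption about how such a guarantee could be met, not a description of Aguara's scanner, and analyzeAll is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// analyzeAll counts occurrences of "eval(" in each target using a pool of
// workers. Each result is written to the slot matching its input index, so
// the output order never depends on goroutine scheduling.
func analyzeAll(targets []string, workers int) []int {
	if workers < 1 {
		workers = 1
	}
	results := make([]int, len(targets))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handed to exactly one worker, so no two
				// goroutines write to the same slot.
				results[i] = strings.Count(targets[i], "eval(")
			}
		}()
	}

	for i := range targets {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	fmt.Println(analyzeAll([]string{"eval(a)", "plain text", "eval(eval(b))"}, 4))
}
```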

Analyzer interface

All analysis engines implement the same Analyzer interface:

type Analyzer interface {
    Analyze(ctx context.Context, target *Target) ([]types.Finding, error)
}

New analyzers can be registered with scanner.RegisterAnalyzer().

Rule compilation

Rules are loaded from YAML, compiled once (regex compilation, pattern validation), and then matched against targets. The compilation step catches errors at startup rather than during scanning.
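A minimal sketch of the compile-once idea, using only the standard regexp package. The rawRule and compiledRule types and the rule IDs are hypothetical, and YAML loading is left out; real rules carry more than an ID and a single pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// rawRule is a hypothetical rule as it might look right after YAML parsing.
type rawRule struct {
	ID      string
	Pattern string
}

// compiledRule holds the regex compiled once, ready for repeated matching.
type compiledRule struct {
	ID string
	Re *regexp.Regexp
}

// compileRules compiles every pattern up front and fails on the first
// invalid one, so a broken rule is reported at startup rather than mid-scan.
func compileRules(raw []rawRule) ([]compiledRule, error) {
	out := make([]compiledRule, 0, len(raw))
	for _, r := range raw {
		re, err := regexp.Compile(r.Pattern)
		if err != nil {
			return nil, fmt.Errorf("rule %s: %w", r.ID, err)
		}
		out = append(out, compiledRule{ID: r.ID, Re: re})
	}
	return out, nil
}

func main() {
	rules, err := compileRules([]rawRule{
		{ID: "EX_001", Pattern: `curl\s+[^|]*\|\s*(ba)?sh`},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(rules[0].Re.MatchString("curl https://example.com/install.sh | bash"))

	// An unbalanced parenthesis is rejected before any target is scanned.
	_, err = compileRules([]rawRule{{ID: "EX_BAD", Pattern: `(`}})
	fmt.Println(err != nil)
}
```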

Import cycle prevention

The types package holds shared types (Finding, Severity, ScanResult). Both scanner and meta import types. The scanner also imports meta for post-processing, so meta must never import scanner; anything both packages need belongs in types.

Code block awareness

The pattern matcher is aware of Markdown code blocks. This reduces false positives when dangerous patterns appear in documentation or examples (e.g., a security guide that shows curl | bash as an example of what NOT to do).
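One simple way to get this kind of awareness is to record which lines fall inside fenced blocks and consult that set when a pattern matches. The sketch below does only that; it is an assumption about the general technique, it handles backtick fences only, and linesInCodeBlocks is a hypothetical helper rather than part of Aguara.

```go
package main

import (
	"fmt"
	"strings"
)

// fence is built with Repeat so the marker never appears at the start of a
// line in this listing.
var fence = strings.Repeat("`", 3)

// linesInCodeBlocks returns the 1-based numbers of lines that sit inside a
// backtick-fenced code block. The fence lines themselves are not included.
func linesInCodeBlocks(content string) map[int]bool {
	inside := false
	marked := make(map[int]bool)
	for i, line := range strings.Split(content, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), fence) {
			inside = !inside // an opening or closing fence toggles the state
			continue
		}
		if inside {
			marked[i+1] = true
		}
	}
	return marked
}

func main() {
	doc := "Never do this:\n" + fence + "sh\ncurl x | bash\n" + fence + "\ncurl y | bash"
	inBlock := linesInCodeBlocks(doc)
	for i, line := range strings.Split(doc, "\n") {
		if strings.Contains(line, "| bash") {
			fmt.Printf("line %d in code block: %v\n", i+1, inBlock[i+1])
		}
	}
}
```

A matcher could then skip or downgrade hits on marked lines while still reporting the same pattern when it appears in running text.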

Exclude patterns

Rules can define exclude_patterns that cancel a match when the surrounding context (matched line + up to 3 lines before) matches an exclusion. This handles cases like installation guide headings that precede legitimate download commands.

Dependencies

Aguara has minimal dependencies:

Dependency   Purpose
cobra        CLI framework
goldmark     Markdown AST parsing (for NLP analyzer)
yaml.v3      Rule YAML loading
testify      Test assertions (test-only)

No network libraries, no database drivers, no cloud SDKs.
