Merged
100 changes: 100 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,106 @@
All notable changes to Aguara are documented in this file.
Format based on [Keep a Changelog](https://keepachangelog.com/).

## [0.9.0] — 2026-03-20

Context-aware scanning, false-positive reduction infrastructure, Unicode evasion prevention, and performance optimization.

### Added

#### Context-aware scanning API

New `ScanContentAs()` function accepts a tool name for context-aware false-positive reduction:

```go
result, err := aguara.ScanContentAs(ctx, content, "skill.md", "Edit")
```

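When the scanner knows which tool generated the content, it can automatically skip rules that are known to produce only false positives for that tool. The tool name can also be supplied as an option, `aguara.WithToolName("Edit")`. When both are given, the explicit `toolName` argument takes precedence.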
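
The precedence between the argument and the option can be modeled in isolation. The sketch below is a standalone, hypothetical reduction of the functional-options pattern. `scanConfig`, `Option`, `withToolName` and `resolveToolName` are illustrative names, not the library's exports. Options are applied first, and a non-empty explicit tool name then overrides them, which is what `scanContentInternal` does in this change.

```go
package main

import "fmt"

// scanConfig and Option are hypothetical stand-ins for the library's
// option plumbing; only the tool name is modeled here.
type scanConfig struct {
	toolName string
}

type Option func(*scanConfig)

// withToolName plays the role of aguara.WithToolName in this sketch.
func withToolName(name string) Option {
	return func(c *scanConfig) { c.toolName = name }
}

// resolveToolName applies the options, then lets a non-empty explicit
// tool name (the ScanContentAs argument) override whatever they set.
func resolveToolName(explicit string, opts ...Option) string {
	cfg := &scanConfig{}
	for _, opt := range opts {
		opt(cfg)
	}
	if explicit != "" {
		cfg.toolName = explicit
	}
	return cfg.toolName
}

func main() {
	fmt.Println(resolveToolName("", withToolName("Edit")))     // Edit: option only
	fmt.Println(resolveToolName("Bash", withToolName("Edit"))) // Bash: explicit argument wins
	fmt.Printf("%q\n", resolveToolName(""))                    // "": no tool context
}
```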

#### Built-in tool exemptions

Automatic false-positive elimination for known tool+rule combinations:

| Rule | Exempt tools | Reason |
|------|-------------|--------|
| TC-005 | Bash, Write, Edit, MultiEdit, NotebookEdit, Agent | Shell metacharacters are normal syntax in these tools |
| MCPCFG_002 | Bash, Write, Edit, MultiEdit, NotebookEdit, Agent | MCP config patterns in file-editing tools |
| MCPCFG_004 | WebFetch, Fetch, WebSearch | Remote URLs are the purpose of fetch tools |
| MCPCFG_006 | Bash, Write, Edit, MultiEdit, NotebookEdit | Server config patterns in file-editing tools |
| THIRDPARTY_001 | WebFetch, Fetch, WebSearch | Third-party content is the purpose of fetch tools |

Exemptions activate automatically when a tool name is provided. User config overrides take precedence.
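
As a rough illustration, the exemption table can be expressed as a lookup from rule ID to tool names. This is a hypothetical sketch and does not reflect Aguara's internal representation. It assumes exact, case-sensitive tool-name matching and ignores user config overrides.

```go
package main

import "fmt"

// builtinExemptions transcribes the table above: rule ID -> tools for which
// the rule is skipped. Hypothetical representation for illustration.
var builtinExemptions = map[string][]string{
	"TC-005":         {"Bash", "Write", "Edit", "MultiEdit", "NotebookEdit", "Agent"},
	"MCPCFG_002":     {"Bash", "Write", "Edit", "MultiEdit", "NotebookEdit", "Agent"},
	"MCPCFG_004":     {"WebFetch", "Fetch", "WebSearch"},
	"MCPCFG_006":     {"Bash", "Write", "Edit", "MultiEdit", "NotebookEdit"},
	"THIRDPARTY_001": {"WebFetch", "Fetch", "WebSearch"},
}

// isBuiltinExempt reports whether ruleID should be skipped for toolName.
// Without a tool name nothing is exempt, since exemptions need tool context.
// Matching is exact and case-sensitive (an assumption of this sketch).
func isBuiltinExempt(ruleID, toolName string) bool {
	if toolName == "" {
		return false
	}
	for _, t := range builtinExemptions[ruleID] {
		if t == toolName {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isBuiltinExempt("TC-005", "Bash"))      // true
	fmt.Println(isBuiltinExempt("TC-005", "WebFetch"))  // false
	fmt.Println(isBuiltinExempt("MCPCFG_006", "Agent")) // false: Agent is not listed for MCPCFG_006
}
```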

#### Scan profiles

Three enforcement profiles control how aggressively findings block:

| Profile | Behavior | Use case |
|---------|----------|----------|
| `strict` | All rules enforce (default) | Standalone scanning, untrusted agents |
| `content-aware` | Only TC-001, TC-003, TC-006 block | Development agents (Claude Code, Cursor) |
| `minimal` | TC-001, TC-003, TC-006 flag only | Trusted internal agents |

Findings are always preserved in the result. Only the verdict changes. CLI: `--profile content-aware`.
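
The sketch below shows one way a profile could collapse findings into a single verdict while leaving the findings themselves untouched. The per-rule mapping is an assumption based on the profile table. In `strict`, every finding blocks. In `content-aware`, unlisted rules drop to flag. In `minimal`, unlisted rules are ignored. The real scanner may also weigh severity or other factors.

```go
package main

import "fmt"

// Verdict uses the documented encoding: 0 = clean, 1 = flag, 2 = block.
type Verdict int

const (
	Clean Verdict = iota
	Flag
	Block
)

// listedRules are the three rule IDs named in the profile table.
var listedRules = map[string]bool{"TC-001": true, "TC-003": true, "TC-006": true}

// findingVerdict maps one finding to a verdict under a profile.
// ASSUMPTIONS: strict blocks on every finding; content-aware downgrades
// unlisted rules to flag; minimal ignores unlisted rules entirely.
func findingVerdict(profile, ruleID string) Verdict {
	switch profile {
	case "content-aware":
		if listedRules[ruleID] {
			return Block
		}
		return Flag
	case "minimal":
		if listedRules[ruleID] {
			return Flag
		}
		return Clean
	default: // "strict"
		return Block
	}
}

// overallVerdict is the most severe per-finding verdict. The rule IDs are
// only read, so the findings themselves are preserved.
func overallVerdict(profile string, ruleIDs []string) Verdict {
	v := Clean
	for _, id := range ruleIDs {
		if fv := findingVerdict(profile, id); fv > v {
			v = fv
		}
	}
	return v
}

func main() {
	findings := []string{"TC-005", "MCPCFG_004"} // rule IDs of two hypothetical findings
	for _, p := range []string{"strict", "content-aware", "minimal"} {
		fmt.Printf("%s: verdict=%d\n", p, overallVerdict(p, findings))
	}
}
```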

#### Verdict field

`ScanResult` now includes a `Verdict` field (clean/flag/block) computed from findings and profile:

```json
{"findings": [...], "verdict": 2, "tool_name": "Edit", ...}
```

- `0` = clean (no actionable findings)
- `1` = flag (informational)
- `2` = block (action required)

#### Tool-scoped rules in config

Rules can be restricted to specific tools in `.aguara.yml`:

```yaml
rule_overrides:
  TC-005:
    apply_to_tools: ["Bash"]     # only enforce on Bash
  MCPCFG_004:
    exempt_tools: ["WebFetch"]   # enforce on everything except WebFetch
```

`apply_to_tools` and `exempt_tools` are mutually exclusive per rule.
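
Below is a minimal sketch of how the two keys could be evaluated for a given tool. It assumes that a rule with no tool context (an empty tool name) is always enforced. `ToolScopedRule` copies the two field names of `scanner.ToolScopedRule` from this change. `validate`, `enforcedFor` and `contains` are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolScopedRule copies the two fields of scanner.ToolScopedRule from this
// change; the methods below are hypothetical helpers.
type ToolScopedRule struct {
	ApplyToTools []string
	ExemptTools  []string
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// validate enforces the documented rule that the two keys are mutually exclusive.
func (r ToolScopedRule) validate() error {
	if len(r.ApplyToTools) > 0 && len(r.ExemptTools) > 0 {
		return errors.New("apply_to_tools and exempt_tools are mutually exclusive")
	}
	return nil
}

// enforcedFor reports whether the rule should run for toolName.
// ASSUMPTION: with no tool context (empty name) scoping is not applied.
func (r ToolScopedRule) enforcedFor(toolName string) bool {
	switch {
	case toolName == "":
		return true
	case len(r.ApplyToTools) > 0:
		return contains(r.ApplyToTools, toolName)
	case len(r.ExemptTools) > 0:
		return !contains(r.ExemptTools, toolName)
	default:
		return true
	}
}

func main() {
	tc005 := ToolScopedRule{ApplyToTools: []string{"Bash"}}
	mcpcfg004 := ToolScopedRule{ExemptTools: []string{"WebFetch"}}
	fmt.Println(tc005.enforcedFor("Bash"), tc005.enforcedFor("Edit"))             // true false
	fmt.Println(mcpcfg004.enforcedFor("WebFetch"), mcpcfg004.enforcedFor("Bash")) // false true
	both := ToolScopedRule{ApplyToTools: []string{"Bash"}, ExemptTools: []string{"Edit"}}
	fmt.Println(both.validate()) // apply_to_tools and exempt_tools are mutually exclusive
}
```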

#### NFKC Unicode normalization

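All content is NFKC-normalized before scanning, both in `ScanContent()`/`ScanContentAs()` and in file-based `Scan()`. Fullwidth characters and other compatibility forms are folded to their standard equivalents before pattern matching; fullwidth Latin letters, for example, become plain ASCII. Cross-script homoglyphs, such as Cyrillic "а" standing in for Latin "a", are not compatibility equivalents, so NFKC leaves them unchanged.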

Example: `\uFF29\uFF47\uFF4E\uFF4F\uFF52\uFF45` (fullwidth "Ignore") is normalized to ASCII "Ignore" and detected by existing rules. Zero false-positive cost.
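
Aguara performs this step with `norm.NFKC.String` from `golang.org/x/text`. The sketch below is a dependency-free illustration of the fullwidth case only. It folds U+FF01 to U+FF5E onto ASCII U+0021 to U+007E and maps the ideographic space U+3000 to a regular space, which matches what NFKC does for those code points. It is not a substitute for full NFKC.

```go
package main

import (
	"fmt"
	"strings"
)

// foldFullwidth maps fullwidth ASCII variants (U+FF01..U+FF5E) to ASCII
// (U+0021..U+007E) and the ideographic space U+3000 to ' '. NFKC applies
// these same mappings plus many others; this covers only this one slice.
func foldFullwidth(s string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 0xFF01 && r <= 0xFF5E:
			return r - 0xFEE0
		case r == 0x3000:
			return ' '
		}
		return r
	}, s)
}

func main() {
	evasive := "\uFF29\uFF47\uFF4E\uFF4F\uFF52\uFF45 previous instructions"
	folded := foldFullwidth(evasive)
	fmt.Println(folded)                              // Ignore previous instructions
	fmt.Println(strings.Contains(evasive, "Ignore")) // false
	fmt.Println(strings.Contains(folded, "Ignore"))  // true
}
```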

#### CLI flags

- `--tool-name <name>`: Set tool context for false-positive reduction
- `--profile <strict|content-aware|minimal>`: Set scan enforcement profile
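
The sketch below shows how the two flags could be parsed and validated with Go's standard `flag` package, which accepts both `-name` and `--name`. It is for illustration only and is not Aguara's CLI code. `parseScanFlags` is a hypothetical helper, and its `strict` default mirrors the profile table.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseScanFlags is a hypothetical helper: it parses --tool-name and
// --profile with the standard library and rejects unknown profiles.
func parseScanFlags(args []string) (toolName, profile string, err error) {
	fs := flag.NewFlagSet("scan", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example's output
	fs.StringVar(&toolName, "tool-name", "", "tool context for false-positive reduction")
	fs.StringVar(&profile, "profile", "strict", "scan enforcement profile")
	if err = fs.Parse(args); err != nil {
		return "", "", err
	}
	switch profile {
	case "strict", "content-aware", "minimal":
		return toolName, profile, nil
	}
	return "", "", fmt.Errorf("invalid --profile %q: want strict, content-aware, or minimal", profile)
}

func main() {
	tool, profile, err := parseScanFlags([]string{"--tool-name", "Edit", "--profile", "content-aware"})
	fmt.Println(tool, profile, err) // Edit content-aware <nil>
	_, _, err = parseScanFlags([]string{"--profile", "lenient"})
	fmt.Println(err)
}
```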

#### WASM build

- `make wasm` produces `aguara.wasm` (6.1MB) + `wasm_exec.js`
- Exposes `aguaraScanContent`, `aguaraScanContentAs`, `aguaraListRules` to JavaScript
- Example HTML page at `cmd/wasm/index.html` for browser-based scanning
- Client-side only, no data leaves the browser

### Improved

#### Aho-Corasick multi-pattern matching

Pattern matcher now uses an Aho-Corasick automaton for `contains` patterns. All substring patterns are compiled into a single DFA at initialization. A single pass over the content then finds every occurrence of every pattern, in time linear in the content length plus the number of matches. Rules made up only of `contains` patterns, none of which match, are skipped entirely without running individual pattern matching.

Measured improvement: ~7.5% faster on clean files, which are the majority case. Regex matching remains the main bottleneck.
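
The following compact, self-contained Aho-Corasick matcher over bytes illustrates the technique. The library builds a DFA, but this sketch uses the classic formulation and follows failure links at search time. That keeps the search linear in the input length plus the number of matches (amortized), at the cost of a few extra transitions per byte. The rule-skipping step is modeled by `rulesToRun` with hypothetical rule IDs. None of these names come from Aguara.

```go
package main

import (
	"fmt"
	"sort"
)

// node is one automaton state: trie edges, a failure link, and the indices
// of every pattern that ends here (directly or through failure links).
type node struct {
	next map[byte]int
	fail int
	out  []int
}

type automaton struct{ nodes []node }

// step follows failure links from state s until a transition on c exists,
// then takes it; if even the root has no such edge, it stays at the root.
func (a *automaton) step(s int, c byte) int {
	for s != 0 {
		if _, ok := a.nodes[s].next[c]; ok {
			break
		}
		s = a.nodes[s].fail
	}
	if nxt, ok := a.nodes[s].next[c]; ok {
		return nxt
	}
	return 0
}

// build compiles all (non-empty) patterns into one automaton.
func build(patterns []string) *automaton {
	a := &automaton{nodes: []node{{next: map[byte]int{}}}}
	for i, p := range patterns { // 1. insert every pattern into a trie
		cur := 0
		for j := 0; j < len(p); j++ {
			if _, ok := a.nodes[cur].next[p[j]]; !ok {
				a.nodes = append(a.nodes, node{next: map[byte]int{}})
				a.nodes[cur].next[p[j]] = len(a.nodes) - 1
			}
			cur = a.nodes[cur].next[p[j]]
		}
		a.nodes[cur].out = append(a.nodes[cur].out, i)
	}
	// 2. Breadth-first, so shallower failure targets are finished first; depth-1 states fail to the root.
	var queue []int
	for _, child := range a.nodes[0].next {
		queue = append(queue, child)
	}
	for len(queue) > 0 {
		u := queue[0]
		queue = queue[1:]
		for c, v := range a.nodes[u].next {
			a.nodes[v].fail = a.step(a.nodes[u].fail, c)
			a.nodes[v].out = append(a.nodes[v].out, a.nodes[a.nodes[v].fail].out...)
			queue = append(queue, v)
		}
	}
	return a
}

// matched scans text once and returns the set of pattern indices found.
func (a *automaton) matched(text string) map[int]bool {
	hits := map[int]bool{}
	cur := 0
	for i := 0; i < len(text); i++ {
		cur = a.step(cur, text[i])
		for _, p := range a.nodes[cur].out {
			hits[p] = true
		}
	}
	return hits
}

// rulesToRun keeps rules with at least one matching substring pattern;
// every other rule can be skipped without per-pattern work.
func rulesToRun(rulePatterns map[string][]int, hits map[int]bool) []string {
	var run []string
	for id, idxs := range rulePatterns {
		for _, i := range idxs {
			if hits[i] {
				run = append(run, id)
				break
			}
		}
	}
	sort.Strings(run)
	return run
}

func main() {
	patterns := []string{"ignore previous", "curl ", "| sh", "api_key"}
	rulePatterns := map[string][]int{"RULE_A": {0}, "RULE_B": {1, 2}, "RULE_C": {3}} // hypothetical IDs
	hits := build(patterns).matched("curl https://example.com/install.sh | sh")
	fmt.Println(rulesToRun(rulePatterns, hits)) // [RULE_B]
}
```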

### Summary

**177 YAML rules + 4 dynamic** across 13 categories. 7 distribution channels (+ WASM). 500 tests. 0 lint issues. 2 new dependencies (`golang.org/x/text`, `petar-dambovaliev/aho-corasick`).

---

## [0.8.0] — 2026-03-11

Community contributions, 3-phase security audit, and developer experience improvements.
8 changes: 6 additions & 2 deletions Makefile
@@ -4,7 +4,7 @@ VERSION ?= dev
 COMMIT := $(shell git rev-parse --short HEAD 2>/dev/null || echo "none")
 LDFLAGS := -ldflags "-s -w -X $(PKG)/cmd/aguara/commands.Version=$(VERSION) -X $(PKG)/cmd/aguara/commands.Commit=$(COMMIT)"
 
-.PHONY: build test lint run clean fmt vet
+.PHONY: build test lint run clean fmt vet wasm
 
 build:
 	go build $(LDFLAGS) -o $(BINARY) ./cmd/aguara
@@ -24,5 +24,9 @@ vet:
 run:
 	go run ./cmd/aguara $(ARGS)
 
+wasm:
+	GOOS=js GOARCH=wasm go build -o aguara.wasm ./cmd/wasm
+	cp "$$(go env GOROOT)/lib/wasm/wasm_exec.js" .
+
 clean:
-	rm -f $(BINARY)
+	rm -f $(BINARY) aguara.wasm wasm_exec.js
91 changes: 80 additions & 11 deletions aguara.go
@@ -10,6 +10,8 @@ import (
 	"sort"
 	"strings"
 
+	"golang.org/x/text/unicode/norm"
+
 	"github.com/garagon/aguara/discover"
 	"github.com/garagon/aguara/internal/engine/nlp"
 	"github.com/garagon/aguara/internal/engine/pattern"
@@ -27,6 +29,8 @@ type (
 	Finding     = types.Finding
 	ScanResult  = types.ScanResult
 	ContextLine = types.ContextLine
+	Verdict     = types.Verdict
+	ScanProfile = types.ScanProfile
 )
 
 const (
@@ -35,6 +39,14 @@ const (
 	SeverityMedium   = types.SeverityMedium
 	SeverityHigh     = types.SeverityHigh
 	SeverityCritical = types.SeverityCritical
+
+	VerdictClean = types.VerdictClean
+	VerdictFlag  = types.VerdictFlag
+	VerdictBlock = types.VerdictBlock
+
+	ProfileStrict       = types.ProfileStrict
+	ProfileContentAware = types.ProfileContentAware
+	ProfileMinimal      = types.ProfileMinimal
 )
 
 // Re-export discover types so consumers don't need a separate import.
@@ -49,10 +61,13 @@ func Discover() (*DiscoverResult, error) {
 	return discover.Scan()
 }
 
-// RuleOverride allows changing the severity of a rule or disabling it.
+// RuleOverride allows changing the severity of a rule, disabling it, or
+// restricting it to specific tools.
 type RuleOverride struct {
-	Severity string
-	Disabled bool
+	Severity     string
+	Disabled     bool
+	ApplyToTools []string // only enforce on these tools (mutually exclusive with ExemptTools)
+	ExemptTools  []string // enforce on all tools except these
 }
 
 // RuleInfo provides summary metadata about a detection rule.
@@ -93,11 +108,31 @@ func Scan(ctx context.Context, path string, opts ...Option) (*ScanResult, error)
 
 // ScanContent scans inline content without writing to disk.
 // filename is a hint for rule target matching (e.g. "skill.md", "config.json").
+// Content is NFKC-normalized before scanning to prevent Unicode evasion attacks.
 func ScanContent(ctx context.Context, content string, filename string, opts ...Option) (*ScanResult, error) {
+	return scanContentInternal(ctx, content, filename, "", opts)
+}
+
+// ScanContentAs scans inline content with tool context for false-positive reduction.
+// toolName identifies the tool that generated the content (e.g. "Bash", "Edit", "WebFetch").
+// When provided, built-in tool exemptions and scan profiles can reduce false positives.
+// Content is NFKC-normalized before scanning to prevent Unicode evasion attacks.
+func ScanContentAs(ctx context.Context, content string, filename string, toolName string, opts ...Option) (*ScanResult, error) {
+	return scanContentInternal(ctx, content, filename, toolName, opts)
+}
+
+func scanContentInternal(ctx context.Context, content string, filename string, toolName string, opts []Option) (*ScanResult, error) {
 	if filename == "" {
 		filename = "skill.md"
 	}
+	// NFKC normalization prevents Unicode evasion (e.g. fullwidth "Ｉｇｎｏｒｅ" → "Ignore")
+	content = norm.NFKC.String(content)
+
 	cfg := applyOpts(opts)
+	// Explicit toolName parameter takes precedence over WithToolName option
+	if toolName != "" {
+		cfg.toolName = toolName
+	}
 	s, compiled, err := buildScanner(cfg)
 	if err != nil {
 		return nil, err
@@ -118,7 +153,8 @@ func ScanContent(ctx context.Context, content string, filename string, opts ...O
 // Use WithCategory to filter by category.
 func ListRules(opts ...Option) []RuleInfo {
 	cfg := applyOpts(opts)
-	compiled, _ := loadAndCompile(cfg)
+	cr, _ := loadAndCompile(cfg)
+	compiled := cr.compiled
 
 	sort.Slice(compiled, func(i, j int) bool {
 		return compiled[i].ID < compiled[j].ID
@@ -150,7 +186,8 @@ func ListRules(opts ...Option) []RuleInfo {
 func ExplainRule(id string, opts ...Option) (*RuleDetail, error) {
 	id = strings.ToUpper(strings.TrimSpace(id))
 	cfg := applyOpts(opts)
-	compiled, _ := loadAndCompile(cfg)
+	cr, _ := loadAndCompile(cfg)
+	compiled := cr.compiled
 
 	var found *rules.CompiledRule
 	for _, r := range compiled {
@@ -196,9 +233,14 @@ func applyOpts(opts []Option) *scanConfig {
 	return cfg
 }
 
+type compileResult struct {
+	compiled        []*rules.CompiledRule
+	toolScopedRules map[string]scanner.ToolScopedRule
+}
+
 // loadAndCompile loads built-in (and optionally custom) rules, compiles them,
 // and applies overrides/filters. Used by all public functions.
-func loadAndCompile(cfg *scanConfig) ([]*rules.CompiledRule, error) {
+func loadAndCompile(cfg *scanConfig) (*compileResult, error) {
 	rawRules, err := rules.LoadFromFS(builtin.FS())
 	if err != nil {
 		return nil, fmt.Errorf("loading built-in rules: %w", err)
@@ -215,12 +257,30 @@ func loadAndCompile(cfg *scanConfig) ([]*rules.CompiledRule, error) {
 	compiled, compileErrs := rules.CompileAll(rawRules)
 	_ = compileErrs // non-fatal: invalid rules are skipped
 
+	var toolScoped map[string]scanner.ToolScopedRule
 	if len(cfg.ruleOverrides) > 0 {
 		overrides := make(map[string]rules.RuleOverride, len(cfg.ruleOverrides))
 		for id, ovr := range cfg.ruleOverrides {
-			overrides[id] = rules.RuleOverride{Severity: ovr.Severity, Disabled: ovr.Disabled}
+			overrides[id] = rules.RuleOverride{
+				Severity:     ovr.Severity,
+				Disabled:     ovr.Disabled,
+				ApplyToTools: ovr.ApplyToTools,
+				ExemptTools:  ovr.ExemptTools,
+			}
 		}
 		compiled, _ = rules.ApplyOverrides(compiled, overrides)
+		// Collect tool-scoped overrides for runtime filtering
+		for id, ovr := range overrides {
+			if len(ovr.ApplyToTools) > 0 || len(ovr.ExemptTools) > 0 {
+				if toolScoped == nil {
+					toolScoped = make(map[string]scanner.ToolScopedRule)
+				}
+				toolScoped[id] = scanner.ToolScopedRule{
+					ApplyToTools: ovr.ApplyToTools,
+					ExemptTools:  ovr.ExemptTools,
+				}
+			}
+		}
 	}
 
 	if len(cfg.disabledRules) > 0 {
@@ -231,12 +291,12 @@ func loadAndCompile(cfg *scanConfig) ([]*rules.CompiledRule, error) {
 		compiled = rules.FilterByIDs(compiled, disabled)
 	}
 
-	return compiled, nil
+	return &compileResult{compiled: compiled, toolScopedRules: toolScoped}, nil
 }
 
 // buildScanner creates a fully wired Scanner with all standard analyzers.
 func buildScanner(cfg *scanConfig) (*scanner.Scanner, []*rules.CompiledRule, error) {
-	compiled, err := loadAndCompile(cfg)
+	cr, err := loadAndCompile(cfg)
 	if err != nil {
 		return nil, nil, err
 	}
@@ -249,10 +309,19 @@ func buildScanner(cfg *scanConfig) (*scanner.Scanner, []*rules.CompiledRule, err
 	if cfg.maxFileSize > 0 {
 		s.SetMaxFileSize(cfg.maxFileSize)
 	}
+	if cfg.toolName != "" {
+		s.SetToolName(cfg.toolName)
+	}
+	if cfg.scanProfile != ProfileStrict {
+		s.SetScanProfile(cfg.scanProfile)
+	}
+	if len(cr.toolScopedRules) > 0 {
+		s.SetToolScopedRules(cr.toolScopedRules)
+	}
 
-	s.RegisterAnalyzer(pattern.NewMatcher(compiled))
+	s.RegisterAnalyzer(pattern.NewMatcher(cr.compiled))
 	s.RegisterAnalyzer(nlp.NewInjectionAnalyzer())
 	s.RegisterAnalyzer(toxicflow.New())
 
-	return s, compiled, nil
+	return s, cr.compiled, nil
 }