Per-Project Bash Command Allowlist System by ipodishima · Pull Request #80 · AutoForgeAI/autoforge

ipodishima · 2026-01-22T11:23:12Z

The Problem

AutoCoder agents can only run hardcoded commands (npm, git, curl, etc.), blocking legitimate use cases:

❌ Can't build iOS apps (needs swift, xcodebuild)
❌ Can't build Rust projects (needs cargo, rustc)
❌ Can't run custom build scripts
❌ Can't enforce org-wide security policies

Disclaimer: maybe there are ways of getting around it right now?
Also, I'm considering making updates to autocoder to not only create websites but other things.
Currently experimenting on APIs, CLI tool, and a flutter app

The Solution

Hierarchical, secure-by-default command allowlist system.

Project-Level Config

# .autocoder/allowed_commands.yaml
version: 1
commands:
  - name: swift*        # All Swift tools
  - name: xcodebuild
  - name: ./scripts/build.sh

Organization-Level Config

# ~/.autocoder/config.yaml
version: 1
allowed_commands:
  - name: jq          # Available to ALL projects
blocked_commands:
  - aws               # Blocked across ALL projects
  - kubectl

What's Implemented

✅ Phase 1: Project YAML + Blocklist

Project-specific commands via YAML
Pattern matching (swift* matches all Swift tools)
Hardcoded blocklist (sudo, dd, shutdown - never allowed)
136 unit tests

✅ Phase 2: Org Config + Hierarchy

Org-wide allowlist/blocklist
Hierarchy: Hardcoded Block → Org Block → Org Allow → Global → Project
9 integration tests

📋 Phase 3: Mid-Session Approval (Spec ready, not implemented)

Agent requests approval for unknown commands
Full spec in PHASE3_SPEC.md
Intentionally deferred - Phases 1 & 2 are complete and functional

Security Hierarchy

1. HARDCODED BLOCKLIST (never allowed)
   ↓ sudo, dd, shutdown

2. ORG BLOCKLIST (cannot be overridden)
   ↓ aws, kubectl, terraform

3. ORG ALLOWLIST + GLOBAL ALLOWLIST
   ↓ jq, npm, git, curl

4. PROJECT ALLOWLIST
   ↓ swift, cargo, ./build.sh

Rule: Blocklist always wins.

Key Features

Pattern matching: swift* matches swift, swiftc, swiftlint
Local scripts: ./scripts/build.sh works from any directory
Secure by default: All examples commented out
50 command limit per project
Helpful error messages with config hints
100% backward compatible

Testing

145 tests, all passing:

136 unit tests (pattern matching, YAML, hierarchy)
9 integration tests (real security hook flows)

python test_security.py
python test_security_integration.py

Documentation

examples/README.md - Comprehensive user guide
examples/project_allowed_commands.yaml - Template (all commented)
examples/org_config.yaml - Org config template
PHASE3_SPEC.md - Mid-session approval spec (future)

Files Changed

security.py (+361) - Core implementation
test_security.py (+488) - Unit tests
test_security_integration.py (new) - Integration tests
examples/ (new) - Documentation and templates
PHASE3_SPEC.md (new) - Future enhancement spec

Total: +3,796 lines across 11 files

Why Phase 3 Isn't Included

Phase 3 (mid-session approval) is a UX enhancement, not a core requirement:

Phases 1 & 2 are fully functional
Users can pre-configure commands they need
Better to ship and gather feedback first
Complete implementation spec ready for later

Try It

# Create project
python start.py  # Choose "Create new project"

# Edit config
nano .autocoder/allowed_commands.yaml

# Uncomment what you need:
commands:
  - name: swift
  - name: xcodebuild

# Run agent - now works with iOS! 🎉
python autonomous_agent_demo.py --project-dir your-project

Impact

Enables AutoCoder to build ANY type of project (iOS, Rust, Python, Go) while maintaining security controls.

Summary by CodeRabbit

New Features
- Project-scoped allowed-commands file support and prioritized allowlist hierarchy (hard block → org block → org allow → global allow → project allow)
- Mid-session command approval UI/TUI with Session / Permanent / Deny choices, timeout, and stronger warnings for dangerous commands
Documentation
- Expanded security guide, examples, pattern rules, limits, and testing guidance
Tests
- New unit and integration suites covering patterns, configs, hierarchy, approvals, and enforcement behaviors

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2026-01-22T11:23:39Z

📝 Walkthrough

Walkthrough

Adds a hierarchical, YAML-driven bash command security system with org- and project-level allowlists/blocklists, pattern matching, context propagation from client, extensive docs/examples, tests, and a Phase 3 design plus scaffolding for mid-session user approval (ApprovalManager, CLI/UI/WebSocket flows, and persistence).

Changes

Cohort / File(s)	Summary
Documentation & Spec `CLAUDE.md`, `PHASE3_SPEC.md`	Replaces brief security note with full design and Phase 3 spec: UX flows (CLI/UI), architecture, ApprovalManager API, file-lock persistence, and implementation checklist.
Examples & Guides `examples/README.md`, `examples/org_config.yaml`, `examples/project_allowed_commands.yaml`, `examples/OPTIMIZE_CONFIG.md`	Adds comprehensive example configs, optimization guide, README, and templates documenting pattern rules, limits, precedence, and testing.
Core Security Logic `security.py`	Introduces hierarchical resolution (org/project/global), `BLOCKED_COMMANDS` / `DANGEROUS_COMMANDS`, pattern matching helpers, config loaders/validators, `is_command_allowed`, and updates `bash_security_hook` to enforce precedence and provide guidance on denied commands.
Client Context Propagation `client.py`	Injects `project_dir` into pre-tool-use hook context via new inner wrapper so bash_security_hook resolves project-specific configs.
Phase 3 Scaffolding `PHASE3_SPEC.md` (referenced modules)	Adds detailed mid-session approval design and scaffolding: ApprovalManager spec, `approval_tui.py` sketch, WebSocket message shapes, server handlers, persistence behavior and file-locking examples.
Tests `test_security.py`, `test_security_integration.py`	Adds extensive unit and integration tests for pattern matching, YAML loading/validation, hierarchy resolution, blocklist enforcement, project/org interactions, and limits.
Project Scaffolding & Dependencies `prompts.py`, `requirements.txt`	`prompts.py` creates `.autocoder` in new projects and copies example YAML when present; `requirements.txt` adds `pyyaml>=6.0.0`.

Sequence Diagram(s)

sequenceDiagram
    participant Agent
    participant PreToolHook as PreToolUse Hook
    participant BashHook as bash_security_hook
    participant OrgLoader as Org Config Loader
    participant ProjectLoader as Project Config Loader
    participant ApprovalMgr as ApprovalManager
    participant UI as UI/CLI
    participant FS as File System (YAML)

    Agent->>PreToolHook: invoke pre-tool hook (Bash)
    PreToolHook->>BashHook: call with context (includes project_dir)
    BashHook->>OrgLoader: load org config (~/.autocoder/config.yaml)
    OrgLoader-->>BashHook: org allow/block sets
    BashHook->>ProjectLoader: load project config (.autocoder/allowed_commands.yaml)
    ProjectLoader-->>BashHook: project allow set
    BashHook->>BashHook: compute effective allow/block (precedence)
    alt command in blocked set
        BashHook-->>Agent: return block decision
    else command not allowed
        BashHook->>ApprovalMgr: request_approval(command)
        ApprovalMgr->>UI: emit approval_request (CLI overlay or WebSocket)
        UI->>ApprovalMgr: user responds (session / permanent / deny)
        ApprovalMgr-->>BashHook: response (allow/block)
        alt permanent
            BashHook->>FS: persist to project YAML (file lock)
            FS-->>BashHook: write success
        end
        BashHook-->>Agent: allow or block decision
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat: add Create Spec button and fix Windows asyncio subprocess #78 — Modifies pre-tool-use hook behavior and permission handling; closely related to the new approval/allowlist changes in security.py and client.py.

Poem

🐇 I hopped through configs, YAML in paw,

Orgs set the fences, projects keep law,
A ping for approval, I thumped my foot,
Session or save — decisions take root,
I nibble safe commands and cheer, "Toot toot!"

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Per-Project Bash Command Allowlist System' clearly and concisely summarizes the main feature added in this PR: a hierarchical command allowlist system with project-specific configuration.
Docstring Coverage	✅ Passed	Docstring coverage is 94.29% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🤖 Fix all issues with AI agents

In `@security.py`:
- Around line 9-13: The import block in security.py is not sorted into
standard-library and third-party groups; reorder and format the imports so
standard-library imports (os, shlex, pathlib.Path, typing.Optional) come first,
alphabetized, then a blank line, then third-party imports (yaml); ensure the
import lines referencing os, shlex, Path, Optional and yaml are adjusted
accordingly to satisfy the CI import-sorting rule.
- Around line 357-388: The matches_pattern function currently treats a pattern
of "*" as a prefix wildcard that matches everything; update validation to reject
bare wildcards and/or harden matching: in validate_project_command (the config
validation path) explicitly disallow a pattern equal to "*" (raise/return
validation error), and in matches_pattern add an early check that if pattern ==
"*" return False (or treat as invalid) so a lone asterisk cannot match all
commands; reference the functions matches_pattern and validate_project_command
when making these changes.

In `@test_security_integration.py`:
- Around line 346-348: The f-string prefix on the string passed to
(autocoder_dir / "allowed_commands.yaml").write_text is unnecessary because
there are no placeholders; change the call to use a regular string by removing
the leading f so it becomes "version: 1\ncommands:\n" + "\n".join(commands),
keeping the same variables (autocoder_dir, commands) and preserving the
write_text invocation.
- Around line 24-25: Remove the unused import by deleting the create_client
import from test_security_integration.py; locate the top-level import line that
reads "from client import create_client" and remove that symbol so only the used
import (bash_security_hook) remains to resolve the F401 unused-import lint
error.

In `@test_security.py`:
- Around line 266-268: The test constructs a YAML string using
config_path.write_text(f"version: 1\ncommands:\n" + "\n".join(commands)) but the
f-string has no placeholders (causing F541); change the f-string to a regular
string literal by removing the leading f so config_path.write_text("version:
1\ncommands:\n" + "\n".join(commands)) to eliminate the unnecessary f-prefix
while keeping the concatenation with "\n".join(commands) and leaving variables
commands and config_path intact.
- Around line 12-30: Remove the unused imports get_project_allowed_commands,
is_command_allowed, and get_org_config_path from the security import list and
reorder/sort the import blocks per I001 (standard library imports first, then
local/package imports) so the import section is alphabetized and grouped
correctly; update the import statement that currently lists bash_security_hook,
extract_commands, validate_chmod_command, validate_init_script,
load_project_commands, load_org_config, validate_project_command,
matches_pattern, BLOCKED_COMMANDS, DANGEROUS_COMMANDS to reflect the removal and
proper sorted order.
- Line 331: Remove the unused local variable blocked_cmds in test_security.py:
the line assigning blocked_cmds = list(BLOCKED_COMMANDS) +
list(DANGEROUS_COMMANDS) should be deleted (or replaced by using the combined
list where needed). Locate the assignment of blocked_cmds and either remove that
line or refactor the code to use the combined list directly in the assertion or
helper that needs BLOCKED_COMMANDS and DANGEROUS_COMMANDS.

🧹 Nitpick comments (7)

examples/README.md (1)
23-27: Add language specifier to fenced code blocks.

Static analysis flagged code blocks without language specifiers. Use text or plaintext for ASCII diagrams and console output.
📝 Proposed fix
-```
+```text
 my-project/
   .autocoder/
     allowed_commands.yaml    ← Automatically created from template
-```
+```
Apply similar changes to the hierarchy diagram (line 122) and console output blocks (lines 504, 509).
security.py (3)
401-450: Consider logging validation failures for debugging.

When config validation fails, the function silently returns None. Users may not understand why their configuration is being ignored. Consider logging a warning when a config file exists but fails validation.
🔧 Proposed improvement
+import logging
+
+logger = logging.getLogger(__name__)
+
 def load_org_config() -> Optional[dict]:
     # ... existing code ...
 
     if not isinstance(config, dict):
+        logger.warning("Org config ignored: root must be a dict")
         return None
 
     if "version" not in config:
+        logger.warning("Org config ignored: missing 'version' field")
         return None
486-488: Silent rejection of config exceeding 50 commands.

When the 50-command limit is exceeded, the entire config is silently ignored. Users would benefit from knowing why their config isn't taking effect.

684-703: Distinguish between hardcoded and org blocklist in error messages.

The error message on line 689 says "blocked at organization level" but the command could also be in the hardcoded BLOCKED_COMMANDS. Consider making the message more accurate.
📝 Proposed improvement
         # Check blocklist first (highest priority)
         if cmd in blocked_commands:
+            # Determine source of block for clearer messaging
+            if cmd in BLOCKED_COMMANDS or cmd in DANGEROUS_COMMANDS:
+                reason = f"Command '{cmd}' is permanently blocked and cannot be approved."
+            else:
+                reason = f"Command '{cmd}' is blocked at organization level and cannot be approved."
             return {
                 "decision": "block",
-                "reason": f"Command '{cmd}' is blocked at organization level and cannot be approved.",
+                "reason": reason,
             }
PHASE3_SPEC.md (2)
64-64: Add language identifiers to fenced code blocks.

Several fenced code blocks lack language specifiers (e.g., text or plaintext for ASCII diagrams/UX mockups). This helps with syntax highlighting and satisfies markdown lint rules.
Example fix for lines 64, 88, 111, 123, 153
-```
+```text
 Agent is working...
Also applies to: 88-88, 111-111, 123-123, 153-153

1418-1418: Add language identifier for the audit log example.
-```
+```text
 [2026-01-22 10:30:45] User approved 'swift' (session-only)
test_security.py (1)
405-458: Consider using a context manager or fixture for HOME override.

The test_org_config_loading and similar tests directly modify os.environ["HOME"]. While the cleanup is handled, consider extracting a helper context manager to reduce code duplication and ensure cleanup even on exceptions.
Example helper context manager
from contextlib import contextmanager

`@contextmanager`
def override_home(new_home: str):
    """Temporarily override HOME environment variable."""
    original_home = os.environ.get("HOME")
    os.environ["HOME"] = new_home
    try:
        yield
    finally:
        if original_home:
            os.environ["HOME"] = original_home
        else:
            del os.environ["HOME"]
Usage:
with override_home(tmphome):
    # test code here

security.py

test_security_integration.py

test_security.py

Implement hierarchical command security with project and org-level configs: WHAT'S NEW: - Project-level YAML config (.autocoder/allowed_commands.yaml) - Organization-level config (~/.autocoder/config.yaml) - Pattern matching (exact, wildcards, local scripts) - Hardcoded blocklist (sudo, dd, shutdown - never allowed) - Org blocklist (terraform, kubectl - configurable) - Helpful error messages with config hints - Comprehensive documentation and examples ARCHITECTURE: - Hierarchical resolution: Hardcoded → Org Block → Org Allow → Global → Project - YAML validation with 50 command limit per project - Pattern matching: exact ("swift"), wildcards ("swift*"), scripts ("./build.sh") - Secure by default: all examples commented out TESTING: - 136 unit tests (pattern matching, YAML, hierarchy, validation) - 9 integration tests (real security hook flows) - All tests passing, 100% backward compatible DOCUMENTATION: - examples/README.md - comprehensive guide with use cases - examples/project_allowed_commands.yaml - template (all commented) - examples/org_config.yaml - org config template (all commented) - PHASE3_SPEC.md - mid-session approval spec (future enhancement) - Updated CLAUDE.md with security model documentation USE CASES: - iOS projects: Add Swift toolchain (xcodebuild, swift*, etc.) - Rust projects: Add cargo, rustc, clippy - Enterprise: Block aws, kubectl, terraform org-wide - Custom scripts: Allow ./scripts/build.sh PHASES: ✅ Phase 1: Project YAML + blocklist (implemented) ✅ Phase 2: Org config + hierarchy (implemented) 📋 Phase 3: Mid-session approval (spec ready, not implemented) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add validation to reject bare wildcards for security: - matches_pattern(): return False if pattern == '*' - validate_project_command(): reject name == '*' with clear error - Added 4 new tests for bare wildcard rejection This prevents a config with from matching every command, which would be a major security risk. Tests: 140 unit tests passing (added 4 bare wildcard tests)

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

CLAUDE.md (1)
64-64: Add languages to fenced code blocks (MD040).
A few fences are missing language tags; adding bash, yaml, or text will satisfy markdownlint.
🛠️ Example fix pattern
-```
+```text
 ...
-```
+```
Also applies to: 88-88, 111-111, 123-123, 153-153, 1418-1418
security.py (1)
655-715: Clarify block reason and approval guidance.

blocked_commands includes hardcoded + dangerous entries, so “blocked at organization level” is misleading. Also, the error text suggests mid‑session approval even though it’s not implemented. Consider neutral wording or flagging it as “planned.”
🔧 Suggested fix
-                "reason": f"Command '{cmd}' is blocked at organization level and cannot be approved.",
+                "reason": f"Command '{cmd}' is blocked by policy and cannot be approved.",
             }
@@
-            error_msg += "  2. Request mid-session approval (the agent can ask)\n"
+            error_msg += "  2. Request mid-session approval (planned; not yet available)\n"

🤖 Fix all issues with AI agents

In `@examples/README.md`:
- Around line 23-27: The fenced code blocks in examples/README.md are missing
language identifiers (MD040); update each triple-backtick block (including the
shown block and those at the ranges referenced: 122-152, 504-515, 509-515) to
include an appropriate language token (e.g., ```text, ```bash, or ```yaml) and
ensure each opening fence has a matching closing fence so markdownlint MD040 is
satisfied; search for the plain ``` blocks in README.md and replace them with
the correct fenced code labels for their content.

In `@PHASE3_SPEC.md`:
- Line 64: Several fenced code blocks in PHASE3_SPEC.md are missing language
identifiers; update each missing triple-backtick fence to include the
appropriate language token (e.g., text, bash, yaml, python, typescript) so
Markdown lint rule MD040 is satisfied. Locate the unlabeled fences indicated in
the review and for each decide the correct language context (examples: plain
output -> text, shell commands -> bash, config -> yaml, snippets ->
python/typescript) and add the language immediately after the opening ``` marker
for each block.
- Around line 1552-1567: Replace the emphasized "Total: 24-36 hours (3-4.5
days)" line under the "Estimated Timeline" section with a proper Markdown
heading (e.g., prepend one to three # characters) so it is a heading instead of
bold text; locate the line directly after the timeline table in PHASE3_SPEC.md
and update it to a heading-level that matches the document structure (for
example "### Total: 24-36 hours (3-4.5 days)").

In `@security.py`:
- Around line 358-396: The path-matching branch in matches_pattern only handles
patterns starting with "./" or "../", so patterns like "scripts/test.sh" never
match; change the condition to treat any pattern containing a "/" as a path
pattern (e.g., if "/" in pattern) and keep the existing behavior: extract
pattern_name via os.path.basename(pattern) and return True when command equals
pattern, equals pattern_name, or endswith("/" + pattern_name); leave the
existing wildcard and exact-match checks (and the "*" guard) intact.
- Around line 399-458: load_org_config currently validates allowed_commands
entries only for dict presence and "name" key, allowing non-string or empty
names to propagate into matches_pattern; update load_org_config so that inside
the allowed_commands loop (the block referencing "allowed" and iterating "for
cmd in allowed") you reject any entry where cmd["name"] is not a non-empty
string (e.g., not isinstance(cmd["name"], str) or cmd["name"].strip() == ""),
returning None for an invalid config; keep other existing structure checks
(dict/list) and ensure the change prevents non-string names from reaching
matches_pattern.

In `@test_security_integration.py`:
- Around line 201-241: The test mutates HOME directly and may skip restoration
if bash_security_hook raises; wrap the environment changes in a safe context
manager or use pytest's monkeypatch to set and restore HOME (and on Windows also
USERPROFILE, HOMEDRIVE, HOMEPATH) for the scope of the test; update the block
around tempfile.TemporaryDirectory() and the call to bash_security_hook (and the
similar block in test_org_allowlist_inheritance) to use a try/finally or a
contextmanager that saves original env values and restores them regardless of
exceptions, referencing the test function and the bash_security_hook call to
locate where to apply the change.

examples/README.md

PHASE3_SPEC.md

security.py

test_security_integration.py

Changes: - Increase command limit from 50 to 100 per project - Add examples/OPTIMIZE_CONFIG.md with optimization strategies - Update all documentation references (50 → 100) - Update tests for new limit Rationale: - 50 was too restrictive for projects with many tools (Flutter, etc.) - Users were unknowingly exceeding limit by listing subcommands - 100 provides headroom while maintaining security - New guide teaches wildcard optimization (flutter* vs listing each subcommand) UI feedback idea: Show command count and optimization suggestions (tracked for Phase 3 or future enhancement)

Changes: - Support path patterns without ./ prefix (e.g., 'scripts/test.sh') - Reject non-string or empty command names in org config - Add 8 new test cases (5 for path patterns, 3 for validation) Details: - matches_pattern() now treats any pattern with '/' as a path pattern - load_org_config() validates that cmd['name'] is a non-empty string - All 148 unit tests + 9 integration tests passing Security hardening: Prevents invalid command names from reaching pattern matching logic, reducing attack surface.

Changes: - Add temporary_home() context manager for safe HOME manipulation - Handle both Unix (HOME) and Windows (USERPROFILE, HOMEDRIVE, HOMEPATH) - Update test_org_blocklist_enforcement to use context manager - Update test_org_allowlist_inheritance to use context manager Benefits: - Environment variables always restored, even on exceptions - Prevents test pollution across test runs - Cross-platform compatibility (Windows + Unix) All 9 integration tests passing.

coderabbitai

Actionable comments posted: 6

🤖 Fix all issues with AI agents

In `@CLAUDE.md`:
- Around line 241-247: Remove the hard-coded test counts next to the test
commands—specifically the "(136 tests - fast)" and "(9 tests - uses real hooks)"
annotations that follow the lines containing "python test_security.py" and
"python test_security_integration.py"; either delete the counts entirely or
replace them with a stable descriptor like "(unit tests)" and "(integration
tests - uses real hooks)" or a note that counts may change and to run the
commands to see current results.

In `@examples/OPTIMIZE_CONFIG.md`:
- Around line 111-117: The example uses a glob './scripts/*.sh' which the
current matcher/path-pattern logic does not support; fix by either updating the
example to list explicit script filenames instead of a glob (replace
'./scripts/*.sh' with the actual script names) or implement glob support in the
matcher by extending the path-pattern logic to recognize and expand patterns
like '*.sh' (update the matcher/path-pattern matching function to handle
wildcards and directory globs so './scripts/*.sh' will match files under
scripts/).
- Line 119: The bold-only emphasis line "**Reduced from 65 → 15 commands (77%
reduction!)**" should be converted to a real Markdown heading to satisfy MD036;
replace the bold markers with an appropriate heading level (e.g., "### Reduced
from 65 → 15 commands (77% reduction!)" or "## ...") and remove the surrounding
** so the text becomes a proper heading rather than bold-only emphasis.
- Around line 194-218: The fenced UI mockup blocks (the ones starting with "⚠️ 
Config Optimization Available" and "📊 Command Usage Stats") are missing
language identifiers which triggers MD040; add explicit language tags (e.g.,
```text) to each fenced code block so the fences are ```text ... ``` rather than
plain ``` to satisfy the linter and preserve formatting for the UI mockups.

In `@examples/README.md`:
- Around line 437-466: Update the README testing section under the "Unit Tests"
and "Integration Tests" headings: remove or update the stale numeric test count
("136 tests - fast") and change the "50 command limit" text to reflect the
current 100-command cap (e.g., "100 command limit" or simply "command limit
(100)"); alternatively drop exact test counts altogether to avoid future drift.
Locate the strings "Unit Tests" / "Integration Tests" in the diff and update the
lines describing the test counts and the command limit accordingly.

In `@test_security.py`:
- Around line 413-652: The tests fail on Windows because they only set HOME and
manually restore it; create a reusable temporary_home context manager (e.g.,
temporary_home) that sets HOME plus Windows equivalents USERPROFILE, HOMEDRIVE
and HOMEPATH to the temp dir and reliably restores original values in a
try/finally (or contextmanager) to avoid leaks on exceptions, then replace the
inline os.environ manipulation inside test_org_config_loading,
test_hierarchy_resolution, and test_org_blocklist_enforcement to use this
context (wrapping the whole test body) so Path.home() and Windows lookups pick
up the temp directory consistently.

coderabbitai · 2026-01-22T15:43:52Z

CLAUDE.md

+```bash
+# Unit tests (136 tests - fast)
+python test_security.py
+
+# Integration tests (9 tests - uses real hooks)
+python test_security_integration.py
+```


⚠️ Potential issue | 🟡 Minor

Avoid hard‑coding test counts in docs.
The counts can drift quickly; consider removing the numbers or ensuring they’re updated alongside test changes.

📝 Suggested doc adjustment

-# Unit tests (136 tests - fast) +# Unit tests (fast) ... -# Integration tests (9 tests - uses real hooks) +# Integration tests (uses real hooks)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```bash

# Unit tests (136 tests - fast)

python test_security.py

# Integration tests (9 tests - uses real hooks)

python test_security_integration.py

```

# Unit tests (fast)

python test_security.py

# Integration tests (uses real hooks)

python test_security_integration.py

🤖 Prompt for AI Agents

In `@CLAUDE.md` around lines 241 - 247, Remove the hard-coded test counts next to the test commands—specifically the "(136 tests - fast)" and "(9 tests - uses real hooks)" annotations that follow the lines containing "python test_security.py" and "python test_security_integration.py"; either delete the counts entirely or replace them with a stable descriptor like "(unit tests)" and "(integration tests - uses real hooks)" or a note that counts may change and to run the commands to see current results.

coderabbitai · 2026-01-22T15:43:52Z

examples/OPTIMIZE_CONFIG.md

+  - name: ios-deploy
+    description: Deploy to iOS devices
+
+  # Project scripts
+  - name: ./scripts/*.sh
+    description: All project build/test scripts
+```


⚠️ Potential issue | 🟡 Minor

Documented script globbing isn’t supported by the matcher.
./scripts/*.sh won’t match with the current path‑pattern logic (exact name only). Either update the example to explicit script names or implement glob support.

📝 Suggested doc adjustment

- - name: ./scripts/*.sh - description: All project build/test scripts + - name: ./scripts/build.sh + description: Build script (repeat per script; globbing not supported yet)

🤖 Prompt for AI Agents

In `@examples/OPTIMIZE_CONFIG.md` around lines 111 - 117, The example uses a glob './scripts/*.sh' which the current matcher/path-pattern logic does not support; fix by either updating the example to list explicit script filenames instead of a glob (replace './scripts/*.sh' with the actual script names) or implement glob support in the matcher by extending the path-pattern logic to recognize and expand patterns like '*.sh' (update the matcher/path-pattern matching function to handle wildcards and directory globs so './scripts/*.sh' will match files under scripts/).

coderabbitai · 2026-01-22T15:43:52Z

examples/OPTIMIZE_CONFIG.md

+    description: All project build/test scripts
+```
+
+**Reduced from 65 → 15 commands (77% reduction!)**


⚠️ Potential issue | 🟡 Minor

Convert bold-only emphasis to a real heading (MD036).

📝 Suggested doc adjustment

-**Reduced from 65 → 15 commands (77% reduction!)** +#### Reduced from 65 → 15 commands (77% reduction!)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

**Reduced from 65 → 15 commands (77% reduction!)**

#### Reduced from 65 → 15 commands (77% reduction!)

🧰 Tools

🪛 markdownlint-cli2 (0.18.1)

119-119: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🤖 Prompt for AI Agents

In `@examples/OPTIMIZE_CONFIG.md` at line 119, The bold-only emphasis line "**Reduced from 65 → 15 commands (77% reduction!)**" should be converted to a real Markdown heading to satisfy MD036; replace the bold markers with an appropriate heading level (e.g., "### Reduced from 65 → 15 commands (77% reduction!)" or "## ...") and remove the surrounding ** so the text becomes a proper heading rather than bold-only emphasis.

coderabbitai · 2026-01-22T15:43:52Z

examples/OPTIMIZE_CONFIG.md

+```
+⚠️  Config Optimization Available
+
+Your config has 65 commands. We detected opportunities to reduce it:
+
+• 25 flutter commands → Use flutter* (saves 24 entries)
+• 8 dart commands → Use dart* (saves 7 entries)
+• 12 adb commands → Use adb* (saves 11 entries)
+
+[Optimize Automatically] [Keep As-Is]
+
+Potential reduction: 65 → 23 commands
+```
+
+Or during config editing:
+
+```
+📊 Command Usage Stats
+
+flutter*        : 25 subcommands detected
+dart*           : 8 subcommands detected
+./scripts/*.sh  : 7 scripts detected
+
+💡 Tip: Using wildcards covers all subcommands automatically
+```


⚠️ Potential issue | 🟡 Minor

Add language identifiers to fenced UI mockups (MD040).

📝 Suggested doc adjustment

-``` +```text ⚠️ Config Optimization Available ... Potential reduction: 65 → 23 commands -``` +``` ... -``` +```text 📊 Command Usage Stats ... 💡 Tip: Using wildcards covers all subcommands automatically -``` +```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```

⚠️ Config Optimization Available

Your config has 65 commands. We detected opportunities to reduce it:

• 25 flutter commands → Use flutter* (saves 24 entries)

• 8 dart commands → Use dart* (saves 7 entries)

• 12 adb commands → Use adb* (saves 11 entries)

[Optimize Automatically] [Keep As-Is]

Potential reduction: 65 → 23 commands

```

Or during config editing:

```

📊 Command Usage Stats

flutter* : 25 subcommands detected

dart* : 8 subcommands detected

./scripts/*.sh : 7 scripts detected

💡 Tip: Using wildcards covers all subcommands automatically

```

🧰 Tools

🪛 markdownlint-cli2 (0.18.1)

194-194: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

210-210: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

In `@examples/OPTIMIZE_CONFIG.md` around lines 194 - 218, The fenced UI mockup blocks (the ones starting with "⚠️ Config Optimization Available" and "📊 Command Usage Stats") are missing language identifiers which triggers MD040; add explicit language tags (e.g., ```text) to each fenced code block so the fences are ```text ... ``` rather than plain ``` to satisfy the linter and preserve formatting for the UI mockups.

coderabbitai · 2026-01-22T15:43:52Z

examples/README.md

+**Unit Tests** (136 tests - fast):
+```bash
+source venv/bin/activate
+python test_security.py
+```
+
+Tests:
+- Pattern matching (exact, wildcards, scripts)
+- YAML loading and validation
+- Blocklist enforcement
+- Project and org config hierarchy
+- All existing security validations
+
+**Integration Tests** (9 tests - uses real security hooks):
+```bash
+source venv/bin/activate
+python test_security_integration.py
+```
+
+Tests:
+- Blocked commands are rejected (sudo, shutdown, etc.)
+- Default commands work (ls, git, npm, etc.)
+- Non-allowed commands are blocked (wget, python, etc.)
+- Project config allows commands (swift, xcodebuild, etc.)
+- Pattern matching works (swift* matches swiftlint)
+- Org blocklist cannot be overridden
+- Org allowlist is inherited by projects
+- Invalid YAML is safely ignored
+- 50 command limit is enforced
+


⚠️ Potential issue | 🟡 Minor

Update testing section to reflect the 100‑command limit and avoid stale counts.
The testing section still calls out “136 tests” and a “50 command limit,” which no longer matches the 100‑command cap described elsewhere. Consider removing exact counts or updating them, and fix the limit text.

📝 Suggested doc update

-**Unit Tests** (136 tests - fast): +**Unit Tests** (fast): ... -- 50 command limit is enforced +- 100 command limit is enforced

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

**Unit Tests** (136 tests - fast):

```bash

source venv/bin/activate

python test_security.py

```

Tests:

- Pattern matching (exact, wildcards, scripts)

- YAML loading and validation

- Blocklist enforcement

- Project and org config hierarchy

- All existing security validations

**Integration Tests** (9 tests - uses real security hooks):

```bash

source venv/bin/activate

python test_security_integration.py

```

Tests:

- Blocked commands are rejected (sudo, shutdown, etc.)

- Default commands work (ls, git, npm, etc.)

- Non-allowed commands are blocked (wget, python, etc.)

- Project config allows commands (swift, xcodebuild, etc.)

- Pattern matching works (swift* matches swiftlint)

- Org blocklist cannot be overridden

- Org allowlist is inherited by projects

- Invalid YAML is safely ignored

- 50 command limit is enforced

**Unit Tests** (fast):

🤖 Prompt for AI Agents

In `@examples/README.md` around lines 437 - 466, Update the README testing section under the "Unit Tests" and "Integration Tests" headings: remove or update the stale numeric test count ("136 tests - fast") and change the "50 command limit" text to reflect the current 100-command cap (e.g., "100 command limit" or simply "command limit (100)"); alternatively drop exact test counts altogether to avoid future drift. Locate the strings "Unit Tests" / "Integration Tests" in the diff and update the lines describing the test counts and the command limit accordingly.

coderabbitai · 2026-01-22T15:43:52Z

test_security.py

+def test_org_config_loading():
+    """Test organization-level config loading."""
+    print("\nTesting org config loading:\n")
+    passed = 0
+    failed = 0
+
+    # Save original org config path
+    original_home = Path.home()
+
+    with tempfile.TemporaryDirectory() as tmpdir:
+        # Temporarily override home directory for testing
+        import os
+        os.environ["HOME"] = tmpdir
+
+        org_dir = Path(tmpdir) / ".autocoder"
+        org_dir.mkdir()
+        org_config_path = org_dir / "config.yaml"
+
+        # Test 1: Valid org config
+        org_config_path.write_text("""version: 1
+allowed_commands:
+  - name: jq
+    description: JSON processor
+blocked_commands:
+  - aws
+  - kubectl
+""")
+        config = load_org_config()
+        if config and config["version"] == 1:
+            if len(config["allowed_commands"]) == 1 and len(config["blocked_commands"]) == 2:
+                print("  PASS: Load valid org config")
+                passed += 1
+            else:
+                print("  FAIL: Load valid org config (wrong counts)")
+                failed += 1
+        else:
+            print("  FAIL: Load valid org config")
+            print(f"         Got: {config}")
+            failed += 1
+
+        # Test 2: Missing file returns None
+        org_config_path.unlink()
+        config = load_org_config()
+        if config is None:
+            print("  PASS: Missing org config returns None")
+            passed += 1
+        else:
+            print("  FAIL: Missing org config returns None")
+            failed += 1
+
+        # Test 3: Non-string command name is rejected
+        org_config_path.write_text("""version: 1
+allowed_commands:
+  - name: 123
+    description: Invalid numeric name
+""")
+        config = load_org_config()
+        if config is None:
+            print("  PASS: Non-string command name rejected")
+            passed += 1
+        else:
+            print("  FAIL: Non-string command name rejected")
+            print(f"         Got: {config}")
+            failed += 1
+
+        # Test 4: Empty command name is rejected
+        org_config_path.write_text("""version: 1
+allowed_commands:
+  - name: ""
+    description: Empty name
+""")
+        config = load_org_config()
+        if config is None:
+            print("  PASS: Empty command name rejected")
+            passed += 1
+        else:
+            print("  FAIL: Empty command name rejected")
+            print(f"         Got: {config}")
+            failed += 1
+
+        # Test 5: Whitespace-only command name is rejected
+        org_config_path.write_text("""version: 1
+allowed_commands:
+  - name: "   "
+    description: Whitespace name
+""")
+        config = load_org_config()
+        if config is None:
+            print("  PASS: Whitespace-only command name rejected")
+            passed += 1
+        else:
+            print("  FAIL: Whitespace-only command name rejected")
+            print(f"         Got: {config}")
+            failed += 1
+
+        # Restore HOME
+        os.environ["HOME"] = str(original_home)
+
+    return passed, failed
+
+
+def test_hierarchy_resolution():
+    """Test command hierarchy resolution."""
+    print("\nTesting hierarchy resolution:\n")
+    passed = 0
+    failed = 0
+
+    with tempfile.TemporaryDirectory() as tmphome:
+        with tempfile.TemporaryDirectory() as tmpproject:
+            # Setup fake home directory
+            import os
+            original_home = os.environ.get("HOME")
+            os.environ["HOME"] = tmphome
+
+            org_dir = Path(tmphome) / ".autocoder"
+            org_dir.mkdir()
+            org_config_path = org_dir / "config.yaml"
+
+            # Create org config with allowed and blocked commands
+            org_config_path.write_text("""version: 1
+allowed_commands:
+  - name: jq
+    description: JSON processor
+  - name: python3
+    description: Python interpreter
+blocked_commands:
+  - terraform
+  - kubectl
+""")
+
+            project_dir = Path(tmpproject)
+            project_autocoder = project_dir / ".autocoder"
+            project_autocoder.mkdir()
+            project_config = project_autocoder / "allowed_commands.yaml"
+
+            # Create project config
+            project_config.write_text("""version: 1
+commands:
+  - name: swift
+    description: Swift compiler
+""")
+
+            # Test 1: Org allowed commands are included
+            allowed, blocked = get_effective_commands(project_dir)
+            if "jq" in allowed and "python3" in allowed:
+                print("  PASS: Org allowed commands included")
+                passed += 1
+            else:
+                print("  FAIL: Org allowed commands included")
+                print(f"         jq in allowed: {'jq' in allowed}")
+                print(f"         python3 in allowed: {'python3' in allowed}")
+                failed += 1
+
+            # Test 2: Org blocked commands are in blocklist
+            if "terraform" in blocked and "kubectl" in blocked:
+                print("  PASS: Org blocked commands in blocklist")
+                passed += 1
+            else:
+                print("  FAIL: Org blocked commands in blocklist")
+                failed += 1
+
+            # Test 3: Project commands are included
+            if "swift" in allowed:
+                print("  PASS: Project commands included")
+                passed += 1
+            else:
+                print("  FAIL: Project commands included")
+                failed += 1
+
+            # Test 4: Global commands are included
+            if "npm" in allowed and "git" in allowed:
+                print("  PASS: Global commands included")
+                passed += 1
+            else:
+                print("  FAIL: Global commands included")
+                failed += 1
+
+            # Test 5: Hardcoded blocklist cannot be overridden
+            if "sudo" in blocked and "shutdown" in blocked:
+                print("  PASS: Hardcoded blocklist enforced")
+                passed += 1
+            else:
+                print("  FAIL: Hardcoded blocklist enforced")
+                failed += 1
+
+            # Restore HOME
+            if original_home:
+                os.environ["HOME"] = original_home
+            else:
+                del os.environ["HOME"]
+
+    return passed, failed
+
+
+def test_org_blocklist_enforcement():
+    """Test that org-level blocked commands cannot be used."""
+    print("\nTesting org blocklist enforcement:\n")
+    passed = 0
+    failed = 0
+
+    with tempfile.TemporaryDirectory() as tmphome:
+        with tempfile.TemporaryDirectory() as tmpproject:
+            # Setup fake home directory
+            import os
+            original_home = os.environ.get("HOME")
+            os.environ["HOME"] = tmphome
+
+            org_dir = Path(tmphome) / ".autocoder"
+            org_dir.mkdir()
+            org_config_path = org_dir / "config.yaml"
+
+            # Create org config that blocks terraform
+            org_config_path.write_text("""version: 1
+blocked_commands:
+  - terraform
+""")
+
+            project_dir = Path(tmpproject)
+            project_autocoder = project_dir / ".autocoder"
+            project_autocoder.mkdir()
+
+            # Try to use terraform (should be blocked)
+            input_data = {"tool_name": "Bash", "tool_input": {"command": "terraform apply"}}
+            context = {"project_dir": str(project_dir)}
+            result = asyncio.run(bash_security_hook(input_data, context=context))
+
+            if result.get("decision") == "block":
+                print("  PASS: Org blocked command 'terraform' rejected")
+                passed += 1
+            else:
+                print("  FAIL: Org blocked command 'terraform' should be rejected")
+                failed += 1
+
+            # Restore HOME
+            if original_home:
+                os.environ["HOME"] = original_home
+            else:
+                del os.environ["HOME"]
+
+    return passed, failed


⚠️ Potential issue | 🟡 Minor

Make org‑config tests portable by isolating HOME (and Windows vars).
These tests only set HOME, but on Windows Path.home() uses USERPROFILE/HOMEDRIVE/HOMEPATH, so the temp config won’t be picked up. Also, manual restore can leak on exceptions. Consider a shared temporary_home() context (like the integration tests) and wrap these blocks.

🔧 Example refactor for one test (apply similarly)

- # Save original org config path - original_home = Path.home() - - with tempfile.TemporaryDirectory() as tmpdir: - # Temporarily override home directory for testing - import os - os.environ["HOME"] = tmpdir + with tempfile.TemporaryDirectory() as tmpdir: + # Temporarily override home directory for testing (cross‑platform) + with temporary_home(tmpdir): org_dir = Path(tmpdir) / ".autocoder" org_dir.mkdir() org_config_path = org_dir / "config.yaml" ... - # Restore HOME - os.environ["HOME"] = str(original_home)

🤖 Prompt for AI Agents

In `@test_security.py` around lines 413 - 652, The tests fail on Windows because they only set HOME and manually restore it; create a reusable temporary_home context manager (e.g., temporary_home) that sets HOME plus Windows equivalents USERPROFILE, HOMEDRIVE and HOMEPATH to the temp dir and reliably restores original values in a try/finally (or contextmanager) to avoid leaks on exceptions, then replace the inline os.environ manipulation inside test_org_config_loading, test_hierarchy_resolution, and test_org_blocklist_enforcement to use this context (wrapping the whole test body) so Path.home() and Windows lookups pick up the temp directory consistently.

leonvanzyl · 2026-01-23T10:18:35Z

Thank you!

Adds comprehensive test coverage for Keyboard Navigation feature and enhances CSS accessibility support. Test Files: - test_feature_80_keyboard_navigation.py: 43 unit tests for keyboard nav - verify_feature_80.py: Feature verification script CSS Enhancements (globals.css): - High contrast mode support (prefers-contrast: high) - Windows forced-colors mode support (forced-colors: active) - More/less contrast preferences (prefers-contrast: more/less) - Enhanced reduced motion support (prefers-reduced-motion: reduce) - Status indicator patterns for color-blind accessibility All 43 tests pass for Feature AutoForgeAI#80. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…ands Per-Project Bash Command Allowlist System

coderabbitai bot reviewed Jan 22, 2026

View reviewed changes

ipodishima and others added 2 commits January 22, 2026 12:29

ipodishima force-pushed the feature/custom-commands branch from 1786c9d to d1dac13 Compare January 22, 2026 11:52

coderabbitai bot reviewed Jan 22, 2026

View reviewed changes

examples/README.md Outdated Show resolved Hide resolved

PHASE3_SPEC.md Show resolved Hide resolved

PHASE3_SPEC.md Show resolved Hide resolved

security.py Show resolved Hide resolved

security.py Show resolved Hide resolved

test_security_integration.py Outdated Show resolved Hide resolved

ipodishima added 3 commits January 22, 2026 13:29

coderabbitai bot reviewed Jan 22, 2026

View reviewed changes

leonvanzyl merged commit 1fe4773 into AutoForgeAI:master Jan 23, 2026
3 checks passed

This was referenced Jan 27, 2026

fix: Security vulnerabilities and race conditions from code review #112

Open

fix: resolve critical bugs causing agent failures #119

Closed

Merge 15+ PRs: Security, Windows fixes, Knowledge files, Chat persistence, and more #117

Open

CoreAspectStu pushed a commit to CoreAspectStu/autocoder-custom that referenced this pull request Feb 9, 2026

Merge pull request AutoForgeAI#80 from ipodishima/feature/custom-comm…

e87ccf7

…ands Per-Project Bash Command Allowlist System

	Reduced from 65 → 15 commands (77% reduction!)
	#### Reduced from 65 → 15 commands (77% reduction!)

Comments

Conversation

ipodishima commented Jan 22, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

The Problem

The Solution

Project-Level Config

Organization-Level Config

What's Implemented

Security Hierarchy

Key Features

Testing

Documentation

Files Changed

Why Phase 3 Isn't Included

Try It

Impact

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

leonvanzyl commented Jan 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ipodishima commented Jan 22, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 22, 2026 •

edited

Loading