PentesterPro Architecture Deep Dive

Orchestrator

Class Hierarchy

ExplorationOrchestrator
  ├── ExplorationOrchestratorBase  (main_base.py) - Init: memory, scheduler, intake, decision, worker_manager, lifecycle, frontier, session_memory
  ├── ExplorationOrchestratorScan  (main_scan.py) - start_scan(), _create_seed(), _start_brute_force()
  └── ExplorationOrchestratorLoop  (main_loop.py) - run_loop(), _update_role_state(), _tick(), _dispatch_job(), _handle_worker_result()

Main Loop (`run_loop`)

Tight while loop that sleeps 0.1 s between iterations
Each iteration checks lifecycle.should_stop(), which is true once the queue is empty and no workers are active
_update_role_state(): pops the next role from the role_sequence deque and replays deferred auth transitions
_tick(): with max_workers=1, asks the scheduler for the next task and calls _dispatch_job()
_dispatch_job(): creates a WorkerTask via WorkerManager, submits it to the ThreadPoolExecutor, and registers a completion callback
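
The control flow above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `LoopSketch`, its attributes, and the synchronous `_dispatch_job()` are hypothetical stand-ins, and a plain deque replaces the scheduler.

```python
import time
from collections import deque


class LoopSketch:
    """Hypothetical stand-in for the orchestrator loop; not the project's code."""

    def __init__(self, pending, max_workers=1, poll_interval=0.1):
        self.pending = deque(pending)   # stand-in for the scheduler queue
        self.active = set()             # node ids currently being worked on
        self.max_workers = max_workers
        self.poll_interval = poll_interval
        self.dispatched = []

    def should_stop(self):
        # Same rule as lifecycle.should_stop(): queue empty and no active workers.
        return not self.pending and not self.active

    def _tick(self):
        # Respect the worker cap before asking for the next task.
        if len(self.active) >= self.max_workers or not self.pending:
            return
        node_id = self.pending.popleft()
        self.active.add(node_id)
        self._dispatch_job(node_id)

    def _dispatch_job(self, node_id):
        # A real dispatch would hand the task to a thread pool; here the work
        # completes immediately so the sketch stays deterministic.
        self.dispatched.append(node_id)
        self.active.discard(node_id)

    def run_loop(self):
        while not self.should_stop():
            self._tick()
            time.sleep(self.poll_interval)
        return self.dispatched
```

With max_workers=1 the loop dispatches nodes strictly one at a time, so the dispatch order matches the queue order.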

Task Lifecycle

scheduler.decide_next_task() returns (node_id, node_type)
worker_manager.create_task() claims node in memory + frontier
executor.submit(worker_manager.execute_task, task) runs the task in a pool thread
Callback _handle_worker_result() processes result: feeds discoveries to intake, updates scheduler state
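
A minimal sketch of the claim, submit, and callback steps, assuming a dict-based task record and a module-level `claimed` set in place of the real memory and frontier. All function bodies here are hypothetical; only the step order comes from the description above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

claimed = set()          # stand-in for the claim recorded in memory + frontier
claim_lock = threading.Lock()
handled = []             # results seen by the callback


def create_task(node_id, node_type):
    """Claim the node so that no second task can be created for it."""
    with claim_lock:
        if node_id in claimed:
            return None
        claimed.add(node_id)
    return {"node_id": node_id, "node_type": node_type}


def execute_task(task):
    """Runs on a pool thread; returns a hypothetical result record."""
    return {"node_id": task["node_id"], "discoveries": [task["node_id"] + "/child"]}


def handle_worker_result(future):
    """Done-callback: a real handler would feed discoveries to intake here."""
    handled.append(future.result())


with ThreadPoolExecutor(max_workers=1) as executor:
    task = create_task("page-1", "PAGE")
    # Pass the callable and its argument separately; executor.submit(f(x))
    # would run f on the calling thread and submit its return value instead.
    future = executor.submit(execute_task, task)
    future.add_done_callback(handle_worker_result)

duplicate = create_task("page-1", "PAGE")  # a second claim is refused
```

Leaving the `with` block waits for the pool to drain, so the callback has run by the time `handled` is inspected.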

Scheduler (Bridge Pattern)

ExplorationScheduler wraps LayeredQueueManager + QueueScheduler
Maps between orchestrator's EntityType and queue's DiscoveryType
Internal _tasks dict tracks all task records
decide_next_task() fetches worker state from memory for state-aware scheduling
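
One way to picture the bridge is a pair of lookup tables at the boundary. The sketch below is illustrative only: the enum members, the mapping between them, and the list-based queue are assumptions, not the project's actual definitions.

```python
from enum import Enum


class EntityType(Enum):       # orchestrator-side vocabulary (members assumed)
    PAGE = "page"
    STATE = "state"


class DiscoveryType(Enum):    # queue-side vocabulary (members assumed)
    URL = "url"
    DOM_STATE = "dom_state"


ENTITY_TO_DISCOVERY = {
    EntityType.PAGE: DiscoveryType.URL,
    EntityType.STATE: DiscoveryType.DOM_STATE,
}
DISCOVERY_TO_ENTITY = {d: e for e, d in ENTITY_TO_DISCOVERY.items()}


class SchedulerBridgeSketch:
    """Translates types at the boundary so neither side sees the other's enum."""

    def __init__(self):
        self._queue = []    # stand-in for LayeredQueueManager + QueueScheduler
        self._tasks = {}    # node_id -> task record, like the _tasks dict

    def enqueue(self, node_id, entity_type):
        discovery_type = ENTITY_TO_DISCOVERY[entity_type]
        self._tasks[node_id] = {"type": discovery_type, "status": "queued"}
        self._queue.append((node_id, discovery_type))

    def decide_next_task(self):
        if not self._queue:
            return None
        node_id, discovery_type = self._queue.pop(0)
        self._tasks[node_id]["status"] = "dispatched"
        # Translate back before returning to the orchestrator.
        return node_id, DISCOVERY_TO_ENTITY[discovery_type]
```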

Worker

Class Hierarchy

InteractiveWorker
  ├── InteractiveWorkerBase       - Driver pool management, session hydration
  ├── InteractiveWorkerExecution  - execute_task(), _perform_single_action(), _run_observers()
  ├── InteractiveWorkerDiscovery  - harvest_driver_discoveries(), _drain_driver_discoveries()
  ├── InteractiveWorkerCandidates - filter_candidate_scope(), _is_candidate_eligible(), _action_order_key()
  └── InteractiveWorkerProbe      - _is_probe_safe(), _compute_mutation_marker(), _classify_redirect()

WorkerManager (Orchestrator-side)

WorkerManager
  ├── WorkerManagerBase      - Holds InteractiveWorker instance
  ├── WorkerManagerTasks     - create_task(): claims node, builds ContextSlice, applies policy filters
  ├── WorkerManagerExecution - execute_task(): Two-phase OBSERVE->EXECUTE, Frontier Freeze barrier
  └── WorkerManagerResults   - process_result(): feeds discoveries, updates frontier/memory/session

Two-Phase Execution

OBSERVE: Navigate, snapshot DOM, run observers (DOM/JS/Network), build state metadata, compute state hash. No side effects.
Frontier Freeze (in WorkerManagerExecution): Register the observed state in the frontier, compute allowed actions via frontier.plan_execution(), and filter out redundant navigations, already-executed actions, and dominated states
EXECUTE: Iterate allowed_actions in form-grouped order. For each action: generate input (form_filler), execute via driver, capture post-state, check for state transitions. Stop on URL/state change.

Key Invariants

Invariant 2: No Cross-State Execution (state hash must match expected)
Invariant 4: Closed Action Set (EXECUTE only runs planned actions)
Invariant 5: Stop-on-Transition (URL or state change breaks EXECUTE loop)
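
The three invariants can be read as guards inside the EXECUTE loop. The following is a rough sketch under stated assumptions: `state_hash()`, `FakePage`, and the action strings are hypothetical, and the real state hash is built from richer state metadata than a URL plus element ids.

```python
import hashlib


def state_hash(url, element_ids):
    """Hypothetical state hash: URL plus the sorted set of element ids."""
    payload = url + "|" + ",".join(sorted(element_ids))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def run_execute_phase(expected_hash, allowed_actions, observe, perform):
    executed = []
    for action in allowed_actions:            # Invariant 4: only planned actions
        if observe() != expected_hash:        # Invariant 2: never act in another state
            break
        perform(action)
        executed.append(action)
        if observe() != expected_hash:        # Invariant 5: stop on transition
            break
    return executed


class FakePage:
    """Tiny stand-in for a browser page whose submit button navigates away."""

    def __init__(self):
        self.url = "https://example.test/form"
        self.elements = ["name", "email", "submit"]

    def observe(self):
        return state_hash(self.url, self.elements)

    def perform(self, action):
        if action == "click:submit":
            self.url = "https://example.test/done"
```

In this sketch, any action planned after the submit click is left unexecuted, because the URL change breaks the loop.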

Browser Driver

BrowserUseDriver (`driver/base.py`)

Wraps browser-use library (Browser + BrowserSession)
Runs its own asyncio event loop in a daemon thread
_run_async() bridges sync calls to the async loop via run_coroutine_threadsafe
Key methods: start(), navigate(), get_dom_snapshot(), execute_action(), take_screenshot()
CDP hooks: Network.responseReceived (captures API discoveries), Page.frameNavigated (scope guard), Page.navigatedWithinDocument (SPA tracking)
last_selector_map: maps action_id -> DOM element node (refreshed every snapshot)
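
The sync-to-async bridge follows a common asyncio pattern. Below is a minimal sketch of that pattern, assuming a hypothetical `AsyncBridge` class and a `fake_navigate()` coroutine in place of the browser-use calls; the timeout value is also an assumption.

```python
import asyncio
import threading


class AsyncBridge:
    """Owns an event loop on a daemon thread and exposes a blocking entry point."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def _run_async(self, coro, timeout=30.0):
        # Schedule the coroutine on the background loop and block the caller
        # until it finishes (or the timeout expires).
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout=timeout)

    def close(self):
        # Stop the loop from its own thread, then release its resources.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def fake_navigate(url):
    """Hypothetical coroutine standing in for an async browser call."""
    await asyncio.sleep(0)
    return {"url": url, "status": 200}
```

Keeping a single long-lived loop lets synchronous worker threads call the driver without each creating its own event loop.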

DOM Snapshot (`driver/snapshot.py`)

get_dom_snapshot(): calls browser.get_browser_state_summary(), enriches inputs, detects modals, supplements sparse DOMs
Modal detection via JS: checks for [role='dialog'], .modal, etc.
Hybrid DOM: if element count < 50 or modal detected, runs supplementary JS to discover hidden elements
Returns: {url, title, llm_representation, selector_map, screenshot, is_modal, response_status}
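
The return shape and the hybrid-DOM trigger can be sketched as below. The field names come from the list above; the field types, the constant name, and the use of the selector map size as the "element count" are assumptions made for illustration.

```python
from typing import Any, Dict, Optional, TypedDict


class DomSnapshot(TypedDict):
    url: str
    title: str
    llm_representation: str
    selector_map: Dict[int, Any]      # key and value types are assumed
    screenshot: Optional[str]         # encoding is assumed
    is_modal: bool
    response_status: Optional[int]


SPARSE_DOM_THRESHOLD = 50  # element count below which the DOM is supplemented


def needs_supplementary_scan(snapshot: DomSnapshot) -> bool:
    """True when the snapshot is sparse or a modal is open."""
    return snapshot["is_modal"] or len(snapshot["selector_map"]) < SPARSE_DOM_THRESHOLD
```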

Action Execution

XPath-based element location
Unified JS scripts for click, fill, TomSelect handling
Falls back to an ID-based XPath when the element has an id attribute
Special handling for Strategy clicks (direct URL navigation)
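
A rough sketch of the locator fallback, assuming the driver tries an ordered list of XPath candidates. Both function names are hypothetical; the quoting helper is there because XPath 1.0 has no escape syntax for quotes inside string literals.

```python
def xpath_literal(value):
    """Quote a string for XPath 1.0, which lacks escape sequences."""
    if '"' not in value:
        return '"' + value + '"'
    if "'" not in value:
        return "'" + value + "'"
    # Both quote kinds present: stitch the pieces together with concat().
    parts = value.split('"')
    return "concat(" + ", '\"', ".join('"' + p + '"' for p in parts) + ")"


def candidate_xpaths(primary_xpath, attributes):
    """Return locators to try in order: the element's XPath, then an id lookup."""
    candidates = [primary_xpath]
    element_id = attributes.get("id")
    if element_id:
        candidates.append("//*[@id=" + xpath_literal(element_id) + "]")
    return candidates
```

An id-based locator tends to survive layout changes that invalidate a positional path, which is why it makes a reasonable second attempt.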

Exploration Frontier (`exploration_frontier.py`)

Layer 1: Atomic URL Queue (pending/claimed/completed per role)
Layer 2: Stateful Frontier Expansion (FrontierKey = role+url+state_hash)
Layer 3: Action Exhaustion (per-state executed action tracking)
Layer 4: Pruning Proofs (audit trail for skipped actions)
Dominance detection: state A dominates B if A's actions ⊇ B's actions
plan_execution(): the Frontier Freeze implementation - deduplicates, prunes redundant navigations, orders by priority
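
The dominance rule can be illustrated with plain set inclusion. This is a minimal sketch, not the frontier's implementation: `prune_dominated()` and its tie-breaking (the first of two equal states wins) are assumptions, and the real check may be scoped per role or per URL.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class FrontierKey(NamedTuple):
    role: str
    url: str
    state_hash: str


def dominates(a_actions: FrozenSet[str], b_actions: FrozenSet[str]) -> bool:
    """State A dominates B if A's actions are a superset of B's actions."""
    return a_actions >= b_actions


def prune_dominated(states: Dict[FrontierKey, FrozenSet[str]]) -> List[FrontierKey]:
    """Keep only states whose action set is not covered by another kept state."""
    kept: List[FrontierKey] = []
    for key, actions in states.items():
        if any(dominates(states[k], actions) for k in kept):
            continue  # an already-kept state offers everything this one does
        # The newcomer may in turn cover states that were kept earlier.
        kept = [k for k in kept if not dominates(actions, states[k])]
        kept.append(key)
    return kept
```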

Intake Manager (`intake.py`)

Gateway for all discovery events
Scope gating -> URL normalization -> Resource reclassification (JS/CSS/etc) -> Frontier dedup -> Action saturation -> Memory storage -> Scheduler enqueue
Fan-out guard: once a single parent has produced 20 discoveries, further discoveries from that parent are sampled rather than all accepted (contains search-result fan-out)
Cross-phase optimization: STATE discovery marks corresponding PAGE as completed
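
A minimal sketch of the first few gates plus the fan-out guard. The limit of 20 comes from the description above; the class, the return strings, the normalization rules, and the 1-in-10 sampling rate are hypothetical, and the later pipeline stages are omitted.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

FAN_OUT_LIMIT = 20   # from the description above
SAMPLE_EVERY = 10    # hypothetical sampling rate once the limit is reached


class IntakeSketch:
    def __init__(self, allowed_host):
        self.allowed_host = allowed_host
        self.seen = set()
        self.per_parent = defaultdict(int)
        self.accepted = []

    def normalize(self, url):
        # Lowercase scheme and host, default the path, drop the fragment.
        parts = urlsplit(url)
        path = parts.path or "/"
        return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))

    def submit(self, parent, url):
        if urlsplit(url).hostname != self.allowed_host:      # scope gate
            return "out_of_scope"
        normalized = self.normalize(url)                     # URL normalization
        if normalized in self.seen:                          # dedup
            return "duplicate"
        self.per_parent[parent] += 1                         # fan-out guard
        count = self.per_parent[parent]
        if count > FAN_OUT_LIMIT and (count - FAN_OUT_LIMIT) % SAMPLE_EVERY != 0:
            return "sampled_out"
        self.seen.add(normalized)
        self.accepted.append(normalized)                     # storage + enqueue
        return "accepted"
```

Ordering the cheap checks first means out-of-scope and duplicate URLs never count toward a parent's fan-out budget.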

Session Memory

RAM-level cross-task dedup: (role, page_state_hash) -> set of executed action signatures
Tracks exhausted page states, explored URLs, deferred auth transitions
Thread-safe via Lock
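
The dedup map can be sketched as a lock-guarded dictionary of sets. This is an illustration under assumptions: the class and method names are hypothetical, and the tracking of exhausted states, explored URLs, and deferred auth transitions is left out.

```python
import threading
from collections import defaultdict


class SessionMemorySketch:
    def __init__(self):
        self._lock = threading.Lock()
        # (role, page_state_hash) -> set of executed action signatures
        self._executed = defaultdict(set)

    def mark_executed(self, role, page_state_hash, signature):
        """Return True if newly recorded, False if it was already executed."""
        with self._lock:
            seen = self._executed[(role, page_state_hash)]
            if signature in seen:
                return False
            seen.add(signature)
            return True

    def executed_actions(self, role, page_state_hash):
        """Return an immutable copy so callers cannot mutate shared state."""
        with self._lock:
            return frozenset(self._executed.get((role, page_state_hash), ()))
```

Doing the membership check and the insert under one lock acquisition is what guarantees that only one of several concurrent tasks sees an action as new.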
