PentesterPro Architecture Deep Dive
ExplorationOrchestrator
├── ExplorationOrchestratorBase (main_base.py) - Init: memory, scheduler, intake, decision, worker_manager, lifecycle, frontier, session_memory
├── ExplorationOrchestratorScan (main_scan.py) - start_scan(), _create_seed(), _start_brute_force()
└── ExplorationOrchestratorLoop (main_loop.py) - run_loop(), _update_role_state(), _tick(), _dispatch_job(), _handle_worker_result()
run_loop(): tight while-loop with a 0.1s sleep per iteration
Each iteration checks lifecycle.should_stop(), which is true when the queue is empty and no workers are active
_update_role_state(): pops next role from role_sequence deque, replays deferred auth transitions
_tick(): max_workers=1, asks scheduler for next task, calls _dispatch_job()
_dispatch_job(): creates WorkerTask via WorkerManager, submits to ThreadPoolExecutor, registers callback
scheduler.decide_next_task() returns (node_id, node_type)
worker_manager.create_task() claims node in memory + frontier
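executor.submit(worker_manager.execute_task, task) runs the task in a pool thread (the callable and its argument are passed separately; calling execute_task(task) inline would run it on the orchestrator thread)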
Callback _handle_worker_result() processes result: feeds discoveries to intake, updates scheduler state
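The loop, tick, dispatch, and callback steps above can be sketched as follows. This is a minimal illustration, not the project's code: MiniOrchestrator, its deque-backed queue, and the shape of the result dict are assumptions standing in for the scheduler, lifecycle, and WorkerManager.

```python
import threading
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class MiniOrchestrator:
    """Hypothetical stand-in for ExplorationOrchestratorLoop."""

    def __init__(self, nodes, max_workers=1):
        self.queue = deque(nodes)      # stand-in for the scheduler's queue
        self.active = 0                # number of in-flight worker tasks
        self.max_workers = max_workers
        self.results = []
        self._lock = threading.Lock()  # guards `active` and `results`
        self.executor = ThreadPoolExecutor(max_workers=max_workers)

    def should_stop(self):
        # Same rule as the notes: queue empty + no active workers.
        with self._lock:
            return not self.queue and self.active == 0

    def _tick(self):
        with self._lock:
            if self.active >= self.max_workers or not self.queue:
                return
            node_id, node_type = self.queue.popleft()  # decide_next_task()
            self.active += 1
        self._dispatch_job(node_id, node_type)

    def _dispatch_job(self, node_id, node_type):
        # submit() takes the callable and its arguments separately.
        future = self.executor.submit(self._execute_task, node_id, node_type)
        future.add_done_callback(self._handle_worker_result)

    def _execute_task(self, node_id, node_type):
        # Placeholder for worker_manager.execute_task(task).
        return {"node_id": node_id, "node_type": node_type, "discoveries": []}

    def _handle_worker_result(self, future):
        result = future.result()
        with self._lock:
            self.results.append(result)  # record before releasing the slot
            self.active -= 1

    def run_loop(self, sleep=0.1):
        while not self.should_stop():
            self._tick()
            time.sleep(sleep)
        self.executor.shutdown(wait=True)
```

The result is recorded before the worker slot is released, so should_stop() cannot observe an idle loop while a result is still unprocessed.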
Scheduler (Bridge Pattern)
ExplorationScheduler wraps LayeredQueueManager + QueueScheduler
Maps between orchestrator's EntityType and queue's DiscoveryType
Internal _tasks dict tracks all task records
decide_next_task() fetches worker state from memory for state-aware scheduling
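A rough sketch of the bridge idea: the scheduler translates between the two type vocabularies at its boundary so the orchestrator never sees queue-side types. The enum members, the plain list used as a queue, and the task record fields are all assumptions for illustration; the notes do not list them.

```python
from enum import Enum


class EntityType(Enum):     # orchestrator-side vocabulary (members assumed)
    PAGE = "page"
    STATE = "state"


class DiscoveryType(Enum):  # queue-side vocabulary (members assumed)
    URL = "url"
    DOM_STATE = "dom_state"


_ENTITY_TO_DISCOVERY = {
    EntityType.PAGE: DiscoveryType.URL,
    EntityType.STATE: DiscoveryType.DOM_STATE,
}
_DISCOVERY_TO_ENTITY = {v: k for k, v in _ENTITY_TO_DISCOVERY.items()}


class BridgeScheduler:
    """Hypothetical, simplified stand-in for ExplorationScheduler."""

    def __init__(self):
        self._queue = []  # stand-in for LayeredQueueManager + QueueScheduler
        self._tasks = {}  # node_id -> task record

    def enqueue(self, node_id, entity_type):
        # Translate to the queue's vocabulary on the way in.
        self._queue.append((node_id, _ENTITY_TO_DISCOVERY[entity_type]))
        self._tasks[node_id] = {"type": entity_type, "status": "pending"}

    def decide_next_task(self):
        if not self._queue:
            return None
        node_id, discovery_type = self._queue.pop(0)
        self._tasks[node_id]["status"] = "claimed"
        # Translate back to the orchestrator's vocabulary on the way out.
        return node_id, _DISCOVERY_TO_ENTITY[discovery_type]
```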
InteractiveWorker
├── InteractiveWorkerBase - Driver pool management, session hydration
├── InteractiveWorkerExecution - execute_task(), _perform_single_action(), _run_observers()
├── InteractiveWorkerDiscovery - harvest_driver_discoveries(), _drain_driver_discoveries()
├── InteractiveWorkerCandidates - filter_candidate_scope(), _is_candidate_eligible(), _action_order_key()
└── InteractiveWorkerProbe - _is_probe_safe(), _compute_mutation_marker(), _classify_redirect()
WorkerManager (Orchestrator-side)
WorkerManager
├── WorkerManagerBase - Holds InteractiveWorker instance
├── WorkerManagerTasks - create_task(): claims node, builds ContextSlice, applies policy filters
├── WorkerManagerExecution - execute_task(): Two-phase OBSERVE->EXECUTE, Frontier Freeze barrier
└── WorkerManagerResults - process_result(): feeds discoveries, updates frontier/memory/session
OBSERVE: Navigate, snapshot DOM, run observers (DOM/JS/Network), build state metadata, compute state hash. No side effects.
Frontier Freeze (in WorkerManagerExecution): Register state in frontier, compute allowed actions via frontier.plan_execution(), filter redundant navigations + already-executed + dominated states
EXECUTE: Iterate allowed_actions in form-grouped order. For each action: generate input (form_filler), execute via driver, capture post-state, check for state transitions. Stop on URL/state change.
Invariant 2: No Cross-State Execution (state hash must match expected)
Invariant 4: Closed Action Set (EXECUTE only runs planned actions)
Invariant 5: Stop-on-Transition (URL or state change breaks EXECUTE loop)
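The three invariants listed above can be illustrated with a small two-phase runner. Everything here is a hypothetical sketch: FakeDriver, state_hash(), and run_two_phase() are invented names, and the hash over url plus DOM text is an assumed simplification of the real state hash.

```python
import hashlib


def state_hash(url, dom):
    # Assumed simplification: hash the URL and the raw DOM text.
    return hashlib.sha256(f"{url}|{dom}".encode()).hexdigest()[:12]


class FakeDriver:
    """Hypothetical driver: `transitions` maps action -> resulting (url, dom)."""

    def __init__(self, url, dom, transitions):
        self.url, self.dom, self.transitions = url, dom, transitions

    def snapshot(self):
        return self.url, self.dom

    def execute(self, action):
        self.url, self.dom = self.transitions.get(action, (self.url, self.dom))


def run_two_phase(driver, plan_execution):
    # OBSERVE: read-only snapshot and state hash.
    expected = state_hash(*driver.snapshot())

    # Frontier Freeze: the plan is computed once and frozen as a tuple,
    # so EXECUTE cannot pick up actions discovered later (Invariant 4).
    allowed_actions = tuple(plan_execution(expected))

    executed = []
    for action in allowed_actions:
        # Invariant 2: never act on a state other than the observed one.
        if state_hash(*driver.snapshot()) != expected:
            break
        driver.execute(action)
        executed.append(action)
        # Invariant 5: a URL or state change ends the EXECUTE loop.
        if state_hash(*driver.snapshot()) != expected:
            break
    return executed
```

In this sketch a plan of fill_name, submit, click_help stops after submit if submit navigates away; click_help is left for a later task against whichever state it belongs to.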
BrowserUseDriver (driver/base.py)
Wraps browser-use library (Browser + BrowserSession)
Runs its own asyncio event loop in a daemon thread
_run_async() bridges sync calls to the async loop via run_coroutine_threadsafe
Key methods: start(), navigate(), get_dom_snapshot(), execute_action(), take_screenshot()
CDP hooks: Network.responseReceived (captures API discoveries), Page.frameNavigated (scope guard), Page.navigatedWithinDocument (SPA tracking)
last_selector_map: maps action_id -> DOM element node (refreshed every snapshot)
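The sync-to-async bridge can be sketched with the standard library alone. AsyncBridge and fake_navigate() are hypothetical names used for illustration; only asyncio.run_coroutine_threadsafe and the daemon-thread loop come from the notes above, and the 30-second timeout is an assumed default.

```python
import asyncio
import threading


class AsyncBridge:
    """Hypothetical sketch of a driver that owns a private event loop."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        # Daemon thread: the loop does not keep the process alive on exit.
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def _run_async(self, coro, timeout=30.0):
        # Schedule the coroutine on the loop thread, then block the calling
        # (synchronous) thread until the result or an exception is available.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout=timeout)

    def stop(self):
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join(timeout=5)
        self._loop.close()


async def fake_navigate(url):
    # Placeholder for an async browser call.
    await asyncio.sleep(0)
    return {"url": url, "status": 200}
```

A worker thread would then call bridge._run_async(fake_navigate("https://app.test/")) and receive a plain dict. Calling _run_async() from the loop thread itself would deadlock, which is one reason to keep the loop in its own dedicated thread.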
DOM Snapshot (driver/snapshot.py)
get_dom_snapshot(): calls browser.get_browser_state_summary(), enriches inputs, detects modals, supplements sparse DOMs
Modal detection via JS: checks for [role='dialog'], .modal, etc.
Hybrid DOM: if element count < 50 or modal detected, runs supplementary JS to discover hidden elements
Returns: {url, title, llm_representation, selector_map, screenshot, is_modal, response_status}
XPath-based element location
Unified JS scripts for click, fill, TomSelect handling
Falls back to ID-based xpath if element has id attribute
Special handling for Strategy clicks (direct URL navigation)
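The ID-based fallback described above might look roughly like this. candidate_xpaths() and locate() are hypothetical helpers, and `evaluate` stands in for whatever the driver uses to resolve an XPath to a node; none of these names come from the source.

```python
def candidate_xpaths(primary_xpath, attributes):
    """Return XPaths to try in order: the snapshot's own path, then an id lookup."""
    candidates = [primary_xpath]
    element_id = attributes.get("id")
    # Skip ids containing a double quote so the XPath literal stays well-formed.
    if element_id and '"' not in element_id:
        candidates.append(f'//*[@id="{element_id}"]')
    return candidates


def locate(evaluate, primary_xpath, attributes):
    """Try each candidate with `evaluate` (xpath -> node or None)."""
    for xpath in candidate_xpaths(primary_xpath, attributes):
        node = evaluate(xpath)
        if node is not None:
            return node, xpath
    return None, None
```

Positional paths such as /html/body/div[3]/button break easily when the DOM shifts between snapshot and action, whereas an id lookup usually survives re-renders.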
Exploration Frontier (exploration_frontier.py)
Layer 1: Atomic URL Queue (pending/claimed/completed per role)
Layer 2: Stateful Frontier Expansion (FrontierKey = role+url+state_hash)
Layer 3: Action Exhaustion (per-state executed action tracking)
Layer 4: Pruning Proofs (audit trail for skipped actions)
Dominance detection: state A dominates B if A's actions ⊇ B's actions
plan_execution(): the Frontier Freeze implementation - deduplicates, prunes redundant navigations, orders by priority
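Dominance as defined above reduces to a superset test on action sets. The sketch below is an illustration under assumptions: prune_dominated() and its return shape are invented, and recording the dominator for each pruned state is only one possible form for a Layer 4 pruning proof.

```python
def dominates(actions_a, actions_b):
    """State A dominates B if A's actions are a superset of B's."""
    return set(actions_a) >= set(actions_b)


def prune_dominated(states):
    """states: {state_hash: set of action signatures}.

    Returns (kept, proofs), where proofs maps each pruned state to the
    kept state that dominates it.
    """
    kept, proofs = {}, {}
    # Visit larger action sets first, so any dominator is seen before the
    # states it dominates; ties are broken by state hash for determinism.
    ordered = sorted(states.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for state, actions in ordered:
        dominator = next((k for k, a in kept.items() if dominates(a, actions)), None)
        if dominator is None:
            kept[state] = frozenset(actions)
        else:
            proofs[state] = dominator
    return kept, proofs
```

Two states with identical action sets dominate each other, so the ordering rule keeps exactly one of them instead of pruning both.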
Intake Manager (intake.py)
Gateway for all discovery events
Scope gating -> URL normalization -> Resource reclassification (JS/CSS/etc) -> Frontier dedup -> Action saturation -> Memory storage -> Scheduler enqueue
Fan-out guard: once a single parent has produced 20 discoveries, further discoveries from that parent are sampled rather than all enqueued (search result containment)
Cross-phase optimization: STATE discovery marks corresponding PAGE as completed
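A condensed sketch of the intake pipeline, covering scope gating, normalization, resource reclassification, dedup, and the fan-out guard. MiniIntake is hypothetical; the suffix list, the normalization rules, and the sampling rate (sample_every) are assumptions, since the notes give only the threshold of 20. The action saturation and memory storage stages are omitted.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

STATIC_SUFFIXES = (".js", ".css", ".png", ".jpg", ".svg", ".woff2")  # assumed


class MiniIntake:
    def __init__(self, allowed_host, fanout_limit=20, sample_every=10):
        self.allowed_host = allowed_host
        self.fanout_limit = fanout_limit
        self.sample_every = sample_every  # assumption: rate not in the notes
        self.seen = set()                 # stand-in for frontier dedup
        self.per_parent = defaultdict(int)
        self.queue = []                   # stand-in for scheduler enqueue

    @staticmethod
    def normalize(url):
        # Assumed rules: lowercase scheme/host, drop fragment and trailing slash.
        p = urlsplit(url)
        path = p.path.rstrip("/") or "/"
        return urlunsplit((p.scheme.lower(), p.netloc.lower(), path, p.query, ""))

    def ingest(self, url, parent):
        if urlsplit(url).hostname != self.allowed_host:
            return "out_of_scope"                       # scope gating
        url = self.normalize(url)                       # URL normalization
        if urlsplit(url).path.lower().endswith(STATIC_SUFFIXES):
            return "resource"                           # reclassification
        if url in self.seen:
            return "duplicate"                          # dedup
        self.per_parent[parent] += 1
        n = self.per_parent[parent]
        # Fan-out guard: past the limit, keep only every Nth discovery.
        if n > self.fanout_limit and (n - self.fanout_limit) % self.sample_every:
            return "sampled_out"
        self.seen.add(url)
        self.queue.append(url)
        return "enqueued"
```

With these defaults, a search page that yields 40 paginated links enqueues the first 20 and then 2 more (the 30th and 40th).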
RAM-level cross-task dedup: (role, page_state_hash) -> set of executed action signatures
Tracks exhausted page states, explored URLs, deferred auth transitions
Thread-safe via Lock
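The (role, page_state_hash) dedup map with a Lock can be sketched as below. SessionDedup and claim() are hypothetical names; the point is that the membership check and the insert happen under the same lock, so two workers cannot both claim the same action.

```python
import threading
from collections import defaultdict


class SessionDedup:
    """Hypothetical sketch of the in-memory cross-task dedup map."""

    def __init__(self):
        self._lock = threading.Lock()
        # (role, page_state_hash) -> set of executed action signatures
        self._executed = defaultdict(set)

    def claim(self, role, page_state_hash, signature):
        """Return True only for the first caller to claim this action."""
        with self._lock:
            seen = self._executed[(role, page_state_hash)]
            if signature in seen:
                return False
            seen.add(signature)
            return True

    def executed_for(self, role, page_state_hash):
        # Return a copy so callers never iterate the live set unlocked.
        with self._lock:
            return set(self._executed.get((role, page_state_hash), ()))
```

Keying on role as well as state hash means the same action on the same page state is still executed once per role.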