34 changes: 20 additions & 14 deletions README.md
@@ -33,15 +33,18 @@ Welcome to Vectorly's Web Hacker... **No API? No Problem!**
Define once. Reuse everywhere. Automate anything you can do in a browser.

Each Routine includes:

- **name** — a human-readable identifier
- **description** — what the Routine does
- **parameters** — input values the Routine needs to run (e.g. URLs, credentials, text)
- **operations** — the ordered browser actions that perform the automation

Example:

> Navigate to a dashboard, search based on keywords, and return results — all as a reusable Routine.

### Quickstart

<p align="center">
<video src="https://github.com/user-attachments/assets/ec47cfb2-5fb8-4726-b136-cb51df3c1b83" width="760" controls autoplay loop muted>
Video not supported? Watch the demo on YouTube: https://youtu.be/YFVQSuiaWmM
@@ -57,22 +60,21 @@ Example:
- **Parameter Validation**: Parameters support validation constraints such as `min_length`, `max_length`, `min_value`, `max_value`, `pattern` (regex), `enum_values`, and `format`.
- **Reserved Prefixes**: Parameter names cannot start with reserved prefixes: `sessionStorage`, `localStorage`, `cookie`, `meta`, `uuid`, `epoch_milliseconds`.
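
The two rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `check_name` and `check_value` are hypothetical helpers, not functions from the project, and treating `pattern` as a `re.search` match (rather than a full match) is a guess. The `format` constraint is left out because its semantics are not described here.

```python
import re

# Reserved prefixes as listed in the README.
RESERVED_PREFIXES = (
    "sessionStorage", "localStorage", "cookie",
    "meta", "uuid", "epoch_milliseconds",
)


def check_name(name: str) -> None:
    """Reject parameter names that start with a reserved prefix (hypothetical helper)."""
    if name.startswith(RESERVED_PREFIXES):
        raise ValueError(f"parameter name {name!r} starts with a reserved prefix")


def check_value(value, *, min_length=None, max_length=None,
                min_value=None, max_value=None,
                pattern=None, enum_values=None) -> None:
    """Apply whichever constraints are set; raise ValueError on the first failure."""
    if min_length is not None and len(value) < min_length:
        raise ValueError("value is shorter than min_length")
    if max_length is not None and len(value) > max_length:
        raise ValueError("value is longer than max_length")
    if min_value is not None and value < min_value:
        raise ValueError("value is below min_value")
    if max_value is not None and value > max_value:
        raise ValueError("value is above max_value")
    # Assumption: the regex only needs to match somewhere in the value.
    if pattern is not None and re.search(pattern, value) is None:
        raise ValueError("value does not match pattern")
    if enum_values is not None and value not in enum_values:
        raise ValueError("value is not one of enum_values")
```

For example, `check_name("cookie_jar")` raises, while `check_value("admin", enum_values=["admin", "user"])` passes silently.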


### Operations

Operations define the executable steps of a Routine. They are represented as a **typed list** (see [`RoutineOperationUnion`](https://github.com/VectorlyApp/web-hacker/blob/main/src/data_models/production_routine.py)) and are executed sequentially by a browser.

Each operation specifies a `type` and its parameters:

- **navigate** — open a URL in the browser.
```json
{ "type": "navigate", "url": "https://example.com" }
```
- **sleep** — pause execution for a given duration (in seconds).
```json
{ "type": "sleep", "timeout_seconds": 1.5 }
```
- **fetch** — perform an HTTP request defined by an `endpoint` object (method, URL, headers, body, credentials). Optionally, store the response under a `session_storage_key`.
```json
{
"type": "fetch",
@@ -86,12 +88,13 @@ Each operation specifies a `type` and its parameters:
"session_storage_key": "userData"
}
```
- **return** — return the value previously stored under a `session_storage_key`.
```json
{ "type": "return", "session_storage_key": "userData" }
```

Example sequence:

```json
[
{ "type": "navigate", "url": "https://example.com/login" },
@@ -111,7 +114,6 @@ Example sequence:

This defines a deterministic flow: open → wait → authenticate → return a session token.
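
To make the sequential model concrete, here is a toy interpreter for the four operation types described above. It is a sketch, not the project's executor: `run_operations` is a hypothetical name, the `navigate` and `fetch` callables are stand-ins supplied by the caller, and a plain dict plays the role of the browser's session storage.

```python
import time
from typing import Any


def run_operations(operations, navigate, fetch, sleep=time.sleep):
    """Run operations in order and return the value named by the first `return` op."""
    storage: dict[str, Any] = {}  # stand-in for the browser's sessionStorage
    for op in operations:
        kind = op["type"]
        if kind == "navigate":
            navigate(op["url"])
        elif kind == "sleep":
            sleep(op["timeout_seconds"])
        elif kind == "fetch":
            response = fetch(op["endpoint"])
            key = op.get("session_storage_key")
            if key is not None:  # storing the response is optional
                storage[key] = response
        elif kind == "return":
            return storage[op["session_storage_key"]]
        else:
            raise ValueError(f"unknown operation type: {kind!r}")
    return None  # no `return` operation was reached
```

Passing the side-effecting pieces in as callables keeps the loop itself deterministic and easy to test with fakes.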


### Placeholder Interpolation `{{...}}`

Placeholders inside operation fields are resolved at runtime:
@@ -221,6 +223,7 @@ python quickstart.py
```

The quickstart script will:

1. ✅ Automatically launch Chrome in debug mode
2. 📊 Start browser monitoring (you perform actions)
3. 🤖 Discover routines from captured data
@@ -312,6 +315,7 @@ The reverse engineering process follows a simple three-step workflow:
Each step is detailed below. Start by ensuring Chrome is running in debug mode (see [Launch Chrome in Debug Mode](#launch-chrome-in-debug-mode-🐞) above).

### 0. Legal & Privacy Notice ⚠️

Reverse-engineering and automating a website can violate terms of service. Store captures securely and scrub any sensitive fields before sharing.

### 1. Monitor Browser While Performing Some Task
@@ -356,21 +360,24 @@ Use the **routine-discovery pipeline** to analyze captured data and synthesize a
> ⚠️ **Important:** You must specify your own `--task` parameter. The example below is just for demonstration—replace it with a description of what you want to automate.

**Linux/macOS (bash):**

```bash
web-hacker-discover \
--task "Recover API endpoints for searching for trains and their prices" \
--cdp-captures-dir ./cdp_captures \
--output-dir ./routine_discovery_output \
--llm-model gpt-5.1
```

**Windows (PowerShell):**

```powershell
# Simple task (no quotes inside):
web-hacker-discover --task "Recover the API endpoints for searching for trains and their prices" --cdp-captures-dir ./cdp_captures --output-dir ./routine_discovery_output --llm-model gpt-5
```

**Example tasks:**

- `"recover the api endpoints for searching for trains and their prices"` (shown above)
- `"discover how to search for flights and get pricing"`
- `"find the API endpoint for user authentication"`
@@ -400,18 +407,20 @@ routine_discovery_output/
⚠️ **Important:** If you have a string-typed parameter used in a JSON body field, it may need to be escaped. When the agent generates routines, string parameters are sometimes placed as `"{{PARAM}}"` when they should be `"\"{{PARAM}}\""` to ensure proper JSON string escaping.

**Example:** If you see:

```json
"field": "{{paramName}}"
```

And `paramName` is a string parameter, manually change it to:

```json
"field": "\"{{paramName}}\""
```

This ensures the parameter value is properly quoted as a JSON string when substituted.
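
The manual fix can also be sketched as a helper that walks a routine and wraps unquoted string placeholders in escaped quotes. This is an illustration only: `quote_string_placeholders` is a hypothetical name, not the project's own method. Treating `enum` parameters and `sessionStorage:`, `localStorage:`, and `cookie:` placeholders as strings is an assumption; the test fixtures in this change show that behaviour only for `string`, `enum`, and `sessionStorage:`.

```python
import re

# A {{placeholder}} that is not already wrapped in double quotes.
UNQUOTED = re.compile(r'(?<!")\{\{\s*([^{}]+?)\s*\}\}(?!")')

STRING_TYPES = {"string", "enum"}  # assumption: enums are quoted like strings
STORAGE_PREFIXES = ("sessionStorage:", "localStorage:", "cookie:")  # assumption


def quote_string_placeholders(value, param_types):
    """Recursively wrap string-typed placeholders in literal double quotes."""
    if isinstance(value, dict):
        return {k: quote_string_placeholders(v, param_types) for k, v in value.items()}
    if isinstance(value, list):
        return [quote_string_placeholders(v, param_types) for v in value]
    if not isinstance(value, str):
        return value

    def wrap(match):
        name = match.group(1)
        is_string = (param_types.get(name) in STRING_TYPES
                     or name.startswith(STORAGE_PREFIXES))
        return f'"{match.group(0)}"' if is_string else match.group(0)

    return UNQUOTED.sub(wrap, value)
```

Here `param_types` maps each parameter name to its declared type, for example `{"user_name": "string", "user_age": "integer"}`. Placeholders that are already quoted, and placeholders for non-string parameters, are left untouched.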

Run the example routine:

```bash
# Using a parameters file:
@@ -449,13 +458,13 @@ web-hacker-execute \

- Export the key in your shell or create a `.env` file and run via `uv run` (dotenv is loaded).
- `No such file or directory: './cdp_captures/network/transactions/N/A'` or similar transaction path errors

- The agent cannot find any network transactions relevant to your task. This usually means:

- The `--task` description doesn't match what you actually performed during monitoring
- The relevant network requests weren't captured (they may have been blocked or filtered)
- The task description is too vague or too specific

- **Fix:** Reword your `--task` parameter to more accurately describe what you did during the monitoring step, or re-run the browser monitor and ensure you perform the exact actions you want to automate.
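
Before rewording the task, it can help to confirm that anything was captured at all. The sketch below counts entries under `network/transactions` inside the captures directory. The helper name is hypothetical, and the directory layout is inferred only from the error message above, so treat it as an assumption.

```python
from pathlib import Path


def count_captured_transactions(captures_dir: str) -> int:
    """Count entries under <captures_dir>/network/transactions (0 if the folder is missing)."""
    # Layout assumed from the error path './cdp_captures/network/transactions/...'.
    transactions = Path(captures_dir) / "network" / "transactions"
    if not transactions.is_dir():
        return 0
    return sum(1 for _ in transactions.iterdir())
```

A result of `0` for `count_captured_transactions("./cdp_captures")` suggests re-running the browser monitor rather than rewording `--task`.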

## Coming Soon 🔮

@@ -465,9 +474,6 @@ web-hacker-execute \
- Checkpointing progress and resumability

- Avoid re-running the entire discovery pipeline after exceptions; the agent will checkpoint progress and resume from the last successful stage.
- Context overflow management

- On detection of context overflow, the agent will checkpoint state, summarize findings, and spawn a continuation agent to proceed with discovery without losing context.
- Parameter resolution visibility

- During execution, show which placeholders (e.g., `{{sessionStorage:...}}`, `{{cookie:...}}`, `{{localStorage:...}}`) resolved successfully and which failed.
@@ -0,0 +1,70 @@
{
"name": "fix_placeholders_test_routine",
"description": "Test routine for fix_placeholders method.",
"operations": [
{
"type": "navigate",
"url": "https://example.com/users/\"{{user_id}}\"/profile"
},
{
"type": "fetch",
"endpoint": {
"url": "https://api.example.com/data?q=\"{{search_term}}\"",
"description": null,
"method": "POST",
"headers": {
"Content-Type": "application/json",
"X-User-Name": "\"{{user_name}}\"",
"X-User-Age": "{{user_age}}",
"X-Session-ID": "\"{{sessionStorage:session.id}}\"",
"X-Existing-Quote": "\"{{user_name}}\""
},
"body": {
"user": {
"name": "\"{{user_name}}\"",
"age": "{{user_age}}",
"role": "\"{{user_role}}\"",
"meta": "\"{{sessionStorage:meta.data}}\""
},
"query": "\"{{search_term}}\"",
"quoted_already": "\"{{search_term}}\""
},
"credentials": "same-origin"
},
"session_storage_key": "result"
},
{
"type": "return",
"session_storage_key": "result"
}
],
"parameters": [
{
"name": "user_id",
"type": "string",
"description": "User ID"
},
{
"name": "search_term",
"type": "string",
"description": "Search term"
},
{
"name": "user_name",
"type": "string",
"description": "User name"
},
{
"name": "user_age",
"type": "integer",
"description": "User age"
},
{
"name": "user_role",
"type": "enum",
"enum_values": ["admin", "user"],
"description": "User role"
}
]
}

68 changes: 68 additions & 0 deletions tests/data/input/production_routine/fix_placeholders_test.json
@@ -0,0 +1,68 @@
{
"name": "fix_placeholders_test_routine",
"description": "Test routine for fix_placeholders method.",
"operations": [
{
"type": "navigate",
"url": "https://example.com/users/{{user_id}}/profile"
},
{
"type": "fetch",
"endpoint": {
"url": "https://api.example.com/data?q={{search_term}}",
"method": "POST",
"headers": {
"Content-Type": "application/json",
"X-User-Name": "{{user_name}}",
"X-User-Age": "{{user_age}}",
"X-Session-ID": "{{sessionStorage:session.id}}",
"X-Existing-Quote": "\"{{user_name}}\""
},
"body": {
"user": {
"name": "{{user_name}}",
"age": "{{user_age}}",
"role": "{{user_role}}",
"meta": "{{sessionStorage:meta.data}}"
},
"query": "{{search_term}}",
"quoted_already": "\"{{search_term}}\""
}
},
"session_storage_key": "result"
},
{
"type": "return",
"session_storage_key": "result"
}
],
"parameters": [
{
"name": "user_id",
"type": "string",
"description": "User ID"
},
{
"name": "search_term",
"type": "string",
"description": "Search term"
},
{
"name": "user_name",
"type": "string",
"description": "User name"
},
{
"name": "user_age",
"type": "integer",
"description": "User age"
},
{
"name": "user_role",
"type": "enum",
"enum_values": ["admin", "user"],
"description": "User role"
}
]
}
