Intelligent end-to-end test generation powered by GPT-4o - Describe test steps in natural language, let AI generate the Playwright code, then execute with confidence.
- **AI-Assisted Code Generation**: GPT-4o converts natural language descriptions into executable Playwright code
- **Smart Caching**: Zero-cost reruns with an intelligent code-caching system
- **Configurable Retry Logic**: Multi-attempt strategies with error-context learning
- **StepsPacks**: Organize tests into reusable, isolated test suites with dedicated configuration
- **Intelligent HTML Cleaning**: Optimize the context sent to the AI by removing irrelevant HTML elements
- **Comprehensive Reporting**: JSON and HTML reports with detailed token usage and cost tracking
- **Mock Mode**: Debug workflows without API costs using simulated AI responses
- **Flexible Strength Levels**: Balance reliability vs. cost with onlycache/medium/high modes
- **Custom Expectations**: Define validation rules per step with automatic error handling
- **Global Expectations**: Apply common validation rules across all steps in a test suite
- What is E2EGen AI?
- Installation
- Quick Start
- Configuration
- CLI Options
- How It Works
- StepsPacks
- Expectations System
- Examples
- Cost Optimization
- Troubleshooting
- Contributing
E2EGen AI is an AI-assisted testing framework that bridges the gap between human intent and automated browser testing. Unlike fully autonomous AI-driven testing where AI makes decisions independently, E2EGen AI:
- Assists developers: You define test logic in natural language, AI generates the implementation
- Maintains control: You review, cache, and reuse generated code for deterministic test execution
- Reduces friction: Eliminates the tedious work of writing selectors and handling browser APIs
- Optimizes costs: Caching ensures AI is only used for code generation, not repeated execution
Think of it as: A coding assistant specialized in Playwright automation, not a replacement for human test design.
- Node.js 16+
- npm or yarn
- OpenAI API key (Azure OpenAI or standard OpenAI)
# Clone repository
git clone <your-repo-url>
cd pw-ai-smartpeg
# Install dependencies
npm install
# Configure API key
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY

Create a .env file:

OPENAI_API_KEY=your_azure_openai_key_here

Edit aidriven-settings.json:
{
"execution": {
"entrypoint_url": "https://your-site.com",
"headless": false,
"steps_file": "aidriven-steps.json"
},
"ai_agent": {
"type": "gpt-4o",
"endpoint": "https://your-endpoint.openai.azure.com/openai/deployments/gpt-4o",
"cost_input_token": "0.000005",
"cost_output_token": "0.00002",
"cost_cached_token": "0.0000025"
}
}

Edit aidriven-steps.json:
{
"steps": [
{
"sub_prompt": "Click the login button",
"timeout": "5000"
},
{
"sub_prompt": "Fill username with test@example.com and password with SecurePass123",
"timeout": "3000"
},
{
"sub_prompt": "Click submit and wait for dashboard",
"timeout": "8000"
}
]
}# First run - AI generates Playwright code and builds cache
node index.js --strength medium
# Subsequent runs - Execute cached code (zero AI cost)
node index.js --strength onlycache
# High reliability mode - 3 retry attempts with error learning
node index.js --strength high

aidriven-settings.json:
| Field | Description | Example |
|---|---|---|
| `execution.entrypoint_url` | Starting URL for test execution | `"https://example.com"` |
| `execution.headless` | Run browser in headless mode | `false` |
| `execution.steps_file` | Path to steps JSON file | `"aidriven-steps.json"` |
| `execution.global_expectations` | Array of validations applied to all steps | `["No error banner visible"]` |
| `ai_agent.type` | AI model identifier | `"gpt-4o"` |
| `ai_agent.endpoint` | Azure OpenAI deployment endpoint | `"https://..."` |
| `ai_agent.cost_input_token` | Cost per input token (USD) | `"0.000005"` |
| `ai_agent.cost_output_token` | Cost per output token (USD) | `"0.00002"` |
| `ai_agent.cost_cached_token` | Cost per cached token (USD) | `"0.0000025"` |
aidriven-steps.json:
{
"steps": [
{
"id": "73443201", // Auto-generated MD5 hash (optional)
"sub_prompt": "Your task description in natural language",
"timeout": "10000", // Milliseconds to wait after step execution
"expectations": [ // Optional: step-specific validations
"Success message must appear",
"No error dialog visible"
]
}
]
}Step Fields:
- `sub_prompt` (required): Natural language task description
- `timeout` (optional): Pause duration after step completion (default: 10000 ms)
- `expectations` (optional): Array of validation rules specific to this step
- `id` (auto-generated): MD5 hash based on prompt + timeout + expectations (used for caching)
`--strength <level>`

| Level | Attempts | Cache Behavior | Use Case |
|---|---|---|---|
| `onlycache` | 1 | Required | Zero-cost reruns of stable tests (fails if cache missing) |
| `medium` | 2 | Preferred | Default - balance of cost and reliability |
| `high` | 3 | Preferred | Complex workflows requiring retry with error context |
# Disable caching entirely (always generate fresh code)
--nocache
# Mock mode (no API calls, uses predefined actions)
--mock
# Use a specific StepsPack
--stepspack <name>
# Generate HTML report in addition to JSON
--html-report
# Customize HTML cleaning behavior
--htmlclean-remove <items>
--htmlclean-keep <items>
# Clean orphaned cache files
--clean orphans

Control which HTML elements are removed before sending context to the AI (reduces token usage):
# Default configuration (recommended)
node index.js
# Aggressive cleaning - remove everything except specific attributes
--htmlclean-remove all --htmlclean-keep id,class,data-testid
# Custom cleaning strategy
--htmlclean-remove comments,script,style,svg,img,longtext

Available cleaning items:
- `comments` - HTML comments
- `script` - `<script>` tags and content
- `style` - `<style>` tags and inline styles
- `svg` - SVG graphics and paths
- `img` - Image src attributes
- `inlinestyle` - Inline style attributes
- `attributes` - Non-essential data-* and aria-* attributes
- `longtext` - Text content exceeding 25 characters
- `all` - Remove all of the above (use with `--htmlclean-keep`)
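To make a few of these items concrete, here is a minimal regex-based sketch of how such cleaning could work. It is illustrative only - the actual `cleanHtml` in TestExecutor may use different rules:

```javascript
// Illustrative HTML cleaner covering a subset of the items listed above.
// Regex-based stripping is an assumption; the framework may parse the DOM instead.
function cleanHtmlSketch(html, remove) {
  let out = html;
  if (remove.includes("comments")) out = out.replace(/<!--[\s\S]*?-->/g, "");
  if (remove.includes("script")) out = out.replace(/<script[\s\S]*?<\/script>/gi, "");
  if (remove.includes("style")) out = out.replace(/<style[\s\S]*?<\/style>/gi, "");
  if (remove.includes("inlinestyle")) out = out.replace(/\sstyle="[^"]*"/gi, "");
  return out;
}

const raw = '<div style="color:red"><!-- note --><script>track()</script>Login</div>';
console.log(cleanHtmlSketch(raw, ["comments", "script", "inlinestyle"]));
// -> <div>Login</div>
```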
┌──────────────┐
│   index.js   │  CLI entry point and orchestration
└───────┬──────┘
        │
        ├─► ConfigManager  - Load settings, validate options, manage StepsPacks
        ├─► CodeGenerator  - Generate Playwright code via GPT-4o
        ├─► TestExecutor   - Execute generated code with Playwright
        ├─► RetryManager   - Handle retry logic with error context
        ├─► TestReporter   - Track execution, calculate costs, generate reports
        └─► TestRunner     - Coordinate end-to-end test execution
- Parse CLI arguments and validate configuration
- Load settings from JSON (standard or StepsPack)
- Initialize OpenAI client (or MockOpenAI for debugging)
- Configure retry strategy based on `--strength` level

- Read test steps from JSON file
- Generate a unique MD5 hash ID for each step (based on prompt + timeout + expectations)
- Validate cache availability (critical for `onlycache` mode)
- Apply global expectations to all steps
- Launch Chromium via Playwright
- Navigate to entry point URL
- Wait for initial page load (networkidle)
For each test step:
a) Cache Lookup (if caching enabled):
const cachePath = `./generated/aidriven/step-${hash}.js`;
if (fs.existsSync(cachePath)) {
// Use cached code → zero API cost
code = fs.readFileSync(cachePath, "utf8");
}

b) AI Code Generation (if cache miss):
// Extract and clean HTML from current page
const rawHtml = await page.$eval("body", el => el.outerHTML);
const cleanedHtml = executor.cleanHtml(rawHtml);
// Generate code via GPT-4o with context
const response = await client.chat.completions.create({
model: "gpt-4o",
messages: [
{ role: "system", content: systemPrompt },
{
role: "user",
content: `Task: ${step.subPrompt}\nURL: ${page.url()}\nHTML: ${cleanedHtml}`
}
]
});
const code = extractCodeFromResponse(response);
// Save to cache for future runs
fs.writeFileSync(cachePath, code);

c) Code Execution:
// Wrap generated code in async function with Playwright context
const asyncFn = eval(`(async (page, expect) => { ${code} })`);
try {
await asyncFn(page, expect);
step.success = true;
} catch (error) {
step.errors.push(error);
}

d) Retry Logic (if execution failed):
- Check remaining attempts based on strength level
- On retry: Include previous error message in AI prompt for smarter code generation
- If the error message starts with "Test failed:" → stop retrying (intentional failure)
- Update token usage counters for cost tracking
e) Post-Step Actions:
- Log execution result (success/failure, tokens used, cache hit)
- Wait for configured timeout before next step
- Proceed to next step (or halt on critical error if `--stop-on-error` is set)
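The retry behavior in (d) can be sketched as a small loop. This is an illustration of the documented strategy, not the actual RetryManager; `generateAndRun` is a hypothetical callback standing in for one generate-and-execute attempt, receiving the previous error as context:

```javascript
// Sketch of the documented retry strategy: up to maxAttempts tries, feeding the
// previous error back into generation, and never retrying "Test failed:" errors.
async function runWithRetries(generateAndRun, maxAttempts) {
  let lastError = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await generateAndRun(lastError); // error context improves the next attempt
      return { success: true, attempts: attempt };
    } catch (error) {
      lastError = error;
      // "Test failed:" marks an intentional expectation failure - stop retrying.
      if (error.message.startsWith("Test failed:")) break;
    }
  }
  return { success: false, error: lastError };
}
```

Here `maxAttempts` would correspond to the strength level: 1 for onlycache, 2 for medium, 3 for high.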
- Close browser session
- Calculate total token usage and estimated cost
- Save execution log to `run-logs.json` with detailed analytics
- Update steps file with auto-generated IDs for caching
- Generate HTML report (if the `--html-report` flag is enabled)
ID Generation:
const stepData = {
sub_prompt: step.subPrompt,
timeout: step.timeout,
expectations: step.expectations
};
const id = crypto.createHash("md5")
.update(JSON.stringify(stepData))
.digest("hex")
.substring(0, 8);
// Cache path: ./generated/aidriven/step-{id}.js

Cache Validation (onlycache mode):
const missingCache = steps.filter(step => !fs.existsSync(`${outputDir}/step-${step.id}.js`));
if (missingCache.length > 0) {
console.error("❌ Missing cache for steps:", missingCache.map(s => s.index));
console.error("💡 Run with --strength medium/high to generate cache");
process.exit(1);
}

Benefits:
- Zero AI cost on cache hits (99% of reruns after initial generation)
- Deterministic behavior - same code executes every time
- Faster execution - no network latency for AI requests
- Version control friendly - cache files can be committed for team sharing
Organize related test scenarios into isolated, self-contained packages with dedicated configuration, cache, and reports.
stepspacks/
├── login-flow/
│   ├── .env              # Optional: Pack-specific API keys
│   ├── settings.json     # Pack configuration
│   ├── steps.json        # Test steps definition
│   ├── media/            # Assets (images, test data files)
│   │   └── test-image.png
│   └── generated/        # Execution artifacts
│       ├── step-{hash}.js   # Cached Playwright code
│       ├── run-logs.json    # Execution history
│       ├── report.html      # HTML report
│       └── debug/           # HTML snapshots (pre/post cleaning)
│           ├── pre-clean/
│           └── post-clean/
├── checkout-flow/
└── admin-panel/
# 1. Create pack directory structure
mkdir -p stepspacks/login-flow/{media,generated}
# 2. Create settings.json
cat > stepspacks/login-flow/settings.json << 'EOF'
{
"execution": {
"entrypoint_url": "https://myapp.com/login",
"headless": false,
"global_expectations": [
"No error banner with 'Application Error' text visible"
]
},
"ai_agent": {
"type": "gpt-4o",
"endpoint": "https://your-endpoint.openai.azure.com/openai/deployments/gpt-4o",
"cost_input_token": "0.000005",
"cost_output_token": "0.00002",
"cost_cached_token": "0.0000025"
}
}
EOF
# 3. Create steps.json
cat > stepspacks/login-flow/steps.json << 'EOF'
{
"steps": [
{
"sub_prompt": "Enter email user@example.com in the email field",
"timeout": "3000"
},
{
"sub_prompt": "Enter password SecurePass123 and click the login button",
"timeout": "5000",
"expectations": [
"Welcome message must appear within 3 seconds"
]
}
]
}
EOF
# 4. (Optional) Add pack-specific API key
echo "OPENAI_API_KEY=your_pack_specific_key" > stepspacks/login-flow/.env

# Execute specific pack
node index.js --stepspack login-flow --strength medium
# With HTML report generation
node index.js --stepspack login-flow --html-report --strength high
# List available packs
ls stepspacks/
# Output: login-flow checkout-flow admin-panel

- ✅ **Isolation**: Separate cache, reports, and configuration per test suite
- ✅ **Reusability**: Share packs across projects via version control
- ✅ **Security**: Pack-specific .env files for different API keys/environments
- ✅ **Organization**: Group related scenarios (e.g., all checkout flows)
- ✅ **Collaboration**: Team members can work on different packs independently
Expectations allow you to define validation rules that AI must verify during step execution, enabling sophisticated test assertions in natural language.
Define expectations specific to a single step:
{
"steps": [
{
"sub_prompt": "Click the submit button",
"timeout": "5000",
"expectations": [
"Success message with text 'Data saved' must appear",
"No error toast visible"
]
}
]
}How it works:
- AI generates code that checks for these conditions
- If an expectation fails, the generated code throws an error prefixed with `Test failed:` followed by the expectation description
- Errors trigger retries, unless the `Test failed:` prefix is detected (intentional failure, no retry)
Apply common validations across all steps in a test suite:
{
"execution": {
"entrypoint_url": "https://myapp.com",
"global_expectations": [
"No banner with 'Application Error' text visible",
"No network error dialogs present"
]
}
}

Global expectations are automatically merged with step-specific expectations, so you don't need to repeat common checks.
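A minimal sketch of that merge (assumed semantics: global expectations come first, and duplicates already present on the step are not repeated - the framework's exact ordering may differ):

```javascript
// Sketch of merging global expectations into a step's own expectation list.
function mergeExpectations(globalExpectations, step) {
  const own = step.expectations || [];
  const merged = [
    ...globalExpectations,
    ...own.filter(e => !globalExpectations.includes(e)) // avoid duplicate checks
  ];
  return { ...step, expectations: merged };
}

const step = mergeExpectations(
  ["No error banner visible"],
  { sub_prompt: "Click submit", expectations: ["Success message must appear"] }
);
console.log(step.expectations);
// -> ["No error banner visible", "Success message must appear"]
```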
- ✅ **Use natural language**: "Success banner appears", not `expect(locator).toBeVisible()`
- ✅ **Be specific**: "Welcome message contains 'John Doe'", not "Some text appears"
- ✅ **Include timeouts**: "Within 3 seconds after clicking, modal must close"
- ✅ **Case insensitive**: AI automatically handles case variations
- ✅ **Negative assertions**: "No error message visible" is valid
{
"steps": [
{
"sub_prompt": "Enter username 'admin' and password 'wrong_password', then click login",
"timeout": "5000",
"expectations": [
"Wait 3 seconds after clicking login",
"An error banner with text 'Invalid credentials' must appear",
"Login button must still be visible (not navigated away)"
]
}
]
}AI will generate code like:
await page.fill('#username', 'admin');
await page.fill('#password', 'wrong_password');
await page.click('#login-btn');
await page.waitForTimeout(3000);
const errorBanner = page.locator('text=/invalid credentials/i');
if (!(await errorBanner.isVisible())) {
throw new Error("Test failed: Error banner with 'Invalid credentials' not visible");
}
const loginBtn = page.locator('#login-btn');
if (!(await loginBtn.isVisible())) {
throw new Error("Test failed: Login button not visible after failed attempt");
}

stepspacks/ecommerce-login/steps.json:
{
"steps": [
{
"sub_prompt": "Wait for page to fully load, then click the 'Sign In' link in the header navigation",
"timeout": "3000"
},
{
"sub_prompt": "Fill email field with user@example.com and password field with SecurePass123!",
"timeout": "2000"
},
{
"sub_prompt": "Click the login submit button and wait for dashboard",
"timeout": "5000",
"expectations": [
"Welcome message containing user's name must appear",
"User avatar icon visible in top-right corner"
]
}
]
}Execution:
# Generate cache (first run only)
node index.js --stepspack ecommerce-login --strength medium
# All subsequent runs use cache ($0.00 AI cost)
node index.js --stepspack ecommerce-login --strength onlycache

stepspacks/data-export/steps.json:
{
"steps": [
{
"sub_prompt": "Navigate to Analysis dropdown menu and click 'Smart Compare'",
"timeout": "5000"
},
{
"sub_prompt": "Select date range 'Last 30 days' from the filter dropdown",
"timeout": "3000"
},
{
"sub_prompt": "Check if export button is enabled. If disabled, throw error 'Export unavailable'. If enabled, click it.",
"timeout": "8000",
"expectations": [
"Download notification or progress bar must appear within 5 seconds"
]
}
]
}

High reliability execution:
node index.js --stepspack data-export --strength high --html-report

stepspacks/profile-photo/steps.json:
{
"steps": [
{
"sub_prompt": "Click the three-dot menu icon in the profile section",
"timeout": "3000"
},
{
"sub_prompt": "Click the 'Edit Photo' button with id #btn_modifica_foto",
"timeout": "4000"
},
{
"sub_prompt": "Click 'Choose File' and select /path/to/stepspacks/profile-photo/media/avatar.png. Wait 3 seconds, then click the enabled save button",
"timeout": "15000",
"expectations": [
"Success toast with text 'Photo updated' appears",
"New photo is visible in profile section"
]
}
]
}stepspacks/invalid-login/steps.json:
{
"steps": [
{
"sub_prompt": "If cookie consent banner is visible, click 'Accept All'",
"timeout": "3000"
},
{
"sub_prompt": "Click the login button in header",
"timeout": "2000"
},
{
"sub_prompt": "Enter username 'admin' and password 'wrong_password', then click login",
"timeout": "5000",
"expectations": [
"Wait 3 seconds after clicking login",
"Error banner with text 'Invalid username or password' must appear"
]
}
]
}Note: When expectations explicitly validate errors (like above), the test passes if the error appears as expected. AI detects this pattern and generates appropriate validation code.
stepspacks/onboarding/steps.json:
{
"steps": [
{
"sub_prompt": "Fill 'First Name' with John, 'Last Name' with Doe, 'Email' with john@example.com, then click Next",
"timeout": "3000",
"expectations": [
"Step 2 indicator becomes active",
"Step 1 indicator shows completed checkmark"
]
},
{
"sub_prompt": "Select 'Developer' from role dropdown, enter company name 'Acme Corp', click Next",
"timeout": "3000",
"expectations": [
"Step 3 indicator becomes active"
]
},
{
"sub_prompt": "Check 'I agree to terms' checkbox, click 'Complete Registration'",
"timeout": "8000",
"expectations": [
"Success page with text 'Welcome to the platform' appears",
"Confirmation email sent message visible"
]
}
]
}# First run: Generate code and build cache
node index.js --stepspack my-test --strength medium
# Cost: ~$0.30 (one-time for 10 steps)
# All subsequent runs: Execute cached code
node index.js --stepspack my-test --strength onlycache
# Cost: $0.00 ✨ (indefinitely, until steps change)

Savings: 100% cost reduction on reruns. For a test suite run daily:
- Month 1: $0.30 (initial) + $0.00 × 29 days = $0.30
- Without caching: $0.30 × 30 days = $9.00
- Savings: 97% ($8.70/month)
- **Default**: `--strength medium` (2 attempts) balances cost and reliability
- **Reserve high**: Use `--strength high` (3 attempts) only for flaky/complex flows
- **Use onlycache**: For stable tests in CI/CD pipelines after initial cache generation
# Development: Allow AI to retry on failures
npm run test:dev -- --strength medium
# CI/CD: Use cached code only (fails fast if cache missing)
npm run test:ci -- --strength onlycache

Reduce token usage by stripping unnecessary HTML elements:
# Aggressive cleaning (minimal tokens, maximum savings)
node index.js --htmlclean-remove all --htmlclean-keep id,class,data-testid
# Balanced approach (default, recommended)
node index.js --htmlclean-remove comments,script,style,svg,img,longtext
# Conservative (keep more context, higher tokens)
node index.js --htmlclean-remove comments,scriptImpact: Aggressive cleaning can reduce input tokens by 60-80%, saving ~$0.02-0.05 per step generation.
After execution, review run-logs.json for cost insights:
{
"runs": [{
"usage": {
"total_tokens": 12450,
"input_tokens": 10000,
"output_tokens": 2000,
"cached_tokens": 8500,
"calculated_cost": 0.0375
}
}]
}

Key metrics:
- Cached tokens: Azure OpenAI automatically caches repeated prompt content (50% cheaper)
- Input tokens: Reduce via HTML cleaning and concise prompts
- Output tokens: AI-generated code length (optimize by being specific in prompts)
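The reported cost follows directly from the per-token rates configured in `ai_agent`. A sketch using the additive model of this README's examples (whether cached tokens are billed in addition to or instead of input tokens depends on your provider - verify against your bill):

```javascript
// Compute cost from a usage record and the ai_agent rate strings in settings.
function calculateCost(usage, rates) {
  return (
    usage.input_tokens * parseFloat(rates.cost_input_token) +
    usage.output_tokens * parseFloat(rates.cost_output_token) +
    usage.cached_tokens * parseFloat(rates.cost_cached_token)
  );
}

const rates = {
  cost_input_token: "0.000005",
  cost_output_token: "0.00002",
  cost_cached_token: "0.0000025"
};
const cost = calculateCost(
  { input_tokens: 10000, output_tokens: 1000, cached_tokens: 5000 },
  rates
);
console.log(cost.toFixed(4)); // ≈ 0.0825 for these example numbers
```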
✅ Good: Clear and focused
{
"sub_prompt": "Click login button with id #btn_login"
}

❌ Bad: Verbose and redundant
{
"sub_prompt": "Please locate the login button on the page, which should be somewhere near the top of the form area, and when you successfully find it, proceed to click on it so we can move to the next step of the authentication process"
}

Impact: Verbose prompts can double token usage with no benefit. Concise prompts also generate simpler, more reliable code.
Instead of repeating common checks:
❌ Inefficient:
{
"steps": [
{
"sub_prompt": "Click submit",
"expectations": ["No error banner visible"]
},
{
"sub_prompt": "Click next",
"expectations": ["No error banner visible"]
}
]
}

✅ Efficient:
{
"execution": {
"global_expectations": ["No error banner visible"]
},
"steps": [
{ "sub_prompt": "Click submit" },
{ "sub_prompt": "Click next" }
]
}Assumptions:
- 10 steps, ~1000 tokens per step (input)
- ~100 tokens per step (output)
- 50% of input tokens cached by Azure OpenAI on reruns
| Mode | API Calls | Input Tokens | Output Tokens | Cached Tokens | Cost (USD) |
|---|---|---|---|---|---|
| First run (medium) | 10 | 10,000 | 1,000 | 0 | $0.27 |
| Rerun with cache | 0 | 0 | 0 | 0 | $0.00 ✨ |
| Medium (no cache) | 10 | 10,000 | 1,000 | 5,000 | $0.21 |
| High (3 attempts) | 15 | 15,000 | 1,500 | 7,500 | $0.31 |
Cost breakdown:
Input tokens:  10,000 × $0.000005  = $0.05
Output tokens:  1,000 × $0.00002   = $0.02
Cached tokens:  5,000 × $0.0000025 = $0.0125
Total: $0.0825 (typical rerun with partial cache)
Monthly projection (30 runs):
- With caching: $0.27 (first) + $0.00 × 29 = $0.27/month
- Without caching: $0.27 × 30 = $8.10/month
- Savings: 97% ($7.83/month per test suite)
- **Development**: Use `--strength medium` to build cache
- **CI/CD**: Use `--strength onlycache` for zero-cost execution
- **Debugging**: Add `--nocache` temporarily to regenerate problematic steps
- **Production**: Monitor `run-logs.json` and optimize HTML cleaning if costs exceed budget
❌ ERRORE: Cache mancante per i seguenti step:
  - Step 1: "Click login button"
    File atteso: ./generated/aidriven/step-aa9c1054.js
💡 Suggerimento: Esegui prima con --strength medium o --strength high

Cause: The step definition changed (prompt, timeout, or expectations), invalidating the cache hash.
Solutions:
# Regenerate cache for all steps
node index.js --strength medium --nocache
# Or use medium strength without nocache (updates only missing cache)
node index.js --strength medium

Common causes:
- Page not fully loaded before step execution
- Dynamic content/selectors changed since code generation
- Element hidden behind modal or outside viewport
- Race condition (element appears/disappears quickly)
Solutions:
a) Increase timeout to allow more load time:
{
"sub_prompt": "Click submit button",
"timeout": "10000" // Increased from 5000
}

b) Use high strength for retry with error learning:
node index.js --strength high

AI will receive the previous error message and generate smarter code (e.g., explicit waits, alternative selectors).
c) Clear cache if page structure changed:
node index.js --nocache --strength medium

d) Inspect generated code to debug selector issues:
cat ./generated/aidriven/step-{hash}.js

e) Be more specific in your prompt:
// ❌ Vague
{
"sub_prompt": "Click the button"
}

// ✅ Specific
{
"sub_prompt": "Click the blue 'Submit' button with id #btn-submit in the form footer"
}

Symptoms:
- Reported costs don't match expected values
- Cached token count seems wrong
- Usage stats missing in `run-logs.json`
Debugging steps:
- Verify cost configuration in settings:
{
"ai_agent": {
"cost_input_token": "0.000005", // Check Azure pricing page
"cost_output_token": "0.00002",
"cost_cached_token": "0.0000025"
}
}

- Review execution log:
cat ./generated/aidriven/run-logs.json | jq '.runs[-1].usage'

- Check OpenAI response for token details:
- Cached tokens only reported by Azure OpenAI (not standard OpenAI API)
- Ensure you're using the Azure endpoint with `api-version: 2024-12-01-preview`
❌ --strength onlycache e --nocache sono opzioni incompatibili

Cause: Conflicting flags that contradict each other.
Invalid combinations:
- `--strength onlycache` + `--nocache` (onlycache requires cache, nocache disables it)
- `--mock` + `--stepspack` (mock mode uses hardcoded actions, incompatible with StepsPacks)
Solution: Review your command and remove conflicting flags.
❌ Step 2 fallito (tentativo 1)
Errore: Test failed: Invalid credentials error banner not visible

This is expected behavior, not a bug. When the "Test failed:" prefix is detected, it means:
- Your expectations explicitly required an error/condition
- That condition was not met
- Test should fail (no retry attempted)
Example scenario:
{
"sub_prompt": "Enter wrong password and click login",
"expectations": [
"Error banner with 'Invalid credentials' must appear"
]
}If the error banner doesn't appear, the test should fail because the application didn't behave as expected.
Not an error: This validates your application is working correctly (or catches bugs).
Symptoms:
- AI generates code that can't find elements
- Selectors in generated code are overly generic
- Steps fail that previously worked
Cause: --htmlclean-remove stripped essential attributes AI needs for locators.
Solutions:
a) Use less aggressive cleaning:
# Instead of:
node index.js --htmlclean-remove all --htmlclean-keep id
# Try:
node index.js --htmlclean-remove all --htmlclean-keep id,class,data-testid,aria-label

b) Review cleaned HTML to verify important attributes remain:
cat ./generated/aidriven/debug/post-clean/1.html

c) Default cleaning is usually optimal:
# Recommended balance of token reduction and context preservation
node index.js
# (no htmlclean flags = default behavior)

❌ StepsPack non trovato: my-pack
StepsPacks disponibili:
- login-flow
- checkout-flow

Cause: Typo in the pack name, or the pack doesn't exist.
Solutions:
a) List available packs:
ls stepspacks/

b) Check exact spelling (case-sensitive):
# ❌ Wrong
node index.js --stepspack Login-Flow

# ✅ Correct
node index.js --stepspack login-flow

c) Create the pack if it doesn't exist:
mkdir -p stepspacks/my-pack
cp stepspacks/login-flow/settings.json stepspacks/my-pack/
# Edit settings.json and create steps.json

Symptoms:
- Global expectations in `settings.json` not validated
- Steps pass when a global expectation should fail
Causes & Solutions:
a) Check settings.json syntax:
{
"execution": {
"global_expectations": [ // ✅ Correct: array
"No error banner visible"
]
}
}

// ❌ Wrong:
{
"execution": {
"global_expect": "No error" // Wrong key name
}
}

b) Verify in generated prompt:
# Check console output during execution - AI prompt should include:
# "Devono verificarsi queste expectations: [global expectations + step expectations]"

c) Cache invalidation: If you added global expectations after cache generation:
# Regenerate cache
node index.js --nocache --strength medium

Symptoms:
- Expected $0.00 costs but seeing charges
- `cached_tokens` count is low or zero
Possible causes:
a) Cache miss due to modified steps:
- Changed `sub_prompt`, `timeout`, or `expectations`
- The step hash changed, forcing regeneration
b) First run after cache clear:
# This will incur costs (expected)
node index.js --nocache --strength medium

c) Dynamic page content causing different HTML each run:
- Even with cache, HTML extraction happens for validation
- AI prompt uses current HTML, but code is cached
- Solution: HTML cleaning reduces variability
d) Azure OpenAI caching not enabled:
- Ensure `api-version: 2024-12-01-preview` in settings
- Cached tokens only work with Azure OpenAI (not the standard API)
Step-by-step troubleshooting process:
{
"execution": {
"headless": false
}
}

node index.js --mock

This simulates AI responses with hardcoded actions (see `mock-openai.js`).
# Find step hash from error message, then:
cat ./generated/aidriven/step-{hash}.js
# Example:
cat ./generated/aidriven/step-aa9c1054.js

# Pre-cleaning (raw HTML):
cat ./generated/aidriven/debug/pre-clean/1.html
# Post-cleaning (what AI sees):
cat ./generated/aidriven/debug/post-clean/1.html

# Latest run details:
cat ./generated/aidriven/run-logs.json | jq '.runs[-1]'
# Failed steps only:
cat ./generated/aidriven/run-logs.json | jq '.runs[-1].results[] | select(.status == "error")'
# Token usage summary:
cat ./generated/aidriven/run-logs.json | jq '.runs[-1].usage'

# Regenerate all step code (ignore cache)
node index.js --nocache --strength high
# Regenerate + save new cache:
node index.js --nocache --strength medium

# Create temporary StepsPack with only failing step:
mkdir -p stepspacks/debug-step
cat > stepspacks/debug-step/steps.json << 'EOF'
{
"steps": [
{
"sub_prompt": "The exact prompt that's failing",
"timeout": "10000",
"expectations": ["Your expectations here"]
}
]
}
EOF
# Copy settings and test isolated:
cp stepspacks/original-pack/settings.json stepspacks/debug-step/
node index.js --stepspack debug-step --strength high

# Check for JSON syntax errors:
cat aidriven-settings.json | jq .
# Check StepsPack settings:
cat stepspacks/my-pack/settings.json | jq .
cat stepspacks/my-pack/steps.json | jq .

# Remove cached code for deleted/modified steps:
node index.js --stepspack my-pack --clean orphans
# Manually inspect cache directory:
ls -lh ./stepspacks/my-pack/generated/step-*.js

// Temporarily add to _buildPrompt() method:
console.log("=== FULL PROMPT SENT TO AI ===");
console.log(prompt);
console.log("=== END PROMPT ===");

If issues persist after trying the above:
- Collect diagnostic info:
# Create a support bundle:
tar -czf debug-bundle.tar.gz \
stepspacks/my-pack/settings.json \
stepspacks/my-pack/steps.json \
stepspacks/my-pack/generated/run-logs.json \
stepspacks/my-pack/generated/step-*.js \
stepspacks/my-pack/generated/debug/

- Review logs for error patterns:
- `run-logs.json`: Execution history
- Console output: Real-time errors
- Generated code: AI's interpretation
- Open an issue on GitHub with:
- E2EGen AI version (`cat package.json | jq .version`)
- Node.js version (`node --version`)
- Operating system
- Full error message
- Redacted configuration files
- Steps to reproduce
Best practices:
- Use `.env` files (automatically ignored by Git):
# Root .env for global API key:
echo "OPENAI_API_KEY=your_key_here" > .env
# Pack-specific .env for isolated keys:
echo "OPENAI_API_KEY=pack_specific_key" > stepspacks/my-pack/.env

- Verify `.gitignore` configuration:
# Should include:
.env
.env.local
.env.*.local
stepspacks/*/.env

- Audit commits before pushing:
git diff --cached | grep -i "api_key\|password\|secret"

❌ Never hardcode credentials in step prompts:
{
"sub_prompt": "Login with username admin@company.com and password MySecretPass123!"
}

✅ Use generic placeholders and load from environment:
{
"sub_prompt": "Login with credentials from environment variables TEST_USER and TEST_PASS"
}

Then handle in a custom wrapper or use test data files:
export TEST_USER=admin@company.com
export TEST_PASS=secure_password
node index.js --stepspack login-test

Important: AI-generated code executes with full Playwright permissions (file system access, network requests, etc.).
Security checklist:
- Review generated code before committing to cache:
cat ./generated/aidriven/step-*.js | grep -i "eval\|exec\|require\|import"

- Avoid `eval()` in production - while E2EGen AI uses eval internally, ensure generated code doesn't contain nested eval calls.
- Sanitize file paths in prompts:
// ✅ Safe:
{
"sub_prompt": "Upload file from ./stepspacks/my-pack/media/test.png"
}

// ❌ Risky:
{
"sub_prompt": "Upload file from /etc/passwd"
}

- Run tests in isolated environments:
- Use Docker containers for CI/CD
- Avoid running on production databases
- Use test accounts with limited permissions
Execution logs may contain sensitive data:
- Selectors with internal IDs
- URLs with session tokens
- Error messages with system paths
Before sharing logs:
# Redact sensitive info:
cat run-logs.json | jq 'del(.runs[].results[].errors[].stack)' > run-logs-sanitized.json
# Remove debug HTML snapshots:
rm -rf ./generated/aidriven/debug/

Recommended schedule:
- Development keys: Rotate every 90 days
- Production keys: Rotate every 30 days
- Immediately rotate if:
- Key accidentally committed to Git
- Team member with access leaves
- Unusual API usage detected
Rotation process:
# 1. Generate new key in Azure Portal
# 2. Update .env files:
echo "OPENAI_API_KEY=new_key_here" > .env
# 3. Test with one StepsPack:
node index.js --stepspack test-pack --strength onlycache
# 4. If successful, update all packs:
for pack in stepspacks/*/; do
echo "OPENAI_API_KEY=new_key_here" > "$pack/.env"
done
# 5. Invalidate old key in Azure Portal

Security consideration: Running browsers in headed mode on servers can expose sensitive data.
Production settings:
{
"execution": {
"headless": true // ✅ Always true for CI/CD
}
}

Exception: Use headed mode only in secure, isolated development environments.
Contributions are welcome! E2EGen AI is an evolving framework, and community input helps shape its direction.
Priority features for future releases:
- Environment variable injection in prompts: `"Login with username ${PROCESS.ENV.TEST_USER}"`
- Screenshot capture on failure, automatically saved to reports
- Multiple browser support (Firefox, Safari, WebKit)
- Step dependency system: `"depends_on": ["step-1", "step-2"]` to optimize execution order
- Conditional execution: `"run_if": "previous_step_passed"` for branching logic
- Parallel step execution for independent tests (10x speedup potential)
- Visual regression testing integration (Percy, Applitools, Playwright's visual compare)
- CI/CD integration templates (GitHub Actions, GitLab CI, Jenkins)
- Web UI for step configuration (drag-and-drop test builder)
- Video recording of test execution (Playwright traces)
- Real-time progress dashboard via WebSocket
- Multi-language prompt support (English, Italian, Spanish, etc.)
- Test data generation via AI (generate realistic form inputs)
- Cross-browser comparison reports (Chrome vs Firefox differences)
- Performance profiling (execution time per step, network bottlenecks)
```bash
git clone https://github.com/your-username/pw-ai-smartpeg.git
cd pw-ai-smartpeg
git remote add upstream https://github.com/original-repo/pw-ai-smartpeg.git

# Use descriptive branch names:
git checkout -b feature/screenshot-on-failure
git checkout -b fix/cache-invalidation-bug
git checkout -b docs/improve-troubleshooting
```

Development setup:
```bash
# Install dependencies:
npm install

# Run tests (if available):
npm test

# Test your changes with a StepsPack:
node index.js --stepspack test-pack --strength medium
```

Code style guidelines:
- Use ES6+ syntax (async/await, destructuring, arrow functions)
- Follow existing naming conventions (`camelCase` for functions, `PascalCase` for classes)
- Add JSDoc comments for public methods:
```javascript
/**
 * Generates Playwright code for a test step
 * @param {Object} step - Step configuration
 * @param {Object} context - Execution context (html, url, error)
 * @returns {Promise<Object>} Generated code and token usage
 */
async generate(step, context) { ... }
```

Create test files in the tests/ directory:
```javascript
// tests/code-generator.test.js
import { CodeGenerator } from '../core/CodeGenerator.js';
import { MockOpenAI } from '../mock-openai.js';

describe('CodeGenerator', () => {
  it('should generate code for simple click action', async () => {
    const client = new MockOpenAI({ apiKey: 'test' });
    const generator = new CodeGenerator(client);

    const result = await generator.generate(
      { subPrompt: 'Click button with id #submit' },
      { html: '<button id="submit">Submit</button>', url: 'http://test.com' }
    );

    expect(result.code).toContain('page.click(\'#submit\')');
  });
});
```

Use conventional commit messages:
```bash
git add .

# Format: <type>(<scope>): <subject>
git commit -m "feat(retry): add exponential backoff for retries"
git commit -m "fix(cache): resolve hash collision for similar prompts"
git commit -m "docs(readme): add troubleshooting section for cache errors"
git commit -m "refactor(executor): extract HTML cleaning to utility class"
```

Commit types:
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation changes
- `refactor`: Code refactoring (no functionality change)
- `test`: Adding/updating tests
- `chore`: Maintenance (dependencies, config)
```bash
git push origin feature/your-feature-name
```

PR template:
```markdown
## Description
Brief description of changes and motivation.

## Type of Change
- [ ] Bug fix (non-breaking change fixing an issue)
- [ ] New feature (non-breaking change adding functionality)
- [ ] Breaking change (fix or feature causing existing functionality to break)
- [ ] Documentation update

## Testing
- [ ] Tested manually with StepsPack: [name]
- [ ] Added/updated unit tests
- [ ] All tests pass locally

## Checklist
- [ ] Code follows existing style guidelines
- [ ] Added JSDoc comments for new functions
- [ ] Updated README.md if needed
- [ ] No sensitive data (API keys, passwords) in commits

## Related Issues
Closes #[issue-number]
```

Prerequisites:
```bash
# Node.js 16+
node --version

# Git
git --version
```

Local development workflow:
```bash
# Install dependencies:
npm install

# Create test StepsPack:
mkdir -p stepspacks/dev-test

cat > stepspacks/dev-test/settings.json << 'EOF'
{
  "execution": {
    "entrypoint_url": "https://example.com",
    "headless": false
  },
  "ai_agent": {
    "type": "gpt-4o",
    "endpoint": "https://your-endpoint.openai.azure.com/...",
    "cost_input_token": "0.000005",
    "cost_output_token": "0.00002",
    "cost_cached_token": "0.0000025"
  }
}
EOF

cat > stepspacks/dev-test/steps.json << 'EOF'
{
  "steps": [
    {
      "sub_prompt": "Wait for page load",
      "timeout": "3000"
    }
  ]
}
EOF

# Test changes:
node index.js --stepspack dev-test --strength medium

# Use mock mode for rapid iteration:
node index.js --stepspack dev-test --mock
```

Bug report template:
```markdown
## Describe the Bug
Clear description of what's happening.

## Steps to Reproduce
1. Configure StepsPack with settings: [attach sanitized settings.json]
2. Run command: `node index.js --stepspack X --strength medium`
3. Observe error: [error message]

## Expected Behavior
What should happen instead.

## Environment
- E2EGen AI version: [cat package.json | jq .version]
- Node.js version: [node --version]
- Operating System: [e.g., Ubuntu 22.04, macOS 14, Windows 11]
- Playwright version: [@playwright/test version from package.json]

## Additional Context
- Execution logs: [attach run-logs.json excerpt]
- Generated code: [attach problematic step-{hash}.js if relevant]
- Screenshots: [if applicable]
```

For reviewers:
Check:
- Code follows existing patterns and style
- No hardcoded credentials or sensitive data
- New features documented in README
- Breaking changes clearly marked
- Error handling is comprehensive
- Token usage is optimized (avoid unnecessary AI calls)
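The last item can be spot-checked by recomputing a run's cost from the token counts in run-logs.json and the per-token rates in settings.json. A sketch, not part of E2EGen AI (the `prompt_tokens`/`completion_tokens` field names are assumptions about the usage object):

```javascript
// Illustrative cost check, not part of E2EGen AI. The usage field names
// (prompt_tokens, completion_tokens) are assumptions about the log schema.
function computeRunCost(usage, rates) {
  const inputCost = usage.prompt_tokens * Number(rates.cost_input_token);
  const outputCost = usage.completion_tokens * Number(rates.cost_output_token);
  return inputCost + outputCost;
}

// Rates as they appear in settings.json (strings); token counts from a run.
const rates = { cost_input_token: '0.000005', cost_output_token: '0.00002' };
const usage = { prompt_tokens: 1000, completion_tokens: 200 };
console.log(computeRunCost(usage, rates)); // ~0.009 (0.005 input + 0.004 output)
```

If the recomputed figure diverges from the total logged in run-logs.json, either the rates in settings.json are stale or the PR changed how usage is accumulated.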
Test:
# Checkout PR branch:
git fetch origin pull/ID/head:pr-branch
git checkout pr-branch
# Test with multiple StepsPacks:
node index.js --stepspack login-flow --strength medium
node index.js --stepspack checkout-flow --strength high
# Verify cost calculations:
cat stepspacks/*/generated/run-logs.json | jq '.runs[-1].usage'Summary: Permission to use, copy, modify, and distribute this software for any purpose with or without fee, provided copyright and permission notice are included.
E2EGen AI is built on the shoulders of giants:
- Playwright - Reliable, fast browser automation framework by Microsoft
- OpenAI GPT-4o - Advanced language model enabling natural language code generation
- Azure OpenAI Service - Enterprise-grade AI with automatic prompt caching
- Commander.js - Elegant CLI argument parsing
- JSDOM - Pure JavaScript HTML parser and DOM implementation
- dotenv - Secure environment variable management
Special thanks to the open-source community for testing, feedback, and contributions.
For issues, feature requests, or questions:
- Open an issue on GitHub with detailed reproduction steps
- Check existing issues for solutions and workarounds
- Review this README and inline code documentation
- Search closed issues for previously resolved problems
- Examples repository: github.com/e2egen-ai/examples (coming soon)
- Video tutorials: youtube.com/@e2egen-ai (coming soon)
- Discord community: discord.gg/e2egen-ai (coming soon)
Happy Testing!
E2EGen AI - Bridging human intent and browser automation through AI assistance
