Execute intelligent web data playbooks using Resets.ai's powerful search and extract APIs
Playbook Tool is a Python-based automation framework that orchestrates complex web data extraction workflows using the Resets.ai API. It enables you to build sophisticated data playbooks where each step can use outputs from previous steps, perfect for competitive analysis, market research, and web intelligence gathering.
- ✅ Sequential Playbook Processing - Chain multiple operations where each step can use data from previous steps
- ✅ Dynamic Data Flow - Extract URLs from one page, then analyze each URL in subsequent steps
- ✅ Real-time Progress Tracking - Output files update after each operation
- ✅ Smart Credit Management - Track API credits used per operation
- ✅ Flexible Configuration - YAML-based playbook definitions
- ✅ Dual Output Formats - Get output files in both JSON (for data processing) and Markdown (for easy reading)
- ✅ Markdown Tables - Results formatted in clean, readable tables
- ✅ Error Recovery - Continues processing even if individual steps fail
This tool is currently in beta (early testing phase). To ensure the best experience:
- Start Small: Test with 3-5 URLs before scaling up
- Use Multiple Files: Break large projects into smaller files (see "Smart File Strategy" below)
- ⏱️ Be Patient: Each operation takes time - don't overwhelm the system
- 💳 Monitor Credits: Keep an eye on credit usage in your output files
No installation needed - run everything in your browser!
1. Get a Resets.ai API Key:
   - Go to Resets.ai
   - Sign up for a free account
   - Click the "Get API key" button
2. Open Replit:
   - Go to Replit.com
   - Sign up for free
   - Click "Create Repl" → Choose "Python" → Name it "playbook"
3. Add the Code:
   - Copy the `main.py` code from our GitHub
   - Paste it into Replit
   - Find `API_KEY = "your_resets_api_key_here"` (around line 500)
   - Replace it with your actual API key
4. Create Your First Playbook:
   - Click "New file" → Name it `config.yaml`
   - Copy this simple example:

   ```yaml
   report:
     name: "My First Search"
     industry: "General"
     sections:
       - name: "Search Test"
         steps:
           - id: "search_something"
             type: "search"
             input:
               query: "coffee shops in Seattle"
               limit: 5
   ```

5. Run It:
   - Click the big green "Run" button
   - Watch your results appear!
   - Open the `.md` file to see your markdown report

That's it! You're now running web intelligence playbooks!
For Beginners (Using Replit):
- ✅ A Resets.ai API key (Get one here)
- ✅ A free Replit account (Sign up here)
Perfect for beginners - everything runs in your browser!
- Go to Replit.com and create a free account
- Create a new Repl:
- Click "Create Repl"
- Choose "Python" as the template
- Name it "playbook" or anything you like
- Add the code:
- Delete any existing code in `main.py`
- Copy all the code from `main.py` on our GitHub
- Paste it into Replit's `main.py` file
- Add your API key:
- Find the line `API_KEY = "your_resets_api_key_here"`
- Replace it with your actual API key from Resets.ai
- Create your config:
- Click "New file" in Replit
- Name it `config.yaml`
- Copy one of the example configurations from this guide
- Run it:
- Click the big green "Run" button
- That's it! No Python installation needed!
Create a config.yaml file:
report:
name: "My First Analysis"
industry: "E-commerce"
sections:
- name: "Homepage Analysis"
steps:
- id: "extract_homepage"
type: "extract"
input:
urls:
- "https://example.com"
extraction:
prompt: "Extract the main headline and featured products"
schema:
headline: "string"
products:
- name: "string"
price: "string"
In your Replit project:
- Make sure you have `main.py` (with your API key)
- Make sure you have `config.yaml` (your playbook)
Run it:
- Click the big green "Run" button at the top
- Or type in the Shell tab:
python main.py config.yaml
See your results:
- Files will appear in the file list on the left
- Click on the `.md` file to read your markdown report
- Download files by clicking the three dots → Download
# Open Terminal (Mac) or Command Prompt (Windows)
# Navigate to your folder
cd path/to/your/folder
# Run the playbook
python main.py config.yaml

Every run creates TWO files:
- JSON file: `report_20250724_143022.json` - All your data in structured format
- Markdown file: `report_20250724_143022.md` - Nicely formatted report with tables
Example markdown report:
Report: My First Analysis
Industry: E-commerce
Generated: 2025-07-24T14:30:22
Status: completed
Total Credits Used: 5
Credits Used: 5
Input Count: 1 | Output Count: 1 | Credits: 5
| URL | headline | products |
|---|---|---|
| https://example.com | Welcome to the Best Online Store | Laptop Pro 2024, Smartphone X, Wireless Earbuds |
End of Report
- Define Your Playbook - Create a YAML configuration with sections and steps
- Sequential Execution - Each step runs in order within a section
- API Calls - Steps execute via Resets.ai's search or extract endpoints
- Data Flow - Steps can reference data from previous steps using JSONPath
- Real-time Output - Results are saved incrementally as both JSON and MARKDOWN files
- Credit Tracking - Monitor API usage throughout execution
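The sketch below shows that flow in miniature. It is a condensed illustration of the loop, not the actual `main.py` (the step-execution helpers are placeholders, and markdown rendering, progress output, and credit tracking are omitted): sections run in order, each step's result is stored under its `id` so later steps can reference it, the output file is rewritten after every step, and a failed step is recorded without stopping the run.

```python
import json
import yaml  # pip install pyyaml

def run_search_step(step_input):
    """Placeholder: the real tool calls the Resets.ai search endpoint here."""
    return []

def run_extract_step(step_input, completed_steps):
    """Placeholder: the real tool calls the Resets.ai extract endpoint here,
    optionally pulling URLs out of completed_steps via from_step/field."""
    return []

def run_playbook(config_path, output_path="report.json"):
    with open(config_path) as f:
        config = yaml.safe_load(f)

    report = {"metadata": {"report_name": config["report"]["name"]}, "sections": []}

    for section in config["report"]["sections"]:
        section_result = {"name": section["name"], "steps": []}
        report["sections"].append(section_result)
        completed = {}  # step id -> result, so later steps can use earlier outputs

        for step in section["steps"]:
            try:
                if step["type"] == "search":
                    data = run_search_step(step["input"])
                else:
                    data = run_extract_step(step["input"], completed)
                result = {"id": step["id"], "type": step["type"], "data": data}
            except Exception as exc:
                # Error recovery: record the failure and move on to the next step
                result = {"id": step["id"], "type": step["type"], "error": str(exc), "data": []}

            completed[step["id"]] = result
            section_result["steps"].append(result)

            # Real-time output: the report file is rewritten after every step
            with open(output_path, "w") as f:
                json.dump(report, f, indent=2)

    return report
```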
The Playbook Tool uses two main Resets.ai API endpoints:
The search endpoint allows you to search across the web using multiple search engines simultaneously:
- Multi-engine search: Combines results from Google, neural search, and other advanced engines
- Advanced filtering: Filter by time range, country, language, and more
- Flexible queries: Single query or multiple queries in one request
- Credit cost: Varies based on search complexity and result count
The extract endpoint enables structured data extraction from any webpage:
- 99% coverage: Successfully extracts content from virtually any website
- Schema-based extraction: Define exactly what data structure you want
- Intelligent routing: Automatically handles different website types
- LLM-powered: Uses advanced language models to understand and extract content
- Credit cost: Based on page complexity and extraction requirements
Both endpoints are designed to work together - search finds relevant URLs, then extract pulls structured data from those URLs.
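If you want a feel for what a single search-then-extract round trip looks like outside the playbook runner, here is a rough sketch using the `requests` library. The endpoint paths, header, and payload field names below are illustrative assumptions, not the official spec - check the Resets.ai API documentation for the exact URLs and parameters.

```python
import requests

API_KEY = "your_resets_api_key_here"
BASE_URL = "https://api.resets.ai"                 # placeholder - use the base URL from the Resets.ai docs
HEADERS = {"Authorization": f"Bearer {API_KEY}"}   # placeholder auth scheme

# Search: find relevant URLs (payload keys are assumptions)
search_resp = requests.post(
    f"{BASE_URL}/search",
    headers=HEADERS,
    json={"query": "sustainable fashion brands 2024", "limit": 5},
    timeout=60,
)
urls = [r["url"] for r in search_resp.json().get("results", [])]

# Extract: pull structured data from the URLs the search returned
extract_resp = requests.post(
    f"{BASE_URL}/extract",
    headers=HEADERS,
    json={
        "urls": urls,
        "prompt": "What products does this company sell?",
        "schema": {"company_name": "string", "main_products": "array"},
    },
    timeout=120,
)
print(extract_resp.json())
```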
YAML is like a shopping list for your playbook. It uses indentation (spaces) to organize information:
# This is a comment - it won't affect anything
report: # Top-level key - everything in your playbook lives under 'report'
name: "My Report" # 2 spaces before 'name'
industry: "Tech" # Same indentation as 'name'
sections: # Same indentation as 'name' and 'industry'
- name: "Section 1" # The dash means it's a list item, and we might have more than one section
steps: # 2 more spaces
- id: "step1" # Another list, indented more
type: "search"

Key Rules:
- Use spaces, not tabs
- Keep the same indentation level for items at the same level
- The dash (`-`) creates a list item
- Text after `:` is the value
- If you add quotes inside your query for an exact term match, make sure the outer quotes around the full query are different.
# Notice the single quotes to wrap the query and double quotes used inside the query for exact term match
query: '(site:boards.greenhouse.io OR site:jobs.lever.co) ("product manager" OR "product owner") AND (startup OR "series A" OR "series B") AND ("united states" OR "usa" OR "us")'
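A quick way to catch indentation and quoting mistakes before spending credits is to parse the file locally. This only checks that the YAML is valid, not that the playbook itself makes sense:

```python
import yaml  # pip install pyyaml

with open("config.yaml") as f:
    try:
        config = yaml.safe_load(f)
        print("YAML parses OK. Report name:", config["report"]["name"])
    except yaml.YAMLError as err:
        print("YAML error:", err)
```

The general playbook structure looks like this: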
report:
name: "Report Name"
industry: "Industry Type"
sections:
- name: "Section Name"
steps:
- id: "unique_step_id"
type: "search|extract" # Choose one endpoint: either "search" or "extract"
input:
# Input configuration
# Additional parameters

Find content across the web with advanced filters:
- id: "search_competitors"
type: "search"
input:
query: "athletic shoe brands 2024" # Single query
limit: 20 # Number of results per query (default: 10)
filters:
time_range: "m" # Date filter: d=day, w=week, m=month, y=year
country: "us" # Country code for localized results
language: "en" # Language code for results

- id: "search_competitors"
type: "search"
input:
queries: # Multiple queries
- "best running shoes 2024"
- "marathon shoe reviews"
limit: 20 # Number of results per query (default: 10)
filters:
time_range: "m" # Date filter: d=day, w=week, m=month, y=year
country: "us" # Country code for localized results
language: "en" # Language code for results

Search Parameters:
- query/queries: Single search string or array of multiple queries
- limit: Maximum results to return per query (default: 10)
- filters.time_range: Filter results by recency (passed as the `date` parameter to the API)
  - `d` - Past 24 hours
  - `w` - Past week
  - `m` - Past month
  - `y` - Past year
- filters.country: ISO country code (e.g., "us", "uk", "ca")
- filters.language: ISO language code (e.g., "en", "es", "fr")
Extract structured data from any webpage:
- id: "extract_products"
type: "extract"
input:
# Direct URLs
urls:
- "https://store.com/products"
- "https://store.com/about"
# OR from previous step
from_step: "search_results"
field: "url" # JSONPath to URL field
limit: 10 # Max URLs to process from previous step
extraction:
prompt: "Extract all product information including prices and features"
schema:
products:
- name: "string"
price: "number"
in_stock: "boolean"
features: "array"

Extract Parameters:
- Input Sources (use one):
- urls: Array of direct URLs to extract from
- from_step + field: Reference URLs from a previous step's output
- from_step: ID of the previous step
- field: JSONPath to the URL field (e.g., "url", "data.links", "results[].page_url")
- limit: Maximum number of URLs to process
- extraction.prompt: Natural language instruction for what to extract
- extraction.schema: Expected data structure (JSON schema format)
- Use "string", "number", "boolean", "array", or nested objects
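To make the schema types above concrete, here is roughly what a single extracted record for the example above could look like. The values are made up; only the shape follows the schema, and the `url`/`data` wrapper matches the output format shown later in this guide:

```python
# Hypothetical extract result matching the schema above (illustrative values only)
example_record = {
    "url": "https://store.com/products",
    "data": {
        "products": [
            {
                "name": "Trail Runner X",                    # "string"
                "price": 129.99,                             # "number"
                "in_stock": True,                            # "boolean"
                "features": ["waterproof", "lightweight"],   # "array"
            }
        ]
    },
}
```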
Think of chaining steps like following a recipe - each step builds on the previous one!
When you chain steps, you're creating a data pipeline where:
- Step 1 finds information (like a list of websites)
- Step 2 uses that list to gather more details
- Step 3 might analyze those details further
Here's a beginner-friendly example that finds companies and then analyzes them:
report:
name: "Sustainable Fashion Research"
industry: "Fashion"
sections:
- name: "Find and Analyze Companies"
steps:
# STEP 1: Search for companies
- id: "find_companies" # Give this step a name
type: "search"
input:
query: "sustainable fashion brands 2024"
limit: 5 # Start small - just 5 results!
# STEP 2: Visit each company website found in Step 1
- id: "analyze_companies" # Another unique name
type: "extract"
input:
from_step: "find_companies" # Tell it to use Step 1's results
field: "url" # Use the 'url' field from each result
limit: 3 # Only analyze first 3 companies
extraction:
prompt: "What products does this company sell?"
schema:
company_name: "string"
main_products: "array"
price_range: "string"

When you write:
from_step: "find_companies"
field: "url"

You're saying: "Look at the results from 'find_companies' and grab the URL from each one."
# First: Find websites
- id: "search_step"
type: "search"
input:
query: "your search terms"
limit: 10
# Then: Extract data from those websites
- id: "extract_step"
type: "extract"
input:
from_step: "search_step"
field: "url"
limit: 5 # Process only first 5

# First: Get list of product pages
- id: "get_product_list"
type: "extract"
input:
urls: ["https://store.com/products"]
extraction:
prompt: "Find all product page URLs"
schema:
product_urls: "array"
# Then: Analyze each product page
- id: "analyze_products"
type: "extract"
input:
from_step: "get_product_list"
field: "data.product_urls" # Note: 'data.' prefix for extracted content
limit: 10

Instead of creating one massive playbook, break your research into smaller, focused files:
❌ DON'T DO THIS:
# massive_research.yaml - Too big!
sections:
- name: "Find all companies" # 50 companies
- name: "Analyze all websites" # 50 websites
- name: "Extract all products" # 500 products
- name: "Get all reviews" # 1000s of reviews
# This could take hours and might fail halfway!

✅ DO THIS INSTEAD:
Create separate files for each phase:
File 1: find_companies.yaml
report:
name: "Step 1 - Find Companies"
industry: "Your Industry"
sections:
- name: "Company Discovery"
steps:
- id: "search_companies"
type: "search"
input:
query: "your search terms"
limit: 20 # Reasonable number

File 2: analyze_top_companies.yaml
report:
name: "Step 2 - Analyze Top 5"
industry: "Your Industry"
sections:
- name: "Deep Dive"
steps:
- id: "analyze_companies"
type: "extract"
input:
urls: # Manually copy top 5 URLs from previous results
- "https://company1.com"
- "https://company2.com"
- "https://company3.com"
- "https://company4.com"
- "https://company5.com"
extraction:
prompt: "Extract company details"
schema:
name: "string"
products: "array"

- Faster Testing - Run small tests before big operations
- Easy Recovery - If something fails, you don't lose everything
- Credit Control - Monitor spending at each phase
- Quality Control - Review results before proceeding
- System Friendly - Avoids overwhelming the beta system
- Search steps: Max 20 results per query
- Extract from search: Max 10 URLs per step
- Direct URL extract: Max 5-10 URLs per step
- Total operations: Max 50 per file
# Step 1: Run your discovery phase
# In Replit Shell, type:
python main.py find_companies.yaml
# Creates: report_20250724_143022.json and report_20250724_143022.md
# Step 2: Read your results
# Click on report_20250724_143022.md in the file list
# Review the companies found in nicely formatted tables
# Pick the best ones for deeper analysis
# Step 3: Create your next config file
# Click "New file" → Name it "analyze_top5.yaml"
# Copy the URLs you want from the previous results
# Add them to your new config
# Step 4: Run the analysis
python main.py analyze_top5.yaml
# Creates: report_20250724_144512.json and report_20250724_144512.md
# Step 5: Continue with more phases as needed
python main.py extract_products.yaml
# Creates: report_20250724_145033.json and report_20250724_145033.md

💡 Replit Tips:
- Download reports: Click ⋮ (three dots) → Download
- Clear old files: Select files โ Delete (keeps your workspace clean)
- Share your Repl: Click "Share" to collaborate with others
- Files persist between sessions - come back anytime!
When chaining steps, you need to tell the tool where to find data. Here's how:
| What You Want | Pattern | Example |
|---|---|---|
| URL from search results | `field: "url"` | Gets website link |
| Data you extracted | `field: "data.company_name"` | Gets company name |
| Items from a list | `field: "data.products[]"` | Gets all products |
| Nested information | `field: "data.details.address"` | Gets address from details |
If your Step 1 results look like this:
{
"url": "https://example.com",
"title": "Example Company",
"data": {
"products": ["Shirts", "Pants", "Shoes"],
"location": "New York"
}
}

Then in Step 2, you can access:
- `field: "url"` → Gets "https://example.com"
- `field: "data.location"` → Gets "New York"
- `field: "data.products"` → Gets the whole product list
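If you want to sanity-check a field path before putting it in a `from_step` reference, this small helper shows how a dotted path (with an optional `[]` suffix for lists) maps onto that JSON structure. It is a simplified illustration, not the tool's actual JSONPath handling, and it does not walk into lists of nested records the way the real tool may:

```python
def resolve_field(record, path):
    """Follow a dotted field path like 'data.products' through one result record."""
    value = record
    for part in path.replace("[]", "").split("."):
        value = value[part]
    return value

step1_result = {
    "url": "https://example.com",
    "title": "Example Company",
    "data": {"products": ["Shirts", "Pants", "Shoes"], "location": "New York"},
}

print(resolve_field(step1_result, "url"))             # https://example.com
print(resolve_field(step1_result, "data.location"))   # New York
print(resolve_field(step1_result, "data.products"))   # ['Shirts', 'Pants', 'Shoes']
```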
company_research.yaml
report:
name: "Tech Startup Research"
industry: "Technology"
sections:
- name: "Find and Analyze"
steps:
# Find companies
- id: "search_startups"
type: "search"
input:
query: "AI startups San Francisco 2024"
limit: 10
filters:
time_range: "m" # Last month
# Analyze top 5
- id: "get_details"
type: "extract"
input:
from_step: "search_startups"
field: "url"
limit: 5 # Only top 5
extraction:
prompt: "What does this company do and how many employees?"
schema:
company: "string"
description: "string"
employee_count: "string"

File 1: find_furniture_stores.yaml
report:
name: "Step 1 - Find Stores"
industry: "E-commerce"
sections:
- name: "Store Discovery"
steps:
- id: "search_stores"
type: "search"
input:
query: "online furniture stores USA"
limit: 15

File 2: analyze_best_stores.yaml
report:
name: "Step 2 - Store Analysis"
industry: "E-commerce"
sections:
- name: "Store Details"
steps:
- id: "extract_store_info"
type: "extract"
input:
urls: # Copy best 5 URLs from Step 1 results
- "https://store1.com"
- "https://store2.com"
- "https://store3.com"
- "https://store4.com"
- "https://store5.com"
extraction:
prompt: "Find product categories and price ranges"
schema:
store_name: "string"
categories: "array"
price_range: "string"
ships_to: "string"

The tool generates TWO files for every run - one for machines, one for humans!
File: report_YYYYMMDD_HHMMSS.json
{
"metadata": {
"report_name": "Competitive Analysis",
"industry": "Athletic Footwear",
"generated": "2024-01-15T10:30:00Z",
"status": "completed",
"total_credits": 45
},
"sections": [
{
"name": "Competitor Analysis",
"credits_used": 20,
"steps": [
{
"id": "extract_homepages",
"type": "extract",
"credits_used": 20,
"input_count": 5,
"output_count": 5,
"data": [
{
"url": "https://nike.com",
"title": "Nike. Just Do It",
"data": {
"brand_name": "Nike",
"tagline": "Just Do It",
"products_found": 150
}
}
]
}
]
}
]
}

File: report_YYYYMMDD_HHMMSS.md
Report: Competitive Analysis
Industry: Athletic Footwear
Generated: 2024-01-15T10:30:00Z
Status: completed
Total Credits Used: 45
Credits Used: 20
Input Count: 5 | Output Count: 5 | Credits: 20
| URL | brand_name | tagline | products_found |
|---|---|---|---|
| https://nike.com | Nike | Just Do It | 150 |
| https://adidas.com | Adidas | Impossible is Nothing | 200 |
| https://puma.com | Puma | Forever Faster | 120 |
| https://reebok.com | Reebok | Be More Human | 80 |
| https://underarmour.com | Under Armour | Protect This House | 175 |
End of Report
💡 Tip: Open the MARKDOWN (.md) file in any text editor or markdown viewer to see formatted tables!
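Because the JSON file mirrors the structure shown above, you can post-process it with a few lines of Python - for example, pulling every extracted record into one flat list. The field names here follow the sample output above; swap in your own report filename:

```python
import json

with open("report_20250724_143022.json") as f:   # use your own report filename
    report = json.load(f)

records = []
for section in report["sections"]:
    for step in section["steps"]:
        records.extend(step.get("data", []))

print(f"{report['metadata']['report_name']}: {len(records)} records, "
      f"{report['metadata']['total_credits']} credits used")
```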
Make sure your Replit has:
- `main.py` with your API key added
- `config.yaml` with your playbook configuration
Run your playbook:
- Click the big green "Run" button
- Watch the output in the Console tab
- Your report files will appear in the file list
View your results:
- Click on `report_[timestamp].md` to read your markdown report
- Download files: Click ⋮ (three dots) → Download
Run different playbooks:
- To run a different config file, type in Shell:
python main.py my_other_config.yaml
# Run with default config.yaml
python main.py
# Run with specific config file
python main.py my_analysis.yaml
# Enable debug mode (shows detailed API responses)
python main.py --debug config.yaml

When you run a playbook, you'll see progress updates:
Loading configuration...
Output files: report_20250724_143022.json & report_20250724_143022.md
[1/2] Processing: Company Discovery...
Processing Pipeline: Company Discovery
Total steps: 1
Step 1/1: search_companies (search)
Processing 1 inputs...
[1/1] Query: sustainable fashion brands...
✓ Searching: sustainable fashion brands...
✓ Found results (credits: 5)
✓ Saved: 10 results
✓ Section 1/2 complete

✓ Analysis complete! Output saved to:
JSON: report_20250724_143022.json
Markdown: report_20250724_143022.md
Total credits used: 5
The playbook continues processing even if individual steps fail:
{
"id": "failed_step",
"type": "extract",
"error": "Failed to extract from URL",
"credits_used": 0,
"data": []
}

"ModuleNotFoundError: No module named 'requests'"
- Click the "Packages" icon (cube) in Replit
- Search for "requests" and "pyyaml"
- Click install on both
- Or add this to the top of main.py:
import os
os.system('pip install requests pyyaml')
"Invalid API key"
- Check line ~500 in main.py: `API_KEY = "your_key_here"`
- Make sure you replaced it with your actual API key from Resets.ai
- Don't include quotes within quotes
"File not found: config.yaml"
- Make sure the file is named exactly `config.yaml` (not `config.yml`)
- Check it's in the main folder, not in a subfolder
- In Replit, you should see it in the file list on the left
Getting empty results?
- Try more specific search queries
- Start with the example configs in this guide
- Check if websites are accessible (some block automated access)
- Use smaller limits (5-10) while testing
"python: command not found"
- You need Python installed: Visit python.org
- Or just use Replit instead - no installation needed!
"pip: command not found"
- If Python is installed but pip isn't working:
- Windows: `python -m pip install requests pyyaml`
- Mac/Linux: `python3 -m pip install requests pyyaml`
- Or just use Replit!
# First test: Just 2-3 items
limit: 3

Run one step at a time and check results before adding more
# Good prompt - clear and specific
prompt: "What products does this company sell?"
# Too complex for beta
prompt: "Analyze the entire business model, supply chain, and 5-year projections"

Keep copies of your YAML files and results - you might need them later!
Check the "credits_used" in your output files regularly
When extracting data, you can have both flat fields and nested structures. The tool automatically organizes them into appropriate tables:
report:
name: "LinkedIn Job Analysis"
industry: "HR Tech"
sections:
- name: "Job Market Research"
steps:
- id: "extract_job_info"
type: "extract"
input:
urls:
- "https://linkedin.com/jobs/view/12345"
extraction:
prompt: "Extract comprehensive job information"
schema:
# Flat fields (will go in main table)
job_title: string
company_name: string
location: string
posted_date: string
# Nested structure (will get separate table)
Requirements:
experience: string
education: string
skills: array
# Another nested structure (another table)
Compensation:
salary_min: number
salary_max: number
benefits: array

This will create three tables in your markdown:
- Main Data - All flat fields (job_title, company_name, etc.)
- Requirements - Experience, education, skills
- Compensation - Salary and benefits information
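If you are curious how flat and nested fields end up in separate tables, the split is conceptually simple: scalar fields stay in the main table, and each nested object becomes its own table. A rough illustration under that assumption (not the tool's actual code), with a made-up job record:

```python
def split_tables(record):
    """Separate flat fields from nested objects in one extracted record."""
    main_row, sub_tables = {}, {}
    for key, value in record.items():
        if isinstance(value, dict):
            sub_tables[key] = value       # e.g. Requirements, Compensation
        else:
            main_row[key] = value         # e.g. job_title, company_name
    return main_row, sub_tables

record = {
    "job_title": "Product Manager",
    "company_name": "Acme",
    "Requirements": {"experience": "5 years", "skills": ["roadmapping", "SQL"]},
    "Compensation": {"salary_min": 120000, "salary_max": 150000},
}
print(split_tables(record))
```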
If you're new to programming or just want to get started quickly:
Replit Benefits:
- ✅ No Installation - Works in any web browser
- ✅ No Setup - Python and packages are pre-configured
- ✅ Cloud Storage - Your files are saved automatically
- ✅ Easy Sharing - Send a link to collaborate
- ✅ Mobile Friendly - Works on tablets and phones too
- ✅ Free Plan - Generous free tier for personal projects
Traditional Installation Challenges:
- ❌ Installing Python correctly
- ❌ Setting up PATH variables
- ❌ Managing pip and packages
- ❌ Terminal/command line complexity
- ❌ Different commands for Windows/Mac/Linux
Bottom Line: Unless you're already comfortable with Python development, use Replit! You can always download your code and run it locally later if needed.
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with Resets.ai API
- Inspired by modern data pipeline architectures
- Community feedback and contributions
- 📧 Email: hello@resets.ai
Built by the Resets Team
