diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..2ce6add --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,106 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Overview + +This is a collection of example projects demonstrating Hyperbrowser SDK usage. Each example is a standalone TypeScript project showing different use cases for web automation, scraping, and AI-powered workflows. + +## Key Architecture Patterns + +### SDK Integration +All examples use the `@hyperbrowser/sdk` package as the core browser automation client. Common pattern: + +```typescript +import { Hyperbrowser } from '@hyperbrowser/sdk'; +const hbClient = new Hyperbrowser({ apiKey: process.env.HYPERBROWSER_API_KEY }); +``` + +### Example Structure +Most examples follow this structure: +- Single entry point TypeScript file (e.g., `example-name.ts`) +- `package.json` with dependencies +- README.md explaining the use case +- Requires `HYPERBROWSER_API_KEY` environment variable +- Many also require `OPENAI_API_KEY` for LLM features + +### Complex Multi-Module Examples +Some examples have more sophisticated architectures: + +**hb-intern-bot** (`hb-intern-bot/src/`): +- Pipeline architecture: scrape → normalize → score → summarize → output +- Multiple scrapers (`scraper/`) for different sources (HN, Reddit, ProductHunt, blogs) +- Event aggregation and scoring system +- Watch mode for continuous monitoring + +**llm-crawl**: +- CLI tool using `commander` for argument parsing +- Combines Hyperbrowser Crawl API + OpenAI LLM processing +- Supports multiple output formats (JSON, JSONL, FAISS database) +- Example: `llmcrawl "instruction" --json -o output.jsonl` + +**site2prompt**: +- Web content → prompt-ready format converter +- Token budgeting and deduplication +- Multiple output formats (JSONL, CSV, citations) +- Heuristic and LLM-based compression modes + +## Common Development Commands + +### Running Examples +Most examples are run with: +```bash +cd <example-dir> +npm install +npx tsx <example-name>.ts +``` + +Or (if configured in package.json): +```bash +npm run dev +``` + +### Building TypeScript Projects +Examples with build scripts: +```bash +npm run build +``` + +### Environment Setup +Required for all examples: +```bash +export HYPERBROWSER_API_KEY="your-key" +``` + +Many examples also need: +```bash +export OPENAI_API_KEY="your-key" +``` + +## Dependencies + +Common dependencies across examples: +- `@hyperbrowser/sdk` - Core browser automation SDK +- `@hyperbrowser/agent` - Agent-based automation (some examples) +- `dotenv` - Environment variable management +- `typescript` + `ts-node` or `tsx` - TypeScript execution +- `openai` - OpenAI API client +- `@langchain/openai` - LangChain integration (some examples) +- `zod` - Schema validation (some examples) +- `commander` / `yargs` - CLI frameworks (CLI-based examples) +- `cheerio` - HTML parsing (scraping examples) + +## Important Notes + +- Each example is independent with its own `package.json` +- No shared code between examples (intentionally isolated for clarity) +- Examples demonstrate patterns, not production-ready code +- The SDK version may vary between examples (check individual `package.json` files) +- Some examples output to files, others to stdout +- API keys should be in `.env` file or environment variables, never committed + +## Resources + +- Documentation: https://docs.hyperbrowser.ai +- Discord: https://discord.gg/zsYzsgVRjh +- Support: info@hyperbrowser.ai \ No newline at end of file
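As a quick reference, the integration pattern above typically expands into a minimal entry point like the following sketch (the exact `scrapeOptions` requested and the result fields consumed vary per example):

```typescript
import { Hyperbrowser } from '@hyperbrowser/sdk';
import 'dotenv/config';

const hbClient = new Hyperbrowser({ apiKey: process.env.HYPERBROWSER_API_KEY });

async function main() {
  // Start a scrape job and block until it completes
  const result = await hbClient.scrape.startAndWait({
    url: 'https://example.com',
    scrapeOptions: { formats: ['markdown', 'links'] },
  });
  // Most examples consume the markdown representation of the page
  console.log(result.data?.markdown);
}

main().catch(console.error);
```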
diff --git a/CUA-CTA-Validator/README.md b/CUA-CTA-Validator/README.md index c6f14c7..5ae813e 100644 --- a/CUA-CTA-Validator/README.md +++ b/CUA-CTA-Validator/README.md @@ -1,62 +1,160 @@ # CUA-CTA-Validator -A tool that uses Hyperbrowser's Conversational User Agent (OPENAI CUA) to validate and analyze Call-to-Action (CTA) buttons on websites. +**Built with [Hyperbrowser](https://hyperbrowser.ai)** -## Description +An intelligent CLI tool that uses Hyperbrowser's Computer-Using Agent (CUA) with OpenAI to automatically validate and analyze Call-to-Action (CTA) buttons on websites. Perfect for UX audits, accessibility checks, and conversion optimization workflows. -This tool automatically identifies the main CTA button in a website's hero section and performs a comprehensive analysis of its accessibility and SEO characteristics. It then provides specific improvement suggestions based on this analysis. +## ✨ Features -## Features +- 🔍 **Smart CTA Detection**: Automatically identifies the primary CTA button in hero sections +- ♿ **Accessibility Analysis**: Evaluates color contrast, text clarity, and ARIA attributes +- 📊 **SEO Best Practices**: Checks positioning, semantic markup, and user experience +- 💡 **Actionable Insights**: Generates 3-5 specific improvement suggestions +- 🤖 **AI-Powered**: Uses Hyperbrowser's CUA agent for intelligent page analysis -- Automatically identifies the main CTA button in a website's hero section -- Analyzes CTA buttons for accessibility and SEO best practices -- Checks color contrast, text clarity, and positioning -- Provides 3-5 specific improvement suggestions +## 🔧 Installation -## Requirements +1. Install dependencies: +```bash +npm install +``` -- Node.js -- Hyperbrowser API key (get your API keys at hyperbrowser.ai) +2. **Get an API key** at [https://hyperbrowser.ai](https://hyperbrowser.ai) -## Installation +3. Set up environment variables: +```bash +# Create a .env file +echo "HYPERBROWSER_API_KEY=your_key_here" > .env +``` -1. Clone the repository -2. Install dependencies: +## 🚀 Quick Start ```bash +# Install dependencies npm install + +# Set environment variable +export HYPERBROWSER_API_KEY="your_key_here" + +# Run with URL argument +npx tsx cua-cta-validator.ts https://example.com + +# Or run interactively (will prompt for URL) +npx tsx cua-cta-validator.ts ```
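For orientation, the heart of the script is a CUA task run against a persistent session, roughly as in this sketch. Option names such as `keepBrowserOpen` and the `finalResult` response field follow the Hyperbrowser SDK's agent API; the real prompts in `cua-cta-validator.ts` are longer, so treat this as a shape reference rather than the actual implementation:

```typescript
import { Hyperbrowser } from "@hyperbrowser/sdk";
import "dotenv/config";

const hbClient = new Hyperbrowser({ apiKey: process.env.HYPERBROWSER_API_KEY });

async function main() {
  // One browser session is reused across the three analysis steps
  const session = await hbClient.sessions.create();
  try {
    const step1 = await hbClient.agents.cua.startAndWait({
      task: "Go to https://example.com and identify the primary CTA button in the hero section.",
      sessionId: session.id,
      keepBrowserOpen: true,
      maxSteps: 15,
    });
    console.log("CTA found:", step1.data?.finalResult);
    // Steps 2 and 3 issue further tasks against the same sessionId
  } finally {
    await hbClient.sessions.stop(session.id);
  }
}

main().catch(console.error);
```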
-3. Create a `.env` file in the root directory with your Hyperbrowser API key: +## 💡 Usage Examples +### Validate a Landing Page +```bash +npx tsx cua-cta-validator.ts https://productland.com +``` + +### Analyze a SaaS Homepage +```bash +npx tsx cua-cta-validator.ts https://yourapp.com/pricing ``` -HYPERBROWSER_API_KEY=your_api_key_here + +### Check E-commerce CTA +```bash +npx tsx cua-cta-validator.ts https://shop.example.com ``` -## Usage +## 🎯 What Gets Analyzed + +The tool performs a three-step analysis: + +### Step 1: CTA Identification +- Locates the primary CTA button in the hero section +- Ignores secondary CTAs and other page sections +- Captures button text, styling, and position -Run the validator by providing a URL as a command-line argument: +### Step 2: Accessibility & SEO Analysis +- **Color Contrast**: WCAG compliance check +- **Text Clarity**: Readability and action-oriented copy +- **Positioning**: Visual hierarchy and fold placement +- **Semantic HTML**: Proper button/link usage +- **Mobile Responsiveness**: Touch target size + +### Step 3: Improvement Suggestions +- 3-5 actionable recommendations +- Prioritized by impact on conversion +- Based on industry best practices + +## 🔑 Environment Variables ```bash -npm start -- https://example.com +HYPERBROWSER_API_KEY # Required - Get at https://hyperbrowser.ai ``` -Or run without arguments to be prompted for a URL: +## 🏗️ How It Works -```bash -npm start +1. **Session Creation**: Initializes a Hyperbrowser session with CUA agent +2. **Navigation**: Opens the target URL in a real browser environment +3. **CTA Detection**: AI agent identifies the primary hero CTA button +4. **Analysis**: Evaluates accessibility, SEO, and UX best practices +5. **Suggestions**: Generates specific, actionable improvements +6. **Cleanup**: Closes the browser session + +## 📊 Output Format + +``` +Starting CTA Validator Agent... +Validating CTA buttons on: https://example.com + +Step 1: Identifying CTA button in hero section... +Output: +Main CTA button found: "Get Started Free" - Blue button with white text... + +Step 2: Analyzing CTA for accessibility and SEO... +Analysis: +• Color contrast: 4.8:1 (passes WCAG AA) +• Text clarity: Action-oriented and clear +• Positioning: Above the fold, centered... + +Step 3: Generating improvement suggestions... +Suggestions: +1. Increase color contrast to 7:1 for WCAG AAA compliance +2. Add aria-label for better screen reader support +3. Consider A/B testing "Start Free Trial" for higher conversion +4. Increase button size for better mobile touch targets +5. Add subtle animation to draw attention ``` -## How It Works +## 🎛️ Technical Details + +- **Agent**: Uses `@hyperbrowser/sdk` with CUA (Computer-Using Agent) +- **Runtime**: TypeScript with `tsx` for execution +- **Schema Validation**: Zod for type-safe output parsing +- **Session Management**: Keeps browser open between steps for efficiency +- **Max Steps**: Configurable limits per analysis phase (15-20 steps) + +## 🎯 Use Cases -1. The tool connects to the Hyperbrowser API using your API key -2. It navigates to the specified URL -3. Step 1: Identifies the main CTA button in the hero section -4. Step 2: Analyzes the CTA for accessibility and SEO best practices -5. 
Step 3: Generates specific improvement suggestions +- **UX Audits**: Validate CTA effectiveness across multiple pages +- **A/B Testing Prep**: Identify improvement opportunities before testing +- **Accessibility Compliance**: Ensure CTAs meet WCAG standards +- **Competitor Analysis**: Benchmark your CTAs against competitors +- **Conversion Optimization**: Data-driven suggestions for higher conversions + +## 📦 Dependencies + +- `@hyperbrowser/sdk` - Official Hyperbrowser SDK for browser automation +- `@hyperbrowser/agent` - CUA agent capabilities +- `dotenv` - Environment variable management +- `zod` - Runtime type validation +- `typescript` + `ts-node` - TypeScript execution + +## 🔄 Development + +```bash +# Run in development mode +npx tsx cua-cta-validator.ts https://example.com + +# With verbose output +DEBUG=* npx tsx cua-cta-validator.ts https://example.com +``` -## Dependencies +--- -- @hyperbrowser/sdk - For interacting with Hyperbrowser -- dotenv - For loading environment variables -- zod - For runtime type checking +Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates. diff --git a/ChatWithWebsite-Scrape/README.md b/ChatWithWebsite-Scrape/README.md index 88db192..2d7ffa3 100644 --- a/ChatWithWebsite-Scrape/README.md +++ b/ChatWithWebsite-Scrape/README.md @@ -1,67 +1,118 @@ # Chat With Any Website -Welcome to Chat With Any Website, an official example project powered by the [Hyperbrowser API](https://hyperbrowser.ai/)! This tool demonstrates how easy it is to leverage Hyperbrowser's advanced web scraping capabilities and combine them with OpenAI's conversational AI—all from your terminal. +**Built with [Hyperbrowser](https://hyperbrowser.ai)** - -👉 **Get your free API key today at [hyperbrowser.ai](https://hyperbrowser.ai/)!** +An interactive CLI tool that scrapes any webpage using Hyperbrowser SDK and enables conversational AI-powered chat about the content using OpenAI. Perfect for quick website analysis, research, and content exploration. ## Features -- Scrape any webpage using the Hyperbrowser SDK -- Chat with an OpenAI-powered assistant about the scraped content -- Simple command-line interface +- Web Scraping: Uses Hyperbrowser's official SDK with `client.scrape.startAndWait()` +- Interactive Chat: OpenAI-powered conversational interface about scraped content +- Markdown Support: Automatically extracts and uses markdown format when available +- Simple CLI: Easy-to-use readline-based terminal interface -## Requirements +## Installation -- Node.js (v18 or higher recommended) -- npm or yarn -- API keys for [Hyperbrowser](https://hyperbrowser.ai/) and [OpenAI](https://platform.openai.com/) - -## Setup - -1. **Clone the repository:** - ```bash - git clone - cd ChatWithWebsite-Scrape - ``` -2. **Install dependencies:** - ```bash - npm install - # or - yarn install - ``` -3. **Set up your API keys:** - - Sign up at [hyperbrowser.ai](https://hyperbrowser.ai/) to get your free Hyperbrowser API key. - - Get your OpenAI API key from [platform.openai.com](https://platform.openai.com/). - - Create a `.env` file in the root directory with the following content: - ```env - HYPERBROWSER_API_KEY=your_hyperbrowser_api_key - OPENAI_API_KEY=your_openai_api_key - ``` +1. Install dependencies: +```bash +npm install +``` -## Usage +2. **Get an API key** at [https://hyperbrowser.ai](https://hyperbrowser.ai) -Run the script using ts-node or compile it with tsc: +3. 
Set up environment variables: +```bash +# Create .env file with: +HYPERBROWSER_API_KEY=your_hyperbrowser_api_key +OPENAI_API_KEY=your_openai_api_key +``` + +## Quick Start ```bash +# Install dependencies +npm install + +# Set environment variables +export HYPERBROWSER_API_KEY="your_key_here" +export OPENAI_API_KEY="your_key_here" + +# Run the tool npx ts-node scrapechat.ts ``` -- Enter the URL you want to scrape when prompted. -- Chat with the AI about the page content. Type `exit` to quit. +## Usage -## Notes +```bash +npx ts-node scrapechat.ts +``` + +1. Enter the URL you want to scrape when prompted +2. Wait for the scraping to complete +3. Ask questions about the page content +4. Type `exit` to quit the chat + +### Example Session + +``` +Enter the URL to scrape: https://example.com/article +Scrape result received. Starting chat... + +💬 Chat mode: Ask anything about the page (type "exit" to quit + +You: What is the main topic of this article? +AI: The article discusses... + +You: Can you summarize the key points? +AI: Here are the main points... + +You: exit +👋 Chat ended. +``` + +## How It Works + +1. **Scrape**: Fetches webpage content using Hyperbrowser SDK +2. **Extract**: Prioritizes markdown format, falls back to HTML or raw data +3. **Chat**: Maintains conversation context with OpenAI GPT-5 +4. **Interact**: Simple readline interface for asking questions -- Make sure your API keys have sufficient quota. -- This tool is for educational and personal use. Please respect website terms of service when scraping. -- For more advanced scraping, batch jobs, or custom integrations, check out the [Hyperbrowser API docs](https://hyperbrowser.ai/docs). +## Code Structure -## About Hyperbrowser +- **`scrapechat.ts`**: Main entry point with scraping and chat logic +- Uses `@hyperbrowser/sdk` for reliable web scraping +- Uses `openai` for conversational AI +- Uses `readline-sync` for terminal input -Hyperbrowser is trusted by developers and enterprises worldwide for real-time, reliable web data extraction. Our mission is to make the web programmable for everyone. Join our community and supercharge your projects with the best web scraping API available! +## Environment Variables + +```bash +HYPERBROWSER_API_KEY # Get at https://hyperbrowser.ai +OPENAI_API_KEY # Get at https://platform.openai.com +``` + +## Requirements + +- Node.js v18 or higher +- npm or yarn +- Active Hyperbrowser API key +- Active OpenAI API key + +## Use Cases + +- **Research**: Quickly analyze and query academic papers or articles +- **Content Analysis**: Extract insights from blog posts and documentation +- **Competitive Intelligence**: Explore competitor websites interactively +- **Learning**: Ask questions about technical documentation +- **Data Extraction**: Conversational approach to finding specific information + +## Notes -👉 **Sign up now and get started for free at [hyperbrowser.ai](https://hyperbrowser.ai/)!** +- The tool uses OpenAI's GPT-5 model by default +- Conversation history is maintained throughout the session +- Supports markdown, HTML, and JSON content formats +- Respects website terms of service when scraping -## License +--- -MIT +Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates. 
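To make the scrape-then-chat flow above concrete, `scrapechat.ts` boils down to something like this single-question sketch. The real script wraps the chat call in a readline loop and appends each turn to the running message history; the `gpt-5` model name simply mirrors the note above:

```typescript
import { Hyperbrowser } from "@hyperbrowser/sdk";
import OpenAI from "openai";
import "dotenv/config";

const hb = new Hyperbrowser({ apiKey: process.env.HYPERBROWSER_API_KEY });
const openai = new OpenAI();

async function ask(url: string, question: string) {
  // 1. Scrape the page, preferring the markdown representation
  const scrape = await hb.scrape.startAndWait({
    url,
    scrapeOptions: { formats: ["markdown", "html"] },
  });
  const content = scrape.data?.markdown ?? scrape.data?.html ?? "";

  // 2. Ask the model about the scraped content
  const completion = await openai.chat.completions.create({
    model: "gpt-5",
    messages: [
      { role: "system", content: `Answer questions about this page:\n\n${content}` },
      { role: "user", content: question },
    ],
  });
  console.log(completion.choices[0].message.content);
}

ask("https://example.com", "What is the main topic of this page?").catch(console.error);
```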
diff --git a/Extract-github-analyzer/README.md b/Extract-github-analyzer/README.md index 5fde0b6..e4827d7 100644 --- a/Extract-github-analyzer/README.md +++ b/Extract-github-analyzer/README.md @@ -1,51 +1,143 @@ +**Built with [Hyperbrowser](https://hyperbrowser.ai)** + # GitHub Profile Analyzer -This tool analyzes GitHub profiles to extract information about a user's tech stack, programming languages, frameworks, tools, and repositories. +An interactive command-line tool that analyzes GitHub user profiles to extract comprehensive information about a developer's tech stack, programming languages, frameworks, tools, and top repositories using AI-powered extraction. ## Features -- Extracts primary programming languages used -- Identifies frameworks and tools in repositories -- Lists and summarizes top repositories -- Handles GitHub profile analysis with captcha solving +- 🔍 **Smart Profile Analysis**: Automatically extracts tech stack information from any public GitHub profile +- 💻 **Language Detection**: Identifies primary programming languages used across repositories +- 🛠️ **Framework & Tool Discovery**: Detects frameworks, libraries, and development tools +- 📊 **Repository Insights**: Lists and summarizes top repositories with descriptions +- 🤖 **AI-Powered Extraction**: Uses Hyperbrowser's Extract API with structured schema validation +- 🔐 **Captcha Handling**: Built-in proxy support and captcha solving for reliable access +- ⌨️ **Interactive CLI**: User-friendly command-line interface with prompts ## Prerequisites -- Node.js and npm installed -- HyperBrowser API key (get yours at [hyperbrowser.ai](https://hyperbrowser.ai)) +- Node.js (v18 or later) +- Hyperbrowser API key - Get yours at [hyperbrowser.ai](https://hyperbrowser.ai) -## Setup +## Quick Start -1. Clone this repository -2. Install dependencies: - ``` +1. **Install dependencies:** + ```bash npm install ``` -3. Create a `.env` file in the project root with your API key: + +2. **Set up environment variables:** + Create a `.env` file in the project root: + ```env + HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here ``` - HYPERBROWSER_API_KEY=your_api_key_here + +3. **Run the analyzer:** + ```bash + npx tsx github-summarizer.ts ``` -## Usage +4. **Enter a GitHub username** when prompted, and the tool will analyze their profile + +## Usage Example + +```bash +$ npx tsx github-summarizer.ts +Enter a GitHub username: torvalds +Analyzing GitHub profile: https://github.com/torvalds + +Tech Stack Analysis: { + "username": "torvalds", + "primaryLanguages": ["C", "Shell", "Makefile"], + "frameworks": [], + "tools": ["Git", "Linux Kernel"], + "repositories": [ + { + "name": "linux", + "summary": "Linux kernel source tree" + }, + { + "name": "subsurface", + "summary": "Dive log program" + } + ] +} +``` -Run the tool with: +## How It Works +1. **User Input**: Prompts for a GitHub username via interactive CLI +2. **URL Construction**: Builds the GitHub profile URL from the username +3. **AI Extraction**: Uses Hyperbrowser's Extract API with a structured Zod schema to: + - Navigate to the user's GitHub profile page + - Extract structured information about repositories and tech stack + - Parse languages, frameworks, tools, and repository details +4. 
**Output**: Returns formatted JSON with comprehensive profile analysis + +## Code Structure + +**Main file**: `github-summarizer.ts` + +Key components: +- **Schema Definition**: Zod schema for structured data extraction + ```typescript + { + username: string, + primaryLanguages: string[], + frameworks: string[], + tools: string[], + repositories: Array<{ name: string, summary?: string }> + } + ``` +- **Interactive CLI**: Uses Node.js `readline` for user input +- **Hyperbrowser Integration**: Extract API with proxy and captcha solving enabled +- **Error Handling**: Graceful error handling for network issues or invalid usernames + +## Configuration Options + +The extraction can be customized by modifying the Hyperbrowser session options: + +```typescript +sessionOptions: { + useProxy: true, // Use proxy for reliable access + solveCaptchas: true, // Automatically solve captchas if encountered +} ``` -npm start + +You can also customize the extraction prompt to gather different information: + +```typescript +prompt: "Summarize their tech stack, their main languages, frameworks, tools, contributions, and top repositories." ``` -You'll be prompted to enter a GitHub username, and the tool will analyze their profile and display the results. +## Dependencies -## How It Works +- **[@hyperbrowser/sdk](https://www.npmjs.com/package/@hyperbrowser/sdk)** - Official Hyperbrowser SDK for web scraping and extraction +- **[zod](https://www.npmjs.com/package/zod)** - TypeScript-first schema validation for structured data +- **[dotenv](https://www.npmjs.com/package/dotenv)** - Environment variable management -This tool uses HyperBrowser's extraction capabilities to analyze GitHub profiles by: +## Error Handling -1. Navigating to the user's GitHub page -2. Using AI to extract structured information about their repositories and tech stack -3. Returning the data in a structured JSON format +The tool handles common errors gracefully: +- Empty username validation +- Network connectivity issues +- Invalid GitHub usernames +- Rate limiting (via proxy support) +- Captcha challenges (automatic solving) -## Dependencies +## Important Notes + +- Only analyzes **public** GitHub profiles +- Requires an active Hyperbrowser API key with available credits +- Respects GitHub's rate limits through proxy rotation +- Extraction quality depends on the profile's public information visibility + +## Resources + +- Hyperbrowser Documentation: [https://docs.hyperbrowser.ai](https://docs.hyperbrowser.ai) +- Hyperbrowser Discord: [https://discord.gg/zsYzsgVRjh](https://discord.gg/zsYzsgVRjh) +- Support: info@hyperbrowser.ai + +--- -- [@hyperbrowser/sdk](https://www.npmjs.com/package/@hyperbrowser/sdk) - HyperBrowser SDK -- [dotenv](https://www.npmjs.com/package/dotenv) - Environment variable management -- [zod](https://www.npmjs.com/package/zod) - Schema validation +Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates. diff --git a/Internet-zip/README.md b/Internet-zip/README.md index 5bf071b..25e97de 100644 --- a/Internet-zip/README.md +++ b/Internet-zip/README.md @@ -1,53 +1,184 @@ **Built with [Hyperbrowser](https://hyperbrowser.ai)** -### Internet Zip (kzip) -Minimal CLI that scrapes any URL with Hyperbrowser and compresses it into a semantic knowledge shard (`.kzip.json`). Great for building growth-data archives from live pages (e.g., auto-curated summaries, link graphs, and key points to power sign-ups and content workflows). 
+# Internet Zip (kzip) Minimal CLI tool that scrapes any URL with Hyperbrowser and compresses it into a semantic knowledge shard (`.kzip.json`). Perfect for building growth-data archives from live pages - think auto-curated summaries, link graphs, and key points to power sign-ups and content workflows. -### Get an API key -- Get your key at `https://hyperbrowser.ai` -### Setup +## Features + +- Web scraping via Hyperbrowser SDK +- AI-powered semantic compression and extraction +- Automatic link discovery and cataloging +- Structured JSON output with metadata +- Configurable output paths +- Compression ratio reporting + +## Use Cases + +- Building knowledge bases from web content +- Creating searchable content archives +- Extracting key insights from documentation +- Monitoring and tracking page changes over time +- Generating structured data for LLM workflows +- Content summarization pipelines + +## Prerequisites + +- Node.js (v18 or later) +- **Hyperbrowser API Key**: Get yours at [hyperbrowser.ai](https://hyperbrowser.ai) + +## Quick Start + +1. **Install dependencies:** + ```bash + npm install + ``` + +2. **Set up environment variables:** + Create a `.env` file: + ```env + HYPERBROWSER_API_KEY=your_hyperbrowser_api_key + ``` + +3. **Run the tool:** + ```bash + npx ts-node kzip.ts <url> [--out file.kzip.json] + ``` + +## Usage + +### Basic Usage + +Scrape a URL and save with default filename (based on hostname): ```bash -npm i -npm i -D @types/node +npx ts-node kzip.ts https://news.ycombinator.com +# Output: news.ycombinator.com.kzip.json ``` -Create a `.env` in this folder: +### Custom Output Path + +Specify a custom output filename: ```bash -echo "HYPERBROWSER_API_KEY=YOUR_KEY_HERE" > .env +npx ts-node kzip.ts https://example.com/blog/post --out my-article.kzip.json ``` -### Quick start +### More Examples + ```bash -npx ts-node kzip.ts <url> [--out file.kzip.json] +# Scrape documentation +npx ts-node kzip.ts https://docs.hyperbrowser.ai --out hyperbrowser-docs.kzip.json -# Example -npx ts-node kzip.ts https://news.ycombinator.com --out hn.kzip.json +# Archive a blog post +npx ts-node kzip.ts https://blog.example.com/article-title --out blog-archive.kzip.json + +# Save news articles +npx ts-node kzip.ts https://news.ycombinator.com --out hn-frontpage.kzip.json ``` -### What it does -- **Scrape**: Uses Hyperbrowser SDK `scrape.startAndWait(...)` to fetch HTML and metadata -- **Compress**: Calls `extract.startAndWait(...)` to generate summary, 5 key points, and outbound links -- **Save**: Writes `<hostname>.kzip.json` (or `--out`), printing compression ratio +## How It Works + +1. **Scrape**: Uses Hyperbrowser SDK `scrape.startAndWait()` to fetch HTML content and metadata +2. **Extract**: Calls `extract.startAndWait()` with AI to generate: + - Page title + - Comprehensive summary paragraph + - 5 key bullet points + - All outbound links (deduplicated and sorted) +3. **Compress**: Saves structured JSON output and calculates compression ratio +4. **Report**: Displays file location and compression statistics
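Condensed, that pipeline looks roughly like the sketch below. The real `kzip.ts` adds `commander`-based CLI parsing, URL and filename validation, and its own schema definition, which may differ from the plain JSON Schema used here:

```typescript
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { writeFileSync } from "fs";
import "dotenv/config";

const hb = new Hyperbrowser({ apiKey: process.env.HYPERBROWSER_API_KEY });

async function kzip(url: string, outFile: string) {
  // Step 1: scrape raw HTML (the baseline for the compression ratio)
  const scraped = await hb.scrape.startAndWait({
    url,
    scrapeOptions: { formats: ["html"] },
  });
  const rawSize = (scraped.data?.html ?? "").length;

  // Step 2: extract the semantic shard against a fixed schema
  const extracted = await hb.extract.startAndWait({
    urls: [url],
    prompt: "Return the page title, a one-paragraph summary, 5 key points, and all outbound links.",
    schema: {
      type: "object",
      properties: {
        title: { type: "string" },
        summary: { type: "string" },
        keyPoints: { type: "array", items: { type: "string" } },
        outboundLinks: { type: "array", items: { type: "string" } },
      },
      required: ["title", "summary", "keyPoints", "outboundLinks"],
    },
  });

  // Steps 3-4: write the shard and report the compression ratio
  const shard = {
    url,
    scrapedAt: new Date().toISOString(),
    ...(extracted.data as Record<string, unknown>),
    rawSize,
  };
  // compressedSize is approximated from the serialized shard itself
  const compressedSize = JSON.stringify(shard).length;
  writeFileSync(outFile, JSON.stringify({ ...shard, compressedSize }, null, 2));
  console.log(`${outFile}: ${rawSize} -> ${compressedSize} bytes`);
}

kzip("https://news.ycombinator.com", "news.ycombinator.com.kzip.json").catch(console.error);
```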
+ +## Output Format + +Each `.kzip.json` file contains: -### Output format (.kzip.json) ```json { "url": "https://example.com/", "scrapedAt": "2025-01-01T00:00:00.000Z", - "title": "...", - "summary": "...", - "keyPoints": ["..."], - "outboundLinks": ["https://..."], + "title": "Example Page Title", + "summary": "A comprehensive summary of the page content...", + "keyPoints": [ + "First key insight", + "Second key insight", + "Third key insight", + "Fourth key insight", + "Fifth key insight" + ], + "outboundLinks": [ + "https://example.com/link1", + "https://example.com/link2" + ], "rawSize": 12345, "compressedSize": 1111 } ``` -### Notes -- Uses only official Hyperbrowser SDK methods (`@hyperbrowser/sdk`). No mock data. -- Default output name is `<hostname>.kzip.json` if `--out` is omitted. +### Field Descriptions + +- `url`: Original URL that was scraped +- `scrapedAt`: ISO timestamp of when scraping occurred +- `title`: Extracted page title +- `summary`: AI-generated paragraph summarizing the content +- `keyPoints`: Array of 5 key insights from the page +- `outboundLinks`: Deduplicated array of all outbound URLs found on the page +- `rawSize`: Size of raw HTML in bytes +- `compressedSize`: Size of final JSON file in bytes + +## Code Structure + +**Main file**: `kzip.ts` + +Key components: +- CLI parser using `commander` for argument handling +- Hyperbrowser SDK integration for scraping and extraction +- JSON schema definition for structured AI extraction +- File system operations for output management + +**Dependencies**: +- `@hyperbrowser/sdk` - Official Hyperbrowser SDK for web scraping +- `commander` - CLI argument parsing +- `dotenv` - Environment variable management +- `typescript` - TypeScript support + +## Command Line Options + +```bash +kzip.ts <url> [options] + +Arguments: + url URL to scrape (required) + +Options: + --out <file> Output filename (must end with .kzip.json) + Default: <hostname>.kzip.json + -h, --help Display help information +``` + +## Error Handling + +The tool includes comprehensive error handling for: +- Missing API key validation +- Invalid URL format detection +- Failed scraping attempts +- Missing HTML content +- Invalid output filename extensions +- File system write errors + +## Performance + +- Typical compression ratios: 10-50x (depending on page content) +- Processing time: 5-15 seconds per URL (varies by page size and AI processing) +- Output file sizes: Usually 1-5 KB per page + +## Important Notes + +- Uses only official Hyperbrowser SDK methods - no mock data or workarounds +- Output filename must end with `.kzip.json` extension if using `--out` option +- Default filename is automatically generated from hostname if `--out` is omitted +- Links are automatically deduplicated and sorted alphabetically +- AI extraction uses structured schema for consistent output format + +--- -Follow @hyperbrowser for updates. +Follow [@hyperbrowser](https://twitter.com/hyperbrowser) for updates. diff --git a/Maps-lead-finder/README.md index 88e0825..1c74d19 100644 --- a/Maps-lead-finder/README.md +++ b/Maps-lead-finder/README.md @@ -1,67 +1,166 @@ # Maps Lead Finder -This project is a command-line tool that uses HyperAgent and OpenAI's GPT-4o to find business leads from Google Maps for a specified area and business type. It returns a list of businesses with their name, address, and contact information. 
+**Built with [Hyperbrowser](https://hyperbrowser.ai)** + +An intelligent CLI tool that automates business lead generation from Google Maps using HyperAgent AI. Simply specify a location and business type, and let AI navigate Google Maps to extract structured business information including names, addresses, and contact details. ## Features -- Interactive CLI for user input (area and business type) -- Uses HyperAgent and OpenAI LLM to automate Google Maps search -- Outputs a structured list of business leads +- 🤖 **AI-Powered Automation**: Uses HyperAgent with OpenAI GPT-4o for intelligent browser control +- 🗺️ **Google Maps Integration**: Automatically searches and extracts business data from Google Maps +- 💼 **Lead Extraction**: Returns structured business information (name, address, contact details) +- 🎯 **Interactive CLI**: Simple prompts for location and business type input +- ✅ **Schema Validation**: Uses Zod for type-safe output validation +- 📊 **Structured Output**: Returns clean, validated business data ready for CRM import -## Requirements +## Prerequisites - Node.js (v18 or higher recommended) -- An OpenAI API key +- API keys for: + - **Hyperbrowser**: Get yours at [hyperbrowser.ai](https://hyperbrowser.ai) + - **OpenAI**: Create at [platform.openai.com](https://platform.openai.com) -## Installation +## Quick Start -1. Clone this repository: +1. **Install dependencies:** ```bash - git clone - cd Maps-lead-finder + npm install ``` -2. Install dependencies: + +2. **Set up environment variables:** + + Create a `.env` file (use `.env.example` as template): + ```env + HYPERBROWSER_API_KEY=your_hyperbrowser_api_key + OPENAI_API_KEY=your_openai_api_key + ``` + +3. **Run the lead finder:** ```bash - npm install + npx tsx maps-lead-finder.ts ``` -## Environment Variables +## Usage -Create a `.env` file in the root directory with the following variable: +When you run the script, you'll be prompted for two inputs: +```bash +Enter the area to search (e.g. 'San Francisco, CA'): New York, NY +Enter the business type to find (e.g. 'restaurants'): coffee shops ``` -OPENAI_API_KEY=your_openai_api_key_here + +The AI agent will then: +1. Navigate to Google Maps +2. Search for the specified business type in the given location +3. Extract data from at least 5 businesses +4. Return structured results + +## Example Output + +```json +[ + { + "name": "Blue Bottle Coffee", + "address": "160 Berry St, Brooklyn, NY 11249", + "contact": "(510) 653-3394" + }, + { + "name": "La Colombe Coffee Roasters", + "address": "400 Lafayette St, New York, NY 10003", + "contact": "(212) 677-5834" + }, + // ... more businesses +] ``` -You can use the provided `.env.example` as a template. +## How It Works -## Usage +1. **User Input**: Interactive prompts collect location and business type +2. **AI Navigation**: HyperAgent controls a browser to search Google Maps +3. **Data Extraction**: AI identifies and extracts business information from search results +4. **Schema Validation**: Zod ensures output matches expected structure +5. 
**Results Display**: Validated business leads are displayed in the terminal -Run the script using: +## Code Structure -```bash -npx tsx maps-lead-finder.ts -``` +**Main file**: `maps-lead-finder.ts` -Or, if you have `ts-node` installed: +Key components: +- `getUserInput()` - Interactive CLI prompts using readline +- `schema` - Zod validation schema for business data structure +- `HyperAgent` - AI-powered browser automation agent +- `ChatOpenAI` - OpenAI GPT-4o model for intelligent task execution -```bash -npx ts-node maps-lead-finder.ts -``` +**Dependencies**: +- `@hyperbrowser/agent` - AI agent for browser automation +- `@langchain/openai` - OpenAI integration via LangChain +- `zod` - Runtime type validation and schema enforcement +- `dotenv` - Environment variable management +- `readline` - Node.js built-in for CLI input + +## Configuration -You will be prompted to enter the area and business type you want to search for. +### Modify Search Parameters -## Example +You can adjust the number of results or customize the search prompt: +```typescript +const result = await agent.executeTask( + `Navigate to google maps, find the businesses in the area of ${area} and ${businessType}. Return a list of at least 10 businesses with their name, address and contact information.`, // Change "5" to "10" + { + outputSchema: schema, + } +); ``` -Enter the area to search (e.g. 'San Francisco, CA'): New York, NY -Enter the business type to find (e.g. 'restaurants'): coffee shops + +### Customize Output Schema + +Extend the Zod schema to capture additional fields: + +```typescript +const schema = z.object({ + businesses: z.array( + z.object({ + name: z.string(), + address: z.string(), + contact: z.string(), + website: z.string().optional(), // Add website + rating: z.number().optional(), // Add rating + }), + ), +}); ``` -## Output +## Use Cases + +- 🎯 **Lead Generation**: Find potential business clients in specific industries +- 📍 **Market Research**: Discover competitor locations and contact information +- 📧 **Sales Prospecting**: Build targeted outreach lists for cold emails +- 🗺️ **Location Analysis**: Map business density in different areas +- 📊 **Database Building**: Populate CRM systems with local business data + +## Important Notes + +- The script waits for complete execution before returning results +- Progress is displayed step-by-step via the `onStep` callback +- Agent automatically closes after task completion +- Ensure stable internet connection for reliable Google Maps access +- Results depend on what's publicly available on Google Maps + +## Troubleshooting + +**API Key Issues:** +- Verify both API keys are set correctly in `.env` +- Check that your OpenAI account has available credits + +**No Results Returned:** +- Try more specific location (e.g., "Manhattan, New York" vs "New York") +- Use common business type terms (e.g., "restaurants" not "eateries") -A list of at least 5 businesses with their name, address, and contact information will be displayed. +**Agent Timeout:** +- Google Maps may be slow to load; be patient +- Check your internet connection -## License +--- -MIT +Follow [@hyperbrowser](https://twitter.com/hyperbrowser) for updates diff --git a/README.md b/README.md index 5e8771a..8ba3071 100644 --- a/README.md +++ b/README.md @@ -1,21 +1,164 @@ -# Examples Repository +# Hyperbrowser Examples -This directory contains examples of how to use the various examples with Hyperbrowser. 
+A comprehensive collection of example projects demonstrating various use cases for the [Hyperbrowser SDK](https://docs.hyperbrowser.ai). Each example is a standalone TypeScript project showcasing different patterns for web automation, scraping, and AI-powered workflows. -The examples contains the following: +## 🚀 Quick Start -- Article TTS -- Product Finder +All examples require a Hyperbrowser API key. Get yours at [Hyperbrowser](https://app.hyperbrowser.ai). -## Usage: +```bash +# Set your API key +export HYPERBROWSER_API_KEY="your-key-here" -All examples require a Hyperbrowser API key. You can get one by signing up at [Hyperbrowser](https://app.hyperbrowser.ai). +# Many examples also require OpenAI +export OPENAI_API_KEY="your-openai-key" -## Support +# Navigate to any example and run +cd <example-dir> +npm install +npx tsx <example-name>.ts +``` -If you have questions or need help with the examples: +## 📚 Examples by Category -- Check the documentation at [Hyperbrowser Docs](https://docs.hyperbrowser.ai) -- Open an issue in this repository -- Contact us at [info@hyperbrowser.ai](mailto:info@hyperbrowser.ai) -- Join our [Discord server](https://discord.gg/zsYzsgVRjh) +### 🤖 AI-Powered Agents +- **[agents](./agents)** - Base agent implementations and patterns +- **[agi-newsletter](./agi-newsletter)** - AI newsletter generator +- **[hb-intern-bot](./hb-intern-bot)** - Multi-source tech news aggregator with scoring +- **[research-bot](./research-bot)** - Automated research and data gathering +- **[vibe-posting-bot](./vibe-posting-bot)** - Social media content generation + +### 🔍 Web Scraping & Data Extraction +- **[llm-crawl](./llm-crawl)** - CLI tool combining crawling with LLM processing +- **[site2prompt](./site2prompt)** - Convert websites to prompt-ready formats +- **[site2rag](./site2rag)** - Website content for RAG applications +- **[meta-scraper](./meta-scraper)** - Extract metadata from web pages +- **[dataset-assmbler](./dataset-assmbler)** - Build datasets from web sources +- **[oss-web-extractor](./oss-web-extractor)** - Extract open-source project information +- **[Extract-github-analyzer](./Extract-github-analyzer)** - Analyze GitHub repositories + +### 🛒 E-commerce & Business +- **[product-search](./product-search)** - Product discovery and comparison +- **[real-estate-finder](./real-estate-finder)** - Real estate listing aggregation +- **[Maps-lead-finder](./Maps-lead-finder)** - Lead generation from Maps data +- **[company-researcher](./company-researcher)** - Company information gathering +- **[competitor-analyzer-bot](./competitor-analyzer-bot)** - Competitive intelligence + +### 📰 Content & Media +- **[article-tts](./article-tts)** - Article to text-to-speech conversion +- **[tweet-fetcher](./tweet-fetcher)** - Twitter/X content retrieval +- **[crypto-news-bot](./crypto-news-bot)** - Cryptocurrency news aggregation +- **[Trend-summary](./Trend-summary)** - Trending topics summarization +- **[resource-summary](./resource-summary)** - Web resource summarization + +### 🧪 Testing & Analysis +- **[CUA-CTA-Validator](./CUA-CTA-Validator)** - Call-to-action validation +- **[dark-pattern-finder](./dark-pattern-finder)** - Detect deceptive UI patterns +- **[scam-scanner-bot](./scam-scanner-bot)** - Identify potential scams +- **[SEO-Analyzer](./SEO-Analyzer)** - SEO analysis and recommendations +- **[hb-ui-bot](./hb-ui-bot)** - UI testing and validation + +### 🗂️ Data Management +- **[ragzip](./ragzip)** - RAG-optimized content packaging +- **[Internet-zip](./Internet-zip)** - Web content archival +- 
**[dataflow-tree](./dataflow-tree)** - Data flow visualization +- **[site-graph](./site-graph)** - Website structure mapping + +### 💬 Chat & Interaction +- **[chat-with](./chat-with)** - Interactive chat with web content +- **[ChatWithWebsite-Scrape](./ChatWithWebsite-Scrape)** - Conversational web scraping +- **[github-chatbot](./github-chatbot)** - GitHub repository chatbot + +### 🔧 Utilities & Tools +- **[deep-form](./deep-form)** - Complex form filling automation +- **[changelog-builder](./changelog-builder)** - Automated changelog generation +- **[hb-changelog-tracker](./hb-changelog-tracker)** - Track product changes +- **[hb-headers](./hb-headers)** - HTTP header analysis +- **[hb-predict](./hb-predict)** - Predictive analytics +- **[hyper-train](./hyper-train)** - Model training data collection +- **[link-sniper-bot](./link-sniper-bot)** - Link monitoring and capture +- **[down-detector-bot](./down-detector-bot)** - Service availability monitoring +- **[o3-pro-extractor](./o3-pro-extractor)** - Specialized data extraction + +### 📖 Cookbook +- **[cookbook](./cookbook)** - Collection of code recipes and patterns + +## 🏗️ Architecture Patterns + +### Basic Pattern +Most examples follow this structure: +```typescript +import { Hyperbrowser } from '@hyperbrowser/sdk'; + +const hbClient = new Hyperbrowser({ + apiKey: process.env.HYPERBROWSER_API_KEY +}); + +// Your automation logic here +``` + +### Complex Examples +Some examples demonstrate advanced patterns: +- **Pipeline architectures** (hb-intern-bot): scrape → normalize → score → summarize +- **CLI tools** (llm-crawl): argument parsing, multiple output formats +- **Watch modes** (hb-intern-bot): continuous monitoring +- **Multi-source aggregation**: combining data from various APIs + +## 📦 Common Dependencies + +- `@hyperbrowser/sdk` - Core browser automation +- `@hyperbrowser/agent` - Agent-based automation +- `openai` - OpenAI API integration +- `@langchain/openai` - LangChain integration +- `dotenv` - Environment management +- `typescript` + `tsx`/`ts-node` - TypeScript execution +- `zod` - Schema validation +- `cheerio` - HTML parsing + +## 🔑 Environment Variables + +Required for all examples: +```bash +HYPERBROWSER_API_KEY=your-hyperbrowser-key +``` + +Many examples also need: +```bash +OPENAI_API_KEY=your-openai-key +``` + +Store these in a `.env` file (never commit this file). + +## 📝 Development + +Each example is independent with its own: +- `package.json` - Dependencies and scripts +- `README.md` - Specific usage instructions +- Entry point (usually `*.ts` file) + +Run examples with: +```bash +npm run dev +# or +npx tsx <entry-file>.ts +``` + +Build (if applicable): +```bash +npm run build +``` + +## 🤝 Support + +- **Documentation**: [docs.hyperbrowser.ai](https://docs.hyperbrowser.ai) +- **Discord**: [Join our community](https://discord.gg/zsYzsgVRjh) +- **Email**: [info@hyperbrowser.ai](mailto:info@hyperbrowser.ai) +- **Issues**: Open an issue in this repository + +## 📄 License + +See [LICENSE](./LICENSE) file for details. + +--- + +**Note**: These examples demonstrate patterns and concepts. For production use, add proper error handling, logging, rate limiting, and security measures. 
\ No newline at end of file diff --git a/SEO-Analyzer/README.md b/SEO-Analyzer/README.md index e98cfb1..f078d56 100644 --- a/SEO-Analyzer/README.md +++ b/SEO-Analyzer/README.md @@ -1,179 +1,272 @@ -# SEO Analyzer Tool 🔍 +# SEO Analyzer -A powerful SEO analysis tool that uses AI to analyze websites and provide actionable recommendations for improving search engine optimization. +**Built with [Hyperbrowser](https://hyperbrowser.ai)** -## Features ✨ +An intelligent SEO analysis tool that uses AI-powered web extraction to analyze websites and provide actionable recommendations for improving search engine optimization. -- **Comprehensive SEO Analysis**: Analyzes title tags, meta descriptions, headings, content, images, links, and technical SEO factors -- **AI-Powered Insights**: Uses OpenAI's GPT-4 to provide intelligent recommendations -- **Severity Classification**: Issues are categorized as Critical, High, Medium, or Low priority -- **Quick Wins**: Identifies easy improvements for immediate impact -- **Detailed Reports**: Generates JSON reports with timestamps for tracking improvements -- **Web Scraping**: Uses Hyperbrowser to extract webpage content and structure +## Features -## Prerequisites 📋 +- 🔍 **Comprehensive SEO Analysis**: Analyzes title tags, meta descriptions, headings, content, images, links, and technical SEO factors +- 🤖 **AI-Powered Insights**: Uses Hyperbrowser's advanced extraction with structured schema validation +- 📊 **Severity Classification**: Issues categorized as Critical, High, Medium, or Low priority +- ⚡ **Quick Wins**: Identifies easy improvements for immediate impact +- 🎨 **Interactive CLI**: Beautiful command-line interface with colored output +- 🔐 **Type-Safe**: Built with TypeScript and Zod schema validation -1. **Node.js** (v16 or higher) -2. **API Keys**: - - [Hyperbrowser API Key](https://hyprbrowser.ai) - for web scraping - - [OpenAI API Key](https://platform.openai.com) - for AI analysis +## What It Does -## Setup 🛠️ +This tool allows you to perform comprehensive SEO audits on any website by: + +1. Taking a website URL as input via interactive prompt +2. Extracting and analyzing page content, structure, and metadata +3. Using AI to identify SEO issues and opportunities +4. Presenting findings in a clear, organized format with severity levels + +Perfect for: + +- SEO audits and optimization +- Website performance monitoring +- Content optimization +- Technical SEO analysis +- Competitive SEO research + +## Get an API Key + +Get your Hyperbrowser API key at **[https://hyperbrowser.ai](https://hyperbrowser.ai)** + +## Quick Start -1. **Install dependencies**: ```bash +# Install dependencies npm install + +# Set up environment variables +export HYPERBROWSER_API_KEY="your_api_key_here" + +# Run the analyzer +npx tsx SEO-Analyzer.ts ``` -2. 
**Environment Setup**: +## Prerequisites + +- Node.js (v16 or higher) +- npm or yarn +- Hyperbrowser API key + +## Environment Variables + Create a `.env` file in the project root: -```env -HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here -OPENAI_API_KEY=your_openai_api_key_here + +```bash +HYPERBROWSER_API_KEY=your_api_key_here ``` -## Usage 🚀 +## Usage -### Command Line Usage +### Interactive Mode -```bash -# Analyze any website -npx tsx SEO-Analyzer.ts https://example.com +The tool uses an interactive CLI to prompt for the website URL: -# Analyze your own website -npx tsx SEO-Analyzer.ts https://yourwebsite.com +```bash +npx tsx SEO-Analyzer.ts ``` -### Example Output +**Example Session:** ``` -🔍 Starting SEO analysis for: https://example.com -📄 Scraping webpage content... -🤖 Analyzing SEO with AI... -✅ SEO analysis completed! +Enter the website URL to analyze: example.com +Analyzing SEO for: https://example.com... -================================================================================ -🎯 SEO ANALYSIS REPORT FOR: https://example.com -================================================================================ +============================================================ +SEO ANALYSIS RESULTS FOR: https://example.com +============================================================ -📊 OVERALL SEO SCORE: 75/100 +OVERALL SEO SCORE: 75/100 -📋 SUMMARY: -The website has a solid foundation but needs improvements in meta descriptions and image optimization... +SUMMARY: +The website has a solid foundation but needs improvements in +meta descriptions and image optimization... -🚨 CRITICAL ISSUES (1): +ISSUES FOUND (5): -1. TITLE: Missing or duplicate title tags detected - 💡 Recommendation: Add unique, descriptive title tags (50-60 characters) - 🎯 Priority: 9/10 +1. CRITICAL - TITLE + Issue: Missing or duplicate title tags detected + Recommendation: Add unique, descriptive title tags (50-60 characters) -⚠️ HIGH ISSUES (2): +2. HIGH - META_DESCRIPTION + Issue: Meta description is too short + Recommendation: Expand meta description to 150-160 characters -1. META_DESCRIPTION: Meta description is too short - 💡 Recommendation: Expand meta description to 150-160 characters with compelling copy - 🎯 Priority: 8/10 +3. MEDIUM - IMAGES + Issue: 5 images missing alt text + Recommendation: Add descriptive alt text to all images -✅ STRENGTHS: +STRENGTHS: 1. Strong heading structure with proper H1-H6 hierarchy 2. Fast loading times and good technical performance -⚡ QUICK WINS: +QUICK WINS: 1. Add alt text to 5 images missing descriptions 2. 
Optimize meta description length for better click-through rates +============================================================ ``` -## SEO Analysis Categories 📊 +### Alternative Commands + +```bash +# Development mode +npm run dev + +# Using ts-node +npm start + +# Direct execution +npx tsx SEO-Analyzer.ts +``` + +## SEO Analysis Categories The tool analyzes these key SEO factors: | Category | What it Checks | |----------|----------------| -| **Title Tags** | Length, uniqueness, keyword inclusion | -| **Meta Descriptions** | Length, compelling copy, keyword usage | +| **Title Tags** | Length (50-60 chars), uniqueness, keyword inclusion | +| **Meta Descriptions** | Length (150-160 chars), compelling copy, keywords | | **Headings** | H1-H6 hierarchy, keyword usage, structure | | **Content** | Quality, length, keyword density, readability | | **Images** | Alt text, file names, optimization | | **Links** | Internal/external linking, anchor text | -| **Technical** | Page structure, schema markup, accessibility | +| **Technical SEO** | Page structure, schema markup, accessibility | + +## Issue Severity Levels -## Issue Severity Levels 🚨 +- **Critical**: Severely impacts SEO (missing titles, broken structure) +- **High**: Important issues to fix soon (poor meta descriptions, missing H1) +- **Medium**: Optimization opportunities (image alt text, internal linking) +- **Low**: Minor improvements (keyword density, content length) -- **🚨 Critical**: Severely impacts SEO (missing titles, broken structure) -- **⚠️ High**: Important issues to fix soon (poor meta descriptions, missing H1) -- **🔶 Medium**: Optimization opportunities (image alt text, internal linking) -- **ℹ️ Low**: Minor improvements (keyword density, content length) +## Output Structure -## Output Files 📁 +The tool provides structured output including: -Analysis results are saved as JSON files: -- **Filename**: `seo-analysis-{domain}-{timestamp}.json` -- **Content**: Complete analysis data including URL, timestamp, and all findings -- **Use Case**: Track improvements over time, share with team members +- **Overall SEO Score**: Numerical score from 0-100 +- **Summary**: High-level assessment of the website's SEO status +- **Issues**: Categorized list of problems with severity, description, and recommendations +- **Strengths**: What the website is doing well +- **Quick Wins**: Easy improvements for immediate impact -## Integration Options 🔧 +## API Reference -### Use as a Module +Uses **Hyperbrowser's official API methods**: ```typescript -import { analyzeSEO } from './SEO-Analyzer'; +import { Hyperbrowser } from "@hyperbrowser/sdk"; +import { z } from "zod"; + +// Initialize Hyperbrowser client +const client = new Hyperbrowser({ + apiKey: process.env.HYPERBROWSER_API_KEY, +}); + +// Define schema for structured SEO extraction +const schema = z.object({ + url: z.string(), + overallScore: z.number().min(0).max(100), + summary: z.string(), + issues: z.array(z.object({ + type: z.string(), + severity: z.enum(["critical", "high", "medium", "low"]), + issue: z.string(), + recommendation: z.string(), + })), + strengths: z.array(z.string()), + quickWins: z.array(z.string()), +}); + +// Extract structured SEO data +const result = await client.extract.startAndWait({ + urls: [url], + prompt: `Perform a comprehensive SEO analysis...`, + schema: schema, +}); +``` + +## Dependencies + +- **@hyperbrowser/sdk**: Hyperbrowser SDK for AI-powered web extraction and SEO analysis +- **dotenv**: Environment variable management +- **zod**: Schema validation for 
structured data extraction +- **readline**: Interactive command-line interface +- **TypeScript**: Type-safe development + +## Development -const analysis = await analyzeSEO('https://example.com'); -if (analysis) { - console.log(`SEO Score: ${analysis.overall_score}/100`); - console.log(`Issues found: ${analysis.issues.length}`); -} +### Project Structure + +``` +SEO-Analyzer/ +├── SEO-Analyzer.ts # Main application file +├── package.json # Project dependencies and scripts +├── tsconfig.json # TypeScript configuration +├── .env # Environment variables (create this) +└── README.md # This file ``` -### Web Interface Integration +### Architecture + +Single-file TypeScript implementation (~135 LOC): -The tool can be easily integrated into: -- Next.js applications -- Express.js APIs -- React dashboards -- Chrome extensions +- **Hyperbrowser SDK**: AI-powered web extraction via `extract.startAndWait()` +- **Zod Schema Validation**: Type-safe structured SEO data extraction +- **Interactive CLI**: readline-based user input with colored output +- **Smart URL Handling**: Automatically adds `https://` protocol if missing +- **Error Handling**: Graceful error handling with informative messages -## Troubleshooting 🐛 +## Troubleshooting -**Common Issues:** +### Common Issues -1. **"API Key not found"** - - Ensure `.env` file exists with correct API keys - - Check that environment variables are properly loaded +1. **API Key Error**: Make sure your Hyperbrowser API key is correctly set in the `.env` file or environment variables +2. **Network Issues**: Ensure you have a stable internet connection +3. **TypeScript Errors**: Run `npm install` to ensure all dependencies are installed +4. **Scraping Failed**: Website might be blocking automated access or require authentication +5. **No Content Found**: Site might be heavily JavaScript-based or have dynamic content loading -2. **"Scrape failed"** - - Website might be blocking scrapers - - Check if URL is accessible and valid - - Some sites require authentication +### Getting Help -3. 
**"No content found"** - - Website might be heavily JavaScript-based - - Content might be dynamically loaded - - Try a different URL or page +- Check the [Hyperbrowser documentation](https://docs.hyperbrowser.ai) for API-related issues +- Ensure your API key has sufficient credits +- Verify that the URL is accessible and valid +- Join the [Discord community](https://discord.gg/zsYzsgVRjh) for support -## Examples 💡 +## Use Cases -**Good for analyzing:** -- Blog posts and articles -- Product pages -- Landing pages -- Company websites -- E-commerce pages +**SEO Professionals:** +- Perform comprehensive website audits +- Track SEO improvements over time +- Identify technical SEO issues -**Best practices:** -- Analyze multiple pages from your site -- Run analysis before and after SEO changes -- Focus on critical and high-priority issues first -- Monitor scores over time +**Web Developers:** +- Validate SEO implementation +- Catch SEO issues during development +- Optimize page structure and metadata -## API Costs 💰 +**Content Creators:** +- Optimize blog posts and articles +- Improve meta descriptions and titles +- Enhance content discoverability -- **Hyperbrowser**: ~$0.01-0.05 per analysis (depending on page size) -- **OpenAI GPT-4**: ~$0.03-0.10 per analysis (depending on content length) +**Marketing Teams:** +- Monitor website SEO health +- Competitive SEO analysis +- Content optimization strategy -## Contributing 🤝 +## Learn More -Feel free to submit issues, feature requests, or pull requests to improve the tool! +- **Hyperbrowser Documentation:** [https://docs.hyperbrowser.ai](https://docs.hyperbrowser.ai) +- **Hyperbrowser Discord:** [https://discord.gg/zsYzsgVRjh](https://discord.gg/zsYzsgVRjh) +- **Support:** info@hyperbrowser.ai -## License 📝 +--- -MIT License - feel free to use this tool for your SEO analysis needs! +Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates. diff --git a/Trend-summary/README.md b/Trend-summary/README.md index f20da92..67dd4c9 100644 --- a/Trend-summary/README.md +++ b/Trend-summary/README.md @@ -1,62 +1,144 @@ +**Built with [Hyperbrowser](https://hyperbrowser.ai)** + # Trend Summary Tool -A tool that aggregates trending topics from Hacker News and finds related discussions on Reddit to provide comprehensive trend summaries. +Automated trend analysis tool that discovers the top post from Hacker News and finds related discussions on Reddit, providing comprehensive cross-platform summaries of trending topics. + +## Features + +- Multi-page browser automation using HyperAgent +- Parallel page execution for efficient data gathering +- AI-powered content extraction from Hacker News +- Automated Reddit search for related discussions +- Intelligent summarization of community conversations -## What it does +## What It Does -This tool: +This tool demonstrates advanced multi-page browser automation: -1. Opens Hacker News and finds the top post published today -2. Searches Reddit for discussions related to that Hacker News post -3. Identifies recent conversations and provides a summary of the discussions +1. **Discovers Trending Content** - Opens Hacker News and extracts the top post published today (title, URL, and key information) +2. **Cross-Platform Search** - Simultaneously opens a second browser page to search Reddit for related discussions +3. **Community Analysis** - Finds top posts and comments about the HN topic on Reddit +4. 
**Intelligent Summarization** - Provides an overall summary of recent conversations and community sentiment -## Requirements +## Prerequisites - Node.js (v16 or higher recommended) -- A HyperBrowser API key (get your free key at [hyperbrowser.ai](https://hyperbrowser.ai)) -- An OpenAI API key +- API keys for: + - **Hyperbrowser**: Get yours at [hyperbrowser.ai](https://hyperbrowser.ai) + - **OpenAI**: Required by HyperAgent for AI-powered browser automation -## Setup +## Quick Start -1. Clone this repository +1. **Install dependencies:** + ```bash + npm install + ``` -```bash -git clone -cd Multipage-tool -``` +2. **Set up environment variables:** + Create a `.env` file: + ```env + HYPERBROWSER_API_KEY=your_hyperbrowser_api_key + OPENAI_API_KEY=your_openai_api_key + ``` + +3. **Run the tool:** + ```bash + npx ts-node TrendSummary.ts + ``` -2. Install dependencies +## Example Output -```bash -npm install ``` +===Starting HyperAgent=== -3. Create a `.env` file in the root directory with your API keys: +Opening first page... +Executing first task... +First destination found: [Title of top HN post] - [URL] - [Key details] +Opening second page... +Searching for information about [HN post title]... on Reddit +=== Summary of Reddit discussions === +[AI-generated summary of Reddit conversations, top comments, and community sentiment] + +Closing agent... +Agent closed successfully. ``` -HYPERBROWSER_API_KEY=your_hyperbrowser_key_here -OPENAI_API_KEY=your_openai_key_here -``` -## Usage +## How It Works + +### Multi-Page Architecture + +The tool uses HyperAgent's multi-page capabilities to run parallel browser sessions: + +```typescript +// Create agent with multiple page support +const agent = new HyperAgent(); -Run the script with: +// First page: Hacker News extraction +const page1 = await agent.newPage(); +const page1Response = await page1.ai( + "Open Hacker News front page, find the top post..." 
+); -```bash -npx ts-node TrendSummary.ts +// Second page: Reddit search (runs independently) +const page2 = await agent.newPage(); +const page2Response = await page2.ai( + `Search Reddit for: ${page1Response.output}...` +); ``` -## How it works +### AI-Powered Navigation + +HyperAgent handles complex browser interactions automatically: +- Navigating to websites +- Identifying relevant content +- Extracting structured information +- Following search workflows +- Summarizing findings + +## Code Structure + +**Main file**: `TrendSummary.ts` + +**Key components:** +- HyperAgent initialization with multi-page support +- Page 1: HN scraping task with AI-driven content extraction +- Page 2: Reddit search and discussion analysis +- Error handling with graceful cleanup + +**Dependencies:** +- `@hyperbrowser/agent` (v0.3.1) - Multi-page AI browser automation +- `@hyperbrowser/sdk` (v0.48.1) - Hyperbrowser SDK for session management +- `@langchain/openai` (v0.5.10) - LangChain integration for AI capabilities +- `dotenv` (v16.5.0) - Environment variable management +- `zod` - Schema validation + +## Use Cases + +- **Trend Research**: Quickly understand what's trending and why +- **Community Sentiment**: Gauge Reddit's reaction to HN topics +- **Content Discovery**: Find discussions across multiple platforms +- **Market Research**: Track emerging technologies and product launches +- **Competitive Intelligence**: Monitor competitor mentions and community feedback + +## Important Notes + +- Sessions are automatically cleaned up using `try-finally` blocks +- Multiple pages can run independently without blocking each other +- HyperAgent requires OpenAI API key for AI-powered browser automation +- The tool focuses on posts published "today" for freshness +- Error handling ensures proper agent cleanup even if tasks fail -The tool uses HyperBrowser's AI agent capabilities to: +## Extending the Tool -1. Navigate to Hacker News and identify the top trending post -2. Open a second browser page to search Reddit for related discussions -3. Analyze and summarize the discussions to provide insights +You can easily modify this tool to: +- Search additional platforms (Twitter, LinkedIn, etc.) +- Track specific keywords or topics +- Run on a schedule for continuous monitoring +- Export results to different formats (JSON, CSV, etc.) +- Integrate with notification services (Slack, email, etc.) -## Dependencies +--- -- `@hyperbrowser/agent` (v0.3.1) & `@hyperbrowser/sdk` (v0.48.1): For browser automation with AI -- `@langchain/openai` (v0.5.10): For AI-powered content analysis -- `dotenv` (v16.5.0): For environment variable management -- `zod`: For schema validation +Follow [@hyperbrowser](https://twitter.com/hyperbrowser) for updates diff --git a/agents/README.md b/agents/README.md new file mode 100644 index 0000000..6481ff8 --- /dev/null +++ b/agents/README.md @@ -0,0 +1,93 @@ +# Agents + +**Built with [Hyperbrowser](https://hyperbrowser.ai)** + +A collection of AI-powered agent examples demonstrating how to combine Hyperbrowser's browser automation capabilities with Large Language Models to create intelligent web automation workflows. + +## Overview + +These examples showcase agent-based patterns where LLMs guide browser automation, analyze web content, and make intelligent decisions. Each agent demonstrates a different use case for combining web automation with AI reasoning. 
+ +## Examples + +### Budget Travel Agent + +An intelligent travel planning agent that automates searches on Google Travel Explore and uses OpenAI's Vision API to analyze and extract structured travel data. + +**Features:** +- Automated Google Travel search with configurable parameters +- Visual analysis using OpenAI GPT-4 Vision +- Structured data extraction (destinations, prices, dates, travel times) +- Interactive Streamlit UI with pagination +- Location validation using geocoding +- Direct links to Google Flights for booking + +**[View Example →](./budget-travel-agent/)** + +## Common Architecture Patterns + +### Agent-Based Automation +All examples in this directory follow an agent pattern: + +1. **Task Planning**: LLM receives user intent and plans actions +2. **Browser Automation**: Hyperbrowser executes the planned actions +3. **Data Extraction**: Content is captured (screenshots, HTML, etc.) +4. **Intelligent Analysis**: LLM analyzes the extracted data +5. **Structured Output**: Results are formatted and presented to the user + +### Key Technologies + +- **Hyperbrowser SDK**: Browser automation and session management +- **OpenAI API**: LLM reasoning and vision capabilities +- **Playwright**: Browser control via CDP protocol +- **Pydantic**: Data validation and structured outputs +- **Streamlit**: Interactive web interfaces (where applicable) + +## Getting Started + +Each agent example has its own directory with: +- Dedicated README with specific instructions +- Independent dependencies and configuration +- Example code demonstrating the pattern + +Navigate to individual agent directories for detailed setup and usage instructions. + +## Prerequisites + +Most agents require: +- **Hyperbrowser API Key**: Get one at [hyperbrowser.ai](https://hyperbrowser.ai) +- **OpenAI API Key**: Get one at [platform.openai.com](https://platform.openai.com) +- **Python 3.8+** or **Node.js 18+** (depending on the example) + +## Environment Variables + +Common environment variables across agents: + +```bash +HYPERBROWSER_API_KEY="your_hyperbrowser_key" +OPENAI_API_KEY="your_openai_key" +``` + +Store these in a `.env` file in the specific agent directory. + +## Use Cases + +Agent-based automation is ideal for: + +- **Research Automation**: Gather and analyze information from multiple sources +- **Price Monitoring**: Track and compare prices with intelligent filtering +- **Content Curation**: Find and summarize relevant content based on criteria +- **Form Automation**: Fill out complex forms with AI-guided decisions +- **Travel Planning**: Search, compare, and analyze travel options +- **Market Intelligence**: Competitive analysis with reasoning and insights + +## Resources + +- **Documentation**: [docs.hyperbrowser.ai](https://docs.hyperbrowser.ai) +- **Discord Community**: [discord.gg/zsYzsgVRjh](https://discord.gg/zsYzsgVRjh) +- **Support**: info@hyperbrowser.ai +- **Twitter**: [@hyperbrowser](https://x.com/hyperbrowser) + +## Contributing + +Each agent is designed as a standalone example. Feel free to use these as templates for your own agent-based automation workflows. 
\ No newline at end of file diff --git a/agi-newsletter/README.md b/agi-newsletter/README.md index bfcf0cc..207fc8f 100644 --- a/agi-newsletter/README.md +++ b/agi-newsletter/README.md @@ -1,55 +1,48 @@ -# AI Newsletter Automation 🤖📧 +**Built with [Hyperbrowser](https://hyperbrowser.ai)** -An automated newsletter system that scrapes the latest AI news from top sources, generates personalized content using OpenAI, and sends customized newsletters to subscribers via email. +# AI Newsletter Automation -## 🌟 Features +Automated newsletter system that scrapes the latest AI news from top sources (Anthropic, OpenAI, DeepMind, Hacker News), generates personalized content using OpenAI, and delivers customized newsletters to subscribers via email. -- **Multi-source Scraping**: Automatically scrapes content from major AI news sources -- **AI-Powered Content Generation**: Uses OpenAI GPT-4 to create engaging newsletter content -- **Personalized Delivery**: Sends customized newsletters to each subscriber -- **Email Integration**: Reliable email delivery through Resend API -- **TypeScript**: Full type safety and modern development experience +## Features -## 📰 News Sources +- Multi-source scraping from major AI news sources +- AI-powered content generation with OpenAI GPT-4 +- Personalized delivery with recipient names +- Email integration through Resend API +- TypeScript with full type safety + +## News Sources -The system automatically scrapes content from: - [Anthropic News](https://www.anthropic.com/news) - [OpenAI Blog](https://openai.com/blog) - [DeepMind Blog](https://deepmind.com/blog) - [Hacker News Front Page](https://news.ycombinator.com/front) -## 🚀 Quick Start - -### Prerequisites +## Prerequisites - Node.js (v18 or later) -- npm or yarn package manager - API keys for: - - Hyperbrowser SDK - - OpenAI - - Resend - -### Installation + - **Hyperbrowser**: Get yours at [hyperbrowser.ai](https://hyperbrowser.ai) + - **OpenAI**: Create at [platform.openai.com](https://platform.openai.com) + - **Resend**: Sign up at [resend.com](https://resend.com) (domain verification required) -1. **Clone and navigate to the project:** - ```bash - cd agi-newsletter - ``` +## Quick Start -2. **Install dependencies:** +1. **Install dependencies:** ```bash npm install ``` -3. **Set up environment variables:** - Create a `.env` file in the project root: +2. **Set up environment variables:** + Create a `.env` file: ```env HYPERBROWSER_API_KEY=your_hyperbrowser_api_key OPENAI_API_KEY=your_openai_api_key RESEND_API_KEY=your_resend_api_key ``` -4. **Configure subscribers:** +3. **Configure subscribers:** Edit the `users` array in `trend-newsletter.ts`: ```typescript const users = [ @@ -58,16 +51,16 @@ The system automatically scrapes content from: ]; ``` -5. **Run the newsletter:** +4. 
**Run the newsletter:** ```bash npx ts-node trend-newsletter.ts ``` -## 🔧 Configuration +## Configuration -### Adding New News Sources +### Adding News Sources -To add new sources, modify the `urls` array: +Modify the `urls` array in `trend-newsletter.ts`: ```typescript const urls = [ @@ -79,15 +72,15 @@ const urls = [ ### Customizing Newsletter Content -Modify the `SYSTEM_PROMPT` to change the newsletter style: +Adjust the `SYSTEM_PROMPT` to change style and tone: ```typescript const SYSTEM_PROMPT = `Your custom instructions for the AI...`; ``` -### Email Customization +### Email Settings -Update the email configuration in the Resend section: +Update email configuration (requires verified Resend domain): ```typescript const emailResponse = await resend.emails.send({ @@ -98,33 +91,15 @@ const emailResponse = await resend.emails.send({ }); ``` -## 📋 How It Works - -1. **Web Scraping**: The system uses Hyperbrowser SDK to scrape markdown content from configured news sources -2. **Content Processing**: All scraped content is combined and processed -3. **AI Generation**: OpenAI GPT-4 generates a friendly, engaging newsletter from the scraped content -4. **Personalization**: Each newsletter is personalized with the recipient's name -5. **Email Delivery**: Newsletters are sent via Resend to all configured subscribers - -## 🔑 API Keys Setup - -### Hyperbrowser SDK -1. Visit [Hyperbrowser](https://hyperbrowser.ai) -2. Sign up and get your API key -3. Add to `.env` as `HYPERBROWSER_API_KEY` - -### OpenAI -1. Visit [OpenAI Platform](https://platform.openai.com) -2. Create an API key -3. Add to `.env` as `OPENAI_API_KEY` +## How It Works -### Resend -1. Visit [Resend](https://resend.com) -2. Sign up and verify your domain -3. Create an API key -4. Add to `.env` as `RESEND_API_KEY` +1. **Scrape**: Uses Hyperbrowser SDK to scrape markdown content from configured news sources +2. **Process**: Combines all scraped content into a single markdown document +3. **Generate**: OpenAI GPT-4 creates a friendly, engaging newsletter from the content +4. **Personalize**: Each newsletter is customized with the recipient's name +5. **Deliver**: Sends personalized newsletters via Resend to all subscribers -## 📊 Output Example +## Example Output ``` 🔍 Starting to scrape pages... @@ -137,47 +112,30 @@ const emailResponse = await resend.emails.send({ 📨 All newsletters generated and sent successfully! 
``` -## 🛠️ Development +## Code Structure -### Scripts +**Main file**: `trend-newsletter.ts` -```bash -# Run the newsletter -npx ts-node trend-newsletter.ts +Key components: +- `urls` - Array of news sources to scrape +- `users` - Subscriber list with names and emails +- `SYSTEM_PROMPT` - Instructions for OpenAI newsletter generation +- `main()` - Orchestrates scraping, generation, and delivery -# Install dependencies -npm install - -# Type checking -npx tsc --noEmit -``` - -### Dependencies - -- `@hyperbrowser/sdk` - Web scraping +**Dependencies**: +- `@hyperbrowser/sdk` - Web scraping via official SDK - `openai` - AI content generation -- `resend` - Email delivery +- `resend` - Email delivery service - `zod` - Schema validation -- `dotenv` - Environment variables - -## 🚨 Important Notes - -- **Rate Limits**: Be mindful of API rate limits for all services -- **Email Verification**: Ensure your Resend domain is verified before sending -- **Content Quality**: Monitor generated content for accuracy and tone -- **Error Handling**: The system includes robust error handling for failed scrapes - -## 📝 License - -MIT License - Feel free to modify and distribute as needed. +- `dotenv` - Environment variable management -## 🤝 Contributing +## Important Notes -1. Fork the repository -2. Create a feature branch -3. Make your changes -4. Submit a pull request +- Be mindful of API rate limits for all services +- Resend domain must be verified before sending emails +- Monitor generated content for accuracy and appropriate tone +- Includes error handling for failed scrapes and email delivery --- -**Happy Newslettering!** 🎉 +Follow [@hyperbrowser](https://twitter.com/hyperbrowser) for updates diff --git a/article-tts/README.md b/article-tts/README.md index 82f7647..cf9c130 100644 --- a/article-tts/README.md +++ b/article-tts/README.md @@ -1,66 +1,160 @@ -# Article TTS +# WebWhisper - Web to Voice Converter -A Python tool for converting articles to speech using text-to-speech technology. This tool helps users transform written content into audio format for better accessibility and convenience. +**Built with [Hyperbrowser](https://hyperbrowser.ai) and [ElevenLabs](https://elevenlabs.io)** + +A Streamlit web application that extracts article content from any URL and converts it into natural-sounding speech. Perfect for accessibility, productivity, and consuming content on the go. ## Features -- Convert articles from various sources (text files, URLs, etc.) to speech -- Support for multiple TTS engines -- Customizable voice options and speech parameters -- Easy-to-use command line interface -- Save output as MP3, WAV, or other audio formats +- Web Content Extraction: Uses Hyperbrowser SDK to extract clean article content from any URL +- AI-Powered Speech: Converts text to natural-sounding audio using ElevenLabs TTS +- Multiple Voices: Choose from 9 different voice options (Rachel, Domi, Bella, Antoni, and more) +- Multilingual Support: Select between monolingual and multilingual models +- Interactive UI: Clean Streamlit interface with real-time extraction and conversion +- Audio Download: Save generated audio files as MP3 -## Installation +## Prerequisites -Requires Python 3.13 or higher. 
+- Python 3.13 or higher +- API keys for: + - Hyperbrowser SDK (get at [hyperbrowser.ai](https://hyperbrowser.ai)) + - ElevenLabs (get at [elevenlabs.io](https://elevenlabs.io)) -```bash -# Clone the repository -git clone https://github.com/yourusername/article-tts.git -cd article-tts +## Installation -# Create a virtual environment (optional but recommended) -python -m venv venv -source venv/bin/activate # On Windows: venv\Scripts\activate +1. **Navigate to the project directory:** + ```bash + cd article-tts + ``` -# Install dependencies -pip install -r requirements.txt -``` +2. **Install dependencies:** + ```bash + pip install -r requirements.txt + ``` + +3. **Set up environment variables:** + Create a `.env` file in the project root: + ```env + HYPERBROWSER_API_KEY=your_hyperbrowser_api_key + ELEVENLABS_API_KEY=your_elevenlabs_api_key + ``` ## Usage -### Basic Usage +### Running the Application ```bash -python article_tts.py --input article.txt --output speech.mp3 +streamlit run app.py ``` -### Advanced Options +The application will open in your default web browser at `http://localhost:8501`. + +### Using the Interface + +1. **Configure API Keys**: Enter your Hyperbrowser and ElevenLabs API keys in the sidebar (or use environment variables) +2. **Enter URL**: Paste the URL of the article you want to convert +3. **Select Voice Options** (optional): + - Choose your preferred voice (e.g., Rachel, Josh, Bella) + - Select TTS model (monolingual or multilingual) +4. **Extract Content**: Click "Extract" to pull article content from the URL +5. **Convert to Speech**: Click "Convert to speech" to generate audio +6. **Listen or Download**: Play the audio in-browser or download the MP3 file + +### Example Workflow ```bash -python article_tts.py --input https://example.com/article --output speech.mp3 --voice female --rate 175 --format mp3 +# Start the application +streamlit run app.py + +# In the browser: +# 1. Enter URL: https://example.com/interesting-article +# 2. Click "Extract" +# 3. Review extracted content +# 4. Choose voice: "Rachel" +# 5. Click "Convert to speech" +# 6. Download the generated audio ``` +## How It Works + +1. **Content Extraction**: Hyperbrowser SDK extracts structured article data (title, author, abstract, full content) using AI-powered extraction with a defined schema +2. **Stealth Mode**: Uses stealth settings and cookie acceptance to reliably access content +3. **Speech Generation**: ElevenLabs API converts the extracted text into natural-sounding speech with customizable voices +4. **Audio Delivery**: Audio is streamed back to the browser for immediate playback or download + +## Configuration + +### Extraction Schema + +The app extracts the following fields from web pages: +- `title`: Article title +- `author`: Article author (optional) +- `abstract`: Article summary (optional) +- `fullContent`: Full article text + +### Voice Options + +Available voices: +- Rachel, Domi, Bella (female voices) +- Antoni, Josh, Arnold, Adam, Sam (male voices) + +### TTS Models + +- `eleven_monolingual_v1`: Optimized for English +- `eleven_multilingual_v1`: Supports multiple languages + ## Dependencies -- Python 3.13+ -- [Required dependencies will be listed in requirements.txt] +- `streamlit==1.42.2` - Web interface framework +- `hyperbrowser==0.30.0` - Web scraping and extraction +- `elevenlabs==1.52.0` - Text-to-speech conversion +- `python-dotenv==1.0.1` - Environment variable management + +## API Credits -## License +ElevenLabs API usage is tracked and displayed in the sidebar. 
The app shows remaining character credits for your account. -This project is licensed under the MIT License - see the LICENSE file for details. +## Error Handling -## Contributing +The application includes robust error handling for: +- Invalid API keys +- Failed content extraction +- Speech generation errors +- Network connectivity issues + +## Development + +### Project Structure + +``` +article-tts/ +├── app.py # Main Streamlit application +├── requirements.txt # Python dependencies +├── pyproject.toml # Project metadata (uv package manager) +├── uv.lock # Dependency lock file +├── .env # Environment variables (not tracked) +└── README.md # This file +``` + +### Running with UV + +This project uses `uv` for dependency management: + +```bash +# Install with uv +uv pip install -r requirements.txt + +# Run the app +streamlit run app.py +``` -Contributions are welcome! Please feel free to submit a Pull Request. +## Notes -1. Fork the repository -2. Create your feature branch (`git checkout -b feature/amazing-feature`) -3. Commit your changes (`git commit -m 'Add some amazing feature'`) -4. Push to the branch (`git push origin feature/amazing-feature`) -5. Open a Pull Request +- ElevenLabs API has character limits based on your subscription plan +- Hyperbrowser SDK requires an active API key for web extraction +- Large articles may take longer to process and consume more API credits +- Audio files are generated in MP3 format by default -## Acknowledgements +--- -- List any libraries, APIs, or other resources that were used or inspired this project -- Credit any contributors or sources of inspiration +**Built with Hyperbrowser SDK** - Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates. diff --git a/changelog-builder/README.md b/changelog-builder/README.md index 8c1917e..9bd378b 100644 --- a/changelog-builder/README.md +++ b/changelog-builder/README.md @@ -1,107 +1,180 @@ -# GitHub Changelog Generator +# Changelog Builder 📝 -A powerful Streamlit application that generates comprehensive changelogs by analyzing git commit differences between two references. +**Built with [Hyperbrowser](https://hyperbrowser.ai)** -## 🌟 Features +A powerful Streamlit web application that automatically generates comprehensive changelogs by analyzing git commit differences between any two references in a GitHub repository. Perfect for release management, documentation, and tracking project evolution. -- **Visual Git Comparison**: Compare any two commits or references from a GitHub repository -- **Detailed Change Information**: Extract detailed information about commits and file changes -- **Multiple Changelog Formats**: - - **Standard Changelog**: A traditional chronological listing of commits and changes - - **Categorized Changelog**: Changes organized by type (features, bugfixes, docs, etc.) 
- - **AI-Processed Changelog**: An intelligent, structured changelog created using OpenAI - - Clearly separates additions, removals, changes, and fixes - - Focuses on human-readable summaries over technical details -- **Easy to Use**: Simple interface with download options for all generated changelogs +## ✨ Features -## 📋 Requirements +- 🔍 **Smart Git Comparison**: Compare any two commits, branches, or tags from a GitHub repository +- 📊 **Visual Commit Exploration**: Browse commits and file changes with expandable sections +- 🤖 **AI-Powered Changelogs**: Uses OpenAI to generate human-readable summaries +- 📂 **Multiple Formats**: + - **Commit Diff View**: Traditional chronological listing with detailed file changes + - **Categorized Changelog**: Auto-categorized by commit type (features, bugfixes, docs, tests, etc.) + - **AI-Processed Changelog**: Intelligent changelog clearly separating additions, removals, changes, and fixes + - **Raw Data View**: Complete JSON data for custom processing +- 💾 **Download Options**: Export any changelog format as markdown +- 🎯 **Web Scraping Powered**: Uses Hyperbrowser's Extract API to reliably fetch GitHub comparison data -- Python 3.8+ -- Streamlit -- Hyperbrowser API key -- OpenAI API key (for AI-processed changelogs) +## 🔧 Installation -## 🛠️ Installation +1. Install dependencies using uv (recommended): +```bash +uv pip install -e . +``` -1. Clone this repository: - ```bash - git clone - cd github-changelog-generator - ``` +Or using pip: +```bash +pip install -r requirements.txt +``` -2. Install the required dependencies: - ```bash - pip install -r requirements.txt - ``` +2. **Get API keys**: + - Hyperbrowser: [https://hyperbrowser.ai](https://hyperbrowser.ai) + - OpenAI (optional, for AI changelogs): [https://platform.openai.com](https://platform.openai.com) 3. Set up environment variables: - Create a `.env` file in the project root with the following content: - ``` - HYPERBROWSER_API_KEY=your_hyperbrowser_key_here - OPENAI_API_KEY=your_openai_key_here - ``` +```bash +# Create .env file +cat > .env << EOF +HYPERBROWSER_API_KEY=your_hyperbrowser_key_here +OPENAI_API_KEY=your_openai_key_here +EOF +``` + +## 🚀 Quick Start + +```bash +# Install dependencies +uv pip install -e . + +# Run the Streamlit app +streamlit run main.py +``` + +Then in your browser: +1. Enter a GitHub repository URL (e.g., `https://github.com/facebook/react`) +2. Enter starting reference (e.g., `v18.2.0` or a commit hash) +3. Enter ending reference (e.g., `v18.3.0` or `main`) +4. Click "View Comparison" to generate changelogs + +## 💡 Usage Examples + +### Compare Release Tags +``` +Repository: https://github.com/facebook/react +Start: v18.2.0 +End: v18.3.0 +``` +→ Generates changelog showing all changes between these releases + +### Compare Branch to Main +``` +Repository: https://github.com/yourusername/yourproject +Start: develop +End: main +``` +→ Shows what changes are in main that aren't in develop + +### Compare Commit Ranges +``` +Repository: https://github.com/yourorg/yourrepo +Start: abc1234 +End: def5678 +``` +→ Detailed diff between any two commit hashes -## 🚀 Usage +## 📊 Changelog Formats -1. Start the Streamlit app: - ```bash - streamlit run app.py - ``` +The tool generates four different views of your changelog: -2. In your browser, enter: - - A GitHub repository URL (e.g., `https://github.com/username/repo`) - - Starting commit hash/reference (e.g., `v1.0` or a commit hash) - - Ending commit hash/reference (e.g., `main` or a commit hash) +### 1. 
AI-Processed Changelog (Recommended) +Uses OpenAI GPT-4 to generate a human-readable changelog with clear structure: +- **Added**: New features, files, or functionality +- **Removed**: Deleted functionality, deprecated features, or removed files +- **Changed**: Updates to existing features or refactoring +- **Fixed**: Bug fixes and error corrections -3. Click "View Comparison" to generate changelogs +Each commit is presented with a clickable link to the full GitHub comparison view. + +### 2. Commit Diff View +Traditional chronological view showing: +- Commit messages and descriptions +- Committer names and verification status +- Detailed file-by-file changes +- Line additions/deletions per file +- Code diff previews + +### 3. Categorized Changelog +Auto-categorizes commits using heuristics based on commit message keywords: +- ✨ Features (feat, feature, add, new) +- 🐛 Bug Fixes (fix, bug, issue, error, resolv) +- 📄 Documentation (doc, readme, comment) +- 🔨 Refactoring (refactor, clean, restructure) +- 🧪 Tests (test, spec, assert) +- 🔄 Other Changes -4. Navigate between the different tabs to view different changelog formats: - - **Detailed View**: Browse commits and file changes with expandable sections - - **Standard Changelog**: A traditional chronological changelog - - **Categorized Changelog**: Changes organized by type (features, bugfixes, etc.) - - **AI-Processed Changelog**: An intelligent changelog with additions, removals, changes, and fixes clearly separated - - **Raw Data**: View the raw JSON data +### 4. Raw Data +Complete JSON data extracted from GitHub for custom processing or integration. -5. Download any changelog format using the provided download buttons +## 🏗️ Project Structure -## 📊 Changelog Formats +``` +changelog-builder/ +├── main.py # Streamlit web app entrypoint +├── changelog_generator.py # Core changelog generation logic +├── pyproject.toml # Python dependencies (uv/pip) +├── uv.lock # Lockfile for reproducible builds +└── .streamlit/ # Streamlit configuration +``` -### Standard Changelog -A traditional changelog that lists all commits chronologically with their associated file changes. +## ⚙️ How It Works -### Categorized Changelog -Organizes commits into categories based on their type: -- ✨ Features -- 🐛 Bug Fixes -- 📄 Documentation -- 🔨 Refactoring -- 🧪 Tests -- 🔄 Other Changes +1. **Extract**: Uses Hyperbrowser's Extract API to scrape GitHub comparison pages + - Fetches commit messages, descriptions, and committer info + - Extracts file changes with additions/deletions count + - Captures code diffs for detailed analysis -### AI-Processed Changelog -Uses OpenAI to generate a human-readable changelog that clearly separates: -- **Added**: New features, files, or functionality -- **Removed**: Deleted functionality, deprecated features, or files -- **Changed**: Updates to existing features or refactoring -- **Fixed**: Bug fixes and error corrections +2. **Process**: Transforms raw data into structured formats + - Parses commits and file changes using Pydantic models + - Categorizes commits based on message keywords + - Formats data for different changelog styles -## 🧩 Project Structure +3. **Generate**: Creates multiple changelog formats + - Standard chronological view + - Heuristic-based categorization + - AI-powered intelligent summaries (optional) -- `app.py`: Main Streamlit application -- `changelog_generator.py`: Core logic for generating changelogs -- `requirements.txt`: Project dependencies +4. 
**Export**: Provides download options for all formats as markdown files -## ⚙️ How It Works +## 🔑 Environment Variables + +```bash +HYPERBROWSER_API_KEY # Required - Get at https://hyperbrowser.ai +OPENAI_API_KEY # Optional - Required only for AI-processed changelogs +``` + +## 🐛 Troubleshooting + +**"HYPERBROWSER_API_KEY environment variable not set"** +- Make sure you have a `.env` file in the project directory +- Verify the API key is correct and not expired + +**"OpenAI API key not found"** +- This is optional - categorized and commit diff views work without it +- Add `OPENAI_API_KEY` to `.env` file to enable AI-processed changelogs -1. The application uses Hyperbrowser to extract git comparison data from GitHub -2. It processes this data to extract commits, file changes, additions, and deletions -3. Various changelog formats are generated based on this data -4. For AI-processed changelogs, the data is sent to OpenAI's API to generate more human-friendly descriptions +**No data extracted** +- Verify the GitHub repository is public +- Check that commit references (tags/branches/hashes) exist in the repository +- Ensure the comparison isn't empty (start and end references are different) -## 📝 License +## 📚 Resources -[MIT License](LICENSE) +- Hyperbrowser Documentation: [https://docs.hyperbrowser.ai](https://docs.hyperbrowser.ai) +- Hyperbrowser Discord: [https://discord.gg/zsYzsgVRjh](https://discord.gg/zsYzsgVRjh) +- Support: info@hyperbrowser.ai -## 🤝 Contributing +--- -Contributions, issues, and feature requests are welcome! Feel free to check the [issues page](link-to-issues). +Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates. diff --git a/chat-with/README.md b/chat-with/README.md new file mode 100644 index 0000000..d726a7e --- /dev/null +++ b/chat-with/README.md @@ -0,0 +1,69 @@ +# Chat With... Examples + +A collection of interactive chat applications powered by Hyperbrowser and AI that enable users to have conversations with various content sources. + +## Overview + +These examples demonstrate how to build conversational interfaces that extract, process, and interact with content from different sources using the Hyperbrowser SDK combined with Large Language Models (LLMs). + +## Available Examples + +### [Chat with YouTube](./chat-with-youtube) + +A Streamlit web application that enables users to chat with YouTube video content using AI. The application automatically extracts video transcripts and uses OpenAI to answer questions about the video content in a conversational interface. + +**Key Features:** +- Extract transcripts directly from YouTube videos using Playwright and Hyperbrowser +- Chat with AI about video content with maintained conversation history +- View raw transcript data and API interactions +- Clean, user-friendly Streamlit interface + +**Tech Stack:** Python, Streamlit, Playwright, OpenAI, Hyperbrowser SDK + +**Use Case:** Perfect for understanding long videos, extracting key information from educational content, or summarizing video discussions without watching the entire video. + +[View Example →](./chat-with-youtube) + +## Common Architecture + +All examples in this collection follow similar patterns: + +1. **Content Extraction**: Use Hyperbrowser SDK to navigate and extract content from web sources +2. **Content Processing**: Parse and format extracted content for AI consumption +3. **Conversational Interface**: Maintain chat history and context for natural interactions +4. 
**AI-Powered Responses**: Use LLMs (primarily OpenAI) to generate intelligent responses based on extracted content + +## Prerequisites + +Most examples require: +- Python 3.7+ or Node.js 16+ (depending on the example) +- Hyperbrowser API key (get yours at [hyperbrowser.ai](https://hyperbrowser.ai)) +- OpenAI API key (from [platform.openai.com](https://platform.openai.com)) + +## Getting Started + +Each example has its own directory with: +- Detailed README with setup instructions +- Complete source code +- Dependencies configuration +- Environment variable requirements + +Navigate to the specific example directory and follow its README for installation and usage instructions. + +## About Hyperbrowser + +[Hyperbrowser](https://hyperbrowser.ai) provides managed browser automation infrastructure that makes web scraping and automation reliable and scalable. It's purpose-built for AI agents and developers who need to interact with web content programmatically. + +## Resources + +- **Documentation**: [docs.hyperbrowser.ai](https://docs.hyperbrowser.ai) +- **Discord Community**: [discord.gg/zsYzsgVRjh](https://discord.gg/zsYzsgVRjh) +- **Support**: info@hyperbrowser.ai + +## Contributing + +Have an idea for a new "Chat With..." example? Feel free to submit a pull request or open an issue! + +## License + +Each example is provided for educational and demonstration purposes. Check individual example directories for specific licensing information. \ No newline at end of file diff --git a/company-researcher/README.md b/company-researcher/README.md index c60fdf2..1fc7749 100644 --- a/company-researcher/README.md +++ b/company-researcher/README.md @@ -1,14 +1,17 @@ # Company Researcher +**Built with [Hyperbrowser](https://hyperbrowser.ai)** + An intelligent company research tool that uses AI-powered web extraction to gather comprehensive information about any company based on your specific research topics. ## Features - 🔍 **Smart Company Research**: Enter any company name and research topic to get detailed insights -- 🤖 **AI-Powered Extraction**: Uses HyperBrowser's advanced extraction capabilities to gather structured data +- 🤖 **AI-Powered Extraction**: Uses Hyperbrowser's advanced extraction capabilities with structured schema validation - 📊 **Structured Output**: Returns organized information including company overview, research findings, and key points - 🎨 **Interactive CLI**: Beautiful command-line interface with colored output -- ⚡ **Fast & Reliable**: Leverages Google search and AI extraction for accurate results +- ⚡ **Fast & Reliable**: Leverages Google search and Hyperbrowser's `extract.startAndWait()` for accurate results +- 🔐 **Type-Safe**: Built with TypeScript and Zod schema validation ## What It Does @@ -27,29 +30,30 @@ Perfect for: - Investment research - Business intelligence -## Prerequisites - -- Node.js (v16 or higher) -- npm or yarn -- HyperBrowser API key - -## Setup - -### 1. Get Your HyperBrowser API Key +## Get an API Key -Get your HyperBrowser API key at **[hyperbrowser.ai](https://hyperbrowser.ai)** +Get your Hyperbrowser API key at **[https://hyperbrowser.ai](https://hyperbrowser.ai)** -### 2. Clone and Install +## Quick Start ```bash -# Navigate to the project directory -cd company-researcher - # Install dependencies npm install + +# Set up environment variables +export HYPERBROWSER_API_KEY="your_api_key_here" + +# Run the application +npx tsx company-researcher.ts ``` -### 3. 
Environment Configuration +## Prerequisites + +- Node.js (v16 or higher) +- npm or yarn +- Hyperbrowser API key + +## Environment Variables Create a `.env` file in the project root: @@ -57,16 +61,8 @@ Create a `.env` file in the project root: HYPERBROWSER_API_KEY=your_api_key_here ``` -Replace `your_api_key_here` with your actual HyperBrowser API key from [hyperbrowser.ai](https://hyperbrowser.ai). - ## Usage -### Running the Application - -```bash -npx ts-node company-researcher.ts -``` - ### Interactive Process 1. **Enter Company Name**: Type the name of the company you want to research @@ -113,11 +109,41 @@ The tool provides structured output including: - **Key Points**: Bullet-pointed insights and highlights - **Additional Information**: Supplementary relevant details +## API Reference + +Uses **Hyperbrowser's official API methods**: + +```typescript +import { Hyperbrowser } from "@hyperbrowser/sdk"; +import { z } from "zod"; + +// Initialize Hyperbrowser client +const client = new Hyperbrowser({ + apiKey: process.env.HYPERBROWSER_API_KEY, +}); + +// Define schema for structured extraction +const schema = z.object({ + companyName: z.string(), + companyOverview: z.string(), + researchFindings: z.string(), + keyPoints: z.array(z.string()), + additionalInfo: z.string().optional(), +}); + +// Extract structured data from search results +const result = await client.extract.startAndWait({ + urls: [searchUrl], + prompt: `Research and extract information about ${companyName}...`, + schema: schema, +}); +``` + ## Dependencies -- **@hyperbrowser/sdk**: HyperBrowser SDK for AI-powered web extraction +- **@hyperbrowser/sdk**: Hyperbrowser SDK for AI-powered web extraction - **dotenv**: Environment variable management -- **zod**: Schema validation for structured data +- **zod**: Schema validation for structured data extraction - **readline**: Interactive command-line interface - **TypeScript**: Type-safe development @@ -135,38 +161,32 @@ company-researcher/ └── README.md # This file ``` -### TypeScript Configuration +### Architecture -The project uses TypeScript with strict type checking. The main logic is in `company-researcher.ts` which: +Single-file TypeScript implementation (~112 LOC): -1. Sets up the HyperBrowser client -2. Creates an interactive CLI interface -3. Processes user input -4. Performs AI-powered web extraction -5. Displays formatted results +- **Hyperbrowser SDK**: AI-powered web extraction via `extract.startAndWait()` +- **Zod Schema Validation**: Type-safe structured data extraction +- **Interactive CLI**: readline-based user input with colored output +- **Google Search Integration**: Leverages search results for comprehensive research +- **Error Handling**: Graceful error handling with informative messages ## Troubleshooting ### Common Issues -1. **API Key Error**: Make sure your HyperBrowser API key is correctly set in the `.env` file +1. **API Key Error**: Make sure your Hyperbrowser API key is correctly set in the `.env` file or environment variables 2. **Network Issues**: Ensure you have a stable internet connection for web searches 3. **TypeScript Errors**: Run `npm install` to ensure all dependencies are installed +4. 
**No Data Found**: Try rephrasing your research topic to be more specific ### Getting Help -- Check the [HyperBrowser documentation](https://hyperbrowser.ai) for API-related issues +- Check the [Hyperbrowser documentation](https://docs.hyperbrowser.ai) for API-related issues - Ensure your API key has sufficient credits - Verify that the company name is spelled correctly - -## License - -ISC - -## Contributing - -Feel free to submit issues and enhancement requests! +- Join the [Discord community](https://discord.gg/zsYzsgVRjh) for support --- -**Note**: This tool requires a valid HyperBrowser API key. Get yours at [hyperbrowser.ai](https://hyperbrowser.ai) to start researching companies with AI-powered intelligence. +Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates. diff --git a/competitor-analyzer-bot/README.md b/competitor-analyzer-bot/README.md index cee7c08..7413f43 100644 --- a/competitor-analyzer-bot/README.md +++ b/competitor-analyzer-bot/README.md @@ -1,29 +1,60 @@ # Competitor Analyzer Bot -A CLI tool that scrapes and compares 2 competitor websites, generating AI-powered competitive analysis reports. +**Built with [Hyperbrowser](https://hyperbrowser.ai)** -## Setup +> An intelligent competitive analysis tool that scrapes competitor websites and generates AI-powered insights comparing features, pricing, and unique value propositions. + +## What It Does + +Automate your competitive intelligence workflow with: + +- **Automated Web Scraping** - Extract clean content from any competitor website +- **AI-Powered Analysis** - GPT-4 identifies headlines, features, pricing, and USPs +- **Structured Insights** - Get consistent, comparable data across all competitors +- **Interactive CLI** - Simple prompts guide you through the analysis process +- **Report Generation** - Timestamped markdown reports for easy sharing and archival + +## Quick Start + +### 1. Get API Keys + +**Hyperbrowser API Key:** Sign up at [https://hyperbrowser.ai](https://hyperbrowser.ai) + +**OpenAI API Key:** Get one at [https://platform.openai.com](https://platform.openai.com) + +### 2. Installation -1. Install dependencies: ```bash +cd competitor-analyzer-bot npm install ``` -2. Set up environment variables: -Create a `.env` file with your API keys: +### 3. Environment Setup + +Create a `.env` file in the project root: + +```env +HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here +OPENAI_API_KEY=your_openai_api_key_here ``` -HYPERBROWSER_API_KEY=your_hyperbrowser_api_key (Get your API keys from https://hyperbrowser.ai) -OPENAI_API_KEY=your_openai_api_key + +### 4. Run the Analyzer + +```bash +npm start ``` ## Usage ### Interactive Mode + +The bot prompts you to enter competitor URLs one by one: + ```bash npm start ``` -The program will start and prompt you to enter 2 URLs interactively: +**Example Session:** ``` 🚀 Welcome to Competitor Analyzer Bot! @@ -33,49 +64,215 @@ The program will start and prompt you to enter 2 URLs interactively: Enter URL 1 of 2 (or 'quit'): stripe.com ✅ Added: https://stripe.com + Enter URL 2 of 2 (or 'quit'): square.com ✅ Added: https://square.com ✅ Got 2 URLs. Starting analysis... + +🔎 Scraping: https://stripe.com +✅ Successfully scraped: https://stripe.com + +🔎 Scraping: https://square.com +✅ Successfully scraped: https://square.com + +🤖 Analyzing 2 scraped websites with AI... 
+ +📊 COMPETITOR ANALYSIS REPORT +================================ + +🧩 https://stripe.com +Headline: Online payment processing for internet businesses +Features: Accept payments, Send payouts, Automate financial processes, Built-in fraud prevention +Pricing: Pay as you go pricing - 2.9% + 30¢ per successful card charge +USP: Complete payment platform with developer-friendly APIs and global reach + +🧩 https://square.com +Headline: Tools to run and grow your business +Features: Point of Sale, Online Store, Invoices, Marketing tools, Team management +Pricing: Free plan available, 2.6% + 10¢ per transaction +USP: All-in-one commerce solution designed for small businesses and sellers + +✅ Report saved to competitor-report-2025-09-29T14-30-00-000Z.md ``` ### Alternative Commands + ```bash -# Using dev script +# Development mode npm run dev -# Using ts-node directly -npx ts-node competitor-analysis.ts +# Direct execution with tsx +npx tsx competitor-analysis.ts + +# Build TypeScript +npm run build ``` ## Features -- **URL Validation**: Automatically validates and filters invalid URLs -- **Web Scraping**: Uses Hyperbrowser to scrape website content in markdown format -- **AI Analysis**: Leverages OpenAI's GPT-4 to extract competitive insights -- **Structured Output**: Generates reports with: - - Website headlines - - Key features - - Pricing models - - Unique Selling Propositions (USPs) -- **Report Generation**: Saves timestamped reports in markdown format +### Smart URL Handling +- Automatically adds `https://` protocol if missing +- Validates URL format before processing +- Graceful error handling for invalid URLs +- Type 'quit' anytime to exit + +### Robust Web Scraping +- Uses Hyperbrowser SDK for reliable content extraction +- Converts websites to clean markdown format +- Handles JavaScript-heavy sites and SPAs +- Continues analysis even if some sites fail -## Output +### AI-Powered Insights +- **GPT-4o** extracts structured competitive intelligence +- **Zod schemas** ensure consistent, validated output format +- Identifies key differentiators automatically +- Structured data includes: + - Main headline/value proposition + - Core features list + - Pricing model summary + - Unique Selling Proposition (USP) + +### Report Generation +- **Console output** for immediate review +- **Markdown files** with ISO timestamps for archival +- Easy to share via Slack, email, or documentation +- Structured format enables further processing + +## Output Format + +### Console Report + +``` +📊 COMPETITOR ANALYSIS REPORT +================================ + +🧩 https://competitor1.com +Headline: Their main value proposition +Features: Feature A, Feature B, Feature C +Pricing: Pricing model summary +USP: What makes them unique + +🧩 https://competitor2.com +Headline: Their main value proposition +Features: Feature X, Feature Y, Feature Z +Pricing: Pricing model summary +USP: What makes them unique +``` -The tool generates: -1. Console output with competitive analysis -2. 
A timestamped markdown file (e.g., `competitor-report-2024-01-15T10-30-00-000Z.md`) +### Generated File -## Error Handling +`competitor-report-2025-09-29T14-30-00-000Z.md` -- Validates URL formats before processing -- Handles failed scraping attempts gracefully -- Continues analysis even if some websites fail to scrape -- Provides detailed error messages and status updates +```markdown +# Competitor Analysis Report + +Generated on: 9/29/2025, 2:30:00 PM + +🧩 https://competitor1.com +Headline: Their main value proposition +Features: Feature A, Feature B, Feature C +Pricing: Pricing model summary +USP: What makes them unique + +🧩 https://competitor2.com +Headline: Their main value proposition +Features: Feature X, Feature Y, Feature Z +Pricing: Pricing model summary +USP: What makes them unique +``` + +## Project Structure + +``` +competitor-analyzer-bot/ +├── competitor-analysis.ts # Main application logic +├── package.json # Dependencies and scripts +├── tsconfig.json # TypeScript configuration +├── .env # Environment variables (create this) +├── competitor-report-*.md # Generated reports (timestamped) +└── README.md # This file +``` + +## How It Works + +1. **User Input** - Interactive CLI prompts for 2 competitor URLs +2. **URL Validation** - Normalizes and validates each URL +3. **Web Scraping** - Hyperbrowser extracts content as markdown +4. **AI Analysis** - OpenAI GPT-4o processes content with structured output +5. **Report Generation** - Formats insights and saves to markdown file + +## Use Cases + +**Product Managers:** +- Compare feature sets across competitors +- Track pricing changes over time +- Identify market positioning gaps + +**Marketing Teams:** +- Analyze competitor messaging and USPs +- Research value proposition differentiation +- Gather insights for positioning strategy + +**Sales Teams:** +- Prepare competitive battle cards +- Understand competitor pricing models +- Identify competitive advantages + +**Startups:** +- Market research and competitive landscape analysis +- Feature benchmarking for roadmap planning +- Pricing strategy research + +## Troubleshooting + +### Common Issues + +**Missing API Keys:** +```bash +# Verify your .env file exists and contains both keys +cat .env +``` + +**Import Errors:** +```bash +# Reinstall dependencies +rm -rf node_modules package-lock.json +npm install +``` + +**Scraping Failures:** +- Some websites block automated scraping +- Try different competitor URLs +- Check Hyperbrowser API quota limits + +**TypeScript Errors:** +```bash +# Ensure TypeScript dependencies are installed +npm install --save-dev @types/node typescript +``` + +## Dependencies + +- **[@hyperbrowser/sdk](https://www.npmjs.com/package/@hyperbrowser/sdk)** - Web scraping and browser automation +- **[openai](https://www.npmjs.com/package/openai)** - OpenAI API client with GPT-4 support +- **[zod](https://www.npmjs.com/package/zod)** - TypeScript-first schema validation +- **[dotenv](https://www.npmjs.com/package/dotenv)** - Environment variable management +- **[readline](https://nodejs.org/api/readline.html)** - Built-in Node.js for CLI input ## Requirements -- Node.js -- TypeScript -- Hyperbrowser API key -- OpenAI API key +- **Node.js** 18+ or later +- **TypeScript** 5.0+ (installed via npm) +- **Hyperbrowser API Key** (get at [hyperbrowser.ai](https://hyperbrowser.ai)) +- **OpenAI API Key** (get at [platform.openai.com](https://platform.openai.com)) + +## Learn More + +- **Hyperbrowser Documentation:** 
[https://docs.hyperbrowser.ai](https://docs.hyperbrowser.ai) +- **Hyperbrowser Discord:** [https://discord.gg/zsYzsgVRjh](https://discord.gg/zsYzsgVRjh) +- **Support:** info@hyperbrowser.ai + +--- + +**Ready to analyze your competition? Get started in minutes!** diff --git a/cookbook/README.md b/cookbook/README.md index 1004d1c..2d45e70 100644 --- a/cookbook/README.md +++ b/cookbook/README.md @@ -1,44 +1,258 @@ -# YouTube Video Chat - Jupyter Notebook +# Hyperbrowser Cookbook -This Jupyter notebook allows you to analyze and chat with the content of any YouTube video using AI. It extracts the transcript from a YouTube video and uses OpenAI to generate responses to questions about the video content. +A collection of Jupyter notebook recipes demonstrating practical applications of the Hyperbrowser SDK for web automation, AI-powered scraping, and intelligent workflows. -## Requirements +## Overview -Before using this notebook, you'll need to install the following dependencies: +This cookbook contains hands-on examples that show how to build real-world applications using Hyperbrowser's browser automation capabilities combined with AI models. Each notebook is self-contained and focuses on a specific use case, from simple web scraping to complex multi-agent systems. + +## Prerequisites + +Before running these notebooks, you'll need: + +1. **Hyperbrowser API Key** - Sign up at [hyperbrowser.ai](https://hyperbrowser.ai) to get your API key +2. **OpenAI API Key** - Required for notebooks that use AI analysis (most examples) +3. **Python 3.9+** installed on your system +4. **Jupyter** environment (JupyterLab, Jupyter Notebook, or VS Code with Jupyter extension) + +### Installation + +Install all required dependencies: ```bash -pip install openai playwright python-dotenv jupyter hyperbrowser +pip install -r requirements.txt ``` -You'll also need to install Playwright's browser dependencies: +Or install them individually: ```bash +pip install openai playwright python-dotenv jupyter hyperbrowser ipywidgets notebook python -m playwright install chromium ``` -## Configuration +### Environment Setup -Create a `.env` file in the same directory as the notebook with the following environment variables: +Create a `.env` file in the cookbook directory with your API keys: +```bash +HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here +OPENAI_API_KEY=your_openai_api_key_here ``` -OPENAI_API_KEY=your_openai_api_key -HYPERBROWSER_API_KEY=your_hyperbrowser_api_key + +## Notebook Categories + +### Browser Automation & Agent Workflows + +Learn how to use Hyperbrowser's autonomous browser agents to navigate websites and extract information. + +- **[feature-recognition.ipynb](./feature-recognition.ipynb)** - How agents automatically handle login pages, cookie prompts, and other obstacles +- **[browser-use-authed-flows.ipynb](./browser-use-authed-flows.ipynb)** - Using persistent browser profiles for authenticated sessions (Instagram example) +- **[browser-use-hybrid-flows.ipynb](./browser-use-hybrid-flows.ipynb)** - Combining autonomous agents with traditional scraping methods +- **[hacker-news-summarizer.ipynb](./hacker-news-summarizer.ipynb)** - Basic agent example for extracting and summarizing HN posts + +### Search & Discovery + +Examples for finding and analyzing information across the web. 
+ +- **[apartment-finder.ipynb](./apartment-finder.ipynb)** - Multi-step apartment search with filtering and AI-powered ranking +- **[concert-ticket-finder.ipynb](./concert-ticket-finder.ipynb)** - Searching for concert tickets and availability +- **[flight-ticket-search.ipynb](./flight-ticket-search.ipynb)** - Finding flight options with price comparison +- **[local-events-finder.ipynb](./local-events-finder.ipynb)** - Discovering events in your area +- **[steam-special-suggestor.ipynb](./steam-special-suggestor.ipynb)** - Finding Steam game deals based on preferences + +### E-commerce & Shopping + +Intelligent shopping assistants and product analysis. + +- **[shopping-assistant.ipynb](./shopping-assistant.ipynb)** - AI-powered shopping helper +- **[shopping-agents-with-vision.ipynb](./shopping-agents-with-vision.ipynb)** - Visual product analysis and recommendations +- **[doordash-location-recommender.ipynb](./doordash-location-recommender.ipynb)** - Restaurant recommendations based on location and preferences +- **[menu-recommendations.ipynb](./menu-recommendations.ipynb)** - Menu item suggestions based on dietary preferences +- **[ingredients-based-recipe-suggestor.ipynb](./ingredients-based-recipe-suggestor.ipynb)** - Recipe suggestions based on available ingredients + +### Content Analysis & Research + +Deep analysis of web content using AI. + +- **[company-researcher.ipynb](./company-researcher.ipynb)** - Comprehensive company research and analysis +- **[news-analyst.ipynb](./news-analyst.ipynb)** - News article analysis and summarization +- **[movie-review-researcher.ipynb](./movie-review-researcher.ipynb)** - Aggregating and analyzing movie reviews +- **[review-analyzer.ipynb](./review-analyzer.ipynb)** - Product review sentiment analysis +- **[twitter-profile-analyzer.ipynb](./twitter-profile-analyzer.ipynb)** - Social media profile analysis +- **[social-media-post-finder.ipynb](./social-media-post-finder.ipynb)** - Finding and analyzing social media content + +### Documentation & Development Tools + +Tools for working with documentation and code. + +- **[docs-qna.ipynb](./docs-qna.ipynb)** - Question answering over documentation +- **[documentation-based-coding-agent.ipynb](./documentation-based-coding-agent.ipynb)** - Building coding agents that reference documentation +- **[code-solver.ipynb](./code-solver.ipynb)** - Solving coding challenges with AI +- **[code-solver-browser-use.ipynb](./code-solver-browser-use.ipynb)** - Browser-based code challenge solver +- **[changelog-builder.ipynb](./changelog-builder.ipynb)** - Automated changelog generation + +### Model Context Protocol (MCP) Servers + +Building MCP servers to extend AI model capabilities with real-time data. + +- **[news-mcp-server.ipynb](./news-mcp-server.ipynb)** - Real-time news extraction server for AI models +- **[wikipedia-mcp-server.ipynb](./wikipedia-mcp-server.ipynb)** - Wikipedia integration via MCP +- **[youtube-mcp-server.ipynb](./youtube-mcp-server.ipynb)** - YouTube data extraction server + +### Interactive & Fun Projects + +Creative applications and games. + +- **[youtube_video_chat.ipynb](./youtube_video_chat.ipynb)** - Chat with YouTube video transcripts +- **[comic-trip-planner.ipynb](./comic-trip-planner.ipynb)** - Plan trips with a creative twist +- **[wiki-racer.ipynb](./wiki-racer.ipynb)** - Wikipedia racing game automation +- **[next-chess-move.ipynb](./next-chess-move.ipynb)** - Chess move analysis and suggestions + +## How to Use This Cookbook + +1. 
**Start with the basics**: Begin with `feature-recognition.ipynb` or `hacker-news-summarizer.ipynb` to understand core concepts
+2. **Choose your use case**: Browse the categories above to find examples relevant to your needs
+3. **Run the notebooks**: Open any notebook in Jupyter and run cells sequentially
+4. **Customize**: Modify parameters and prompts to adapt examples to your specific requirements
+5. **Combine patterns**: Mix techniques from multiple notebooks to build more complex applications
+
+## Key Concepts
+
+### Browser Use Agent
+
+The Browser Use agent is Hyperbrowser's autonomous browser automation system that:
+- Understands natural language instructions
+- Navigates websites visually like a human
+- Handles obstacles (logins, cookie prompts, CAPTCHAs) automatically
+- Works with dynamic content and modern web applications
+
+### Persistent Profiles
+
+Browser profiles maintain state across sessions:
+- Store authentication cookies
+- Remember user preferences
+- Enable authenticated workflows without re-login
+- Essential for social media and personalized content automation
+
+### Vision-Enabled Agents
+
+Agents with vision capabilities can:
+- Analyze images and screenshots
+- Make decisions based on visual content
+- Handle complex UI layouts
+- Provide more reliable automation for visual-heavy sites
+
+### Structured Extraction
+
+Define schemas (using Pydantic) to extract structured data:
+- Specify exactly what fields you need
+- Get consistent, typed data back
+- Works across different website layouts
+- Enables downstream processing and analysis
+
+## Common Patterns
+
+### Pattern 1: Simple Autonomous Agent
+
+```python
+from hyperbrowser import AsyncHyperbrowser
+from hyperbrowser.models import StartBrowserUseTaskParams
+
+hb = AsyncHyperbrowser(api_key="your-api-key")
+
+resp = await hb.agents.browser_use.start_and_wait(
+    StartBrowserUseTaskParams(
+        task="Go to example.com and extract the main heading"
+    )
+)
+```
+
+### Pattern 2: Authenticated Session
+
+```python
+from hyperbrowser.models import CreateSessionParams, CreateSessionProfile
+
+resp = await hb.agents.browser_use.start_and_wait(
+    StartBrowserUseTaskParams(
+        task="Your task here",
+        session_options=CreateSessionParams(
+            profile=CreateSessionProfile(id=profile_id)
+        )
+    )
+)
+```
+
+### Pattern 3: Structured Data Extraction
+
+```python
+from pydantic import BaseModel
+from hyperbrowser.models.extract import StartExtractJobParams
+
+class Product(BaseModel):
+    name: str
+    price: float
+    url: str
+
+# hb is the AsyncHyperbrowser client from Pattern 1, so the call must be awaited
+resp = await hb.extract.start_and_wait(
+    StartExtractJobParams(
+        urls=["https://example.com/products"],
+        schema=Product
+    )
+)
+```
+
+### Pattern 4: Multi-Stage Pipeline
+
+```python
+from IPython.display import Markdown, display  # notebook display helpers
+
+# Stage 1: Extract raw data with an agent
+# (search_apartments and rank_with_ai are illustrative notebook helpers)
+search_results = await search_apartments(location, filters)
+
+# Stage 2: Analyze and rank with LLM
+ranked_results = await rank_with_ai(search_results, preferences)
+
+# Stage 3: Present results
+display(Markdown(ranked_results))
```
-You can get a Hyperbrowser API key from [hyperbrowser.ai](https://hyperbrowser.ai).
+## Tips & Best Practices
+
+1. **Start simple**: Begin with clear, specific instructions to the agent
+2. **Use vision when needed**: Enable `use_vision=True` for sites with complex visual layouts
+3. **Handle authentication**: Use persistent profiles for authenticated workflows
+4. **Structure your data**: Define Pydantic schemas for consistent extraction
+5. **Chain operations**: Combine autonomous agents with LLM analysis for powerful workflows
+6.
**Test incrementally**: Run notebook cells one at a time to debug issues +7. **Respect rate limits**: Be mindful of website terms of service and rate limits +8. **Use proxies when needed**: Set `use_proxy=True` for geo-restricted content + +## Troubleshooting + +**Agent not completing task** +- Make instructions more specific and step-by-step +- Enable vision mode if the site is visually complex +- Check if authentication is required (use persistent profiles) + +**Extraction returning incomplete data** +- Verify your Pydantic schema matches available data +- Check if the page requires scrolling or interaction first +- Try using an agent-based approach instead of direct extraction + +**Rate limiting or blocking** +- Use `use_proxy=True` in session options +- Add delays between requests +- Use persistent profiles to appear more like a regular user + +## Resources + +- **Documentation**: [docs.hyperbrowser.ai](https://docs.hyperbrowser.ai) +- **Discord Community**: [discord.gg/zsYzsgVRjh](https://discord.gg/zsYzsgVRjh) +- **Website**: [hyperbrowser.ai](https://hyperbrowser.ai) +- **Support**: info@hyperbrowser.ai -## Usage +## Contributing -1. Run the notebook cells in sequence -2. Enter your YouTube URL in the designated cell -3. The notebook will extract the video transcript -4. You can then ask questions about the video content in the question cells -5. Add new cells as needed for additional questions +Found a bug or have a recipe idea? Open an issue or submit a pull request to the main repository. -## Features +## License -- Extracts YouTube video transcripts with timestamps -- Uses OpenAI to answer questions about video content -- Maintains conversation context through chat history -- Displays the full transcript for reference -- Minimizes interactivity with user parameters at the start +These examples are provided for educational and demonstration purposes. When using these patterns in production, ensure you comply with the terms of service of the websites you're automating. \ No newline at end of file diff --git a/crypto-news-bot/README.md b/crypto-news-bot/README.md index 491d869..0adc74e 100644 --- a/crypto-news-bot/README.md +++ b/crypto-news-bot/README.md @@ -1,112 +1,106 @@ -# 🚀 Crypto News Bot +# Crypto News Bot + +**Built with [Hyperbrowser](https://hyperbrowser.ai)** An intelligent cryptocurrency news aggregator that automatically scrapes, summarizes, and delivers breaking crypto news to your Slack workspace using AI-powered analysis. 
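+
+The delivery step is a single webhook POST. A minimal sketch of that step (the helper name is illustrative, not the bot's actual code):
+
+```typescript
+import axios from "axios";
+
+// Post a digest to Slack via an incoming webhook (assumes SLACK_WEBHOOK_URL is set).
+async function postToSlack(summary: string): Promise<void> {
+  await axios.post(process.env.SLACK_WEBHOOK_URL!, {
+    text: `🚨 Daily Crypto Digest\n\n${summary}`,
+  });
+}
+```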
-## ✨ Features +## Features -- 🔍 **Smart News Scraping**: Automatically scrapes content from top crypto news sources -- 🤖 **AI-Powered Summaries**: Uses OpenAI GPT-4 to generate concise, impactful news summaries -- 📅 **Scheduled Updates**: Daily digest at 9 AM + periodic updates throughout the day -- 🔄 **Change Detection**: Intelligent content comparison to avoid spam notifications -- 💬 **Slack Integration**: Seamless delivery to your Slack workspace via webhooks -- 🧠 **Caching System**: Efficient content caching to minimize API calls +- **Smart News Scraping**: Automatically scrapes content from top crypto news sources using Hyperbrowser +- **AI-Powered Summaries**: Uses OpenAI GPT-4o-mini to generate concise, impactful news summaries +- **Scheduled Updates**: Daily digest at 9 AM + periodic updates throughout the day (1 PM, 5 PM, 9 PM) +- **Change Detection**: Intelligent content comparison to avoid spam notifications +- **Slack Integration**: Seamless delivery to your Slack workspace via webhooks +- **Caching System**: Efficient content caching to minimize API calls and track changes -## 📰 News Sources +## News Sources - **CoinDesk** - Leading cryptocurrency news and analysis -- **Decrypt** - Blockchain and crypto technology coverage +- **Decrypt** - Blockchain and crypto technology coverage - **Cointelegraph** - Comprehensive crypto market news -## 🛠️ Setup +## Get an API Key + +**Get your Hyperbrowser API key** at https://hyperbrowser.ai -### Prerequisites +## Prerequisites -- Node.js 18+ +- Node.js (v16 or higher) - npm or yarn +- Hyperbrowser API key +- OpenAI API key - Slack workspace with webhook access -- OpenAI API account -- Hyperbrowser API account - -### Installation - -1. **Clone the repository** - ```bash - git clone - cd crypto-news-bot - ``` - -2. **Install dependencies** - ```bash - npm install - ``` - -3. **Set up environment variables** - - Create a `.env` file in the root directory: - ```env - HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here - OPENAI_API_KEY=your_openai_api_key_here - SLACK_WEBHOOK_URL=your_slack_webhook_url_here - ``` - -### 🔑 API Keys Setup - -#### Hyperbrowser API Key -Get your Hyperbrowser API key at [hyperbrowser.ai](https://hyperbrowser.ai) - -1. Sign up for a Hyperbrowser account -2. Navigate to your dashboard -3. Generate an API key -4. Add it to your `.env` file - -#### OpenAI API Key -1. Visit [OpenAI Platform](https://platform.openai.com/) -2. Create an account and navigate to API keys -3. Generate a new API key -4. Add it to your `.env` file - -#### Slack Webhook URL -1. Go to your Slack workspace settings -2. Navigate to **Apps** → **Custom Integrations** → **Incoming Webhooks** -3. Create a new webhook for your desired channel -4. 
Copy the webhook URL to your `.env` file - -## 🚀 Usage -### Run Once +## Installation + ```bash -npm run start -# or +# Install dependencies +npm install + +# Set up environment variables +export HYPERBROWSER_API_KEY="your_api_key_here" +export OPENAI_API_KEY="your_openai_api_key_here" +export SLACK_WEBHOOK_URL="your_slack_webhook_url_here" + +# Run the bot npx tsx crypto-news-bot.ts ``` -### Run in Development Mode +## Environment Variables + +Create a `.env` file in the project root: + ```bash -npm run dev +HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here +OPENAI_API_KEY=your_openai_api_key_here +SLACK_WEBHOOK_URL=your_slack_webhook_url_here ``` -### Production Deployment +### Getting API Keys + +**Hyperbrowser API Key** +- Sign up at [hyperbrowser.ai](https://hyperbrowser.ai) +- Navigate to your dashboard +- Generate an API key + +**OpenAI API Key** +- Visit [OpenAI Platform](https://platform.openai.com/) +- Create an account and navigate to API keys +- Generate a new API key + +**Slack Webhook URL** +- Go to your Slack workspace settings +- Navigate to **Apps** → **Custom Integrations** → **Incoming Webhooks** +- Create a new webhook for your desired channel +- Copy the webhook URL + +## Usage + +### Run Once ```bash -npm run build -npm run start:prod +npx tsx crypto-news-bot.ts ``` -## ⏰ Schedule +The bot will immediately fetch and post news, then continue running on the scheduled intervals. + +## Schedule The bot runs automatically on the following schedule: -- **📊 Daily Digest**: Every day at 9:00 AM -- **🔄 News Updates**: At 1:00 PM, 5:00 PM, and 9:00 PM (only when significant changes detected) +- **Daily Digest**: Every day at 9:00 AM (always posts) +- **News Updates**: At 1:00 PM, 5:00 PM, and 9:00 PM (only when significant changes detected) + +## How It Works -## 🎯 How It Works +1. **Scraping**: Uses Hyperbrowser SDK to scrape content from crypto news websites in markdown format +2. **Processing**: Extracts relevant articles using `scrape.startAndWait()` with stealth mode and captcha solving +3. **Analysis**: OpenAI GPT-4o-mini analyzes content for significant changes and generates summaries +4. **Delivery**: Posts formatted summaries to your Slack channel via webhook +5. **Caching**: Stores content in `crypto_news_cache.json` to enable intelligent change detection -1. **Scraping**: Uses Hyperbrowser to scrape content from crypto news websites -2. **Processing**: Extracts relevant articles and content in markdown format -3. **Analysis**: OpenAI analyzes the content for significant changes and generates summaries -4. **Delivery**: Posts formatted summaries to your Slack channel -5. 
**Caching**: Stores content locally to enable intelligent change detection +## Sample Output -## 📋 Sample Output +The bot posts messages to Slack in this format: ``` 🚨 Daily Crypto Digest @@ -117,7 +111,14 @@ The bot runs automatically on the following schedule: • Altcoin Season Indicators: Market analysis shows potential rotation incoming ``` -## 🔧 Configuration +Updates throughout the day use the format: +``` +🔄 Crypto News Update + +• [Recent news items when significant changes are detected] +``` + +## Configuration ### Customizing News Sources @@ -126,7 +127,7 @@ Edit the `sources` array in `crypto-news-bot.ts`: ```typescript const sources = [ "https://www.coindesk.com", - "https://decrypt.co", + "https://decrypt.co", "https://cointelegraph.com", // Add your preferred crypto news sources ]; @@ -134,13 +135,13 @@ const sources = [ ### Adjusting Schedule -Modify the cron expressions: +Modify the cron expressions at the bottom of `crypto-news-bot.ts`: ```typescript // Daily digest at 9 AM cron.schedule("0 9 * * *", () => main(true)); -// Updates at 1 PM, 5 PM, 9 PM +// Updates at 1 PM, 5 PM, 9 PM cron.schedule("0 13,17,21 * * *", () => main()); ``` @@ -152,58 +153,103 @@ Update the `SYSTEM_PROMPT` variable to change how news is summarized: const SYSTEM_PROMPT = `Your custom summarization instructions here...`; ``` -## 📁 Project Structure +## Project Structure ``` crypto-news-bot/ -├── crypto-news-bot.ts # Main application logic -├── package.json # Dependencies and scripts -├── tsconfig.json # TypeScript configuration -├── .env # Environment variables (create this) -├── crypto_news_cache.json # Auto-generated cache file -└── README.md # This file +├── crypto-news-bot.ts # Main application logic +├── package.json # Dependencies and scripts +├── tsconfig.json # TypeScript configuration +├── .env # Environment variables (create this) +├── crypto_news_cache.json # Auto-generated cache file +└── README.md # This file ``` -## 🐛 Troubleshooting +### Architecture -### Common Issues +Single-file TypeScript implementation (~117 LOC): -**Import Errors**: Make sure all dependencies are installed: -```bash -npm install -npm install --save-dev @types/node +- **Hyperbrowser SDK**: Web scraping via `scrape.startAndWait()` with stealth mode and captcha solving +- **OpenAI GPT-4o-mini**: AI-powered summarization and change detection +- **node-cron**: Scheduled task execution +- **File-based caching**: Persistent content storage for change detection +- **Slack webhooks**: Team notifications + +## API Reference + +```typescript +import { Hyperbrowser } from "@hyperbrowser/sdk"; +import OpenAI from "openai"; + +// Initialize clients +const client = new Hyperbrowser({ + apiKey: process.env.HYPERBROWSER_API_KEY!, +}); + +const openai = new OpenAI({ + apiKey: process.env.OPENAI_API_KEY! 
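+  // The "!" non-null assertions assume both keys are set; see Environment Variables above.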
+}); + +// Scrape news content +const scrape = await client.scrape.startAndWait({ + url: "https://www.coindesk.com", + sessionOptions: { + solveCaptchas: true, + useStealth: true + }, + scrapeOptions: { + formats: ["markdown"], + includeTags: ["article", "main"], + excludeTags: ["img"] + }, +}); + +// Generate summary with OpenAI +const chat = await openai.chat.completions.create({ + model: "gpt-4o-mini", + messages: [ + { role: "system", content: SYSTEM_PROMPT }, + { role: "user", content: markdownContent }, + ], +}); ``` -**API Rate Limits**: The bot includes intelligent caching to minimize API calls +## Dependencies -**Slack Delivery Issues**: Verify your webhook URL is correct and the channel permissions allow bot posts +- **@hyperbrowser/sdk** (^0.51.0) - Browser automation and web scraping +- **openai** (^5.7.0) - OpenAI API client for AI summaries +- **axios** (^1.10.0) - HTTP client for Slack webhooks +- **node-cron** (^4.1.1) - Task scheduling +- **dotenv** (^16.5.0) - Environment variable management +- **typescript** (^5.8.3) - TypeScript support -### Debug Mode +## Troubleshooting -Add console logs or set environment variables for debugging: +### Common Issues + +**Import Errors** +Make sure all dependencies are installed: ```bash -DEBUG=true npm start +npm install ``` -## 🤝 Contributing - -1. Fork the repository -2. Create a feature branch (`git checkout -b feature/amazing-feature`) -3. Commit your changes (`git commit -m 'Add amazing feature'`) -4. Push to the branch (`git push origin feature/amazing-feature`) -5. Open a Pull Request +**API Rate Limits** +The bot includes intelligent caching to minimize API calls. Cached content is stored in `crypto_news_cache.json`. -## 📄 License +**Slack Delivery Issues** +- Verify your webhook URL is correct +- Ensure the channel permissions allow bot posts +- Test your webhook with a curl command -This project is licensed under the ISC License. +**Bot Not Running** +The bot runs continuously once started. Ensure your terminal/process stays active or deploy to a server. -## 🙏 Acknowledgments +### Getting Help -- [Hyperbrowser](https://hyperbrowser.ai) for web scraping capabilities -- [OpenAI](https://openai.com) for AI-powered summarization -- [Slack](https://slack.com) for communication platform -- The crypto community for inspiration +- Check the [Hyperbrowser documentation](https://docs.hyperbrowser.ai) +- Join the [Discord community](https://discord.gg/zsYzsgVRjh) +- Email support: info@hyperbrowser.ai --- -**⚡ Ready to stay ahead of crypto news? Get started in minutes!** 🚀 +Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates. diff --git a/dark-pattern-finder/README.md b/dark-pattern-finder/README.md index 02712f0..45f7159 100644 --- a/dark-pattern-finder/README.md +++ b/dark-pattern-finder/README.md @@ -2,45 +2,208 @@ **Built with [Hyperbrowser](https://hyperbrowser.ai)** -A CLI tool that scans websites for dark patterns using AI-powered analysis. Detects deceptive UX practices like fake scarcity, hidden fees, obstruction tactics, and more. +A powerful CLI tool that automatically scans websites for dark patterns using AI-powered analysis. Detects deceptive UX practices like fake scarcity, hidden fees, obstruction tactics, and more. Perfect for UX audits, ethical design reviews, and consumer protection research. -## Quick Start +## Features -1. **Get an API key** at [hyperbrowser.ai](https://hyperbrowser.ai) -2. 
Set up environment variables: +- **AI-Powered Detection**: Uses Groq's Llama 3.3 70B model for accurate dark pattern identification +- **Real Browser Automation**: Leverages Hyperbrowser SDK with Playwright for authentic page rendering +- **Multi-Site Scanning**: Analyze multiple websites in a single command +- **Detailed Evidence**: Provides specific examples and explanations for each detected pattern +- **Rich CLI Output**: Color-coded results with emojis for easy interpretation +- **Comprehensive Reports**: Summary statistics and pattern frequency analysis + +## Prerequisites + +- Node.js (v16 or higher) +- npm or yarn +- Hyperbrowser API key +- Groq API key (free tier available) + +## Installation + +1. Clone or navigate to this directory: ```bash -export HYPERBROWSER_API_KEY="your_key_here" -export GROQ_API_KEY="your_groq_key_here" +cd dark-pattern-finder ``` -3. Install and run: +2. Install dependencies: ```bash npm install -npx tsx dark-pattern-finder.ts scan https://example.com ``` +3. Set up environment variables: +Create a `.env` file in the root directory: +```env +HYPERBROWSER_API_KEY=your_hyperbrowser_api_key +GROQ_API_KEY=your_groq_api_key +``` + +Or export them directly: +```bash +export HYPERBROWSER_API_KEY="your_key_here" +export GROQ_API_KEY="your_groq_key_here" +``` + +## Getting API Keys + +### Hyperbrowser API Key +Get your Hyperbrowser API key at **[hyperbrowser.ai](https://hyperbrowser.ai)** + +### Groq API Key +1. Visit [Groq's platform](https://console.groq.com/) +2. Sign up for a free account +3. Navigate to API Keys section +4. Create a new API key + ## Usage -Scan single or multiple websites: +### Basic Scanning + +Scan a single website: ```bash -# Single site npx tsx dark-pattern-finder.ts scan https://example.com +``` + +Scan multiple websites: +```bash +npx tsx dark-pattern-finder.ts scan https://site1.com https://site2.com https://site3.com +``` + +### Example Output -# Multiple sites -npx tsx dark-pattern-finder.ts scan https://site1.com https://site2.com ``` +🔍 Dark Pattern Scan Results + +🌐 Site: https://example.com +⚠️ Found 3 dark patterns: + +⏰ Scarcity - fake countdown timer + The site displays a countdown timer claiming "Only 2 hours left for this deal!" but the timer resets on page refresh, indicating false urgency to pressure users into quick purchases. -## What It Detects +🥷 Sneaking - pre-checked marketing consent + The signup form has a pre-checked checkbox for marketing emails buried in the terms section, automatically opting users into promotional communications without explicit consent. -- **Scarcity**: Fake urgency and countdown timers -- **Obstruction**: Difficult cancellation flows -- **Sneaking**: Hidden costs and pre-checked boxes -- **Misdirection**: Misleading buttons and buried info -- **Forced Action**: Required signups and sharing -- **Hidden Fees**: Surprise charges at checkout +💰 Hidden Fees - surprise checkout charges + A "service fee" of $4.99 only appears at the final checkout step after users have entered payment information, with no prior disclosure of this additional cost. + +📊 Summary +Sites scanned: 1 +Sites with dark patterns: 1 +Total patterns found: 3 + +Most common patterns: + 🥷 Sneaking: 1 + ⏰ Scarcity: 1 + 💰 Hidden: 1 +``` + +## Dark Pattern Categories + +The tool detects six major categories of dark patterns: + +1. **Scarcity (⏰)**: False urgency and limited time offers + - Fake countdown timers that reset + - Misleading stock availability claims + - Artificial scarcity tactics + +2. 
**Obstruction (🚧)**: Making cancellation or opt-out difficult + - Hidden unsubscribe buttons + - Multi-step cancellation processes + - Roach motel patterns + +3. **Sneaking (🥷)**: Hidden costs and deceptive practices + - Pre-checked consent boxes + - Hidden subscription renewals + - Last-minute additional items in cart + +4. **Misdirection (🎯)**: Misleading visual design + - Confusing button placements + - Buried important information + - Visual tricks to guide unwanted actions + +5. **Forced Action (🔒)**: Requiring unnecessary actions + - Forced account creation + - Mandatory social media sharing + - Required personal information for basic access + +6. **Hidden Fees (💰)**: Surprise charges at checkout + - Undisclosed service fees + - Hidden shipping costs + - Unexpected taxes or surcharges + +## Technical Details + +### How It Works + +1. **Browser Automation**: Creates a Hyperbrowser session and connects via Chrome DevTools Protocol +2. **Page Analysis**: Navigates to the target URL and extracts DOM content +3. **Element Detection**: Identifies UI elements like buttons, checkboxes, modals, and timers +4. **AI Classification**: Sends page data to Groq's Llama model for dark pattern analysis +5. **Result Formatting**: Normalizes findings and presents them in a color-coded CLI format + +### Code Structure + +- **Main Entry Point**: Command-line interface with Commander.js +- **analyzeUrl()**: Core scanning logic with browser automation +- **classifyWithGroq()**: AI-powered pattern detection using function calling +- **printColoredResults()**: Rich terminal output with Chalk +- **Element Extractors**: Specialized functions for buttons, checkboxes, modals, and timers + +### Technologies Used + +- **Hyperbrowser SDK**: Browser automation infrastructure +- **Playwright Core**: Browser control and CDP connection +- **Groq API**: Fast LLM inference with Llama 3.3 70B +- **Commander.js**: CLI framework +- **Chalk**: Terminal styling +- **TypeScript**: Type-safe development + +## Limitations + +- Requires active API keys for both Hyperbrowser and Groq +- Analysis limited to publicly accessible websites +- Results depend on AI model interpretation +- May not detect all subtle dark patterns +- JavaScript-heavy sites may require longer load times + +## Troubleshooting + +**Missing API keys error**: +- Ensure both `HYPERBROWSER_API_KEY` and `GROQ_API_KEY` are set +- Check `.env` file is in the correct directory + +**Session creation failed**: +- Verify your Hyperbrowser API key is valid +- Check your Hyperbrowser account has available credits + +**Groq API errors**: +- Confirm your Groq API key is active +- Check for rate limiting on free tier + +**Timeout errors**: +- Some websites take longer to load +- The tool has built-in retry logic with 45-second timeouts + +## Use Cases + +- **UX Audits**: Evaluate your own or competitor websites for ethical design +- **Consumer Protection**: Identify deceptive practices on e-commerce sites +- **Regulatory Compliance**: Check for patterns that violate consumer protection laws +- **Academic Research**: Study prevalence of dark patterns across industries +- **Ethical Design Reviews**: Ensure your products follow user-centric principles ## Documentation -Full API documentation: [docs.hyperbrowser.ai](https://docs.hyperbrowser.ai) +- Full Hyperbrowser API documentation: [docs.hyperbrowser.ai](https://docs.hyperbrowser.ai) +- Groq API documentation: [console.groq.com/docs](https://console.groq.com/docs) + +## Community + +- Follow 
[@hyperbrowser](https://x.com/hyperbrowser) for updates +- Join the discussion: [Discord](https://discord.gg/zsYzsgVRjh) +- Support: info@hyperbrowser.ai + +## License -Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates. +ISC diff --git a/dataflow-tree/README.md b/dataflow-tree/README.md index b7c890f..23dbb75 100644 --- a/dataflow-tree/README.md +++ b/dataflow-tree/README.md @@ -2,20 +2,22 @@ **Built with [Hyperbrowser](https://hyperbrowser.ai)** -A CLI tool that analyzes websites for PII data collection and privacy compliance. Uses AI to detect forms, tracking scripts, and personal data flows. +A CLI tool that analyzes websites for PII (Personally Identifiable Information) data collection and privacy compliance. Uses Hyperbrowser's AI agent to detect forms, tracking scripts, and personal data flows, then visualizes them as a colored tree structure. ## Features -🔍 **Detects PII collection** - Forms collecting email, phone, addresses, payment info -📊 **Analytics tracking** - Google Analytics, Segment, Facebook pixels -🌳 **Visual flow trees** - Colored ASCII output showing data flows -🚨 **CI mode** - Alerts on new PII endpoints for compliance monitoring +- **PII Detection** - Identifies forms collecting email, phone, addresses, payment info, passwords, and more +- **Analytics Tracking** - Detects Google Analytics, Segment, Mixpanel, Facebook pixels, and other tracking scripts +- **Visual Flow Trees** - Colored ASCII tree output showing data flow hierarchy +- **CI/CD Integration** - Alert mode that exits with status 1 when new PII endpoints are detected +- **JSON Output** - Machine-readable export for automation and further processing +- **Historical Comparison** - Tracks changes in PII collection over time -## Setup +## Installation -1. **Get an API key** at https://hyperbrowser.ai +1. **Get an API key** at [https://hyperbrowser.ai](https://hyperbrowser.ai) -2. **Create environment file:** +2. **Set up environment variables:** ```bash echo "HYPERBROWSER_API_KEY=your_api_key_here" > .env ``` @@ -25,30 +27,47 @@ A CLI tool that analyzes websites for PII data collection and privacy compliance npm install ``` -## Usage +## Quick Start -**Analyze a website:** ```bash +# Analyze a website npx ts-node index.ts --url https://example.com + +# Get JSON output +npx ts-node index.ts --url https://github.com --json + +# CI mode (for continuous monitoring) +npx ts-node index.ts --url https://openai.com --ci ``` -**JSON output:** +## Usage Examples + +### Basic Analysis ```bash -npx ts-node index.ts --url https://github.com --json +npx ts-node index.ts --url https://example.com ``` +Analyzes the website and displays a visual tree of data flows with color-coded categories. -**CI mode (exits 1 on new PII):** +### JSON Export ```bash -npx ts-node index.ts --url https://openai.com --ci +npx ts-node index.ts --url https://github.com --json > report.json +``` +Outputs machine-readable JSON for integration with other tools. + +### CI/CD Pipeline Integration +```bash +npx ts-node index.ts --url https://yoursite.com --ci ``` +Exits with code 1 if new PII endpoints are detected, perfect for automated compliance monitoring in CI/CD pipelines. 
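+
+Conceptually, CI mode is a diff of the current scan against the previous `out/flows.json`. A minimal sketch of that comparison (helper and field names are illustrative, not the tool's actual internals):
+
+```typescript
+import { existsSync, readFileSync } from "fs";
+
+interface Flow {
+  url: string;
+  method: string;
+  category: "PII" | "Analytics" | "Business";
+}
+
+// Exit 1 when the current scan contains PII endpoints absent from the last run.
+function checkForNewPii(current: Flow[]): void {
+  const previous: Flow[] = existsSync("out/flows.json")
+    ? JSON.parse(readFileSync("out/flows.json", "utf-8")).flows
+    : [];
+  const known = new Set(previous.filter((f) => f.category === "PII").map((f) => f.url));
+  const added = current.filter((f) => f.category === "PII" && !known.has(f.url));
+  if (added.length > 0) {
+    console.error(`New PII endpoints: ${added.map((f) => f.url).join(", ")}`);
+    process.exit(1);
+  }
+}
+```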
-**Help:** +### View Help ```bash npx ts-node index.ts --help ``` -## Output +## Output Format +### Visual Tree ``` 🌳 Data Flow Tree: https://github.com @@ -59,10 +78,113 @@ https://github.com 🔴 PII requests: 1 🔵 Analytics requests: 1 ⚪ Business requests: 1 + +💾 Results saved to: out/flows.json +``` + +### Color Coding +- **🔴 Red** - PII data collection endpoints +- **🔵 Blue** - Analytics and tracking requests +- **⚪ White** - Standard business logic requests + +### JSON Output +Results are automatically saved to `out/flows.json`: +```json +{ + "timestamp": "2025-07-30T17:14:06.520Z", + "targetUrl": "https://example.com", + "flows": [ + { + "url": "https://example.com/api/form", + "method": "POST", + "category": "PII", + "piiFields": ["email", "password", "name"] + } + ] +} +``` + +## How It Works + +1. **Agent Analysis** - Uses Hyperbrowser's browser-use agent to navigate and analyze the target website +2. **Detection** - Scans for: + - Forms collecting personal data (email, name, phone, address, payment) + - Analytics/tracking scripts (Google Analytics, Segment, Facebook, etc.) + - Token and authentication flows (JWT, sessions, cookies) +3. **Classification** - Categorizes each request as PII, Analytics, or Business +4. **Visualization** - Generates a colored tree showing the data flow hierarchy +5. **Export** - Saves results to `out/flows.json` for historical comparison + +## PII Keywords Detected + +The tool detects these common PII patterns: +- `email`, `phone`, `ssn`, `jwt` +- `card`, `password`, `token` +- `name`, `address` + +## Analytics Platforms Detected + +Common analytics and tracking platforms: +- Google Analytics +- Segment +- Mixpanel +- Snowplow +- Facebook Pixel + +## CLI Options + +```bash +Options: + --url Target URL to analyze (required) + --json Output machine-readable JSON + --ci CI mode: exit 1 on new PII endpoints + --help Show help information +``` + +## CI/CD Integration + +Perfect for compliance monitoring in continuous integration: + +```yaml +# GitHub Actions example +- name: Check for new PII endpoints + run: | + npx ts-node index.ts --url https://yoursite.com --ci + env: + HYPERBROWSER_API_KEY: ${{ secrets.HYPERBROWSER_API_KEY }} ``` -Results are saved to `out/flows.json` for further analysis. +When new PII endpoints are detected: +- Exits with code 1 +- Writes details to `out/alert.txt` +- Displays which endpoints were added + +## Architecture + +- **Entry Point**: `index.ts` - Main CLI application +- **Agent Integration**: Uses `@hyperbrowser/sdk` browser-use agent for analysis +- **Classification**: Keyword-based pattern matching for PII and analytics detection +- **Output**: JSON storage in `out/` directory with historical comparison + +## Use Cases + +- **Privacy Audits** - Regular compliance checks for GDPR/CCPA +- **Penetration Testing** - Identify data collection points +- **Development** - Monitor data flows during feature development +- **CI/CD** - Automated alerts when new PII collection is added +- **Security Reviews** - Audit third-party scripts and tracking + +## Requirements + +- Node.js 14 or higher +- Hyperbrowser API key (get at [https://hyperbrowser.ai](https://hyperbrowser.ai)) +- TypeScript and ts-node + +## Documentation + +- Full API docs: [https://docs.hyperbrowser.ai](https://docs.hyperbrowser.ai) +- SDK Reference: [@hyperbrowser/sdk](https://www.npmjs.com/package/@hyperbrowser/sdk) --- -Follow @hyperbrowser_ai for updates. +Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates. 
diff --git a/dataset-assmbler/README.md b/dataset-assmbler/README.md index 16222a0..ff6aa58 100644 --- a/dataset-assmbler/README.md +++ b/dataset-assmbler/README.md @@ -1,73 +1,134 @@ -# Dataset Assembler CLI - **Built with [Hyperbrowser](https://hyperbrowser.ai)** -A single-file TypeScript CLI tool that assembles datasets by: -1. Searching the web for topic-relevant pages using Serper.dev -2. Extracting structured text using Hyperbrowser's official scraping methods -3. Cleaning, deduplicating, and splitting into train/eval sets -4. Exporting to JSONL or CSV format +# Dataset Assembler -## Installation +Rapidly assemble high-quality training datasets from web content. Search, scrape, clean, and split data into ready-to-use train/eval sets for LLM fine-tuning. -```bash -# Clone the repository -git clone https://github.com/hyperbrowserai/examples -cd dataset-assembler +## Why Hyperbrowser? -# Install dependencies -npm install +[Hyperbrowser](https://hyperbrowser.ai) is the **Internet for AI** — purpose-built for developers creating AI agents and automating web tasks. Skip the infrastructure headaches and focus on building. -# TypeScript file runs directly with ts-node (no build step needed) -``` +## What It Does -## Get an API key +This CLI tool automates the entire dataset creation pipeline: -To use this tool, you'll need API keys from: -- [Hyperbrowser](https://hyperbrowser.ai) - For web scraping -- [Serper.dev](https://serper.dev) - For web search +1. **Search** - Queries the web using Serper.dev API to find relevant pages +2. **Scrape** - Extracts clean, structured content using Hyperbrowser's SDK (supports both batch and individual scraping) +3. **Process** - Deduplicates content and removes empty entries +4. **Split** - Divides data into train/eval sets with configurable ratios +5. **Export** - Outputs to JSONL or CSV format -## Environment Setup +Perfect for building domain-specific datasets for fine-tuning, RAG systems, or knowledge bases. -Create a `.env` file in the project directory with your API keys: +## Quick Start -``` -SERPER_API_KEY=your_serper_api_key -HYPERBROWSER_API_KEY=your_hyperbrowser_api_key -``` +1. **Get API keys**: + - [Hyperbrowser](https://hyperbrowser.ai) - For web scraping + - [Serper.dev](https://serper.dev) - For web search + +2. **Install dependencies**: + ```bash + npm install + ``` + +3. **Configure environment** - Create a `.env` file: + ```env + HYPERBROWSER_API_KEY=your_hyperbrowser_api_key + SERPER_API_KEY=your_serper_api_key + ``` + +4. 
**Run the tool**: + ```bash + npx ts-node dataset-assembler.ts --topic "machine learning security" --max 100 + ``` ## Usage +### Basic Example + ```bash -npx ts-node dataset-assembler.ts --topic "retrieval augmented generation security" --sources "arxiv.org,ai.googleblog.com" +npx ts-node dataset-assembler.ts \ + --topic "retrieval augmented generation security" \ + --sources "arxiv.org,ai.googleblog.com" \ + --max 200 ``` -### CLI Arguments +### Advanced Example -- `--topic` (required): Query string for search (e.g., "retrieval augmented generation security") -- `--sources` (optional): Comma-separated domains to bias results (e.g., arxiv.org,ai.googleblog.com) -- `--max` (default: 200): Total records to collect -- `--format` (default: jsonl): Output format (jsonl or csv) -- `--out` (default: dataset): Output file prefix -- `--train-split` (default: 0.9): Proportion of data for training -- `--fields` (default: url,title,content): Fields to include in the output -- `--concurrency` (default: 5): Number of concurrent requests +```bash +npx ts-node dataset-assembler.ts \ + --topic "AI safety research" \ + --sources "openai.com,anthropic.com" \ + --max 100 \ + --format csv \ + --train-split 0.8 \ + --out safety-dataset +``` -## Example +This will: +- Search for "AI safety research" on specified domains +- Scrape up to 100 pages with markdown content +- Deduplicate based on content hash +- Create 80/20 train/eval split +- Output to `safety-dataset.train.csv` and `safety-dataset.eval.csv` + +## CLI Options + +| Option | Type | Default | Description | +|--------|------|---------|-------------| +| `--topic` | string | *required* | Search query (e.g., "LLM fine-tuning") | +| `--sources` | string | - | Comma-separated domains (e.g., "arxiv.org,github.com") | +| `--max` | number | 200 | Maximum records to collect | +| `--format` | string | jsonl | Output format: `jsonl` or `csv` | +| `--out` | string | dataset | Output file prefix | +| `--train-split` | number | 0.9 | Training set proportion (0.0-1.0) | + +## Features + +- **Batch Scraping** - Uses Hyperbrowser's batch API (Ultra plan) with automatic fallback to individual scraping +- **Smart Deduplication** - SHA-256 content hashing to eliminate duplicates +- **Markdown Extraction** - Clean, LLM-ready content with main content extraction +- **Flexible Output** - JSONL or CSV formats with customizable fields +- **Domain Targeting** - Focus searches on specific authoritative sources +- **Train/Eval Split** - Automatic dataset splitting with shuffling + +## Output Format + +### JSONL (default) +```jsonl +{"url":"https://example.com/page1","title":"Article Title","content":"Clean markdown content..."} +{"url":"https://example.com/page2","title":"Another Article","content":"More content..."} +``` -```bash -npx ts-node dataset-assembler.ts --topic "AI safety research" --sources "openai.com,anthropic.com" --max 100 --format csv +### CSV +```csv +url,title,content +"https://example.com/page1","Article Title","Clean markdown content..." +"https://example.com/page2","Another Article","More content..." ``` -This command will: -1. Search for "AI safety research" on openai.com and anthropic.com -2. Scrape up to 100 pages -3. Clean and deduplicate the content -4. Split into training and evaluation sets (90/10 by default) -5. 
Export to CSV format as dataset.train.csv and dataset.eval.csv +## Use Cases + +- **LLM Fine-tuning** - Build domain-specific training datasets +- **RAG Systems** - Create knowledge bases from authoritative sources +- **Search Indexes** - Assemble content collections for semantic search +- **Research** - Gather domain-specific corpora for analysis + +## Requirements + +- Node.js >= 18.0.0 +- Hyperbrowser API key (Free tier available) +- Serper.dev API key + +## Architecture -## Use Case +Single-file TypeScript CLI built with: +- `@hyperbrowser/sdk` - Browser automation and scraping +- `yargs` - CLI argument parsing +- `axios` - HTTP requests to Serper API +- `chalk` - Terminal output formatting +- `dotenv` - Environment variable management -This tool is perfect for quickly assembling high-quality training datasets for fine-tuning language models, creating search indexes, or building knowledge bases from specific domains. +--- -Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates. \ No newline at end of file +Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates | [Documentation](https://docs.hyperbrowser.ai) | [Discord Community](https://discord.gg/zsYzsgVRjh) \ No newline at end of file diff --git a/deep-form/README.md b/deep-form/README.md index 8818a8c..18478fd 100644 --- a/deep-form/README.md +++ b/deep-form/README.md @@ -1,76 +1,64 @@ -# 🕵️ DeepForm +# DeepForm -**Automatically reverse-engineer any website's form flows with AI-powered analysis.** +**Built with [Hyperbrowser](https://hyperbrowser.ai)** -DeepForm is a CLI app that uses Hyperbrowser to automatically reverse-engineer any website's form flows — identifying input fields, validation rules, submission logic, and UI patterns — so developers can understand, replicate, or debug them instantly without inspecting code manually. +A CLI tool that automatically reverse-engineers website form structures using AI-powered analysis. Detects all form elements (inputs, textareas, selects, buttons) and provides security insights to identify potential phishing patterns and suspicious behaviors. -## ✨ Features +## Quick Start -- 🔍 **Intelligent Form Detection** - Automatically discovers all input fields on any webpage -- 🛡️ **Security Analysis** - AI-powered detection of phishing patterns and suspicious form behaviors -- 🚀 **Fast Scanning** - Powered by Hyperbrowser's headless browser technology -- 🎯 **Developer-Friendly** - Clean, actionable insights for form structure analysis -- 🎨 **Beautiful CLI** - Color-coded output with emoji indicators +1. **Get your API keys:** + - Hyperbrowser: [hyperbrowser.ai](https://hyperbrowser.ai) + - OpenAI: [openai.com](https://openai.com) -## 🚀 Quick Start - -### Prerequisites - -- Node.js 16+ installed -- Hyperbrowser API key from [hyperbrowser.ai](https://hyperbrowser.ai) -- OpenAI API key from [openai.com](https://openai.com) - -### Installation - -1. **Clone or download this project** -2. **Install dependencies:** - ```bash - npm install - ``` - -3. **Set up environment variables:** - Create a `.env` file in the project root: - ```env - HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here - OPENAI_API_KEY=your_openai_api_key_here - ``` +2. 
**Set up environment variables:** +```bash +export HYPERBROWSER_API_KEY="your_key_here" +export OPENAI_API_KEY="your_openai_key_here" +``` -### Getting API Keys +Or create a `.env` file: +```env +HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here +OPENAI_API_KEY=your_openai_api_key_here +``` -- **Hyperbrowser API Key**: Sign up at [hyperbrowser.ai](https://hyperbrowser.ai) to get your API key -- **OpenAI API Key**: Get your API key from [openai.com](https://openai.com) +3. **Install and run:** +```bash +npm install +npx tsx index.ts +``` -## 🎯 Usage +## Usage -Run DeepForm with: +When you run the tool, you'll be prompted to enter a URL: ```bash npx tsx index.ts ``` -Then enter any URL when prompted: - +Then enter the URL you want to scan: ``` 🔗 Enter URL to scan: https://example.com ``` -DeepForm will: -1. 🕷️ Scrape the webpage using Hyperbrowser -2. 🔎 Extract all form input fields -3. 🧠 Analyze the form structure with AI -4. 📋 Provide security insights and recommendations +The tool will: +1. Scrape the webpage using Hyperbrowser +2. Extract all form elements (inputs, textareas, selects, buttons) +3. Analyze the form structure with OpenAI GPT-4 +4. Provide security analysis and identify potential threats -## 📊 Example Output +## Example Output ``` 🔗 Enter URL to scan: https://login.example.com ⚙️ Scraping with Hyperbrowser... -✅ Found 3 inputs: -1. ... -2. ... -3. ... +✅ Found 4 form elements: +1. [INPUT] ... +2. [INPUT] ... +3. [INPUT] ... +4. [BUTTON] ... 🧠 Analyzing form structure with OpenAI... @@ -83,53 +71,39 @@ This appears to be a standard login form with proper security measures: - Standard field naming conventions used ``` -## 🛠️ How It Works - -1. **Web Scraping**: Uses Hyperbrowser's powerful browser automation to access any website -2. **Form Extraction**: Intelligently parses HTML to find all input elements -3. **AI Analysis**: Leverages OpenAI's GPT models to analyze form patterns and identify potential security issues -4. **Actionable Insights**: Provides clear, developer-friendly analysis of form structure and security - -## 🔒 Security & Privacy +## How It Works -- All web scraping is done through Hyperbrowser's secure infrastructure -- No sensitive data is stored locally -- API keys are kept in environment variables -- Form analysis helps identify potential security vulnerabilities +1. **Web Scraping**: Uses Hyperbrowser's cloud browser automation to fetch the full rendered HTML +2. **Form Extraction**: Parses HTML to find all form elements (input, textarea, select, button) +3. **AI Analysis**: Leverages OpenAI GPT-4 to analyze form patterns and identify security issues +4. 
**Security Report**: Provides actionable insights about potential phishing or malicious patterns -## 💡 Use Cases +## What It Detects -- **Security Auditing**: Identify phishing attempts and suspicious form patterns -- **Competitive Analysis**: Understand how other websites structure their forms -- **Development Research**: Learn form best practices from successful sites -- **QA Testing**: Verify form implementations across different websites -- **Accessibility Review**: Analyze form field labeling and structure +- **Phishing Patterns**: Suspicious form behaviors and deceptive practices +- **Hidden Fields**: Unusual hidden inputs and tracking elements +- **Security Issues**: Missing CSRF tokens, insecure field configurations +- **Validation Rules**: Input types, required fields, and constraints +- **Suspicious Behaviors**: Unusual form submission logic or redirects -## 🚦 Requirements +## Use Cases -- Node.js 16 or higher -- Valid Hyperbrowser API key -- Valid OpenAI API key -- Internet connection for API calls +- **Security Auditing**: Identify phishing attempts and malicious forms +- **Competitive Research**: Understand form structures on competitor sites +- **Development**: Learn form best practices and patterns +- **QA Testing**: Verify form implementations across different sites +- **Accessibility**: Analyze form field labeling and structure -## 🎨 Tech Stack +## Tech Stack - **TypeScript** - Type-safe development -- **Hyperbrowser SDK** - Web scraping and browser automation -- **OpenAI GPT-4** - AI-powered form analysis -- **Chalk** - Beautiful terminal colors -- **Dotenv** - Environment variable management - -## 🆘 Troubleshooting - -**"Cannot find module" errors**: Run `npm install` to install dependencies - -**API key errors**: Make sure your `.env` file is in the project root with valid API keys - -**Scraping fails**: Some websites may block automated access - try different URLs +- **@hyperbrowser/sdk** - Cloud browser automation +- **OpenAI GPT-4** - AI-powered security analysis +- **Chalk** - Terminal styling +- **Dotenv** - Environment management -**No forms found**: The website might use dynamic forms loaded with JavaScript +## Documentation ---- +Full API documentation: [docs.hyperbrowser.ai](https://docs.hyperbrowser.ai) -**Ready to analyze forms like never before?** Get started at [hyperbrowser.ai](https://hyperbrowser.ai) 🚀 \ No newline at end of file +Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates. \ No newline at end of file diff --git a/down-detector-bot/README.md b/down-detector-bot/README.md index 8713944..eb461db 100644 --- a/down-detector-bot/README.md +++ b/down-detector-bot/README.md @@ -1,74 +1,60 @@ -# Down Detector Bot 🚨 +**Built with [Hyperbrowser](https://hyperbrowser.ai)** -An AI-powered infrastructure monitoring bot that tracks cloud service outages and sends intelligent Slack alerts only when status changes occur. +# Down Detector Bot -## What it does +AI-powered infrastructure monitoring bot that tracks cloud service outages and sends intelligent Slack alerts only when status changes occur. Automatically checks major cloud providers every hour and notifies you of new outages or recoveries. 
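+
+Change-based alerting boils down to comparing each provider's current state with the last one observed. A minimal sketch of that idea (the real logic lives in `down-detector.ts`; this helper is illustrative):
+
+```typescript
+// Last known state per provider; true means an active outage.
+const previousStatus = new Map<string, boolean>();
+
+function detectTransition(provider: string, isDown: boolean): "new-outage" | "recovered" | null {
+  const wasDown = previousStatus.get(provider) ?? false;
+  previousStatus.set(provider, isDown);
+  if (isDown && !wasDown) return "new-outage"; // alert: service just went down
+  if (!isDown && wasDown) return "recovered"; // alert: service came back
+  return null; // unchanged: healthy, or an ongoing issue already reported
+}
+```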
-This bot monitors major cloud providers (AWS, Google Cloud, Cloudflare, Azure) by: -- Scraping their DownDetector status pages using [Hyperbrowser](https://hyperbrowser.ai) -- Using OpenAI to intelligently analyze the content for **real current outages** -- Tracking status changes and only sending alerts when there are **new issues** or **recoveries** -- Running automated checks every hour via cron +## Why Hyperbrowser? -## Features - -✅ **Smart Detection**: Uses AI to distinguish between minor reports and major outages -✅ **Change-Based Alerts**: Only notifies on status changes, not repeated issues -✅ **Multiple Providers**: Monitors AWS, Google Cloud, Cloudflare, and Azure -✅ **Slack Integration**: Sends formatted alerts to your Slack channel -✅ **Automated Scheduling**: Runs hourly checks automatically - -## Setup - -### 1. Get API Keys - -**Hyperbrowser API Key** -- Go to [hyperbrowser.ai](https://hyperbrowser.ai) -- Sign up for an account -- Get your API key from the dashboard - -**OpenAI API Key** -- Go to [platform.openai.com](https://platform.openai.com) -- Create an account and get your API key - -**Slack Webhook URL** -- Go to your Slack workspace settings -- Create a new webhook for the channel you want alerts in -- Copy the webhook URL - -### 2. Install Dependencies - -```bash -npm install -``` +[Hyperbrowser](https://hyperbrowser.ai) is the **Internet for AI** — purpose-built for developers creating AI agents and automating web tasks. This bot uses Hyperbrowser to reliably scrape DownDetector status pages and OpenAI to intelligently analyze outage patterns. -### 3. Set Environment Variables - -Create a `.env` file in the project root: - -```env -HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here -OPENAI_API_KEY=your_openai_api_key_here -SLACK_WEBHOOK_URL=your_slack_webhook_url_here -``` - -### 4. Run the Bot - -```bash -npm run start -# or -ts-node down-detector.ts -``` - -## How it Works +## Features -1. **Scraping**: Uses Hyperbrowser to scrape DownDetector pages for each provider -2. **AI Analysis**: OpenAI analyzes the content with strict criteria for real outages -3. **Status Tracking**: Compares current status with previous runs to detect changes -4. **Smart Alerting**: Only sends notifications for: - - 🚨 **New outages detected** - - 🎉 **Services recovered** - - ⏳ **Ongoing issues** (logged but not alerted) +- **Smart AI Detection**: Uses GPT-4 to distinguish between minor user reports and real widespread outages +- **Change-Based Alerts**: Only notifies on status changes (new issues or recoveries), not repeated issues +- **Multi-Provider Monitoring**: Tracks AWS, Google Cloud, Cloudflare, and Microsoft Azure +- **Slack Integration**: Sends formatted alerts directly to your Slack channel +- **Automated Scheduling**: Runs hourly checks automatically via cron +- **Status Persistence**: Tracks previous states to avoid duplicate alerts + +## Prerequisites + +- Node.js (v18 or later) +- TypeScript and ts-node (installed via npm) +- API keys for: + - **Hyperbrowser**: Get yours at [hyperbrowser.ai](https://hyperbrowser.ai) + - **OpenAI**: Create at [platform.openai.com](https://platform.openai.com) + - **Slack Webhook**: Create an incoming webhook for your workspace at [api.slack.com/messaging/webhooks](https://api.slack.com/messaging/webhooks) + +## Quick Start + +1. **Install dependencies:** + ```bash + npm install + ``` + +2. 
**Set up environment variables:** + Create a `.env` file in the project root: + ```env + HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here + OPENAI_API_KEY=your_openai_api_key_here + SLACK_WEBHOOK_URL=your_slack_webhook_url_here + ``` + +3. **Run the bot:** + ```bash + npx ts-node down-detector.ts + ``` + +## How It Works + +1. **Scraping**: Hyperbrowser SDK scrapes DownDetector pages in markdown format for each provider +2. **AI Analysis**: OpenAI GPT-4 analyzes the content with strict criteria to identify real widespread outages +3. **Status Tracking**: Compares current status with previous check results to detect changes +4. **Smart Alerting**: Only sends Slack notifications for: + - 🚨 **New outages detected** - Service just went down + - 🎉 **Services recovered** - Previously down service is now operational + - ⏳ **Ongoing issues** - Already reported, logged but not alerted again ## Monitored Services @@ -77,31 +63,98 @@ ts-node down-detector.ts - Cloudflare - Microsoft Azure -## Sample Alerts +## Example Output -**New Outage:** ``` +🎯 Monitoring 4 services for outages + +🤖 Running AI outage check: 2025-09-29T10:00:00.000Z +✅ AWS is OK +🆕 NEW OUTAGE: Google Cloud +✅ Cloudflare is OK +✅ Microsoft Azure is OK + 🚨 NEW OUTAGES DETECTED: -🔴 AWS: Major outage affecting EC2 and RDS services in us-east-1 +🔴 Google Cloud: Major outage affecting Compute Engine in us-central1 region ``` -**Recovery:** +**Recovery Alert:** ``` 🎉 SERVICES RECOVERED: ✅ Google Cloud: Service restored ``` -## Customization +## Configuration + +### Adding More Services -To monitor different services, edit the `TARGETS` array in `down-detector.ts`: +Edit the `TARGETS` array in `down-detector.ts` to monitor additional services: ```typescript const TARGETS = [ - "https://downdetector.com/status/your-service/", - // Add more URLs here + "https://downdetector.com/status/aws-amazon-web-services/", + "https://downdetector.com/status/google-cloud/", + "https://downdetector.com/status/your-service/", // Add here ]; ``` -## License +### Customizing AI Analysis + +Modify the `SYSTEM_PROMPT` in `down-detector.ts` to adjust outage detection criteria: + +```typescript +const SYSTEM_PROMPT = ` +You're analyzing DownDetector pages for ACTIVE WIDESPREAD OUTAGES only. +// Customize criteria here... 
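+// e.g. require multiple independent reports and ignore isolated user complaints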
+`; +``` + +### Adjusting Check Frequency + +Change the cron schedule in `down-detector.ts`: + +```typescript +// Current: Every hour at minute 0 +cron.schedule("0 * * * *", () => { + runSmartCheck().catch(console.error); +}); + +// Example: Every 30 minutes +cron.schedule("*/30 * * * *", () => { + runSmartCheck().catch(console.error); +}); +``` + +## Code Structure + +**Main file**: `down-detector.ts` + +Key components: +- `TARGETS` - Array of DownDetector URLs to monitor +- `SYSTEM_PROMPT` - AI instructions for outage detection +- `StatusSchema` - Zod schema for structured AI responses +- `previousStatus` - Map tracking service states across checks +- `checkProviderStatus()` - Scrapes and analyzes a single provider +- `alertSlack()` - Sends formatted alerts to Slack +- `runSmartCheck()` - Main orchestration function + +**Dependencies**: +- `@hyperbrowser/sdk` - Web scraping with official SDK +- `openai` - AI-powered outage analysis +- `axios` - Slack webhook HTTP requests +- `node-cron` - Automated scheduling +- `zod` - Schema validation for AI responses +- `dotenv` - Environment variable management + +## Important Notes + +- The bot runs immediately on start, then every hour via cron +- Status tracking is in-memory, resets on restart +- OpenAI uses `gpt-4o-mini` model with structured output via Zod +- Strict AI criteria minimize false positives from minor user reports +- Slack webhook must be properly configured for alerts to send +- Consider rate limits when adding many services or increasing frequency + +--- -MIT \ No newline at end of file +Follow [@hyperbrowser](https://twitter.com/hyperbrowser) for updates \ No newline at end of file diff --git a/github-chatbot/README.md b/github-chatbot/README.md index 95a2974..0d4bab3 100644 --- a/github-chatbot/README.md +++ b/github-chatbot/README.md @@ -1,127 +1,102 @@ -# GitHub Repository Chatbot 🤖 +**Built with [Hyperbrowser](https://hyperbrowser.ai)** -An intelligent chatbot that scrapes GitHub repositories and answers questions about their content using AI. Simply provide a GitHub repository URL, and the bot will analyze the codebase, documentation, and repository structure to answer your questions. +# GitHub Repository Chatbot -## Features - -- 🔍 **Repository Scraping**: Automatically scrapes GitHub repositories including README files, code structure, and metadata -- 🤖 **AI-Powered Responses**: Uses OpenAI's GPT-4o-mini to provide intelligent answers about the repository -- 💬 **Interactive Chat**: Real-time question and answer interface -- 📊 **Comprehensive Analysis**: Analyzes file structure, documentation, issues, pull requests, and more - -## Prerequisites - -- Node.js (v16 or higher) -- npm or yarn -- OpenAI API key -- Hyperbrowser API key +Chat with any GitHub repository using AI. Ask questions about code, documentation, structure, and more - the bot scrapes the repository and provides intelligent answers in real-time. -## Installation +## Why Hyperbrowser? -1. Clone this repository: -```bash -git clone -cd github-chatbot -``` - -2. Install dependencies: -```bash -npm install -``` +[Hyperbrowser](https://hyperbrowser.ai) is the **Internet for AI** — purpose-built for developers creating AI agents and automating web tasks. Skip the infrastructure headaches and focus on building. -3. Set up environment variables: -Create a `.env` file in the root directory: -```env -HYPERBROWSER_API_KEY=your_hyperbrowser_api_key -OPENAI_API_KEY=your_openai_api_key -``` +## Quick Start -## Getting API Keys +1. 
**Get your API key**: https://hyperbrowser.ai +2. **Install**: `npm install` +3. **Configure**: Add `HYPERBROWSER_API_KEY` and `OPENAI_API_KEY` to `.env` +4. **Run**: `npx tsx github-chatbot.ts` -### Hyperbrowser API Key -Get your Hyperbrowser API key at **[hyperbrowser.ai](https://hyperbrowser.ai)** +## Features -### OpenAI API Key -1. Visit [OpenAI's platform](https://platform.openai.com/) -2. Sign up or log in to your account -3. Navigate to API Keys section -4. Create a new API key +- **Smart GitHub Scraping**: Automatically extracts README files, code structure, language stats, issues, and PRs +- **Interactive Q&A**: Real-time conversational interface powered by GPT-4o-mini +- **Comprehensive Analysis**: Understands repository metadata, file organization, commits, and documentation +- **Clean Formatting**: Simple, readable responses without markdown clutter ## Usage -1. Compile the TypeScript code: ```bash -npx tsc -``` +# Run the chatbot +npx tsx github-chatbot.ts -2. Run the chatbot: -```bash -node github-chatbot.js -``` - -3. Follow the prompts: - - Enter a GitHub repository URL when prompted - - Wait for the repository to be scraped and analyzed - - Start asking questions about the repository! - -### Example Usage - -``` +# Enter any GitHub repository URL 🔗 Enter GitHub repository URL: https://github.com/microsoft/vscode -🔎 Scraping GitHub repository: https://github.com/microsoft/vscode -🤖 Ready to chat about this repository! +# Ask questions about the repository 💬 Ask a question (or type 'exit' to quit): What is this repository about? -🤖 This is the Visual Studio Code repository, a free and open-source code editor developed by Microsoft... +🤖 This is the Visual Studio Code repository, a free and open-source code editor... 💬 Ask a question (or type 'exit' to quit): What programming languages are used? -🤖 The repository primarily uses TypeScript (78.2%), JavaScript (12.1%), CSS (4.8%)... +🤖 The repository primarily uses TypeScript (78.2%), JavaScript (12.1%)... ``` ## What Can You Ask? The chatbot can answer questions about: -- Repository purpose and description -- Programming languages used -- File structure and organization -- Recent commits and changes -- Issues and pull requests -- Documentation content +- Repository purpose and main features +- Programming languages and technology stack +- File structure and code organization +- Recent commits and development activity +- Open issues and pull requests - Installation and setup instructions -- Code functionality and features +- Documentation and usage guidelines +- Contributors and project history -## Technical Details +## Environment Variables -- **Web Scraping**: Uses Hyperbrowser SDK for intelligent GitHub scraping -- **AI Processing**: Leverages OpenAI's GPT-4o-mini for natural language processing -- **Data Extraction**: Focuses on markdown content, file structures, and repository metadata -- **Interactive Interface**: Built with readline-sync for seamless command-line interaction +Create a `.env` file: + +```env +HYPERBROWSER_API_KEY=your_hyperbrowser_api_key +OPENAI_API_KEY=your_openai_api_key +``` + +## How It Works + +1. **Scraping**: Uses Hyperbrowser SDK to scrape GitHub pages with targeted selectors for README, file tree, language stats, commits, issues, and PRs +2. **Processing**: Converts scraped HTML to clean markdown format +3. **AI Analysis**: Sends markdown content + your question to OpenAI GPT-4o-mini +4. 
**Response**: Returns formatted, contextual answers based on repository content + +## Technical Stack + +- **[@hyperbrowser/sdk](https://www.npmjs.com/package/@hyperbrowser/sdk)**: Web scraping and automation +- **OpenAI GPT-4o-mini**: Natural language processing +- **readline-sync**: Interactive CLI interface +- **TypeScript**: Type-safe development ## Limitations -- Requires valid API keys for both Hyperbrowser and OpenAI -- Scraping large repositories may take some time -- Responses are based on publicly available repository information -- Rate limits may apply based on your API plan +- Works with public GitHub repositories only +- Large repositories may take longer to scrape +- Response quality depends on scraped content completeness +- Rate limits apply based on your API plans ## Troubleshooting -**Scraping failed error**: -- Ensure the GitHub URL is valid and publicly accessible -- Check your Hyperbrowser API key is correct +**Scraping failed error**: +- Verify the GitHub URL is valid and publicly accessible +- Check your `HYPERBROWSER_API_KEY` is correct **OpenAI errors**: -- Verify your OpenAI API key is valid -- Ensure you have sufficient API credits - -**TypeScript compilation errors**: -- Run `npm install` to ensure all dependencies are installed -- Check that you're using Node.js v16 or higher +- Ensure `OPENAI_API_KEY` is valid and has credits +- Check your API quota limits -## Contributing +**Installation issues**: +- Use Node.js v16 or higher +- Run `npm install` to install all dependencies -Feel free to submit issues, feature requests, or pull requests to improve this chatbot! +--- -## License +**Perfect for**: Understanding new codebases, researching open-source projects, analyzing repository activity, automated code exploration. -This project is licensed under the ISC License. +🚀 **Scale your AI development** with [Hyperbrowser](https://hyperbrowser.ai) | Follow @hyperbrowser diff --git a/hb-changelog-tracker/README.md b/hb-changelog-tracker/README.md index 92f0c67..0e76134 100644 --- a/hb-changelog-tracker/README.md +++ b/hb-changelog-tracker/README.md @@ -1,61 +1,190 @@ # Changelog Tracker -A Node.js application that monitors various tech blogs for updates and sends summaries to Slack. Built with [Hyperbrowser](https://hyperbrowser.ai) for web scraping and OpenAI's GPT-4 for summarization. +An automated monitoring tool that tracks updates from tech blogs and sends AI-generated summaries to Slack. Built with [Hyperbrowser](https://hyperbrowser.ai) for reliable web scraping and OpenAI's GPT-4 for intelligent summarization. 
## Features -- Monitors multiple tech blogs including OpenAI, Anthropic, DeepMind, Y Combinator, and HuggingFace -- Automatically scrapes new content using Hyperbrowser's reliable scraping API -- Generates concise, changelog-style summaries using GPT-4 -- Sends notifications to Slack when updates are detected +- **Multi-Source Monitoring**: Tracks updates from OpenAI, Anthropic, DeepMind, Y Combinator, HuggingFace, and more +- **Intelligent Scraping**: Uses Hyperbrowser's API to reliably extract content from dynamic websites +- **AI Summarization**: Generates concise, changelog-style summaries using GPT-4o-mini +- **Slack Integration**: Automatically sends formatted notifications when updates are detected +- **Customizable**: Easy to add or remove blog sources ## Prerequisites - Node.js (v16 or higher) - npm or yarn -- [Hyperbrowser API key](https://hyperbrowser.ai) - Sign up and get your API key +- Hyperbrowser API key - OpenAI API key - Slack Webhook URL -## Setup +## Installation -1. Install dependencies: - ```bash - npm install - ``` +1. Navigate to the project directory: +```bash +cd hb-changelog-tracker +``` + +2. Install dependencies: +```bash +npm install +``` + +3. Create a `.env` file in the project root: +```env +HYPERBROWSER_API_KEY=your_hyperbrowser_key_here +OPENAI_API_KEY=your_openai_key_here +SLACK_WEBHOOK_URL=your_slack_webhook_url +``` + +## Getting API Keys + +### Hyperbrowser API Key +Get your Hyperbrowser API key at **[hyperbrowser.ai](https://hyperbrowser.ai)** -2. Create a `.env` file in the project root with the following variables: - ``` - HYPERBROWSER_API_KEY=your_hyperbrowser_key_here - OPENAI_API_KEY=your_openai_key_here - SLACK_WEBHOOK_URL=your_slack_webhook_url - ``` +For documentation and API reference, visit **[docs.hyperbrowser.ai](https://docs.hyperbrowser.ai)** - - Get your Hyperbrowser API key from [hyperbrowser.ai](https://hyperbrowser.ai) - - View Hyperbrowser documentation at [docs.hyperbrowser.ai](https://docs.hyperbrowser.ai) - - Get your OpenAI API key from [OpenAI's platform](https://platform.openai.com) - - Create a Slack webhook URL from your Slack workspace settings +### OpenAI API Key +1. Visit [OpenAI's platform](https://platform.openai.com/) +2. Sign up or log in to your account +3. Navigate to API Keys section +4. Create a new API key + +### Slack Webhook URL +1. Go to your Slack workspace settings +2. Navigate to "Apps" > "Incoming Webhooks" +3. Create a new webhook for your desired channel +4. Copy the webhook URL ## Usage -Run the script: +Run the tracker: ```bash -npm start +npx tsx changelog.ts ``` +Or compile TypeScript first: +```bash +npx tsc +node changelog.js +``` + +### What It Does + The script will: -1. Scrape the configured tech blogs -2. Generate summaries of any new content -3. Send notifications to your configured Slack channel +1. Iterate through all configured blog URLs +2. Scrape each website using Hyperbrowser's intelligent extraction +3. Pass the scraped content to GPT-4o-mini for summarization +4. Send formatted notifications to your Slack channel for any detected updates + +### Example Output + +```bash +🔍 Scraping https://openai.com/blog +✅ Update found. Sending to Slack... +🔍 Scraping https://www.anthropic.com/news +⚠️ No updates or failed scrape. +🔍 Scraping https://deepmind.google/discover/blog +✅ Update found. Sending to Slack... 
+``` + +## Code Structure + +The application consists of a single TypeScript file (`changelog.ts`) with three main functions: + +### `scrapeAndSummarize(url: string)` +- Scrapes the target URL using Hyperbrowser SDK +- Configures scraping options to focus on article content +- Sends scraped markdown to OpenAI for summarization +- Returns the AI-generated summary or null if scraping fails + +### `notifySlack(text: string, url: string)` +- Posts formatted messages to Slack using the webhook URL +- Includes the source URL and summarized content + +### `main()` +- Orchestrates the entire workflow +- Iterates through all configured URLs +- Handles errors and logs progress + +## Configuration + +### Monitored Blogs + +By default, the tool monitors these tech blogs: +- OpenAI Blog +- Anthropic News +- Google DeepMind Blog +- Y Combinator Blog +- HuggingFace Blog + +### Adding or Removing Sources + +Edit the `urls` array in `changelog.ts`: + +```typescript +const urls = [ + "https://openai.com/blog", + "https://www.anthropic.com/news", + "https://deepmind.google/discover/blog", + "https://www.ycombinator.com/blog", + "https://huggingface.co/blog", + // Add your own URLs here +]; +``` + +### Customizing Scraping Behavior + +Modify the `scrapeOptions` object to adjust what content is extracted: + +```typescript +scrapeOptions: { + formats: ["markdown"], // Output format + includeTags: ["article", "main", ".changelog", ".post", ".blog"], // HTML elements to include + excludeTags: ["img", "script", "style"], // HTML elements to exclude +} +``` + +### Customizing AI Summaries + +Modify the `SYSTEM_PROMPT` to change how summaries are generated: + +```typescript +const SYSTEM_PROMPT = `You're an AI assistant that summarizes new updates from product or research blogs. Return a crisp, professional changelog-style summary of what's new.` +``` + +## Scheduling (Optional) + +To run this script periodically, you can use: + +**Cron (Linux/Mac)**: +```bash +# Run every hour +0 * * * * cd /path/to/hb-changelog-tracker && npx tsx changelog.ts +``` + +**Task Scheduler (Windows)** or deploy to a cloud function with scheduled triggers. + +## Troubleshooting -## Customization +**Environment variable errors**: +- Ensure all required variables are set in your `.env` file +- Check for typos in variable names -You can modify the `urls` array in `changelog.ts` to monitor different blogs or websites. The script uses Hyperbrowser's smart scraping to focus on article content while ignoring navigation, ads, and other irrelevant elements. +**Scraping failures**: +- Verify your Hyperbrowser API key is valid +- Some websites may require different scraping configurations +- Check the website is publicly accessible -## Documentation +**OpenAI errors**: +- Ensure you have sufficient API credits +- Verify your API key is valid and active -For more information about Hyperbrowser's capabilities and API reference, visit [docs.hyperbrowser.ai](https://docs.hyperbrowser.ai). 
+**Slack notification failures**:
+- Confirm your webhook URL is correct
+- Check the webhook hasn't been revoked
+- Verify the target Slack channel exists

## License

-MIT
+ISC
diff --git a/hb-headers/README.md b/hb-headers/README.md
index 03110dc..e561152 100644
--- a/hb-headers/README.md
+++ b/hb-headers/README.md
@@ -1,73 +1,116 @@
+**Built with [Hyperbrowser](https://hyperbrowser.ai)**
+
 # hb-headers

-> 🔍 Instant CORS & Security-Header Checker powered by [Hyperbrowser](https://hyperbrowser.ai)
+Instant CORS & Security-Header Checker - Analyze HTTP security headers with real browser requests that bypass Cloudflare, solve captchas, and follow redirects.

-[![npm version](https://badge.fury.io/js/hb-headers.svg)](https://www.npmjs.com/package/hb-headers)
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+## Why Hyperbrowser?

-## ✨ Features
+[Hyperbrowser](https://hyperbrowser.ai) is the **Internet for AI** — purpose-built for developers creating AI agents and automating web tasks. With built-in stealth capabilities, you get accurate header analysis even on protected sites.

-One command to:
-- 🚀 Launch a stealth Hyperbrowser session (bypassing Cloudflare, redirects, captchas)
-- 🔄 Follow every redirect to the final URL
-- 📊 Get a color-coded analysis of security headers:
-  - CORS settings
-  - Content Security Policy
-  - HSTS configuration
-  - Cookie security
-  - Frame options
+## What It Does

-## 🚀 Quick Start
+In a single command, hb-headers will:
+- Launch a stealth Hyperbrowser session (bypassing Cloudflare, redirects, captchas)
+- Follow every redirect to the final URL
+- Fetch real HTTP headers via Playwright
+- Analyze security headers with color-coded output:
+  - CORS settings (`Access-Control-Allow-Origin`)
+  - Content Security Policy (`Content-Security-Policy`)
+  - HSTS configuration (`Strict-Transport-Security`)
+  - Cookie security (`Set-Cookie`)
+  - Frame protection (`X-Frame-Options`)

-```bash
-# Run directly with npx
-npx hb-headers https://example.com
+## Quick Start

-# Or install globally
-npm install -g hb-headers
-hb-headers https://example.com
-```
+1. **Get your API key**: https://hyperbrowser.ai
+2. **Install dependencies**:
+   ```bash
+   cd hb-headers
+   npm install
+   ```
+3. **Configure**: Set your API key
+   ```bash
+   export HYPERBROWSER_API_KEY=hb_your_api_key_here
+   ```
+4. **Run**:
+   ```bash
+   npx tsx headers.ts https://example.com
+   ```
+
+## Example Output

-### Example Output
-```
-URL chain: http → www → https://example.com ✅
-
-Header                       Value                           Status
-───────────────────────────────────────────────────────────────────
-Access-Control-Allow-Origin  *                               ⚠️
-Content-Security-Policy      missing                         ❌
-Strict-Transport-Security    max-age=63072000; includeSub... ✅
-Set-Cookie                   session=… Secure; HttpOnly      ✅
```
+🌐 Starting stealth session for: https://example.com
+✅ Reached: https://example.com

-## 🔑 Setup
+🔍 Security header report:

-1. Visit [hyperbrowser.ai](https://hyperbrowser.ai)
-2. Sign up (free tier available) and copy your API key
-3. Set your API key:
-   ```bash
-   export HYPERBROWSER_API_KEY=pk_live_your_key_here
-   ```
-   Or use the `--key` flag: `hb-headers --key `
+✅ access-control-allow-origin: *
+   Tip: Consider specifying only trusted origins.

-## 🛠 Options
+❌ content-security-policy: missing
+   Tip: Add a CSP to prevent XSS attacks.
-| Flag | Description | -|---------|----------------------------------| -| `--json`| Output raw JSON (great for CI) | -| `--key` | Provide API key inline | +✅ strict-transport-security: max-age=63072000 + Tip: HSTS is active. -## 💪 Why Use hb-headers? +✅ set-cookie: sessionid=abc123; HttpOnly; Secure + Tip: Ensure cookies have HttpOnly and Secure flags. -- **Accurate CORS Debugging**: See headers after JS/CDN rewrites +❌ x-frame-options: missing + Tip: Protects against clickjacking. +``` + +## Features + +- **Real Browser Requests**: Uses Playwright over CDP for authentic header analysis +- **Stealth Mode**: Bypasses anti-bot protections with `useStealth: true` +- **Auto Captcha Solving**: Solves captchas automatically with `solveCaptchas: true` +- **Color-Coded Output**: Easy-to-read security analysis with chalk formatting +- **Extended Timeout**: 60-second timeout for slow-loading sites + +## Why Use hb-headers? + +- **Accurate CORS Debugging**: See headers after JavaScript/CDN rewrites - **Security Validation**: Instant pass/fail on HSTS, CSP, cookies -- **CI-Ready**: Runs headless, perfect for automation +- **Headless Ready**: Perfect for CI/CD pipelines and automation - **Enterprise-Grade**: Powered by Hyperbrowser's industrial-strength stealth stack -## 📝 License +## Code Structure + +``` +hb-headers/ +├── headers.ts # Main script with header analysis logic +├── package.json # Dependencies (@hyperbrowser/sdk, playwright-core, chalk) +└── tsconfig.json # TypeScript configuration +``` + +### How It Works + +1. Creates a stealth Hyperbrowser session with captcha solving enabled +2. Connects Playwright via WebSocket to the remote browser +3. Navigates to the target URL with extended timeout +4. Fetches HTTP headers from the final URL using Playwright's request API +5. Analyzes and displays security headers with recommendations + +## Configuration + +The script checks these security headers: +- `access-control-allow-origin` - CORS policy +- `content-security-policy` - XSS protection +- `strict-transport-security` - HTTPS enforcement +- `set-cookie` - Cookie security flags +- `x-frame-options` - Clickjacking protection + +## Use Cases -[Hyperbrowser](https://hyperbrowser.ai) - Fork, hack, and share! +- **Security Audits**: Quickly check if sites follow security best practices +- **CORS Debugging**: Verify CORS headers for API endpoints +- **CI/CD Integration**: Add security header checks to your deployment pipeline +- **Site Monitoring**: Track security header changes over time +- **Penetration Testing**: Analyze security posture of web applications --- -Built with 🤍 by [Hyperbrowser](https://hyperbrowser.ai) - The fastest way to browse, scrape, and test the web. \ No newline at end of file +🚀 **Scale your web automation** with [Hyperbrowser](https://hyperbrowser.ai) | Follow @hyperbrowser \ No newline at end of file diff --git a/hb-intern-bot/README.md b/hb-intern-bot/README.md index cb03f29..4a89424 100644 --- a/hb-intern-bot/README.md +++ b/hb-intern-bot/README.md @@ -38,11 +38,16 @@ npm run build ### 3. Environment Setup ```bash -export HYPERBROWSER_API_KEY="hb_your_api_key_here" -export OPENAI_API_KEY="sk-your_openai_key" # Optional - for AI summaries -export SLACK_WEBHOOK_URL="https://hooks.slack.com/your/webhook" # Optional +export HYPERBROWSER_API_KEY="hb_your_api_key_here" # Required +export OPENAI_API_KEY="sk-your_openai_key" # Optional - for AI summaries +export SLACK_WEBHOOK_URL="https://hooks.slack.com/..." 
# Optional - for Slack alerts ``` +**Notes:** +- `HYPERBROWSER_API_KEY` is **required** for web scraping +- `OPENAI_API_KEY` is optional; without it, the bot uses fallback heuristics for summaries +- `SLACK_WEBHOOK_URL` is only needed if using `--slack` flag + ### 4. Run Your First Bot ```bash # Quick demo with top 3 AI stories @@ -166,10 +171,36 @@ npx hb-intern --config production.yaml --watch 360 --slack --out /var/reports ``` ### 🧩 Modular Design -- **`/scraper`** - Source-specific extractors (HN, Reddit, PH, Blogs) -- **`/pipeline`** - Data processing (normalize, score, summarize, deck) -- **`/out`** - Output generation (markdown, PDF, Slack) -- **`/watch`** - Continuous monitoring with state persistence + +The project follows a clean pipeline architecture: + +**Core Modules:** +- **`src/scraper/`** - Source-specific extractors + - `hn.ts` - Hacker News (front page + new submissions) + - `reddit.ts` - Multiple subreddit scraper + - `producthunt.ts` - Product launches + trending + - `blogs.ts` - Company blog RSS/feed parser + +- **`src/pipeline/`** - Event processing pipeline + - `normalize.ts` - Convert scraped data to unified Event schema + - `score.ts` - Multi-factor scoring algorithm (velocity + authority + impact) + - `summarize.ts` - OpenAI-powered summarization with fallback heuristics + - `deck.ts` - PDF presentation generator with theme support + +- **`src/`** - CLI and orchestration + - `cli.ts` - Command-line argument parsing and validation + - `main.ts` - Main pipeline orchestrator (scrape → normalize → score → summarize → output) + - `watch.ts` - Continuous monitoring with state persistence + - `config.ts` - YAML config loader with bot mode presets + - `types.ts` - TypeScript interfaces and schemas + +**Pipeline Flow:** +1. **Scrape** - Parallel extraction from all enabled sources +2. **Normalize** - Convert to unified Event format, apply time/keyword filters +3. **Score** - Calculate relevance scores with weighted factors +4. **Top N** - Select highest-scoring events +5. **Summarize** - Generate AI summaries (or fallback to title/description) +6. 
**Output** - Write digest.md, events.jsonl, deck.pdf, send Slack notification

---

@@ -205,19 +236,116 @@ npx hb-intern --config production.yaml --watch 360 --slack --out /var/reports
- **`neon`** - High-contrast cyberpunk theme

### 🧠 Smart Scoring Algorithm
-```typescript
-final_score = (velocity * 0.4) + (authority * 0.3) + (impact * 0.3)
-velocity = comments + votes + recency_boost
-authority = domain_score + author_reputation
-impact = keyword_relevance + engagement_ratio
+The scoring system uses a weighted multi-factor approach:
+
+```typescript
+// Weighted scoring (from src/pipeline/score.ts)
+final_score = (velocity * 0.6) + (authority * 0.25) + (impact * 0.15)
+
+// Velocity: engagement speed with logarithmic scaling
+velocity = log(1 + points_per_hour * 10) + 0.2 * log(1 + comments)
+//   - points per hour with recency factoring
+//   - comment engagement bonus
+//   - logarithmic scaling prevents outliers
+
+// Authority: source and domain reputation
+authority = domain_reputation + source_weight + author_score
+//   - reputable domains (github.com, arxiv.org, etc.): +0.7
+//   - source weights: HN(0.3), Blog(0.4), PH(0.25), Reddit(0.2)
+//   - author reputation heuristics: +0.1
+
+// Impact: keyword relevance and engagement
+impact = keyword_matches + engagement_indicators + recency_boost
+//   - include-keyword matches: +0.2 per match (max 0.8)
+//   - impactful words (launch, release, open source): +0.3
+//   - high engagement (>50 points, >20 comments): +0.3
+//   - recent posts (<6h): +0.1
```

### 📊 Watch Mode Intelligence
-- **Duplicate Detection** - Never process the same story twice
-- **State Persistence** - Maintains history across runs
-- **Smart Filtering** - Time-based and keyword filtering
-- **Graceful Errors** - Continues on individual source failures
+
+The watch system provides robust continuous monitoring:
+
+- **Duplicate Detection** - Tracks seen event IDs in `state.json`, never processes duplicates
+- **State Persistence** - Maintains the `seenIds` set and `lastRun` timestamp across runs
+- **Auto Cleanup** - Periodically trims state to the last 5,000 IDs (every 10 runs)
+- **Graceful Errors** - Individual source failures don't crash the entire pipeline
+- **Parallel Scraping** - All sources are scraped concurrently for speed
+- **Signal Handling** - Clean shutdown on SIGINT/SIGTERM
+
+---
+
+## 🔍 Technical Details
+
+### Event Schema
+
+All scraped content is normalized to a unified `Event` interface:
+
+```typescript
+interface Event {
+  id: string;              // Unique identifier (hashed from source+url+title)
+  source: 'hn' | 'reddit' | 'ph' | 'blog';
+  title: string;
+  url: string;             // Link to actual content
+  permalink: string;       // Link to discussion/comments
+  points: number;
+  comments: number;
+  author?: string;
+  subreddit?: string;      // For Reddit posts
+  domain?: string;         // Extracted domain (e.g., "github.com")
+  created_at: string;      // ISO timestamp
+  summary?: string;        // AI-generated or fallback
+  score?: number;          // Calculated relevance score (0-1)
+  why_matters?: string;    // Why this event is significant
+}
+```
+
+### Bot Configuration Modes
+
+The `config.ts` module provides three preset bot modes with sensible defaults:
+
+- **`ai`** - AI/ML content (default: MachineLearning, LocalLLaMA, Artificial subreddits)
+- **`devtools`** - Developer tools (default: programming, webdev, devops subreddits)
+- **`startup`** - Startup/business (default: startups, entrepreneur subreddits)
+
+Each mode provides default sources and include/exclude keywords, all of which can be overridden in your config YAML.
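To make the preset-plus-override behavior concrete, here is an illustrative sketch. The `BotModeConfig` field names and the `resolveConfig` helper are assumptions for illustration, not the actual `config.ts` code:

```typescript
// Illustrative preset + override mechanics (field names are assumed).
interface BotModeConfig {
  subreddits: string[];
  include: string[]; // keywords an event must match to be kept
  exclude: string[]; // keywords that filter an event out
}

const MODE_PRESETS: Record<"ai" | "devtools" | "startup", BotModeConfig> = {
  ai: { subreddits: ["MachineLearning", "LocalLLaMA", "Artificial"], include: [], exclude: [] },
  devtools: { subreddits: ["programming", "webdev", "devops"], include: [], exclude: [] },
  startup: { subreddits: ["startups", "entrepreneur"], include: [], exclude: [] },
};

// Values from a YAML config override the preset field-by-field
export function resolveConfig(
  mode: keyof typeof MODE_PRESETS,
  overrides: Partial<BotModeConfig> = {}
): BotModeConfig {
  return { ...MODE_PRESETS[mode], ...overrides };
}
```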
+ +### Output Files + +When you run a bot, it generates: + +1. **`digest.md`** - Human-readable markdown summary with top events +2. **`events.jsonl`** - Machine-readable JSONL with full event data + scores +3. **`deck.pdf`** (if `--deck` flag) - PDF presentation slides with theme styling +4. **`state.json`** (watch mode) - Persistent state with seen event IDs and last run timestamp + +All files are written to the output directory (default: `./out`, configurable via `--out`). + +### Hyperbrowser Integration + +The project uses `@hyperbrowser/sdk` for web scraping: + +```typescript +import { Hyperbrowser } from '@hyperbrowser/sdk'; + +const hbClient = new Hyperbrowser({ + apiKey: process.env.HYPERBROWSER_API_KEY +}); + +// Each scraper uses the session API for reliable extraction +const session = await hbClient.session.start({ + url: targetUrl +}); +``` + +### Error Handling + +The pipeline is designed for resilience: +- Individual source scraping failures are logged but don't stop execution +- OpenAI API failures fall back to heuristic summaries +- Invalid events are filtered during normalization +- Watch mode continues even if a single run fails --- diff --git a/hb-predict/README.md b/hb-predict/README.md index 8bc4f71..94a6f38 100644 --- a/hb-predict/README.md +++ b/hb-predict/README.md @@ -1,191 +1,106 @@ -# HB-Predict: Tech Signal Detection & Prediction CLI - **Built with [Hyperbrowser](https://hyperbrowser.ai)** -A powerful TypeScript CLI that detects emerging tech signals from live web sources (Hacker News + Reddit), scores them using sophisticated algorithms, clusters near-duplicates, and generates human-ready predictions powered by Hyperbrowser's web extraction capabilities. +# hb-predict -## 🚀 Quick Start +Tech signal detection and prediction CLI. Scrapes Hacker News and Reddit to identify emerging tech trends, scores them using sophisticated algorithms, and generates AI-powered predictions. -### 1. Get an API Key -Get your Hyperbrowser API key at https://hyperbrowser.ai +## Why Hyperbrowser? -### 2. Environment Setup -```bash -# Required -export HYPERBROWSER_API_KEY="your_api_key_here" +[Hyperbrowser](https://hyperbrowser.ai) is the **Internet for AI** — purpose-built for developers creating AI agents and automating web tasks. Skip the infrastructure headaches and focus on building. -# Optional (for LLM-powered predictions) -export OPENAI_API_KEY="your_openai_key_here" -``` +## Quick Start -### 3. Install Dependencies -```bash -npm install -``` +1. **Get your API key**: https://hyperbrowser.ai +2. **Install**: `npm install` +3. **Configure**: Add `HYPERBROWSER_API_KEY` to `.env` (optional: `OPENAI_API_KEY` for AI predictions) +4. **Run**: `npm run dev` + +## Usage -### 4. 
Run Analysis ```bash -# Basic AI mode analysis +# Basic AI mode analysis (default) npm run dev -# Custom analysis -npm start -- --mode ai --subs r/MachineLearning,r/LocalLLaMA --window 24h --top 10 - -# Advanced usage -npx ts-node hb-predict.ts --sources hn,reddit --mode devtools --window 48h --top 15 --out ./results -``` - -## 📊 Features +# Custom analysis with specific subreddits +npx ts-node hb-predict.ts --mode ai --subs r/MachineLearning,r/LocalLLaMA -### Multi-Source Data Collection -- **Hacker News**: Front page + newest posts with points, comments, and metadata -- **Reddit**: Configurable subreddits with upvotes, comments, and author data -- **Hyperbrowser-First**: Uses official Hyperbrowser SDK for all web extraction +# Different mode presets +npx ts-node hb-predict.ts --mode crypto --window 48h --top 15 -### Intelligent Scoring System -- **Velocity**: Z-score based momentum calculation per source -- **Cross-Source**: Bonus for topics appearing across multiple platforms -- **Authority**: Reputation scoring for domains and authors -- **Novelty**: Penalizes similar content from recent history -- **Impact Hints**: Detects launch/funding/acquisition keywords +# Custom output directory +npx ts-node hb-predict.ts --mode devtools --out ./results +``` -### Smart Clustering -- TF-IDF cosine similarity for grouping related events -- Automatic keyword extraction and deduplication -- Configurable similarity thresholds +## Features -### AI-Powered Predictions -- OpenAI integration for nuanced trend analysis -- Heuristic fallback when API unavailable -- Confidence scoring and citation generation +✨ **Multi-source scraping** with Hyperbrowser SDK (Hacker News + Reddit) +🎯 **Intelligent scoring** using velocity, cross-source, and impact signals +🧠 **Smart clustering** with similarity detection and keyword extraction +🤖 **AI-powered predictions** via OpenAI (with heuristic fallback) +📊 **Multiple outputs**: Markdown reports, JSONL events, JSON clusters -## 🎯 CLI Options +## CLI Options ```bash npx ts-node hb-predict.ts [OPTIONS] Options: - --sources hn,reddit,github,ph,arxiv # Data sources (default: hn,reddit) - --subs r/MachineLearning,r/LocalLLaMA # Subreddits to scan --mode ai|crypto|devtools|fintech # Preset configurations (default: ai) - --window 24h # Time window: hours(h), days(d), minutes(m) + --subs r/MachineLearning,r/LocalLLaMA # Custom subreddits to scan + --window 24h # Time window (format: 24h, 48h, etc.) --top 10 # Number of predictions (default: 10) --out ./oracle # Output directory (default: ./oracle) - --watch # Continuous monitoring (5min intervals) - --min-karma 30 # Min Reddit user karma (default: 30) - --min-points 20 # Min HN points threshold (default: 20) ``` ### Mode Presets -- **ai**: r/MachineLearning, r/LocalLLaMA, r/artificial, r/singularity, r/ChatGPT -- **crypto**: r/CryptoCurrency, r/bitcoin, r/ethereum, r/DeFi, r/NFT -- **devtools**: r/programming, r/webdev, r/javascript, r/rust, r/golang -- **fintech**: r/fintech, r/investing, r/SecurityAnalysis, r/startups +- **ai**: r/MachineLearning, r/LocalLLaMA, r/artificial +- **crypto**: r/CryptoCurrency, r/bitcoin, r/ethereum +- **devtools**: r/programming, r/webdev, r/javascript +- **fintech**: r/fintech, r/investing, r/startups -## 📁 Output Files +## Output Files -### 1. `predictions.md` - Human-Ready Report -```markdown -# Tech Signal Predictions +The tool generates three output files in the specified directory (default: `./oracle/`): -## 1. 
New LLM framework gaining enterprise adoption (confidence: HIGH) -- Multiple discussions across HN and r/MachineLearning about production deployment -- Based on 8 signals across hn, reddit -- Keywords: framework, enterprise, deployment, scaling, production +### `predictions.md` - Human-Ready Report +Markdown file with ranked predictions, confidence scores, and citations. -**Citations:** -- [Company X releases enterprise LLM toolkit](https://news.ycombinator.com/item?id=123) -- [New framework simplifies LLM deployment](https://reddit.com/r/MachineLearning/...) -``` +### `events.jsonl` - Raw Scored Events +JSONL file with all scraped events and their computed scores. -### 2. `events.jsonl` - Raw Scored Events -```json -{"id":"abc123","source":"hn","title":"Revolutionary AI Framework Released","url":"https://example.com","points":245,"score":0.87,"created_at":"2024-01-15T10:30:00Z"} -{"id":"def456","source":"reddit","title":"Game-changing ML tool","url":"https://reddit.com/r/ML/...","points":156,"score":0.76,"subreddit":"MachineLearning"} -``` - -### 3. `clusters.json` - Grouped Analysis -```json -[ - { - "id": "cluster-1", - "title_hint": "Revolutionary AI Framework Released", - "events": [...], - "max_score": 0.87, - "keywords": ["framework", "ai", "released"], - "prediction": { - "claim": "AI framework adoption accelerating in enterprise", - "confidence": "high", - "citations": [...] - } - } -] -``` +### `clusters.json` - Grouped Analysis +JSON file with clustered events, keywords, and predictions. -## 🎯 Growth Use Case +## How It Works -Perfect for: -- **Tech VCs**: Spot emerging investment opportunities before they peak -- **Product Teams**: Identify trending technologies for roadmap planning -- **Market Research**: Track competitor launches and industry movements -- **Content Creators**: Generate data-driven content about tech trends -- **Developers**: Stay ahead of the curve on new tools and frameworks - -## 🔧 Technical Implementation +1. **Scrape**: Fetches latest posts from Hacker News and configured Reddit subreddits using Hyperbrowser SDK +2. **Score**: Calculates signal strength based on points, impact keywords, and cross-platform mentions +3. **Cluster**: Groups similar events using text similarity (70%+ word overlap) +4. **Predict**: Generates predictions using OpenAI (if available) or heuristic fallback +5. 
**Export**: Outputs markdown reports, JSONL events, and JSON clusters ### Scoring Algorithm -``` -Final Score = 0.35×Velocity + 0.25×CrossSource + 0.20×Authority + 0.10×Novelty + 0.10×ImpactHints -``` - -- **Velocity**: Z-score of points/hour within source bucket -- **CrossSource**: +0.5 for cross-platform mentions within 48h -- **Authority**: +0.25 for reputable domains, +0.15 for high-karma authors -- **Novelty**: Cosine similarity penalty vs last 14 days -- **ImpactHints**: +0.2 for launch/funding/acquisition keywords +- **Base Score**: Normalized points (capped at 100) +- **Impact Bonus**: +0.3 for keywords like "launch", "funding", "released" +- **Cross-Source Bonus**: +0.2 when same domain appears on multiple platforms -### Rate Limiting & Ethics -- Staggered API calls with 1s delays -- Respects robots.txt and platform guidelines -- Configurable thresholds to avoid spam/low-quality content - -## 🚦 Examples +## Examples ```bash # Monitor AI trends with 48h lookback npx ts-node hb-predict.ts --mode ai --window 48h --top 15 -# Track crypto markets with custom subreddits -npx ts-node hb-predict.ts --mode crypto --subs r/CryptoCurrency,r/ethereum --window 12h - -# Continuous monitoring for devtools -npx ts-node hb-predict.ts --mode devtools --watch --out ./monitoring - -# High-signal only analysis -npx ts-node hb-predict.ts --min-points 50 --min-karma 100 --top 5 -``` - -## 🛠 Development - -```bash -# Install dependencies -npm install - -# Run with development settings -npm run dev +# Track crypto markets +npx ts-node hb-predict.ts --mode crypto --window 12h -# Manual execution -npx ts-node hb-predict.ts --help +# Custom subreddit analysis +npx ts-node hb-predict.ts --subs r/rust,r/golang --top 5 ``` -## 📈 Future Enhancements +## Use Cases -- GitHub stars delta tracking -- Product Hunt integration -- Slack webhook notifications -- Historical trend analysis -- Custom domain authority scoring +**Perfect for**: Tech VCs spotting investment opportunities, product teams tracking trends, market researchers monitoring competitors, content creators finding data-driven topics, and developers staying ahead of the curve. --- -Follow @hyperbrowser for updates. +🚀 **Scale your AI development** with [Hyperbrowser](https://hyperbrowser.ai) | Follow @hyperbrowser diff --git a/hb-ui-bot/README.md b/hb-ui-bot/README.md index bc5dff6..4b38bb2 100644 --- a/hb-ui-bot/README.md +++ b/hb-ui-bot/README.md @@ -1,28 +1,30 @@ -# HB UI Bot 🎨📸 +# HB UI Bot -A powerful UI analysis tool that captures website screenshots using Hyperbrowser's official SDK and leverages OpenAI Vision for intelligent design analysis and improvement suggestions. +**Built with [Hyperbrowser](https://hyperbrowser.ai)** + +A powerful UI analysis tool that captures website screenshots using Hyperbrowser's official SDK and leverages OpenAI Vision (GPT-4o) for intelligent design analysis and improvement suggestions. 
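At its core, the flow is one Hyperbrowser screenshot scrape followed by one GPT-4o vision request. Here is a minimal sketch with the prompt and result handling simplified; the screenshot field on the scrape result is an assumption, and the real options are covered under How It Works below:

```typescript
// Minimal sketch: capture a screenshot, then ask GPT-4o about it.
import { Hyperbrowser } from "@hyperbrowser/sdk";
import OpenAI from "openai";

const hb = new Hyperbrowser({ apiKey: process.env.HYPERBROWSER_API_KEY });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function analyzeUI(url: string): Promise<string | null> {
  // Capture both visual and structural data in one scrape
  const result = await hb.scrape.startAndWait({
    url,
    scrapeOptions: { formats: ["screenshot", "html"] },
  });
  const screenshot = result.data?.screenshot; // URL or base64 data URI (assumed)
  if (!screenshot) throw new Error("No screenshot captured");

  // Send the screenshot to GPT-4o's vision input
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Review this page's visual design, hierarchy, and accessibility." },
          { type: "image_url", image_url: { url: screenshot } },
        ],
      },
    ],
  });
  return completion.choices[0]?.message?.content ?? null;
}
```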
## Features -- 📸 **Screenshot Capture**: Uses Hyperbrowser's official screenshot API to capture actual website visuals -- 🧠 **AI Vision Analysis**: OpenAI GPT-4o analyzes screenshots for comprehensive UI/UX insights -- 🎨 **Color Detection**: Automatically extracts colors from HTML and visual analysis -- 🖋️ **Font Analysis**: Identifies typography choices and hierarchy -- 💡 **Smart Suggestions**: AI-powered improvement recommendations -- 🔍 **Accessibility Review**: Identifies potential accessibility issues -- 📊 **Flexible Output**: Human-readable or JSON format -- 🎯 **Visual Analysis**: Analyzes actual appearance, not just code structure +- **Screenshot Capture**: Uses Hyperbrowser's official scrape API with screenshot format +- **AI Vision Analysis**: OpenAI GPT-4o analyzes screenshots for comprehensive UI/UX insights +- **Color Detection**: Automatically extracts colors from HTML (hex, rgb, rgba) +- **Font Analysis**: Identifies typography choices and font families +- **Smart Suggestions**: AI-powered improvement recommendations +- **Accessibility Review**: Identifies potential accessibility issues +- **Flexible Output**: Human-readable or JSON format +- **Visual Analysis**: Analyzes actual appearance, not just code structure -## What's New +## What It Does -🚀 **Now with Visual Analysis!** Instead of just parsing CSS, the bot captures actual screenshots and uses OpenAI's vision capabilities to provide intelligent analysis of: +Instead of just parsing CSS, HB UI Bot captures actual screenshots and uses OpenAI's vision capabilities to provide intelligent analysis of: - **Visual Design**: Layout, composition, and visual hierarchy - **Color Schemes**: Actual colors as they appear to users - **Typography**: Font choices and text readability - **UI/UX Issues**: Real usability problems visible in screenshots - **Accessibility**: Visual accessibility concerns -- **Modern Design Trends**: Current best practices +- **Modern Design Trends**: Current best practices and recommendations ## Installation @@ -30,14 +32,14 @@ A powerful UI analysis tool that captures website screenshots using Hyperbrowser npm install ``` -## Configuration - -### Getting API Keys +## Prerequisites -1. **Hyperbrowser API Key**: Get your API key from [https://hyperbrowser.ai](https://hyperbrowser.ai) -2. 
**OpenAI API Key**: Get your API key from [https://openai.com/api](https://openai.com/api) (optional, only needed for AI analysis) +- Node.js 16+ +- TypeScript +- Hyperbrowser API key (get at [hyperbrowser.ai](https://hyperbrowser.ai)) +- OpenAI API key (optional, for AI analysis features) -### Setting Up Environment Variables +## Configuration Set up your API keys as environment variables: @@ -49,8 +51,8 @@ export HYPERBROWSER_API_KEY="your_hyperbrowser_api_key_here" export OPENAI_API_KEY="your_openai_api_key_here" ``` -Or create a `.env` file: -``` +Or create a `.env` file in the project root: +```env HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here OPENAI_API_KEY=your_openai_api_key_here ``` @@ -72,19 +74,21 @@ npx ts-node hb-ui-bot.ts --url https://example.com --analyze npx ts-node hb-ui-bot.ts --url https://example.com --json ``` -### Full Analysis with Custom API Keys +### With Custom API Keys ```bash npx ts-node hb-ui-bot.ts --url https://example.com --key YOUR_HB_KEY --openai-key YOUR_OPENAI_KEY --analyze ``` -## Command Line Options +## CLI Options -- `-u, --url`: Target URL to analyze (required) -- `-k, --key`: Hyperbrowser API key (or use `HYPERBROWSER_API_KEY` env var) -- `--openai-key`: OpenAI API key (or use `OPENAI_API_KEY` env var) -- `--json`: Output results in JSON format -- `-a, --analyze`: Use OpenAI Vision to analyze the screenshot and provide insights -- `--help`: Show help information +``` +-u, --url Target URL to analyze (required) +-k, --key Hyperbrowser API key (or use HYPERBROWSER_API_KEY env var) +--openai-key OpenAI API key (or use OPENAI_API_KEY env var) +--json Output results in JSON format +-a, --analyze Use OpenAI Vision to analyze the screenshot +--help Show help information +``` ## Example Output @@ -154,46 +158,57 @@ npx ts-node hb-ui-bot.ts --url https://example.com --key YOUR_HB_KEY --openai-ke } ``` -## Technical Implementation +## How It Works + +### 1. Screenshot Capture +Uses Hyperbrowser's official SDK `scrape.startAndWait()` method: +```typescript +const result = await hb.scrape.startAndWait({ + url: targetUrl, + scrapeOptions: { + formats: ['screenshot', 'html'], // Captures both visual and structural data + timeout: 30000, // 30 second timeout + waitFor: 2000 // Wait 2s for page to fully render + } +}); +``` -### Screenshot Capture -Uses Hyperbrowser's official `scrape.startAndWait()` method with: -- `formats: ['screenshot', 'html']` - captures both visual and structural data -- `timeout: 30000` - allows enough time for page loading -- `waitFor: 2000` - ensures page is fully rendered +### 2. Color & Font Extraction +- **Colors**: Extracted from HTML using regex patterns (hex, rgb, rgba) +- **Fonts**: Parsed from font-family CSS declarations +- Results are deduplicated and returned as arrays -### AI Vision Analysis +### 3. 
AI Vision Analysis (Optional) +When `--analyze` flag is used: - **Model**: OpenAI GPT-4o with vision capabilities -- **Input**: Base64-encoded screenshot + analysis prompt -- **Output**: Comprehensive UI/UX analysis and recommendations +- **Input**: Base64-encoded screenshot + structured analysis prompt +- **Analysis includes**: Visual design, colors, typography, UI/UX issues, improvements, accessibility, and modern design trends +- Screenshot data is automatically converted from URL to base64 if needed -### Color & Font Detection -- **Colors**: Extracted from HTML using regex patterns for hex, rgb, and rgba values -- **Fonts**: Parsed from font-family declarations in the HTML -- **Enhanced**: AI provides additional visual color analysis from screenshots +## Architecture -## Requirements - -- Node.js 16+ -- TypeScript -- Hyperbrowser API key with screenshot permissions -- OpenAI API key (optional, for visual analysis features) +Single TypeScript file (`hb-ui-bot.ts`) with: +- **CLI Framework**: `yargs` for argument parsing +- **Web Scraping**: `@hyperbrowser/sdk` for screenshot capture +- **AI Integration**: `openai` package for GPT-4o vision analysis +- **Terminal UI**: `chalk` for colored output formatting ## Error Handling -The tool provides helpful error messages and tips: -- Missing API keys +The tool provides helpful error messages for common issues: +- Missing or invalid API keys - Authentication failures -- Screenshot capture issues +- Screenshot capture problems - Network timeouts - Invalid URLs +- OpenAI API errors -## Performance Notes +## Performance -- Screenshot capture typically takes 3-5 seconds -- AI analysis adds 2-3 seconds +- Screenshot capture: ~3-5 seconds +- AI analysis (optional): ~2-3 seconds - Total processing time: 5-10 seconds per URL -- Results are cached for the session +- No caching between runs ## License diff --git a/hyper-train/README.md b/hyper-train/README.md index 958102c..f27718d 100644 --- a/hyper-train/README.md +++ b/hyper-train/README.md @@ -4,9 +4,13 @@ A powerful single-file TypeScript CLI tool that uses Hyperbrowser's official SDK to scrape URLs and create LLM-ready training datasets. Perfect for building custom datasets for fine-tuning language models. -## 🚀 Features +## Why Hyperbrowser? -- **Official Hyperbrowser SDK** - Uses the official @hyperbrowser/sdk for reliable web scraping +[Hyperbrowser](https://hyperbrowser.ai) is the **Internet for AI** — purpose-built for developers creating AI agents and automating web tasks. Skip the infrastructure headaches and focus on building. + +## Features + +- **Official Hyperbrowser SDK** - Uses the official `@hyperbrowser/sdk` for reliable web scraping - **Multiple Output Formats** - Generate JSONL and Markdown datasets - **Smart Chunking** - Paragraph-aware text splitting with configurable chunk sizes - **Embeddings Generation** - Optional OpenAI embeddings for semantic search @@ -15,7 +19,7 @@ A powerful single-file TypeScript CLI tool that uses Hyperbrowser's official SDK - **Streaming Writes** - Memory-efficient handling of large datasets - **Progress Logging** - Clear feedback during dataset creation -## 📦 Installation +## Installation 1. **Get an API key** from [https://hyperbrowser.ai](https://hyperbrowser.ai) @@ -30,7 +34,7 @@ export HYPERBROWSER_API_KEY="your_hyperbrowser_api_key" export OPENAI_API_KEY="your_openai_api_key" # Only needed for --embed or --qa ``` -## 🏃‍♂️ Quick Start +## Quick Start 1. 
Create a file with URLs (one per line): ```bash @@ -47,7 +51,7 @@ ts-node hypertrain.ts --input urls.txt ts-node hypertrain.ts --input urls.txt --out ./dataset --format jsonl,md --embed --qa 2 --tag "training-data" ``` -## 🔧 Usage +## Usage ``` ts-node hypertrain.ts [options] @@ -65,7 +69,7 @@ Options: --help, -h Show help message ``` -## 📊 Output Formats +## Output Formats ### JSONL Dataset (`dataset.jsonl`) ```json @@ -105,19 +109,19 @@ Options: ### Markdown Output Individual `.md` files per scraped page with structured content and metadata. -## 🎯 Use Cases +## Use Cases - **Fine-tuning Models** - Create custom training datasets for domain-specific models - **Knowledge Base Creation** - Build searchable knowledge bases with embeddings - **Content Analysis** - Generate structured datasets for content research - **QA System Training** - Create question-answer pairs for chatbot training -## 🔐 Environment Variables +## Environment Variables - `HYPERBROWSER_API_KEY` - Your Hyperbrowser API key (required) - `OPENAI_API_KEY` - Your OpenAI API key (required for --embed or --qa) -## 📋 Examples +## Examples ### Basic Dataset Creation ```bash @@ -147,7 +151,7 @@ ts-node hypertrain.ts \ ts-node hypertrain.ts --input urls.txt --qa 2 --finetune openai ``` -## 🛠️ Development +## Development Run with TypeScript directly: ```bash @@ -160,10 +164,26 @@ npm run build node hypertrain.js --input urls.txt ``` -## 📄 License +## Architecture + +This is a single-file TypeScript CLI tool with the following structure: + +- **`hypertrain.ts`** - Main CLI tool with HyperTrain class + - URL scraping using Hyperbrowser SDK + - Smart paragraph-aware text chunking + - OpenAI embeddings generation + - QA pair generation with GPT-3.5 + - Multiple output format support (JSONL, Markdown) + - Concurrent processing for performance + +- **`test-urls.txt`** - Sample URL list for testing +- **`package.json`** - Dependencies and scripts +- **`tsconfig.json`** - TypeScript configuration + +## License ISC --- -Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates. +Built with [Hyperbrowser](https://hyperbrowser.ai) | Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates diff --git a/link-sniper-bot/README.md b/link-sniper-bot/README.md index da525ad..d1ac096 100644 --- a/link-sniper-bot/README.md +++ b/link-sniper-bot/README.md @@ -1,15 +1,175 @@ # Link Sniper Bot -Automatically scan any webpage to find and check all external links for broken ones. +**Built with [Hyperbrowser](https://hyperbrowser.ai)** -## Setup +A command-line tool that scans any webpage to find and check all external links for broken ones. It extracts all hyperlinks from a page using Hyperbrowser's scraping capabilities, then systematically checks each link's HTTP status to identify broken links, bot-protected sites, and potential issues. -1. Get your API key from https://hyperbrowser.ai -2. 
Create a `.env` file with: +## Features + +- Extracts all external links from any webpage using Hyperbrowser SDK +- Intelligent status checking with HEAD and GET fallback requests +- Color-coded terminal output for easy identification of issues +- Smart categorization: + - **Working** (200-299): Links that respond successfully + - **Broken** (404, 410, 500+): Actually broken links that need fixing + - **Bot-blocked** (403): Links protected by anti-bot systems (likely work in browsers) + - **Suspicious**: Other status codes that warrant manual checking + - **Unknown**: No response (timeouts or connection issues) +- Detailed summary with status code breakdown +- Browser-like headers to maximize compatibility +- Rate-limited requests to be respectful to servers + +## Prerequisites + +- Node.js (v18 or later) +- **Hyperbrowser API Key**: Get yours at [hyperbrowser.ai](https://hyperbrowser.ai) + +## Quick Start + +1. **Install dependencies:** + ```bash + npm install + ``` + +2. **Set up environment variables:** + Create a `.env` file: + ```env + HYPERBROWSER_API_KEY=your_hyperbrowser_api_key + ``` + +3. **Run the bot:** + ```bash + npm start + ``` + +4. **Enter a URL when prompted:** ``` - HYPERBROWSER_API_KEY=your_api_key_here + Enter the URL to scan: https://example.com ``` -3. Install dependencies: `npm install` -4. Run: `npm start` -Enter any URL and the bot will extract all external links, check their status, and categorize them as working, broken, bot-blocked, or suspicious. +The bot will scrape the page, extract all external links, check their status, and provide a detailed summary. + +## Usage Example + +```bash +$ npm start + +Enter the URL to scan: https://example.com + +🔍 Scraping page and extracting links... + +Found 47 unique external links. Checking status... + +Checking: https://www.example-site.com +200 → https://www.example-site.com +Checking: https://broken-link.com/page +404 → https://broken-link.com/page (actually broken) +Checking: https://bot-protected.com +403 → https://bot-protected.com (bot protection - likely works in browser) +... + +📊 Summary for 47 links: +✅ Working: 40 +❌ Actually broken: 3 +🚫 Bot-blocked: 2 +❓ Unknown (no response): 2 +⚠️ Suspicious: 0 + +💡 Note: Links marked as "unknown" or "bot-blocked" might work fine in a real browser + +Status code breakdown: + 200: 38 links + 301: 2 links + 403: 2 links + 404: 3 links + ? (no response): 2 links +``` + +## How It Works + +1. **Scrape**: Uses Hyperbrowser SDK to fetch the HTML content of the target page +2. **Extract**: Parses HTML to find all `` tags with absolute URLs (http:// or https://) +3. **Deduplicate**: Removes duplicate links to avoid redundant checks +4. **Check**: For each unique link: + - Tries HEAD request first (faster, less bandwidth) + - Falls back to GET request if HEAD fails or returns 405 + - Uses browser-like headers to maximize compatibility + - Applies 10-second timeout per request +5. **Categorize**: Intelligently categorizes each link based on HTTP status code +6. 
**Report**: Displays color-coded results and generates summary statistics + +## Code Structure + +**Main file**: `index.ts` + +Key components: +- `askURL()` - Interactive CLI prompt for URL input +- `extractLinksFromHTML()` - Regex-based link extraction with HTML entity decoding +- `checkLinkStatus()` - HTTP status checker with HEAD/GET fallback +- `getStatusMessage()` - Status code interpretation and categorization +- `isBroken()` - Determines if a link is actually broken (404, 410, 500+) +- `main()` - Orchestrates the entire link checking workflow + +**Dependencies**: +- `@hyperbrowser/sdk` - Web scraping via official Hyperbrowser SDK +- `chalk` - Terminal colors for better readability +- `commander` - CLI framework (imported but not actively used in current version) +- `dotenv` - Environment variable management +- `tsx` - TypeScript execution without compilation + +## Understanding Status Codes + +The bot uses smart heuristics to categorize link status: + +| Status Code | Category | Meaning | +|-------------|----------|---------| +| 200-299 | Working | Link is accessible and responding normally | +| 403 | Bot-blocked | Anti-bot protection (likely works in real browsers) | +| 404 | Broken | Page not found - needs fixing | +| 410 | Broken | Page permanently gone - needs fixing | +| 500+ | Broken | Server error - may be temporary or permanent | +| No response | Unknown | Timeout or connection failure - check manually | +| Others | Suspicious | Unusual status - warrants manual verification | + +## Configuration + +The bot is designed to work out-of-the-box, but you can modify the code to customize: + +- **Timeout duration**: Change the 10000ms timeout in `checkLinkStatus()` +- **Rate limiting**: Adjust the 500ms delay between requests in the main loop +- **Status categorization**: Modify `getStatusMessage()` to change how status codes are interpreted +- **User-Agent**: Update headers in `checkLinkStatus()` to simulate different browsers + +## Important Notes + +- Only checks external links (http:// or https://) +- Internal relative links are ignored +- Links marked as "bot-blocked" (403) may work fine in a real browser +- "Unknown" status doesn't always mean broken - could be firewall, timeout, or anti-bot +- Uses a 500ms delay between requests to be respectful to servers +- HEAD requests are preferred over GET to save bandwidth + +## Troubleshooting + +**"HYPERBROWSER_API_KEY environment variable not set"** +- Ensure `.env` file exists in the project root +- Verify the API key is correct and not expired + +**"No HTML content received"** +- The target site may be blocking scraping attempts +- Try a different URL or check if the site is accessible + +**Many links showing as "unknown"** +- Some sites have strict firewall rules or rate limiting +- These links may work fine in a browser +- Consider increasing the timeout or reducing request frequency + +## Resources + +- Hyperbrowser Documentation: [https://docs.hyperbrowser.ai](https://docs.hyperbrowser.ai) +- Hyperbrowser Discord: [https://discord.gg/zsYzsgVRjh](https://discord.gg/zsYzsgVRjh) +- Support: info@hyperbrowser.ai + +--- + +Follow [@hyperbrowser](https://twitter.com/hyperbrowser) for updates diff --git a/llm-crawl/README.md b/llm-crawl/README.md index d947158..49cae2e 100644 --- a/llm-crawl/README.md +++ b/llm-crawl/README.md @@ -1,119 +1,134 @@ -# LLMCrawl 🕷️🤖 - **Built with [Hyperbrowser](https://hyperbrowser.ai)** +# LLMCrawl + A powerful CLI tool that combines Hyperbrowser's official Crawl API with Large Language Models to 
fetch structured web data and process it intelligently. Perfect for growth engineering, research, and data extraction tasks. -## ✨ Features +## Why Hyperbrowser? -- 🕷️ **Web Crawling**: Uses Hyperbrowser's official SDK with `client.crawl.startAndWait()` -- 🤖 **LLM Processing**: Integrates OpenAI GPT models for intelligent data processing -- 📊 **Multiple Output Formats**: Markdown, JSON, JSONL, and FAISS embeddings -- 🎯 **Smart Extraction**: Automatically adapts processing based on your instruction -- 🚀 **Growth-Ready**: Built for scaling content analysis and data extraction workflows +[Hyperbrowser](https://hyperbrowser.ai) is the **Internet for AI** — purpose-built for developers creating AI agents and automating web tasks. Skip the infrastructure headaches and focus on building. -## 🔧 Installation +## Features -1. Clone and install dependencies: -```bash -npm install -``` +- **Web Crawling**: Uses Hyperbrowser's official SDK with `client.crawl.startAndWait()` +- **LLM Processing**: Integrates OpenAI GPT models for intelligent data processing +- **Multiple Output Formats**: Markdown, JSON, JSONL, and FAISS embeddings +- **Smart Extraction**: Automatically adapts processing based on your instruction +- **Context-Aware**: Intelligently extracts domains from natural language instructions +- **Growth-Ready**: Built for scaling content analysis and data extraction workflows -2. **Get an API key** at [https://hyperbrowser.ai](https://hyperbrowser.ai) +## Quick Start -3. Set up environment variables: -```bash -cp .env.example .env -# Edit .env with your API keys -``` +1. **Get your API key**: https://hyperbrowser.ai +2. **Install**: `npm install` +3. **Configure**: Add `HYPERBROWSER_API_KEY` and `OPENAI_API_KEY` to `.env` +4. **Run**: `npx tsx cli.ts "your instruction here"` -## 🚀 Quick Start +## Installation ```bash # Install dependencies npm install # Set environment variables -export HYPERBROWSER_API_KEY="your_key_here" -export OPENAI_API_KEY="your_key_here" +export HYPERBROWSER_API_KEY="your_hyperbrowser_api_key" +export OPENAI_API_KEY="your_openai_api_key" -# Run examples +# Run an example npx tsx cli.ts "Find all AI startup launches in 2025 from techcrunch.com and summarize in 3 bullets" ``` -## 💡 Usage Examples +## Usage Examples -### 1. Startup News Summary +### Startup News Summary ```bash -llmcrawl "Find all AI startup launches in 2025 from techcrunch.com and summarize in 3 bullets" +npx tsx cli.ts "Find all AI startup launches in 2025 from techcrunch.com and summarize in 3 bullets" ``` -→ Crawls TechCrunch, extracts startup news, returns markdown summary +Crawls TechCrunch, extracts startup news, returns markdown summary -### 2. Product Reviews Analysis +### Product Reviews Analysis ```bash -llmcrawl "Collect 50 reviews of iPhone 16 Pro from bestbuy.com, return JSONL with {rating, pros, cons, sentiment}" --json -o reviews.jsonl +npx tsx cli.ts "Collect 50 reviews of iPhone 16 Pro from bestbuy.com, return JSONL with {rating, pros, cons, sentiment}" --json -o reviews.jsonl ``` -→ Crawls BestBuy reviews, extracts structured data, saves as JSONL +Crawls BestBuy reviews, extracts structured data, saves as JSONL -### 3. 
Research Paper Database +### Research Paper Database ```bash -llmcrawl "Crawl arxiv.org for latest multimodal LLM papers and export FAISS db" -o papers.bin +npx tsx cli.ts "Crawl arxiv.org for latest multimodal LLM papers and export FAISS db" -o papers.bin ``` -→ Crawls ArXiv, creates searchable embeddings database +Crawls ArXiv, creates searchable embeddings database -## 🎛️ CLI Options +## CLI Options ```bash -llmcrawl [options] +npx tsx cli.ts [options] Options: --json Output results in JSON format - -o, --out Save output to file - -m, --model OpenAI model (default: gpt-4-turbo-preview) + -o, --out Save output to file + -m, --model OpenAI model (default: gpt-4o) -v, --verbose Enable verbose logging -h, --help Show help ``` -## 🔑 Environment Variables +## Environment Variables + +Create a `.env` file or export these variables: ```bash -HYPERBROWSER_API_KEY # Get at https://hyperbrowser.ai -OPENAI_API_KEY # Get at https://platform.openai.com +HYPERBROWSER_API_KEY=your_hyperbrowser_api_key # Get at https://hyperbrowser.ai +OPENAI_API_KEY=your_openai_api_key # Get at https://platform.openai.com ``` -## 🏗️ Architecture +## How It Works + +1. **Domain Extraction**: Intelligently extracts website URLs from natural language instructions +2. **Web Crawling**: Uses Hyperbrowser SDK's `client.crawl.startAndWait()` to scrape content with configurable options: + - Follows internal links to discover relevant pages + - Filters content based on instruction keywords + - Removes navigation, ads, and other non-content elements +3. **Content Processing**: Routes crawled data to OpenAI with context-aware system prompts optimized for different tasks (summarization, reviews, research, etc.) +4. **Output Formatting**: Automatically detects desired output format from instruction and saves or displays results + +## Architecture + +The project is organized into three main modules: + +- **`cli.ts`** - Commander-based CLI entrypoint with argument parsing and output handling +- **`crawl.ts`** - CrawlService class wrapping Hyperbrowser SDK for intelligent web scraping +- **`llm.ts`** - LLMService class managing OpenAI API calls for content processing and embeddings + +## Output Formats -- **`crawl.ts`** → Hyperbrowser integration with official SDK -- **`llm.ts`** → OpenAI integration for content processing -- **`cli.ts`** → Commander-based CLI entrypoint +- **Markdown** (default): Human-readable terminal output with formatting +- **JSON/JSONL**: Structured data with `--json` flag or when specified in instruction +- **FAISS Database**: Vector embeddings for semantic search (requires `faiss-node` dependency) -## 🎯 Growth Use Cases +## Use Cases -- **Content Research**: Auto-generate social media content from trending topics -- **Competitor Analysis**: Extract and analyze competitor product data -- **Lead Generation**: Scrape and qualify prospects from industry sites -- **Market Research**: Gather insights from review sites and forums -- **SEO Content**: Generate blog ideas from trending searches and discussions +**Perfect for**: Growth engineering, content research, competitor analysis, lead generation, market research, data extraction, automated scraping workflows. 
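To make the steps above concrete, here is a condensed sketch of the crawl-then-process pipeline that `crawl.ts` and `llm.ts` implement. The `maxPages` option and the per-page result shape are assumptions; the real services add domain extraction, content filtering, and format detection:

```typescript
// Condensed sketch of the crawl → LLM pipeline.
import { Hyperbrowser } from "@hyperbrowser/sdk";
import OpenAI from "openai";

const client = new Hyperbrowser({ apiKey: process.env.HYPERBROWSER_API_KEY });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function crawlAndProcess(url: string, instruction: string) {
  // 1. Crawl the site and collect page content
  const crawl = await client.crawl.startAndWait({ url, maxPages: 10 });
  const corpus = (crawl.data ?? [])
    .map((page: any) => page.markdown ?? "") // assumed page shape
    .join("\n\n---\n\n");

  // 2. Route the crawled content through the LLM with the user's instruction
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "Process the crawled web content per the user's instruction." },
      { role: "user", content: `${instruction}\n\n${corpus.slice(0, 100_000)}` },
    ],
  });
  return completion.choices[0]?.message?.content;
}
```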
-## 🔄 Development +## Development ```bash # Development mode npm run dev "your instruction here" -# Build +# Build TypeScript npm run build -# Run built version +# Run built version node dist/cli.js "your instruction here" ``` -## 📊 Output Formats +## Technical Stack -- **Markdown** (default): Human-readable terminal output -- **JSON/JSONL**: Structured data with `--json` flag -- **FAISS Database**: Vector embeddings for semantic search +- **[@hyperbrowser/sdk](https://www.npmjs.com/package/@hyperbrowser/sdk)**: Web scraping and crawling +- **OpenAI GPT-4o**: Natural language processing and embeddings +- **Commander**: CLI argument parsing +- **Chalk**: Terminal output formatting +- **TypeScript**: Type-safe development with ES modules --- -Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates. +🚀 **Scale your AI development** with [Hyperbrowser](https://hyperbrowser.ai) | Follow [@hyperbrowser](https://x.com/hyperbrowser) diff --git a/meta-scraper/README.md b/meta-scraper/README.md index e4c5dcc..832f0e2 100644 --- a/meta-scraper/README.md +++ b/meta-scraper/README.md @@ -1,64 +1,53 @@ -# Meta Scraper Tool +# Meta Scraper -A powerful web scraping tool that extracts and analyzes meta tags from websites using AI. This tool combines Hyperbrowser's web scraping capabilities with OpenAI's GPT-4 to provide comprehensive meta tag analysis and insights. +**Built with [Hyperbrowser](https://hyperbrowser.ai)** + +AI-powered meta tag extraction and analysis tool. Scrape any website with Hyperbrowser SDK and analyze meta tags (Open Graph, Twitter Cards, SEO metadata) using GPT-4. Perfect for SEO audits, social media preview generation, and competitive analysis. ## Features -- 🌐 Interactive URL input via terminal -- 🔍 Reliable web scraping with Hyperbrowser -- 🤖 AI-powered meta tag analysis using GPT-4 -- 📊 Structured JSON output with: - - Page title and description - - Open Graph tags (title, description, image) - - Twitter Card information - - AI-generated summary and use cases +- 🌐 **Interactive CLI**: Simple terminal-based URL input +- 🔍 **Smart Scraping**: Uses Hyperbrowser SDK `scrape.startAndWait()` for reliable extraction +- 🤖 **AI Analysis**: GPT-4 powered insights on meta tags and site purpose +- 📊 **Structured Output**: Clean JSON with Open Graph, Twitter Cards, and AI-generated summaries +- ⚡ **Fast**: 30-second timeout with automatic error handling -## Prerequisites +## Get an API key -- Node.js (v14 or higher) -- TypeScript -- Hyperbrowser API key -- OpenAI API key +- Get your key at [https://hyperbrowser.ai](https://hyperbrowser.ai) +- Get OpenAI key at [https://platform.openai.com](https://platform.openai.com) ## Setup -1. **Clone or download this project** - -2. **Install dependencies** - ```bash - npm install - ``` - -3. **Get your API keys** - - Get your Hyperbrowser API key from [hyperbrowser.ai](https://hyperbrowser.ai) - - Get your OpenAI API key from OpenAI +```bash +npm install +``` -4. 
**Create environment file** - Create a `.env` file in the project root: - ```env - HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here - OPENAI_API_KEY=your_openai_api_key_here - ``` +Create a `.env` file in this folder: +```bash +HYPERBROWSER_API_KEY=your_key_here +OPENAI_API_KEY=your_key_here +``` -## Usage +## Quick Start -Run the tool: ```bash -npx ts-node meta-scrapper.ts -``` +# Run the interactive tool +npx tsx meta-scrapper.ts -The tool will prompt you to enter a URL: -``` -🌐 Meta Scraper Tool -Enter a single URL to analyze -URL: https://example.com +# When prompted, enter a URL: +# URL: https://example.com ``` -Enter any website URL and the tool will: -1. Scrape the website content -2. Extract meta tags -3. Analyze the content with AI -4. Return structured JSON results +## What it does + +1. **Scrape**: Uses Hyperbrowser SDK to fetch HTML content +2. **Extract**: GPT-4 parses meta tags including: + - Standard title and description + - Open Graph tags (og:title, og:description, og:image) + - Twitter Card metadata +3. **Analyze**: AI generates summary and identifies use cases +4. **Output**: Returns structured JSON to terminal ## Example Output @@ -80,19 +69,45 @@ Enter any website URL and the tool will: } ``` -## How It Works +## Use Cases + +- **SEO Audits**: Validate meta tags across multiple pages +- **Social Media Previews**: Check how links will appear when shared +- **Competitive Analysis**: Analyze competitor meta strategies +- **Growth Marketing**: Batch analyze landing page metadata +- **Content Management**: Verify meta consistency across site +- **Web Scraping**: Extract structured metadata for datasets + +## Architecture -1. **Web Scraping**: Uses Hyperbrowser to reliably scrape website content, handling JavaScript-heavy sites -2. **Meta Extraction**: Extracts various meta tags including standard HTML meta tags and Open Graph properties -3. **AI Analysis**: Uses GPT-4 to analyze the content and generate insights about the website's purpose and use cases -4. **Structured Output**: Returns clean, structured JSON data for easy integration with other tools +- **`meta-scrapper.ts`**: Main CLI tool with readline interface +- Uses `@hyperbrowser/sdk` for web scraping +- Uses `openai` SDK for GPT-4 analysis +- Automatic JSON extraction with markdown stripping ## Troubleshooting -- **"No content received from OpenAI"**: Check your OpenAI API key and ensure you have credits -- **"Failed to scrape [URL]"**: Check your Hyperbrowser API key and ensure the URL is accessible -- **JSON parsing errors**: The tool automatically handles various response formats from OpenAI +**"No content received from OpenAI"** +- Check your OpenAI API key is valid +- Ensure you have API credits available + +**"Failed to scrape [URL]"** +- Verify your Hyperbrowser API key +- Check if URL is accessible and valid +- Some sites may block automated access + +**JSON parsing errors** +- Tool automatically handles markdown code blocks +- Falls back to regex extraction if needed +- Check full error output for debugging + +## Notes + +- Uses only official Hyperbrowser SDK methods (`@hyperbrowser/sdk`) +- 30-second scrape timeout for performance +- GPT-4o model for accurate meta tag extraction +- Automatically strips markdown formatting from AI responses -## License +--- -MIT License +Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates. 
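As an appendix, the scrape-then-analyze pipeline above condenses to three calls. A minimal sketch, assuming the scrape result exposes `data.html` and that model replies may arrive wrapped in markdown fences:

```typescript
import { Hyperbrowser } from "@hyperbrowser/sdk";
import OpenAI from "openai";

const hb = new Hyperbrowser({ apiKey: process.env.HYPERBROWSER_API_KEY });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function analyzeMeta(url: string) {
  // 1. Scrape: fetch the page HTML via Hyperbrowser
  const scraped = await hb.scrape.startAndWait({
    url,
    scrapeOptions: { formats: ["html"] },
  });
  const html = scraped.data?.html ?? ""; // response field name assumed

  // 2. Analyze: ask GPT-4o for structured meta-tag JSON
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content:
          "Extract the page title, description, Open Graph tags, and Twitter Card tags. Reply with JSON only.",
      },
      { role: "user", content: html },
    ],
  });

  // 3. Output: strip markdown fences before parsing, as the tool does
  const raw = completion.choices[0]?.message?.content ?? "";
  return JSON.parse(raw.replace(/`{3}(?:json)?/g, "").trim());
}
```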
diff --git a/o3-pro-extractor/README.md b/o3-pro-extractor/README.md new file mode 100644 index 0000000..c897a3e --- /dev/null +++ b/o3-pro-extractor/README.md @@ -0,0 +1,179 @@ +# O3-Pro Extractor + +**Built with [Hyperbrowser](https://hyperbrowser.ai)** + +A powerful web data extraction tool that combines Hyperbrowser's enterprise-grade web scraping with OpenAI's cutting-edge O3-Pro model to extract structured data from any website with unparalleled accuracy and reasoning capabilities. + +## Why O3-Pro? + +- High-quality structured data extraction with advanced reasoning +- Handles complex table structures and nested data +- Intelligent data normalization and type inference +- Proven accuracy on data extraction tasks +- Perfect for extracting structured information from dynamic web pages + +## Features + +- Web scraping with selective content filtering (include/exclude tags) +- Markdown-based content extraction for cleaner data +- Schema validation using Zod +- Structured JSON output with type safety +- Error handling and validation + +## Prerequisites + +- Node.js (v16 or higher) +- TypeScript +- Hyperbrowser API key +- OpenAI API key with O3-Pro model access + +## Setup + +1. **Install dependencies** + ```bash + npm install + ``` + +2. **Get your API keys** + - Get your Hyperbrowser API key from [hyperbrowser.ai](https://hyperbrowser.ai) + - Get your OpenAI API key with O3-Pro access from [OpenAI Platform](https://platform.openai.com) + +3. **Create environment file** + Create a `.env` file in the project root: + ```env + HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here + OPENAI_API_KEY=your_openai_api_key_here + ``` + +## Usage + +Run the extractor: +```bash +npx tsx o3-pro-extractor.ts +``` + +The tool will: +1. Scrape Wikipedia's list of largest cities +2. Extract only table data (`.wikitable` class) as markdown +3. Process the markdown with O3-Pro to extract structured city data +4. Validate the output with Zod schema +5. Save results to `cities.json` + +## How It Works + +### 1. Web Scraping +Uses Hyperbrowser's advanced scraping capabilities with selective filtering: +- **Format**: Extracts content as markdown for cleaner processing +- **Include tags**: Only scrapes `.wikitable` elements +- **Exclude tags**: Removes `img` tags to reduce noise + +### 2. Data Extraction +Leverages OpenAI's O3-Pro model for intelligent data extraction: +- Structured prompt with clear field definitions +- Schema-guided extraction for type safety +- Advanced reasoning for handling complex table structures + +### 3. 
Schema Validation +Uses Zod for runtime type checking: +```typescript +const CitySchema = z.object({ + city: z.string(), + country: z.string(), + population: z.number(), + rank: z.number(), +}); +``` + +## Example Output + +```json +{ + "cities": [ + { + "city": "Tokyo", + "country": "Japan", + "population": 37393128, + "rank": 1 + }, + { + "city": "Delhi", + "country": "India", + "population": 32941308, + "rank": 2 + }, + { + "city": "Shanghai", + "country": "China", + "population": 28516904, + "rank": 3 + } + ] +} +``` + +## Use Cases + +This pattern is ideal for: + +- **Market Research**: Extract competitor data, pricing tables, product specifications +- **Lead Generation**: Scrape business directories, contact information, company profiles +- **Financial Analysis**: Extract stock tables, financial statements, economic indicators +- **E-commerce**: Scrape product catalogs, reviews, specifications from multiple sources +- **Real Estate**: Extract property listings, pricing data, market statistics +- **Academic Research**: Gather structured data from research databases, publication tables + +## Customization + +### Change the Target Website +Modify the URL in the scrape configuration: +```typescript +const scrapeResult = await client.scrape.startAndWait({ + url: "your-target-url", + scrapeOptions: { + formats: ["markdown"], + includeTags: [".your-selector"], + excludeTags: ["unwanted-elements"], + }, +}); +``` + +### Customize the Extraction Schema +Update the Zod schema to match your data structure: +```typescript +const YourSchema = z.object({ + field1: z.string(), + field2: z.number(), + field3: z.array(z.string()), +}); +``` + +### Adjust the System Prompt +Modify `SYSTEM_PROMPT` to guide the AI for different extraction tasks: +```typescript +const SYSTEM_PROMPT = `Your custom instructions for data extraction...`; +``` + +## Troubleshooting + +- **"Scrape failed"**: Check your Hyperbrowser API key and ensure the target URL is accessible +- **"No markdown data found"**: Verify your `includeTags` selector matches elements on the page +- **Invalid schema**: Ensure your Zod schema matches the expected output structure +- **O3-Pro access denied**: Confirm your OpenAI API key has access to the O3-Pro model + +## Why Hyperbrowser? + +- Handles JavaScript-heavy sites automatically +- Built-in proxy rotation and CAPTCHA solving +- Reliable scraping at scale +- Selective content extraction (includeTags/excludeTags) +- Multiple output formats (HTML, markdown, text) + +## Resources + +- [Hyperbrowser Documentation](https://docs.hyperbrowser.ai) +- [OpenAI O3-Pro Documentation](https://platform.openai.com/docs/models/o3-pro) +- [Zod Documentation](https://zod.dev) + +## License + +MIT License \ No newline at end of file diff --git a/oss-web-extractor/README.md b/oss-web-extractor/README.md index 48be7ff..aad39c4 100644 --- a/oss-web-extractor/README.md +++ b/oss-web-extractor/README.md @@ -2,20 +2,25 @@ **Built with [Hyperbrowser](https://hyperbrowser.ai)** -A blazingly fast web data extractor powered by Hyperbrowser's scraping infrastructure and OpenAI's latest open-source `gpt-oss-20b` model. Extract structured data from any website with enterprise-grade reliability and **zero API costs** for AI inference. +A blazingly fast web data extractor powered by Hyperbrowser's scraping infrastructure and OpenAI's open-source `gpt-oss-20b` model hosted on Fireworks AI. Extract structured data from any website with enterprise-grade reliability and cost-effective AI inference. 
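Under the hood, inference goes through Fireworks' OpenAI-compatible endpoint, so the standard `openai` client works unchanged. A minimal sketch; the base URL and model slug below are assumptions, so confirm both in your Fireworks dashboard:

```typescript
import OpenAI from "openai";

// Fireworks exposes an OpenAI-compatible API, so the standard client works.
// Base URL and model slug are assumptions; confirm them in your Fireworks account.
const fireworks = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

async function extractFromMarkdown(markdown: string) {
  const completion = await fireworks.chat.completions.create({
    model: "accounts/fireworks/models/gpt-oss-20b", // assumed slug
    messages: [
      { role: "system", content: "Extract the cities table as JSON." },
      { role: "user", content: markdown },
    ],
  });
  return completion.choices[0]?.message?.content;
}
```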
-## 🚀 Why This Rocks +## Why Hyperbrowser? -- ✅ **Zero AI costs** - Local inference with open-source models -- ✅ **Enterprise reliability** - Hyperbrowser handles CAPTCHAs, proxies, rate limits -- ✅ **Lightning fast** - gpt-oss-20b delivers lower latency and runs on consumer hardware -- ✅ **Fully customizable** - Modify extraction schemas for any use case -- ✅ **Growth-ready** - Built for scale with retry logic and error handling +[Hyperbrowser](https://hyperbrowser.ai) is the **Internet for AI** — purpose-built for developers creating AI agents and automating web tasks. Skip the infrastructure headaches and focus on building. + +## Features + +- **Enterprise-grade scraping** - Hyperbrowser handles CAPTCHAs, proxies, and rate limits automatically +- **Cost-effective AI** - Uses OpenAI's open-source gpt-oss-20b model via Fireworks AI +- **Lightning fast** - High-performance inference with minimal latency +- **Fully customizable** - Easily modify extraction schemas with Zod validation +- **Production-ready** - Built-in retry logic and error handling +- **Structured output** - Clean JSON output with type validation ## Prerequisites -1. **Get an API key** at https://hyperbrowser.ai -2. Install vLLM to run gpt-oss-20b locally (works on consumer hardware!) +1. **Hyperbrowser API key**: Get yours at https://hyperbrowser.ai +2. **Fireworks API key**: Sign up at https://fireworks.ai for access to gpt-oss-20b ## Setup @@ -26,21 +31,9 @@ npm install 2. Set up environment variables: ```bash -cp .env.example .env -# Add your Hyperbrowser API key -echo "HYPERBROWSER_API_KEY=your_api_key_here" > .env -``` - -3. Start the gpt-oss-120b model server: -```bash -# Install vLLM with gpt-oss support (from official Hugging Face instructions) -uv pip install --pre vllm==0.10.1+gptoss \ - --extra-index-url https://wheels.vllm.ai/gpt-oss/ \ - --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \ - --index-strategy unsafe-best-match - -# Start the model server (auto-downloads from Hugging Face) -vllm serve openai/gpt-oss-120b +# Create a .env file with your API keys +echo "HYPERBROWSER_API_KEY=your_hyperbrowser_key" > .env +echo "FIREWORKS_API_KEY=your_fireworks_key" >> .env ``` ## Usage @@ -51,40 +44,174 @@ npm run start ``` The tool will: -1. Scrape Wikipedia's list of largest cities -2. Extract structured data using gpt-oss-120b -3. Save results to `cities.json` +1. Scrape Wikipedia's list of largest cities using Hyperbrowser +2. Extract structured data using gpt-oss-20b on Fireworks AI +3. Save validated results to `extracted-data.json` + +## How It Works + +The extractor follows a simple two-step pipeline: + +1. **Scraping Phase**: Hyperbrowser fetches and converts the target webpage to markdown + - Filters content using CSS selectors (e.g., `.wikitable` class) + - Removes unwanted elements (images, ads, etc.) + - Converts HTML to clean markdown format + +2. 
**Extraction Phase**: AI model processes markdown and extracts structured data + - Uses gpt-oss-20b via Fireworks AI for cost-effective inference + - Validates output with Zod schemas for type safety + - Implements retry logic with exponential backoff + - Saves validated JSON output to file + +## Code Structure + +``` +oss-web-extractor/ +├── oss-web-extractor.ts # Main application logic +├── package.json # Dependencies and scripts +├── tsconfig.json # TypeScript configuration +├── extracted-data.json # Output file (generated) +└── README.md # This file +``` + +### Key Components + +- **CONFIG**: Centralized configuration for API keys, model selection, and file paths +- **Zod Schemas**: `CitySchema` and `ResponseSchema` ensure type-safe data validation +- **extractDataWithRetry()**: Robust extraction with retry logic and error handling +- **main()**: Orchestrates scraping → extraction → validation → file output workflow + +## Customization + +### Change the Target URL + +```typescript +const scrapeResult = await client.scrape.startAndWait({ + url: "https://your-target-website.com", + scrapeOptions: { + formats: ["markdown"], + includeTags: [".your-css-selector"], + excludeTags: ["img", "script"], + }, +}); +``` + +### Modify the Data Schema + +```typescript +// Define your custom schema +const ProductSchema = z.object({ + name: z.string(), + price: z.number(), + rating: z.number(), + availability: z.string(), +}); + +const ResponseSchema = z.object({ + products: z.array(ProductSchema) +}); +``` -## 💡 Growth Use Cases +### Update the System Prompt -Perfect for **data-driven growth teams** who need to: +```typescript +const SYSTEM_PROMPT = `Extract product information from the markdown content. +Return data in the following format: +- name: Product name +- price: Price as a number +- rating: Rating out of 5 +- availability: Stock status`; +``` -- 📊 **Monitor competitor pricing** from e-commerce sites for dynamic pricing strategies -- 📱 **Track social media metrics** across platforms for content optimization -- 💼 **Extract job postings** for talent acquisition and market analysis -- ⭐ **Scrape product reviews** for sentiment analysis and feature insights -- 🎯 **Gather market data** for business intelligence dashboards -- 📈 **Auto-generate LinkedIn carousels** from scraped industry stats +## Example Output -## 📊 Example Output +The tool extracts structured data from Wikipedia's list of largest cities: ```json { "cities": [ { "city": "Tokyo", - "country": "Japan", - "population": 37393128, + "country": "Japan", + "population": 37468000, "rank": 1 }, { - "city": "Delhi", + "city": "Delhi", "country": "India", - "population": 32941308, + "population": 28514000, "rank": 2 + }, + { + "city": "Shanghai", + "country": "China", + "population": 25582000, + "rank": 3 } ] } ``` -Follow @hyperbrowser for updates. \ No newline at end of file +See `extracted-data.json` for the complete output with 81 cities. 
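The `extractDataWithRetry()` behavior described above boils down to a validate-or-retry loop. A rough sketch of the pattern, not the file's exact code:

```typescript
import { z } from "zod";

// Sketch of the validate-or-retry pattern; not the file's exact code.
async function extractWithRetry<T>(
  run: () => Promise<unknown>,
  schema: z.ZodType<T>,
  maxAttempts = 3
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return schema.parse(await run()); // Zod throws if the shape is wrong
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      // Exponential backoff: wait 1s, then 2s, then 4s between attempts
      await new Promise((r) => setTimeout(r, 1000 * 2 ** (attempt - 1)));
    }
  }
  throw new Error("unreachable");
}
```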
+ +## Use Cases + +Perfect for teams who need to extract structured data at scale: + +- **E-commerce**: Monitor competitor pricing, product catalogs, and inventory +- **Real Estate**: Scrape property listings, prices, and market trends +- **Job Boards**: Extract job postings, salaries, and requirements +- **Research**: Gather datasets from public websites for analysis +- **Market Intelligence**: Track competitor features, reviews, and updates +- **Lead Generation**: Extract business directories and contact information + +## Technical Stack + +- **[@hyperbrowser/sdk](https://www.npmjs.com/package/@hyperbrowser/sdk)**: Enterprise-grade web scraping +- **Fireworks AI**: Cost-effective hosting for gpt-oss-20b model +- **Zod**: Runtime type validation and schema definition +- **TypeScript**: Type-safe development +- **dotenv**: Environment variable management + +## Environment Variables + +| Variable | Required | Description | +|----------|----------|-------------| +| `HYPERBROWSER_API_KEY` | Yes | Your Hyperbrowser API key from https://hyperbrowser.ai | +| `FIREWORKS_API_KEY` | Yes | Your Fireworks AI API key for gpt-oss-20b access | + +## Error Handling + +The tool includes robust error handling: + +- **Retry Logic**: Automatically retries failed extractions (up to 3 attempts) +- **Exponential Backoff**: Increases wait time between retries +- **Debug Output**: Saves raw responses to `debug-raw-response.txt` on parse errors +- **Validation**: Zod schemas catch data format issues before file output + +## Troubleshooting + +**"Cannot connect to gpt-oss-20b API" error**: +- Verify `FIREWORKS_API_KEY` is set correctly in `.env` +- Check your Fireworks AI account has API access enabled + +**"Scrape failed" error**: +- Ensure `HYPERBROWSER_API_KEY` is valid +- Check the target URL is accessible +- Verify your Hyperbrowser account has sufficient credits + +**JSON parse errors**: +- Check `debug-raw-response.txt` for the raw model output +- Adjust the `SYSTEM_PROMPT` to provide clearer instructions +- Modify temperature setting for more deterministic output + +**Installation issues**: +- Use Node.js v18 or higher +- Run `npm install` to ensure all dependencies are installed +- Check that TypeScript is properly installed + +--- + +**Perfect for**: Web scraping at scale, data extraction pipelines, competitive intelligence, market research automation. + +🚀 **Scale your AI development** with [Hyperbrowser](https://hyperbrowser.ai) | Follow @hyperbrowser \ No newline at end of file diff --git a/product-search/README.md b/product-search/README.md index 2f1c5fd..c3af27a 100644 --- a/product-search/README.md +++ b/product-search/README.md @@ -1,108 +1,139 @@ # Product Finder -A command-line tool built with TypeScript and Hyperbrowser to search for products, extract their information, find similar products, and track them over time. +**Built with [Hyperbrowser](https://hyperbrowser.ai)** + +A powerful command-line tool built with TypeScript and Hyperbrowser SDK to search for products, extract their information, find similar products, and track them over time. + +## Why Hyperbrowser? + +[Hyperbrowser](https://hyperbrowser.ai) is the **Internet for AI** — purpose-built for developers creating AI agents and automating web tasks. Skip the infrastructure headaches and focus on building. 
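At its core, the product lookup is a single call to Hyperbrowser's Extract API. A minimal sketch, assuming the SDK accepts a Zod schema directly; the `ProductSchema` fields here are illustrative (the project's real schemas live in `src/types.ts`):

```typescript
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { z } from "zod";

const client = new Hyperbrowser({ apiKey: process.env.HYPERBROWSER_API_KEY });

// Illustrative schema; the project's actual schemas live in src/types.ts.
const ProductSchema = z.object({
  productName: z.string(),
  brand: z.string(),
  description: z.string(),
  price: z.number(),
});

async function extractProduct(url: string) {
  // Assumed call shape: target URLs plus a schema describing the output
  const result = await client.extract.startAndWait({
    urls: [url],
    prompt: "Extract the product's name, brand, description, and price.",
    schema: ProductSchema,
  });
  return result.data;
}
```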
## Features -- **Product Search**: Extract detailed information from any product URL -- **Similar Products**: Find similar products from Google Shopping -- **Data Tracking**: Save product details to easily track price changes -- **Automatic Refresh**: Schedule updates to keep your product data current -- **User-Friendly Interface**: Progress indicators and clear formatted output -- **OpenAI Integration**: Use OpenAI to sort products by similarity +- **Product Search**: Extract detailed information from any product URL using Hyperbrowser's Extract API +- **Similar Products**: Find similar products from Bing Shopping automatically +- **Data Tracking**: Save product details to easily track price changes over time +- **Automatic Refresh**: Schedule updates with cron jobs to keep your product data current +- **User-Friendly Interface**: Progress indicators with `ora` and clear formatted output +- **OpenAI Integration**: Optional AI-powered product similarity sorting + +## Quick Start + +1. **Get your API key**: https://hyperbrowser.ai +2. **Install**: `npm install` +3. **Configure**: Add `HYPERBROWSER_API_KEY` to `.env` (and optionally `OPENAI_API_KEY`) +4. **Build**: `npm run build` +5. **Run**: `npm run search -- --url "https://example.com/product"` ## Requirements - Node.js 18 or higher -- Hyperbrowser API key (get one at [hyperbrowser.io](https://hyperbrowser.io)) +- Hyperbrowser API key (get one at [hyperbrowser.ai](https://hyperbrowser.ai)) +- OpenAI API key (optional, for AI-powered similarity sorting) - Linux/macOS for the scheduling feature (uses crontab) ## Installation -1. Clone this repository or download the source code -2. Install dependencies: - ```bash - npm install - ``` -3. Create a `.env` file in the project root with your API key: - ``` - HYPERBROWSER_API_KEY=your_api_key_here - OPENAI_API_KEY=your_openai_api_key_here # Optional, only needed for similarity sorting - ``` -4. Build the project: - ```bash - npm run build - ``` +```bash +# Install dependencies +npm install + +# Create .env file with your API keys +echo "HYPERBROWSER_API_KEY=your_hyperbrowser_api_key" > .env +echo "OPENAI_API_KEY=your_openai_api_key" >> .env # Optional + +# Build the project +npm run build +``` ## Usage -### Search for a Product +### 1. Search for a Product Extract information about a product and find similar items: ```bash -npm run search -- --url "https://example.com/product/123" +npm run search -- --url "https://www.amazon.com/dp/B08N5WRWNW" ``` -Options: +This will: +- Extract product details (name, brand, description, price) using Hyperbrowser +- Search for similar products on Bing Shopping +- Sort results by similarity (if OpenAI API key is provided) +- Save results to `saved_products.json` + +**Options:** - `--url, -u`: Product URL (required) - `--output, -o`: Custom output file path (optional, defaults to `saved_products.json`) -### Refresh Product Data +### 2. Refresh Product Data Update the similar products for all items in your saved data: ```bash -npm run refresh -- --file "./my-products.json" +npm run refresh -- --file "./saved_products.json" ``` -Or use the default file location: -```bash -npm run refresh:default -``` - -Options: +**Options:** - `--file, -f`: Path to the saved product data file (optional, defaults to `saved_products.json`) ## Advanced Usage -### Schedule Automatic Updates +### 3. 
Schedule Automatic Updates -Set up a cron job to run the refresh operation periodically: +Set up a cron job to automatically refresh product data: ```bash -npm run schedule -- --interval "0 */6 * * *" --file "./my-products.json" -``` +# Custom schedule (every 6 hours) +npm run schedule -- --interval "0 */6 * * *" --file "./saved_products.json" -Or use one of the preset scheduling options: -```bash -npm run schedule:daily # Run once a day at midnight -npm run schedule:hourly # Run every hour -npm run schedule:weekly # Run once a week on Sunday +# Daily at midnight (default) +npm run schedule -- --file "./saved_products.json" ``` -Options: -- `--interval, -i`: Cron schedule expression (optional, defaults to daily at midnight) -- `--file, -f`: Path to the saved product file (optional, defaults to `saved_products.json`) +This creates a shell script and adds it to your crontab. Logs are saved to `scheduled-run.log`. -### Remove Scheduled Updates +**Options:** +- `--interval, -i`: Cron schedule expression (default: `0 0 * * *` - daily at midnight) +- `--file, -f`: Path to the saved product file (default: `saved_products.json`) -Remove the cron job when you no longer need automatic updates: +**Common cron patterns:** +- `0 */6 * * *` - Every 6 hours +- `0 0 * * *` - Daily at midnight +- `0 */1 * * *` - Every hour +- `0 0 * * 0` - Weekly on Sunday + +### 4. Remove Scheduled Updates ```bash +# Remove from crontab, keep script file npm run unschedule -``` -Or remove the job and delete the script file: -```bash -npm run unschedule:clean +# Remove from crontab and delete script file +npm run unschedule -- --delete-script ``` -Options: -- `--delete-script, -d`: Also delete the script file (optional, defaults to `false`) +## How It Works + +1. **Product Extraction**: Uses Hyperbrowser's Extract API to scrape product details from any product page +2. **Similar Product Search**: Automatically searches Bing Shopping for similar products based on product name +3. **AI Sorting** (optional): Uses OpenAI GPT-4o-mini to rank products by similarity to the original +4. **Data Persistence**: Saves results to JSON file with timestamps for tracking +5. **Scheduled Updates**: Creates shell scripts and cron jobs for automatic refreshing + +## Code Structure + +``` +src/ +├── index.ts # CLI entrypoint with Commander.js +├── product.ts # Product search and refresh logic +├── scheduler.ts # Cron job management +├── display.ts # Console output formatting +└── types.ts # Zod schemas and TypeScript types +``` -## Data Structure +## Data Format The tool stores data in JSON format with the following structure: @@ -121,14 +152,33 @@ The tool stores data in JSON format with the following structure: "brand": "Other Brand", "description": "Another great product...", "price": 89.99, - "link": "https://example.com/similar1", + "linkToProduct": "https://example.com/similar1", "onSale": true, "salePrice": 79.99 } - // More similar products... ], "lastUpdated": "2023-11-15T12:34:56.789Z" } - // More products... 
} -``` \ No newline at end of file +``` + +## Technologies Used + +- **[@hyperbrowser/sdk](https://www.npmjs.com/package/@hyperbrowser/sdk)** - Web scraping and data extraction +- **OpenAI GPT-4o-mini** - AI-powered similarity ranking +- **Commander.js** - CLI argument parsing +- **Zod** - Schema validation +- **Ora** - Terminal spinners and progress indicators +- **TypeScript** - Type-safe development + +## Use Cases + +- **Price Monitoring**: Track price changes for products you're interested in +- **Comparison Shopping**: Find and compare similar products across retailers +- **Market Research**: Analyze product offerings and pricing in a category +- **Deal Finding**: Monitor for sales and price drops on similar items +- **Product Discovery**: Discover alternatives to products you're researching + +--- + +Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates. \ No newline at end of file diff --git a/ragzip/README.md b/ragzip/README.md index b75b06c..c145ec7 100644 --- a/ragzip/README.md +++ b/ragzip/README.md @@ -1,53 +1,154 @@ +**Built with [Hyperbrowser](https://hyperbrowser.ai)** + # RAGZip -**Built with [Hyperbrowser](https://hyperbrowser.ai)** +A powerful CLI tool that builds citation-tagged context packs for LLMs by scraping and intelligently processing web content. RAGZip extracts content from websites, ranks chunks by relevance using TF-IDF, deduplicates similar content, and fits everything within your specified token budget. + +## Why Hyperbrowser? + +[Hyperbrowser](https://hyperbrowser.ai) is the **Internet for AI** — purpose-built for developers creating AI agents and automating web tasks. Skip the infrastructure headaches and focus on building. -A single-file CLI that uses Hyperbrowser's official scrape API to extract website content and builds citation-tagged context packs for LLMs. +## Features -## Setup +📚 **Smart web scraping** with Hyperbrowser's official SDK +🎯 **TF-IDF ranking** to prioritize the most relevant content +🔄 **Automatic deduplication** using Jaccard similarity (70% threshold) +🤖 **Optional LLM compression** via OpenAI (25-60% size reduction) +📊 **Multiple output formats**: JSONL and Markdown +🏷️ **Citation tracking** with chunk-level source URLs +💾 **Token budgeting** to fit within your context window -1. **Get an API key** from [https://hyperbrowser.ai](https://hyperbrowser.ai) +## Quick Start -2. **Install dependencies** +1. **Get your API key**: https://hyperbrowser.ai +2. **Install dependencies**: ```bash npm install ``` - -3. **Set environment variables** +3. 
**Configure environment**: ```bash cp env.example .env # Add your keys to .env: # HYPERBROWSER_API_KEY=hb_your_key_here - # OPENAI_API_KEY=sk_your_key_here # optional for compression + # OPENAI_API_KEY=sk-your_key_here # optional, only for --llm compression ``` -## Usage +## Usage Examples +### Basic Usage ```bash -# Basic usage +# Process a single URL with default settings +npm start -- --url https://docs.example.com + +# Specify a custom token budget npm start -- --url https://example.com --budget 2000 +``` -# Multiple URLs with compression -npm start -- --url https://docs.example.com --url https://blog.example.com --llm --format md +### Multiple URLs +```bash +# Process multiple URLs in one command +npm start -- --url https://docs.example.com --url https://blog.example.com --budget 5000 -# From file -echo "https://example.com" > urls.txt +# Or use a file with newline-separated URLs +echo "https://docs.example.com" > urls.txt +echo "https://blog.example.com" >> urls.txt npm start -- --urls urls.txt --budget 5000 ``` -## Options +### With LLM Compression +```bash +# Enable OpenAI compression to reduce token usage further +npm start -- --url https://example.com --llm --budget 3000 + +# Combine with markdown output format +npm start -- --urls urls.txt --llm --format md --out my-context +``` + +## CLI Options + +| Option | Type | Default | Description | +|--------|------|---------|-------------| +| `--url ` | string[] | - | URLs to process (can be specified multiple times) | +| `--urls ` | string | - | File containing newline-separated URLs | +| `--budget ` | number | 8000 | Maximum token budget for the final pack | +| `--out ` | string | "distill" | Output directory for generated files | +| `--format ` | "jsonl" \| "md" | "jsonl" | Output format | +| `--llm` | boolean | false | Enable OpenAI-powered compression | + +## Output Files + +RAGZip generates three files in the output directory: + +### 1. `pack.jsonl` (or `pack.md`) +The main context pack containing processed chunks: +```jsonl +{"chunk":"Content here...","tokens":150,"source":"https://example.com#chunk0","rank":0.8523} +{"chunk":"More content...","tokens":200,"source":"https://example.com#chunk1","rank":0.7412} +``` + +### 2. `stats.json` +Processing statistics and metrics: +```json +{ + "pages": 2, + "raw_chunks": 150, + "kept_chunks": 40, + "raw_tokens": 12000, + "kept_tokens": 7800, + "dedupe_rate": 15.5, + "compression_ratio": 65.2 +} +``` + +### 3. `citations.md` +Source URLs for attribution: +```markdown +# Citations + +- https://docs.example.com +- https://blog.example.com +``` + +## How It Works + +1. **Scrape**: Uses Hyperbrowser SDK to fetch content as clean markdown +2. **Chunk**: Splits content into semantic chunks (max 800 tokens each) +3. **Score**: Applies TF-IDF algorithm to rank chunks by relevance +4. **Deduplicate**: Removes similar chunks using Jaccard similarity +5. **Select**: Fits top-ranked chunks within token budget +6. **Compress** (optional): Uses OpenAI to compress selected chunks +7. 
**Export**: Generates JSONL/MD pack with citations and stats + +## Prerequisites + +- **Node.js**: v16 or higher +- **Hyperbrowser API Key**: Required for web scraping +- **OpenAI API Key**: Optional, only needed for `--llm` compression + +## Environment Variables + +Create a `.env` file with: + +```bash +HYPERBROWSER_API_KEY=hb_your_api_key_here +OPENAI_API_KEY=sk-your_openai_key_here # Optional +``` + +## Use Cases + +- **RAG pipelines**: Create context-optimized knowledge bases for retrieval-augmented generation +- **Documentation processing**: Convert docs into LLM-ready format with citations +- **Research assistance**: Build focused context packs from multiple sources +- **Training data**: Generate curated datasets for fine-tuning -- `--url ` - URLs to process (can repeat) -- `--urls ` - File with newline-separated URLs -- `--budget ` - Token budget (default: 8000) -- `--out ` - Output directory (default: "distill") -- `--format ` - Output format (default: jsonl) -- `--llm` - Enable OpenAI compression +## Technical Details -## Output +- **Chunk size**: 20-800 tokens per chunk +- **Deduplication threshold**: 70% Jaccard similarity +- **TF-IDF scoring**: Per-chunk relevance ranking +- **Compression**: 25-60% size reduction with OpenAI (gpt-4o-mini) +- **Token estimation**: ~4 characters per token -- `pack.jsonl` - Main context pack with chunks -- `stats.json` - Processing statistics -- `citations.md` - Source URLs +--- -**Follow @hyperbrowser_ai for updates.** \ No newline at end of file +🚀 **Scale your AI development** with [Hyperbrowser](https://hyperbrowser.ai) | Follow [@hyperbrowser_ai](https://twitter.com/hyperbrowser_ai) \ No newline at end of file diff --git a/real-estate-finder/README.md b/real-estate-finder/README.md index dc2c816..36a08da 100644 --- a/real-estate-finder/README.md +++ b/real-estate-finder/README.md @@ -1,8 +1,14 @@ -# US Real Estate Finder +**Built with [Hyperbrowser](https://hyperbrowser.ai)** -An intelligent real estate bot that automatically finds and extracts property listings across the United States based on your search criteria. The bot uses SERP API to search for relevant real estate websites and Hyperbrowser to extract structured data from those sites. +# Real Estate Finder -## 🚀 Get Started in 2 Minutes +An intelligent real estate bot that automatically finds and extracts property listings across the United States based on your search criteria. The bot uses SerpAPI to search for relevant real estate websites and Hyperbrowser to extract structured data from those sites. + +## Why Hyperbrowser? + +[Hyperbrowser](https://hyperbrowser.ai) is the **Internet for AI** — purpose-built for developers creating AI agents and automating web tasks. Skip the infrastructure headaches and focus on building. + +## Quick Start Ready to find your next home? This bot does all the heavy lifting for you - no more manually browsing dozens of real estate websites! @@ -17,17 +23,20 @@ Ready to find your next home? This bot does all the heavy lifting for you - no m You'll need API keys for: -1. **Hyperbrowser**: Get your API key at [hyperbrowser.ai](https://hyperbrowser.ai) -2. **SerpAPI**: Sign up at [SerpAPI](https://serpapi.com) to get your search API key +1. **Hyperbrowser API Key**: Sign up at [https://hyperbrowser.ai](https://hyperbrowser.ai) +2. **SerpAPI Key**: Sign up at [https://serpapi.com](https://serpapi.com) to get your search API key + +## Installation -## Setup +1. Navigate to the project directory and install dependencies: -1. 
Clone this repository and install dependencies: ```bash +cd real-estate-finder npm install ``` -2. Create a `.env` file in the root directory with your API keys: +2. Create a `.env` file in the project root with your API keys: + ```env HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here SERPAPI_KEY=your_serpapi_key_here @@ -96,11 +105,24 @@ The bot will then: - HotPads.com - ForRent.com -## 🎯 Start Your Property Search Now! +## How It Works + +1. **User Input** - Interactive CLI prompts for search criteria (location, bedrooms, budget, property type) +2. **Web Search** - SerpAPI finds relevant real estate listing URLs from trusted platforms +3. **URL Validation** - Filters for legitimate US real estate websites (Zillow, Apartments.com, etc.) +4. **Data Extraction** - Hyperbrowser extracts structured listing data using AI-powered prompts +5. **Result Display** - Formatted output with all property details and links -1. **Get your free API keys**: [Hyperbrowser](https://hyperbrowser.ai) and [SerpAPI](https://serpapi.com) -2. **Clone and setup** this repository (takes 2 minutes) -3. **Run the bot** and find your perfect property! +## Project Structure + +``` +real-estate-finder/ +├── real-estate-bot.ts # Main application logic +├── package.json # Dependencies and scripts +├── tsconfig.json # TypeScript configuration +├── .env # Environment variables (create this) +└── README.md # This file +``` ## API Rate Limits @@ -123,15 +145,28 @@ The bot will then: - Make sure you've added `SERPAPI_KEY=your_actual_api_key` to your `.env` file - Get your free API key from [SerpAPI](https://serpapi.com/manage-api-key) -## Technical Details +## Dependencies + +- **[@hyperbrowser/sdk](https://www.npmjs.com/package/@hyperbrowser/sdk)** - Web scraping and browser automation +- **[serpapi](https://www.npmjs.com/package/serpapi)** - Google search API for finding real estate websites +- **[zod](https://www.npmjs.com/package/zod)** - TypeScript-first schema validation +- **[inquirer](https://www.npmjs.com/package/inquirer)** - Interactive command-line prompts +- **[dotenv](https://www.npmjs.com/package/dotenv)** - Environment variable management +- **[typescript](https://www.npmjs.com/package/typescript)** - TypeScript language support + +## Requirements + +- **Node.js** 18+ or later +- **TypeScript** 5.0+ (installed via npm) +- **Hyperbrowser API Key** (get at [hyperbrowser.ai](https://hyperbrowser.ai)) +- **SerpAPI Key** (get at [serpapi.com](https://serpapi.com)) + +## Learn More -The application uses: -- **TypeScript** for type safety -- **Zod** for data validation and schema definition -- **SerpAPI** for web search functionality -- **Hyperbrowser SDK** for intelligent data extraction -- **Inquirer** for interactive command-line prompts +- **Hyperbrowser Documentation**: [https://docs.hyperbrowser.ai](https://docs.hyperbrowser.ai) +- **Hyperbrowser Discord**: [https://discord.gg/zsYzsgVRjh](https://discord.gg/zsYzsgVRjh) +- **Support**: info@hyperbrowser.ai -## License +--- -MIT License \ No newline at end of file +**Ready to find your perfect property? Get started in minutes!** \ No newline at end of file diff --git a/research-bot/README.md b/research-bot/README.md index 84fb7a0..a3410f2 100644 --- a/research-bot/README.md +++ b/research-bot/README.md @@ -1,32 +1,39 @@ -# Research Bot - **Built with [Hyperbrowser](https://hyperbrowser.ai)** +# Research Bot + Automated competitive intelligence for founders. Monitor competitor websites, detect meaningful changes, and get AI-powered insights delivered to Slack. 
Perfect for tracking pricing updates, product launches, and strategic moves. -## Features +## Why Hyperbrowser? -- 🔍 **Smart Change Detection**: SHA256 hashing to catch real content changes -- 🤖 **AI-Powered Analysis**: OpenAI summarization with founder-focused insights -- 📊 **Priority Classification**: P0/P1/P2 tagging for actionable intelligence -- ⏰ **Automated Scheduling**: Built-in cron scheduler respects `cadence_hours` config -- 🚀 **Hyperbrowser Powered**: Reliable web scraping with official SDK -- 📱 **Slack Integration**: Real-time notifications via webhooks -- ⚡ **Concurrent Processing**: Efficient batch processing with timeouts -- 🎯 **Group Filtering**: Monitor specific competitor sets or pricing pages +[Hyperbrowser](https://hyperbrowser.ai) is the **Internet for AI** — purpose-built for developers creating AI agents and automating web tasks. Skip the infrastructure headaches and focus on building. -## Get an API Key +## Features -Get your Hyperbrowser API key from [https://hyperbrowser.ai](https://hyperbrowser.ai) +- **Smart Change Detection**: SHA256 hashing to catch real content changes +- **AI-Powered Analysis**: OpenAI summarization with founder-focused insights +- **Priority Classification**: P0/P1/P2 tagging for actionable intelligence +- **Automated Scheduling**: Built-in cron scheduler respects `cadence_hours` config +- **Hyperbrowser Powered**: Reliable web scraping with official SDK +- **Slack Integration**: Real-time notifications via webhooks +- **Concurrent Processing**: Efficient batch processing with timeouts +- **Group Filtering**: Monitor specific competitor sets or pricing pages ## Quick Start +1. **Get your API keys**: + - Hyperbrowser: https://hyperbrowser.ai + - OpenAI: https://platform.openai.com +2. **Install**: `npm install` +3. **Configure**: Copy `.env.example` to `.env` and add your API keys +4. **Run**: `npm run start:once` or `npm run start:continuous` + ```bash # Install dependencies npm install # Setup environment -cp env.example .env +cp .env.example .env # Edit .env with your API keys # Run once @@ -38,10 +45,13 @@ npm run start:continuous ## Environment Variables +Create a `.env` file with the following: + ```bash -HYPERBROWSER_API_KEY=your_hyperbrowser_key_here -OPENAI_API_KEY=your_openai_key_here -SLACK_WEBHOOK_URL=your_slack_webhook_url_here # Optional +HYPERBROWSER_API_KEY=your_hyperbrowser_key_here # Get at https://hyperbrowser.ai +OPENAI_API_KEY=your_openai_key_here # Get at https://platform.openai.com +OPENAI_MODEL=gpt-4o-mini # Optional (default: gpt-4o-mini) +SLACK_WEBHOOK_URL=your_slack_webhook_url_here # Optional ``` ## Configuration @@ -89,17 +99,17 @@ tsx agent.ts --group competitors --continuous tsx agent.ts --group tech --continuous ``` -## Growth Use Cases +## Use Cases Perfect for founders who need to: - **Track Competitor Pricing**: Get alerted when rivals change pricing strategy -- **Monitor Product Launches**: Stay ahead of new feature releases +- **Monitor Product Launches**: Stay ahead of new feature releases - **Follow Industry News**: Auto-summarize relevant blog posts and announcements - **Watch Hiring Patterns**: Detect when competitors scale specific teams - **Monitor Legal Changes**: Track ToS, privacy policy, compliance updates -This creates reusable competitive intelligence that drives strategic decisions and helps you stay ahead of market moves. +**Perfect for**: Competitive intelligence, market research, product strategy, growth engineering. 
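The change detection itself is a straightforward hash comparison: hash the newly scraped content and compare it with the stored snapshot. A minimal sketch of that step:

```typescript
import { createHash } from "node:crypto";

// Hash the scraped content so snapshots stay small and comparisons stay cheap.
function contentHash(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

function hasChanged(previousHash: string | undefined, content: string): boolean {
  // A missing snapshot (first run) counts as a change, so a baseline gets recorded
  return previousHash !== contentHash(content);
}
```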
## Output @@ -131,18 +141,52 @@ const response = await openai.chat.completions.create({ }); ``` +## How It Works + +1. **Web Scraping**: Uses Hyperbrowser SDK's `scrape.startAndWait()` to fetch page content +2. **Change Detection**: Compares SHA256 hashes of content to detect changes +3. **AI Analysis**: Routes detected changes to OpenAI with founder-focused prompts +4. **Priority Classification**: Automatically tags changes as P0/P1/P2 based on business impact +5. **Smart Tagging**: Regex-based categorization (pricing, hiring, product, etc.) +6. **Report Generation**: Creates markdown reports with executive summaries +7. **Notifications**: Sends alerts to Slack when changes are detected + ## Architecture -Single-file TypeScript implementation (~260 LOC): +Single-file TypeScript implementation (`agent.ts`, ~270 LOC): - **Hyperbrowser SDK**: Fast web scraping via `scrape.startAndWait()` - **OpenAI Integration**: GPT-4o-mini for intelligent summarization - **Concurrent Processing**: Batch processing with 4-URL concurrency - **Timeout Protection**: 60-second per-URL timeout with graceful failures -- **State Management**: JSON snapshots for change detection +- **State Management**: JSON snapshots in `.data/snapshots/` for change detection - **Cron Scheduling**: Built-in node-cron for automated monitoring -- **Minimal Dependencies**: Essential deps only for maximum reliability +- **YAML Configuration**: Flexible config file for URLs, groups, and rules + +## Technical Stack + +- **[@hyperbrowser/sdk](https://www.npmjs.com/package/@hyperbrowser/sdk)**: Web scraping and automation +- **OpenAI GPT-4o-mini**: AI-powered content analysis and summarization +- **node-cron**: Automated scheduling for continuous monitoring +- **YAML**: Human-friendly configuration format +- **TypeScript**: Type-safe development with ES modules + +## Development + +```bash +# Development mode with tsx +npm run start + +# Run once +npm run start:once + +# Continuous mode +npm run start:continuous + +# Build TypeScript +npm run build +``` --- -Follow [@hyperbrowser](https://twitter.com/hyperbrowser) for more updates. \ No newline at end of file +🚀 **Scale your AI development** with [Hyperbrowser](https://hyperbrowser.ai) | Follow [@hyperbrowser](https://x.com/hyperbrowser) \ No newline at end of file diff --git a/resource-summary/README.md b/resource-summary/README.md index 939794d..52c4479 100644 --- a/resource-summary/README.md +++ b/resource-summary/README.md @@ -1,30 +1,149 @@ +**Built with [Hyperbrowser](https://hyperbrowser.ai)** + # Resource Summary -A lightweight CLI that analyzes webpage resources using Hyperbrowser's API. +A lightweight CLI tool that analyzes webpage resources and structure using Hyperbrowser's scraping API. Get instant insights into images, links, scripts, stylesheets, and page architecture - with optional AI-powered analysis for deeper understanding. + +## Why Hyperbrowser? + +[Hyperbrowser](https://hyperbrowser.ai) is the **Internet for AI** — purpose-built for developers creating AI agents and automating web tasks. Skip the infrastructure headaches and focus on building. 
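Basic mode boils down to one scrape call plus a few regex counts. A minimal sketch, assuming the scrape result exposes `data.markdown`:

```typescript
import { Hyperbrowser } from "@hyperbrowser/sdk";

const client = new Hyperbrowser({ apiKey: process.env.HYPERBROWSER_API_KEY });

async function basicAnalysis(url: string) {
  const result = await client.scrape.startAndWait({
    url,
    scrapeOptions: { formats: ["markdown"] },
  });
  const md = result.data?.markdown ?? ""; // response field name assumed

  // ![alt]( marks an image; [text]( matches links and images alike,
  // so subtract the image count to get plain links.
  const images = (md.match(/!\[[^\]]*\]\(/g) ?? []).length;
  const links = (md.match(/\[[^\]]*\]\(/g) ?? []).length - images;
  return { images, links, contentLength: md.length };
}
```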
+ +## Features + +- **Quick Resource Analysis**: Counts images, links, scripts, and stylesheets from page content +- **AI-Powered Mode**: Uses Hyperbrowser Browser Use agent for detailed page structure analysis +- **Performance Insights**: Identifies potential performance issues in advanced mode +- **Beautiful CLI Output**: Color-coded results with icons for easy reading +- **Dual Analysis Modes**: Choose between fast basic analysis or comprehensive AI-powered inspection + +## Quick Start -## Setup +1. **Get your API key**: https://hyperbrowser.ai +2. **Install**: `npm install` +3. **Configure**: Add `HYPERBROWSER_API_KEY` to `.env` +4. **Run**: `npx tsx index.ts --url https://example.com` -1. Get your API key from [hyperbrowser.ai](https://hyperbrowser.ai) -2. Create a `.env` file: - ``` - HYPERBROWSER_API_KEY=your_api_key_here - ``` -3. Install dependencies: - ```bash - npm install - ``` +## Installation + +```bash +# Install dependencies +npm install + +# Set environment variable +export HYPERBROWSER_API_KEY="your_hyperbrowser_api_key" + +# Run basic analysis +npx tsx index.ts --url https://example.com +``` -## Usage +## Usage Examples +### Basic Resource Analysis (Fast) ```bash -# Basic analysis (fast) npx tsx index.ts --url https://example.com +``` +Output: +``` +🔍 Analyzing https://example.com… +⚡ Running basic content analysis... -# Advanced AI analysis +📊 Resource Summary: +──────────────────────────────────────── +🖼️ Images 12 +🔗 Links 45 +📜 Scripts 8 +🎨 Stylesheets 3 +──────────────────────────────────────── +📄 Content length: 15,432 characters +``` + +### Advanced AI-Powered Analysis +```bash npx tsx index.ts --url https://github.com --mode advanced ``` +Uses Hyperbrowser's Browser Use agent to: +- Count all visible images on the page +- Count all links +- Identify external scripts and stylesheets +- Analyze page structure (headers, sections) +- Provide performance observations + +## CLI Options + +```bash +npx tsx index.ts --url [--mode ] + +Options: + --url Page URL to analyze (required) + --mode Analysis mode: basic or advanced (default: basic) + --help Show help +``` + +## Analysis Modes + +### Basic Mode (Default) +- **Speed**: Fast - completes in seconds +- **Method**: Uses Hyperbrowser's `scrape.startAndWait()` to fetch markdown content +- **Analysis**: Counts resources using regex pattern matching +- **Best For**: Quick audits, automated monitoring, batch URL processing + +### Advanced Mode +- **Speed**: Slower - requires browser automation +- **Method**: Uses Hyperbrowser's Browser Use agent for interactive page inspection +- **Analysis**: AI-powered examination of live page DOM and resources +- **Best For**: In-depth analysis, performance auditing, detailed page structure review + +## Environment Variables + +Create a `.env` file or export this variable: + +```bash +HYPERBROWSER_API_KEY=your_hyperbrowser_api_key # Get at https://hyperbrowser.ai +``` + +## How It Works + +1. **URL Input**: Accepts target webpage URL via CLI flag +2. **Mode Selection**: Routes to basic (scrape-based) or advanced (agent-based) analysis +3. **Data Collection**: + - Basic: Uses Hyperbrowser SDK's `scrape.startAndWait()` to fetch markdown + - Advanced: Uses `agents.browserUse.startAndWait()` with AI task instructions +4. **Resource Counting**: Analyzes content for images, links, scripts, and stylesheets +5. 
**Output Formatting**: Displays results with color-coded icons and formatting + +## Code Structure + +The project consists of a single TypeScript file with modular functions: + +- **`main()`** - Entry point, initializes Hyperbrowser client and routes to analysis mode +- **`runBasicAnalysis()`** - Performs fast content-based resource counting +- **`runAdvancedAnalysis()`** - Uses Browser Use agent for AI-powered analysis with fallback +- **`analyzeContent()`** - Regex-based resource counting from markdown content +- **`getResourceIcon()`** - Returns emoji icons for resource types +- **`getResourceColor()`** - Returns chalk color functions for resource types + +## Use Cases + +**Perfect for**: SEO auditing, page weight analysis, resource optimization, performance monitoring, competitive analysis, web scraping validation. + +## Technical Stack + +- **[@hyperbrowser/sdk](https://www.npmjs.com/package/@hyperbrowser/sdk)**: Web scraping and browser automation +- **yargs**: CLI argument parsing +- **chalk**: Terminal output formatting +- **dotenv**: Environment variable management +- **TypeScript**: Type-safe development + +## Development + +```bash +# Run with TypeScript directly +npx tsx index.ts --url + +# Add to package.json scripts +npm start -- --url --mode advanced +``` -## Output +--- -- **Basic Mode**: Counts images, links, scripts, and stylesheets from content -- **Advanced Mode**: AI-powered page structure analysis with performance insights \ No newline at end of file +🚀 **Scale your AI development** with [Hyperbrowser](https://hyperbrowser.ai) | Follow [@hyperbrowser](https://x.com/hyperbrowser) \ No newline at end of file diff --git a/scam-scanner-bot/README.md b/scam-scanner-bot/README.md index 5995669..8c7be97 100644 --- a/scam-scanner-bot/README.md +++ b/scam-scanner-bot/README.md @@ -1,198 +1,278 @@ -# 🛡️ Scam Scanner – Intelligent Scam Store Scanner +# Scam Scanner Bot -> **Powered by [Hyperbrowser.ai](https://hyperbrowser.ai) – Real browsers in the cloud** +**Built with [Hyperbrowser](https://hyperbrowser.ai)** -**One command. One scan. Instant fraud detection.** +An interactive CLI tool that analyzes suspicious online stores for potential fraud indicators. Uses Hyperbrowser's browser automation to scrape websites and detect common scam patterns like fake urgency, insecure links, suspicious domains, and missing legal pages. -This tool leverages Hyperbrowser's cloud-based browser infrastructure to perform deep analysis of suspicious online stores, detecting fraud patterns that traditional crawlers miss. +## What It Does ---- +Detect potential scam websites by analyzing: -## 🚀 **Why Hyperbrowser Makes the Difference** +- **Suspicious Content Patterns** - Fake urgency, limited time offers, too-good-to-be-true claims +- **Insecure Links** - HTTP links that could compromise security +- **External Redirects** - Unusual number of external domain links +- **Domain Credibility** - Suspicious TLDs (.tk, .ml, .ga, .cf, .pw) +- **Missing Legal Pages** - No contact info, privacy policy, or terms of service +- **Payment Security** - Lack of secure payment mentions -### Traditional Crawlers vs. 
Hyperbrowser -| Traditional Crawlers | 🌟 **Hyperbrowser** | -|---------------------|---------------------| -| ❌ Miss JavaScript-loaded content | ✅ **Full JS execution** – sees dynamic content | -| ❌ Can't detect redirects | ✅ **Real browser behavior** – catches all redirects | -| ❌ Limited to static HTML | ✅ **Complete rendering** – captures post-load fraud tactics | -| ❌ Require local Chrome setup | ✅ **Zero setup** – everything runs in the cloud | +## Quick Start -### 🎯 **What Scam-Scanner Detects** -- 🔓 **Insecure HTTP assets** on HTTPS sites -- ⚠️ **Failed API calls** (4xx/5xx errors) -- 🏦 **Suspicious payment iframes** from unknown providers -- 📅 **Brand-new domains** with missing legal pages -- 🖼️ **Duplicate stock photos** (coming soon) +### 1. Get Your API Key ---- +**Hyperbrowser API Key:** Sign up at [https://hyperbrowser.ai](https://hyperbrowser.ai) -## 📦 **Quick Start** +### 2. Installation -### 1️⃣ **Get Your Hyperbrowser API Key** -🔑 **[Get your free API key at hyperbrowser.ai →](https://hyperbrowser.ai)** - -### 2️⃣ **Install & Setup** ```bash -# Clone the repository -git clone https://github.com/hyperbrowserai/examples cd scam-scanner-bot +npm install +``` -# Install dependencies -pnpm install # or npm install +### 3. Environment Setup -# Configure your API keys -cp .env.example .env -``` +Create a `.env` file in the project root: -### 3️⃣ **Add Your Keys to `.env`** ```env -# 🔑 Get this at hyperbrowser.ai -HYPERBROWSER_API_KEY=pk_live_xxx - -# 🤖 Optional: For AI-powered scoring -OPENAI_API_KEY=sk-xxx +HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here ``` -### 4️⃣ **Build & Scan** +Or export it directly: + ```bash -# Build the project -pnpm run build +export HYPERBROWSER_API_KEY="your_key_here" +``` + +### 4. Run the Scanner -# Scan a suspicious store -node dist/index.js --url https://suspect-store.xyz +```bash +npm run dev ``` ---- +## Usage + +### Interactive Mode -## 📊 **Sample Output** +The tool prompts you to enter a URL to scan: +```bash +npm run dev ``` -🔍 Hyperbrowser analyzing https://suspect-store.xyz... -✨ Scan complete in 2.3s -━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ -📈 FRAUD ANALYSIS REPORT -━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ +**Example Session:** -🚨 PhonyScore: 82/100 (HIGH RISK) +``` +Enter store URL: https://suspicious-store.com +🔍 Starting Hyperbrowser scrape… + +Scam Scanner Results +──────────────────────────────────────── +Suspicious Content Patterns: 3 +Insecure Links (HTTP): 5 +External Links: 12 +Suspicious TLD: No +Has Contact Info: No +Has Privacy Policy: No +Mentions Secure Payment: No + +⚠ Suspicious content patterns detected +✖ Sample insecure link → http://example.com/image.jpg + +Risk Assessment: +🚨 HIGH RISK - Likely scam website! 
+``` -⚠️ Red Flags Detected: -• 🔓 12 insecure HTTP assets loaded -• ❌ 4 failed API requests (4xx/5xx) -• 🏦 Payment iframe from unknown provider -• 📅 Domain registered only 11 days ago -• 📄 Missing privacy policy and terms +## Features -💡 Recommendation: AVOID - Multiple fraud indicators present -``` +### Pattern Detection + +**Suspicious Content:** +- Urgent sale / limited time offers +- "Act now" pressure tactics +- Flash sales and fake scarcity +- "Too good to be true" claims +- Wholesale/liquidation language +- "Going out of business" urgency +### Security Analysis ---- +**Insecure Elements:** +- HTTP links on HTTPS sites +- Missing SSL/encryption mentions +- No secure payment indicators -## 🛠️ **Command Reference** +### Domain Analysis -| Flag | Required | Description | -|------|----------|-------------| -| `--url`, `-u` | ✅ | Target store URL (include http/https) | -| `--key`, `-k` | ⚠️ | Hyperbrowser API key (or set `HYPERBROWSER_API_KEY`) | -| `--json` | ❌ | Output machine-readable JSON | +**Trust Indicators:** +- Suspicious TLD detection (.tk, .ml, .ga, .cf, .pw) +- External domain redirect patterns +- Contact information presence +- Privacy policy and terms of service ---- +### Risk Scoring -## 🔬 **How It Works** +The tool calculates a risk score based on: +- Number of suspicious content patterns (weight: 2) +- Presence of insecure links (weight: 1) +- Excessive external links (weight: 1) +- Suspicious TLD (weight: 2) +- Missing contact information (weight: 1) +- Missing privacy policy (weight: 1) +- No secure payment mentions (weight: 1) + +**Risk Levels:** +- **HIGH RISK** (5+ points): Likely scam website +- **MEDIUM RISK** (3-4 points): Exercise caution +- **LOW RISK** (0-2 points): Appears legitimate + +## Sample Output -```mermaid -graph LR - A[🌐 Target URL] --> B[☁️ Hyperbrowser Cloud] - B --> C[🖥️ Real Browser Session] - C --> D[📡 Network Monitoring] - C --> E[🎯 Content Analysis] - D --> F[🔍 Fraud Detection] - E --> F - F --> G[📊 PhonyScore Report] +``` +Scam Scanner Results +──────────────────────────────────────── +Suspicious Content Patterns: 4 +Insecure Links (HTTP): 8 +External Links: 15 +Suspicious TLD: Yes +Has Contact Info: No +Has Privacy Policy: No +Mentions Secure Payment: No + +⚠ Suspicious content patterns detected +✖ Sample insecure link → http://cdn.example.com/asset.js +❓ Many external links → https://payment-processor.xyz +⚠ Suspicious TLD detected → example.tk + +Risk Assessment: +🚨 HIGH RISK - Likely scam website! ``` -1. **🚀 Launch Session** – Hyperbrowser spins up a real browser in the cloud -2. **📡 Monitor Everything** – Capture all network requests, redirects, and dynamic content -3. **🔍 Analyze Patterns** – Run advanced heuristics on collected data -4. **🤖 AI Scoring** – GPT-4 evaluates fraud probability -5. 
**📊 Generate Report** – Get actionable insights with confidence scores +## Alternative Commands ---- +```bash +# Development mode (default) +npm run dev -## 🌟 **Why Choose Hyperbrowser?** +# Build TypeScript +npm run build -### ⚡ **Performance** -- **2-3 second scans** – Faster than setting up local Chrome -- **Global edge network** – Optimal performance worldwide -- **Automatic scaling** – No infrastructure management +# Run compiled version +npm start -### 🛡️ **Security & Reliability** -- **Isolated browser sessions** – Every scan runs in a fresh environment -- **Enterprise-grade security** – Your data never leaves secure cloud -- **99.9% uptime SLA** – Production-ready reliability +# Direct execution with ts-node +npx ts-node scam-scanner-bot.ts +``` + +## Project Structure + +``` +scam-scanner-bot/ +├── scam-scanner-bot.ts # Main application logic +├── package.json # Dependencies and scripts (phonycart) +├── tsconfig.json # TypeScript configuration +├── .env # Environment variables (create this) +└── README.md # This file +``` -### 💰 **Cost-Effective** -- **Pay-per-scan** – No monthly fees or commitments -- **Free tier available** – Perfect for testing and small projects -- **Transparent pricing** – Know exactly what you're paying for +## How It Works ---- +1. **User Input** - Interactive prompt for store URL +2. **URL Validation** - Ensures proper http:// or https:// protocol +3. **Web Scraping** - Hyperbrowser scrapes HTML and extracts links +4. **Pattern Analysis** - Checks content against suspicious pattern list +5. **Security Check** - Analyzes links for HTTP insecurity and external domains +6. **Domain Analysis** - Evaluates TLD and checks for trust indicators +7. **Risk Calculation** - Computes weighted risk score +8. **Report Generation** - Color-coded console output with findings -## 🤝 **Contributing** +## Technical Details -We love contributions! 
Here's how to get started:
+### Technologies Used
+
+- **[@hyperbrowser/sdk](https://www.npmjs.com/package/@hyperbrowser/sdk)** - Browser automation and web scraping
+- **[tldts](https://www.npmjs.com/package/tldts)** - Domain parsing and TLD analysis
+- **[chalk](https://www.npmjs.com/package/chalk)** - Colored terminal output
+- **[dotenv](https://www.npmjs.com/package/dotenv)** - Environment variable management
+- **[readline](https://nodejs.org/api/readline.html)** - Built-in Node.js module for CLI input
+- **TypeScript** - Type-safe development
+
+### Scrape Configuration
-### 🔧 **Adding New Detection Rules**
```typescript
-// src/checks/your-check.ts
-export const yourCheck = {
-  id: 'your-check',
-  severity: 'high',
-  check: (session) => {
-    // Your fraud detection logic
-    return { found: boolean, details: string };
-  }
-};
+scrapeOptions: {
+  formats: ['html', 'links'],
+  waitUntil: 'networkidle',
+  timeout: 30000
+}
```
-### 🚀 **Development Setup**
-```bash
-# Fork and clone
-git clone https://github.com/hyperbrowserai/examples
+- **formats**: Extracts both HTML content and all page links
+- **waitUntil**: Waits for network to be idle before scraping
+- **timeout**: 30-second maximum wait time
-# Install dependencies
-pnpm install
+## Use Cases
-# Build and test
-pnpm run build
-pnpm test
-```
+**E-commerce Shoppers:**
+- Quick credibility check before making online purchases
+- Identify potential scam stores
---
+**Fraud Researchers:**
+- Analyze patterns across suspicious websites
+- Build databases of scam indicators
-## 📞 **Support & Community**
+**Consumer Protection:**
+- Screen reported websites for fraud indicators
+- Generate evidence for investigations
-- 📚 **[Hyperbrowser Documentation](https://docs.hyperbrowser.ai)**
-- 💬 **[Discord Community](https://discord.gg/zsYzsgVRjh)**
-- 🐛 **[Report Issues](https://github.com/hyperbrowserai/examples)**
+**Browser Extensions:**
+- Integrate as backend service for real-time URL checking
+- Build consumer safety tools
---
+## Limitations
-## 📄 **License**
+- Analysis is based on heuristics and is not 100% accurate
+- Legitimate sites may trigger false positives
+- Does not perform deep financial or legal verification
+- Requires publicly accessible websites
+- No AI-powered analysis (uses pattern matching only)
-MIT License – Feel free to use in your projects!
+## Troubleshooting
---
+### Common Issues
+
+**Missing API Key:**
+```bash
+# Verify your .env file exists and contains the key
+cat .env
+```
+
+**URL Format Error:**
+- Ensure URL starts with `http://` or `https://`
+- Example: `https://example.com` not `example.com`
+
+**Scrape Failed:**
+- Some websites may block automated scraping
+- Check if the site is publicly accessible
+- Verify Hyperbrowser API quota limits
+
+**TypeScript Errors:**
+```bash
+# Reinstall dependencies
+rm -rf node_modules package-lock.json
+npm install
+```
-
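+## Risk Scoring Sketch
+
+If you want to see the weighted scoring from the Risk Scoring section as code, here is a minimal sketch. The field names are illustrative, and whether the pattern count is thresholded or multiplied by its weight is an implementation detail of `scam-scanner-bot.ts`; this version simply mirrors the published weights and cut-offs.
+
+```typescript
+interface ScanFindings {
+  suspiciousPatternCount: number;
+  hasInsecureLinks: boolean;
+  hasExcessiveExternalLinks: boolean;
+  hasSuspiciousTld: boolean;
+  hasContactInfo: boolean;
+  hasPrivacyPolicy: boolean;
+  mentionsSecurePayment: boolean;
+}
+
+// Weighted score mirroring the "Risk Scoring" table above.
+function riskScore(f: ScanFindings): number {
+  let score = 0;
+  if (f.suspiciousPatternCount > 0) score += 2; // suspicious content patterns (weight 2)
+  if (f.hasInsecureLinks) score += 1;           // insecure HTTP links
+  if (f.hasExcessiveExternalLinks) score += 1;  // excessive external links
+  if (f.hasSuspiciousTld) score += 2;           // suspicious TLD (weight 2)
+  if (!f.hasContactInfo) score += 1;            // missing contact information
+  if (!f.hasPrivacyPolicy) score += 1;          // missing privacy policy
+  if (!f.mentionsSecurePayment) score += 1;     // no secure payment mentions
+  return score;
+}
+
+// Cut-offs from the "Risk Levels" list: 5+ HIGH, 3-4 MEDIUM, 0-2 LOW.
+function riskLevel(score: number): "HIGH RISK" | "MEDIUM RISK" | "LOW RISK" {
+  if (score >= 5) return "HIGH RISK";
+  if (score >= 3) return "MEDIUM RISK";
+  return "LOW RISK";
+}
+```
+
+With these weights, a site that merely lacks contact info, a privacy policy, and secure-payment mentions already scores 3 points and lands in MEDIUM RISK.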
+## Requirements
+- **Node.js** 16 or later
+- **TypeScript** 5.0+ (installed via npm)
+- **Hyperbrowser API Key** (get one at [hyperbrowser.ai](https://hyperbrowser.ai))
-**[🔑 Get your free Hyperbrowser API key →](https://hyperbrowser.ai)**
+## Learn More
-Built with ❤️ and **[Hyperbrowser.ai](https://hyperbrowser.ai)** – The future of web automation
+- **Hyperbrowser Documentation:** [https://docs.hyperbrowser.ai](https://docs.hyperbrowser.ai)
+- **Hyperbrowser Discord:** [https://discord.gg/zsYzsgVRjh](https://discord.gg/zsYzsgVRjh)
+- **Support:** info@hyperbrowser.ai
-[🌟 Star us on GitHub](https://github.com/hyperbrowserai/) • [📖 Documentation](https://docs.hyperbrowser.ai)
+## License
-
+MIT diff --git a/site-graph/README.md b/site-graph/README.md index 7656584..d647236 100644 --- a/site-graph/README.md +++ b/site-graph/README.md @@ -1,6 +1,8 @@ # Site Graph Crawler -A TypeScript-based web crawler that generates visual site maps, identifies orphan pages, and analyzes page sizes using the Hyperbrowser API. +**Built with [Hyperbrowser](https://hyperbrowser.ai)** + +A TypeScript-based web crawler that generates visual site maps, identifies orphan pages, and analyzes page sizes using the Hyperbrowser Crawl API. ## Features @@ -25,19 +27,28 @@ A TypeScript-based web crawler that generates visual site maps, identifies orpha npm install ``` -## Getting Your API Key +## Get an API Key + +Get your Hyperbrowser API key at **[https://hyperbrowser.ai](https://hyperbrowser.ai)** + +## Quick Start + +```bash +# Install dependencies +npm install -1. Visit [hyperbrowser.ai](https://hyperbrowser.ai) -2. Sign up for an account or log in -3. Navigate to your dashboard/API section -4. Generate a new API key -5. Copy the API key for use in the next step +# Set up environment variables +export HYPERBROWSER_API_KEY="your_api_key_here" + +# Run the application +npx ts-node site-graph.ts +``` ## Configuration -Create a `.env` file in the project root and add your Hyperbrowser API key: +Create a `.env` file in the project root: -```env +```bash HYPERBROWSER_API_KEY=your_api_key_here ``` @@ -46,8 +57,6 @@ HYPERBROWSER_API_KEY=your_api_key_here ### Running the Crawler ```bash -npm run dev -# or npx ts-node site-graph.ts ``` @@ -109,28 +118,56 @@ A table showing the top 10 largest pages by content size, which can help identif - Content that could be optimized - Resource-heavy pages that need attention -## Technical Details +## API Reference -- **Language**: TypeScript with ES modules -- **Crawling**: Uses Hyperbrowser's cloud-based crawling service -- **Domain Parsing**: Uses `tldts` for reliable domain extraction -- **Output**: Styled with `chalk` and `cli-table3` for beautiful terminal display +Uses **Hyperbrowser's official API methods**: -## Configuration Options +```typescript +import { Hyperbrowser } from '@hyperbrowser/sdk'; -You can modify the crawler behavior by editing the `StartCrawlJobParams` in `site-graph.ts`: +const client = new Hyperbrowser({ apiKey: HB_KEY }); -```typescript +// Start crawling and wait for completion const crawlResult = await client.crawl.startAndWait({ url: target, - maxPages: depth * 10, // Maximum pages to crawl - followLinks: true, // Follow links to discover new pages + maxPages: depth * 10, + followLinks: true, scrapeOptions: { - formats: ['links'] // Extract link information + formats: ['links', 'html', 'markdown'] } }); ``` +## Technical Details + +- **Language**: TypeScript with ES modules +- **Crawling**: Uses Hyperbrowser's cloud-based Crawl API via `crawl.startAndWait()` +- **Domain Parsing**: Uses `tldts` for reliable domain extraction +- **Output**: Styled with `chalk` and `cli-table3` for beautiful terminal display + +## Development + +### Project Structure + +``` +site-graph/ +├── site-graph.ts # Main application file +├── package.json # Project dependencies and scripts +├── tsconfig.json # TypeScript configuration +├── .env # Environment variables (create this) +└── README.md # This file +``` + +### Architecture + +Single-file TypeScript implementation (~72 LOC): + +- **Hyperbrowser SDK**: Cloud-based web crawling via `crawl.startAndWait()` +- **Interactive CLI**: readline-based user input with colored output +- **Domain Filtering**: Uses `tldts` to 
ensure same-domain link following +- **Graph Analysis**: Builds site map, detects orphan pages, and analyzes page sizes +- **Formatted Output**: Beautiful terminal output with `chalk` and `cli-table3` + ## Troubleshooting ### Common Issues @@ -152,19 +189,23 @@ const crawlResult = await client.crawl.startAndWait({ - Verify the site allows crawling (check robots.txt) - Try a smaller depth/maxPages value -## Dependencies +### Getting Help -- `@hyperbrowser/sdk` - Cloud-based web crawling service -- `chalk` - Terminal styling and colors -- `cli-table3` - ASCII table formatting -- `dotenv` - Environment variable management -- `tldts` - Domain parsing and validation -- `readline/promises` - Interactive command-line input +- Check the [Hyperbrowser documentation](https://docs.hyperbrowser.ai) for API-related issues +- Ensure your API key has sufficient credits +- Verify that the target URL is accessible and allows crawling +- Join the [Discord community](https://discord.gg/zsYzsgVRjh) for support -## License +## Dependencies -ISC +- **@hyperbrowser/sdk**: Hyperbrowser SDK for cloud-based web crawling +- **chalk**: Terminal styling and colors +- **cli-table3**: ASCII table formatting +- **dotenv**: Environment variable management +- **tldts**: Domain parsing and validation +- **commander**: CLI framework (dependency) +- **TypeScript**: Type-safe development -## Contributing +--- -Feel free to submit issues and enhancement requests! +Follow [@hyperbrowser](https://x.com/hyperbrowser) for updates. diff --git a/site2prompt/README.md b/site2prompt/README.md index d196442..6cb6da0 100644 --- a/site2prompt/README.md +++ b/site2prompt/README.md @@ -2,38 +2,247 @@ # site2prompt -Convert websites into AI training datasets. Scrape, clean, and optimize web content for LLM fine-tuning. +Convert websites into AI-ready training datasets. Scrape, clean, deduplicate, and optimize web content for LLM fine-tuning with intelligent token budgeting. + +## Overview + +`site2prompt` is a command-line tool that transforms web content into prompt-ready formats optimized for training and fine-tuning large language models. It handles the entire pipeline from scraping to deduplication, compression, and export in multiple formats. ## Why Hyperbrowser? [Hyperbrowser](https://hyperbrowser.ai) is the **Internet for AI** — purpose-built for developers creating AI agents and automating web tasks. Skip the infrastructure headaches and focus on building. -## Quick Start +## Features -1. **Get your API key**: https://hyperbrowser.ai -2. **Install**: `npm install` -3. **Configure**: Add `HYPERBROWSER_API_KEY` to `.env` -4. 
**Run**: `ts-node site2prompt.ts --urls urls.txt --budget 4000` +- **Powerful web scraping** using Hyperbrowser SDK with automatic rendering and content extraction +- **Smart content chunking** - Breaks content into optimized blocks (≤120 tokens each) +- **Token budget management** - Control total output size with configurable token limits +- **Intelligent deduplication** - Uses Jaccard similarity (80% threshold) to eliminate redundant content +- **Dual compression modes**: + - **Heuristic**: Fast keyword-based extraction of important content + - **LLM-powered**: OpenAI GPT-4 compression for maximum quality (with `--llm` flag) +- **Multiple export formats**: + - **JSONL**: One prompt per line with metadata (perfect for fine-tuning) + - **CSV**: Tabular format with URL, title, content, and token counts + - **Markdown**: Citation list with source URLs + - **JSON**: Detailed statistics about the scraping process +- **Batch processing** - Process multiple URLs from a file or command line -## Features +## Prerequisites + +### Required +- Node.js (v14 or higher) +- **HYPERBROWSER_API_KEY**: Get your API key at [hyperbrowser.ai](https://hyperbrowser.ai) -✨ **Instant web scraping** with Hyperbrowser's official SDK -🧠 **Smart content optimization** (≤120 tokens per block) -🔄 **Auto-deduplication** using Jaccard similarity -🤖 **OpenAI compression** with `--llm` flag -📊 **Multiple exports**: JSONL, CSV, Markdown, JSON +### Optional +- **OPENAI_API_KEY**: Required only if using `--llm` flag for OpenAI-powered compression + +## Installation + +```bash +# Clone or navigate to the site2prompt directory +cd site2prompt + +# Install dependencies +npm install +``` + +## Configuration + +Create a `.env` file in the site2prompt directory: + +```env +HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here + +# Optional: Only needed if using --llm flag +OPENAI_API_KEY=your_openai_api_key_here +``` ## Usage +### Basic Usage + ```bash -# Scrape URLs and create training data -ts-node site2prompt.ts --urls urls.txt --budget 4000 --llm +# Process URLs from a file +ts-node site2prompt.ts --urls urls.txt --budget 4000 -# Quick single URL +# Process a single URL ts-node site2prompt.ts --url https://docs.hyperbrowser.ai + +# Process multiple URLs +ts-node site2prompt.ts --url https://example.com --url https://another.com --budget 5000 +``` + +### Advanced Usage + +```bash +# Use OpenAI compression for higher quality +ts-node site2prompt.ts --urls urls.txt --budget 4000 --llm + +# Specify custom output directory +ts-node site2prompt.ts --urls urls.txt --out my-dataset --budget 8000 + +# Process with default 8000 token budget +ts-node site2prompt.ts --urls urls.txt +``` + +### Command-Line Options + +| Option | Type | Default | Description | +|--------|------|---------|-------------| +| `--url` | string[] | - | Single or multiple URLs to scrape (repeatable) | +| `--urls` | string | - | Path to file containing newline-separated URLs | +| `--budget` | number | 8000 | Maximum total tokens to include in output | +| `--out` | string | `distill` | Output directory for generated files | +| `--llm` | boolean | false | Enable OpenAI GPT-4 compression (requires OPENAI_API_KEY) | + +### URL File Format + +Create a text file (e.g., `urls.txt`) with one URL per line: + +``` +https://docs.hyperbrowser.ai +https://github.com/hyperbrowser +https://stackoverflow.com/questions/tagged/web-scraping +https://www.npmjs.com/package/@hyperbrowser/sdk +``` + +## Output Files + +All output files are generated in the specified output directory 
(default: `distill/`): + +### 1. prompts.jsonl +JSONL format with one prompt per line, ideal for LLM training: +```json +{"prompt":"Content block here...","metadata":{"url":"https://example.com","title":"Page Title","tokens":95}} +``` + +### 2. prompts.csv +CSV format with headers: +```csv +url,title,content,tokens +"https://example.com","Page Title","Content here...",95 +``` + +### 3. citations.md +Markdown file with numbered citations: +```markdown +# Citations + +1. [Page Title](https://example.com) +2. [Another Page](https://another.com) +``` + +### 4. stats.json +Statistics about the scraping process: +```json +{ + "totalUrls": 4, + "successfulScrapes": 4, + "failedScrapes": 0, + "totalTokens": 3938, + "deduplicatedBlocks": 74, + "finalBlocks": 34, + "llmCompressed": true +} +``` + +## How It Works + +1. **Scraping**: Uses Hyperbrowser SDK to fetch and render web pages, extracting both markdown and HTML content +2. **Cleaning**: Removes navigation, footers, scripts, and other non-content elements using Cheerio +3. **Extraction**: Pulls meaningful content including headings, paragraphs, lists, and code blocks +4. **Compression**: + - Heuristic mode: Filters for important keywords and structural elements + - LLM mode: Uses GPT-4 to compress content while preserving key technical details +5. **Chunking**: Breaks content into blocks of ≤120 tokens each for optimal LLM consumption +6. **Deduplication**: Calculates Jaccard similarity between blocks and removes near-duplicates (≥80% similar) +7. **Budget Control**: Selects blocks until reaching the specified token budget +8. **Export**: Generates multiple output formats for different use cases + +## Use Cases + +- **Building domain-specific AI models**: Create training datasets from specialized documentation +- **Documentation fine-tuning**: Convert technical docs into training data for coding assistants +- **Knowledge base creation**: Extract and structure information from multiple sources +- **Dataset generation**: Prepare web content for supervised fine-tuning or RAG systems +- **Content analysis**: Deduplicate and compress large amounts of web content + +## Code Structure + +``` +site2prompt/ +├── site2prompt.ts # Main CLI application +├── package.json # Dependencies and scripts +├── tsconfig.json # TypeScript configuration +├── urls.txt # Example URL list +├── distill/ # Default output directory +│ ├── prompts.jsonl # Training data in JSONL format +│ ├── prompts.csv # Training data in CSV format +│ ├── citations.md # Source citations +│ └── stats.json # Processing statistics +└── README.md # This file ``` -**Perfect for**: Building domain-specific AI models, creating training datasets from documentation, generating fine-tuning data. 
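+### Deduplication Sketch
+
+Steps 6 and 7 above are easiest to picture in code. The shipped `jaccardSimilarity()` and `deduplicateBlocks()` may tokenize differently; this minimal sketch only illustrates the word-set Jaccard comparison and the 80% drop threshold described above.
+
+```typescript
+// Jaccard similarity over word sets: |intersection| / |union|.
+function jaccardSimilarity(a: string, b: string): number {
+  const setA = new Set(a.toLowerCase().split(/\s+/));
+  const setB = new Set(b.toLowerCase().split(/\s+/));
+  let intersection = 0;
+  for (const word of setA) if (setB.has(word)) intersection++;
+  const union = setA.size + setB.size - intersection;
+  return union === 0 ? 0 : intersection / union;
+}
+
+// Keep a block only if it is less than 80% similar to every block kept so far.
+function deduplicateBlocks(blocks: string[], threshold = 0.8): string[] {
+  const kept: string[] = [];
+  for (const block of blocks) {
+    if (!kept.some((k) => jaccardSimilarity(k, block) >= threshold)) {
+      kept.push(block);
+    }
+  }
+  return kept;
+}
+```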
+### Key Functions + +- `cleanHtml()`: Removes unwanted HTML elements and extracts meaningful content +- `createBlocks()`: Splits content into token-limited blocks +- `deduplicateBlocks()`: Removes similar content using Jaccard similarity +- `compressWithLLM()`: Uses OpenAI to compress content intelligently +- `compressHeuristic()`: Fast keyword-based compression fallback +- `countTokens()`: Rough token estimation (1 token ≈ 4 characters) +- `jaccardSimilarity()`: Calculates content similarity for deduplication + +## Troubleshooting + +### "HYPERBROWSER_API_KEY environment variable is required" +- Ensure you have created a `.env` file with your API key +- Or export it in your shell: `export HYPERBROWSER_API_KEY=your_key` + +### "No URLs provided" +- Provide at least one URL using `--url` flag or `--urls` flag pointing to a file +- Check that your URLs file has valid HTTP/HTTPS URLs + +### OpenAI compression not working +- Verify `OPENAI_API_KEY` is set in `.env` or environment +- Ensure you're using the `--llm` flag +- Check your OpenAI API quota and billing status + +### Low success rate +- Some websites may block automated scraping +- Try reducing concurrent requests or adding delays +- Check network connectivity and URL validity + +## Examples + +### Example 1: Build a documentation dataset +```bash +# Create a file with documentation URLs +cat > docs.txt << EOF +https://docs.hyperbrowser.ai +https://developer.mozilla.org/en-US/docs/Web +https://nodejs.org/docs/latest/api/ +EOF + +# Process with LLM compression +ts-node site2prompt.ts --urls docs.txt --budget 10000 --llm --out docs-dataset +``` + +### Example 2: Quick single-page extraction +```bash +ts-node site2prompt.ts --url https://github.com/hyperbrowser --budget 2000 +``` + +### Example 3: Multiple URLs with custom output +```bash +ts-node site2prompt.ts \ + --url https://example.com/page1 \ + --url https://example.com/page2 \ + --url https://example.com/page3 \ + --budget 5000 \ + --out my-training-data +``` --- diff --git a/site2rag/README.md b/site2rag/README.md index 3447b1e..98182a6 100644 --- a/site2rag/README.md +++ b/site2rag/README.md @@ -1,81 +1,55 @@ -# Site2RAG +**Built with [Hyperbrowser](https://hyperbrowser.ai)** -A CLI tool that uses **Hyperbrowser** to scrape webpage content, automatically cleans boilerplate, and outputs token-budgeted chunks ready for RAG or embedding pipelines. +# site2rag -## Features - -- 🌐 **Hyperbrowser Integration**: Uses Hyperbrowser's powerful browser automation for reliable scraping -- 🔍 **Web Scraping**: Fetches rendered HTML from any webpage with anti-bot protection bypass -- 🧹 **Content Cleaning**: Automatically removes boilerplate (nav, header, footer, scripts) -- ✂️ **Smart Chunking**: Splits content into token-budgeted chunks (configurable) -- 📊 **Multiple Output Formats**: JSON, Markdown, or human-readable summary -- 🎯 **RAG-Ready**: Perfect for ingestion into embedding pipelines - -## Prerequisites - -1. **Hyperbrowser Account**: Sign up at [hyperbrowser.ai](https://hyperbrowser.ai/) to get your API key -2. **Node.js**: Version 16 or higher - -## Setup +Transform any webpage into RAG-ready chunks. Scrape, clean, and intelligently chunk web content for embedding pipelines and vector databases. -### 1. Install Dependencies +## Why Hyperbrowser? -```bash -npm install -``` +[Hyperbrowser](https://hyperbrowser.ai) is the **Internet for AI** — purpose-built for developers creating AI agents and automating web tasks. Skip the infrastructure headaches and focus on building. -### 2. 
Configure API Key +## Quick Start -Get your Hyperbrowser API key from [hyperbrowser.ai](https://hyperbrowser.ai/) and set it as an environment variable: +1. **Get your API key**: https://hyperbrowser.ai +2. **Install**: `npm install` +3. **Configure**: Add `HYPERBROWSER_API_KEY` to `.env` +4. **Run**: `npx ts-node index.ts --url https://example.com` -```bash -# Create a .env file -echo "HYPERBROWSER_API_KEY=your_api_key_here" > .env -``` - -Or export it directly: +## Features -```bash -export HYPERBROWSER_API_KEY=your_api_key_here -``` +✨ **Powered by Hyperbrowser SDK** for reliable web scraping +🧹 **Auto-cleanup** removes navigation, headers, and boilerplate +✂️ **Smart chunking** with configurable token budgets +📊 **Multiple output formats**: JSON, Markdown, or summary +🎯 **RAG-optimized** chunks ready for embeddings ## Usage -### Basic Usage - ```bash +# Basic scraping with summary npx ts-node index.ts --url https://example.com -``` - -### Command Line Options - -- `--url, -u`: **Required.** URL to scrape -- `--json`: Output chunks in JSON format for programmatic use -- `--md`: Output chunks in Markdown format with headers and source citations -- `--maxTokens`: Maximum tokens per chunk (default: 1000) -### Examples +# JSON output for RAG pipelines +npx ts-node index.ts --url https://docs.example.com --json --maxTokens 500 -**Basic scraping with summary:** -```bash -npx ts-node index.ts --url https://blog.example.com +# Markdown format with citations +npx ts-node index.ts --url https://blog.example.com --md --maxTokens 1500 ``` -**JSON output for API integration:** -```bash -npx ts-node index.ts --url https://docs.example.com --json --maxTokens 500 -``` +### CLI Options -**Markdown output for documentation:** -```bash -npx ts-node index.ts --url https://news.example.com --md --maxTokens 1500 -``` +| Option | Alias | Description | Default | +|--------|-------|-------------|---------| +| `--url` | `-u` | URL to scrape (required) | - | +| `--json` | - | Output as JSON for API integration | false | +| `--md` | - | Output as Markdown with citations | false | +| `--maxTokens` | - | Maximum tokens per chunk | 1000 | + +**Perfect for**: Building RAG applications, creating vector database content, embedding pipelines, AI knowledge bases. ## Output Formats ### JSON Format -Perfect for API integrations and automated workflows: - ```json { "source": "https://example.com", @@ -92,8 +66,6 @@ Perfect for API integrations and automated workflows: ``` ### Markdown Format -Great for documentation and human-readable output: - ```markdown # Context Pack for https://example.com @@ -105,8 +77,6 @@ Main content chunk... ``` ### Summary Format (Default) -Quick overview of extracted chunks: - ``` Context Chunks ─────────────────────────── @@ -117,35 +87,31 @@ Chunk 3 — 492 tokens Run with --md or --json for full output ``` -## Powered by Hyperbrowser +## How It Works -This tool leverages [Hyperbrowser](https://hyperbrowser.ai/), a powerful browser automation platform that provides: +1. **Scrape**: Hyperbrowser fetches fully-rendered HTML (handles JavaScript, anti-bot protection) +2. **Clean**: Removes navigation, headers, footers, scripts, and boilerplate +3. **Parse**: Extracts paragraphs and filters noise (minimum 50 characters) +4. **Chunk**: Intelligently splits content by token budget while preserving context +5. 
**Output**: Formats chunks for your pipeline (JSON/Markdown/Summary)
-- **Rendered Content**: Gets fully rendered HTML including JavaScript-generated content
-- **Anti-Bot Bypass**: Handles modern anti-scraping measures automatically
-- **Reliable Scraping**: Built-in retries and error handling
-- **Scalable**: Can handle high-volume scraping needs
+## Environment Setup
-## Use Cases
+```bash
+# Create .env file
+echo "HYPERBROWSER_API_KEY=your_api_key_here" > .env
-- **RAG Pipelines**: Generate embedding-ready content chunks
-- **Content Analysis**: Extract and analyze web content at scale
-- **Documentation**: Convert web content to structured markdown
-- **Data Integration**: JSON output for seamless API integration
-- **Research**: Gather and organize web content for analysis
+# Or export directly
+export HYPERBROWSER_API_KEY=your_api_key_here
+```
## Development
-### Build
-```bash
-npm run build
-```
-
-### Test
```bash
-npm start -- --url https://example.com
+npm run build   # Compile TypeScript
+npm start       # Run with default options
```
-## License
+---
-ISC
\ No newline at end of file
+🚀 **Scale your AI development** with [Hyperbrowser](https://hyperbrowser.ai) | Follow @hyperbrowser
\ No newline at end of file
diff --git a/tweet-fetcher/README.md b/tweet-fetcher/README.md
index e69de29..d750291 100644
--- a/tweet-fetcher/README.md
+++ b/tweet-fetcher/README.md
@@ -0,0 +1,252 @@
+**Built with [Hyperbrowser](https://hyperbrowser.ai)**
+
+# Tweet Fetcher
+
+A powerful CLI tool for extracting tweets and followers from Twitter/X profiles using Hyperbrowser's browser automation and AI-powered data extraction. Perfect for social media analysis, research, and lead generation.
+
+## Why Hyperbrowser?
+
+[Hyperbrowser](https://hyperbrowser.ai) is the **Internet for AI** — purpose-built for developers creating AI agents and automating web tasks. Skip the infrastructure headaches and focus on building.
+
+## Features
+
+- **Smart Extraction**: Uses Hyperbrowser's Extract API with structured schemas for reliable data extraction
+- **Tweet Collection**: Fetches the 10 most recent tweets with engagement metrics (likes, retweets, replies)
+- **Follower Scraping**: Extracts verified followers from Twitter profiles
+- **Persistent Sessions**: Creates and reuses browser profiles for consistent scraping
+- **Proxy Support**: Built-in proxy configuration for reliable access
+- **Stealth Mode**: Uses anti-detection features (stealth, adblock, cookie handling)
+- **CLI Interface**: Rich terminal UI with progress indicators and colored output
+
+## Quick Start
+
+1. **Get your API key**: https://hyperbrowser.ai
+2. **Install**: `uv sync` (or `pip install .`)
+3. **Configure**: Add `HYPERBROWSER_API_KEY` to `.env`
+4.
**Run**: `python main.py tweets elonmusk`
+
+## Installation
+
+```bash
+# Install dependencies using uv (recommended)
+uv sync
+
+# Or using pip
+pip install click hyperbrowser openai pydantic pydantic-settings python-dotenv rich
+
+# Set environment variables
+export HYPERBROWSER_API_KEY="your_hyperbrowser_api_key"
+```
+
+## Configuration
+
+Create a `.env` file in the project directory:
+
+```bash
+# Required
+HYPERBROWSER_API_KEY=your_hyperbrowser_api_key  # Get at https://hyperbrowser.ai
+
+# Optional: Proxy configuration for reliable access
+PROXY_SERVER_URL=http://your-proxy:port
+PROXY_SERVER_USERNAME=your_username
+PROXY_SERVER_PASSWORD=your_password
+
+# Optional: Reuse existing profile ID
+PROFILE_ID=your_profile_id
+```
+
+## Usage
+
+### Extract Tweets
+
+Fetch the 10 most recent tweets from a profile:
+
+```bash
+python main.py tweets elonmusk
+```
+
+Output saved to: `elonmusk_tweets.json`
+
+**Example output structure:**
+```json
+{
+  "tweets": [
+    {
+      "content": "Tweet text here...",
+      "num_likes": 15234,
+      "num_retweets_and_quotes": 1234,
+      "num_replies": 567,
+      "published_at": "2025-09-29T10:30:00Z"
+    }
+  ]
+}
+```
+
+### Extract Followers
+
+Fetch verified followers from a profile:
+
+```bash
+python main.py followers elonmusk
+```
+
+Output saved to: `elonmusk_followers.json`
+
+**Example output structure:**
+```json
+{
+  "followers": ["@user1", "@user2", "@user3", ...]
+}
+```
+
+### Extract Both
+
+Get both tweets and followers in one command:
+
+```bash
+python main.py all elonmusk
+```
+
+### Session Management
+
+#### Create a New Session
+
+For initial setup or when you need to authenticate manually:
+
+```bash
+python main.py session
+```
+
+This creates a persistent browser profile and returns a live session URL where you can:
+- Log into Twitter/X manually
+- Solve any CAPTCHAs
+- Complete any verification steps
+
+The session ID and profile ID are saved locally for reuse.
+
+#### Stop a Session
+
+Close an active browser session:
+
+```bash
+# Stop the most recent session
+python main.py stop
+
+# Stop a specific session
+python main.py stop --session-id <session-id>
+```
+
+## CLI Commands
+
+```bash
+python main.py [command] [options]
+
+Commands:
+  session     Create a new browser session for manual authentication
+  tweets      Extract tweets from a Twitter profile
+  followers   Extract verified followers from a Twitter profile
+  all         Extract both tweets and followers
+  stop        Stop an active browser session
+
+Options:
+  --session-id TEXT    Session ID to stop (for stop command)
+  --help               Show help message
+```
+
+## How It Works
+
+1. **Profile Management**: Creates or reuses a persistent browser profile stored in `.profile`
+2. **Session Creation**: Launches a browser session with stealth mode, proxy, and adblock enabled
+3. **Schema-Based Extraction**: Uses Pydantic models to define expected data structure:
+   - `AllTweets`: List of tweets with content and engagement metrics
+   - `AllFollowers`: List of follower handles
+4. **AI Extraction**: Hyperbrowser's Extract API uses AI to parse the rendered page and extract structured data
+5.
**JSON Output**: Results are saved to JSON files for easy processing
+
+## Project Structure
+
+- **`main.py`** - CLI entrypoint with Click commands and extraction logic
+- **`config.py`** - Settings management using Pydantic with `.env` file support
+- **`schemas.py`** - Pydantic models defining data structures for extraction
+- **`.profile`** - Auto-generated file storing persistent profile ID
+- **`.session`** - Auto-generated file storing active session ID
+
+## Data Schemas
+
+### Tweet Schema
+```python
+class Tweet(BaseModel):
+    content: str                   # Tweet text
+    num_likes: int                 # Like count
+    num_retweets_and_quotes: int   # Retweet + quote count
+    num_replies: int               # Reply count
+    published_at: str              # Timestamp
+```
+
+### Followers Schema
+```python
+class AllFollowers(BaseModel):
+    followers: list[str]           # List of Twitter handles
+```
+
+## Advanced Configuration
+
+### Custom Proxy Setup
+
+Configure proxy settings in `.env` for reliable access:
+
+```bash
+PROXY_SERVER_URL=http://proxy.example.com:8080
+PROXY_SERVER_USERNAME=myusername
+PROXY_SERVER_PASSWORD=mypassword
+```
+
+### Profile Persistence
+
+The tool automatically creates and persists browser profiles to maintain:
+- Login sessions
+- Cookies and localStorage
+- Browser fingerprint consistency
+
+Profile ID is stored in `.profile` and reused across runs.
+
+## Use Cases
+
+**Perfect for**: Social media research, competitor analysis, influencer tracking, lead generation, sentiment analysis, trend monitoring, audience research.
+
+## Development
+
+```bash
+# Install dev dependencies
+uv sync --dev
+
+# Format code
+isort .
+ruff check .
+
+# Run with different profiles
+PROFILE_ID=custom_profile python main.py tweets example
+```
+
+## Technical Stack
+
+- **[hyperbrowser](https://pypi.org/project/hyperbrowser/)**: Python SDK for browser automation and AI extraction
+- **Click**: CLI framework with rich command structure
+- **Rich**: Beautiful terminal UI with progress indicators
+- **Pydantic**: Data validation and settings management
+- **Python 3.13+**: Modern Python features
+
+## Troubleshooting
+
+### Rate Limiting
+If you encounter rate limits, use the `session` command to authenticate manually and solve any CAPTCHAs.
+
+### Authentication Required
+Run `python main.py session` to create a live session where you can log into Twitter manually. The session will be persisted for future runs.
+
+### Proxy Issues
+Ensure your proxy credentials are correct in `.env` and the proxy server is accessible.
+
+---
+
+🚀 **Scale your AI development** with [Hyperbrowser](https://hyperbrowser.ai) | Follow [@hyperbrowser](https://x.com/hyperbrowser)
\ No newline at end of file
diff --git a/vibe-posting-bot/README.md b/vibe-posting-bot/README.md
index 4825257..5855832 100644
--- a/vibe-posting-bot/README.md
+++ b/vibe-posting-bot/README.md
@@ -1,47 +1,72 @@
-# Vibe Posting Bot 🤖
+# Vibe Posting Bot
-An intelligent social media bot that automatically scrapes tech news sources, detects new content, and creates authentic, human-like posts for Typefully.
+**Built with [Hyperbrowser](https://hyperbrowser.ai)**
-## Features
+> An intelligent social media bot that automatically scrapes tech news sources, detects new content, and creates authentic, human-like posts for Typefully.
-- **🕐 Periodic cron job**: Automatically runs every 3 hours (configurable) -- **🔍 Smart change detection**: Only posts when new content is detected -- **🎭 Multiple vibes**: Choose from founder, dev, investor, or casual tones -- **📝 Typefully integration**: Automatically creates drafts in your Typefully account -- **🎯 Authentic tone**: Uses advanced prompts to sound human, not robotic -- **🧪 Dry run mode**: Test without actually posting +[![TypeScript](https://img.shields.io/badge/TypeScript-007ACC?style=flat&logo=typescript&logoColor=white)](https://typescriptlang.org) +[![Hyperbrowser](https://img.shields.io/badge/Hyperbrowser-00FF88?style=flat)](https://hyperbrowser.ai) +[![OpenAI](https://img.shields.io/badge/OpenAI-000000?style=flat&logo=openai&logoColor=white)](https://openai.com) -## Setup +--- -1. **Install dependencies**: - ```bash - npm install - ``` +## What It Does -2. **Create `.env` file** with your API keys: - ```bash - # Required API Keys - HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here - OPENAI_API_KEY=your_openai_api_key_here - TYPEFULLY_API_KEY=your_typefully_api_key_here - ``` +Transform your social media presence with an AI-powered bot that: -3. **Get your API keys**: - - [Typefully API key](https://typefully.com/settings/api) - - [OpenAI API key](https://platform.openai.com/api-keys) - - [Hyperbrowser API key](https://hyperbrowser.ai) +🕐 **Automated Scheduling** - Runs on a cron schedule (default: every 3 hours) +🔍 **Smart Change Detection** - Only posts when new content is detected using content hashing +🎭 **Multiple Personalities** - Choose from founder, dev, investor, or casual vibes +📝 **Typefully Integration** - Automatically creates drafts in your Typefully account +🎯 **Authentic Voice** - Uses GPT-4 with advanced prompts to sound human, not robotic +🧪 **Dry Run Mode** - Test the bot without actually posting +🔄 **Content Tracking** - Maintains state to avoid duplicate posts + +--- + +## Quick Start + +### 1. Get API Keys + +You'll need three API keys to run this bot: +- **Hyperbrowser API key** - Get one at [hyperbrowser.ai](https://hyperbrowser.ai) +- **OpenAI API key** - Get one at [platform.openai.com/api-keys](https://platform.openai.com/api-keys) +- **Typefully API key** - Get one at [typefully.com/settings/api](https://typefully.com/settings/api) + +### 2. Installation + +```bash +cd vibe-posting-bot +npm install +``` + +### 3. Environment Setup + +Create a `.env` file in the project directory: + +```bash +# Required API Keys +HYPERBROWSER_API_KEY=your_hyperbrowser_api_key_here +OPENAI_API_KEY=your_openai_api_key_here +TYPEFULLY_API_KEY=your_typefully_api_key_here +``` + +**Note:** All three API keys are required for the bot to function properly. 
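+### 4. Verify Your Setup (Optional)
+
+Before the first scheduled run, you can fail fast on missing keys with a small standalone check. This snippet is not part of the bot itself; it is a minimal sketch that only assumes the three variable names listed above.
+
+```typescript
+import "dotenv/config";
+
+// The bot needs all three keys; report anything missing and exit.
+const required = ["HYPERBROWSER_API_KEY", "OPENAI_API_KEY", "TYPEFULLY_API_KEY"];
+const missing = required.filter((name) => !process.env[name]);
+
+if (missing.length > 0) {
+  console.error(`Missing required environment variables: ${missing.join(", ")}`);
+  process.exit(1);
+}
+
+console.log("All required API keys are set.");
+```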
+ +--- ## Usage -### Quick Start Scripts +### Quick Start Examples + ```bash # Run with default settings (founder vibe, every 3 hours) npm run start -# Test without posting (dry run) +# Test without posting (dry run mode) npm run dry-run -# Different vibes +# Different personality vibes npm run founder # Tech founder perspective npm run dev-vibe # Developer perspective npm run investor # VC/investor perspective @@ -49,6 +74,7 @@ npm run casual # Casual tech enthusiast ``` ### Advanced Usage + ```bash # Custom cron schedule (every hour) npm run dev -- --schedule "0 * * * *" @@ -57,55 +83,102 @@ npm run dev -- --schedule "0 * * * *" npm run dev -- --tone thread # Thread format npm run dev -- --tone one-liner # Single tweet (default) -# Combine options +# Combine multiple options npm run dev -- --vibe dev --tone thread --schedule "0 */2 * * *" + +# Dry run with custom settings +npm run dev -- --vibe investor --dryRun ``` +--- + ## Monitored Sources -The bot currently monitors: -- Anthropic News -- OpenAI Blog -- Google DeepMind Technology -- Hugging Face Blog -- Hacker News +The bot monitors these tech news sources by default: + +- **[Anthropic News](https://www.anthropic.com/news)** - Latest from Anthropic/Claude +- **[OpenAI Blog](https://openai.com/blog)** - OpenAI announcements and research +- **[Google DeepMind](https://deepmind.google/technology/)** - DeepMind technology updates +- **[Hugging Face Blog](https://huggingface.co/blog)** - Open source AI/ML developments +- **[Hacker News](https://news.ycombinator.com/)** - Tech community discussions + +You can customize these URLs by editing the `urls` array in `vibe-posting-bot.ts`. + +--- ## How It Works -1. **Content Detection**: Scrapes configured URLs and creates hashes of content -2. **Change Detection**: Compares current content with previously seen content -3. **Smart Generation**: Uses GPT-4 with context-aware prompts for authentic voice -4. **Posting**: Creates drafts in Typefully for review before publishing +1. **Content Scraping**: Uses Hyperbrowser SDK to scrape configured URLs and extract markdown content +2. **Change Detection**: Generates SHA-256 content hashes and compares with previously seen content stored in `seen-content.json` +3. **AI Generation**: When new content is detected, uses GPT-4 with personality-specific prompts to generate authentic tweets +4. **Draft Creation**: Posts generated tweets as drafts to Typefully for your review +5. **State Management**: Updates the seen content file to prevent duplicate posts +6. **Continuous Monitoring**: Runs on a cron schedule (default: every 3 hours) + +--- ## Customization ### Adding New Sources -Edit the `urls` array in `vibe-posting-bot.ts`: + +Edit the `urls` array in `vibe-posting-bot.ts` (line 29): + ```typescript const urls = [ "https://your-new-source.com/blog", - // ... existing URLs + "https://www.anthropic.com/news", + // ... other URLs ]; ``` -### Custom Vibes -Add new personality types in the `SYSTEM_PROMPTS` object with multiple prompt variations for variety. +### Creating Custom Vibes + +Add new personality types to the `SYSTEM_PROMPTS` object in `vibe-posting-bot.ts` (starting at line 52). Each vibe should include: +- Multiple prompt variations (for variety and to avoid repetition) +- Clear style guidelines +- Example tone and formatting instructions + +```typescript +const SYSTEM_PROMPTS = { + your_custom_vibe: [ + `Your first prompt variation with style guidelines...`, + `Your second prompt variation...`, + // Add 2-3 variations per vibe + ], + // ... 
other vibes
+};
+```
+
+### Adjusting the Temperature
+
+The bot uses GPT-4 with `temperature: 0.8` for variety (line 356). You can adjust this:
+- Lower (0.3-0.5): More consistent, predictable output
+- Higher (0.8-1.0): More creative, varied output
+
+---
-## Files Generated
+## Generated Files
-- `seen-content.json`: Tracks content hashes to detect changes
-- Console logs with timestamps and operation details
+- **`seen-content.json`** - Tracks content hashes and last-seen timestamps to prevent duplicate posts
+- **Console logs** - Colored terminal output with timestamps, status updates, and generated tweets
-## Tips
+---
-- Start with `--dryRun` to test the bot behavior
-- The bot posts drafts to Typefully - you still control what gets published
-- Each vibe has multiple prompt variations to avoid repetitive content
-- Content is only processed when changes are detected
+## Pro Tips
+
+- **Start with dry run** - Use `--dryRun` flag to test the bot without actually posting
+- **Review before publishing** - The bot creates drafts in Typefully, you control what gets published
+- **Multiple prompt variations** - Each vibe has 3 different prompts that rotate for variety
+- **Change detection only** - Content is only processed when new updates are detected
+- **Graceful shutdown** - Press `Ctrl+C` to stop the bot gracefully
+- **Check logs** - Monitor console output for scraping status and generated tweets
+
+---
## Cron Schedule Format
-The `--schedule` parameter uses standard cron syntax:
+The `--schedule` parameter uses standard cron syntax for scheduling the bot runs:
+
```
# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12)
@@ -114,10 +187,87 @@
# │ │ │ │ ┌───────────── day of the week (0 - 6) (Sunday to Saturday)
# │ │ │ │ │
# * * * * *
+```
-# Examples:
+**Common Examples:**
+
+```bash
"0 */3 * * *"  # Every 3 hours (default)
"0 9 * * *"    # Every day at 9 AM
"0 9,17 * * *" # Every day at 9 AM and 5 PM
"0 9 * * 1-5"  # Every weekday at 9 AM
-```
\ No newline at end of file
+"*/30 * * * *" # Every 30 minutes
+"0 */6 * * *"  # Every 6 hours
+```
+
+---
+
+## Technical Details
+
+### Dependencies
+
+- **[@hyperbrowser/sdk](https://www.npmjs.com/package/@hyperbrowser/sdk)** (v0.51.0) - Browser automation and web scraping
+- **[openai](https://www.npmjs.com/package/openai)** (v5.7.0) - GPT-4 integration for tweet generation
+- **[axios](https://www.npmjs.com/package/axios)** (v1.10.0) - HTTP client for Typefully API
+- **[node-cron](https://www.npmjs.com/package/node-cron)** (v3.0.3) - Cron job scheduling
+- **[yargs](https://www.npmjs.com/package/yargs)** (v18.0.0) - CLI argument parsing
+- **[zod](https://www.npmjs.com/package/zod)** (v3.25.67) - Schema validation
+- **[chalk](https://www.npmjs.com/package/chalk)** (v4.1.2) - Colored terminal output
+- **[dotenv](https://www.npmjs.com/package/dotenv)** (v16.5.0) - Environment variable management
+
+### Architecture
+
+**Single-File Design**: The entire bot is contained in `vibe-posting-bot.ts` (442 lines) for simplicity and easy customization.
+
+**Core Components:**
+
+1. **Content Hashing** - Uses SHA-256 to detect content changes (see the sketch below)
+2. **State Persistence** - JSON file stores seen content hashes and timestamps
+3. **AI Generation** - Random prompt selection from 3 variations per vibe
+4. **Typefully Integration** - REST API calls to create drafts
+5. **Cron Scheduling** - Automatic periodic execution
+6. **Error Handling** - Continues on individual source failures
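+Components 1 and 2 boil down to a hash-and-compare loop. A minimal sketch follows; the helper names and the exact shape of `seen-content.json` are illustrative, not the bot's actual internals:
+
+```typescript
+import { createHash } from "crypto";
+import { existsSync, readFileSync, writeFileSync } from "fs";
+
+const STATE_FILE = "seen-content.json";
+
+type SeenMap = Record<string, { hash: string; lastSeen: string }>;
+
+// SHA-256 of the scraped markdown: cheap to store, cheap to compare.
+function hashContent(markdown: string): string {
+  return createHash("sha256").update(markdown).digest("hex");
+}
+
+// True when the content at `url` changed since the last run; records the new hash.
+function isNewContent(url: string, markdown: string): boolean {
+  const seen: SeenMap = existsSync(STATE_FILE)
+    ? JSON.parse(readFileSync(STATE_FILE, "utf8"))
+    : {};
+
+  const hash = hashContent(markdown);
+  if (seen[url]?.hash === hash) return false; // unchanged, skip generation
+
+  seen[url] = { hash, lastSeen: new Date().toISOString() };
+  writeFileSync(STATE_FILE, JSON.stringify(seen, null, 2));
+  return true;
+}
+```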
+
+**Data Flow:**
+
+```
+URLs → Hyperbrowser Scrape → Content Hash → Change Detection
+                                                   ↓
+Typefully Draft ← GPT-4 Generation ← New Content Detected
+```
+
+---
+
+## Use Cases
+
+Perfect for:
+
+- **Tech Influencers** - Stay active on Twitter without manual content curation
+- **Developer Advocates** - Share relevant tech news in your brand voice
+- **VCs & Investors** - Post about emerging technologies and startups
+- **Tech Companies** - Automated social media presence for company news
+- **Content Creators** - Generate tweet ideas from multiple sources
+
+---
+
+## Troubleshooting
+
+### API Key Issues
+- Ensure all three API keys are set in the `.env` file
+- Test each API key independently before running the bot
+
+### No New Content Detected
+- Check if sources have actually updated since the last run
+- Delete `seen-content.json` to reset change detection
+
+### Typefully Draft Creation Fails
+- Verify your Typefully API key has draft creation permissions
+- Check the API response in console logs for error details
+
+### Bot Not Running on Schedule
+- Ensure the bot process stays running (use `pm2` or similar for production)
+- Verify cron syntax is correct using an online cron validator
+
+---
+
+**Built with [Hyperbrowser](https://hyperbrowser.ai)** | Follow [@hyperbrowser](https://twitter.com/hyperbrowser) for updates
\ No newline at end of file