
🤖 Multimodal Web Research Intelligence Agent

A cutting-edge AI-powered research agent that combines web browsing, screenshot analysis, and NVIDIA's advanced AI models to provide comprehensive multimodal research insights.

NVIDIA AI Stack · FastAPI · Playwright · Python

🌟 Features

🧠 Advanced AI Capabilities

  • NVIDIA Nemotron Integration: Leverages NVIDIA's latest LLM models for advanced reasoning
  • Vision Analysis: Screenshot analysis using NVIDIA Vision models
  • Multimodal Understanding: Combines text and visual content for comprehensive insights

🌐 Web Research Automation

  • Intelligent Browsing: Automated web navigation with Playwright
  • Content Extraction: Smart text extraction and cleaning
  • Screenshot Capture: Automatic visual content capture
  • Multi-URL Processing: Parallel processing of multiple sources

💾 Session Management

  • Memory System: Persistent storage of research sessions
  • Search History: Query tracking and suggestions
  • User Preferences: Customizable settings and preferences

🎨 Modern Interface

  • Beautiful UI: Responsive, modern web interface
  • Real-time Updates: Live progress tracking
  • Interactive Results: Rich formatting and source attribution

πŸ—οΈ Architecture

graph TB
    A[Frontend Interface] --> B[FastAPI Backend]
    B --> C[Agent Core]
    C --> D[Browser Tools]
    C --> E[Vision Analysis]
    C --> F[NVIDIA AI Models]
    D --> G[Playwright Browser]
    E --> H[Vision Models]
    F --> I[Nemotron LLM]
    C --> J[Memory System]
    J --> K[SQLite Database]

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • Node.js (for Playwright)
  • NVIDIA API Access (for full functionality)

Installation

  1. Clone the repository:
git clone <repository-url>
cd multimodal-browser-agent
  2. Install Python dependencies:
pip install -r requirements.txt
  3. Install Playwright browsers:
playwright install
  4. Configure environment variables:
cp .env.example .env
# Edit .env with your NVIDIA API credentials
  5. Start the application:
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
  6. Access the interface:
Open http://localhost:8000 in your browser.

βš™οΈ Configuration

Environment Variables

Create a .env file with the following configuration:

# NVIDIA API Configuration
NVIDIA_API_KEY=your_nvidia_nim_api_key_here
NVIDIA_VISION_ENDPOINT=https://your-vision-api-endpoint.com/api/v1/vision
NIM_ENDPOINT=https://your-nim-endpoint.com/api/v1/generate

# Database Configuration
DATABASE_URL=sqlite:///./agent_memory.db

# FastAPI Configuration
API_HOST=0.0.0.0
API_PORT=8000
DEBUG=True

# Browser Configuration
HEADLESS_BROWSER=True
BROWSER_TIMEOUT=30000

NVIDIA API Setup

  1. Get API Access:

    • Visit NVIDIA Developer
    • Sign up for NIM (NVIDIA Inference Microservices) access
    • Obtain your API key
  2. Configure Endpoints:

    • Update NVIDIA_API_KEY with your API key
    • Set NIM_ENDPOINT to your NIM service URL
    • Configure NVIDIA_VISION_ENDPOINT for vision capabilities
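
As a rough sketch of how these variables might be read at application startup (assuming python-dotenv is installed; the actual loading code lives in the repository):

import os
from dotenv import load_dotenv  # assumes python-dotenv is available

# Load variables from the .env file created from .env.example
load_dotenv()

NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY")
NIM_ENDPOINT = os.getenv("NIM_ENDPOINT")
NVIDIA_VISION_ENDPOINT = os.getenv("NVIDIA_VISION_ENDPOINT")

if not NVIDIA_API_KEY:
    raise RuntimeError("NVIDIA_API_KEY is not set; check your .env file")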

🔧 API Reference

Research Endpoint

POST /agent/research

Perform multimodal web research on provided URLs.

{
  "query": "What are the latest AI trends?",
  "urls": [
    "https://example.com/ai-news",
    "https://example.com/tech-trends"
  ],
  "max_tokens": 512,
  "include_screenshots": true
}

Response:

{
  "result": "Comprehensive analysis results...",
  "sources_analyzed": ["https://example.com/ai-news"],
  "vision_insights": ["Screenshot analysis results..."],
  "processing_time": 15.2,
  "timestamp": "2024-01-15T10:30:00Z"
}
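
The same request can also be sent from the command line; a sketch using curl against a local server:

curl -X POST http://localhost:8000/agent/research \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the latest AI trends?", "urls": ["https://example.com/ai-news"], "max_tokens": 512, "include_screenshots": true}'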

Vision Analysis Endpoint

POST /agent/vision

Analyze uploaded images using NVIDIA vision models.

  • Upload image file
  • Optional query parameter for targeted analysis

Health Check

GET /health

Check system health and service status.
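
For example, from the command line (the exact response fields depend on the running services):

curl http://localhost:8000/health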

🧪 Usage Examples

Basic Research Query

import requests

response = requests.post("http://localhost:8000/agent/research", json={
    "query": "Compare the latest GPU architectures from NVIDIA",
    "urls": [
        "https://www.nvidia.com/en-us/geforce/graphics-cards/",
        "https://www.nvidia.com/en-us/data-center/a100/"
    ],
    "max_tokens": 1024,
    "include_screenshots": True
})

results = response.json()
print(results["result"])

Image Analysis

import requests

with open("screenshot.png", "rb") as f:
    response = requests.post(
        "http://localhost:8000/agent/vision",
        files={"file": f},
        data={"query": "What UI elements are visible in this screenshot?"}
    )

analysis = response.json()
print(analysis["analysis"])

🏭 Production Deployment

Docker Deployment

FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
RUN playwright install --with-deps

COPY . .
EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
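
To build and run the image locally (the image name is illustrative):

docker build -t multimodal-browser-agent .
docker run -p 8000:8000 --env-file .env multimodal-browser-agent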

Cloud Deployment Options

  1. NVIDIA DGX Cloud: Optimal for GPU-accelerated inference
  2. AWS/GCP/Azure: Standard cloud deployment
  3. Kubernetes: Scalable container orchestration
  4. Docker Compose: Multi-service local deployment

πŸ” Development

Project Structure

multimodal-browser-agent/
├── app/
│   ├── main.py              # FastAPI application
│   ├── agent/
│   │   ├── core.py          # Agent orchestration
│   │   ├── browser_tools.py # Playwright automation
│   │   └── vision.py        # Vision model integration
│   ├── models/
│   │   └── schemas.py       # API schemas
│   └── utils/
│       └── memory.py        # Memory management
├── frontend/
│   └── index.html          # Web interface
├── requirements.txt        # Python dependencies
├── .env.example           # Environment template
└── README.md             # Documentation

Development Setup

  1. Create virtual environment:
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate     # Windows
  2. Install dependencies:
pip install -r requirements.txt
pip install pytest pytest-asyncio  # For testing
  3. Run tests:
pytest
  4. Development server:
uvicorn app.main:app --reload --log-level debug

Adding New Features

  1. New Agent Tools: Extend app/agent/ modules
  2. API Endpoints: Add routes to app/main.py (see the sketch below)
  3. Frontend Features: Modify frontend/index.html
  4. Data Models: Update app/models/schemas.py
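
As a sketch of item 2, a new endpoint in app/main.py might look like the following (the route name and request schema are hypothetical; real schemas belong in app/models/schemas.py):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()  # app/main.py already defines the application instance; reuse it there

class SummarizeRequest(BaseModel):  # hypothetical schema for illustration
    url: str
    max_tokens: int = 256

@app.post("/agent/summarize")  # hypothetical route
async def summarize(request: SummarizeRequest):
    # In practice, delegate to the agent orchestration in app/agent/core.py
    return {"url": request.url, "summary": "..."}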

🧪 Testing

Unit Tests

# Run all tests
pytest

# Run specific test file
pytest tests/test_agent.py

# Run with coverage
pytest --cov=app tests/

Integration Tests

# Test API endpoints
pytest tests/test_api.py

# Test browser automation
pytest tests/test_browser.py

Load Testing

# Using Locust
pip install locust
locust -f tests/load_test.py --host=http://localhost:8000
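
tests/load_test.py is not reproduced here; a minimal Locust file along these lines would exercise the health and research endpoints (payload values are illustrative):

from locust import HttpUser, task, between

class AgentUser(HttpUser):
    wait_time = between(1, 3)  # pause 1-3 seconds between simulated requests

    @task
    def health(self):
        self.client.get("/health")

    @task
    def research(self):
        self.client.post("/agent/research", json={
            "query": "What are the latest AI trends?",
            "urls": ["https://example.com/ai-news"],
            "max_tokens": 256,
            "include_screenshots": False,
        })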

📊 Monitoring & Analytics

Built-in Monitoring

  • Health Check: /health endpoint
  • Agent Status: /agent/status endpoint
  • Database Stats: Memory usage and session statistics
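
Both endpoints can be polled programmatically; the exact response fields depend on the implementation:

import requests

health = requests.get("http://localhost:8000/health").json()
status = requests.get("http://localhost:8000/agent/status").json()

print(health)   # overall service health
print(status)   # agent and session statistics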

Logging

The application uses structured logging:

import logging
logger = logging.getLogger(__name__)

# Logs are automatically formatted with timestamps and levels
logger.info("Research completed successfully")
logger.error("NVIDIA API error", extra={"status_code": 500})

Performance Metrics

  • Processing time tracking (see the middleware sketch below)
  • Memory usage monitoring
  • API response times
  • Success/failure rates
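
One common way to capture processing and response times in FastAPI is a timing middleware; a minimal sketch (not necessarily how this project implements it):

import time
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def add_process_time_header(request: Request, call_next):
    # Measure wall-clock time for each request and expose it as a response header
    start = time.perf_counter()
    response = await call_next(request)
    response.headers["X-Process-Time"] = f"{time.perf_counter() - start:.3f}"
    return response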

🔒 Security Considerations

API Security

  • Rate Limiting: Implement request rate limiting
  • Authentication: Add API key authentication for production (see the sketch after this list)
  • Input Validation: Comprehensive input sanitization
  • CORS Configuration: Proper cross-origin settings
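
For the authentication item above, a simple API-key dependency is one option; a sketch only (the header name and SERVICE_API_KEY variable are assumptions, not part of the current codebase):

import os
from fastapi import Depends, FastAPI, HTTPException, Security
from fastapi.security import APIKeyHeader

app = FastAPI()
api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)

async def require_api_key(api_key: str = Security(api_key_header)):
    # Compare against a key configured via the environment (e.g. in .env)
    if api_key != os.getenv("SERVICE_API_KEY"):
        raise HTTPException(status_code=401, detail="Invalid or missing API key")
    return api_key

@app.get("/agent/status", dependencies=[Depends(require_api_key)])
async def agent_status():
    return {"status": "ok"}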

Data Privacy

  • Secure Storage: Encrypted database connections
  • Data Retention: Configurable session cleanup
  • User Privacy: No sensitive data logging

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

Code Style

  • Follow PEP 8 for Python code
  • Use type hints
  • Add docstrings to functions
  • Keep functions focused and modular

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • NVIDIA: For providing cutting-edge AI models and infrastructure
  • FastAPI: For the excellent web framework
  • Playwright: For robust browser automation
  • LangChain: For AI application development tools

📞 Support

  • Documentation: Check the /docs endpoint
  • Issues: Report bugs via GitHub issues
  • Discussions: Join community discussions
  • Email: Contact the development team

Built with ❤️ using the NVIDIA AI Stack

Ready to revolutionize web research with multimodal AI? Get started today!
