An AI-powered agent that automates vendor adverse media investigations using Bright Data's web scraping infrastructure, OpenAI for risk analysis, and OpenHands SDK for intelligent monitoring script generation.
- Automated Vendor Investigation: Search for adverse media across multiple risk categories (litigation, fraud, financial issues, regulatory violations, operational risks)
- Intelligent Web Scraping: Uses Bright Data SERP API and Web Unlocker to bypass paywalls and CAPTCHAs and to access otherwise protected content
- AI Risk Analysis: OpenAI analyzes extracted content and provides severity scores with actionable recommendations
- Auto-Generated Monitoring: OpenHands SDK creates production-ready Python scripts that continuously monitor vendor sources
- REST API: FastAPI-based endpoints for easy integration into existing systems
- Browser Automation: Optional Bright Data Browser API support for JavaScript-heavy sites and complex workflows
```
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│    DISCOVERY    │────▶│     ACCESS      │────▶│     ACTION      │
│   (SERP API)    │      │  (Web Unlocker) │      │  (OpenAI + SDK) │
└─────────────────┘      └─────────────────┘      └─────────────────┘
        │                        │                        │
  Search Google          Bypass paywalls           Analyze risks
  for red flags          and CAPTCHAs              Generate scripts
```
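The three stages above can be sketched as composable functions. This is an illustrative sketch only; the function names and data shapes below are assumptions, not the project's real module API:

```python
# Illustrative sketch of the Discovery -> Access -> Action pipeline.
# Names and data shapes are assumptions, not the project's real API.

def discover(vendor: str, keywords: list[str]) -> list[str]:
    """DISCOVERY: return candidate article URLs (SERP API in the real agent)."""
    slug = vendor.lower().replace(" ", "-")
    return [f"https://news.example.com/{slug}/{kw}" for kw in keywords]

def access(urls: list[str]) -> dict[str, str]:
    """ACCESS: fetch page content (Web Unlocker in the real agent)."""
    return {url: f"<article body for {url}>" for url in urls}

def act(pages: dict[str, str]) -> dict:
    """ACTION: analyze content and assign severity (OpenAI in the real agent)."""
    return {"sources": len(pages), "severity": "low" if len(pages) < 3 else "medium"}

def run_pipeline(vendor: str, keywords: list[str]) -> dict:
    # Each stage's output feeds the next, mirroring the diagram above.
    return act(access(discover(vendor, keywords)))

print(run_pipeline("Acme Corp", ["lawsuit", "fraud", "recall"]))
```

The real `src/agent.py` orchestrates the same flow with the live APIs; this sketch just shows how the stages hand off to each other.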
- Python 3.12 or higher
- Bright Data account with API access
- OpenAI API key
- OpenHands Cloud account or your own LLM API key
```bash
git clone <your-repo-url>
cd tprm-agent

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Create a `.env` file in the project root:
```bash
# Bright Data API token (for SERP API)
BRIGHT_DATA_API_TOKEN=your_api_token

# Bright Data SERP zone
BRIGHT_DATA_SERP_ZONE=your_serp_zone_name

# Bright Data Web Unlocker zone
BRIGHT_DATA_UNLOCKER_ZONE=your_unlocker_zone_name

# OpenAI (for risk analysis)
OPENAI_API_KEY=your_openai_api_key

# OpenHands (for agentic script generation)
LLM_API_KEY=your_llm_api_key
LLM_MODEL=openai/gpt-4o
```

Start the API server:
```bash
python -m uvicorn api.main:app --reload
```

Visit http://localhost:8000/docs for interactive API documentation.
Run a direct investigation:
```bash
python src/agent.py
```

Or call the REST API:

```bash
curl -X POST "http://localhost:8000/investigate" \
  -H "Content-Type: application/json" \
  -d '{
    "vendor_name": "Acme Corp",
    "categories": ["litigation", "fraud"],
    "generate_monitors": true
  }'
```

Response:
```json
{
  "investigation_id": "uuid-here",
  "status": "started",
  "message": "Investigation started for Acme Corp"
}
```

Check investigation status or service health:

```bash
curl http://localhost:8000/investigate/{investigation_id}
curl http://localhost:8000/health
```

Use the agent directly from Python:

```python
from src.agent import TPRMAgent

# Initialize agent
agent = TPRMAgent()

# Run investigation
result = agent.investigate(
    vendor_name="Acme Corp",
    categories=["litigation", "financial", "fraud"],
    generate_monitors=True,
)

# Print results
print(f"Found {result.total_sources_found} sources")
print(f"Risk assessments: {len(result.risk_assessments)}")
for assessment in result.risk_assessments:
    print(f"\n[{assessment.category}] Severity: {assessment.severity}")
    print(f"Summary: {assessment.summary}")
```

Project structure:

```
tprm-agent/
├── src/
│   ├── __init__.py
│   ├── config.py        # Configuration and settings
│   ├── discovery.py     # SERP API integration
│   ├── access.py        # Web Unlocker integration
│   ├── actions.py       # OpenAI + OpenHands SDK
│   ├── agent.py         # Main orchestration
│   └── browser.py       # Browser API (optional)
├── api/
│   └── main.py          # FastAPI endpoints
├── scripts/
│   └── generated/       # Auto-generated monitoring scripts
├── .env                 # Environment variables (not in git)
├── requirements.txt     # Python dependencies
├── Procfile             # Railway deployment config
└── README.md
```
To deploy on Railway:

1. Install the Railway CLI:

   ```bash
   npm i -g @railway/cli
   ```

2. Log in and initialize:

   ```bash
   railway login
   railway init
   ```

3. Deploy:

   ```bash
   railway up
   ```

4. Add environment variables in the Railway dashboard → Variables.

5. Generate a domain:

   ```bash
   railway domain
   ```

To deploy on Render:

1. Create `render.yaml` (already included).
2. Push to GitHub.
3. Connect to Render:
   - Go to render.com
   - New → Web Service
   - Connect your repository
   - Render auto-detects the configuration
4. Add environment variables in the Render dashboard.
After an investigation, auto-generated monitoring scripts are saved in `scripts/generated/`:

```bash
cd scripts/generated
python monitor_acme_corp.py
```

To schedule daily runs, edit your crontab (`crontab -e`) and add:

```
0 9 * * * cd /path/to/tprm-agent/scripts/generated && python3 monitor_acme_corp.py
```
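The exact contents of a generated script depend on the OpenHands SDK prompt, but a monitor of this kind typically keeps a record of URLs it has already seen and reports only new ones. A minimal sketch of that shape, where the JSON state file and diff logic are illustrative assumptions rather than the actual generated output:

```python
# Hypothetical shape of a generated monitor script. The real output
# depends on the OpenHands SDK prompt; the JSON state file used here
# for seen-URL persistence is an assumption for illustration.
import json
from pathlib import Path

STATE_FILE = Path("monitor_state.json")

def load_seen() -> set[str]:
    """Load the set of URLs reported on previous runs."""
    if STATE_FILE.exists():
        return set(json.loads(STATE_FILE.read_text()))
    return set()

def diff_new(current_urls: list[str], seen: set[str]) -> list[str]:
    """Return only the URLs not seen on a previous run."""
    return [u for u in current_urls if u not in seen]

def save_seen(seen: set[str], new: list[str]) -> None:
    """Persist the union of old and new URLs for the next run."""
    STATE_FILE.write_text(json.dumps(sorted(seen | set(new))))

if __name__ == "__main__":
    seen = load_seen()
    # In a real script these URLs would come from the SERP API / Web Unlocker.
    current = ["https://example.com/acme-lawsuit"]
    new = diff_new(current, seen)
    if new:
        print(f"New adverse media for Acme Corp: {new}")
    save_seen(seen, new)
```

Run under cron as shown above, a script like this only alerts when something genuinely new appears.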
1. SERP API zone:
   - Go to the Bright Data dashboard → Proxies & Scraping Infrastructure
   - Add → SERP API
   - Copy the zone name and API token
2. Web Unlocker zone:
   - Add → Web Unlocker
   - Copy the zone name
3. Browser API (optional):
   - Add → Scraping Browser
   - Copy the credentials for advanced scenarios
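Once the zones exist, the token and zone names from `.env` are what drive each request. A hedged sketch of assembling a SERP request against Bright Data's direct request API — the endpoint and payload shape follow their documented `/request` pattern, but confirm against the current Bright Data docs before relying on it:

```python
# Sketch of how the zone credentials from .env are used. Endpoint and
# payload shape should be confirmed against Bright Data's current docs.
import os
from urllib.parse import quote_plus

API_URL = "https://api.brightdata.com/request"

def build_serp_request(query: str) -> tuple[str, dict, dict]:
    """Build (endpoint, headers, payload) for a Google search via the SERP zone."""
    headers = {
        "Authorization": f"Bearer {os.environ.get('BRIGHT_DATA_API_TOKEN', '')}",
    }
    payload = {
        "zone": os.environ.get("BRIGHT_DATA_SERP_ZONE", "serp_zone"),
        "url": f"https://www.google.com/search?q={quote_plus(query)}",
        "format": "raw",
    }
    return API_URL, headers, payload

endpoint, headers, payload = build_serp_request('"Acme Corp" lawsuit')
print(endpoint)
print(payload["url"])
```

The Web Unlocker zone works the same way, with a target article URL in place of the Google search URL.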
Modify `src/discovery.py` to add custom risk categories:

```python
RISK_CATEGORIES = {
    "litigation": ["lawsuit", "litigation", "sued"],
    "financial": ["bankruptcy", "insolvency", "debt"],
    "fraud": ["fraud", "scam", "investigation"],
    "regulatory": ["violation", "fine", "penalty"],
    "operational": ["recall", "safety issue", "disruption"],
    # Add your custom categories here
}
```

Request body for `POST /investigate`:

```json
{
  "vendor_name": "string (required)",
  "categories": ["litigation", "fraud"],
  "generate_monitors": true
}
```

`categories` is optional; `generate_monitors` is optional and defaults to `true`.

Response:

```json
{
  "vendor_name": "string",
  "started_at": "ISO 8601 timestamp",
  "completed_at": "ISO 8601 timestamp",
  "total_sources_found": 0,
  "total_sources_accessed": 0,
  "risk_assessments": [
    {
      "vendor_name": "string",
      "category": "string",
      "severity": "low|medium|high|critical",
      "summary": "string",
      "key_findings": ["string"],
      "sources": ["url"],
      "recommended_actions": ["string"],
      "assessed_at": "ISO 8601 timestamp"
    }
  ],
  "monitoring_scripts": [
    {
      "vendor_name": "string",
      "script_path": "string",
      "urls_monitored": ["url"],
      "check_frequency": "daily",
      "created_at": "ISO 8601 timestamp"
    }
  ],
  "errors": ["string"]
}
```

For JavaScript-heavy sites or complex workflows, use the optional Browser API:
```python
from src.browser import BrowserClient
import asyncio

async def search_court_registry():
    client = BrowserClient()
    result = await client.fill_and_submit_form(
        url="https://court-registry.example.com/search",
        form_data={
            "#party-name": "Acme Corp",
            "#case-type": "civil",
        },
        submit_selector="#search-button",
        result_selector=".search-results",
    )
    if result.success:
        print(f"Court records: {result.text}")

asyncio.run(search_court_registry())
```

Authentication errors with Bright Data:
- Verify the API token and zone names in `.env`
- Check zone status in the Bright Data dashboard
- Ensure zones have available bandwidth
OpenAI rate limits:
- Add retry logic or reduce concurrent requests
- Upgrade OpenAI plan for higher limits
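The retry logic mentioned above can be a small generic helper with exponential backoff; this sketch is client-agnostic, so adapt `retry_on` to the exception types your OpenAI client library actually raises:

```python
# Generic retry-with-backoff helper for transient failures such as
# rate limits. Adapt retry_on to your client library's exception types.
import time

def with_retries(fn, attempts=4, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a retryable error, wait base_delay * 2**i and try again."""
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** i))

# Usage sketch: wrap the analysis call that hits the rate limit.
# result = with_retries(lambda: analyze_vendor_content(text))
```

The injectable `sleep` parameter keeps the helper testable without real delays.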
Generated scripts not using Bright Data:
- Check that the OpenHands SDK prompt includes Web Unlocker instructions
- Verify `actions.py` has the correct prompt template
Deployment failures:
- Check dependency sizes (Railway/Render support larger apps than Vercel)
- Verify `Procfile` exists and is correct
- Ensure all environment variables are set
- Parallel requests: Modify `access.py` to use `asyncio` for faster content extraction
- Caching: Implement Redis caching for repeated vendor investigations
- Rate limiting: Add rate limiting middleware to prevent API abuse
- Database: Use PostgreSQL to persist investigation history
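The caching idea above can be prototyped without Redis. This in-memory TTL cache sketches the same pattern — key by vendor name, expire after a fixed window — with an injectable clock so it stays testable; in production the store would be swapped for Redis:

```python
# In-memory TTL cache illustrating the Redis caching idea from the list
# above. The injectable clock is for testing; production would use Redis.
import time

class TTLCache:
    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[0] > self.clock():
            return entry[1]
        self._store.pop(key, None)  # drop expired entries lazily
        return None

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)

def cached_investigate(cache, vendor, investigate):
    """Reuse a prior result for the same vendor while it is still fresh."""
    result = cache.get(vendor)
    if result is None:
        result = investigate(vendor)
        cache.set(vendor, result)
    return result
```

Wrapping `TPRMAgent.investigate` this way avoids re-running full investigations for vendors queried twice in quick succession.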
- Never commit the `.env` file to git
- Use Railway/Render environment variables in production
- Rotate API keys regularly
- Implement authentication on API endpoints for production use
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
MIT License - see LICENSE file for details
- Bright Data SERP API Documentation
- Bright Data Web Unlocker Documentation
- Bright Data Browser API Documentation
- OpenHands SDK Documentation
- OpenAI API Documentation
- FastAPI Documentation
For issues and questions:
- Open an issue on GitHub
- Check Bright Data documentation
- Review OpenHands SDK examples
- Add support for SEC filings and OFAC sanctions lists
- Implement PostgreSQL database for investigation history
- Add Slack/Teams webhook notifications
- Build React dashboard for visualization
- Support batch vendor investigations
- Add PDF report generation
- Implement user authentication
- Add multi-language support