# AccLab Scanner

An AI-powered tool that automatically determines whether content is related to the UNDP Accelerator Labs network using the Ollama API and natural language processing.
This project provides an intelligent content filtering system designed to identify articles, blog posts, and other textual content that are relevant to the UNDP Accelerator Labs. The system uses a local Ollama model to analyze content in multiple languages and determine relevance based on sophisticated criteria.
## Features

- **AI-Powered Content Analysis**: Uses Ollama's LLaMA 3.2 model for intelligent content classification
- **Multilingual Support**: Analyzes content in English, French, Spanish, Arabic, and other languages
- **HTML Content Processing**: Extracts main content from HTML while filtering out navigation, headers, footers, and "Related Content" sections
- **Database Integration**: Connects to a PostgreSQL database to process articles at scale
- **Two-Step Validation**: Implements a robust checking system to ensure accuracy
- **API Integration**: Interfaces with an NLP API for content management
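As a sketch of the HTML processing step, the standard-library parser can skip text inside navigation, header, and footer elements. The helper names below are hypothetical, and the shipped `extract_main_content` may work differently (for example, it also filters "Related Content" sections, which are usually marked up with classes rather than tags):

```python
from html.parser import HTMLParser

SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}

class MainContentExtractor(HTMLParser):
    """Collects text that appears outside navigation/header/footer elements."""
    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside skipped elements
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_main_content_sketch(html: str) -> str:
    parser = MainContentExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

html = "<html><body><nav>Menu</nav><p>AccLab innovation initiative</p><footer>Contact</footer></body></html>"
print(extract_main_content_sketch(html))  # AccLab innovation initiative
```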
## Relevance Criteria

The system identifies content as relevant if it:
- Explicitly mentions Accelerator Labs or variations (UNDP Accelerator Lab Network, AccLab, etc.)
- Contains mentions in main content (not in sidebars or related content sections)
- Discusses Accelerator Labs activities including innovation, partnerships, or solutions
- References key roles such as Head of Experiment, Head of Solution, or Head of Exploration
- Covers Accelerator Labs contributions to broader topics like innovation or partnerships
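The criteria above can be folded into a single classification prompt that forces a bare true/false answer. This is a minimal sketch; `build_relevance_prompt` is a hypothetical helper, and the exact prompt and model call inside `check_relevance` may differ:

```python
CRITERIA = [
    "explicitly mentions Accelerator Labs or variations (UNDP Accelerator Lab Network, AccLab, etc.)",
    "contains the mention in the main content, not in sidebars or related-content sections",
    "discusses Accelerator Labs activities such as innovation, partnerships, or solutions",
    "references key roles such as Head of Experiment, Head of Solution, or Head of Exploration",
]

def build_relevance_prompt(text: str) -> str:
    """Builds an instruction that asks the model for exactly 'true' or 'false'."""
    rules = "\n".join(f"- {c}" for c in CRITERIA)
    return (
        "Decide whether the following content is related to the UNDP "
        f"Accelerator Labs. It is relevant if it:\n{rules}\n"
        "Answer with exactly 'true' or 'false'.\n\n"
        f"Content:\n{text}"
    )

prompt = build_relevance_prompt("The UNDP Accelerator Labs are working on innovative solutions.")
# The prompt would then be sent to the local model, e.g. via the Ollama Python client:
# ollama.chat(model='llama3.2:3b-instruct-q4_1', messages=[{'role': 'user', 'content': prompt}])
```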
## Prerequisites

- Python 3.7 or higher
- PostgreSQL database
- Ollama installed locally with LLaMA 3.2 model
## Installation

1. **Clone the repository**

   ```bash
   git clone https://github.com/your-username/acclab-scanner.git
   cd acclab-scanner
   ```

2. **Install Python dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Install and set up Ollama**

   ```bash
   # Install Ollama (macOS)
   brew install ollama

   # Start the Ollama service
   ollama serve

   # Pull the required model
   ollama pull llama3.2:3b-instruct-q4_1
   ```

4. **Configure environment variables**

   Create a `.env` file in the project root with the following variables:

   ```env
   # Database Configuration
   DB_NAME=your_database_name
   DB_USER=your_database_user
   DB_PASS=your_database_password
   DB_HOST=your_database_host
   DB_PORT=your_database_port

   # API Tokens
   API_TOKEN=your_api_token
   NLP_WRITE_TOKEN=your_nlp_write_token
   ```
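Projects like this typically load the `.env` file with a library such as python-dotenv. As a minimal standard-library sketch of the same idea (`load_env_sketch` is a hypothetical helper, not part of this codebase):

```python
import os

def load_env_sketch(lines):
    """Parses KEY=VALUE lines (comments and blanks skipped) into os.environ."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

load_env_sketch([
    "# Database Configuration",
    "DB_NAME=your_database_name",
    "DB_PORT=your_database_port",
])
print(os.environ["DB_NAME"])  # your_database_name
```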
## Database Schema

The system expects the following PostgreSQL tables:

- Main article table:
  - `id`: Primary key
  - `relevance`: Integer (relevance score)
  - `url`: Article URL
  - `title`: Article title
  - `ai_agent_checked`: Boolean (processing status)
- `article_content`: Contains processed article content
- `article_html_content`: Contains HTML content
- `raw_html`: Contains raw HTML backup
## Usage

### Checking Individual Content

```python
from main import check_relevance, check_relevance_from_html_or_text

# Check plain text
text = "The UNDP Accelerator Labs are working on innovative solutions."
is_relevant = check_relevance(text)
print(f"Relevant: {is_relevant}")  # Output: Relevant: True

# Check HTML content
html_content = "<html><body><p>AccLab innovation initiative...</p></body></html>"
is_relevant = check_relevance_from_html_or_text(html_content)
```

### Batch Processing

Run the main script to process articles from the database:

```bash
python main.py
```

This will:
- Fetch up to 30 unprocessed articles from the database
- Analyze each article for UNDP Accelerator Labs relevance
- Update the database with relevance scores
- Handle API integrations for content management
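The batch run above can be sketched as a simple loop. Here a stubbed checker and in-memory rows stand in for the real `check_relevance` call and PostgreSQL access, so this is an illustration of the flow, not the shipped `fetch_and_update_articles`:

```python
def process_articles(rows, check, limit=30):
    """Scores each unprocessed row for relevance, up to `limit` rows."""
    processed = 0
    for row in rows:
        if processed >= limit or row["ai_agent_checked"]:
            continue
        row["relevance"] = 1 if check(row["content"]) else 0
        row["ai_agent_checked"] = True   # real code would UPDATE the table here
        processed += 1
    return processed

rows = [
    {"content": "UNDP Accelerator Labs pilot new solutions.", "ai_agent_checked": False, "relevance": None},
    {"content": "Unrelated weather report.", "ai_agent_checked": False, "relevance": None},
]
fake_check = lambda text: "Accelerator Labs" in text   # stands in for the AI call
print(process_articles(rows, fake_check))  # 2
```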
The script includes multilingual test examples:
```python
examples = [
    "The UNDP Accelerator Labs are working on innovative solutions.",  # English
    "Les laboratoires d'accélération du PNUD développent des solutions innovantes.",  # French
    "Los laboratorios de aceleración del PNUD están trabajando en soluciones innovadoras.",  # Spanish
    "مختبرات التسريع التابعة لبرنامج الأمم المتحدة الإنمائي تعمل على حلول مبتكرة."  # Arabic
]
```

## Architecture

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   PostgreSQL    │────│  Python Script   │────│   Ollama API    │
│    Database     │    │                  │    │  (LLaMA 3.2)    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                │
                                ▼
                       ┌──────────────────┐
                       │     NLP API      │
                       │   Integration    │
                       └──────────────────┘
```
## Key Functions

- `check_relevance(text: str) -> bool`: Main relevance checking function using AI
- `extract_main_content(html: str) -> str`: Extracts clean content from HTML
- `check_relevance_from_html_or_text(content: str) -> bool`: Unified content checker
- `fetch_and_update_articles()`: Database batch processing function
- `call_nlp_api(url, payload)`: Handles external API communications
## Configuration

### Model Selection

The system uses `llama3.2:3b-instruct-q4_1` by default. You can change the model in the `check_relevance` function:

```python
model = 'llama3.2:3b-instruct-q4_1'  # Change this to use a different model
```

### Batch Size

The script currently processes 30 articles per run. Modify the `LIMIT` in the SQL query to change this:

```sql
LIMIT 30; -- Change this number to process more/fewer articles
```
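An alternative to editing the SQL literal is to pass the batch size as a query parameter, which database drivers such as psycopg2 substitute safely. This is a hypothetical sketch; the table and column names are assumptions based on the schema section, not the project's actual query:

```python
def build_fetch_query(limit: int = 30) -> tuple:
    """Returns a parameterized SELECT; the driver substitutes %s safely."""
    query = (
        "SELECT id, title, url FROM articles "   # table name is an assumption
        "WHERE ai_agent_checked IS NOT TRUE "
        "LIMIT %s;"
    )
    return query, (limit,)

query, params = build_fetch_query(50)
# Usage with psycopg2 would be: cursor.execute(query, params)
```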
## Troubleshooting

1. **Ollama Connection Error**
   - Ensure the Ollama service is running: `ollama serve`
   - Verify the model is installed: `ollama list`

2. **Database Connection Issues**
   - Check your `.env` file configuration
   - Verify database credentials and accessibility

3. **Model Response Format Issues**
   - The system validates model responses for a "true"/"false" format
   - Unexpected responses are treated as "not relevant"
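The response validation described above might look like the following. `parse_model_response` is a hypothetical helper, and the shipped code may normalize replies differently; the key property is that anything other than a clean "true" counts as not relevant:

```python
def parse_model_response(raw: str) -> bool:
    """Maps a model reply to a boolean; only a clean 'true' counts as relevant."""
    answer = raw.strip().strip('."\'').lower()
    if answer == "true":
        return True
    if answer == "false":
        return False
    return False   # unexpected responses are treated as "not relevant"

print(parse_model_response("True."))      # True
print(parse_model_response("maybe yes"))  # False
```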
### Debugging

The script prints detailed diagnostics to stdout as it runs, including:
- Response validation results
- Content extraction status
- Database processing information
## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/new-feature`)
3. Commit your changes (`git commit -am 'Add new feature'`)
4. Push to the branch (`git push origin feature/new-feature`)
5. Create a Pull Request
## License

This project is part of the UNDP innovation initiatives. Please refer to your organization's licensing guidelines.
## Support

For issues or questions related to this tool, please contact the development team or create an issue in the repository.
**Note**: This tool requires a local Ollama installation and appropriate database access. Ensure all prerequisites are met before running the system in production.