UNDP Accelerator Labs Content Relevance Checker

An AI-powered tool that uses the Ollama API and natural language processing to determine automatically whether content is related to the UNDP Accelerator Labs network.

Overview

This project provides an intelligent content filtering system designed to identify articles, blog posts, and other textual content relevant to the UNDP Accelerator Labs. The system uses a locally hosted Ollama model to analyze content in multiple languages and determine relevance against the detection criteria listed below.

Features

  • AI-Powered Content Analysis: Uses the LLaMA 3.2 model served by Ollama for content classification
  • Multilingual Support: Can analyze content in English, French, Spanish, Arabic, and other languages
  • HTML Content Processing: Extracts main content from HTML while filtering out navigation, headers, footers, and "Related Content" sections
  • Database Integration: Connects to PostgreSQL database to process articles at scale
  • Two-Step Validation: Implements a robust checking system to ensure accuracy
  • API Integration: Interfaces with the NLP API for content management

Key Detection Criteria

The system identifies content as relevant if it:

  1. Explicitly mentions Accelerator Labs or variations (UNDP Accelerator Lab Network, AccLab, etc.)
  2. Contains mentions in main content (not in sidebars or related content sections)
  3. Discusses Accelerator Labs activities including innovation, partnerships, or solutions
  4. References key roles such as Head of Experiment, Head of Solution, or Head of Exploration
  5. Covers Accelerator Labs contributions to broader topics like innovation or partnerships
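
In practice these criteria are applied by the language model rather than by keyword matching. A minimal sketch of how they might be folded into a prompt sent to Ollama's local HTTP generate endpoint (the prompt wording and response parsing used by check_relevance are assumptions):

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint
MODEL = "llama3.2:3b-instruct-q4_1"

PROMPT_TEMPLATE = (
    "You are a classifier. Answer only 'true' or 'false'.\n"
    "Is the following text about the UNDP Accelerator Labs (also written as "
    "'UNDP Accelerator Lab Network' or 'AccLab'), their activities, partnerships, "
    "solutions, or roles such as Head of Experiment, Head of Solution, or "
    "Head of Exploration?\n\nText:\n{text}"
)

def classify(text: str) -> bool:
    """Ask the local Ollama model whether the text is AccLab-related."""
    response = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": PROMPT_TEMPLATE.format(text=text), "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    answer = response.json()["response"].strip().lower()
    return answer.startswith("true")  # anything else counts as not relevant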

Installation

Prerequisites

  • Python 3.7 or higher
  • PostgreSQL database
  • Ollama installed locally with LLaMA 3.2 model

Setup Instructions

  1. Clone the repository

    git clone https://github.com/your-username/acclab-scanner.git
    cd acclab-scanner
  2. Install Python dependencies

    pip install -r requirements.txt
  3. Install and set up Ollama

    # Install Ollama (macOS)
    brew install ollama
    
    # Start Ollama service
    ollama serve
    
    # Pull the required model
    ollama pull llama3.2:3b-instruct-q4_1
  4. Configure environment variables

    Create a .env file in the project root with the following variables:

    # Database Configuration
    DB_NAME=your_database_name
    DB_USER=your_database_user
    DB_PASS=your_database_password
    DB_HOST=your_database_host
    DB_PORT=your_database_port
    
    # API Tokens
    API_TOKEN=your_api_token
    NLP_WRITE_TOKEN=your_nlp_write_token
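
These variables can be loaded at runtime with python-dotenv; a minimal sketch (the actual loading code in main.py may differ):

from dotenv import load_dotenv
import os

load_dotenv()  # reads the .env file in the project root

DB_CONFIG = {
    "dbname": os.getenv("DB_NAME"),
    "user": os.getenv("DB_USER"),
    "password": os.getenv("DB_PASS"),
    "host": os.getenv("DB_HOST"),
    "port": os.getenv("DB_PORT"),
}
API_TOKEN = os.getenv("API_TOKEN")
NLP_WRITE_TOKEN = os.getenv("NLP_WRITE_TOKEN")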

Database Schema

The system expects the following PostgreSQL tables:

articles table

- id: Primary key
- relevance: Integer (relevance score)
- url: Article URL
- title: Article title
- ai_agent_checked: Boolean (processing status)

Supporting tables

  • article_content: Contains processed article content
  • article_html_content: Contains HTML content
  • raw_html: Contains raw HTML backup
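
For reference, a DDL sketch for the articles table; column types beyond those listed above, and the DEFAULT on ai_agent_checked, are assumptions:

import os
import psycopg2
from dotenv import load_dotenv

load_dotenv()

ARTICLES_DDL = """
CREATE TABLE IF NOT EXISTS articles (
    id               SERIAL PRIMARY KEY,
    relevance        INTEGER,
    url              TEXT,
    title            TEXT,
    ai_agent_checked BOOLEAN DEFAULT FALSE
);
"""

with psycopg2.connect(
    dbname=os.getenv("DB_NAME"),
    user=os.getenv("DB_USER"),
    password=os.getenv("DB_PASS"),
    host=os.getenv("DB_HOST"),
    port=os.getenv("DB_PORT"),
) as conn, conn.cursor() as cur:
    cur.execute(ARTICLES_DDL)  # creates the table only if it does not already exist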

Usage

Basic Content Checking

from main import check_relevance, check_relevance_from_html_or_text

# Check plain text
text = "The UNDP Accelerator Labs are working on innovative solutions."
is_relevant = check_relevance(text)
print(f"Relevant: {is_relevant}")  # Output: Relevant: True

# Check HTML content
html_content = "<html><body><p>AccLab innovation initiative...</p></body></html>"
is_relevant = check_relevance_from_html_or_text(html_content)

Batch Processing from Database

Run the main script to process articles from the database:

python main.py

This will:

  1. Fetch up to 30 unprocessed articles from the database
  2. Analyze each article for UNDP Accelerator Labs relevance
  3. Update the database with relevance scores
  4. Handle API integrations for content management
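
A hedged sketch of this flow, assuming the schema above (the join column on article_content, the relevance values written back, and the NLP API step are assumptions; the real fetch_and_update_articles may differ):

import os
import psycopg2
from dotenv import load_dotenv
from main import check_relevance_from_html_or_text

load_dotenv()

def fetch_and_update_articles_sketch(limit: int = 30) -> None:
    """Classify up to `limit` unprocessed articles and record the result."""
    conn = psycopg2.connect(
        dbname=os.getenv("DB_NAME"), user=os.getenv("DB_USER"), password=os.getenv("DB_PASS"),
        host=os.getenv("DB_HOST"), port=os.getenv("DB_PORT"),
    )
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT a.id, c.content
            FROM articles a
            JOIN article_content c ON c.article_id = a.id   -- join column assumed
            WHERE a.ai_agent_checked = FALSE
            LIMIT %s;
            """,
            (limit,),
        )
        for article_id, content in cur.fetchall():
            relevant = check_relevance_from_html_or_text(content)
            cur.execute(
                "UPDATE articles SET relevance = %s, ai_agent_checked = TRUE WHERE id = %s;",
                (1 if relevant else 0, article_id),
            )
    conn.close()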

Testing with Examples

The script includes multilingual test examples:

examples = [
    "The UNDP Accelerator Labs are working on innovative solutions.",  # English
    "Les laboratoires d'accélération du PNUD développent des solutions innovantes.",  # French
    "Los laboratorios de aceleración del PNUD están trabajando en soluciones innovadoras.",  # Spanish
    "مختبرات التسريع التابعة لبرنامج الأمم المتحدة الإنمائي تعمل على حلول مبتكرة."  # Arabic
]
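
Each example can be run through the checker in a loop; all four sentences should come back as relevant:

from main import check_relevance

for text in examples:
    print(check_relevance(text), "-", text[:60])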

System Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   PostgreSQL    │────│  Python Script   │────│   Ollama API    │
│   Database      │    │                  │    │  (LLaMA 3.2)    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                              │
                              ▼
                       ┌──────────────────┐
                       │    NLP API       │
                       │   Integration    │
                       └──────────────────┘

Functions Overview

Core Functions

  • check_relevance(text: str) -> bool: Main relevance checking function using AI
  • extract_main_content(html: str) -> str: Extracts clean content from HTML
  • check_relevance_from_html_or_text(content: str) -> bool: Unified checker that accepts either HTML or plain text
  • fetch_and_update_articles(): Database batch processing function

Utility Functions

  • call_nlp_api(url, payload): Handles external API communications
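
A sketch of what this wrapper might look like with requests; the endpoint, headers, and use of NLP_WRITE_TOKEN are assumptions based on the environment variables above:

import os
import requests

def call_nlp_api(url: str, payload: dict) -> dict:
    """POST a payload to the NLP API and return the parsed JSON response."""
    headers = {
        "Authorization": f"Bearer {os.getenv('NLP_WRITE_TOKEN')}",  # assumed auth header format
        "Content-Type": "application/json",
    }
    response = requests.post(url, json=payload, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json()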

Configuration

Model Configuration

The system uses llama3.2:3b-instruct-q4_1 by default. You can modify the model in the check_relevance function:

model = 'llama3.2:3b-instruct-q4_1'  # Change this to use a different model

Processing Limits

The batch job currently processes up to 30 articles per run. Modify the LIMIT clause in the SQL query to change this:

LIMIT 30;  -- Change this number to process more/fewer articles

Troubleshooting

Common Issues

  1. Ollama Connection Error

    • Ensure the Ollama service is running: ollama serve
    • Verify the model is installed: ollama list
  2. Database Connection Issues

    • Check your .env file configuration
    • Verify database credentials and accessibility
  3. Model Response Format Issues

    • The system validates model responses for "true"/"false" format
    • Unexpected responses are treated as "not relevant"
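
The validation amounts to normalizing the model's reply and defaulting to False for anything unexpected; a minimal sketch (the actual parsing in check_relevance may differ):

def parse_model_response(raw: str) -> bool:
    """Map the model's reply to a boolean; anything unexpected counts as not relevant."""
    answer = raw.strip().lower()
    if answer in ("true", "false"):
        return answer == "true"
    print(f"Unexpected model response: {raw!r}")  # mirrors the script's print-based diagnostics
    return False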

Debugging

Diagnostic output comes from print statements in the code rather than a logging framework. The system prints:

  • Response validation results
  • Content extraction status
  • Database processing information

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/new-feature)
  3. Commit your changes (git commit -am 'Add new feature')
  4. Push to the branch (git push origin feature/new-feature)
  5. Create a Pull Request

License

This project is part of UNDP's innovation initiatives. Please refer to your organization's licensing guidelines.

Support

For issues or questions related to this tool, please contact the development team or create an issue in the repository.


Note: This tool requires a local Ollama installation and appropriate database access. Ensure all prerequisites are met before running the system in production.
