UNDP Accelerator Labs Content Relevance Checker

An AI-powered tool that uses the Ollama API and natural language processing to determine automatically whether content is related to the UNDP Accelerator Labs network.

Overview

This project provides an intelligent content filtering system designed to identify articles, blog posts, and other textual content relevant to the UNDP Accelerator Labs. The system uses a locally hosted Ollama model to analyze content in multiple languages and determine relevance against the detection criteria listed below.

Features

  • AI-Powered Content Analysis: Uses the LLaMA 3.2 model served by Ollama for content classification
  • Multilingual Support: Can analyze content in English, French, Spanish, Arabic, and other languages
  • HTML Content Processing: Extracts main content from HTML while filtering out navigation, headers, footers, and "Related Content" sections
  • Database Integration: Connects to PostgreSQL database to process articles at scale
  • Two-Step Validation: Implements a robust checking system to ensure accuracy
  • API Integration: Interfaces with the NLP API for content management

Key Detection Criteria

The system identifies content as relevant if it:

  1. Explicitly mentions Accelerator Labs or variations (UNDP Accelerator Lab Network, AccLab, etc.)
  2. Contains mentions in main content (not in sidebars or related content sections)
  3. Discusses Accelerator Labs activities including innovation, partnerships, or solutions
  4. References key roles such as Head of Experiment, Head of Solution, or Head of Exploration
  5. Covers Accelerator Labs contributions to broader topics like innovation or partnerships
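
In practice these criteria are applied by the language model rather than by keyword matching. A minimal sketch of how they might be folded into a prompt sent to Ollama's local HTTP generate endpoint (the prompt wording and response parsing used by check_relevance are assumptions):

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint
MODEL = "llama3.2:3b-instruct-q4_1"

PROMPT_TEMPLATE = (
    "You are a classifier. Answer only 'true' or 'false'.\n"
    "Is the following text about the UNDP Accelerator Labs (also written as "
    "'UNDP Accelerator Lab Network' or 'AccLab'), their activities, partnerships, "
    "solutions, or roles such as Head of Experiment, Head of Solution, or "
    "Head of Exploration?\n\nText:\n{text}"
)

def classify(text: str) -> bool:
    """Ask the local Ollama model whether the text is AccLab-related."""
    response = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": PROMPT_TEMPLATE.format(text=text), "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    answer = response.json()["response"].strip().lower()
    return answer.startswith("true")  # anything else counts as not relevant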

Installation

Prerequisites

  • Python 3.7 or higher
  • PostgreSQL database
  • Ollama installed locally with LLaMA 3.2 model

Setup Instructions

  1. Clone the repository

    git clone https://github.com/your-username/acclab-scanner.git
    cd acclab-scanner
  2. Install Python dependencies

    pip install -r requirements.txt
  3. Install and set up Ollama

    # Install Ollama (macOS)
    brew install ollama
    
    # Start Ollama service
    ollama serve
    
    # Pull the required model
    ollama pull llama3.2:3b-instruct-q4_1
  4. Configure environment variables

    Create a .env file in the project root with the following variables:

    # Database Configuration
    DB_NAME=your_database_name
    DB_USER=your_database_user
    DB_PASS=your_database_password
    DB_HOST=your_database_host
    DB_PORT=your_database_port
    
    # API Tokens
    API_TOKEN=your_api_token
    NLP_WRITE_TOKEN=your_nlp_write_token
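
These variables can be loaded at runtime with python-dotenv; a minimal sketch (the actual loading code in main.py may differ):

from dotenv import load_dotenv
import os

load_dotenv()  # reads the .env file in the project root

DB_CONFIG = {
    "dbname": os.getenv("DB_NAME"),
    "user": os.getenv("DB_USER"),
    "password": os.getenv("DB_PASS"),
    "host": os.getenv("DB_HOST"),
    "port": os.getenv("DB_PORT"),
}
API_TOKEN = os.getenv("API_TOKEN")
NLP_WRITE_TOKEN = os.getenv("NLP_WRITE_TOKEN")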

Database Schema

The system expects the following PostgreSQL tables:

articles table

- id: Primary key
- relevance: Integer (relevance score)
- url: Article URL
- title: Article title
- ai_agent_checked: Boolean (processing status)

Supporting tables

  • article_content: Contains processed article content
  • article_html_content: Contains HTML content
  • raw_html: Contains raw HTML backup
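
For reference, a DDL sketch for the articles table; column types beyond those listed above, and the DEFAULT on ai_agent_checked, are assumptions:

import os
import psycopg2
from dotenv import load_dotenv

load_dotenv()

ARTICLES_DDL = """
CREATE TABLE IF NOT EXISTS articles (
    id               SERIAL PRIMARY KEY,
    relevance        INTEGER,
    url              TEXT,
    title            TEXT,
    ai_agent_checked BOOLEAN DEFAULT FALSE
);
"""

with psycopg2.connect(
    dbname=os.getenv("DB_NAME"),
    user=os.getenv("DB_USER"),
    password=os.getenv("DB_PASS"),
    host=os.getenv("DB_HOST"),
    port=os.getenv("DB_PORT"),
) as conn, conn.cursor() as cur:
    cur.execute(ARTICLES_DDL)  # creates the table only if it does not already exist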

Usage

Basic Content Checking

from main import check_relevance, check_relevance_from_html_or_text

# Check plain text
text = "The UNDP Accelerator Labs are working on innovative solutions."
is_relevant = check_relevance(text)
print(f"Relevant: {is_relevant}")  # Output: Relevant: True

# Check HTML content
html_content = "<html><body><p>AccLab innovation initiative...</p></body></html>"
is_relevant = check_relevance_from_html_or_text(html_content)

Batch Processing from Database

Run the main script to process articles from the database:

python main.py

This will:

  1. Fetch up to 30 unprocessed articles from the database
  2. Analyze each article for UNDP Accelerator Labs relevance
  3. Update the database with relevance scores
  4. Handle API integrations for content management
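
A hedged sketch of this flow, assuming the schema above (the join column on article_content, the relevance values written back, and the NLP API step are assumptions; the real fetch_and_update_articles may differ):

import os
import psycopg2
from dotenv import load_dotenv
from main import check_relevance_from_html_or_text

load_dotenv()

def fetch_and_update_articles_sketch(limit: int = 30) -> None:
    """Classify up to `limit` unprocessed articles and record the result."""
    conn = psycopg2.connect(
        dbname=os.getenv("DB_NAME"), user=os.getenv("DB_USER"), password=os.getenv("DB_PASS"),
        host=os.getenv("DB_HOST"), port=os.getenv("DB_PORT"),
    )
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT a.id, c.content
            FROM articles a
            JOIN article_content c ON c.article_id = a.id   -- join column assumed
            WHERE a.ai_agent_checked = FALSE
            LIMIT %s;
            """,
            (limit,),
        )
        for article_id, content in cur.fetchall():
            relevant = check_relevance_from_html_or_text(content)
            cur.execute(
                "UPDATE articles SET relevance = %s, ai_agent_checked = TRUE WHERE id = %s;",
                (1 if relevant else 0, article_id),
            )
    conn.close()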

Testing with Examples

The script includes multilingual test examples:

examples = [
    "The UNDP Accelerator Labs are working on innovative solutions.",  # English
    "Les laboratoires d'accélération du PNUD développent des solutions innovantes.",  # French
    "Los laboratorios de aceleración del PNUD están trabajando en soluciones innovadoras.",  # Spanish
    "مختبرات التسريع التابعة لبرنامج الأمم المتحدة الإنمائي تعمل على حلول مبتكرة."  # Arabic
]
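
Each example can be run through the checker in a loop; all four sentences should come back as relevant:

from main import check_relevance

for text in examples:
    print(check_relevance(text), "-", text[:60])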

System Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   PostgreSQL    │────│  Python Script   │────│   Ollama API    │
│   Database      │    │                  │    │  (LLaMA 3.2)    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                              │
                              ▼
                       ┌──────────────────┐
                       │    NLP API       │
                       │   Integration    │
                       └──────────────────┘

Functions Overview

Core Functions

  • check_relevance(text: str) -> bool: Main relevance checking function using AI
  • extract_main_content(html: str) -> str: Extracts clean content from HTML
  • check_relevance_from_html_or_text(content: str) -> bool: Unified checker that accepts either HTML or plain text
  • fetch_and_update_articles(): Database batch processing function

Utility Functions

  • call_nlp_api(url, payload): Handles external API communications
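
A sketch of what this wrapper might look like with requests; the endpoint, headers, and use of NLP_WRITE_TOKEN are assumptions based on the environment variables above:

import os
import requests

def call_nlp_api(url: str, payload: dict) -> dict:
    """POST a payload to the NLP API and return the parsed JSON response."""
    headers = {
        "Authorization": f"Bearer {os.getenv('NLP_WRITE_TOKEN')}",  # assumed auth header format
        "Content-Type": "application/json",
    }
    response = requests.post(url, json=payload, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json()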

Configuration

Model Configuration

The system uses llama3.2:3b-instruct-q4_1 by default. You can modify the model in the check_relevance function:

model = 'llama3.2:3b-instruct-q4_1'  # Change this to use a different model

Processing Limits

The batch job currently processes up to 30 articles per run. Modify the LIMIT clause in the SQL query to change this:

LIMIT 30;  -- Change this number to process more/fewer articles

Troubleshooting

Common Issues

  1. Ollama Connection Error

    • Ensure the Ollama service is running: ollama serve
    • Verify the model is installed: ollama list
  2. Database Connection Issues

    • Check your .env file configuration
    • Verify database credentials and accessibility
  3. Model Response Format Issues

    • The system validates model responses for "true"/"false" format
    • Unexpected responses are treated as "not relevant"
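
The validation amounts to normalizing the model's reply and defaulting to False for anything unexpected; a minimal sketch (the actual parsing in check_relevance may differ):

def parse_model_response(raw: str) -> bool:
    """Map the model's reply to a boolean; anything unexpected counts as not relevant."""
    answer = raw.strip().lower()
    if answer in ("true", "false"):
        return answer == "true"
    print(f"Unexpected model response: {raw!r}")  # mirrors the script's print-based diagnostics
    return False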

Debugging

Diagnostic output comes from print statements in the code rather than a logging framework. The system prints:

  • Response validation results
  • Content extraction status
  • Database processing information

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/new-feature)
  3. Commit your changes (git commit -am 'Add new feature')
  4. Push to the branch (git push origin feature/new-feature)
  5. Create a Pull Request

License

This project is part of UNDP's innovation initiatives. Please refer to your organization's licensing guidelines.

Support

For issues or questions related to this tool, please contact the development team or create an issue in the repository.


Note: This tool requires a local Ollama installation and appropriate database access. Ensure all prerequisites are met before running the system in production.
