AqilaRifti/TrustGraph


Wikipedia vs Grokipedia Quality Control System

OriginTrail Global Hackathon 2025

An AI-powered content comparison and trust annotation system that analyzes articles from Wikipedia and Grokipedia, detects discrepancies, and publishes Community Notes to the OriginTrail Decentralized Knowledge Graph (DKG).

🚀 Features

  • Automated Content Fetching: Scrapes 50+ topics from Wikipedia and Grokipedia
  • Vector Embeddings: Uses Sentence-Transformers for semantic analysis
  • AI-Powered Analysis: Leverages Cerebras AI with 8-key load balancing
  • Discrepancy Detection: Identifies length, keyword, and structural differences
  • Community Notes: Generates neutral, evidence-based fact-checking notes
  • DKG Publishing: Publishes trust annotations to the OriginTrail Decentralized Knowledge Graph (DKG)
  • Web Dashboard: Clean, responsive UI with real-time progress tracking

🛠️ Tech Stack

Backend

  • Python 3.9+ with Flask
  • Pinecone - Vector database (free tier)
  • Sentence-Transformers - Local embeddings (all-MiniLM-L6-v2)
  • Cerebras Cloud SDK - AI analysis with load balancing
  • OriginTrail DKG - Decentralized knowledge graph
  • BeautifulSoup4 - Web scraping
  • Wikipedia API - Content fetching

Frontend

  • HTML5/CSS3/JavaScript
  • Bootstrap 5 - Responsive design
  • Vanilla JS - No frameworks needed

📦 Installation

Prerequisites

  • Python 3.9+ with pip
  • Node.js 16+ with npm
  • Pinecone API key (free tier)
  • Cerebras API keys (provided)
  • OriginTrail DKG Node (local or remote)
  • Wallet with testnet tokens (NEURO + TRAC)

Setup Steps

  1. Clone the repository
git clone <repository-url>
cd hackathon-project
  2. Install Python dependencies
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
  3. Install Node.js dependencies (DKG SDK)
npm install
  4. Configure environment variables
cp .env.example .env

Edit .env and add your credentials:

# Pinecone
PINECONE_API_KEY=your_pinecone_api_key

# DKG Edge Node (uses the official OriginTrail SDK)
DKG_SERVICE_URL=http://localhost:3000
DKG_ENDPOINT=http://localhost:8900
DKG_PUBLIC_KEY=0xYourPublicAddress
DKG_PRIVATE_KEY=0xYourPrivateKey

# Flask
FLASK_DEBUG=True
FLASK_PORT=5000
  5. Start the DKG Edge Node service (in a separate terminal)
npm start
  6. Start the Python Flask app (in a separate terminal)
python app.py
  7. Access the dashboard
http://localhost:5000
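The .env values above still have to reach the Python code. Below is a minimal sketch of a config loader, assuming the variables are already in the process environment (for example after calling python-dotenv's load_dotenv()). The Config class and load_config() are hypothetical and are not the project's actual config.py; treating only PINECONE_API_KEY as mandatory (so DKG mock mode can still start) is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical: only the Pinecone key is mandatory here, so the app could
# still start in DKG mock mode without DKG credentials (an assumption).
REQUIRED = ("PINECONE_API_KEY",)
TRUTHY = {"true", "1", "yes"}


@dataclass(frozen=True)
class Config:
    pinecone_api_key: str
    dkg_service_url: str
    dkg_endpoint: str
    dkg_public_key: Optional[str]
    dkg_private_key: Optional[str]
    flask_debug: bool
    flask_port: int


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Build a Config from environment variables, failing fast on missing secrets."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return Config(
        pinecone_api_key=env["PINECONE_API_KEY"],
        # Defaults mirror the example .env values above
        dkg_service_url=env.get("DKG_SERVICE_URL", "http://localhost:3000"),
        dkg_endpoint=env.get("DKG_ENDPOINT", "http://localhost:8900"),
        dkg_public_key=env.get("DKG_PUBLIC_KEY"),
        dkg_private_key=env.get("DKG_PRIVATE_KEY"),
        # Accept "True", "1" or "yes" (any case) as enabling debug mode
        flask_debug=env.get("FLASK_DEBUG", "False").strip().lower() in TRUTHY,
        flask_port=int(env.get("FLASK_PORT", "5000")),
    )
```

Failing at startup on a missing key gives a clearer error than a Pinecone authentication failure in the middle of a scan.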

Quick Test

Test DKG integration:

node test-dkg.js

🎯 Usage

Starting a Scan

  1. Open the dashboard at http://localhost:5000
  2. Click the "🚀 Start Scanning" button
  3. Watch real-time progress as topics are analyzed
  4. View results in the table once complete

Viewing Comparisons

  1. Click "View" on any completed topic
  2. Review:
    • Similarity score (color-coded)
    • AI analysis from Cerebras
    • Detected discrepancies
    • Community Note
    • Side-by-side content comparison

Publishing to DKG

  1. On a comparison page, click "Publish to DKG"
  2. Wait for confirmation
  3. The UAL (Universal Asset Locator) of the published Knowledge Asset is displayed

📊 API Endpoints

GET /

Renders the main dashboard

GET /comparison/<topic_name>

Renders detailed comparison page for a topic

GET /api/topics

Returns a list of all topics with their status

[
  {
    "name": "Artificial Intelligence",
    "similarity": 0.85,
    "discrepancies": 2,
    "status": "completed",
    "ai_analysis_available": true
  }
]

POST /api/scan

Starts a background scanning process

{
  "status": "scanning",
  "job_id": "scan_001"
}

GET /api/scan-status

Returns the current scan progress

{
  "status": "processing",
  "progress": 45,
  "current_topic": "Quantum Computing"
}
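A scan is started with POST /api/scan and then tracked through GET /api/scan-status. A minimal client sketch using only the standard library, assuming the app runs at http://localhost:5000 and that "scanning" and "processing" are the only in-progress status values (the finished value is not documented here). http_json() and wait_for_scan() are hypothetical helpers.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

BASE_URL = "http://localhost:5000"  # dashboard address from the setup steps
IN_PROGRESS = {"scanning", "processing"}  # assumed non-terminal status values


def http_json(method: str, path: str) -> Dict:
    """Send a request to the Flask app and decode its JSON response."""
    req = urllib.request.Request(BASE_URL + path, method=method)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def wait_for_scan(get_status: Callable[[], Dict], interval: float = 2.0,
                  max_polls: int = 600) -> Dict:
    """Poll until the reported status is no longer an in-progress value."""
    for _ in range(max_polls):
        status = get_status()
        print(f"{status.get('progress', 0)}% {status.get('current_topic', '')}")
        if status.get("status") not in IN_PROGRESS:
            return status
        time.sleep(interval)
    raise TimeoutError(f"scan still running after {max_polls} polls")
```

Usage: call http_json("POST", "/api/scan") to start a scan, then wait_for_scan(lambda: http_json("GET", "/api/scan-status")). Passing the status getter as a callable keeps the polling logic testable without a running server.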

GET /api/topic/<topic_name>

Returns detailed analysis for a specific topic

{
  "similarity_score": 0.82,
  "discrepancies": [...],
  "ai_analysis": "...",
  "community_note": "...",
  "ual": "did:dkg:..."
}
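Topic names such as "Artificial Intelligence" contain spaces, so they need to be percent-encoded in the URL path. A small sketch, assuming the app runs locally; topic_url() and fetch_topic() are hypothetical helpers built on urllib.parse.quote.

```python
import json
import urllib.request
from urllib.parse import quote


def topic_url(base_url: str, topic_name: str) -> str:
    """Build the /api/topic/<topic_name> URL; quote() percent-encodes spaces and symbols."""
    return f"{base_url.rstrip('/')}/api/topic/{quote(topic_name)}"


def fetch_topic(base_url: str, topic_name: str) -> dict:
    """GET the detailed analysis for one topic and decode the JSON body."""
    with urllib.request.urlopen(topic_url(base_url, topic_name), timeout=10) as resp:
        return json.load(resp)
```

For example, fetch_topic("http://localhost:5000", "Artificial Intelligence")["similarity_score"] would read the score field shown in the response above.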

POST /api/publish-dkg

Manually publishes a topic to the DKG

{
  "topic": "Artificial Intelligence",
  "discrepancies": [...],
  "similarity_score": 0.82,
  "ai_analysis": "..."
}

🔗 DKG Integration

The system publishes Community Notes as JSON-LD Knowledge Assets to the OriginTrail Decentralized Knowledge Graph:

  • Format: ActivityStreams JSON-LD with provenance
  • Blockchain: NeuroWeb Testnet (Chain ID: 20430)
  • Local Node: Connects to http://localhost:8900
  • UAL: Returns Universal Asset Locator for each published note
  • Mock Mode: Works without a DKG node by generating mock UALs

See DKG_SETUP.md for detailed setup instructions.
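For illustration, a Community Note expressed as ActivityStreams 2.0 JSON-LD could look like the sketch below. The exact shape the project publishes is not documented here: ActivityStreams supplies terms such as Note, name, content, published and generator, while similarityScore, discrepancy and aiAnalysis are placeholder terms mapped to a hypothetical https://example.org/trustgraph# namespace so they survive JSON-LD expansion. build_community_note() is likewise hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List

# Placeholder vocabulary for fields ActivityStreams has no term for.
TG = "https://example.org/trustgraph#"


def build_community_note(topic: str, note_text: str, similarity_score: float,
                         discrepancies: List[str], ai_analysis: str) -> Dict:
    """Assemble an ActivityStreams 2.0 Note as JSON-LD for one topic comparison."""
    return {
        "@context": [
            "https://www.w3.org/ns/activitystreams",
            {
                "similarityScore": TG + "similarityScore",
                "discrepancy": TG + "discrepancy",
                "aiAnalysis": TG + "aiAnalysis",
            },
        ],
        "type": "Note",
        "name": f"Community Note: {topic}",
        "content": note_text,
        "published": datetime.now(timezone.utc).isoformat(),
        # Provenance: the software that generated the note
        "generator": {"type": "Application", "name": "TrustGraph"},
        "similarityScore": round(similarity_score, 4),
        "discrepancy": discrepancies,
        "aiAnalysis": ai_analysis,
    }


note = build_community_note("Artificial Intelligence", "Example note text.",
                            0.82, ["length", "keyword"], "Example analysis.")
print(json.dumps(note, indent=2))
```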

🔑 Key Features Explained

API Key Load Balancing

The system rotates through 8 Cerebras API keys using a round-robin algorithm to avoid rate limits:

# Automatic rotation on each request
key_rotator.get_next_key()
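A round-robin rotator can be as small as an itertools.cycle behind a lock. The sketch below keeps the get_next_key() name used above, but the KeyRotator class and the placeholder keys are hypothetical and not the contents of data/api_keys.py. The lock is there because the background scan and Flask request handlers may ask for keys from different threads.

```python
import itertools
import threading
from typing import List


class KeyRotator:
    """Round-robin rotation over a fixed list of API keys (illustrative sketch)."""

    def __init__(self, keys: List[str]):
        if not keys:
            raise ValueError("at least one API key is required")
        self._cycle = itertools.cycle(keys)
        self._lock = threading.Lock()  # next() on a shared cycle is guarded across threads
        self.size = len(keys)

    def get_next_key(self) -> str:
        """Return the next key, wrapping back to the first after the last."""
        with self._lock:
            return next(self._cycle)


key_rotator = KeyRotator([f"key-{i}" for i in range(1, 9)])  # 8 placeholder keys
print([key_rotator.get_next_key() for _ in range(10)])
```

With 8 keys, the ninth and tenth calls wrap around to key-1 and key-2, so each key receives roughly one eighth of the requests.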

Vector Similarity

Uses cosine similarity on 384-dimensional embeddings:

  • 0.8-1.0: High similarity (green)
  • 0.6-0.8: Moderate similarity (yellow)
  • 0.0-0.6: Low similarity (red)
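Cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it in plain Python and maps the result to the color bands listed above. The bands overlap at 0.8 and 0.6, so sending boundary values to the higher band is an assumption; cosine_similarity() and similarity_band() are hypothetical names (in practice, sentence_transformers.util.cos_sim performs the same computation on tensors).

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|); returns 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension (384 for all-MiniLM-L6-v2)")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_band(score: float) -> str:
    """Map a score to the dashboard colors; boundary values go to the higher band (assumed)."""
    if score >= 0.8:
        return "green"
    if score >= 0.6:
        return "yellow"
    return "red"


print(similarity_band(cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 2.5])))
```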

Discrepancy Detection

Three types of discrepancies:

  1. Length: >30% difference in content length
  2. Keyword: <50% overlap in top keywords (TF-IDF)
  3. Structural: Significant differences in formatting
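The length and keyword rules can be sketched as follows. Two details are assumptions, since the thresholds above do not pin them down: the length difference is measured in characters relative to the longer article, and keyword overlap is the share of shared terms among the top 20 TF-IDF keywords of the shorter list. IDF is smoothed as ln((1 + n) / (1 + df)) + 1 so that terms shared by both articles do not collapse to zero weight. All names here are hypothetical, and the structural check is omitted.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set

LENGTH_THRESHOLD = 0.30   # ">30% difference in content length"
OVERLAP_THRESHOLD = 0.50  # "<50% overlap in top keywords"
TOP_K = 20                # assumed number of keywords per article
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "was",
             "for", "on", "as", "by", "with", "that", "it", "are", "from", "at"}


def tokenize(text: str) -> List[str]:
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def top_keywords(docs: List[str], k: int = TOP_K) -> List[Set[str]]:
    """Top-k terms per document by TF-IDF with smoothed IDF."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)  # documents containing each term
    result = []
    for c in counts:
        total = sum(c.values()) or 1
        scores = {t: (f / total) * (math.log((1 + n) / (1 + df[t])) + 1) for t, f in c.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken alphabetically
        result.append(set(ranked[:k]))
    return result


def detect_discrepancies(wiki: str, grok: str) -> List[Dict]:
    found = []
    length_diff = abs(len(wiki) - len(grok)) / max(len(wiki), len(grok), 1)
    if length_diff > LENGTH_THRESHOLD:
        found.append({"type": "length", "difference": round(length_diff, 3)})
    kw_wiki, kw_grok = top_keywords([wiki, grok])
    denom = min(len(kw_wiki), len(kw_grok))
    overlap = len(kw_wiki & kw_grok) / denom if denom else 0.0
    if overlap < OVERLAP_THRESHOLD:
        found.append({"type": "keyword", "overlap": round(overlap, 3)})
    return found
```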

Graceful Error Handling

  • Failed topics are skipped, not blocking the scan
  • Cerebras failures fall back to automatic analysis
  • DKG publishing errors are logged but don't crash
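The error-handling rules above fit a simple pattern: wrap each topic in its own try block and treat publishing as a separate, non-fatal step. A minimal sketch, where analyze and publish are hypothetical callables standing in for the comparison pipeline and the DKG publisher, and recording failed topics with a "failed" status is an assumption.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("scan")


def run_scan(topics: List[str],
             analyze: Callable[[str], Dict],
             publish: Callable[[Dict], str]) -> Dict[str, Dict]:
    """Analyze every topic; a failure is logged and skipped instead of aborting the scan."""
    results: Dict[str, Dict] = {}
    for topic in topics:
        try:
            result = analyze(topic)
        except Exception:
            logger.exception("Skipping %s: analysis failed", topic)
            results[topic] = {"status": "failed"}
            continue
        try:
            result["ual"] = publish(result)
        except Exception:
            # Publishing problems are logged; the analysis itself is kept
            logger.exception("DKG publish failed for %s", topic)
            result["ual"] = None
        result["status"] = "completed"
        results[topic] = result
    return results
```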

📁 Project Structure

hackathon-project/
├── app.py                      # Flask application
├── config.py                   # Configuration
├── requirements.txt            # Dependencies
├── backend/
│   ├── scraper.py             # Content fetching
│   ├── embeddings.py          # Vector embeddings
│   ├── comparison.py          # Discrepancy detection
│   ├── cerebras_analyzer.py   # AI analysis
│   └── dkg_publisher.py       # DKG publishing
├── data/
│   ├── topics.json            # 54 topics
│   └── api_keys.py            # API key rotation
├── static/
│   ├── css/style.css          # Custom styles
│   └── js/script.js           # Utilities
├── templates/
│   ├── base.html              # Base template
│   ├── index.html             # Dashboard
│   └── comparison.html        # Comparison view
└── .env                        # Environment variables

πŸ› Troubleshooting

Pinecone Connection Issues

  • Verify API key in .env
  • Check that the index name matches wikipedia-grokipedia
  • Ensure the free-tier limits have not been exceeded

Cerebras API Errors

  • System automatically rotates through 8 keys
  • Check logs for specific error messages
  • Falls back to automatic analysis if all keys fail
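The failover described above amounts to trying each key in turn and falling back only once all of them have failed. A hedged sketch: call_model stands in for a hypothetical wrapper around the Cerebras Cloud SDK request, and fallback for the "automatic analysis"; neither is the project's actual API. Only the last four characters of a key are logged, so secrets do not end up in the logs.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("analysis")


def analyze_with_fallback(prompt: str, keys: List[str],
                          call_model: Callable[[str, str], str],
                          fallback: Callable[[str], str]) -> str:
    """Try each API key once; if every call fails, return the fallback analysis."""
    for key in keys:
        try:
            return call_model(key, prompt)
        except Exception as exc:  # e.g. a rate-limit or network error
            logger.warning("Cerebras call failed with key ending %s: %s", key[-4:], exc)
    return fallback(prompt)
```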

Wikipedia Fetch Failures

  • Some topics may have disambiguation pages
  • The system automatically tries the first option
  • Check logs for specific failures
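The "try the first option" rule can be sketched independently of the fetching library. If the project uses the wikipedia PyPI package (an assumption), its wikipedia.exceptions.DisambiguationError carries the candidate titles in .options; the stand-in class below mirrors that attribute so the logic runs without network access. fetch is a hypothetical callable that returns article text for a title.

```python
from typing import Callable, List


class DisambiguationError(Exception):
    """Stand-in mirroring wikipedia.exceptions.DisambiguationError, which carries .options."""

    def __init__(self, title: str, options: List[str]):
        super().__init__(f"{title} may refer to: {', '.join(options)}")
        self.title = title
        self.options = options


def fetch_with_disambiguation(title: str, fetch: Callable[[str], str]) -> str:
    """Fetch an article; on a disambiguation page, retry once with the first listed option."""
    try:
        return fetch(title)
    except DisambiguationError as err:
        if not err.options:
            raise
        return fetch(err.options[0])
```

Retrying only once avoids looping when the first option is itself a disambiguation page; that error then propagates and the topic is skipped.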

Grokipedia Scraping Issues

  • Update URL format in backend/scraper.py
  • Adjust CSS selectors based on site structure
  • Mock data is used if the site is unavailable

📝 License

MIT License - OriginTrail Global Hackathon 2025

🙏 Acknowledgments

  • OriginTrail - DKG infrastructure
  • Cerebras - AI analysis platform
  • Pinecone - Vector database
  • Wikipedia - Baseline content source

About

AI is generating content faster than humans can verify it, creating a trust crisis across education, healthcare, and journalism. TrustGraph addresses this by automating trust verification at scale and publishing immutable scores to the blockchain.
