OriginTrail Global Hackathon 2025
An AI-powered content comparison and trust annotation system that analyzes articles from Wikipedia and Grokipedia, detects discrepancies, and publishes Community Notes to the OriginTrail Decentralized Knowledge Graph (DKG).
- Automated Content Fetching: Scrapes 50+ topics from Wikipedia and Grokipedia
- Vector Embeddings: Uses Sentence-Transformers for semantic analysis
- AI-Powered Analysis: Leverages Cerebras AI with 8-key load balancing
- Discrepancy Detection: Identifies length, keyword, and structural differences
- Community Notes: Generates neutral, evidence-based fact-checking notes
- DKG Publishing: Publishes trust annotations to OriginTrail blockchain
- Web Dashboard: Clean, responsive UI with real-time progress tracking
- Python 3.9+ with Flask
- Pinecone - Vector database (free tier)
- Sentence-Transformers - Local embeddings (all-MiniLM-L6-v2)
- Cerebras Cloud SDK - AI analysis with load balancing
- OriginTrail DKG - Decentralized knowledge graph
- BeautifulSoup4 - Web scraping
- Wikipedia API - Content fetching
- HTML5/CSS3/JavaScript
- Bootstrap 5 - Responsive design
- Vanilla JS - No frameworks needed
- Python 3.9+ with pip
- Node.js 16+ with npm
- Pinecone API key (free tier)
- Cerebras API keys (provided)
- OriginTrail DKG Node (local or remote)
- Wallet with testnet tokens (NEURO + TRAC)
- Clone the repository
git clone <repository-url>
cd hackathon-project
- Install Python dependencies
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
- Install Node.js dependencies (DKG SDK)
npm install
- Configure environment variables
cp .env.example .env
Edit .env and add your credentials:
# Pinecone
PINECONE_API_KEY=your_pinecone_api_key
# DKG Edge Node (uses official OriginTrail SDK)
DKG_SERVICE_URL=http://localhost:3000
DKG_ENDPOINT=http://localhost:8900
DKG_PUBLIC_KEY=0xYourPublicAddress
DKG_PRIVATE_KEY=0xYourPrivateKey
# Flask
FLASK_DEBUG=True
FLASK_PORT=5000
- Start the DKG Edge Node service (in a separate terminal)
npm start
- Start the Python Flask app (in a separate terminal)
python app.py
- Open the dashboard at http://localhost:5000
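The Flask app reads its settings from the .env file created above. A library such as python-dotenv is the usual way to load it; the stdlib-only sketch below shows the same idea. The functions `parse_env_lines`, `load_env_file`, and `flask_settings` are hypothetical and not taken from config.py, and the parser assumes plain `KEY=VALUE` lines with full-line `#` comments only (no inline comments or quoting).

```python
import os


def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blank lines and full-line '#' comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_env_file(path=".env"):
    """Read a .env file if it exists, without overriding variables already set."""
    try:
        with open(path, encoding="utf-8") as fh:
            values = parse_env_lines(fh)
    except FileNotFoundError:
        return {}
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


def flask_settings(env=None):
    """Return (debug, port) from FLASK_DEBUG / FLASK_PORT, defaulting to (False, 5000)."""
    env = os.environ if env is None else env
    debug = env.get("FLASK_DEBUG", "False").strip().lower() in {"1", "true", "yes"}
    port = int(env.get("FLASK_PORT", "5000"))
    return debug, port
```

Calling `load_env_file()` once at startup and then `flask_settings()` would give the values to pass to `app.run(debug=..., port=...)`.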
Test the DKG integration:
node test-dkg.js
- Open the dashboard at http://localhost:5000
- Click the "Start Scanning" button
- Watch real-time progress as topics are analyzed
- View results in the table once complete
- Click "View" on any completed topic
- Review:
  - Similarity score (color-coded)
  - AI analysis from Cerebras
  - Detected discrepancies
  - Community Note
  - Side-by-side content comparison
- On a comparison page, click "Publish to DKG"
- Wait for the confirmation
- The resulting UAL (Universal Asset Locator) is displayed
Renders the main dashboard
Renders detailed comparison page for a topic
Returns list of all topics with status
[
{
"name": "Artificial Intelligence",
"similarity": 0.85,
"discrepancies": 2,
"status": "completed",
"ai_analysis_available": true
}
]
Starts background scanning process
{
"status": "scanning",
"job_id": "scan_001"
}
Returns current scan progress
{
"status": "processing",
"progress": 45,
"current_topic": "Quantum Computing"
}
Returns detailed analysis for a specific topic
{
"similarity_score": 0.82,
"discrepancies": [...],
"ai_analysis": "...",
"community_note": "...",
"ual": "did:dkg:..."
}
Manually publishes a topic to DKG
{
"topic": "Artificial Intelligence",
"discrepancies": [...],
"similarity_score": 0.82,
"ai_analysis": "..."
}
The system publishes Community Notes as JSON-LD Knowledge Assets to the OriginTrail Decentralized Knowledge Graph:
- Format: ActivityStreams JSON-LD with provenance
- Blockchain: NeuroWeb Testnet (Chain ID: 20430)
- Local Node: Connects to http://localhost:8900
- UAL: Returns a Universal Asset Locator for each published note
- Mock Mode: Works without a DKG node by generating mock UALs
See DKG_SETUP.md for detailed setup instructions.
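To illustrate the "ActivityStreams JSON-LD with provenance" format, here is a hypothetical sketch of a Community Note payload. The field layout, the `ex:` namespace (a placeholder IRI for fields ActivityStreams does not define), and the `build_community_note` helper are assumptions, not the schema used by backend/dkg_publisher.py. In this project the actual publishing goes through the Node.js DKG SDK service, which this sketch does not call.

```python
import json
from datetime import datetime, timezone

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
# Placeholder vocabulary for non-ActivityStreams fields (assumption, not a real schema).
EXTRA_CONTEXT = {"ex": "https://example.org/trust#"}


def build_community_note(topic, note_text, similarity_score, discrepancies, source_urls):
    """Assemble an ActivityStreams-style JSON-LD Note with simple provenance links."""
    return {
        "@context": [AS_CONTEXT, EXTRA_CONTEXT],
        "type": "Note",
        "name": f"Community Note: {topic}",
        "content": note_text,
        "published": datetime.now(timezone.utc).isoformat(),
        # Provenance: the compared articles, as ActivityStreams Link objects.
        "attachment": [{"type": "Link", "href": url} for url in source_urls],
        # Analysis results as compact IRIs in the placeholder namespace.
        "ex:similarityScore": similarity_score,
        "ex:discrepancyTypes": list(discrepancies),
    }


note = build_community_note(
    "Artificial Intelligence",
    "Example note text.",
    0.82,
    ["length"],
    ["https://en.wikipedia.org/wiki/Artificial_intelligence"],
)
print(json.dumps(note, indent=2))
```

Declaring the extra prefix in `@context` keeps the custom fields meaningful to JSON-LD processors instead of being silently dropped during expansion.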
The system rotates through 8 Cerebras API keys using a round-robin algorithm to avoid rate limits:
# Automatic rotation on each request
key_rotator.get_next_key()
Similarity scoring uses cosine similarity on 384-dimensional embeddings:
- 0.8-1.0: High similarity (green)
- 0.6-0.8: Moderate similarity (yellow)
- 0.0-0.6: Low similarity (red)
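A minimal sketch of the scoring and color bands, assuming 384-dimensional vectors like those produced by all-MiniLM-L6-v2 (for example via `SentenceTransformer("all-MiniLM-L6-v2").encode(...)`). The function names are hypothetical. Because the ranges above share their endpoints, the sketch assumes a score of exactly 0.8 counts as high and exactly 0.6 as moderate.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_band(score):
    """Map a similarity score to the dashboard's (label, color) band."""
    if score >= 0.8:
        return "high", "green"
    if score >= 0.6:
        return "moderate", "yellow"
    return "low", "red"  # also covers negative cosine values


# Toy 384-dimensional vectors standing in for real article embeddings.
wiki_vec = [0.1] * 384
grok_vec = [0.1] * 383 + [0.2]
score = cosine_similarity(wiki_vec, grok_vec)
print(round(score, 4), similarity_band(score))
```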
Three types of discrepancies:
- Length: >30% difference in content length
- Keyword: <50% overlap in top keywords (TF-IDF)
- Structural: Significant differences in formatting
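The length and keyword checks can be sketched in plain Python. This is an illustration under assumptions, not the code in backend/comparison.py: the length difference is measured relative to the longer text, keyword overlap is the Jaccard ratio of the two top-k TF-IDF keyword sets, k defaults to 10, and the stopword list is a tiny placeholder. It uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, so terms shared by both articles are not zeroed out in a two-document corpus. The structural check is omitted because its criteria are not specified.

```python
import math
import re
from collections import Counter

# Tiny placeholder stopword list (assumption; a real pipeline would use a fuller one).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "was",
             "for", "on", "with", "as", "by", "it", "that", "this"}


def tokenize(text):
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def top_keywords(docs, k=10):
    """Return the top-k TF-IDF terms (smoothed IDF) for each document as sets."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # document frequency: count each term once per doc
    keyword_sets = []
    for tokens in token_lists:
        tf = Counter(tokens)
        total = len(tokens) or 1
        scores = {t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken alphabetically
        keyword_sets.append(set(ranked[:k]))
    return keyword_sets


def detect_discrepancies(wiki_text, grok_text, k=10):
    """Return the discrepancy types ("length", "keyword") found between two texts."""
    found = []
    len_a, len_b = len(wiki_text), len(grok_text)
    longer = max(len_a, len_b)
    if longer and abs(len_a - len_b) / longer > 0.30:
        found.append("length")
    kw_a, kw_b = top_keywords([wiki_text, grok_text], k)
    union = kw_a | kw_b
    if union and len(kw_a & kw_b) / len(union) < 0.50:
        found.append("keyword")
    return found
```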
- Failed topics are skipped so they don't block the scan
- Cerebras failures fall back to automatic analysis
- DKG publishing errors are logged instead of crashing the app
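The round-robin key rotation and the fallback behavior described above can be sketched as follows. `KeyRotator` and `analyze_with_fallback` are hypothetical names (only `get_next_key()` appears in this README), the model call and the fallback analysis are passed in as plain functions rather than tied to the Cerebras SDK, and each request tries every key at most once before falling back.

```python
import itertools
import threading


class KeyRotator:
    """Round-robin over a fixed list of API keys; thread-safe for background scans."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("at least one key is required")
        self._keys = list(keys)
        self._cycle = itertools.cycle(self._keys)
        self._lock = threading.Lock()

    def get_next_key(self):
        with self._lock:
            return next(self._cycle)

    def __len__(self):
        return len(self._keys)


def analyze_with_fallback(rotator, call_model, fallback):
    """Try each key at most once; if every call fails, return fallback(last_error)."""
    last_error = None
    for _ in range(len(rotator)):
        key = rotator.get_next_key()
        try:
            return call_model(key)
        except Exception as exc:  # e.g. a rate-limit or network error
            last_error = exc
    return fallback(last_error)


def flaky_model(key):
    # Stand-in for a real API call: pretend the first key is rate-limited.
    if key == "key-1":
        raise RuntimeError("rate limited")
    return f"analysis via {key}"


def automatic_analysis(error):
    # Stand-in for the non-AI fallback analysis.
    return f"automatic analysis (last error: {error})"


result = analyze_with_fallback(KeyRotator(["key-1", "key-2"]), flaky_model, automatic_analysis)
print(result)
```

Sharing one rotator across requests spreads load evenly; bounding the retries by the number of keys keeps a fully rate-limited pool from looping forever.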
hackathon-project/
├── app.py # Flask application
├── config.py # Configuration
├── requirements.txt # Dependencies
├── backend/
│   ├── scraper.py # Content fetching
│   ├── embeddings.py # Vector embeddings
│   ├── comparison.py # Discrepancy detection
│   ├── cerebras_analyzer.py # AI analysis
│   └── dkg_publisher.py # DKG publishing
├── data/
│   ├── topics.json # 54 topics
│   └── api_keys.py # API key rotation
├── static/
│   ├── css/style.css # Custom styles
│   └── js/script.js # Utilities
├── templates/
│   ├── base.html # Base template
│   ├── index.html # Dashboard
│   └── comparison.html # Comparison view
└── .env # Environment variables
- Verify the API key in .env
- Check that the index name matches wikipedia-grokipedia
- Ensure free-tier limits are not exceeded
- System automatically rotates through 8 keys
- Check logs for specific error messages
- Falls back to automatic analysis if all keys fail
- Some topics may have disambiguation pages
- System automatically tries first option
- Check logs for specific failures
- Update the URL format in backend/scraper.py
- Adjust CSS selectors to match the site structure
- Mock data is used if the site is unavailable
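The disambiguation handling noted in the troubleshooting list ("System automatically tries first option") can be sketched like this. If the scraper uses the `wikipedia` PyPI package, that library raises `wikipedia.exceptions.DisambiguationError` with an `options` list of candidate titles; to keep the sketch self-contained and offline, it defines a stand-in exception with the same `options` attribute and takes the fetch function as a parameter. All names here are hypothetical, not the code in backend/scraper.py.

```python
class DisambiguationError(Exception):
    """Stand-in for a library's disambiguation error; `options` lists candidate titles."""

    def __init__(self, title, options):
        super().__init__(f"{title!r} is ambiguous")
        self.title = title
        self.options = list(options)


def fetch_resolving_disambiguation(fetch, title):
    """Fetch `title`; on a disambiguation page, retry once with the first option."""
    try:
        return fetch(title)
    except DisambiguationError as exc:
        if not exc.options:
            raise
        # A second disambiguation error propagates, so the scan can skip the topic.
        return fetch(exc.options[0])


def fake_fetch(title):
    # Offline stand-in for a real page fetch.
    if title == "Mercury":
        raise DisambiguationError(title, ["Mercury (planet)", "Mercury (element)"])
    return f"content of {title}"


print(fetch_resolving_disambiguation(fake_fetch, "Mercury"))
```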
MIT License - OriginTrail Global Hackathon 2025
- OriginTrail - DKG infrastructure
- Cerebras - AI analysis platform
- Pinecone - Vector database
- Wikipedia - Baseline content source