A production-grade ML system for predicting content virality using graph neural networks, temporal modeling, and uncertainty quantification.

Ayushhgit/virality-forecasting-MLOps

Virality Forecasting System

A production-grade ML system for predicting content virality using graph neural networks, temporal modeling, and uncertainty quantification.

Architecture Overview

virality-forecasting/
├── ingestion/          # Data collection and streaming
├── features/           # Feature engineering (online/offline)
├── graph/              # Graph construction and GNN models
├── models/             # ML models (baselines, temporal, uncertainty)
├── training/           # Training pipelines with MLflow
├── inference/          # FastAPI prediction service
├── evaluation/         # Metrics and evaluation framework
├── monitoring/         # Drift detection and alerting
└── deployment/         # Docker, K8s, CI/CD

Key Features

  • Multi-source Ingestion: Reddit and Twitter/X collectors, plus synthetic data generation
  • Real-time Features: Engagement velocity, sentiment volatility, temporal normalization
  • Graph-based Learning: Social graph construction, influence scoring, GNN models
  • Uncertainty Quantification: MC Dropout, temperature scaling, calibration
  • Production Ready: FastAPI service, Kubernetes deployment, comprehensive monitoring
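
To make the uncertainty quantification bullet more concrete, here is a minimal sketch of one common way to split predictive uncertainty when using MC Dropout. It assumes each stochastic forward pass (dropout left active at inference) returns a predicted mean and a predicted variance; the spread of the means is treated as epistemic uncertainty and the average predicted variance as aleatoric uncertainty. The helper name `decompose_uncertainty` and its input format are hypothetical and are not taken from this repository.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(passes):
    """Split predictive uncertainty from stochastic forward passes.

    `passes` is a list of (mean, variance) pairs, one per MC Dropout pass.
    Hypothetical helper: the repository's actual interface may differ.
    """
    means = [m for m, _ in passes]
    variances = [v for _, v in passes]
    return {
        "prediction": fmean(means),        # average over passes
        "epistemic": pvariance(means),     # disagreement between passes
        "aleatoric": fmean(variances),     # average noise the model predicts
    }


# Three illustrative passes (made-up numbers, not model output).
passes = [(10.0, 4.0), (12.0, 5.0), (14.0, 3.0)]
result = decompose_uncertainty(passes)
print(result)
```

Both uncertainty terms here are variances; a service could equally report standard deviations, so the units of the API's `uncertainty` fields should be checked against the implementation.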

Quick Start

Installation

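# Clone and install
git clone https://github.com/Ayushhgit/virality-forecasting-MLOps.git
cd virality-forecasting-MLOps
pip install -e ".[dev]"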

# Or with Docker
docker build -t virality-forecasting -f deployment/docker/Dockerfile .

Running the API

# Local development
python -m inference.api

# With Docker
docker run -p 8000:8000 virality-forecasting

# Production (Kubernetes)
kubectl apply -f deployment/kubernetes/deployment.yaml

Making Predictions

import httpx

response = httpx.post("http://localhost:8000/predict", json={
    "content_id": "post_123",
    "platform": "twitter",
    "created_at": "2024-01-15T10:30:00Z",
    "content": {
        "text": "Breaking news: Major announcement!",
        "has_media": True,
        "hashtags": ["breaking", "news"]
    },
    "author": {
        "author_id": "user_456",
        "follower_count": 50000,
        "verified": True
    }
})

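response.raise_for_status()  # fail loudly on 4xx/5xx instead of printing an error body
print(response.json())
# Example response:
# {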
#   "prediction": 1523.4,
#   "viral_probability": 0.73,
#   "uncertainty": {"epistemic": 120.5, "aleatoric": 85.2}
# }

Model Architecture

Baseline: LightGBM

  • Gradient boosting on tabular features
  • Fast training and inference
  • Built-in feature importance scores for interpretability

Temporal: Transformer

  • Self-attention for engagement trajectories
  • Captures temporal patterns
  • Variable-length sequence support
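
As an illustration of how self-attention can handle engagement trajectories of different lengths, the sketch below applies scaled dot-product attention to a padded sequence and ignores the padded steps. It is a simplification under stated assumptions: queries, keys, and values are the raw inputs (no learned projections, single head), and padding is handled by restricting attention to the first `length` steps. The function names are hypothetical, not the repository's model code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence, length):
    """Scaled dot-product self-attention over the first `length` steps.

    `sequence` is a padded list of feature vectors. Steps at index >= length
    are padding: they produce no output and receive no attention weight.
    """
    dim = len(sequence[0])
    valid = sequence[:length]
    outputs = []
    for query in valid:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in valid
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, valid)) for d in range(dim)]
        )
    return outputs


# Two real steps followed by one padding step (illustrative values).
padded = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
attended = self_attention(padded, length=2)
print(attended)
```

Because the padded step is excluded, changing its contents does not change the output for the real steps.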

Graph: GNN

  • Graph Attention Networks (GAT)
  • Influence propagation modeling
  • User-content heterogeneous graphs
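
The attention step of a GAT layer can be sketched in a few lines. For a target node, each neighbor is scored with a LeakyReLU over the attention vector applied to the concatenated pair of features, the scores are normalized with a softmax, and the neighbor features are averaged with those weights. This sketch uses a single head and the identity in place of the learned weight matrix; the node names, feature values, and `gat_aggregate` are hypothetical and only illustrate the mechanism.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_aggregate(features, neighbors, attn_vector, node):
    """Aggregate neighbor features for `node` with GAT-style attention.

    Assumption: the learned weight matrix W is replaced by the identity.
    `attn_vector` has length 2 * feature_dim and scores the pair [h_i || h_j].
    """
    h_i = features[node]
    nbrs = neighbors[node]
    scores = []
    for j in nbrs:
        pair = h_i + features[j]  # list concatenation [h_i || h_j]
        scores.append(leaky_relu(sum(a * x for a, x in zip(attn_vector, pair))))
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]  # softmax over the neighborhood
    dim = len(h_i)
    out = [
        sum(alpha * features[j][d] for alpha, j in zip(alphas, nbrs))
        for d in range(dim)
    ]
    return alphas, out


# Tiny user-content graph: one post connected to two users and itself.
features = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "post": [0.5, 0.5]}
neighbors = {"post": ["u1", "u2", "post"]}
attn_vector = [0.0, 0.0, 1.0, 0.0]  # illustrative, not learned
alphas, new_post_features = gat_aggregate(features, neighbors, attn_vector, "post")
print(alphas, new_post_features)
```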

Feature Engineering

Online Features (real-time)

  • Engagement velocity at multiple windows
  • Sentiment volatility tracking
  • Time-of-day normalization
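
Engagement velocity at multiple windows can be read as "events per minute over each trailing window". The sketch below shows that calculation; the window sizes (5, 15, and 60 minutes), the feature names, and the function `engagement_velocity` are assumptions chosen for illustration, not values taken from this repository.

```python
from datetime import datetime, timedelta, timezone


def engagement_velocity(event_times, now, windows_minutes=(5, 15, 60)):
    """Events per minute over each trailing window ending at `now`."""
    velocities = {}
    for minutes in windows_minutes:
        start = now - timedelta(minutes=minutes)
        # Count events in the half-open interval (start, now].
        count = sum(1 for t in event_times if start < t <= now)
        velocities[f"velocity_{minutes}m"] = count / minutes
    return velocities


now = datetime(2024, 1, 15, 11, 30, tzinfo=timezone.utc)
# Illustrative engagement events 1, 2, 3, 10, 30, and 90 minutes ago.
events = [now - timedelta(minutes=m) for m in (1, 2, 3, 10, 30, 90)]
velocity = engagement_velocity(events, now)
print(velocity)
```

Comparing a short window against a long one gives a simple acceleration signal: a post whose 5-minute velocity far exceeds its 60-minute velocity is speeding up.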

Offline Features (batch)

  • Historical baselines by cohort
  • Author performance statistics
  • Leakage-safe computation
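
One way to keep author statistics leakage-safe is to compute each post's features only from that author's earlier posts, updating the running totals after the feature row is emitted. The sketch below shows this ordering. The record keys (`content_id`, `author_id`, `created_at`, `engagement`) and the function name are hypothetical, and the sketch ignores label delay: in production, an earlier post's engagement should only be used once that label would actually have been available.

```python
from collections import defaultdict


def author_history_features(posts):
    """For each post, summarize the same author's earlier posts only."""
    running = defaultdict(lambda: [0.0, 0])  # author_id -> [sum, count]
    features = []
    for post in sorted(posts, key=lambda p: p["created_at"]):
        total, count = running[post["author_id"]]
        features.append({
            "content_id": post["content_id"],
            "author_prior_posts": count,
            "author_prior_mean": total / count if count else None,
        })
        # Update after emitting the row, so a post never sees its own label.
        running[post["author_id"]][0] += post["engagement"]
        running[post["author_id"]][1] += 1
    return features


# Illustrative records; ISO-8601 strings sort chronologically.
posts = [
    {"content_id": "p1", "author_id": "a", "created_at": "2024-01-01T00:00:00Z", "engagement": 100},
    {"content_id": "p2", "author_id": "a", "created_at": "2024-01-02T00:00:00Z", "engagement": 300},
    {"content_id": "p3", "author_id": "b", "created_at": "2024-01-03T00:00:00Z", "engagement": 50},
    {"content_id": "p4", "author_id": "a", "created_at": "2024-01-04T00:00:00Z", "engagement": 20},
]
history = author_history_features(posts)
print(history)
```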

Evaluation

  • Early Window: How early can we predict?
  • Calibration: Are uncertainties reliable?
  • Ranking: Top-K precision and NDCG
  • Delayed Labels: Production-realistic evaluation
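
The ranking metrics above can be sketched as follows. Precision@K is the fraction of the K highest-scored items that are actually viral, and NDCG@K compares the discounted gain of the predicted ordering with that of the ideal ordering. This version uses linear gain (the relevance value itself) and a log2 position discount; the function names and sample data are illustrative assumptions rather than the repository's evaluation code.

```python
import math


def precision_at_k(scores, is_viral, k):
    """Fraction of the top-k scored items that are labeled viral."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(is_viral[i] for i in top) / k


def ndcg_at_k(scores, relevance, k):
    """Normalized discounted cumulative gain at k with linear gain."""
    def dcg(order):
        return sum(
            relevance[i] / math.log2(rank + 2)
            for rank, i in enumerate(order[:k])
        )

    predicted = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ideal = sorted(range(len(scores)), key=lambda i: relevance[i], reverse=True)
    best = dcg(ideal)
    return dcg(predicted) / best if best > 0 else 0.0


# Illustrative scores and labels for four posts.
scores = [0.9, 0.8, 0.3, 0.1]
is_viral = [1, 0, 1, 0]
p_at_2 = precision_at_k(scores, is_viral, k=2)
ndcg_at_2 = ndcg_at_k(scores, is_viral, k=2)
print(p_at_2, ndcg_at_2)
```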

Monitoring

  • Feature and prediction drift detection
  • Confidence collapse alerting
  • Multi-channel notifications (Slack, PagerDuty)
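
As one possible illustration of feature or prediction drift detection, the sketch below computes the Population Stability Index (PSI) between a reference sample and a current sample. PSI is a swapped-in example technique: this README does not state which drift statistic the monitoring module uses. The bin count, the smoothing constant `eps`, and the 0.25 alert threshold (a widely used rule of thumb) are assumptions.

```python
import math


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """PSI between two samples, using equal-width bins from the reference."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def histogram(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clip out-of-range values
        # Floor each proportion at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = histogram(reference), histogram(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Illustrative data: the current sample is the reference shifted by 0.5.
reference = [i / 100 for i in range(100)]
shifted = [v + 0.5 for v in reference]
psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
drift_alert = psi_shifted > 0.25  # assumed rule-of-thumb threshold
print(psi_same, psi_shifted, drift_alert)
```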

Configuration

See config/settings.yaml for all configuration options.

Development

# Run tests
pytest tests/ -v

# Lint
ruff check . && black --check .

# Type check
mypy . --ignore-missing-imports

License

MIT
