A production-grade ML system for predicting content virality using graph neural networks, temporal modeling, and uncertainty quantification.
virality-forecasting/
├── ingestion/ # Data collection and streaming
├── features/ # Feature engineering (online/offline)
├── graph/ # Graph construction and GNN models
├── models/ # ML models (baselines, temporal, uncertainty)
├── training/ # Training pipelines with MLflow
├── inference/ # FastAPI prediction service
├── evaluation/ # Metrics and evaluation framework
├── monitoring/ # Drift detection and alerting
└── deployment/ # Docker, K8s, CI/CD
- Multi-source Ingestion: Reddit, Twitter/X collectors with synthetic data generation
- Real-time Features: Engagement velocity, sentiment volatility, temporal normalization
- Graph-based Learning: Social graph construction, influence scoring, GNN models
- Uncertainty Quantification: MC Dropout, temperature scaling, calibration
- Production-ready: FastAPI service, Kubernetes deployment, drift monitoring and alerting
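
To make the temperature-scaling item concrete, here is a minimal, dependency-free sketch of the general technique. It is not this project's implementation: `softmax`, `nll`, `fit_temperature`, the grid range, and the validation logits are all hypothetical, chosen only for illustration. A single scalar temperature is fitted on held-out logits by minimizing negative log-likelihood, then applied at prediction time.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (T > 1 softens, T < 1 sharpens)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature that minimizes held-out NLL over a small grid."""
    grid = grid or [0.5 + 0.1 * i for i in range(46)]  # 0.5 .. 5.0 (assumed range)
    return min(grid, key=lambda t: nll(logit_rows, labels, t))

# Hypothetical held-out logits for (not viral, viral) from an overconfident model.
val_logits = [[2.0, -2.0], [-2.5, 2.5], [1.5, -1.5],
              [-2.0, 2.0], [2.5, -2.5], [-1.5, 1.5]]
val_labels = [0, 1, 1, 1, 0, 0]  # the third and sixth predictions are wrong

T = fit_temperature(val_logits, val_labels)
raw_p = softmax([-2.0, 2.0])[1]
calibrated_p = softmax([-2.0, 2.0], T)[1]
print(f"fitted temperature: {T:.1f}")
print(f"viral probability: raw={raw_p:.3f}, calibrated={calibrated_p:.3f}")
```

Because the example model is confidently wrong on a third of the held-out rows, the fitted temperature comes out above 1 and pulls the viral probability toward 0.5. A grid search keeps the sketch short; a gradient-based fit would be the more usual choice on real validation data.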
# Clone and install
pip install -e ".[dev]"
# Or with Docker
docker build -t virality-forecasting -f deployment/docker/Dockerfile .

# Local development
python -m inference.api
# With Docker
docker run -p 8000:8000 virality-forecasting
# Production (Kubernetes)
kubectl apply -f deployment/kubernetes/deployment.yaml

import httpx
response = httpx.post("http://localhost:8000/predict", json={
    "content_id": "post_123",
    "platform": "twitter",
    "created_at": "2024-01-15T10:30:00Z",
    "content": {
        "text": "Breaking news: Major announcement!",
        "has_media": True,
        "hashtags": ["breaking", "news"],
    },
    "author": {
        "author_id": "user_456",
        "follower_count": 50000,
        "verified": True,
    },
})
print(response.json())
# {
# "prediction": 1523.4,
# "viral_probability": 0.73,
# "uncertainty": {"epistemic": 120.5, "aleatoric": 85.2}
# }

- Gradient boosting on tabular features
- Fast training and inference
- Strong feature importance
- Self-attention for engagement trajectories
- Captures temporal patterns
- Variable-length sequence support
- Graph Attention Networks (GAT)
- Influence propagation modeling
- User-content heterogeneous graphs
- Engagement velocity at multiple windows
- Sentiment volatility tracking
- Time-of-day normalization
- Historical baselines by cohort
- Author performance statistics
- Leakage-safe computation
- Early Window: How early can we predict?
- Calibration: Are uncertainties reliable?
- Ranking: Top-K precision and NDCG
- Delayed Labels: Production-realistic evaluation
- Feature and prediction drift detection
- Confidence collapse alerting
- Multi-channel notifications (Slack, PagerDuty)
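
As an illustration of the feature-drift check, the sketch below computes the Population Stability Index (PSI) between a reference sample and a recent sample of one numeric feature. PSI is one common choice for this; the README does not say which statistic the monitoring module actually uses. The function name `psi`, the bin count, and the synthetic samples are assumptions made for the example.

```python
import math
import random

def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant reference feature

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # Floor each fraction at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    ref, cur = bin_fractions(reference), bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

# Synthetic stand-ins for a training-time feature and two production windows.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
stable = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # same distribution
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]  # mean moved by one sigma

psi_stable = psi(reference, stable)
psi_shifted = psi(reference, shifted)
print(f"stable window PSI:  {psi_stable:.3f}")
print(f"shifted window PSI: {psi_shifted:.3f}")
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift. Any real alert threshold should be tuned per feature instead of taken from this sketch.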
See config/settings.yaml for all configuration options.
# Run tests
pytest tests/ -v
# Lint
ruff check . && black --check .
# Type check
mypy . --ignore-missing-imports

License: MIT