Should-cost modeling, OCOGS tracking, DoWhy causal inference, CUPED A/B testing, make-vs-buy analysis, and price elasticity for 500 SKUs across 12 manufacturing plants with 5 suppliers in 7 countries
GlowCast is a Cost & Commercial Analytics platform managing 500 SKUs across 12 manufacturing plants in 7 countries (CN, TW, DE, US, MX, IN, JP) with 5 suppliers and 5 commodity groups (Steel, Copper, Resin, Aluminum, Silicon). The platform performs should-cost BOM decomposition (raw material + labor + overhead + logistics + tariff), OCOGS variance tracking (actual vs. budget with trend analysis), a cost reduction engine (8 action types with causal effect estimation), make-vs-buy multi-criteria analysis (cost, quality, lead time, strategic weighting), and price elasticity estimation (log-log OLS regression). Key technical highlights: DoWhy causal inference identifies significant cost drivers through a 4-step workflow (model, identify, estimate, refute), and CUPED variance reduction lowers A/B test sample size by 55% (rho = 0.74). X-Learner handles 20/80 treatment imbalance for cost-reduction uplift targeting with AUUC 0.74.
┌─────────────────────────────────────────────────────────────────────┐
│ GlowCast Platform │
├──────────────┬──────────────┬──────────────┬───────────────────────┤
│ Data Layer │ Analytics │ Cost Modules │ Operations │
├──────────────┼──────────────┼──────────────┼───────────────────────┤
│ Star Schema │ SQL Pipes(4) │ Should-Cost │ Feature Store │
│ 9 tables │ Cost Variance│ OCOGS │ Drift Monitor │
│ Pandera │ Should-Cost │ Cost │ MLflow Tracker │
│ contracts │ Gap │ Reduction │ Retrain Trigger │
│ │ Supplier │ Make-vs-Buy │ │
│ CostDataGen │ Perf │ Price │ │
│ 500 SKUs │ Cost Anomaly │ Elasticity │ │
│ 12 Plants │ │ │ │
├──────────────┼──────────────┼──────────────┼───────────────────────┤
│ Segment │Experimentation│ Causal │ Explainability │
│ Genes (10) │ CUPED │ DoWhy 4-step │ SHAP + LIME │
│ 5 Categories │ Sequential │ X-Learner │ Fairness (KW/Chi2) │
│ 2 Cost Tiers │ Interleaving │ Uplift (4) │ │
│ 5 Commodities│ Power/SRM │ Causal Forest│ │
│ 5 Suppliers │ Bucketing │ │ │
└──────────────┴──────────────┴──────────────┴───────────────────────┘
| Module | Metric | Value | Description |
|---|---|---|---|
| Should-Cost | Gap identification rate | >90% | Flags SKUs with >10% cost gap vs. BOM target |
| OCOGS Tracker | Variance detection | 5% threshold | Monthly actual-vs-budget variance alerting |
| Cost Reduction | Savings estimate accuracy | 80% realization | Projected vs. realized savings alignment |
| Make-vs-Buy | Recommendation accuracy | Multi-criteria | Weighted composite (cost 35%, quality 30%, lead time 20%, strategic 15%) |
| Price Elasticity | Elasticity estimation | Log-log OLS | p < 0.05 significance threshold with CI |
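
The make-vs-buy weighting in the table above can be illustrated with a small scoring sketch. Only the four weights come from the table. The 0 to 1 score scale, the example options, and the function names `composite_score` and `recommend` are assumptions for illustration, not the platform's actual API.

```python
# Weights from the table above; everything else is a hypothetical illustration.
WEIGHTS = {"cost": 0.35, "quality": 0.30, "lead_time": 0.20, "strategic": 0.15}


def composite_score(scores: dict[str, float]) -> float:
    """Weighted sum of criterion scores, each assumed normalized to [0, 1] (higher is better)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)


def recommend(make: dict[str, float], buy: dict[str, float]) -> str:
    """Return 'make' or 'buy', whichever scores higher (ties go to 'make')."""
    return "make" if composite_score(make) >= composite_score(buy) else "buy"


# Hypothetical options: in-house production is costlier but stronger elsewhere.
make_option = {"cost": 0.6, "quality": 0.9, "lead_time": 0.7, "strategic": 0.8}
buy_option = {"cost": 0.9, "quality": 0.7, "lead_time": 0.5, "strategic": 0.4}

print(round(composite_score(make_option), 3), round(composite_score(buy_option), 3))
print(recommend(make_option, buy_option))
```

In this example the cheaper supplier quote still loses, because quality, lead time, and strategic fit together carry 65% of the weight.
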
| MDE | n_raw (per group) | n_CUPED (per group) | Reduction |
|---|---|---|---|
| 3% | 42,000 | 18,900 | -55% |
| 5% | 15,200 | 6,840 | -55% |
| 10% | 3,800 | 1,710 | -55% |
CUPED correlation: rho = 0.74, variance reduction = rho^2 ≈ 55%, bootstrap CI [0.42, 0.48]
| Learner | AUUC | 95% CI | vs Random |
|---|---|---|---|
| Random | 0.50 | [0.48, 0.52] | — |
| S-Learner | 0.62 | [0.59, 0.65] | +0.12 |
| T-Learner | 0.68 | [0.64, 0.72] | +0.18 |
| X-Learner | 0.74 | [0.71, 0.77] | +0.24 |
| Causal Forest | 0.71 | [0.68, 0.74] | +0.21 |
Treatment/control: 20/80 (X-Learner wins due to cross-estimation on imbalanced data)
| Metric | Before | After | Delta |
|---|---|---|---|
| Cost driver identification | Manual review | DoWhy causal (automated) | Systematic |
| A/B sample size | 15,200/group | 6,840/group (CUPED) | -55% |
| Cost reduction targeting | Uniform actions | Uplift-targeted (X-Learner) | Precision |
| Supplier risk assessment | Spreadsheet-based | Multi-criteria scoring | Quantified |
GlowCast includes a 5-page Streamlit dashboard for real-time KPI monitoring and visual analytics:
| Page | Description |
|---|---|
| Executive Overview | Top-level KPIs, platform data flow, business impact table, cost segment evaluation, uplift comparison, fairness heatmap |
| Should-Cost & OCOGS | BOM decomposition breakdown, should-cost gap analysis, OCOGS variance trends, budget vs. actual tracking, cost element waterfall |
| Cost Reduction & Make-vs-Buy | Reduction action effectiveness, make-vs-buy recommendations, supplier quote comparison, breakeven volume analysis |
| Causal & Experimentation | DoWhy ATE with CI, refutation tests, uplift curves (4 meta-learners), CUPED variance reduction gauge, sequential testing (mSPRT) |
| MLOps & Quality | Drift timeline (KS/PSI), SHAP vs LIME feature importance, fairness by segment, retrain decision flow |
# Launch dashboard
streamlit run app/dashboard/dashboard.py

# One-command launch
docker compose build && docker compose up -d
# Or manual setup
pip install -e ".[dev]"
python -m app.data.data_generator --validate-only
pytest tests/ -v

# Quick (50 SKUs, 90 days)
python -m app.data.data_generator --n-skus 50 --n-days 90
# Full (500 SKUs, 1095 days)
python -m app.data.data_generator
# Validate schemas only
python -m app.data.data_generator --validate-only

Data follows a star schema design with 9 Pandera-validated tables, supporting 10 cost-behavior segments (5 categories x 2 cost tiers) across 12 manufacturing plants with 5 suppliers and 5 commodity groups.
Domain-specific properties:
- Cost categories — RawMaterials, Components, Packaging, Labor, Overhead with Direct/Indirect tiers
- Commodity price modeling — Steel, Copper, Resin, Aluminum, Silicon with seasonal amplitude and volatility
- Supplier profiles — 5 suppliers across CN/TW/DE/US/IN with quality, on-time, lead-time, and price premium attributes
- Plant geography — 12 plants across 7 countries with local labor rates and overhead allocations
- Treatment imbalance — 20/80 treatment/control split for cost-reduction experiments
The should-cost model decomposes product cost into 5 constituent elements and benchmarks them against should-cost targets:
- raw_material — commodity-linked material cost based on BOM and market prices
- labor — plant-specific labor content based on geography and process complexity
- overhead — allocated overhead based on plant utilization and capacity
- logistics — transportation and warehousing based on origin/destination
- tariff — duty and import costs based on supplier country and trade agreements
Gap analysis flags SKUs where actual cost exceeds should-cost by more than the configured threshold (default 10%).
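
The decomposition and gap rule above can be sketched as follows. The five element names and the 10% default threshold come from the text. The `ShouldCost` class, the function names, and the example figures are hypothetical and are not taken from `app/cost/should_cost.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShouldCost:
    """Per-unit should-cost split into the five elements listed above."""

    raw_material: float
    labor: float
    overhead: float
    logistics: float
    tariff: float

    @property
    def total(self) -> float:
        return self.raw_material + self.labor + self.overhead + self.logistics + self.tariff


def cost_gap(actual_cost: float, target: ShouldCost) -> float:
    """Relative gap of actual cost over should-cost; 0.12 means 12% above target."""
    return (actual_cost - target.total) / target.total


def is_flagged(actual_cost: float, target: ShouldCost, threshold: float = 0.10) -> bool:
    """Flag a SKU when actual cost exceeds should-cost by more than the threshold."""
    return cost_gap(actual_cost, target) > threshold


# Hypothetical SKU with a should-cost of 10.00 per unit.
target = ShouldCost(raw_material=4.00, labor=2.00, overhead=1.50, logistics=1.50, tariff=1.00)
for actual in (10.5, 11.5):
    print(actual, f"{cost_gap(actual, target):.1%}", is_flagged(actual, target))
```

Here an actual cost of 10.5 sits 5% above target and passes, while 11.5 sits 15% above and is flagged.
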
4-step DoWhy workflow (model, identify, estimate, refute) combined with uplift modeling:
- Treatment: cost_reduction_action (whether a cost reduction intervention was applied)
- Outcome: unit_cost_change (observed change in per-unit cost)
- X-Learner handles 20/80 treatment imbalance via cross-estimation (AUUC 0.74)
- Causal Forest provides heterogeneous treatment effect estimation (AUUC 0.71)
- Propensity-weighted combination routes more weight to the effect estimate whose counterfactuals come from the control-outcome model, which is fit on the larger 80% arm
- CUPED variance reduction: rho=0.74, 55% sample size reduction
- mSPRT always-valid p-values for continuous monitoring
- Team Draft interleaving for ranking comparison
- SHA-256 hash bucketing with SRM detection
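
The last item, hash bucketing with sample ratio mismatch (SRM) detection, can be sketched with the standard library alone. The `experiment:unit_id` key format, the function names, and the SKU identifiers are assumptions for illustration. The SRM check shown is a plain chi-squared goodness-of-fit test, which may differ from what `app/experimentation/bucketing.py` does.

```python
import hashlib
import math


def assign_arm(unit_id: str, experiment: str, treatment_share: float = 0.20) -> str:
    """Deterministically map a unit to 'treatment' or 'control' from a SHA-256 hash."""
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it to [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if u < treatment_share else "control"


def srm_p_value(n_treatment: int, n_control: int, treatment_share: float = 0.20) -> float:
    """Chi-squared goodness-of-fit p-value (1 degree of freedom) for sample ratio mismatch."""
    n = n_treatment + n_control
    exp_t, exp_c = n * treatment_share, n * (1 - treatment_share)
    chi2 = (n_treatment - exp_t) ** 2 / exp_t + (n_control - exp_c) ** 2 / exp_c
    # Survival function of chi-squared with 1 dof: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical unit IDs and experiment name.
arms = [assign_arm(f"SKU-{i:04d}", "exp-demo") for i in range(10_000)]
n_t, n_c = arms.count("treatment"), arms.count("control")
print(n_t, n_c, f"p = {srm_p_value(n_t, n_c):.3f}")

# A clearly mismatched split (25/75 instead of 20/80) yields a tiny p-value.
print(srm_p_value(2_500, 7_500))
```

Because the arm depends only on the hash of the key, the same unit always lands in the same arm for a given experiment, and a very small SRM p-value signals that the observed split has drifted from the intended 20/80.
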
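Decision: X-Learner for uplift modeling with 20/80 treatment imbalance. Why: Cross-estimation achieves AUUC 0.74 vs. the T-Learner's 0.68. With the propensity score as the blending weight, and a propensity near 0.2, about 80% of the weight goes to the effect estimate built on the control-outcome model, which is trained on the larger 80% arm. Rejected: T-Learner (it fits each arm independently, so the outcome model for the 20% treatment arm is comparatively noisy).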
Decision: Eventual consistency (24h TTL) for the feature store, prioritizing availability over strong consistency. Why: An A/B test showed no significant accuracy difference between features refreshed every hour and features refreshed every 24 hours (p=0.82). Rejected: Strong consistency (CP), which adds complexity with negligible accuracy gain.
GlowCast/
├── app/
│ ├── settings.py # YAML config loader (@lru_cache)
│ ├── logging.py # structlog setup
│ ├── seed.py # Global seed (42)
│ ├── dashboard/
│ │ ├── dashboard.py # Streamlit entry point (5-page SPA)
│ │ ├── data.py # Standalone data simulator (all KPIs)
│ │ └── views/ # Page modules
│ │ ├── overview.py # Executive Overview
│ │ ├── cost_analytics.py # Should-Cost & OCOGS
│ │ ├── cost_operations.py # Cost Reduction & Make-vs-Buy
│ │ ├── causal.py # Causal & Experimentation (DoWhy, uplift, CUPED)
│ │ └── mlops.py # MLOps & Quality (drift, SHAP, fairness, retrain)
│ ├── cost/
│ │ ├── should_cost.py # BOM decomposition & gap analysis
│ │ ├── ocogs_tracker.py # Actual vs. budget variance tracking
│ │ ├── cost_reduction.py # 8 action types, causal effect estimation
│ │ ├── make_vs_buy.py # Multi-criteria make-vs-buy analysis
│ │ └── price_elasticity.py # Log-log OLS elasticity estimation
│ ├── data/
│ │ ├── segment_genes.py # 10 segments, 5 commodities, 5 suppliers
│ │ ├── star_schema.py # 9 Pandera schemas
│ │ ├── data_generator.py # CostDataGenerator (500 SKUs, 12 plants)
│ │ └── contracts.py # Data contract schemas
│ ├── sql/
│ │ ├── executor.py # SQLite pipeline runner
│ │ ├── dos_woc.sql # Cost Variance Analysis (Plant x Category)
│ │ ├── scrap_risk.sql # Should-Cost Gap Analysis
│ │ ├── cross_zone_penalty.sql # Supplier Performance Analysis
│ │ └── demand_anomaly.sql # Cost Anomaly Detection (Z-score)
│ ├── experimentation/
│ │ ├── cuped.py # CUPED variance reduction
│ │ ├── sequential.py # mSPRT always-valid p-values
│ │ ├── interleaving.py # Team Draft interleaving
│ │ ├── power.py # Sample size / MDE tables
│ │ └── bucketing.py # SHA-256 hash bucketing + SRM
│ ├── causal/
│ │ ├── dowhy_pipeline.py # 4-step DoWhy workflow
│ │ └── uplift.py # S/T/X-Learner + Causal Forest
│ ├── mlops/
│ │ ├── feature_store.py # Offline/online dual-mode
│ │ ├── drift_monitor.py # KS + PSI + cost metric drift
│ │ ├── mlflow_tracker.py # Experiment tracking
│ │ └── retrain_trigger.py
│ ├── explain/
│ │ ├── shap_lime.py # SHAP TreeExplainer + LIME
│ │ └── fairness.py # KW / Chi2 fairness tests
│ └── api/
│ └── main.py # FastAPI REST API (cost endpoints)
├── configs/
│ └── glowcast.yaml # All configuration
├── tests/ # 159+ tests (24 files)
├── docs/
│ ├── adr/ # Architecture Decision Records
│ ├── model_card.md # Mitchell et al. FAT* 2019
│ ├── failure_modes.md # Degradation analysis
│ ├── reproducibility.md # NeurIPS 2019 checklist
│ └── latency_budget.md
├── pyproject.toml
├── Dockerfile
├── docker-compose.yml
└── CLAUDE.md
| # | Limitation | Root Cause | Planned Improvement |
|---|---|---|---|
| 1 | Synthetic data only | No real procurement/manufacturing data available | Partner with manufacturers for anonymized validation data |
| 2 | Commodity price simulation uses simple seasonal + noise | Real commodity markets have complex dynamics (geopolitical, supply shocks) | Integrate real-time commodity price feeds (e.g., LME, COMEX) |
| 3 | Tariff/duty rates are static per country | Trade policies change with regulations and agreements | Add dynamic tariff schedule lookups |
| 4 | Make-vs-buy weights are configurable but not learned | Optimal weighting varies by product category and strategic context | Learn weights from historical make-vs-buy outcome data |
| 5 | Price elasticity assumes log-linear relationship | Some products exhibit kinked or nonlinear demand curves | Add piecewise regression and nonparametric alternatives |
159+ tests across 24 test files:
pytest tests/ -v # Full test suite
ruff check app/ tests/ # Lint

| Layer | Technologies |
|---|---|
| Language | Python 3.11+ |
| Cost Analytics | Should-Cost BOM, OCOGS variance, cost reduction engine, make-vs-buy, price elasticity (log-log OLS) |
| Causal Inference | DoWhy, CausalML, EconML (X-Learner, Causal Forest) |
| Experimentation | CUPED, mSPRT sequential testing, Team Draft interleaving |
| MLOps | MLflow, Evidently (drift), Pandera (contracts), structlog |
| Explainability | SHAP, LIME, fairness tests (Kruskal-Wallis, Chi-squared) |
| Data | Star schema (9 tables), SQL analytics pipelines (4), SQLite |
| Dashboard | Streamlit, Plotly (waterfall, heatmaps, gauges) |
| Infrastructure | Docker, PyYAML config, pyproject.toml (PEP 621) |
| Testing | pytest, Hypothesis (property-based), ruff, mypy |
- Deng, A., Xu, Y., Kohavi, R., & Walker, T. (2013). Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data. WSDM 2013. (CUPED)
- Künzel, S. R., Sekhon, J. S., Bickel, P. J., & Yu, B. (2019). Metalearners for Estimating Heterogeneous Treatment Effects. PNAS, 116(10). (X-Learner)
- Sharma, A., & Kiciman, E. (2020). DoWhy: An End-to-End Library for Causal Inference. arXiv:2011.04216.
- Johari, R., Pekelis, L., & Walsh, D. (2017). Always Valid Inference: Continuous Monitoring of A/B Tests. Operations Research. (Sequential Testing)
- Chapelle, O., Joachims, T., Radlinski, F., & Yue, Y. (2012). Large-Scale Validation and Analysis of Interleaved Search Evaluation. TOIS. (Interleaving)
- Pineau, J., et al. (2019). The Machine Learning Reproducibility Checklist. NeurIPS.
- Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press.
- Mitchell, M., et al. (2019). Model Cards for Model Reporting. FAT* 2019.
- Ellram, L. M. (1995). Total Cost of Ownership: An Analysis Approach for Purchasing. International Journal of Physical Distribution & Logistics Management. (Should-Cost / TCO)
- Monczka, R. M., Handfield, R. B., Giunipero, L. C., & Patterson, J. L. (2015). Purchasing and Supply Chain Management. Cengage Learning. (OCOGS / Make-vs-Buy)
GlowCast includes production-grade deployment infrastructure spanning three maturity phases.
├── app/api/ # FastAPI API layer
│ ├── __init__.py
│ └── main.py # health, metrics, pipelines, cost endpoints
├── k8s/ # Kubernetes manifests
│ ├── namespace.yaml
│ ├── configmap.yaml
│ ├── secret.yaml
│ ├── api-deployment.yaml # 2 replicas, FastAPI on :8000
│ ├── dashboard-deployment.yaml # 1 replica, Streamlit on :8501
│ ├── api-service.yaml
│ ├── dashboard-service.yaml
│ ├── hpa.yaml # API: 2-8 pods, CPU 70%
│ ├── ingress.yaml # / → dashboard, /api → api
│ ├── postgres.yaml # PostgreSQL 16, 2Gi PVC
│ ├── redis.yaml # Redis 7
│ └── canary/ # Istio + Flagger (custom cost/drift metrics)
├── helm/glowcast/ # Helm chart
│ ├── Chart.yaml
│ ├── values.yaml # api/dashboard/postgresql/redis
│ └── templates/ # 9 templated manifests
├── serving/ # BentoML model serving
│ ├── bentofile.yaml
│ └── service.py # cost_analysis / uplift_predict / detect_drift
├── monitoring/ # Observability stack
│ ├── prometheus.yml # + K8s service discovery
│ ├── docker-compose.monitoring.yaml
│ └── grafana/ # 25-panel dashboard
├── pipelines/ # Airflow orchestration
│ ├── dags/
│ │ ├── glowcast_training.py # ML training pipeline
│ │ ├── glowcast_experimentation.py # CUPED/mSPRT experiments
│ │ └── glowcast_monitoring.py # 6-hourly drift detection
│ └── docker-compose.airflow.yaml
├── mlflow/ # Model registry (+ MinIO artifacts)
│ └── docker-compose.mlflow.yaml
├── terraform/ # AWS infrastructure as code
│ ├── main.tf # VPC + EKS + RDS + ElastiCache + S3
│ ├── variables.tf
│ ├── outputs.tf
│ ├── modules/ # eks / rds / redis / s3
│ └── environments/ # dev / prod (SOC2 tags)
├── loadtests/ # Performance testing
│ ├── k6_api.js # 3 scenarios: sustained/ramp/spike
│ └── slo.yaml # 8 SLOs including AUUC > 0.70
└── data_quality/ # Great Expectations
├── great_expectations.yml
├── expectations/ # cost_transactions / product_data / supplier_data
├── checkpoints/
└── validate.py # Lightweight engine (no GX dependency required)
| Component | Technology | Details |
|---|---|---|
| API Layer | FastAPI | 9 endpoints: health, metrics, pipelines, should-cost, variance, make-vs-buy, reduction, elasticity, drift status |
| Container Orchestration | Kubernetes | API (2 replicas) + Streamlit dashboard (1 replica), health probes |
| Helm Chart | Helm v3 | Parameterized: api, dashboard, postgresql, redis |
| Model Serving | BentoML | 3 endpoints: cost_analysis (should-cost), uplift_predict (X-Learner), detect_drift |
| Database | PostgreSQL 16 | StatefulSet with 2Gi persistent volume |
| Cache | Redis 7 | Feature store online serving |
| Secrets | K8s Secrets | API keys, database URLs |
| Component | Technology | Details |
|---|---|---|
| Model Registry | MLflow + MinIO | Extends existing mlflow_tracker.py with registry workflow |
| Metrics | Prometheus | 16 custom metrics (glowcast_*): cost variance, uplift AUUC, drift, data quality, CUPED |
| Dashboards | Grafana | 25 panels: cost analytics quality, uplift, CUPED variance reduction, drift, pipeline, API |
| Canary Deployment | Istio + Flagger | Custom metric templates for cost accuracy and drift detection |
| Pipeline Orchestration | Apache Airflow | 3 DAGs: training, experimentation (CUPED/mSPRT), monitoring (6-hourly) |
| Component | Technology | Details |
|---|---|---|
| Infrastructure as Code | Terraform | AWS: VPC, EKS, RDS, ElastiCache, S3 (prod with SOC2 tags) |
| Access Control | RBAC Middleware | 3 roles (Viewer/Analyst/Admin), 5 permissions |
| Audit Trail | Audit Logger | NDJSON file + in-memory buffer, structured logging |
| Load Testing | k6 | 3 scenarios (sustained/ramp/spike), P95 < 500ms |
| SLO | YAML definitions | 8 SLOs: availability 99.9%, cost gap detection > 90%, AUUC > 0.70, data quality > 95% |
| Data Quality | Great Expectations | cost_transactions (14) + product_data (13) + supplier_data (12) expectations |
# Core services (API + dashboard + PostgreSQL + Redis)
docker compose up -d
# Monitoring (Prometheus + Grafana)
docker compose -f monitoring/docker-compose.monitoring.yaml up -d
# → Grafana: http://localhost:3000 (admin/changeme)
# MLflow Model Registry
docker compose -f mlflow/docker-compose.mlflow.yaml up -d
# → MLflow: http://localhost:5000
# Airflow Pipeline Orchestration
docker compose -f pipelines/docker-compose.airflow.yaml up -d
# → Airflow: http://localhost:8080 (admin/changeme)
# Kubernetes (local)
minikube start
kubectl apply -f k8s/
# Or with Helm:
helm install glowcast helm/glowcast/
# Load Testing
k6 run loadtests/k6_api.js
# Data Quality Validation
python data_quality/validate.py

# Dev environment
cd terraform/environments/dev
terraform init && terraform plan && terraform apply
# Prod environment (SOC2 tags, multi-AZ)
cd terraform/environments/prod
terraform init && terraform plan && terraform apply

License: MIT
DoWhy Causal · CUPED -55% · X-Learner AUUC 0.74 · Should-Cost BOM
Built with statistical rigor. Designed for cost & commercial intelligence.