Skip to content
This repository was archived by the owner on Nov 23, 2025. It is now read-only.
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1,196 changes: 1,196 additions & 0 deletions docs/DUCKDB_ML_PLATFORM_RESEARCH.md

Large diffs are not rendered by default.

294 changes: 294 additions & 0 deletions docs/research/EXECUTIVE-SUMMARY-ONNX-RESEARCH.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,294 @@
# Executive Summary: ONNX Ecosystem Research

**Date**: 2025-11-12
**Scout Mission**: ONNX Ecosystem Reconnaissance
**Status**: ✅ COMPLETE

---

## TL;DR - Critical Discoveries

**ONNX IS A PLATFORM, NOT JUST INFERENCE**

### Top 5 Findings

1. **ONNX Runtime Training EXISTS** - Train, fine-tune, and update models (not just infer)
2. **Production Maturity Proven** - MLflow integration, 7x speedups with TensorRT, battle-tested
3. **sklearn = Zero-Risk Path** - RandomForest 100% proven (Mallard Week 3 POC validated)
4. **Deep Learning = Requires Validation** - FT-Transformer needs 2-day export POC before commitment
5. **Full Lifecycle Support** - Train → Version → Deploy → Update all supported by ONNX ecosystem

---

## Strategic Implications for Mallard

### Opportunity: Full ML Platform (Not Just Inference)

**Mallard Can Be**:
- ✅ Training engine (ONNX Runtime Training + on-device learning)
- ✅ Model registry (MLflow integration)
- ✅ Optimization platform (quantization, execution providers)
- ✅ Update system (federated learning, incremental training)

**NOT** PostgreSQL-style "load model, infer only" extensions

**Competitive Advantage**:
- Snowflake Cortex = Cloud-only, closed-source, inference-focused
- BigQuery ML = Separate training service
- **Mallard** = Full ML lifecycle IN the database, open-source

---

## Immediate Action Items

### Phase 2 (Next 2 Days) - CRITICAL

**1. FT-Transformer ONNX Export Validation POC** ⚠️ REQUIRED BEFORE PHASE 2 COMMITMENT
- **Time**: 2 days
- **Risk**: Discover export incompatibility NOW vs Week 8
- **Process**:
1. Export minimal FT-Transformer to ONNX
2. Validate inference accuracy (>99.9% match PyTorch)
3. Benchmark latency (<100ms for 1K rows)
- **Exit Criteria**: Export succeeds + accuracy validated OR pivot to alternative

**2. Maintain sklearn Baseline** ✅ PROVEN
- RandomForest = Zero-risk fallback
- Use for simple cases (auto-routing)
- Performance: 0.21ms P99 (500x faster than FT-Transformer)

---

### Phase 3 (Weeks 12-16) - High Value

**3. MLflow Model Registry Integration**
- Native ONNX support
- Versioning, lineage tracking, A/B testing
- Production-grade model management

**4. Execution Provider Auto-Selection**
- TensorRT (NVIDIA) = 2-7x speedup vs CPU
- CUDA fallback, CPU baseline
- Single `.onnx` works optimally on ANY hardware

---

### Phase 4 (Weeks 16-24) - Competitive Moat

**5. On-Device Training (Incremental Learning)**
```sql
-- Update models from production data
UPDATE_MODEL 'churn_predictor'
WITH (SELECT * FROM new_customers WHERE label IS NOT NULL)
USING learning_rate=0.001;
```

**6. Model Ensembles (sklearn + FT-Transformer + XGBoost)**
- Export as single ONNX (2x faster than separate files)
- Automatic model selection based on data characteristics

**7. Quantization (4x smaller, 2x faster)**
- INT8 models for edge deployment
- WASM browser-based ML

---

## Framework Compatibility Report

### Tier 1: Production-Ready ✅
- **sklearn RandomForest**: 100% success (Mallard Week 3 POC proven)
- **sklearn Pipeline**: Full preprocessing + model in single ONNX

### Tier 2: Requires onnxmltools ⚠️
- **XGBoost**: Use native API (NOT sklearn wrapper) + onnxmltools
- **LightGBM**: 85% success rate
- **CatBoost**: 70% (accuracy issues reported)

### Tier 3: Deep Learning - Validation Required 🔍
- **FT-Transformer**: PyTorch export SHOULD work (needs 2-day POC)
- **TabNet**: Attention mechanisms may have operator gaps
- **SAINT**: Similar to TabNet, validate export first

### Tier 4: NOT Recommended ❌
- **AutoGluon Tabular**: No direct ONNX export (multimodal only)
- **TabPFN**: Custom signatures incompatible (Week 1-2 finding)
- **Research Models**: Export complexity too high for production

---

## Key Lessons Learned

### ✅ Do This

1. **Test ONNX export on Day 1** (15 min) - Don't discover failures at Week 4
2. **Dual-track POCs** - Have fallback model validated in parallel
3. **Ensemble as single ONNX** - 2x faster than separate sessions
4. **Use execution providers** - Free 2-7x speedup on GPU hardware
5. **Integrate MLflow** - Production-grade model management
6. **Hot-swap models** - Zero-downtime updates via session reload

### ❌ Avoid This

1. **Don't assume PyTorch exports easily** - Custom signatures break ONNX
2. **Don't use sklearn XGBoost wrapper** - Use native API + onnxmltools
3. **Don't quantize without testing** - May be slower on old GPUs
4. **Don't skip shape validation** - Test with varying batch sizes
5. **Don't use AutoGluon for tabular** - No export path
6. **Don't deploy without benchmarking** - Hardware-specific performance

---

## Production Deployment Patterns

### Pattern 1: Model Registry + Hot-Swapping
```
MLflow Registry (Versioned ONNX) → DuckDB Extension → Hot-Swap Session → Zero-Downtime Update
```

### Pattern 2: Execution Provider Auto-Selection
```
Single .onnx File → [TensorRT | CUDA | CPU] → Optimal Performance on ANY Hardware
```

### Pattern 3: Ensemble Architecture
```
SQL Query → Model Router → [RandomForest | FT-Transformer | XGBoost] → Weighted Predictions
```

### Pattern 4: Incremental Training (Future)
```
Production Data → ONNX Training Artifacts → On-Device Training → Updated Model → Hot-Swap
```

---

## Critical Gotchas Discovered

### 1. Dynamic Shape Support Varies
- ✅ CPU, CUDA: Full support
- ⚠️ TensorRT: Limited (optimization profiles needed)
- ❌ NNAPI (Android), QNN (Qualcomm): No dynamic shapes

**Mitigation**: Pre-allocate max size, test with varying batches

### 2. Quantization Requires Tensor Cores
- INT8 faster ONLY on NVIDIA T4, A100, etc.
- Older GPUs (K80, P100) may be SLOWER with INT8
- **Action**: Benchmark before deploying quantized models

### 3. Large Models (>2GB) Need External Data
```python
onnx.save_model(model, "model.onnx", save_as_external_data=True)
# Produces: model.onnx (graph) + weights.bin (parameters)
```

### 4. XGBoost sklearn Wrapper NOT Supported
- skl2onnx only handles sklearn native models
- XGBoost needs native API + onnxmltools
- **Discovered**: Mallard Week 3 POC (prevented wasted effort)

---

## Recommended Architecture Evolution

### Current (Week 5)
```
SQL → RandomForest (ONNX) → Predictions
```

### Phase 2 (Week 6-8)
```
SQL → [RandomForest | FT-Transformer] (ONNX) → Predictions + Embeddings
MLflow Registry (Versioning)
```

### Phase 3 (Weeks 12-16)
```
SQL → Model Router → Ensemble (Single ONNX)
ONNX Runtime (TensorRT/CUDA/CPU auto-select)
[Predictions | Embeddings | Explanations]
```

### Phase 4 (Weeks 16-24)
```
SQL → Intelligent Router → Ensemble (INT8 Quantized)
Execution Providers (TensorRT/CUDA/CPU/WASM)
[Predictions | Embeddings | Explanations | Training]
MLflow Registry ← On-Device Training ← Production Data
```

---

## Performance Expectations

### Baseline (sklearn RandomForest)
- **Latency**: 0.21ms P99 (current)
- **Throughput**: 4,700 predictions/sec
- **Memory**: <50MB per model

### Universal (FT-Transformer - Target)
- **Latency**: <100ms P99 (500x slower, acceptable for complex schemas)
- **Throughput**: 10 predictions/sec
- **Memory**: <500MB per model

### Optimized (TensorRT + INT8)
- **Latency**: 2-7x faster than baseline
- **Model Size**: 4x smaller
- **Hardware**: NVIDIA T4, A100 (Tensor Cores)

---

## Risk Assessment

### Low Risk ✅
- sklearn RandomForest: PROVEN (Week 3 POC, 100% success)
- MLflow integration: Mature, production-grade
- Execution providers: Battle-tested (Microsoft, NVIDIA)

### Medium Risk ⚠️
- FT-Transformer ONNX export: NEEDS 2-DAY POC
- On-device training: Complex API, 4-8 weeks integration
- Quantization: Hardware-dependent performance

### High Risk ❌
- AutoGluon tabular: No export path (avoid)
- Custom research models: Export failure likely (avoid)
- Dynamic shapes on mobile: Limited support (design around)

---

## Final Recommendation

**PROCEED with ONNX as core platform technology**

**Confidence**: 95%+

**Reasoning**:
1. ✅ sklearn baseline PROVEN (zero-risk fallback)
2. ✅ ONNX Runtime production-mature (Microsoft, 7x speedups)
3. ✅ MLflow ecosystem mature (versioning, registry)
4. ✅ Training capabilities future-proof (incremental learning)
5. ⚠️ FT-Transformer needs validation (2-day POC gates Phase 2)

**Gating Decision**: FT-Transformer export POC must succeed OR have validated alternative (TabNet, SAINT, or sklearn ensemble)

**Expected Outcome**: Mallard = ONLY database with full ML lifecycle (train + serve + update) in SQL

---

## Links

- **Full Report**: `/home/user/local-inference/docs/research/ONNX-ECOSYSTEM-INTELLIGENCE-REPORT.md` (1200+ lines)
- **Scout Mission**: ONNX ecosystem reconnaissance
- **Intelligence Value**: CRITICAL for Mallard strategy

---

**Scout Explorer**: Mission Complete ✅
**Recommendation**: GREEN LIGHT for ONNX platform strategy (with FT-Transformer POC gate)
Loading
Loading