RoBC is an online learning LLM router designed for dynamic production environments. Using Thompson Sampling and semantic clustering, it continuously adapts to changing model quality—no retraining required.
Production environments are dynamic:
- Model quality shifts with provider updates and API changes
- New models are released frequently
- Retraining pipelines are costly and slow
- Historical training data becomes stale
RoBC adapts in real-time. Static routers don't.
- Online Learning: Improves with every request—no retraining pipeline needed
- Adapts to Drift: Automatically adjusts when model quality changes
- New Model Discovery: Explores and evaluates new models as they're added
- Contextual Routing: Learns which models excel at which task types
- Low Overhead: ~1ms routing decision
We evaluated RoBC against RoRF (static classifier) in a realistic dynamic scenario:
- Quality drift at t=2000 (model rankings change)
- New excellent model added at t=3000
| Phase (request range) | RoBC quality | RoRF quality | Winner |
|---|---|---|---|
| Cold Start (0-500) | 0.887 | 0.929 | RoRF ✓ (has training data) |
| Stable (500-2000) | 0.921 | 0.930 | ≈ Tie |
| Quality Drift (2000-3000) | 0.887 | 0.877 | RoBC ✓ (adapts) |
| New Model Added (3000-5000) | 0.923 | 0.876 | RoBC ✓ (+5.3%) |
- In static environments: RoRF performs equally well or slightly better (has perfect training data)
- When quality drifts: RoBC adapts, RoRF becomes stale
- When new models added: RoBC explores and uses them, RoRF cannot (+5.3%)
- No retraining required—RoBC adapts continuously
Install from PyPI:

```bash
pip install robc
```

Or install from source:

```bash
git clone https://github.com/Agentlify/RoBC
cd RoBC
pip install -e .
```

Quick start:

```python
from robc import Controller
import numpy as np

# Initialize with your models
controller = Controller(
    models=["openai:gpt-5.2", "google:gemini-2.5-flash", "anthropic:claude-4.5-sonnet"],
    n_clusters=10,
)

# Route a request (embedding from your embedding model)
embedding = get_embedding("What is the meaning of life?")  # Your embedding function
selected_model = controller.route(embedding)
print(f"Selected: {selected_model}")
# Output: Selected: openai:gpt-5.2

# After getting response quality, update the router
controller.update(selected_model, embedding, quality_score=0.85)
```

Using precomputed cluster centroids:

```python
import numpy as np
from robc import Controller

# Load your cluster centroids
cluster_centroids = [np.load(f"cluster_{i}.npy") for i in range(10)]

controller = Controller(
    models=["openai:gpt-5.2", "google:gemini-2.5-flash", "anthropic:claude-4.5-sonnet"],
    cluster_centroids=cluster_centroids,
)

# Route with automatic cluster assignment
embedding = get_embedding("Write a Python function to sort a list")
selected = controller.route(embedding)
```

Inspecting a routing decision:

```python
# Get detailed information about the routing decision
result = controller.route_with_details(embedding)
print(f"Selected: {result['selected_model']}")
print(f"Cluster weights: {result['cluster_weights']}")
print(f"Model scores: {result['samples']}")
```

RoBC consists of three main components:
- Cluster Manager: Assigns prompts to semantic clusters using k-nearest neighbors with softmax weighting
- Posterior Manager: Maintains Bayesian posteriors over model quality for each (model, cluster) pair
- Thompson Sampler: Selects models by sampling from posteriors, naturally balancing exploration and exploitation
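Concretely, the interplay of the three components can be sketched in a few dozen lines. This is a toy illustration, not RoBC's implementation: the Beta posteriors over a quality score in [0, 1], the cosine-similarity softmax weighting, and the `MiniRouter` name are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

class MiniRouter:
    """Toy sketch: softmax-weighted cluster assignment, per-(model, cluster)
    Beta posteriors, and Thompson Sampling. Illustrative only."""

    def __init__(self, models, centroids, temperature=0.2):
        self.models = models
        self.centroids = np.asarray(centroids, dtype=float)  # (n_clusters, dim)
        self.temperature = temperature
        # Beta(1, 1) = uninformative prior for every (model, cluster) pair
        self.alpha = {m: np.ones(len(centroids)) for m in models}
        self.beta = {m: np.ones(len(centroids)) for m in models}

    def cluster_weights(self, embedding):
        # Cosine similarity to each centroid, turned into soft weights
        e = embedding / np.linalg.norm(embedding)
        c = self.centroids / np.linalg.norm(self.centroids, axis=1, keepdims=True)
        logits = (c @ e) / self.temperature
        w = np.exp(logits - logits.max())
        return w / w.sum()

    def route(self, embedding):
        w = self.cluster_weights(embedding)
        # Thompson Sampling: one posterior draw per (model, cluster),
        # aggregated by cluster weight; pick the highest-scoring model
        scores = {
            m: float(np.sum(w * rng.beta(self.alpha[m], self.beta[m])))
            for m in self.models
        }
        return max(scores, key=scores.get)

    def update(self, model, embedding, quality):
        # Treat quality in [0, 1] as a fractional success, spread over clusters
        w = self.cluster_weights(embedding)
        self.alpha[model] += w * quality
        self.beta[model] += w * (1.0 - quality)
```

Because a freshly initialized (model, cluster) posterior is wide, its Thompson samples occasionally come out on top; that is what drives exploration without an explicit epsilon-greedy schedule.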
RoBC is designed for dynamic production environments where:
| Scenario | RoBC Advantage |
|---|---|
| New models released | Discovers & evaluates instantly (+5.3% when new model added) |
| Model quality drifts | Adapts automatically (no retraining needed) |
| No training pipeline | Works out of the box |
| Continuous feedback | Improves with every request |
Consider static routers (RoRF) when:
- You have comprehensive, up-to-date training data
- Model rankings are stable
- You have a retraining pipeline in place
- You rarely add new models
Cluster assignment configuration:

```python
from robc.cluster import ClusterConfig

config = ClusterConfig(
    n_neighbors=2,                  # k for kNN cluster assignment
    softmax_temperature=0.2,        # Temperature for weight softmax
    min_similarity_threshold=0.5,   # Minimum similarity to include cluster
    high_confidence_threshold=0.9,  # Skip kNN if confidence is high
)
```

Thompson sampler configuration:

```python
from robc.thompson_sampling import SamplerConfig

config = SamplerConfig(
    exploration_bonus=0.02,  # Bonus for underexplored models
    min_variance=0.001,      # Minimum sampling variance
)
```

Persisting posteriors across sessions:

```python
# Save learned posteriors
controller.save_posteriors("posteriors.json")

# Load posteriors in a new session
controller = Controller(
    models=["openai:gpt-5.2", "google:gemini-2.5-flash"],
    n_clusters=10,
    posteriors_path="posteriors.json",
)
```

Controller methods:

| Method | Description |
|---|---|
| `route(embedding)` | Select the best model for the given embedding |
| `route_with_details(embedding)` | Route with detailed selection information |
| `update(model, embedding, quality_score)` | Update posteriors with observed quality |
| `add_model(model)` | Add a new model with uninformative priors |
| `save_posteriors(path)` | Save learned posteriors to file |
| `load_posteriors(path)` | Load posteriors from file |
| `get_stats()` | Get routing statistics |
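The `add_model(model)` entry is what enables the +5.3% gain when a new model appears: an uninformative prior makes the newcomer's Thompson samples high-variance, so it gets tried often enough to be evaluated. A hypothetical single-cluster sketch of that mechanic (the Beta(1, 1) prior and all names here are assumptions, not RoBC internals):

```python
import numpy as np

rng = np.random.default_rng(7)

# Per-model Beta posteriors for one cluster: [alpha, beta]
posteriors = {"model_a": [80.0, 20.0]}  # well-observed: mean ~0.8, low variance

def add_model(name):
    # A new model starts from an uninformative Beta(1, 1) prior
    posteriors[name] = [1.0, 1.0]

def thompson_pick():
    samples = {m: rng.beta(a, b) for m, (a, b) in posteriors.items()}
    return max(samples, key=samples.get)

add_model("model_new")
# The new model's samples are uniform on [0, 1], so it is tried a meaningful
# fraction of the time even though model_a's posterior mean is ~0.8
picks = [thompson_pick() for _ in range(1000)]
print(picks.count("model_new") / 1000)  # roughly P(Uniform(0,1) beats a ~0.8 draw)
```

Each subsequent `update` then tightens the newcomer's posterior, and exploration tapers off on its own as evidence accumulates.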
Cluster manager methods:

| Method | Description |
|---|---|
| `get_cluster_weights(embedding)` | Get weighted cluster assignments |
| `get_primary_cluster(embedding)` | Get single best cluster |
| `from_centroids(centroids)` | Create from centroid embeddings |
Posterior manager methods:

| Method | Description |
|---|---|
| `get_posterior(model, cluster_id)` | Get posterior for (model, cluster) |
| `update(model, cluster_id, outcome)` | Update with observation |
| `get_aggregated_posterior(model, weights)` | Get weighted posterior |
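One plausible way to realize `get_aggregated_posterior(model, weights)` is to mix the per-cluster Beta posteriors by cluster weight and moment-match the mixture back to a single Beta. The sketch below assumes that approach; it is not necessarily RoBC's exact math.

```python
import numpy as np

def aggregate_beta(params, weights):
    """Mix per-cluster Beta(alpha, beta) posteriors by cluster weight and
    moment-match the mixture back to a single Beta distribution."""
    params = np.asarray(params, dtype=float)  # shape (n_clusters, 2)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    a, b = params[:, 0], params[:, 1]
    means = a / (a + b)
    variances = a * b / ((a + b) ** 2 * (a + b + 1))
    # Mixture mean and variance (law of total variance)
    mean = np.sum(w * means)
    var = np.sum(w * (variances + means**2)) - mean**2
    # Solve Beta(alpha', beta') with the same first two moments
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common

# Two clusters with opposite quality estimates, weighted 0.9 / 0.1
alpha, beta = aggregate_beta([(9.0, 1.0), (1.0, 9.0)], [0.9, 0.1])
print(round(alpha / (alpha + beta), 3))  # mixture mean: 0.9*0.9 + 0.1*0.1 = 0.82
```

The moment-matched Beta preserves the mixture's mean exactly and approximates its spread, which is enough for sampling-based model selection.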
RoRF (Router on Random Forest) is the industry-leading open-source LLM router developed by Not Diamond. It uses a Random Forest classifier trained on historical data to route requests based on embeddings and complexity features. RoRF represents the state-of-the-art in static routing and serves as the natural benchmark for evaluating RoBC's online learning approach.
| Feature | RoBC | RoRF | Impact |
|---|---|---|---|
| Static Quality | Learns online | Trained offline | ≈ Equal performance |
| Quality Drift | Adapts in real-time | Becomes outdated | RoBC adapts |
| New Models | Explores immediately | Can't use them | +5.3% advantage |
| Cold Start | Explores (slower start) | Has training data | RoRF faster initially |
| Retraining | Never needed | Required regularly | Lower operational cost |
| Best For | Dynamic production | Static environments | Choose by environment |
┌────────────────────────────────────────────────────────────────────────┐
│ In STATIC environments: RoBC ≈ RoRF (RoRF slightly better with │
│ perfect training data) │
│ │
│ In DYNAMIC environments: RoBC wins (adapts to drift, uses new models) │
│ │
│ Choose based on your environment's dynamics. │
└────────────────────────────────────────────────────────────────────────┘
RoRF excels when: You have comprehensive historical data and model quality is stable.
RoBC excels when: Model quality changes, new models are added, or retraining is impractical.
Contributions are welcome! Please feel free to submit a Pull Request.
If you use RoBC in your research, please cite:
```bibtex
@software{robc2026,
  title  = {RoBC: Routing on Bayesian Clustering},
  author = {Agentlify},
  year   = {2026},
  url    = {https://github.com/agentlifylabs/RoBC}
}
```

Licensed under the Apache License, Version 2.0.


