
Commit 437bbb7

Wenyueh and claude committed

Fix docs to match package source; update blog post title and remove duplicate table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent ca24db4 · commit 437bbb7

5 files changed: 15 additions & 19 deletions

docs/api/results.md

Lines changed: 3 additions & 2 deletions
```diff
@@ -42,7 +42,7 @@ Each evaluated combination produces a `ModelResult`:
 | `latency_seconds` | `float` | Mean latency per datapoint |
 | `input_tokens` | `Dict[str, int]` | Input tokens by model |
 | `output_tokens` | `Dict[str, int]` | Output tokens by model |
-| `estimated_price` | `float` | Estimated cost in USD |
+| `price` | `float` (property) | Per-sample cost in USD, or `None` if pricing unavailable |
 | `is_best` | `bool` | Whether this is the top-ranked combination |
 | `datapoint_results` | `List[DatapointResult]` | Per-datapoint breakdown |
 
@@ -55,6 +55,7 @@ Per-datapoint evaluation detail:
 | Field | Type | Description |
 |:------|:-----|:------------|
 | `datapoint_index` | `int` | Index in the dataset |
-| `datapoint_id` | `str` | Unique identifier |
 | `score` | `float` | Eval score for this datapoint |
 | `latency_seconds` | `float` | Latency for this datapoint |
+| `input_tokens` | `Dict[str, int]` | Input tokens by model |
+| `output_tokens` | `Dict[str, int]` | Output tokens by model |
```
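The renamed `price` field is derived from the token-count fields documented above. A minimal sketch of how such a per-sample figure could be computed (our own illustration, not the package's implementation; the `estimate_price` helper and the USD-per-token assumption are ours), reusing the `model_prices` shape documented for selectors:

```python
# Hypothetical sketch: per-sample cost from per-model token counts, using the
# documented pricing shape {"model": {"input_price": x, "output_price": y}}.
# Prices are assumed to be USD per token. Returns None if any model used in
# the run has no pricing entry, mirroring `price` being None when pricing is
# unavailable.
def estimate_price(input_tokens, output_tokens, model_prices, n_datapoints):
    total = 0.0
    for model, n_in in input_tokens.items():
        pricing = model_prices.get(model)
        if pricing is None:
            return None  # pricing unavailable for this model
        total += n_in * pricing["input_price"]
        total += output_tokens.get(model, 0) * pricing["output_price"]
    return total / n_datapoints  # mean cost per datapoint

prices = {"m1": {"input_price": 1e-6, "output_price": 2e-6}}
print(estimate_price({"m1": 1_000_000}, {"m1": 500_000}, prices, 10))  # 0.2
```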

docs/api/selectors.md

Lines changed: 0 additions & 1 deletion
```diff
@@ -10,7 +10,6 @@ All selectors share a common constructor interface and the `select_best()` method
 | `models` | `Dict[str, List]` | Maps node names to candidate model lists |
 | `eval_fn` | `Callable` | `(expected, actual) -> float` score in `[0, 1]` |
 | `dataset` | `List[Tuple]` | `[(input_data, expected_answer), ...]` |
-| `invoke_fn` | `Callable`, optional | Custom `(agent, input) -> result`. Default: `agent(input)` |
 | `model_prices` | `Dict`, optional | Custom pricing: `{"model": {"input_price": x, "output_price": y}}` |
 | `tracker` | `LLMTracker`, optional | Custom tracker instance (e.g., with disk cache) |
 
```
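The `eval_fn` row above only fixes a signature: any callable mapping `(expected, actual)` to a score in `[0, 1]` qualifies. A minimal example of such a scorer (ours, not shipped with the package):

```python
# Example eval_fn matching the documented contract:
# (expected, actual) -> float score in [0, 1].
def exact_match(expected: str, actual: str) -> float:
    """Return 1.0 on a case- and whitespace-insensitive match, else 0.0."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0

print(exact_match("Paris", "  paris "))  # 1.0
print(exact_match("Paris", "London"))    # 0.0
```

Graded scorers (e.g. token-level F1) fit the same contract as long as they stay within `[0, 1]`.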

docs/api/tracker.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -12,8 +12,8 @@ from agentopt.proxy import LLMTracker
 
 ```python
 tracker = LLMTracker(
-    cache=True, # Enable response caching (default: True)
-    cache_dir="./llm_cache", # Persist cache to disk (default: None, memory-only)
+    cache=True, # Enable response caching (default: True)
+    cache_dir=".agentopt_cache", # Persist cache to disk (default: ".agentopt_cache")
 )
 ```
 
````
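The corrected defaults say caching is on and persisted under `.agentopt_cache`. As a toy sketch of what disk-backed response caching looks like in general (purely illustrative; `ToyDiskCache` is our invention, not the tracker's real implementation):

```python
import hashlib
import json
from pathlib import Path

class ToyDiskCache:
    """Illustrative disk cache: identical requests hash to the same file."""

    def __init__(self, cache_dir=".agentopt_cache"):
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _path(self, request):
        # Stable key: hash of the canonical JSON form of the request.
        key = hashlib.sha256(
            json.dumps(request, sort_keys=True).encode()
        ).hexdigest()
        return self.dir / f"{key}.json"

    def get_or_call(self, request, call):
        path = self._path(request)
        if path.exists():                # hit: replay the stored response
            return json.loads(path.read_text())
        response = call(request)         # miss: invoke the model once
        path.write_text(json.dumps(response))
        return response
```

The practical consequence of a scheme like this is that re-running a selection over the same dataset pays for each unique request only once.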

docs/blog/posts/technical-deep-dive.md

Lines changed: 2 additions & 8 deletions
```diff
@@ -13,7 +13,7 @@ categories:
   - Model Selection
 ---
 
-# Why Your Agent Needs a Model Optimizer, Not Just a Model
+# Why Your Agent Needs a Model Combo Optimizer, Not Just a Model
 
 *Wenyue Hua\*, Qian Xie, Sripad Karne, Armaan Agrawal, Nikos Pagonas, Kostis Kaffes, Tianyi Peng\**
 
@@ -183,13 +183,7 @@ Arm Elimination is the consistent winner: it achieves near-brute-force accuracy
 
 ### Budget Alternatives
 
-For every benchmark, there exists a combination within 3-5% of the best accuracy that costs 10-100x less:
-
-| Benchmark | Best | Accuracy | Cost | Budget Pick | Accuracy | Cost | Ratio |
-|-----------|------|---------|------|------------|---------|------|-------|
-| HotpotQA | Ministral + Opus | 74.8% | $2.71 | Qwen3 Next + gpt-oss-120b | 71.3% | $0.13 | 21x |
-| MathQA | Opus + Qwen3 Next | 98.8% | $5.89 | Ministral + C3 Haiku | 94.0% | $0.05 | 118x |
-| BFCL | Opus | 72.0% | $60.78 | Qwen3 Next | 71.0% | $1.87 | 32x |
+As shown in the cost-savings table above, for every benchmark there exists a combination within 3-5% of the best accuracy that costs 10-100x less. You don't need the most expensive model to get near-optimal results.
 
 ## Get Started
 
```

docs/concepts/algorithms.md

Lines changed: 8 additions & 6 deletions
````diff
@@ -121,15 +121,14 @@ selector = ArmEliminationModelSelector(
     models=models,
     eval_fn=eval_fn,
     dataset=dataset,
-    n_initial=10,
     growth_factor=2.0,
     confidence=1.0,
 )
 ```
 
 | Parameter | Default | Description |
 |:----------|:--------|:------------|
-| `n_initial` | `10` | Initial batch size (datapoints) |
+| `n_initial` | `None` | Initial batch size. Default: 10% of dataset (`max(1, len(dataset)//10)`) |
 | `growth_factor` | `2.0` | Batch size multiplier per round |
 | `confidence` | `1.0` | Elimination confidence threshold |
 
@@ -205,15 +204,18 @@ selector = LMProposalModelSelector(
     models=models,
     eval_fn=eval_fn,
     dataset=dataset,
-    proposer_model="gpt-4o-mini",
-    max_combinations=12,
+    proposer_model="gpt-4.1",
+    objective="maximize accuracy and then minimize latency and cost",
+    dataset_preview_size=10,
 )
 ```
 
 | Parameter | Default | Description |
 |:----------|:--------|:------------|
-| `proposer_model` | `"gpt-4o-mini"` | Model used for proposal generation |
-| `max_combinations` | `12` | Max combinations to shortlist |
+| `proposer_model` | `"gpt-4.1"` | Model used for proposal generation |
+| `proposer_client` | `None` | Custom OpenAI-compatible client; auto-creates `OpenAI()` if omitted |
+| `objective` | `"maximize accuracy and then minimize latency and cost"` | Natural-language objective passed to the proposer |
+| `dataset_preview_size` | `10` | Number of dataset examples shown to the proposer |
 
 !!! success "When to use"
     When you want to leverage an LLM's knowledge about model capabilities to skip obviously bad combinations.
````
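The corrected `n_initial` default and `growth_factor` together imply a sampling schedule for arm elimination. A sketch of that schedule under the documented formula (how rounds actually consume the dataset, and the `batch_schedule` helper itself, are our assumptions for illustration):

```python
# Batch schedule implied by the documented defaults: when n_initial is None,
# the first batch is max(1, len(dataset) // 10) datapoints, and each round
# multiplies the batch size by growth_factor until the dataset is exhausted.
def batch_schedule(dataset_size, n_initial=None, growth_factor=2.0):
    size = n_initial if n_initial is not None else max(1, dataset_size // 10)
    batches, used = [], 0
    while used < dataset_size:
        take = min(int(size), dataset_size - used)  # last batch may be partial
        batches.append(take)
        used += take
        size *= growth_factor
    return batches

print(batch_schedule(100))  # [10, 20, 40, 30]
```

Weak combinations eliminated after the early, cheap batches never see the larger later ones, which is where the cost savings over brute force come from.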
