
Commit b428ca5

Qian Xie authored
Improve explanation of model selection (#43)
* improve "how it works"
* update epsilon for accuracy in [0, 1]
* update recommended parameters for model selectors
* clarify that "auto" is currently wired to arm_elimination
* mention that "epsilon_lucb" can be a good choice if we want more evaluation savings

Co-authored-by: Qian Xie <90579251+JaneQianXie@users.noreply.github.com>
1 parent b98ecff commit b428ca5

3 files changed

Lines changed: 22 additions & 15 deletions


README.md

Lines changed: 9 additions & 7 deletions
@@ -91,7 +91,7 @@ for combo in all_combinations(models):  # e.g. {"planner": "gpt-4o", "solve
     # rank combos by quality score, latency & cost
 ```
 
-But AgentOpt does this efficiently with **smart algorithms, parallelization, cost & latency tracking, and caching**. With `method="auto"` (the default), it eliminates clearly worse combinations after just a few datapoints — finding the best model combination with far fewer API calls.
+But AgentOpt does this efficiently with **smart algorithms, parallelization, cost & latency tracking, and caching**. With `method="auto"` (the default), it **automatically** homes in on the best combination (wired to `arm_elimination` — strong best-arm identification with far fewer evaluations than `brute_force`), eliminating clearly worse combinations after just a few datapoints.
 
 You just provide four things:
 
@@ -165,23 +165,25 @@ AgentOpt works with any LLM framework that uses `httpx` under the hood. Here we
 
 AgentOpt includes a rich set of selection algorithms. Advanced users may get significant speedups by choosing the right method for their use case. See the [documentation](https://agentoptimizer.github.io/agentopt/) and [advanced_selection_example.py](examples/advanced_selection_example.py) for details.
 
+If you do not need the strict best model combination and want **more evaluation savings**, `epsilon_lucb` is often a good choice: it stops once an **ε-optimal** arm is found (tune `epsilon` to trade off how close to optimal you need to be versus how many runs you spend).
+
 | `method=` | Best for | How it works |
 |-----------|----------|-------------|
-| `"auto"` (default) | General use | Automatically picks the best approach |
+| `"auto"` (default) | General use | Automatically finds the best combination (wired to `arm_elimination` — strong best-arm identification with lower evaluation cost than `brute_force`) |
 | `"brute_force"` | Small search spaces | Evaluates all combinations |
 | `"random"` | Quick exploration | Samples a random fraction |
 | `"hill_climbing"` | Topology-aware search | Greedy search using model quality/speed rankings |
-| `"arm_elimination"` | Early pruning | Eliminates statistically dominated combinations |
-| `"epsilon_lucb"` | Best-arm identification | Stops when LUCB confidence gap is within user `epsilon` |
-| `"threshold"` | Thresholding objectives | Classifies combinations above/below user `threshold` |
+| `"arm_elimination"` | Best-arm identification | Bandit; eliminates statistically dominated combinations |
+| `"epsilon_lucb"` | Extra cost savings when ε-optimal is enough | Bandit; stops when an epsilon-optimal best arm is identified |
+| `"threshold"` | Thresholding objectives | Bandit; determines whether each combination is above/below a user-defined `threshold` on the performance metric (e.g., mean accuracy) |
 | `"lm_proposal"` | LLM-guided search | Uses a proposer LLM to shortlist promising combinations |
-| `"bayesian"` | Expensive evaluations | GP-based optimization (requires `pip install "agentopt[bayesian]"`) |
+| `"bayesian"` | Expensive evaluations | GP-based Bayesian optimization over categorical model choices; uses correlation between combinations (requires `pip install "agentopt[bayesian]"`) |
 
 ```python
 selector = ModelSelector(
     agent=MyAgent, models=models, eval_fn=eval_fn, dataset=dataset,
     method="epsilon_lucb",
-    epsilon=0.5
+    epsilon=0.01
 )
 results = selector.select_best(parallel=True)
 ```
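The ε-optimal stopping rule that the new `epsilon_lucb` paragraph describes can be sketched in a few lines of plain Python. This is an illustrative toy (Hoeffding-style confidence bounds over simulated Bernoulli arms), not AgentOpt's implementation; the `pull` simulator, `true_means`, and the bound constants are all invented for the demo.

```python
import math
import random

def epsilon_lucb(pull, n_arms, epsilon=0.1, delta=0.05, max_pulls=20000):
    """Return an arm whose true mean is, with high probability,
    within `epsilon` of the best arm's mean (LUCB-style stopping)."""
    counts = [1] * n_arms
    sums = [pull(i) for i in range(n_arms)]  # one initial pull per arm
    t = n_arms
    while t < max_pulls:
        means = [s / c for s, c in zip(sums, counts)]
        rad = [math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * c))
               for c in counts]
        leader = max(range(n_arms), key=lambda i: means[i])
        challenger = max((i for i in range(n_arms) if i != leader),
                         key=lambda i: means[i] + rad[i])
        # stop once the challenger's UCB is within epsilon of the leader's LCB
        if (means[challenger] + rad[challenger]) - (means[leader] - rad[leader]) <= epsilon:
            return leader
        for i in (leader, challenger):  # LUCB samples both contenders
            sums[i] += pull(i)
            counts[i] += 1
            t += 1
    return max(range(n_arms), key=lambda i: sums[i] / counts[i])

rng = random.Random(0)
true_means = [0.9, 0.6, 0.3]  # hypothetical per-combination accuracies
best = epsilon_lucb(lambda i: float(rng.random() < true_means[i]),
                    n_arms=3, epsilon=0.1)
```

A smaller `epsilon` (such as the 0.01 in the README snippet) tightens the stopping condition and costs more evaluations; a larger one stops earlier but may settle for a slightly worse combination.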

examples/advanced_selection_example.py

Lines changed: 9 additions & 6 deletions
@@ -3,7 +3,9 @@
 
 The framework-specific examples (custom_agent_example.py, langchain_example.py, etc.)
 all use method="brute_force" for simplicity. This example demonstrates the other
-selection algorithms available via ModelSelector(method=...).
+selection algorithms available via ModelSelector(method=...). The default
+method="auto" automatically finds the best combination (wired to arm_elimination —
+strong best-arm identification with lower evaluation cost than brute_force).
 
 Prerequisites:
 1. pip install openai agentopt
@@ -88,7 +90,7 @@ def eval_fn(expected, actual):
 
 
 def run_auto():
-    """method="auto" — automatically picks the best algorithm (default)."""
+    """method="auto" — automatically finds the best combination (default; wired to arm_elimination — strong best-arm identification, cheaper than brute_force)."""
     selector = ModelSelector(
         agent=MyAgent, models=models, eval_fn=eval_fn, dataset=dataset, method="auto",
     )
@@ -103,7 +105,7 @@ def run_random():
         eval_fn=eval_fn,
         dataset=dataset,
         method="random",
-        sample_fraction=0.5,  # evaluate 50% of all combinations
+        sample_fraction=0.25,  # evaluate 25% of all combinations
     )
     return selector.select_best(parallel=True)
 
@@ -141,7 +143,7 @@ def run_epsilon_lucb():
         eval_fn=eval_fn,
         dataset=dataset,
         method="epsilon_lucb",
-        epsilon=0.05,  # acceptable gap from the true best
+        epsilon=0.01,  # acceptable gap from the true best
     )
     return selector.select_best(parallel=True)
 
@@ -154,7 +156,7 @@ def run_threshold():
         eval_fn=eval_fn,
         dataset=dataset,
         method="threshold",
-        threshold=0.8,  # minimum acceptable accuracy
+        threshold=0.75,  # minimum acceptable accuracy
     )
     return selector.select_best(parallel=True)
 
@@ -180,6 +182,7 @@ def run_bayesian():
         dataset=dataset,
         method="bayesian",
         batch_size=4,
+        sample_fraction=0.25,  # evaluate 25% of all combinations
     )
     return selector.select_best(parallel=True)
 
@@ -207,7 +210,7 @@ def run_bayesian():
         formatter_class=argparse.RawDescriptionHelpFormatter,
         epilog="""
 Available methods:
-  auto             Automatically picks the best algorithm (default)
+  auto             Automatically finds the best combination (wired to arm_elimination; lower evaluation cost than brute_force) (default)
   random           Evaluate a random subset of combinations
   hill_climbing    Greedy search using model quality/speed rankings
   arm_elimination  Eliminate statistically dominated combinations early
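The `arm_elimination` behavior in the epilog above (eliminate statistically dominated combinations early) can be illustrated with a small successive-elimination sketch. Everything here (the `pull` simulator, `true_means`, the confidence-bound constants, and the doubling round schedule) is a hypothetical stand-in, not AgentOpt's code.

```python
import math
import random

def arm_elimination(pull, n_arms, delta=0.05, rounds=12):
    """Successive elimination: sample every surviving arm uniformly each round,
    then drop any arm whose upper confidence bound falls below the best arm's
    lower confidence bound."""
    alive = list(range(n_arms))
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for r in range(1, rounds + 1):
        for i in alive:
            for _ in range(2 ** r):  # doubling per-round sample size
                sums[i] += pull(i)
                counts[i] += 1
        means = {i: sums[i] / counts[i] for i in alive}
        rad = {i: math.sqrt(math.log(4 * n_arms * r * r / delta)
                            / (2 * counts[i])) for i in alive}
        best_lcb = max(means[i] - rad[i] for i in alive)
        # keep only arms that are not statistically dominated
        alive = [i for i in alive if means[i] + rad[i] >= best_lcb]
        if len(alive) == 1:
            break
    return max(alive, key=lambda i: sums[i] / counts[i])

rng = random.Random(42)
true_means = [0.85, 0.6, 0.35]  # hypothetical per-combination accuracies
best = arm_elimination(lambda i: float(rng.random() < true_means[i]), n_arms=3)
```

The cost advantage over brute force comes from the early `break`: clearly worse arms stop consuming evaluations as soon as their confidence intervals separate from the leader's.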

src/agentopt/__init__.py

Lines changed: 4 additions & 2 deletions
@@ -58,8 +58,10 @@ def ModelSelector(
         models: Dict mapping step names to lists of candidate models.
         eval_fn: Scoring function ``(expected, actual) -> float``.
         dataset: List of ``(input_data, expected_output)`` pairs.
-        method: Selection algorithm. ``"auto"`` (default) picks the best
-            approach automatically. Other options: ``"brute_force"``,
+        method: Selection algorithm. ``"auto"`` (default) automatically finds
+            the best combination (same implementation as ``"arm_elimination"`` —
+            strong best-arm identification with lower evaluation cost than
+            ``"brute_force"``). Other options: ``"brute_force"``,
             ``"random"``, ``"hill_climbing"``, ``"arm_elimination"``,
             ``"epsilon_lucb"``, ``"threshold"``, ``"lm_proposal"``,
             ``"bayesian"``.
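The docstring above pins down the shapes of `eval_fn` and `dataset`. A minimal concrete pair might look like the following; the task, questions, and answers are invented for illustration, and only the signatures come from the docstring.

```python
def eval_fn(expected: str, actual: str) -> float:
    """Score in [0, 1]: exact match after normalization, following the
    (expected, actual) -> float signature described in the docstring."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0

# A list of (input_data, expected_output) pairs, as the docstring describes.
dataset = [
    ({"question": "What is 2 + 2?"}, "4"),
    ({"question": "What is the capital of France?"}, "Paris"),
]
```

The mean of `eval_fn` over the dataset is the per-combination score the bandit methods compare, and, for `method="threshold"`, the metric that the `threshold` parameter is applied to.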
