
Commit 3a57701

TianyiPeng and claude authored

Restructure Quick Start: lead with code + output, then explain (#39)

Show the ModelSelector call and results table first, then the conceptual loop, then break down the four elements (agent, dataset, eval function, models) separately. Add LLM-as-judge note.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 60fe41b commit 3a57701

1 file changed: README.md (42 additions, 45 deletions)
````diff
@@ -21,6 +21,8 @@
 </p>
 
 ---
+
+## Why AgentOpt
 
 Choosing models for your agent is surprisingly hard. Which family? Small or big? Thinking or non-thinking? And different steps may need different models. The combinatorial space explodes fast — 3 steps × 8 models = **512 combinations** to evaluate.
 
 AgentOpt solves this automatically. Give it your agent and a small evaluation dataset, and it will efficiently search the model combination space to present you with the **Pareto curve of performance/cost/latency tradeoffs** — so you can make an informed choice.
@@ -38,31 +40,58 @@ pip install agentopt
 ```
 
 ## Quick Start
 
-Say you have an agent with two LLM steps (a planner and a solver) and you want to find the best model for each. A naive approach would be:
+Say you have an agent with two LLM steps (a planner and a solver) and you want to find the best model for each:
 
 ```python
-models = {
-    "planner": ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano"],
-    "solver": ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano"],
-}  # → 3 × 3 = 9 combinations
+from agentopt import ModelSelector
+
+selector = ModelSelector(
+    agent=MyAgent,
+    models={
+        "planner": ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano"],  # 3 options
+        "solver": ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano"],   # 3 options
+    },  # → 3 × 3 = 9 combinations to evaluate
+    eval_fn=eval_fn,
+    dataset=dataset,
+    method="brute_force",  # or "auto" for smarter selection algorithms
+)
+
+results = selector.select_best(parallel=True, max_concurrent=50)
+results.print_summary()
+```
+
+Output:
+```
+Model Selection Results
+----------------------------------------------------------------------------
+Rank    Model                                        Accuracy   Latency   Price
+----------------------------------------------------------------------------
+>>> 1   planner=gpt-4.1-nano + solver=gpt-4.1-nano   100.00%    0.85s     $0.000420
+    2   planner=gpt-4o-mini + solver=gpt-4o-mini     100.00%    1.20s     $0.002372
+    3   planner=gpt-4o + solver=gpt-4o               100.00%    2.70s     $0.014355
+...
+```
+
+Conceptually, this is what happens under the hood:
 
+```python
 for combo in all_combinations(models):  # e.g. {"planner": "gpt-4o", "solver": "gpt-4o-mini"}
     agent = MyAgent(combo)  # build agent with this model combo
     for input_data, expected in dataset:
         actual = agent.run(input_data)  # run on each datapoint
         score = eval_fn(expected, actual)  # score the output
-
 # rank combos by quality score, latency & cost
 ```
 
-AgentOpt automates this with **efficient algorithms, parallelization, cost & latency tracking, and caching**. You just provide four things: an agent, model candidates, a dataset, and a score function.
+But AgentOpt does this efficiently with **smart algorithms, parallelization, cost & latency tracking, and caching**. With `method="auto"` (the default), it eliminates clearly worse combinations after just a few datapoints — finding the best model combination with far fewer API calls.
 
-**Step 1**: Say you have an agent (implemented in an arbitrary way); we simply ask you to wrap your agent into a class with two methods:
+You just provide four things:
+
+**Agent** — wrap your agent into a class with `__init__(self, models)` and `run(self, input_data)`:
 
 - `__init__(self, models)` — receive a model configuration and do your agent creation. `models` is a dict that maps each step you want to optimize to a specific model, e.g. `{"planner": "gpt-4o-mini", "solver": "gpt-4o"}`.
 - `run(self, input_data)` — run your agent on a single datapoint and return the output.
 
-
 ```python
 from openai import OpenAI
 
@@ -73,13 +102,11 @@ class MyAgent:
         self.solver_model = models["solver"]
 
     def run(self, input_data):
-        # Step 1: Plan
         plan = self.client.chat.completions.create(
             model=self.planner_model,
             messages=[{"role": "user", "content": f"Plan: {input_data}"}],
         ).choices[0].message.content
 
-        # Step 2: Solve
         answer = self.client.chat.completions.create(
             model=self.solver_model,
             messages=[
@@ -90,7 +117,7 @@ class MyAgent:
         return answer
 ```
 
-**Step 2**: Define your evaluation dataset — a list of `(input_data, expected_output)` pairs:
+**Dataset** — a list of `(input_data, expected_output)` pairs:
 
 ```python
 dataset = [
@@ -102,46 +129,16 @@ dataset = [
 ]
 ```
 
-**Step 3**: Define your evaluation function. It compares the output of `agent.run(input_data)` against the `expected_output` from the dataset, and returns a score:
+**Eval function** — compares the agent output against the expected answer, returns a score:
 
 ```python
 def eval_fn(expected, actual):
-    """Score the agent's output (actual) against the expected answer."""
     return 1.0 if expected.lower() in str(actual).lower() else 0.0
 ```
 
-**Step 4**: Run model selection. The `models` dict maps each step name to a **list of candidate models** to try. AgentOpt picks one from each list, constructs the agent, and evaluates it:
-
-```python
-from agentopt import ModelSelector
-
-selector = ModelSelector(
-    agent=MyAgent,
-    models={
-        "planner": ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano"],  # 3 options
-        "solver": ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano"],   # 3 options
-    },  # → 3 × 3 = 9 combinations to evaluate
-    eval_fn=eval_fn,
-    dataset=dataset,
-    method="brute_force",  # or "auto" for smarter selection algorithms
-)
+LLM-as-judge is also supported — just call your judge LLM inside `eval_fn`.
 
-results = selector.select_best(parallel=True, max_concurrent=50)
-results.print_summary()
-```
-
-Output:
-```
-Model Selection Results
-----------------------------------------------------------------------------
-Rank    Model                                        Accuracy   Latency   Price
-----------------------------------------------------------------------------
->>> 1   planner=gpt-4.1-nano + solver=gpt-4.1-nano   100.00%    0.85s     $0.000420
-    2   planner=gpt-4o-mini + solver=gpt-4o-mini     100.00%    1.20s     $0.002372
-    3   planner=gpt-4o + solver=gpt-4o               100.00%    2.70s     $0.014355
-...
-```
-With `method="auto"` (the default), AgentOpt uses smart algorithms that eliminate clearly worse combinations after just a few datapoints — finding the best model combination with far fewer API calls. Use `method="brute_force"` to evaluate all combinations exhaustively.
+**Models** — a dict mapping each step name to a list of candidate models to try. AgentOpt picks one from each list, constructs the agent, and evaluates it.
 
 ## Framework Compatibility
 
````
0 commit comments
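For reference, the conceptual loop described in this commit's Quick Start can be sketched end-to-end with stubs. The `models` dict, `all_combinations` name, and `eval_fn` body come from the README; `StubAgent`, its canned answer, and the one-item dataset are invented here purely for illustration (a real agent would call the models in `combo`):

```python
from itertools import product

# Candidate models per step, as in the README's Quick Start.
models = {
    "planner": ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano"],
    "solver": ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano"],
}

def all_combinations(models):
    # Cartesian product over the per-step candidate lists: 3 x 3 = 9 combos.
    steps = list(models)
    for choice in product(*(models[step] for step in steps)):
        yield dict(zip(steps, choice))

class StubAgent:
    # Stands in for MyAgent; returns a canned answer instead of calling LLMs.
    def __init__(self, combo):
        self.combo = combo

    def run(self, input_data):
        return f"The answer is 4 (solver: {self.combo['solver']})"

def eval_fn(expected, actual):
    # Same case-insensitive substring check as the README's eval_fn.
    return 1.0 if expected.lower() in str(actual).lower() else 0.0

dataset = [("What is 2 + 2?", "4")]  # invented datapoint

# Score every combination over the dataset, as the pseudocode loop does.
scores = {}
for combo in all_combinations(models):
    agent = StubAgent(combo)
    total = sum(eval_fn(expected, agent.run(x)) for x, expected in dataset)
    scores[tuple(combo.values())] = total / len(dataset)

print(len(scores))  # 9
```

This is the brute-force path; per the diff, AgentOpt's `method="auto"` avoids scoring every combination on every datapoint.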

Comments
 (0)
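The LLM-as-judge note added in this commit leaves the judge call to the reader. A minimal sketch of one way to do it: build `eval_fn` around an injectable `ask_llm` callable (a thin wrapper over your chat client), so the same code works with a real model or a stub. The factory name, prompt wording, and YES/NO protocol are assumptions for illustration, not AgentOpt API:

```python
def make_judge_eval_fn(ask_llm):
    """Build an eval_fn that delegates scoring to a judge LLM.

    ask_llm: callable taking a prompt string and returning the judge's
    reply as a string (e.g. a small wrapper around a chat-completions call).
    """
    def eval_fn(expected, actual):
        prompt = (
            "Reply with exactly YES or NO. Does the answer match the reference?\n"
            f"Reference: {expected}\nAnswer: {actual}"
        )
        verdict = ask_llm(prompt)
        return 1.0 if verdict.strip().upper().startswith("YES") else 0.0
    return eval_fn

# Stub judge for illustration: inspects only the Answer portion of the prompt.
stub_judge = lambda p: "YES" if "paris" in p.split("Answer:")[-1].lower() else "NO"
eval_fn = make_judge_eval_fn(stub_judge)
print(eval_fn("Paris", "The capital of France is Paris."))  # 1.0
```

Swapping `stub_judge` for a real call keeps the rest of the selection pipeline unchanged, since `eval_fn` still has the `(expected, actual) -> score` shape the README specifies.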