Restructure Quick Start: lead with code + output, then explain (#39)
Show the ModelSelector call and results table first, then the
conceptual loop, then break down the four elements (agent, dataset,
eval function, models) separately. Add LLM-as-judge note.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
README.md: 42 additions & 45 deletions
</p>

---

## Why AgentOpt
Choosing models for your agent is surprisingly hard. Which family? Small or big? Thinking or non-thinking? And different steps may need different models. The combinatorial space explodes fast: with 3 steps and 8 candidate models per step, that's 8³ = **512 combinations** to evaluate.
AgentOpt solves this automatically. Give it your agent and a small evaluation dataset, and it will efficiently search the model combination space to present you with the **Pareto curve of performance/cost/latency tradeoffs** — so you can make an informed choice.
```
pip install agentopt
```

## Quick Start
Say you have an agent with two LLM steps (a planner and a solver) and you want to find the best model for each:
Conceptually, this is what happens under the hood:
```python
for combo in all_combinations(models):     # e.g. {"planner": "gpt-4o", "solver": "gpt-4o-mini"}
    agent = MyAgent(combo)                 # build agent with this model combo
    for input_data, expected in dataset:
        actual = agent.run(input_data)     # run on each datapoint
        score = eval_fn(expected, actual)  # score the output

# rank combos by quality score, latency & cost
```
But AgentOpt does this efficiently, with **smart algorithms, parallelization, cost & latency tracking, and caching**. With `method="auto"` (the default), it eliminates clearly worse combinations after just a few datapoints, finding the best model combination with far fewer API calls; use `method="brute_force"` to evaluate all combinations exhaustively.
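To make the naive loop concrete, here is a runnable end-to-end sketch with stub models and a toy dataset. Everything in it (the model names, the stub agent's behavior, the dataset) is hypothetical illustration, not AgentOpt code:

```python
from itertools import product

# Hypothetical candidates per step (placeholder names, not real model IDs)
models = {"planner": ["small", "large"], "solver": ["small", "large"]}

class MyAgent:
    """Stub agent: the 'large' solver doubles its input, the others echo it."""
    def __init__(self, combo):
        self.combo = combo

    def run(self, input_data):
        return input_data * 2 if self.combo["solver"] == "large" else input_data

dataset = [(1, 2), (3, 6)]  # (input_data, expected_output) pairs

def eval_fn(expected, actual):
    return 1.0 if expected == actual else 0.0

# Brute-force search over all 2 x 2 = 4 combinations
scores = {}
for values in product(*models.values()):
    combo = dict(zip(models, values))
    agent = MyAgent(combo)
    scores[values] = sum(eval_fn(e, agent.run(i)) for i, e in dataset) / len(dataset)

best = max(scores, key=scores.get)  # combo with the highest mean score
```

Every combination whose solver is "large" scores perfectly here, which is exactly the kind of signal AgentOpt exploits to prune the rest early.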
You just provide four things:
**Agent** — wrap your agent into a class with `__init__(self, models)` and `run(self, input_data)`:
- `__init__(self, models)` — receives a model configuration and constructs your agent. `models` is a dict that maps each step you want to optimize to a specific model, e.g. `{"planner": "gpt-4o-mini", "solver": "gpt-4o"}`.
- `run(self, input_data)` — runs your agent on a single datapoint and returns the output.
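A minimal sketch of such a wrapper for a planner/solver agent; `call_llm` here is a stand-in for however you actually invoke a model, and the prompts are purely illustrative:

```python
def call_llm(model, prompt):
    # Placeholder: swap in your real LLM client call (OpenAI, Anthropic, ...)
    return f"[{model}] {prompt}"

class MyAgent:
    def __init__(self, models):
        # e.g. models == {"planner": "gpt-4o-mini", "solver": "gpt-4o"}
        self.planner = models["planner"]
        self.solver = models["solver"]

    def run(self, input_data):
        plan = call_llm(self.planner, f"Make a plan for: {input_data}")
        return call_llm(self.solver, f"Execute this plan: {plan}")
```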
**Dataset** — a list of `(input_data, expected_output)` pairs:
```python
dataset = [
    ...
]
```
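A toy version of such a dataset might look like this (the entries are illustrative, not from the original):

```python
dataset = [
    ("What is 2 + 2?", "4"),  # (input_data, expected_output)
    ("What is the capital of France?", "Paris"),
]
```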
**Eval function** — compares the agent output against the expected answer, returns a score:
```python
def eval_fn(expected, actual):
    """Score the agent's output (actual) against the expected answer."""
    ...
```
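For instance, a simple normalized exact-match scorer; this is just one possible choice, and your metric could equally be a numeric tolerance, F1, or an LLM-as-judge call:

```python
def eval_fn(expected, actual):
    """Score the agent's output (actual) against the expected answer."""
    # Normalized exact match: 1.0 for a hit, 0.0 otherwise
    return 1.0 if str(actual).strip().lower() == str(expected).strip().lower() else 0.0
```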
**Models** — a dict mapping each step name to a list of candidate models to try. AgentOpt picks one from each list, constructs the agent, and evaluates it.
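For the planner/solver agent, the dict might look like this (model names illustrative). The selection call shape shown in the comment is an assumption, not the documented `ModelSelector` signature, so check the actual API:

```python
models = {
    "planner": ["gpt-4o-mini", "gpt-4o"],
    "solver": ["gpt-4o-mini", "gpt-4o", "o3-mini"],
}
# 2 x 3 = 6 combinations for AgentOpt to search, e.g. with a call like
# (hypothetical signature):
#   ModelSelector(MyAgent, dataset, eval_fn, models, method="auto").run()
```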