
Commit b428ca5

Qian Xie authored
Improve explanation of model selection (#43)
* improve "how it works"
* update epsilon for accuracy in [0, 1]
* update recommended parameters for model selectors
* clarify that "auto" is currently wired to arm_elimination
* mention that "epsilon_lucb" can be a good choice if we want more evaluation savings

Co-authored-by: Qian Xie <90579251+JaneQianXie@users.noreply.github.com>
1 parent b98ecff commit b428ca5

3 files changed

Lines changed: 22 additions & 15 deletions


README.md

Lines changed: 9 additions & 7 deletions
@@ -91,7 +91,7 @@ for combo in all_combinations(models):  # e.g. {"planner": "gpt-4o", "solve
     # rank combos by quality score, latency & cost
 ```
 
-But AgentOpt does this efficiently with **smart algorithms, parallelization, cost & latency tracking, and caching**. With `method="auto"` (the default), it eliminates clearly worse combinations after just a few datapoints — finding the best model combination with far fewer API calls.
+But AgentOpt does this efficiently with **smart algorithms, parallelization, cost & latency tracking, and caching**. With `method="auto"` (the default), it **automatically** homes in on the best combination (wired to `arm_elimination` — strong best-arm identification with far fewer evaluations than `brute_force`), eliminating clearly worse combinations after just a few datapoints.
 
 You just provide four things:
 
@@ -165,23 +165,25 @@ AgentOpt works with any LLM framework that uses `httpx` under the hood. Here we
 
 AgentOpt includes a rich set of selection algorithms. Advanced users may get significant speedups by choosing the right method for their use case. See the [documentation](https://agentoptimizer.github.io/agentopt/) and [advanced_selection_example.py](examples/advanced_selection_example.py) for details.
 
+If you do not need the strict best model combination and want **more evaluation savings**, `epsilon_lucb` is often a good choice: it stops once an **ε-optimal** arm is found (tune `epsilon` to trade off how close to optimal you need to be versus how many runs you spend).
+
 | `method=` | Best for | How it works |
 |-----------|----------|-------------|
-| `"auto"` (default) | General use | Automatically picks the best approach |
+| `"auto"` (default) | General use | Automatically finds the best combination (wired to `arm_elimination` — strong best-arm identification with lower evaluation cost than `brute_force`) |
 | `"brute_force"` | Small search spaces | Evaluates all combinations |
 | `"random"` | Quick exploration | Samples a random fraction |
 | `"hill_climbing"` | Topology-aware search | Greedy search using model quality/speed rankings |
-| `"arm_elimination"` | Early pruning | Eliminates statistically dominated combinations |
-| `"epsilon_lucb"` | Best-arm identification | Stops when LUCB confidence gap is within user `epsilon` |
-| `"threshold"` | Thresholding objectives | Classifies combinations above/below user `threshold` |
+| `"arm_elimination"` | Best-arm identification | Bandit; eliminates statistically dominated combinations |
+| `"epsilon_lucb"` | Extra cost savings when ε-optimal is enough | Bandit; stops when an epsilon-optimal best arm is identified |
+| `"threshold"` | Thresholding objectives | Bandit; determines whether each combination is above/below a user-defined `threshold` on the performance metric (e.g., mean accuracy) |
 | `"lm_proposal"` | LLM-guided search | Uses a proposer LLM to shortlist promising combinations |
-| `"bayesian"` | Expensive evaluations | GP-based optimization (requires `pip install "agentopt[bayesian]"`) |
+| `"bayesian"` | Expensive evaluations | GP-based Bayesian optimization over categorical model choices; uses correlation between combinations (requires `pip install "agentopt[bayesian]"`) |
 
 ```python
 selector = ModelSelector(
     agent=MyAgent, models=models, eval_fn=eval_fn, dataset=dataset,
     method="epsilon_lucb",
-    epsilon=0.5
+    epsilon=0.01
 )
 results = selector.select_best(parallel=True)
 ```
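The ε-optimal stopping rule that the new `epsilon_lucb` paragraph describes can be sketched in a few lines of plain Python. This is an illustrative toy (Hoeffding-style confidence bounds over simulated Bernoulli arms), not AgentOpt's implementation; the `pull` simulator, `true_means`, and the bound constants are all invented for the demo.

```python
import math
import random

def epsilon_lucb(pull, n_arms, epsilon=0.1, delta=0.05, max_pulls=20000):
    """Return an arm whose true mean is, with high probability,
    within `epsilon` of the best arm's mean (LUCB-style stopping)."""
    counts = [1] * n_arms
    sums = [pull(i) for i in range(n_arms)]  # one initial pull per arm
    t = n_arms
    while t < max_pulls:
        means = [s / c for s, c in zip(sums, counts)]
        rad = [math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * c))
               for c in counts]
        leader = max(range(n_arms), key=lambda i: means[i])
        challenger = max((i for i in range(n_arms) if i != leader),
                         key=lambda i: means[i] + rad[i])
        # stop once the challenger's UCB is within epsilon of the leader's LCB
        if (means[challenger] + rad[challenger]) - (means[leader] - rad[leader]) <= epsilon:
            return leader
        for i in (leader, challenger):  # LUCB samples both contenders
            sums[i] += pull(i)
            counts[i] += 1
            t += 1
    return max(range(n_arms), key=lambda i: sums[i] / counts[i])

rng = random.Random(0)
true_means = [0.9, 0.6, 0.3]  # hypothetical per-combination accuracies
best = epsilon_lucb(lambda i: float(rng.random() < true_means[i]),
                    n_arms=3, epsilon=0.1)
```

A smaller `epsilon` (such as the 0.01 in the README snippet) tightens the stopping condition and costs more evaluations; a larger one stops earlier but may settle for a slightly worse combination.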

examples/advanced_selection_example.py

Lines changed: 9 additions & 6 deletions
@@ -3,7 +3,9 @@
 
 The framework-specific examples (custom_agent_example.py, langchain_example.py, etc.)
 all use method="brute_force" for simplicity. This example demonstrates the other
-selection algorithms available via ModelSelector(method=...).
+selection algorithms available via ModelSelector(method=...). The default
+method="auto" automatically finds the best combination (wired to arm_elimination —
+strong best-arm identification with lower evaluation cost than brute_force).
 
 Prerequisites:
 1. pip install openai agentopt
@@ -88,7 +90,7 @@ def eval_fn(expected, actual):
 
 
 def run_auto():
-    """method="auto" — automatically picks the best algorithm (default)."""
+    """method="auto" — automatically finds the best combination (default; wired to arm_elimination — strong best-arm identification, cheaper than brute_force)."""
     selector = ModelSelector(
         agent=MyAgent, models=models, eval_fn=eval_fn, dataset=dataset, method="auto",
     )
@@ -103,7 +105,7 @@ def run_random():
         eval_fn=eval_fn,
         dataset=dataset,
         method="random",
-        sample_fraction=0.5,  # evaluate 50% of all combinations
+        sample_fraction=0.25,  # evaluate 25% of all combinations
     )
     return selector.select_best(parallel=True)
 
@@ -141,7 +143,7 @@ def run_epsilon_lucb():
         eval_fn=eval_fn,
         dataset=dataset,
         method="epsilon_lucb",
-        epsilon=0.05,  # acceptable gap from the true best
+        epsilon=0.01,  # acceptable gap from the true best
     )
     return selector.select_best(parallel=True)
 
@@ -154,7 +156,7 @@ def run_threshold():
         eval_fn=eval_fn,
         dataset=dataset,
         method="threshold",
-        threshold=0.8,  # minimum acceptable accuracy
+        threshold=0.75,  # minimum acceptable accuracy
     )
     return selector.select_best(parallel=True)
 
@@ -180,6 +182,7 @@ def run_bayesian():
         dataset=dataset,
         method="bayesian",
         batch_size=4,
+        sample_fraction=0.25,  # evaluate 25% of all combinations
     )
     return selector.select_best(parallel=True)
 
@@ -207,7 +210,7 @@ def run_bayesian():
         formatter_class=argparse.RawDescriptionHelpFormatter,
         epilog="""
 Available methods:
-  auto             Automatically picks the best algorithm (default)
+  auto             Automatically finds the best combination (wired to arm_elimination; lower evaluation cost than brute_force) (default)
   random           Evaluate a random subset of combinations
   hill_climbing    Greedy search using model quality/speed rankings
   arm_elimination  Eliminate statistically dominated combinations early
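The `arm_elimination` behavior in the epilog above (eliminate statistically dominated combinations early) can be illustrated with a small successive-elimination sketch. Everything here (the `pull` simulator, `true_means`, the confidence-bound constants, and the doubling round schedule) is a hypothetical stand-in, not AgentOpt's code.

```python
import math
import random

def arm_elimination(pull, n_arms, delta=0.05, rounds=12):
    """Successive elimination: sample every surviving arm uniformly each round,
    then drop any arm whose upper confidence bound falls below the best arm's
    lower confidence bound."""
    alive = list(range(n_arms))
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for r in range(1, rounds + 1):
        for i in alive:
            for _ in range(2 ** r):  # doubling per-round sample size
                sums[i] += pull(i)
                counts[i] += 1
        means = {i: sums[i] / counts[i] for i in alive}
        rad = {i: math.sqrt(math.log(4 * n_arms * r * r / delta)
                            / (2 * counts[i])) for i in alive}
        best_lcb = max(means[i] - rad[i] for i in alive)
        # keep only arms that are not statistically dominated
        alive = [i for i in alive if means[i] + rad[i] >= best_lcb]
        if len(alive) == 1:
            break
    return max(alive, key=lambda i: sums[i] / counts[i])

rng = random.Random(42)
true_means = [0.85, 0.6, 0.35]  # hypothetical per-combination accuracies
best = arm_elimination(lambda i: float(rng.random() < true_means[i]), n_arms=3)
```

The cost advantage over brute force comes from the early `break`: clearly worse arms stop consuming evaluations as soon as their confidence intervals separate from the leader's.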

src/agentopt/__init__.py

Lines changed: 4 additions & 2 deletions
@@ -58,8 +58,10 @@ def ModelSelector(
         models: Dict mapping step names to lists of candidate models.
         eval_fn: Scoring function ``(expected, actual) -> float``.
         dataset: List of ``(input_data, expected_output)`` pairs.
-        method: Selection algorithm. ``"auto"`` (default) picks the best
-            approach automatically. Other options: ``"brute_force"``,
+        method: Selection algorithm. ``"auto"`` (default) automatically finds
+            the best combination (same implementation as ``"arm_elimination"`` —
+            strong best-arm identification with lower evaluation cost than
+            ``"brute_force"``). Other options: ``"brute_force"``,
             ``"random"``, ``"hill_climbing"``, ``"arm_elimination"``,
             ``"epsilon_lucb"``, ``"threshold"``, ``"lm_proposal"``,
             ``"bayesian"``.
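The docstring above pins down the shapes of `eval_fn` and `dataset`. A minimal concrete pair might look like the following; the task, questions, and answers are invented for illustration, and only the signatures come from the docstring.

```python
def eval_fn(expected: str, actual: str) -> float:
    """Score in [0, 1]: exact match after normalization, following the
    (expected, actual) -> float signature described in the docstring."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0

# A list of (input_data, expected_output) pairs, as the docstring describes.
dataset = [
    ({"question": "What is 2 + 2?"}, "4"),
    ({"question": "What is the capital of France?"}, "Paris"),
]
```

The mean of `eval_fn` over the dataset is the per-combination score the bandit methods compare, and, for `method="threshold"`, the metric that the `threshold` parameter is applied to.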
