Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds a step size estimation feature to help users determine optimal step sizes for basin-hopping sampling. The estimator searches for a step size that produces a target escape rate (default 50%), where the escape rate represents the proportion of perturbations that lead to different local optima.
Changes:
- Added
StepSizeEstimatorclass with configuration and result dataclasses to estimate optimal step sizes through a decimal refinement search algorithm - Exported new classes in the public API via
__init__.py
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/lonpy/step_size.py | Implements step size estimation using a search algorithm that progressively refines step sizes to achieve target escape rates |
| src/lonpy/init.py | Exports new StepSizeEstimator, StepSizeEstimatorConfig, and StepSizeResult classes to the public API |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| domain_array = np.array(domain) | ||
| sampler = self._make_sampler() | ||
|
|
||
| step = 0.1 |
There was a problem hiding this comment.
The estimate method doesn't validate the domain parameter. If an empty domain list is passed, domain_array will be empty and could lead to errors in _compute_escape_rate when computing p (line 97) or generating random initial points (line 102). Consider adding validation to ensure domain is non-empty and that each bound tuple has valid lower < upper values.
| n_samples: int = 100 | ||
| n_perturbations: int = 30 | ||
| target_escape_rate: float = 0.5 | ||
| search_precision: int = 4 |
There was a problem hiding this comment.
The search_precision parameter controls the number of refinement iterations in the search algorithm (line 163), but this name is misleading. It's described in the docstring as "Decimal digits of precision for step size search" (line 19), which suggests it should control decimal precision. However, it's used as an iteration count, not precision digits. The actual decimal precision is applied on line 191 where round() is called with search_precision. This creates confusion because search_precision=4 means 4 iterations AND rounds to 4 decimal places, but these are different concepts. Consider renaming to search_iterations or clarifying the documentation to explain this dual purpose.
|
|
||
| n_samples: int = 100 | ||
| n_perturbations: int = 30 | ||
| target_escape_rate: float = 0.5 |
There was a problem hiding this comment.
The target_escape_rate configuration parameter has no validation. Values outside [0, 1] would not make sense for an escape rate (which is a probability/ratio), but there's no check to prevent invalid values. Consider adding validation in init or post_init to ensure 0 <= target_escape_rate <= 1.
| def _compute_escape_rate( | ||
| self, | ||
| func: Callable[[np.ndarray], float], | ||
| domain_array: np.ndarray, | ||
| step_size: float, | ||
| sampler: BasinHoppingSampler, | ||
| ) -> float: | ||
| """Compute the average escape rate for a given step size.""" | ||
| bounds_array = domain_array if self.config.bounded else None | ||
| p = step_size * np.abs(domain_array[:, 1] - domain_array[:, 0]) | ||
|
|
||
| escape_rates = [] | ||
|
|
||
| for _ in range(self.config.n_samples): | ||
| x0 = np.random.uniform(domain_array[:, 0], domain_array[:, 1]) | ||
| res = minimize( | ||
| func, | ||
| x0, | ||
| method=self.config.minimizer_method, | ||
| options=self.config.minimizer_options, | ||
| bounds=bounds_array, | ||
| ) | ||
| optimum = res.x | ||
| optimum_hash = sampler._hash_solution(optimum) | ||
|
|
||
| escapes = 0 | ||
| for _ in range(self.config.n_perturbations): | ||
| x_perturbed = sampler._perturbation(optimum, p, bounds_array) | ||
| res_perturbed = minimize( | ||
| func, | ||
| x_perturbed, | ||
| method=self.config.minimizer_method, | ||
| options=self.config.minimizer_options, | ||
| bounds=bounds_array, | ||
| ) | ||
| new_hash = sampler._hash_solution(res_perturbed.x) | ||
| if new_hash != optimum_hash: | ||
| escapes += 1 | ||
|
|
||
| escape_rates.append(escapes / self.config.n_perturbations) | ||
|
|
||
| return float(np.mean(escape_rates)) | ||
|
|
||
| def estimate( | ||
| self, |
There was a problem hiding this comment.
The _compute_escape_rate method is called multiple times during the search (once per step size tested), and each call runs n_samples * n_perturbations minimizations. With defaults (n_samples=100, n_perturbations=30), that's 3000 minimizations per step size tested. With search_precision=4, this could be ~4-40+ step sizes tested (depending on when the upper bound is found), resulting in 12,000-120,000+ minimizations. For expensive objective functions, this could be very slow. Consider documenting the computational cost in the class or method docstring, or providing guidance on adjusting n_samples/n_perturbations for expensive functions.
| last_tested: tuple[float, float] | None = None | ||
|
|
||
| for _ in range(self.config.search_precision): | ||
| while step <= 1.0 + 1e-12: |
There was a problem hiding this comment.
The condition allows step to exceed 1.0 (step <= 1.0 + 1e-12), but step_size is meant to represent a percentage of domain range. A step_size > 1.0 means perturbations larger than the entire domain range, which may not be meaningful for bounded problems. While this might be intentional to explore edge cases, it could lead to unexpected behavior. Consider documenting why values > 1.0 are allowed, or constraining the search to step <= 1.0 if that's the intended maximum.
| while step <= 1.0 + 1e-12: | |
| while step <= 1.0: |
| n_samples: int = 100 | ||
| n_perturbations: int = 30 | ||
| target_escape_rate: float = 0.5 | ||
| search_precision: int = 4 |
There was a problem hiding this comment.
The search_precision parameter has no validation. If set to 0 or negative, the outer loop (line 163) won't execute any iterations, leading to an empty candidates list and a ValueError when calling min() on line 188. Consider adding validation to ensure search_precision >= 1.
| optimum = res.x | ||
| optimum_hash = sampler._hash_solution(optimum) | ||
|
|
||
| escapes = 0 | ||
| for _ in range(self.config.n_perturbations): | ||
| x_perturbed = sampler._perturbation(optimum, p, bounds_array) | ||
| res_perturbed = minimize( | ||
| func, | ||
| x_perturbed, | ||
| method=self.config.minimizer_method, | ||
| options=self.config.minimizer_options, | ||
| bounds=bounds_array, | ||
| ) | ||
| new_hash = sampler._hash_solution(res_perturbed.x) | ||
| if new_hash != optimum_hash: | ||
| escapes += 1 | ||
|
|
There was a problem hiding this comment.
The step size estimator accesses private methods (_perturbation, _round_value, _hash_solution) of BasinHoppingSampler. While this works, it creates tight coupling and fragility. If these private methods change signatures or behavior, this code will break. Consider either making these methods public (removing the underscore prefix), or creating public helper methods in BasinHoppingSampler that expose this functionality for reuse.
| # Refine: reduce increment and resume from lower bound | ||
| increment /= 10 | ||
| if best_lower is not None: | ||
| step = best_lower[0] + increment | ||
| else: | ||
| step = increment | ||
| break | ||
| else: | ||
| # Reached step > 1.0 without finding upper bound | ||
| break |
There was a problem hiding this comment.
In the search algorithm, when rate >= target (line 172-181), the code breaks out of the inner while loop and continues with the next iteration of the outer for loop. However, the step variable may not be reset appropriately if best_lower is None. In this case, step is set to increment (line 180), but increment was just divided by 10 (line 176). On subsequent outer loop iterations, if rate < target on the first test, step will be incremented by the small increment value and could remain below the previously tested values, causing redundant testing or inefficient search. Consider whether the initialization of step should account for previously explored ranges.
No description provided.