- Clone the repository:
git clone <repository_url>
cd LLaMEA-BO
- Install the required dependencies via Poetry:
pip install poetry
poetry install
- The raw data files are stored on Zenodo.
- Replace `GEMINI_API_KEY` in `run_es_search.sh` with your Gemini API key.
- Change `N_POPULATION` in `run_es_search.sh` as needed.
- Run the script:
bash run_es_search.sh
- The results will be saved in the `exp_es_search/` directory.
- Run the script:
bash run_algo_evaluation.sh
- The results will be saved in the `exp_eval/` directory.
Follow the instructions in `Benchmarks/Readme.md`.
The project follows a modular structure, primarily located within the `llamevol/` directory.
- `llamevol/`: Contains the core implementation of the LLaMEvol algorithm.
  - `llamevol.py`: The main class orchestrating the LLaMEvol process.
  - `individual.py`: Defines the `Individual` class representing a single generated algorithm/solution.
  - `llm.py`: Handles interactions with the Language Model (LLM).
  - `prompt_generators/`: Contains classes responsible for generating prompts for the LLM.
  - `evaluator/`: Includes code for executing and evaluating the performance of generated algorithms, often using benchmark suites like BBOB (via IOHprofiler). It handles code execution, error capture, and metric calculation.
  - `population/`: Manages the collection (population) of `Individual` algorithms, implementing selection strategies and diversity maintenance.
  - `utils.py`: Provides utility functions, including logging, serialization, and plotting.
- `Benchmarks/`: Contains scripts and results for running external benchmarks like Bayesmark.
- `Experiments/`: Holds scripts for running specific experiments and plotting results.
Below is a simplified example demonstrating how to set up and run the LLaMEvol evolutionary process using the provided components. This example uses an `IOHEvaluator`, a `BaselinePromptGenerator`, a `gemini-2.0-flash` model via `LLMmanager`, and an `ESPopulation`.
import logging
from llamevol.evaluator.ioh_evaluator import IOHEvaluator
from llamevol.prompt_generators import BaselinePromptGenerator
from llamevol.population import ESPopulation
from llamevol.llm import LLMmanager
from llamevol import LLaMEvol
from llamevol.utils import setup_logger
# Configure logging
setup_logger(level=logging.INFO)
# 1. Instantiate Evaluator (Example: IOH BBOB)
evaluator = IOHEvaluator(budget=100, dim=5, problems=[2, 4, 6], instances=[[1]]*3, repeat=3)
evaluator.timeout = 30 * 60 # Set the timeout (in seconds) for each evaluation (including all tasks).
# 2. Instantiate Prompt Generator
prompt_generator = BaselinePromptGenerator()
prompt_generator.is_bo = True # Specify it's for Bayesian Optimization
# 3. Instantiate LLM Manager (Example: Google Gemini)
# Ensure API key is set via environment variable or passed directly
api_key = 'YOUR_API_KEY' # Replace with your actual key or load from env
llm_manager = LLMmanager(model_name='gemini-2.0-flash', api_key=api_key, client_str='google')
# 4. Instantiate Population (Example: (1+1)-ES)
es_options = {
'n_parent': 1,
'n_offspring': 1,
'is_elitist': True,
'log_dir': 'exp_es_search', # Directory to save logs
}
population = ESPopulation(
n_parent=es_options['n_parent'],
n_offspring=es_options['n_offspring'],
use_elitism=es_options['is_elitist']
)
population.save_dir = es_options['log_dir']
population.name = f"evol_{es_options['n_parent']}+{es_options['n_offspring']}"
# 5. Instantiate LLaMEvol orchestrator
llamevol = LLaMEvol()
# 6. Run the evolution
llm_params = {'temperature': 0.7}
llamevol.run_evolutions(
llm=llm_manager,
evaluator=evaluator,
prompt_generator=prompt_generator,
population=population,
n_population=5, # Maximum number of generated individuals
options={'llm_params': llm_params}
)
# 7. Save the final population
population.save(suffix='final')
print("Evolution finished. Results saved in:", population.log_dir)For a runnable script with command-line arguments, see run_es_search.py.
The `IOHEvaluator` supports several modes for parallelizing the evaluation of algorithms across different IOH problems, instances, and repetitions:
- Sequential Execution:
  - How: This is the default mode if no parallel options are explicitly enabled (i.e., `max_eval_workers` is set to 0 or less, and `use_mpi` and `use_mpi_future` are `False`).
  - Description: Each evaluation task (a specific problem/instance/repetition) is executed one after another in the main process.
- Thread Pool Execution:
  - How: Set `max_eval_workers` to a positive integer (e.g., `evaluator.max_eval_workers = 10`) and ensure `use_multi_process` is `False` (the default).
  - Description: Uses Python's `concurrent.futures.ThreadPoolExecutor` to run evaluation tasks concurrently in multiple threads within the same process.
- Process Pool Execution:
  - How: Set `max_eval_workers` to a positive integer and set `use_multi_process = True` (e.g., `evaluator.max_eval_workers = 10; evaluator.use_multi_process = True`).
  - Description: Uses Python's `concurrent.futures.ProcessPoolExecutor` to run evaluation tasks in separate processes. Suitable for algorithms that do not use multiple cores effectively on their own.
- MPI (Custom Task Manager):
  - How: Set `use_mpi = True` (e.g., `evaluator.use_mpi = True`). Requires an MPI environment, `mpi4py` installed, and a specific launch command (e.g., `mpiexec python pyfile`). An example can be found in `run_algo_evaluation.py`.
  - Description: Utilizes a custom master-worker implementation (`MPITaskManager`) built on top of `mpi4py`. The main node (rank 0) distributes tasks to worker nodes (rank > 0). Suitable for distributed systems.
- MPI (mpi4py.futures):
  - How: Set `use_mpi_future = True` (e.g., `evaluator.use_mpi_future = True`). Requires an MPI environment, `mpi4py` installed, and a specific launch command (e.g., `mpiexec -n numprocs python -m mpi4py.futures pyfile`). Details of the command can be found in the `mpi4py.futures` documentation.
  - Description: Leverages `mpi4py.futures.MPIPoolExecutor` for a higher-level interface to MPI-based parallelism. Similar to the process pool but designed specifically for MPI environments.
Configuration:
These options are typically set as attributes on the `IOHEvaluator` instance before calling the `evaluate` method. An example can be found in `run_algo_evaluation.py`.
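As a quick reference, here is a minimal sketch of selecting each mode on an evaluator instance. The attribute names are taken from the descriptions above, and the constructor arguments mirror the quick-start example:

```python
from llamevol.evaluator.ioh_evaluator import IOHEvaluator

evaluator = IOHEvaluator(budget=100, dim=5, problems=[2, 4, 6], instances=[[1]]*3, repeat=3)

# Thread pool: run evaluation tasks concurrently in threads within this process.
evaluator.max_eval_workers = 10
evaluator.use_multi_process = False  # default

# Process pool: same worker count, but each task runs in a separate process.
# evaluator.use_multi_process = True

# MPI with the custom task manager; launch via: mpiexec python your_script.py
# evaluator.use_mpi = True

# MPI via mpi4py.futures; launch via: mpiexec -n <numprocs> python -m mpi4py.futures your_script.py
# evaluator.use_mpi_future = True
```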
The `LLaMEvol` class (`llamevol/llamevol.py`) is the central orchestrator of the evolutionary algorithm. It coordinates the interactions between the LLM, Evaluator, Prompt Generator, and Population components to drive the search for optimal algorithms.
Structure & Features:
- Main Loop: Implements the core evolutionary loop (`run_evolutions`), managing generations and population size.
- Component Integration: Takes instances of `LLMmanager`, `AbstractEvaluator`, `PromptGenerator`, and `Population` as inputs, delegating specific tasks to each.
- Task Determination: Dynamically determines the appropriate task for the LLM based on the state of the parent individuals (e.g., `INITIALIZE_SOLUTION`, `FIX_ERRORS`, `OPTIMIZE_PERFORMANCE`) using `update_current_task`.
- LLM Interaction: Handles querying the LLM via the `LLMmanager`, including:
  - Constructing session messages based on prompts from the `PromptGenerator`.
  - Applying LLM parameters (`temperature`, `top_k`).
  - Managing retries (`n_retry`) in case of LLM or extraction failures.
  - Optional parallel querying using `concurrent.futures.ThreadPoolExecutor` (`n_query_threads`).
- Evaluation Trigger: Calls the `evaluate` method of the provided `Evaluator` on the code generated by the LLM.
- Population Update: Updates `Individual` objects within the `Population` with the results from the LLM (code, description) and the Evaluator (fitness, feedback) using `_update_ind_and_handler`.
- Token Tracking: Logs prompt and response token counts per generation (`LLaMEvolTokenLogItem`).
- Progression Control: Iterates through generations until a target population size (`n_population`) is reached.
Usage:
- Instantiate Components: Create instances of `LLMmanager`, `AbstractEvaluator`, `PromptGenerator`, and `Population` configured for your specific task and resources.
- Instantiate LLaMEvol: Create an instance of the `LLaMEvol` class.

  from llamevol import LLaMEvol
  llamevol = LLaMEvol()

- Run Evolution: Call the `run_evolutions` method, passing the instantiated components and desired parameters.

  # Assuming llm_manager, evaluator, prompt_generator, population are already created
  llamevol.run_evolutions(
      llm=llm_manager,
      evaluator=evaluator,
      prompt_generator=prompt_generator,
      population=population,
      n_population=20,    # Maximum number of individuals
      n_retry=3,
      n_query_threads=4,  # Number of parallel LLM queries
      options={'llm_params': {'temperature': 0.7}}
  )

- Results: The final population (containing evolved individuals and their performance) can be accessed and saved via the `Population` object after the run completes.
Customization:
- Component Swapping: The primary way to customize `LLaMEvol`'s behavior is by providing different implementations of its core components (LLM, Evaluator, Prompt Generator, Population). For example, using a different `Population` class changes the selection and generation strategy.
- Configuration: Adjust parameters passed to `run_evolutions`, such as `n_population`, `n_retry`, `n_query_threads`, and LLM-specific settings within the `options` dictionary.
This module (`llamevol/llm.py`) acts as a central manager for interacting with various Large Language Models (LLMs).
Features:
- Provides a unified interface (`LLMmanager`) to connect to different LLM providers (Groq, Google GenAI, OpenAI-compatible APIs like OpenRouter).
- Abstracts away the specific API details for each provider.
- Manages API keys and base URLs, primarily loaded from environment variables.
- Defines a standardized response object (`LLMClientResponse`) containing the generated text, token counts, and potential errors.
- Supports different client implementations (`OpenAIClient`, `GoogleGenAIClient`, `AISuiteClient`, `RequestClient`).
Usage:
- Environment Variables (Optional): Ensure the necessary API keys and base URLs for the desired LLMs are set as environment variables (e.g., `GROQ_API_KEY`, `GEMINI_API_KEY`, etc.). Copy and rename the provided template to `.env` and fill in the required keys.

  cp .env.example .env
  # Edit .env to add your API keys

- Initialization: Create an instance of `LLMmanager` by providing a `model_key` which corresponds to an entry in the `LLMS` dictionary within the script. Alternatively, you can manually specify `model_name`, `api_key`, `base_url`, and `client_str`. The mapping of `client_str` to the actual client class is handled in the `LLMmanager` constructor.

  from llamevol.llm import LLMmanager

  # Using a predefined model key
  llm_manager = LLMmanager(model_key='llama3-70b-8192')

  # Or manually configuring (example)
  # llm_manager = LLMmanager(model_name='some-model', api_key='YOUR_API_KEY', base_url='https://api.example.com/v1', client_str='openai')

- Chat: Use the `chat` method, passing a list of messages in the standard OpenAI format (a list of dictionaries with 'role' and 'content').

  messages = [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain the theory of relativity."}
  ]
  response = llm_manager.chat(messages, temperature=0.7)
  if response.error:
      print(f"Error: {response.error}")
  else:
      print(f"Response Text: {response.text}")
      print(f"Prompt Tokens: {response.prompt_token_count}")
      print(f"Response Tokens: {response.response_token_count}")
Customization:
- Adding Predefined Models: To add support for a new model using an existing provider type, add an entry to the `LLMS` dictionary in `llm.py`. You'll need:
  - the model name recognized by the API;
  - the environment variable name for the API key;
  - the environment variable name for the base URL (if applicable);
  - a maximum interval value (deprecated; originally designed for rate limiting and retries);
  - the client type string (`'groq'`, `'google'`, `'openai'`, `'openrouter'`, `'request'`, or `None` for default handling via `AISuiteClient`).
- Adding New Providers (see the sketch after this list):
  - Create a new class that inherits from `LLMClient`.
  - Implement the `raw_completion` method to handle the specific API request/response logic for the new provider.
  - Update the `LLMmanager.__init__` method to recognize a new `client_str` and instantiate your custom client class when that string is provided.
- Adding New Providers from AISuite:
  - Install the provider-specific package (e.g., `pip install 'aisuite[anthropic]'`).
  - Add the corresponding `API_KEY` in `__init__` of `AISuiteClient`.
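To illustrate the "Adding New Providers" steps, here is a rough sketch of a hypothetical client for an OpenAI-style HTTP API. The exact `LLMClient` interface and the way `LLMClientResponse` is constructed are assumptions; check `llamevol/llm.py` for the real signatures:

```python
import requests
from llamevol.llm import LLMClient, LLMClientResponse

class MyProviderClient(LLMClient):
    """Hypothetical client for a provider exposing an OpenAI-style /chat/completions endpoint."""

    def __init__(self, model_name, api_key, base_url):
        self.model_name = model_name
        self.api_key = api_key
        self.base_url = base_url

    def raw_completion(self, messages, **kwargs):
        # Send the chat request to the (hypothetical) provider endpoint.
        resp = requests.post(
            f"{self.base_url}/chat/completions",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"model": self.model_name, "messages": messages, **kwargs},
            timeout=60,
        )
        data = resp.json()

        # Wrap the provider's reply in the standardized response object
        # (attribute names follow the fields documented above; treat them as assumptions).
        response = LLMClientResponse()
        response.text = data["choices"][0]["message"]["content"]
        response.prompt_token_count = data["usage"]["prompt_tokens"]
        response.response_token_count = data["usage"]["completion_tokens"]
        return response
```

After defining the client, register it by mapping a new `client_str` to `MyProviderClient` in `LLMmanager.__init__`.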
This component constructs the prompts sent to the LLM for generating or modifying optimization algorithms.
Structure & Features:
- Abstract Base Classes: Defines `PromptGenerator` and `ResponseHandler` abstract classes (`abstract_prompt_generator.py`) to ensure a consistent interface.
- Concrete Implementations: Provides specific generators like `BaselinePromptGenerator` (for generating algorithms from scratch) and `BoTunerPromptGenerator` (for refining existing algorithms).
- Contextual Prompts: Dynamically builds prompts incorporating problem descriptions, existing candidate solutions (code, descriptions, past performance), evaluation feedback (errors, performance metrics like AOC), and potentially information about the broader population of algorithms.
- Task-Specific Instructions: Generates detailed instructions for the LLM based on the task (e.g., "design a novel algorithm", "fix the error in this code", "optimize this algorithm based on feedback").
- Response Parsing: Each `PromptGenerator` has a corresponding `ResponseHandler` subclass responsible for parsing the LLM's structured output (e.g., extracting code blocks, justifications, pseudocode) using methods like `extract_response`.
Usage:
- Instantiate: Choose and instantiate a specific `PromptGenerator` subclass.
- Generate Prompt: Call the `get_prompt` method, passing the `GenerationTask`, problem description, and any relevant context (like candidate `ResponseHandler` objects or the `Population`).
- Query LLM: Use the returned system and user prompts with the `LLMmanager`.
- Parse Response: Get the corresponding `ResponseHandler` instance using `get_response_handler()` and use its `extract_response` method on the LLM's output string (see the sketch after this list).
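A rough sketch of these four steps. The argument names for `get_prompt`, the shape of its return value, and the task object are assumptions; see `prompt_generators/abstract_prompt_generator.py` for the actual interface:

```python
from llamevol.llm import LLMmanager
from llamevol.prompt_generators import BaselinePromptGenerator

# 1. Instantiate a concrete prompt generator (configured for BO, as in the quick-start example).
prompt_generator = BaselinePromptGenerator()
prompt_generator.is_bo = True

llm_manager = LLMmanager(model_key='llama3-70b-8192')

# 2. Build the prompt for the current task; argument names below are illustrative only.
# system_prompt, user_prompt = prompt_generator.get_prompt(task=task, problem_desc=problem_desc, candidates=parents)

# 3. Query the LLM with the returned system and user prompts.
# response = llm_manager.chat([
#     {"role": "system", "content": system_prompt},
#     {"role": "user", "content": user_prompt},
# ])

# 4. Parse the structured output (code, description, etc.) from the response text.
# handler = prompt_generator.get_response_handler()
# handler.extract_response(response.text)
```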
Customization:
- New Strategies: Create new subclasses inheriting from `PromptGenerator` and `ResponseHandler`.
- Implement Methods: Override methods like `get_prompt`, `task_description`, `task_instruction`, `response_format`, and `evaluation_feedback_prompt` in your `PromptGenerator` subclass, and `extract_response` in your `ResponseHandler` subclass, to define the new prompting logic and response parsing.
The Evaluator component is responsible for executing the Python code generated by the LLM and assessing its performance on optimization tasks.
Structure & Features:
- Abstract Base: Defines `AbstractEvaluator` (`evaluator.py`) for a consistent interface.
- Concrete Implementations: Provides evaluators for standard benchmarks:
  - `IOHEvaluator` (`ioh_evaluator.py`): Evaluates algorithms on the IOHprofiler (BBOB) benchmark suite. Supports parallel execution across multiple problem instances and repetitions.
  - `RandomBoTorchTestEvaluator` (`random_botorch_evaluator.py`): Evaluates algorithms on synthetic test functions from the BoTorch library.
- Code Execution: Uses utilities in `exec_utils.py` (`default_exec`) to safely execute the generated Python code, capturing standard output, errors, and execution time. It handles budget constraints via `BOOverBudgetException`.
- Result Tracking: Employs `EvaluatorResult` and `EvaluatorBasicResult` (`evaluator_result.py`) to store detailed outcomes for each evaluation run, including:
  - Best function value found (`best_y`).
  - History of evaluated points (`x_hist`, `y_hist`).
  - Area Over the Convergence Curve (AOC), including log-scale AOC, calculated using `ConvergenceCurveAnalyzer`.
  - Execution time and any runtime errors.
- BO Algorithm Introspection (Optional): Uses `BOInjector` and `AlgorithmCritic` (`bo_injector.py`) to inject monitoring code specifically into Bayesian Optimization algorithms. This allows tracking internal metrics during the optimization run, such as:
  - Surrogate model R² score (on test and training data).
  - Surrogate model uncertainty.
  - Search space coverage metrics (grid-based, clustering-based using `CoverageCluster`).
  - Exploitation vs. exploration metrics (distance to best points, acquisition score analysis via `EvaluatorSearchResult`).
- Parallelism: Supports parallel evaluation using MPI (as seen in `IOHEvaluator`). Specifically, `MPITaskManager` provides an MPI-based master-worker framework, which can be used to distribute evaluation tasks across multiple nodes. This is particularly useful for large-scale evaluations across distributed systems.
Usage:
- Instantiate: Create an instance of a specific evaluator subclass (e.g., `IOHEvaluator`) with configuration like budget, dimension, and target problems/instances.
- Evaluate: Call the `evaluate` method, providing the generated Python code string and the name of the main class within that code. Optional arguments control parallelism (`max_eval_workers`) and timeouts.
- Process Results: The `evaluate` method returns an `EvaluatorResult` object. This object contains a list of `EvaluatorBasicResult` objects, each holding the detailed metrics, history, and potential errors for a single evaluation run (e.g., one IOH instance). A sketch of this flow is shown below.
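A minimal sketch of this flow. The constructor arguments mirror the quick-start example; the way the code string and class name are passed to `evaluate` is an assumption, so check `AbstractEvaluator.evaluate` for the real signature:

```python
from llamevol.evaluator.ioh_evaluator import IOHEvaluator

# Candidate algorithm produced by the LLM (body shortened to a placeholder here).
code = '''
class MyBOAlgorithm:
    def __init__(self, budget, dim):
        self.budget = budget
        self.dim = dim
'''

evaluator = IOHEvaluator(budget=100, dim=5, problems=[2, 4, 6], instances=[[1]]*3, repeat=3)
evaluator.max_eval_workers = 4  # optional: run evaluation tasks in a thread pool

# Pass the generated code string and the name of its main class.
result = evaluator.evaluate(code, 'MyBOAlgorithm')

# `result` is an EvaluatorResult wrapping a list of EvaluatorBasicResult objects,
# each carrying best_y, the evaluation history, AOC values, and any runtime errors.
```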
Customization:
- New Benchmarks: Create a new class inheriting from `AbstractEvaluator`. Implement the required methods (`evaluate`, `problem_name`, etc.). You'll likely need a wrapper for your objective function (similar to `IOHObjectiveFn`) to manage budget and history tracking.
- New Metrics: Extend `EvaluatorBasicResult` or `EvaluatorSearchResult` to store additional metrics. Modify the relevant evaluator, or create/modify an `ExecInjector` subclass (`exec_utils.py`, `bo_injector.py`), to compute and record these metrics during or after code execution.
The Population component (`llamevol/population/`) manages the collection of candidate algorithms (`Individual` objects) throughout the evolutionary process.
Structure & Features:
- Abstract Base: Defines `Population` (`population.py`) as an abstract base class, ensuring a consistent interface for different population management strategies. It includes common utilities like saving/loading populations (using `pickle`) and calculating diversity metrics.
- Concrete Implementations:
  - `ESPopulation` (`es_population.py`): Implements an Evolution Strategy-style population (e.g., (μ+λ) or (μ,λ)).
    - Manages individuals across discrete generations.
    - Supports configurable parent pool size (`n_parent`), offspring count (`n_offspring`), and parents per offspring (`n_parent_per_offspring`).
    - Handles selection for the next generation, including optional elitism (`use_elitism`).
    - Implements parent selection logic based on combinations and configurable crossover/mutation rates (`cross_over_rate`, `exclusive_operations`).
    - Allows plugging in custom parent selection (`get_parent_strategy`) and survival selection (`selection_strategy`) functions.
  - `IslandESPopulation` (`island_population.py`): Implements an island model using multiple `ESPopulation` instances.
    - Manages multiple sub-populations (islands) concurrently.
    - Introduces island lifecycles (`IslandStatus`: INITIAL, GROWING, MATURE, RESETING, KILLED) and geological ages (`IslandAge`: WARMUP, CAMBRIAN, NEOGENE) to control evolution dynamics.
    - Implements migration strategies between islands during specific ages (e.g., CAMBRIAN), potentially based on fitness and diversity (using `desc_similarity`).
    - Supports configurable migration parameters (`migration_batch`, `cyclic_migration`).
    - Allows islands to be reset or killed based on performance.
  - `SequencePopulation` (`sequence_population.py`): A simpler (potentially non-generational) population structure (currently basic).
  - `EnsemblePopulation` (`ensemble_population.py`): Designed to combine multiple populations (currently basic).
- Query Items: Uses `PopulationQueryItem` to represent tasks for the main loop, specifying parent individuals for generating offspring.
- Diversity Metrics: Provides utility functions in `population.py` to assess population diversity:
  - `code_diff_similarity`: Based on line-by-line code differences.
  - `code_bert_similarity`: Uses CodeBERT embeddings for semantic code similarity.
  - `desc_similarity`: Uses sentence transformers on algorithm descriptions.
- Persistence: Populations can be saved to and loaded from disk using `pickle` via the `save()` and `load()` methods.
Usage:
- Instantiate: Create an instance of a specific population class (e.g., `ESPopulation`) with desired parameters (e.g., `n_parent`, `n_offspring`). Optionally provide custom strategy functions.
- Get Tasks: Call `get_offspring_queryitems()` to get a list of `PopulationQueryItem` objects. Each item indicates which parent(s) should be used to generate a new offspring.
- Add Individuals: After an offspring is generated and evaluated by the LLM and Evaluator, add the resulting `Individual` object to the population using `add_individual(individual, generation)`.
- Advance Generation: Call `select_next_generation()` to apply the survival selection mechanism and advance the population state to the next generation (primarily for `ESPopulation`).
- Retrieve Data: Access individuals using methods like `get_best_individual()`, `get_individuals(generation)`, and `all_individuals()`. A sketch of this loop is shown below.
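A rough sketch of this manual loop using the method names above (in a normal run, `LLaMEvol.run_evolutions` drives these calls; the evaluated `Individual` is left as a placeholder):

```python
from llamevol.population import ESPopulation

population = ESPopulation(n_parent=2, n_offspring=4, use_elitism=True)

# Ask the population which parent(s) should produce each new offspring.
for query_item in population.get_offspring_queryitems():
    # In the real loop, the LLM generates code from the query item's parents and the
    # Evaluator scores it, yielding an evaluated Individual (placeholder below).
    individual = ...  # evaluated Individual object

    # Register the offspring in the current generation.
    population.add_individual(individual, population.get_current_generation())

# Apply survival selection and advance to the next generation.
population.select_next_generation()

best = population.get_best_individual()
```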
Customization:
- Strategies: Implement custom functions for parent selection (`get_parent_strategy`) and survival selection (`selection_strategy`) and pass them to the constructor of `ESPopulation` or `IslandESPopulation` (see the sketch after this list).
- New Population Types: Create a new class inheriting from `Population`. Implement all abstract methods (`get_population_size`, `add_individual`, `remove_individual`, `get_offspring_queryitems`, `get_current_generation`, `get_best_individual`, `all_individuals`) to define a completely new population management scheme.
- Diversity Metrics: Add new diversity calculation functions in `population.py` or elsewhere and integrate them into selection or migration strategies.
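As an illustration of a custom survival-selection strategy, here is a hypothetical sketch. The callable's signature and the `fitness` attribute on `Individual` are assumptions; check `es_population.py` for the interface actually expected by `ESPopulation`:

```python
from llamevol.population import ESPopulation

def greedy_survival_selection(candidates, n_survivors):
    # Hypothetical strategy: keep the n_survivors individuals with the highest fitness.
    return sorted(candidates, key=lambda ind: ind.fitness, reverse=True)[:n_survivors]

population = ESPopulation(
    n_parent=2,
    n_offspring=4,
    use_elitism=True,
    selection_strategy=greedy_survival_selection,  # parameter name taken from the list above
)
```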