Merged
2 changes: 1 addition & 1 deletion README.md
@@ -86,7 +86,7 @@ Before you start training, however, please follow the installation instructions
Then use the same command as before, but provide the CARL environment, in this example CARLCartPoleEnv,
and information about the context distribution as keywords:
```bash
-python mighty/run_mighty.py 'algorithm=dqn' 'env=CARLCartPole' 'num_envs=10' '+env_kwargs.num_contexts=10' '+env_kwargs.context_feature_args.gravity=[normal, 9.8, 1.0, -100.0, 100.0]' 'env_wrappers=[mighty.mighty_utils.wrappers.FlattenVecObs]'
+python mighty/run_mighty.py 'algorithm=ppo' 'env=CARLCartPole' '+env_kwargs.num_contexts=10' '+env_kwargs.context_feature_args.gravity=[normal, 9.8, 1.0, -100.0, 100.0]' 'env_wrappers=[mighty.mighty_utils.wrappers.FlattenVecObs]' 'algorithm_kwargs.rollout_buffer_kwargs.buffer_size=2048'
```

For more complex configurations like this, we recommend making an environment configuration file. Check out our [CARL Ant](mighty/configs/environment/carl_walkers/ant_goals.yaml) file to see how this simplifies the process of working with configurable environments.
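As a rough sketch of what such an environment configuration file might contain (the keys mirror the CLI overrides above, but the exact file layout is an assumption, not copied from the linked `ant_goals.yaml`):

```yaml
# Hypothetical environment config sketch; key names mirror the CLI
# overrides above, the exact schema is an assumption.
# @package _global_
env: CARLCartPole
env_kwargs:
  num_contexts: 10
  context_feature_args:
    gravity: [normal, 9.8, 1.0, -100.0, 100.0]
env_wrappers: [mighty.mighty_utils.wrappers.FlattenVecObs]
```

A file like this would then be selected with a single `environment=...` override instead of repeating every keyword on the command line.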
10 changes: 5 additions & 5 deletions examples/README.md
@@ -73,7 +73,7 @@ python mighty/run_mighty.py 'env=CartPole-v1'
We can also be more specific, e.g. by adding our desired number of interaction steps and the number of parallel environments we want to run:

```bash
-python mighty/run_mighty.py 'env=CartPole-v1' 'num_steps=50_000' 'num_envs=10'
+python mighty/run_mighty.py 'env=CartPole-v1' 'num_steps=50_000' 'num_envs=16'
```
For some environments, including CartPole-v1, these details are pre-configured in the Mighty configs, meaning we can use the environment keyword to set them all at once:

@@ -98,7 +98,7 @@ python mighty/run_mighty.py 'environment=gymnasium/cartpole' 'algorithm=dqn' 'al
Or to use e.g. an ez-greedy exploration policy for DQN:

```bash
-python mighty/run_mighty.py 'environment=gymnasium/cartpole' 'algorithm=dqn' '+algorithm_kwargs.policy_class=mighty.mighty_exploration.EZGreedy'
+python mighty/run_mighty.py 'environment=gymnasium/cartpole' 'algorithm=dqn' 'algorithm_kwargs.policy_class=mighty.mighty_exploration.EZGreedy' 'algorithm_kwargs.policy_kwargs=null'
```
You can see that in this case, the value we pass to the script is a class name string which can take the value of any function you want, including custom ones as we'll see further down.
</details>
@@ -109,7 +109,7 @@ You can see that in this case, the value we pass to the script is a class name s
The meta components are a bit more complex, since they are a list of class names and optional keyword arguments:

```bash
-python mighty/run_mighty.py 'env=CartPole-v1' 'num_steps=50_000' 'num_envs=10' '+algorithm_kwargs.meta_methods=[mighty.mighty_meta.RND]'
+python mighty/run_mighty.py 'env=CartPole-v1' 'num_steps=50_000' 'num_envs=16' '+algorithm_kwargs.meta_methods=[mighty.mighty_meta.RND]'
```
As this can become complex, we recommend configuring these in Hydra config files.
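A hedged sketch of what such a config file entry could look like (the nested list form is inferred from the CLI override above; the `meta_kwargs` key is an assumption, not taken from the Mighty docs):

```yaml
# Hypothetical algorithm_kwargs fragment; mirrors the
# '+algorithm_kwargs.meta_methods=[...]' override above.
algorithm_kwargs:
  meta_methods:
    - mighty.mighty_meta.RND
  meta_kwargs: []  # optional per-method keyword arguments (key name assumed)
```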
</details>
@@ -121,7 +121,7 @@ Hydra has a multirun functionality with which you can specify a grid of argument
Its best use is probably for easily running multiple seeds at once like this:

```bash
-python mighty/run_mighty.py 'env=CartPole-v1' 'num_steps=50_000' 'num_envs=10' 'seed=0,1,2,3,4' 'output_dir=examples/multiple_runs' -m
+python mighty/run_mighty.py 'env=CartPole-v1' 'num_steps=50_000' 'num_envs=16' 'seed=0,1,2,3,4' 'output_dir=examples/multiple_runs' -m
```
</details>

@@ -196,7 +196,7 @@ Compare their structure: the custom policy has a fixed set of methods inherited

If you want to run these custom modules, you can do so by adding them by their import path:
```bash
-python mighty/run_mighty.py 'algorithm=dqn' '+algorithm_kwargs.policy_class=examples.custom_policy.QValueUCB' '+algorithm_kwargs.policy_kwargs={}'
+python mighty/run_mighty.py 'algorithm=dqn' 'algorithm_kwargs.policy_class=examples.custom_policy.QValueUCB' 'algorithm_kwargs.policy_kwargs=null'
```
For the meta-module, it works exactly the same way.
59 changes: 28 additions & 31 deletions examples/hypersweeper_smac_example_config.yaml
@@ -17,49 +17,46 @@ env_kwargs: {}
env_wrappers: []
num_envs: 64

# @package _global_
algorithm: PPO

algorithm_kwargs:
# Hyperparameters
n_policy_units: 128
n_critic_units: 128
soft_update_weight: 0.01
rescale_action: False
tanh_squash: False

rollout_buffer_class:
-  _target_: mighty.mighty_replay.MightyRolloutBuffer # Using rollout buffer
+  _target_: mighty.mighty_replay.MightyRolloutBuffer

rollout_buffer_kwargs:
-  buffer_size: 4096 # Size of the rollout buffer.
-  gamma: 0.99 # Discount factor for future rewards.
-  gae_lambda: 0.95 # GAE lambda.
-  obs_shape: ??? # Placeholder for observation shape
-  act_dim: ??? # Placeholder for action dimension
+  buffer_size: 128 # size of the rollout buffer
+  gamma: 0.99
+  gae_lambda: 0.95
+  obs_shape: ???
+  act_dim: ???
+  n_envs: ???

discrete_action: ???

-  # Training
-  learning_rate: 3e-4
-  batch_size: 1024 # Batch size for training.
-  gamma: 0.99 # The amount by which to discount future rewards.
-  n_gradient_steps: 3 # Number of epochs for updating policy.
-  ppo_clip: 0.2 # Clipping parameter for PPO.
-  value_loss_coef: 0.5 # Coefficient for value loss.
-  entropy_coef: 0.01 # Coefficient for entropy loss.
-  max_grad_norm: 0.5 # Maximum value for gradient clipping.
+  # Optimiser and update settings
+  learning_rate: 3e-4
+  batch_size: 2048 # samples per update
+  gamma: 0.99
+  ppo_clip: 0.2
+  value_loss_coef: 0.5
+  entropy_coef: 0.01
+  max_grad_norm: 0.5 # gradient clipping

-  hidden_sizes: [64, 64]
-  activation: 'tanh'
+  hidden_sizes: [256, 256]
+  activation: "tanh"

-  n_epochs: 10
-  minibatch_size: 64
-  kl_target: 0.01
-  use_value_clip: True
-  value_clip_eps: 0.2
+  n_gradient_steps: 1 # one gradient step per rollout
+  n_epochs: 10 # ten update epochs per rollout
+  minibatch_size: 128 # 2048 / 128 = 16 minibatches
+  kl_target: null # disable KL-based early stopping
+  use_value_clip: true

-  policy_class: mighty.mighty_exploration.StochasticPolicy # Policy class for exploration
+  policy_class: mighty.mighty_exploration.StochasticPolicy
policy_kwargs:
-    entropy_coefficient: 0.0 # Coefficient for entropy-based exploration.
+    entropy_coefficient: 0.0


# Training
eval_every_n_steps: 1e4 # After how many steps to evaluate.
59 changes: 28 additions & 31 deletions examples/optuna_example_config.yaml
@@ -18,49 +18,46 @@ env_kwargs: {}
env_wrappers: []
num_envs: 64

# @package _global_
algorithm: PPO

algorithm_kwargs:
# Hyperparameters
n_policy_units: 128
n_critic_units: 128
soft_update_weight: 0.01
rescale_action: False
tanh_squash: False

rollout_buffer_class:
-  _target_: mighty.mighty_replay.MightyRolloutBuffer # Using rollout buffer
+  _target_: mighty.mighty_replay.MightyRolloutBuffer

rollout_buffer_kwargs:
-  buffer_size: 4096 # Size of the rollout buffer.
-  gamma: 0.99 # Discount factor for future rewards.
-  gae_lambda: 0.95 # GAE lambda.
-  obs_shape: ??? # Placeholder for observation shape
-  act_dim: ??? # Placeholder for action dimension
+  buffer_size: 128 # size of the rollout buffer
+  gamma: 0.99
+  gae_lambda: 0.95
+  obs_shape: ???
+  act_dim: ???
+  n_envs: ???

discrete_action: ???

-  # Training
-  learning_rate: 3e-4
-  batch_size: 1024 # Batch size for training.
-  gamma: 0.99 # The amount by which to discount future rewards.
-  n_gradient_steps: 3 # Number of epochs for updating policy.
-  ppo_clip: 0.2 # Clipping parameter for PPO.
-  value_loss_coef: 0.5 # Coefficient for value loss.
-  entropy_coef: 0.01 # Coefficient for entropy loss.
-  max_grad_norm: 0.5 # Maximum value for gradient clipping.
+  # Optimiser and update settings
+  learning_rate: 3e-4
+  batch_size: 2048 # samples per update
+  gamma: 0.99
+  ppo_clip: 0.2
+  value_loss_coef: 0.5
+  entropy_coef: 0.01
+  max_grad_norm: 0.5 # gradient clipping

-  hidden_sizes: [64, 64]
-  activation: 'tanh'
+  hidden_sizes: [256, 256]
+  activation: "tanh"

-  n_epochs: 10
-  minibatch_size: 64
-  kl_target: 0.01
-  use_value_clip: True
-  value_clip_eps: 0.2
+  n_gradient_steps: 1 # one gradient step per rollout
+  n_epochs: 10 # ten update epochs per rollout
+  minibatch_size: 128 # 2048 / 128 = 16 minibatches
+  kl_target: null # disable KL-based early stopping
+  use_value_clip: true

-  policy_class: mighty.mighty_exploration.StochasticPolicy # Policy class for exploration
+  policy_class: mighty.mighty_exploration.StochasticPolicy
policy_kwargs:
-    entropy_coefficient: 0.0 # Coefficient for entropy-based exploration.
+    entropy_coefficient: 0.0


# Training
eval_every_n_steps: 1e4 # After how many steps to evaluate.
4 changes: 3 additions & 1 deletion mighty/mighty_agents/dqn.py
@@ -121,8 +121,10 @@ def __init__(

# Policy Class
         policy_class = retrieve_class(cls=policy_class, default_cls=EpsilonGreedy)  # type: ignore
-        if policy_kwargs is None:
+        if policy_kwargs is None and issubclass(policy_class, EpsilonGreedy):
             policy_kwargs = {"epsilon": 0.1}  # type: ignore
+        elif policy_kwargs is None:
+            policy_kwargs = {}
self.policy_class = policy_class
self.policy_kwargs = policy_kwargs
