
Mighty is built around three principles: *flexibility, smooth integration with existing libraries, and environment parallelization*. First, flexibility is central: Mighty exposes transitions, predictions, networks, and environments to meta-methods, enabling a broad range of research patterns including black-box outer loops, algorithm-informed inner loops, and environment-level interventions. Second, Mighty integrates smoothly with Gymnasium [@towers-arxiv24a], Pufferlib [@suarez-rlc25], and CARL [@benjamins-tmlr23a], and can interface with tools such as evosax [@evosax2022github] in under $100$ lines of code. This minimizes glue code while preserving flexibility. Finally, Mighty uses standard Python and PyTorch for optimized networks, combined with vectorized CPU environments for fast environment interaction. This design offers high training speeds, even for purely CPU-based environments, without sacrificing algorithmic modularity or code clarity.
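As an illustration of the black-box outer-loop pattern, the following minimal sketch tunes a single hyperparameter with a simple Gaussian evolution strategy written in plain Python. It does not use evosax or Mighty's runner API; `train_and_evaluate` is a hypothetical placeholder for a full training run that returns a score, replaced here by a toy quadratic so that the example runs on its own.

```python
import random
from typing import Callable, List, Tuple


def train_and_evaluate(log10_lr: float) -> float:
    """Hypothetical placeholder for one full training run.

    Returns a score to maximize; this toy objective peaks at log10_lr = -3.
    """
    return -((log10_lr + 3.0) ** 2)


def gaussian_es(
    objective: Callable[[float], float],
    mean: float,
    sigma: float = 0.5,
    population: int = 8,
    elite: int = 2,
    generations: int = 30,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Minimal ES: sample around the mean, keep the elites, re-center on them."""
    rng = random.Random(seed)
    history: List[float] = []
    for _ in range(generations):
        # Propose candidate hyperparameters around the current search mean.
        candidates = [rng.gauss(mean, sigma) for _ in range(population)]
        # Score each candidate exactly once, treating it as a black box.
        scored = sorted(((objective(c), c) for c in candidates), reverse=True)
        # Move the search distribution toward the best candidates.
        mean = sum(c for _, c in scored[:elite]) / elite
        # Track the best score seen in this generation.
        history.append(scored[0][0])
    return mean, history


if __name__ == "__main__":
    best, curve = gaussian_es(train_and_evaluate, mean=0.0)
    print(f"best log10 learning rate: {best:.2f}")
```

Because the outer loop only sees one scalar score per candidate, replacing the toy objective with a real training run, or the hand-written ES with a library optimizer, leaves the loop itself unchanged.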

## Related Work

The rapidly growing ecosystem of RL libraries spans diverse design philosophies, from low-level composability [@weng-jmlr22a] to turnkey baselines [@raffin-jmlr21a; @huang-jmlr22a] and massive-scale engines [@toledo-misc24a], which makes direct comparison and tool selection challenging. Modular research frameworks expose the internal building blocks of an RL pipeline as standalone components that can be recombined to quickly prototype new algorithms.
TorchRL [@bou-arxiv23a] pioneered this approach in the PyTorch ecosystem, introducing the TensorDict abstraction to pass observations, actions, and rewards between modules. Tianshou [@weng-jmlr22a] offers a similarly flexible design with separate *Policy*, *Collector*, and *Buffer* classes, enabling researchers to swap in custom exploration strategies or data collection schemes with minimal boilerplate. Although these libraries excel at inner-loop algorithm development and fine-grained experimentation, they leave higher-order workflows such as curriculum learning or meta-adaptation across tasks to external scripts or user-written loops, whereas Mighty handles them within the framework. Monolithic baselines such as Stable-Baselines3 (SB3) [@raffin-jmlr21a] and CleanRL/PureJaxRL [@huang-jmlr22a; @lu-neurips22a] prioritize ease of use and reproducibility. However, this simplicity comes at the cost of extensibility: SB3's algorithms hide most of the training loop behind a single `learn()` call, and CleanRL's single-file scripts are not designed to be imported or extended. Scalable platforms such as RLlib [@liang-icml18a; @liang-neurips21a] and STOIX [@toledo-misc24a] focus on maximizing throughput and supporting distributed execution. Although these systems shine in large experiments, their APIs do not natively unify component modularity with built-in meta-learning or curriculum design.
Mighty occupies the middle ground between these categories, offering efficient single-node performance via PyTorch, straightforward multicore environment parallelism, and a modular interface within the same cohesive framework.

## Software Design

Mighty accelerates development and experimentation through an intuitive interface, modular algorithms, and flexible support for meta-methods extending beyond vanilla RL.
Mighty is organized around three abstractions: (i) an Agent assembled from modular components (exploration, buffer, update rule, network parameterization), (ii) a Contextual MDP interface that treats environments as families parameterized by context, and (iii) a meta-layer split into runners (between-run orchestration such as HPO or population methods) and meta-components (within-run interventions via hook points). This separation keeps the training loop stable while enabling extension through Hydra configuration rather than editing core code.
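The separation between a stable training loop and within-run meta-components can be sketched as follows. This is a minimal, dependency-free illustration under assumptions: the hook names (`pre_episode`, `post_episode`), the `ContextualEnv` class, and the `curriculum` meta-component are hypothetical and do not reproduce Mighty's actual interfaces. The environment is a toy contextual MDP whose context value (`difficulty`) a curriculum adjusts through a hook, without any change to `Agent.train`.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical hook signature: (agent, info) -> None.
Hook = Callable[["Agent", dict], None]


@dataclass
class ContextualEnv:
    """Toy contextual MDP: one environment family indexed by a context."""

    context: Dict[str, float]
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def run_episode(self) -> float:
        # Stand-in for collecting an episode: harder contexts give lower returns.
        return 1.0 - self.context["difficulty"] + self.rng.uniform(-0.05, 0.05)


class Agent:
    """Fixed training loop exposing named hook points to meta-components."""

    HOOKS = ("pre_episode", "post_episode")

    def __init__(self, env: ContextualEnv) -> None:
        self.env = env
        self.hooks: Dict[str, List[Hook]] = {name: [] for name in self.HOOKS}

    def register(self, hook: str, fn: Hook) -> None:
        if hook not in self.hooks:
            raise KeyError(f"unknown hook point: {hook}")
        self.hooks[hook].append(fn)

    def _fire(self, hook: str, info: dict) -> None:
        for fn in self.hooks[hook]:
            fn(self, info)

    def train(self, episodes: int) -> List[float]:
        returns = []
        for episode in range(episodes):
            self._fire("pre_episode", {"episode": episode})
            ret = self.env.run_episode()  # a real agent would also update here
            returns.append(ret)
            self._fire("post_episode", {"episode": episode, "return": ret})
        return returns


def curriculum(agent: Agent, info: dict) -> None:
    """Hypothetical meta-component: raise difficulty once returns exceed 0.8."""
    if info["return"] > 0.8:
        ctx = agent.env.context
        ctx["difficulty"] = min(1.0, ctx["difficulty"] + 0.1)


if __name__ == "__main__":
    agent = Agent(ContextualEnv(context={"difficulty": 0.0}))
    agent.register("post_episode", curriculum)
    agent.train(episodes=20)
    print(agent.env.context)
```

The design choice being illustrated is that the meta-component only sees the agent and its environment through hook arguments, so a between-run runner could reuse the same `Agent` unchanged.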

**User Interface:** Mighty prioritizes usability and flexibility. We use Hydra [@yadan-github19a] for structured configuration files that expose all relevant training details without overwhelming new users. This also plugs Mighty into Hydra’s ecosystem for cluster execution and hyperparameter optimization. Algorithm components in Mighty are modular and can be replaced via configuration, allowing users to integrate new components without editing the training loop. *This keeps projects small, maintainable, and research-focused.* For example, integrating domain randomization [@tobin-iros17] via Syllabus [@sullivan-rlj25] requires around $100$ lines of code each for the Syllabus interface and a custom task wrapper. With the [Mighty project template](https://github.com/automl/mighty_project_template) as a base, *fewer than $200$ lines of Python code and three configuration files* suffice for a full evaluation, including hyperparameter optimization and cluster deployment (see the [project repository](https://github.com/automl/mighty_dr_example/tree/main), which includes results).
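Replacing a component through configuration rather than code can be sketched with a small registry. Hydra itself typically resolves such entries with `hydra.utils.instantiate` and a `_target_` key; to keep this example dependency-free, a plain dictionary stands in for the Hydra config, and the component names (`epsilon_greedy`, `greedy`) are hypothetical rather than Mighty's own config keys.

```python
import random
from typing import Any, Dict, List, Type


class EpsilonGreedy:
    """Exploration component: random action with probability epsilon."""

    def __init__(self, epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self, q_values: List[float]) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(q_values))
        return max(range(len(q_values)), key=q_values.__getitem__)


class Greedy:
    """Exploration component that always picks the highest-valued action."""

    def select(self, q_values: List[float]) -> int:
        return max(range(len(q_values)), key=q_values.__getitem__)


# Hypothetical registry standing in for Hydra's `_target_` resolution.
REGISTRY: Dict[str, Type] = {"epsilon_greedy": EpsilonGreedy, "greedy": Greedy}


def build(component_cfg: Dict[str, Any]) -> Any:
    """Instantiate a component from a config node; remaining keys become kwargs."""
    cfg = dict(component_cfg)
    cls = REGISTRY[cfg.pop("name")]
    return cls(**cfg)


def act(config: Dict[str, Any], q_values: List[float]) -> int:
    """The calling code never changes; only the config entry does."""
    exploration = build(config["exploration"])
    return exploration.select(q_values)


if __name__ == "__main__":
    q = [0.1, 0.9, 0.3]
    print(act({"exploration": {"name": "greedy"}}, q))
    print(act({"exploration": {"name": "epsilon_greedy", "epsilon": 0.2}}, q))
```

In a Hydra setup, the same switch would be a one-line change in a YAML file or a command-line override, which is what lets new components enter without edits to the training loop.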

```
python mighty/run_mighty.py --config-name=hypersweeper_smac_example_config -m
```

## Research Impact Statement

Mighty’s research contribution is a unified experimental substrate for studying generalization, Meta-RL, and AutoRL under consistent orchestration. By standardizing the interfaces for contextual environments and outer-loop optimization, Mighty reduces ad hoc scripting, improves comparability across methods, and supports reproducible ablations. Mighty is intended to accelerate research iteration rather than introduce a new RL algorithm or claim state-of-the-art performance by itself.


## Empirical Validation

We validate our implementations by comparing them with OpenRL benchmark results [@huang-arxiv24a]. Our aim is not to outperform existing baselines, but to demonstrate that Mighty achieves comparable performance at similar training budgets.
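One way to make "comparable performance" operational is to check whether a confidence interval for the difference in mean final return across seeds contains zero. The sketch below uses a percentile bootstrap in plain Python; it is an assumption for illustration, not necessarily the protocol behind the comparison described here, and it reports no results.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def bootstrap_diff_ci(
    a: Sequence[float],
    b: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for mean(a) - mean(b) over independent seeds."""
    rng = random.Random(seed)
    diffs: List[float] = []
    for _ in range(n_boot):
        # Resample seeds with replacement within each group.
        resampled_a = [rng.choice(a) for _ in a]
        resampled_b = [rng.choice(b) for _ in b]
        diffs.append(statistics.fmean(resampled_a) - statistics.fmean(resampled_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def comparable(a: Sequence[float], b: Sequence[float], **kwargs) -> bool:
    """True if the bootstrap CI for the mean difference contains zero."""
    lower, upper = bootstrap_diff_ci(a, b, **kwargs)
    return lower <= 0.0 <= upper
```

A call would look like `comparable(mighty_final_returns, reference_final_returns)`, where both hypothetical lists hold one final return per seed. An interval that contains zero only fails to detect a difference; it does not prove equivalence, and with few seeds the interval is wide.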

We acknowledge contributions from the AutoML community and thank the developers of CARL, DACBench, and other integrated frameworks that make Mighty's unified interface possible.

## AI Usage Disclosure

We used large language model tools in a limited capacity for language editing (clarity, conciseness, and grammar). All technical claims, software design, experiments, and results were produced and verified by the authors, who take full responsibility for the paper and the code.

## References