From 14f88d5b5b7fbea3de8a4fd08e9dfbf606be438e Mon Sep 17 00:00:00 2001
From: Theresa Eimer
Date: Fri, 19 Dec 2025 12:37:59 +0100
Subject: [PATCH 1/2] Fix missing citations

On line 29: added venue to match bib file
On line 59: I don't think TensorDict was supposed to be a citation, so I just removed the brackets
---
 paper/paper.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/paper/paper.md b/paper/paper.md
index 03b1d77..5cc423f 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -44,7 +44,7 @@ Robust generalization, rapid adaptation, and automated tuning are critical for d
 
 ## Statement of need
 
-Reinforcement learning (RL) has emerged as a powerful decision-making paradigm in complex and dynamic environments. Despite impressive successes in domains such as games [@silver-nature16a; @badia-icml20a; @vasco-rlc24] and robotics [@lee], RL algorithms frequently overfit their training conditions and struggle to generalize to new tasks [@benjamins-tmlr23a; @kirk-jair23a; @mohan-jair24a]. Addressing this challenge requires methods that not only learn efficiently on a single task but also adapt rapidly to novel settings and automatically tune their learning process.
+Reinforcement learning (RL) has emerged as a powerful decision-making paradigm in complex and dynamic environments. Despite impressive successes in domains such as games [@silver-nature16a; @badia-icml20a; @vasco-rlc24] and robotics [@lee-sciro20], RL algorithms frequently overfit their training conditions and struggle to generalize to new tasks [@benjamins-tmlr23a; @kirk-jair23a; @mohan-jair24a]. Addressing this challenge requires methods that not only learn efficiently on a single task but also adapt rapidly to novel settings and automatically tune their learning process.
 
 Recent research has advanced in three complementary directions: (i) Generalization in RL [@benjamins-tmlr23a; @cho-neurips24a; @mohan-jair24a], (ii) Meta-RL methods [@kaushik-iros20a; @beck-arxiv23a], and (iii) Automated RL (AutoRL) [@parkerholder-jair22a; @mohan-automlconf23a; @eimer-icml23a]. Although each has led to promising algorithms, researchers frequently resort to fragmented codebases and ad hoc scripting across environment design, RL training, and meta-optimization. This fragmentation increases engineering effort, impedes rapid iteration, and undermines reproducibility [@paradis-rlc].
 
@@ -57,7 +57,7 @@ Mighty is designed around three design principles: *flexibility, smooth integrat
 ## Existing Tools for RL and Meta RL
 
 The rapidly growing ecosystem of RL libraries spans diverse design philosophies -- from low-level composability [@weng-jmlr22a] to turnkey baselines [@raffin-jmlr21a; @huang-jmlr22a] and massive-scale engines [@toledo-misc24a] -- making direct comparison and tool selection challenging. Modular research frameworks expose the internal building blocks of an RL pipeline as standalone components that can be re-combined to quickly prototype new algorithms.
-TorchRL [@bou-arxiv23a] pioneered this approach in the PyTorch ecosystem, introducing the [@TensorDict] abstraction to seamlessly pass the observations, actions and rewards between modules. Tianshou [@weng-jmlr22a] offers a similarly flexible design with separate *Policy*, *Collector*, and *Buffer* classes, enabling researchers to switch custom exploration strategies or data collection schemes with minimal boilerplate. Although these libraries excel at inner loop algorithm development and fine‐grained experimentation, counter to Mighty, they leave higher‐order workflows such as curriculum learning or meta-adaptation across tasks to external scripts or user‐written loops. Monolithic baselines such as stable baselines3 (SB3) [@raffin-jmlr21a] and CleanRL/PureJaxRL [@huang-jmlr22a; @lu-neurips22a] prioritize ease of use and reproducibility. However, this simplicity comes at the cost of extensibility: SB3's algorithms hide most of the training loop behind a single `learn()` call, and CleanRL's single file scripts are not designed for import or extension. Scalable platforms such as RLlib [@liang-icml18a; @liang-neurips21a] and STOIX [@toledo-misc24a] focus on maximizing throughput and supporting distributed execution. Although these systems shine when running large experiments, their APIs do not natively unify component modularity with built‐in meta-learning or curriculum design.
+TorchRL [@bou-arxiv23a] pioneered this approach in the PyTorch ecosystem, introducing the TensorDict abstraction to seamlessly pass the observations, actions and rewards between modules. Tianshou [@weng-jmlr22a] offers a similarly flexible design with separate *Policy*, *Collector*, and *Buffer* classes, enabling researchers to switch custom exploration strategies or data collection schemes with minimal boilerplate. Although these libraries excel at inner loop algorithm development and fine‐grained experimentation, counter to Mighty, they leave higher‐order workflows such as curriculum learning or meta-adaptation across tasks to external scripts or user‐written loops. Monolithic baselines such as stable baselines3 (SB3) [@raffin-jmlr21a] and CleanRL/PureJaxRL [@huang-jmlr22a; @lu-neurips22a] prioritize ease of use and reproducibility. However, this simplicity comes at the cost of extensibility: SB3's algorithms hide most of the training loop behind a single `learn()` call, and CleanRL's single file scripts are not designed for import or extension. Scalable platforms such as RLlib [@liang-icml18a; @liang-neurips21a] and STOIX [@toledo-misc24a] focus on maximizing throughput and supporting distributed execution. Although these systems shine when running large experiments, their APIs do not natively unify component modularity with built‐in meta-learning or curriculum design.
 Mighty occupies the middle ground, offering efficient single-node performance via PyTorch, straightforward multicore environment parallelism, and a modular interface within the same cohesive framework.
 
 ## Key Features

From d84f0e9279cd65f18cf8829661274ec3149d620e Mon Sep 17 00:00:00 2001
From: Theresa Eimer
Date: Tue, 20 Jan 2026 13:50:36 +0100
Subject: [PATCH 2/2] Add missing a

---
 paper/paper.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/paper/paper.md b/paper/paper.md
index 5cc423f..8a5cf3f 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -44,7 +44,7 @@ Robust generalization, rapid adaptation, and automated tuning are critical for d
 
 ## Statement of need
 
-Reinforcement learning (RL) has emerged as a powerful decision-making paradigm in complex and dynamic environments. Despite impressive successes in domains such as games [@silver-nature16a; @badia-icml20a; @vasco-rlc24] and robotics [@lee-sciro20], RL algorithms frequently overfit their training conditions and struggle to generalize to new tasks [@benjamins-tmlr23a; @kirk-jair23a; @mohan-jair24a]. Addressing this challenge requires methods that not only learn efficiently on a single task but also adapt rapidly to novel settings and automatically tune their learning process.
+Reinforcement learning (RL) has emerged as a powerful decision-making paradigm in complex and dynamic environments. Despite impressive successes in domains such as games [@silver-nature16a; @badia-icml20a; @vasco-rlc24] and robotics [@lee-sciro20a], RL algorithms frequently overfit their training conditions and struggle to generalize to new tasks [@benjamins-tmlr23a; @kirk-jair23a; @mohan-jair24a]. Addressing this challenge requires methods that not only learn efficiently on a single task but also adapt rapidly to novel settings and automatically tune their learning process.
 
 Recent research has advanced in three complementary directions: (i) Generalization in RL [@benjamins-tmlr23a; @cho-neurips24a; @mohan-jair24a], (ii) Meta-RL methods [@kaushik-iros20a; @beck-arxiv23a], and (iii) Automated RL (AutoRL) [@parkerholder-jair22a; @mohan-automlconf23a; @eimer-icml23a]. Although each has led to promising algorithms, researchers frequently resort to fragmented codebases and ad hoc scripting across environment design, RL training, and meta-optimization. This fragmentation increases engineering effort, impedes rapid iteration, and undermines reproducibility [@paradis-rlc].