Code for the paper "Beyond Fixed Tasks: Unsupervised Environment Design for Task-Level Pairs"
by Daniel Furelos-Blanco, Charles Pert, Frederik Kelbel, Alex F. Spies, Alessandra Russo, and Michael Dennis.
Published at the AAAI Conference on Artificial Intelligence (AAAI), 2026.
- Miniforge3 (or any Conda distribution).
- For GPU support: CUDA 12.
- Install Miniforge (if not already installed):

  ```bash
  curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
  ```

- Activate it:

  ```bash
  source miniforge3/bin/activate
  ```

- Create the environment (Python 3.10) and activate it:

  ```bash
  conda create --name atlas python=3.10 ffmpeg graphviz -c conda-forge
  conda activate atlas
  ```

- Install the package and requirements:

  ```bash
  cd atlas
  # For GPU (CUDA 12):
  pip install -e . --find-links https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
  # For CPU only: edit requirements.txt first (change jax[cuda12] to jax[cpu]), then run:
  # pip install -e .
  ```

- Configure Weights & Biases:

  ```bash
  wandb login
  ```

The source code is contained within the `atlas` folder. We describe the different main folders below.
High-level implementation of the agents used in the experiments; see `hrm_conditioned_agent.py` specifically:
the input HRM(s) are embedded and passed as input to an RNN-based agent.
Implements different HRM-embedding strategies:
- `dummy` does nothing and outputs an empty embedding.
- `vanilla` outputs a one-hot embedding indicating the current RM state.
- `rgcn` outputs the embedding for the current RM state using a graph convolutional network over the RM graph.
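To illustrate the conditioning idea, the `vanilla` one-hot embedding can be sketched as follows. This is a simplified illustration; the function and argument names are hypothetical, not the repository's API:

```python
import numpy as np

def vanilla_embedding(state_id: int, num_states: int) -> np.ndarray:
    """Sketch of the `vanilla` strategy: a one-hot vector marking the
    RM state the agent is currently in."""
    emb = np.zeros(num_states, dtype=np.float32)
    emb[state_id] = 1.0
    return emb
```

The `dummy` strategy would instead return an empty array, while `rgcn` would compute the state embedding by message passing over the RM graph.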
Implements the environment-related components, including:
- Levels and level sampling. Specifies the parameters of each level and methods for sampling them (mainly focusing on sampling levels with different numbers of rooms).
- Literal embedding. Implements domain-specific methods for embedding the literals labeling the formulas of an HRM.
- Mutators. Implements domain-specific mutation operators (edits).
- Networks. Implements the network for the Minigrid environment.
- Problem Sampling. Implements the level-conditioned and HRM-conditioned problem sampling strategies.
- Labeling Function. Implements the mapping from environment observations to sets of propositions.
- Renderer. Implements operations for rendering levels and environment observations.
Additional details can be found in the comments and in the submitted paper. The Minigrid implementation wraps the XLand-Minigrid one by Nikulin et al. (2024).
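As a toy illustration of what a labeling function does (the real one is Minigrid-specific; the proposition names and observation format below are made up):

```python
def labeling_function(agent_pos, objects):
    """Map an observation (here, the agent position plus a dict of object
    positions) to the set of propositions that currently hold."""
    # A proposition such as "key" holds when the agent overlaps that object.
    return frozenset(name for name, pos in objects.items() if pos == agent_pos)
```

The resulting sets of propositions are what drive transitions between RM states during a rollout.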
Implements different methods for loading a validation set for training.
Implements the HRM formalism as well as different samplers. Here, `single_path_flat` stands for the sequential sampler in the paper.
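To convey the idea, a sequential ("single path") sampler can be sketched as below. This is a simplified illustration, not the repository's `single_path_flat` implementation:

```python
import random

def sample_sequential_rm(propositions, num_states, rng=random):
    """Sample a chain-shaped RM u_0 -> u_1 -> ... -> u_{n-1}, where each
    transition is labeled by a single randomly chosen proposition."""
    return [(i, rng.choice(propositions), i + 1) for i in range(num_states - 1)]
```

A random walk-based sampler would instead allow branching and merging, producing RMs shaped as directed acyclic graphs.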
Implements the RNN and a generic actor-critic architecture used in the implemented environments.
Implements the base problem sampler, which samples tasks and levels independently.
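In essence, the base sampler draws the two components of a problem independently; a minimal sketch (the callables are hypothetical stand-ins for the task and level samplers):

```python
import random

def sample_problem(sample_task, sample_level, rng=random):
    """Base (independent) problem sampler: the task (HRM) and the level
    are drawn without conditioning on each other."""
    return sample_task(rng), sample_level(rng)
```

The level-conditioned and HRM-conditioned variants described above instead pass the first sampled component as an input when sampling the second.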
Implements the UED algorithms used in the paper: DR and PLR. ACCEL is implicitly implemented within PLR.
Implements some UED-related functions: the replay buffer and different scoring functions.
The buffer includes some modifications of the JaxUED implementation by Coward et al. (2024), including sampling without replacement, tie-breaking, and the use of lower-or-equal comparisons to decide which problem to substitute upon insertion. Further, we experimented with different scoring functions and interpolations of them.
The default configuration (with replacement, no tie-breaking, strictly lower comparison, single scoring with MaxMC and PVL) was used in all paper experiments, as the variants did not yield significant improvements. However, we retain these alternative implementations as they may be useful for future work or different domains.
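The default insertion rule can be sketched as follows; this is a simplified, list-based illustration of a score-based buffer, not the JAX implementation:

```python
def maybe_insert(buffer, scores, problem, score, capacity):
    """Insert `problem` if the buffer has room, or if it scores strictly
    higher than the current minimum (the default 'strictly lower' rule)."""
    if len(buffer) < capacity:
        buffer.append(problem)
        scores.append(score)
        return True
    i = min(range(len(scores)), key=scores.__getitem__)  # lowest-scoring slot
    if scores[i] < score:  # strict comparison, as in the default configuration
        buffer[i], scores[i] = problem, score
        return True
    return False
```

The lower-or-equal variant would replace `scores[i] < score` with `scores[i] <= score`, favoring newer problems on ties.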
Implements some auxiliary functions for checkpointing, evaluation (rollouts), logging (to Weights and Biases), math operations, plotting (diagnostics uploaded to Weights and Biases), rendering rollouts and training (e.g. PPO).
The `problems` directory contains the different problem sets involved in the paper:
- `00-validation-set` is the validation set, the only one used at training time.
- `01-cvar-sequential` is the set for computing the CVaR for problems generated with the sequential sampler.
- `02-cvar-dags` is the set for computing the CVaR for problems generated with the random walk-based sampler.
- `03-hand-designed` is the set containing the 150 hand-designed problems. The `_rendered` directory contains the illustrations for the levels and the HRMs.
The validation and CVaR sets are automatically generated and filtered to ensure they contain solvable problems (i.e., problems where the task can be realized in the level). See Appendix E.3 for details on the solvability checking approach.
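The generate-and-filter procedure amounts to rejection sampling; a minimal sketch under hypothetical names (the actual solvability check is the one described in Appendix E.3):

```python
import random

def generate_solvable_set(sample_problem, is_solvable, size, rng=random):
    """Keep sampling problems and retain only those whose task can be
    realized in the level, until `size` solvable problems are collected."""
    problems = []
    while len(problems) < size:
        problem = sample_problem(rng)
        if is_solvable(problem):
            problems.append(problem)
    return problems
```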
We describe how to run the experiments to reproduce the results in the paper, run the evaluations from the training runs and produce the final plots.
The experiments directory contains a folder for each set of experiments in the paper.
All experiments are specified using configuration files that build on the structure defined in the `config` folder.
The experiments are the following:
- `00-sweep` are the initial sweeps to refine some of the hyperparameters (see Appendix E.1). `from-full` corresponds to experiments using sampling from the full training distribution (PLR, ACCEL), while `from-scratch` corresponds to experiments using sampling from the simple problem distribution (ACCEL-0).
- `01-core` are the experiments for the main results (Section 5.2, Appendix E.4) and the problem sampling ablations (Section 5.3, Appendix E.5).
- `02-vanilla-conditioning` are the ablation experiments using the vanilla conditioner, i.e. conditioning on the RM state id (see Appendix E.8).
- `03-myopic` are the ablation experiments using the graph neural network with a single layer (see Appendix E.8).
- `04-domain-independent-literal-embeddings` are the ablation experiments using domain-independent literal embeddings, i.e. not exploiting the proposition structure (see Appendix E.8).
- `05-num-mutations` are the ablation experiments analyzing shorter and longer edit sequences (Section 5.5, Appendix E.7).
- `06-mutation-types` are the ablation experiments where some edit types are removed (Section 5.5, Appendix E.7).
- `07-dag-sampling` are the ablation experiments on task sampling, where the default sequential sampler is substituted with a random walk-based sampler that produces RMs as directed acyclic graphs (see Section 5.4, Appendix E.6).
- `08-pvl` are the ablation experiments on the scoring function, where PVL is used instead of MaxMC (Appendix E.9).
To run any of the experiments above, follow these steps:

- Find a file starting with `sweep` corresponding to the experiment to run, e.g. `experiments/training/01-core/plr/sweep.yaml`.
- Open the file and fill the `entity` field with the W&B entity where you want to log the results. Do the same with the corresponding `config.yaml`.
- Run the command `wandb sweep experiments/training/01-core/plr/sweep.yaml` (using the path to the sweep file you chose). This will create a new sweep in your W&B project. The output should look like the following (the sweep ID will differ):

  ```
  wandb: Creating sweep from: experiments/training/01-core/plr/sweep.yaml
  wandb: Creating sweep with ID: 8h61u6kz
  wandb: View sweep at: https://wandb.ai/YOUR_ENTITY/atlas/sweeps/8h61u6kz
  wandb: Run sweep agent with: wandb agent YOUR_ENTITY/atlas/8h61u6kz
  ```

- Launch an experiment picked from the sweep using the following command (you can queue a sequence by setting `count` higher than 1):

  ```bash
  python scripts/sweeping/run_wandb_sweep.py --sweep_id YOUR_SWEEP_ID --count 1 --project atlas --entity YOUR_ENTITY
  ```

Once the training runs have completed, it is time to run the evaluations on the CVaR sets and the hand-designed set. The evaluations will also be logged into W&B. WARNING: you will need to replace the W&B run identifiers in all files with yours.
Run the following commands for sequential and random walk-based sampling:

```bash
python experiments/evaluation/cvar/data_collection/run_eval_seq_cond_set.py
python experiments/evaluation/cvar/data_collection/run_eval_rw_cond_set.py
```

Once the evaluation is complete, the results from W&B can be dumped into `.csv` files using:

```bash
python experiments/evaluation/cvar/data_collection/dump_eval_cond_set.py
```

The results are currently dumped into the files `experiments/evaluation/cvar/data_collection/cvar_seq.csv` and `experiments/evaluation/cvar/data_collection/cvar_rw.csv`.
To evaluate performance only at the end of training:

```bash
python experiments/evaluation/handcrafted/data_collection/run_eval_last_checkpoint.py
```

To evaluate different checkpoints through training (to later produce the learning curve):

```bash
python experiments/evaluation/handcrafted/data_collection/run_eval_checkpoint_seq.py
```

The notebook `experiments/evaluation/curriculum/curriculum.ipynb` produces the plots for the curriculum analysis shown in Figures 5, 19 and 22. WARNING: the buffer data is already dumped into `.csv` files (`seq_buffer_data.csv` and `rw_buffer_data.csv`), so there is no need to dump the checkpoints, which occupy a lot of space.
The script `experiments/evaluation/cvar/plot_cvar_cond.py` produces the CVaR plots shown in Figures 3a, 18a and 21a.

The script `experiments/evaluation/generated_samples/render_generated_samples.py` dumps some samples generated by the different algorithms at different times during training. This is done directly from artifacts in W&B, so existing runs are needed.
To produce the learning curve plots (Figures 3b and 18b), run the following command:

```bash
python experiments/evaluation/handcrafted/aggregate/iqm_curve.py
```

To produce the IQM solve rate at the end of training plots (Figures 18c, 21b, 26-28), run the following command:

```bash
python experiments/evaluation/handcrafted/aggregate/iqm.py
```

To produce the solve rate per problem plots (Figure 4), run the following command:

```bash
python experiments/evaluation/handcrafted/per_problem/per_instance.py
```

To produce the solve rate per problem tables (Tables 5-7), run the following command:

```bash
python experiments/evaluation/handcrafted/per_problem/latex_table.py
```

To obtain how the presence of mutations evolves in the buffer over time (Figure 17), run the notebook `experiments/evaluation/mutations/mutations.ipynb`. It requires substituting the W&B run identifiers with yours.

To obtain the solvability over time (Figures 12, 20, 25), run the following command:

```bash
python experiments/evaluation/solvability/buffer_over_time/plot_solvability_over_time.py
```

The input `.csv`, which is already provided, can be obtained using the `gen_solvability_over_time_jobs.py` script, which generates jobs for a PBS cluster. Alternatively, you can run `eval_solvability_over_time.py` for each individual run and desired timestep.
To obtain the percentage of solvable problems per batch (Tables 3-4), run the following command:

```bash
python experiments/evaluation/solvability/rate_per_batch/eval_solvable_per_batch.py
```

The `notebooks` directory contains several Jupyter notebooks that exemplify how to use environments and HRMs:
- `env/01 Environment Interaction` shows how to run a rollout for a level-HRM pair.
- `hrms/01 HRM Example` shows several steps across a complex HRM.
- `hrms/02 HRM Sampling Example` shows the use of the sequential and random walk-based samplers.
- `hrms/03 HRM Rendering Example` shows how to render HRMs.
- `hrms/04 HRM Traversal Speed` performs some tests on the traversal speed of different HRMs.
- `problems/01 Problem Sampling` exemplifies different types of task-level sampling strategies (independent, level-conditioned, HRM-conditioned).
The `scripts/xminigrid/manual_control.py` script enables interacting with the environment via keyboard, moving an agent in randomly sampled task-level pairs. WARNING: the first step will take a while, since the JAX compilation of the function happens at that time.
The `tests` directory contains some automated tests:
- `hrm/test` verifies that HRM traversals are done correctly across diverse HRMs.
- `test_conditioner` verifies the correctness of different conditioning strategies. WARNING: the check for RM embedding correctness employs a prototype implementation of HRM embeddings involving more than one RM; note that in the paper we examine HRMs with a single RM.
- `test_wrappers` verifies that the HRM wrappers work correctly.
- `test_xminigrid_labeling_function` verifies the correctness of the labeling function for Minigrid.
If you use this code in your research, please cite our paper:
```bibtex
@inproceedings{FurelosBlancoPKSRD26,
  author    = {Furelos-Blanco, Daniel and Pert, Charles and Kelbel, Frederik and Spies, Alex F. and Russo, Alessandra and Dennis, Michael},
  title     = {{Beyond Fixed Tasks: Unsupervised Environment Design for Task-Level Pairs}},
  booktitle = {{AAAI} Conference on Artificial Intelligence (AAAI)},
  year      = {2026},
}
```