This repo contains scripts to make it easier to set up a development environment for METR Task Standard tasks. It is intended to be installed as a CLI tool viv-task-dev.
'Live' development
- No more waiting for your container to build again after every change!
- Make changes to task methods and immediately see the results
- Much faster! :D
Better matching of task-dev env with run envs
- Root folder structure basically identical to root folder structure in a run (excluding dotfiles)
- See 'other differences to note' section
VSCode dev environment
- Push and pull the mp4-tasks repo like normal
- Includes your extensions and settings
- Quickly see folder structure and file contents
- Yay debugging!
Start trial runs with an agent from within the container!
Aliases for common task-dev commands
- prompt! - Print the prompt for a task to the terminal
- build_steps! - Run the steps in the task family's build_steps.json
- install! - Run the task family's install method
- relink! - Refresh the symlinks in /root that point to the task family directory
- start! - Run a task's start method
- score! - Run a task's score method
- tasks! - Run the task family's get_tasks method
- permissions! - Run a task's get_permissions method
- trial! - Start a trial run with an agent (not currently supported with a local instance of Vivaria)
- settask! - Set the DEV_TASK env var for quicker running of the other aliases
- Install the Docker CLI (if you install Docker Desktop, this will be included)
- Install and set up Vivaria if you haven't already (to the point where you can run an agent on a task)
- Run
  curl -fsSL https://raw.githubusercontent.com/METR/viv-task-dev/main/install.sh | sh
  - To re-use a version of Vivaria that you already have checked out, set the TASK_DEV_VIVARIA_DIR env var to the path of the Vivaria dir, e.g.
    curl -fsSL https://raw.githubusercontent.com/METR/viv-task-dev/main/install.sh | env TASK_DEV_VIVARIA_DIR=/path/to/vivaria sh
To start a task dev env for a given family:
cd <task-family-dir>
viv-task-dev <a-container-name> [additional-docker-args]
You can pass additional docker args to the container, e.g. --volume <host-dir>:<container-dir> to add extra directories to the container, or --env-file <path-to-env-file> to set env vars for the container.
The container includes aliases for common task-dev commands.
These can be viewed and edited in the container's /root/.bashrc.
prompt! prints the prompt for a task to the terminal.
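Aliases that take a single task can also be run without specifying a task if the DEV_TASK env var is set.
E.g. after running settask! <task_name>, running prompt! with no arguments prints the prompt for <task_name>.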
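The fallback described above can be pictured with the minimal sketch below. It assumes an alias prefers an explicitly passed task and otherwise reads DEV_TASK; `resolve_task_name` is a hypothetical helper for illustration, not a function that viv-task-dev provides.

```python
import os
from typing import Mapping, Optional


def resolve_task_name(arg: Optional[str], env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: pick the task an alias should act on."""
    if arg:
        # An explicitly passed task always wins.
        return arg
    task = env.get("DEV_TASK")
    if task:
        # Otherwise fall back to the task set with settask!.
        return task
    raise ValueError("no task given and DEV_TASK is not set (try settask! <task_name>)")
```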
install! runs the task family's install method.
build_steps! runs the steps defined in the task family's build_steps.json file, to simulate how Vivaria adds these steps to (and runs them from) the Dockerfile.
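To make the idea of "adding steps to the Dockerfile" concrete, here is a rough sketch that turns a build_steps.json list into Dockerfile-style lines. It assumes the file is a JSON list of steps with either `"type": "file"` (plus `source` and `destination`) or `"type": "shell"` (plus a `commands` list); that format is an assumption here, and the output is only an approximation, not the Dockerfile Vivaria actually generates. `build_steps_to_dockerfile` is a hypothetical name.

```python
import json
from typing import Any, Dict, List


def build_steps_to_dockerfile(steps: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: approximate Dockerfile lines for a list of build steps."""
    lines: List[str] = []
    for step in steps:
        if step["type"] == "file":
            # Copy a file from the task family directory into the image.
            lines.append(f"COPY {step['source']} {step['destination']}")
        elif step["type"] == "shell":
            # Run the step's commands in one layer, stopping at the first failure.
            lines.append("RUN " + " && ".join(step["commands"]))
        else:
            raise ValueError(f"unknown build step type: {step['type']!r}")
    return lines


example = json.loads("""
[
  {"type": "file", "source": "./assets/data.csv", "destination": "/root/data.csv"},
  {"type": "shell", "commands": ["pip install pandas", "python -c 'import pandas'"]}
]
""")

for line in build_steps_to_dockerfile(example):
    print(line)
```

Joining shell commands with `&&` keeps the sketch's failure behaviour simple: if one command fails, the rest of that step does not run.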
The /root directory in the container contains symlinks pointing to every file and directory in the task family directory at /tasks/$TASK_DEV_FAMILY.
If you add new files to /tasks/$TASK_DEV_FAMILY, these won't be automatically symlinked in /root, and if you delete files the existing symlinks in /root will break. To fix these issues, run relink! to refresh the symlinks in /root.
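The refresh described above can be sketched in Python as follows. This is a minimal sketch under the assumption that relink! removes dangling links pointing into the family directory and adds links for new entries; it is not the actual relink! implementation, and `relink` here is a hypothetical function taking the family directory (e.g. /tasks/$TASK_DEV_FAMILY) and the link directory (e.g. /root).

```python
import os
from pathlib import Path


def relink(family_dir: Path, link_dir: Path) -> None:
    """Hypothetical sketch: refresh links in link_dir that mirror family_dir."""
    family_dir = family_dir.resolve()

    # 1. Drop dangling links whose targets were deleted from the family dir.
    for entry in list(link_dir.iterdir()):
        if entry.is_symlink() and not entry.exists():
            if Path(os.readlink(entry)).parent == family_dir:
                entry.unlink()

    # 2. Add links for family entries that don't have one yet.
    for source in family_dir.iterdir():
        link = link_dir / source.name
        # lexists also sees broken links; never overwrite an existing file or link.
        if not os.path.lexists(link):
            link.symlink_to(source)
```

Only dangling links that point into the family directory are removed, so unrelated files and links in the link directory are left alone.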
start! runs a task's start method.
Note that the agent's home directory will not contain instructions.txt after start! runs: instructions.txt is a special file that is created automatically when a run is started, and it is not controlled by the task developer.
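If you want the dev env to look a little more like a run, one option is to write the prompt to that file yourself. The sketch below does this with a hypothetical `write_instructions` helper; it assumes only that the family follows the Task Standard's `get_tasks()` / `get_instructions(t)` methods, and it does not claim to match exactly what Vivaria writes (e.g. file ownership or formatting).

```python
from pathlib import Path
from typing import Any


def write_instructions(family: Any, task_name: str, home: Path) -> Path:
    """Hypothetical helper: write a task's prompt to <home>/instructions.txt."""
    task = family.get_tasks()[task_name]
    path = home / "instructions.txt"
    path.write_text(family.get_instructions(task))
    return path
```

Inside the container this might be called as `write_instructions(TaskFamily, "<task_name>", Path("/home/agent"))` after `from FAMILY import TaskFamily`.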
settask! sets the task to be used by the other aliases.
Usage: settask! <task_name>
(This just appends export DEV_TASK=<task_name> to root's .bashrc and then sources it.)
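The append step can be sketched as below. `set_dev_task` is a hypothetical helper; quoting the name with `shlex.quote` is an added safety measure in this sketch, not something the original alias is stated to do. The sourcing step has to happen in the shell itself, because a child process like Python cannot change its parent shell's environment.

```python
import shlex
from pathlib import Path


def set_dev_task(task_name: str, bashrc: Path) -> str:
    """Hypothetical sketch: append an export line for DEV_TASK to a .bashrc file."""
    # Quote the name so unusual task names can't break the shell syntax.
    line = f"export DEV_TASK={shlex.quote(task_name)}\n"
    with bashrc.open("a") as f:
        f.write(line)
    return line
```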
score! runs the task's score method.
tasks! runs the task family's get_tasks method, which returns a dictionary mapping task names to task dicts.
Also available as get_tasks!
permissions! runs the task's get_permissions method.
Also available as get_permissions!
Agent runs are often very useful for finding task ambiguities or problems.
trial! starts a run on the given task.
- All runs started with trial! have the metadata {"task_dev": true}, for easy filtering in later analysis
- Uses a 4o-mini agent advised by 4o (fast and reasonably competent)
- Opens the run in the browser
- Note: the trial! command does not currently work with a local instance of Vivaria. If you are using a locally installed version of Vivaria, you should run agents outside of this development environment.
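You can also always start python and call the TaskFamily methods directly, e.g.:
>>> from FAMILY import TaskFamily  # replace FAMILY with the family's module name
>>> tasks = TaskFamily.get_tasks()
>>> task = tasks["<task_name>"]
>>> TaskFamily.get_instructions(task)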
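For experimenting without a real family, here is a hypothetical toy TaskFamily that follows the Task Standard's static-method shape (`get_tasks`, `get_instructions`, `get_permissions`, `score`), together with the calls that roughly correspond to tasks!, prompt!, permissions! and score!. The task names, answers and scoring rule are made up for illustration.

```python
from typing import Dict, List, TypedDict


class Task(TypedDict):
    answer: str


class TaskFamily:
    """Hypothetical toy family using the Task Standard's static-method interface."""

    @staticmethod
    def get_tasks() -> Dict[str, Task]:
        # Maps task names to task dicts (roughly what tasks! / get_tasks! shows).
        return {"easy": {"answer": "4"}, "hard": {"answer": "42"}}

    @staticmethod
    def get_instructions(t: Task) -> str:
        # The prompt (roughly what prompt! prints).
        return "Reply with the correct number."

    @staticmethod
    def get_permissions(t: Task) -> List[str]:
        # No extra permissions requested for this toy task.
        return []

    @staticmethod
    def score(t: Task, submission: str) -> float:
        # 1.0 for an exact match (ignoring surrounding whitespace), else 0.0.
        return 1.0 if submission.strip() == t["answer"] else 0.0


tasks = TaskFamily.get_tasks()
task = tasks["easy"]
print(TaskFamily.get_instructions(task))  # Reply with the correct number.
print(TaskFamily.get_permissions(task))   # []
print(TaskFamily.score(task, " 4\n"))     # 1.0
```

All methods are static, so they are called on the class directly rather than on an instance.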
To distinguish task-dev specific things from what will be available in the run env:
- Task-dev env vars and shell funcs are prefixed with DEV
- All task-dev aliases are suffixed with !
- Where possible, all task-dev specific files are in /app
Other differences to note
- Some functionality is handled by Vivaria code rather than the task code, so it doesn't happen automatically in a task-dev env:
  - Task-dev envs do not populate the instructions.txt file with the task's prompt, but run envs do.
  - Env vars listed in required_environment_variables in the TaskFamily declaration are not enforced as required in the task-dev env, but they are in run envs.
  - Run envs are created with auxiliary VMs if a family has a get_aux_vm_spec method. This is not done in the task-dev env.
  - The steps defined in build_steps.json are not added to the Dockerfile, because this is done by Vivaria.
- viv is not installed by default in the run env, but it is in the task-dev env
- dotfiles in /root shouldn't be relied on to be present or the same in a run
- Any env vars prefixed with DEV will not be available in a run
- Any shell funcs suffixed with ! will not be available in a run
- Any files in /tasks will not be available in a run
- Probably others I'm not aware of (please open an issue if you find any)
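Since required_environment_variables is not enforced in the task-dev env, you can check it by hand. The sketch below uses a hypothetical `missing_required_env_vars` helper; it assumes the family declares `required_environment_variables` as a list of env var names on the TaskFamily class, and treats a family without that attribute as requiring none.

```python
import os
from typing import List, Mapping


def missing_required_env_vars(family: type, env: Mapping[str, str] = os.environ) -> List[str]:
    """Hypothetical helper: list the family's required env vars that are not set."""
    # Families that don't declare the attribute have no required env vars.
    required = getattr(family, "required_environment_variables", [])
    return [name for name in required if name not in env]
```

Inside the container this might be run as `missing_required_env_vars(TaskFamily)` after `from FAMILY import TaskFamily`; an empty list means every declared variable is set.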
To update viv-task-dev to the latest version, simply re-run install.sh.
Possible future improvements:
- (Maybe) Call docker commit commands from within the container
- (Unlikely) Some general way to "undo" TaskFamily methods for easier testing