Skip to content

Docker Restoration and Fixes#1431

Open
jeff-hykin wants to merge 67 commits intodevfrom
jeff/fix/docker3
Open

Docker Restoration and Fixes#1431
jeff-hykin wants to merge 67 commits intodevfrom
jeff/fix/docker3

Conversation

@jeff-hykin
Copy link
Member

@jeff-hykin jeff-hykin commented Mar 6, 2026

Problem(s)

  • This is a blocker for G1 rosnav stack parity with Go2, ref DIM-569
  • DockerModules stopped working on dev
  • configure_stream does not support different transport types (Blocker for DIM-569)
  • A failed DockerModule build wouldn't stop the container (leak)
  • A new container name was generated every run (huge build up of containers)
  • We have no tests for docker (how it got broken on dev in the first place)
  • No way to pull an image (rather than local dockerfile)
  • No way to change build args (ex: can't add network flag)
  • Image never rebuilds even if dockerfile edited or build args changes
  • DockerModules didn't deploy in parallel
  • There was a bug with error handling of WorkerManager

Closes DIM-662

Solution

  • Add tests
  • Minimal change to ModuleCoordinator to restore functionality: deploy docker modules in parallel
  • remove configure_stream (not needed) in favor of using set_transport work as intended to allow different stream types
  • stop/cleanup fixes

Breaking Changes

None

How to Test

python ./examples/docker_hello_world/hello_docker.py

Contributor License Agreement

  • I have read and approved the CLA.

spomichter and others added 30 commits January 23, 2026 07:34
… Unitree Go2 Navigation & Exploration Beta

Pre-Release v0.0.8: Unitree Go2 Navigation & Exploration Beta, Transport Updates, Documentation updates, Rerun fixes, Person follow, Readme updates

## What's Changed
* Small docs clarification about stream getters by @leshy in #1043
* Fix split view on wide monitors by @jeff-hykin in #1048
* Docs: Install & Develop  by @jeff-hykin in #1022
* Add uv to nix and fix resulting problems by @jeff-hykin in #1021
* v0.0.8 by @paul-nechifor in #1050
* Style changes in docs by @paul-nechifor in #1051
* Revert "Add uv to nix and fix resulting problems" by @leshy in #1053
* Transport benchmarks + Raw ros transport by @leshy in #1038
* feat: default to rerun-web and auto-open browser on startup (browser … by @Nabla7 in #1019
* bbox detections visual check by @leshy in #1017
* fix: only auto-open browser for rerun-web viewer backend by @Nabla7 in #1066
* move slow tests to integration by @paul-nechifor in #1063
* Streamline transport start/stop methods by @Kaweees in #1062
* Person follow skill with EdgeTAM by @paul-nechifor in #1042
* fix: increase costmap floor z_offset to avoid z-fighting by @Nabla7 in #1073
* Fixed issue #1074 by @alexlin2 in #1075
* ROS transports initial by @leshy in #1057
* Fix System Config Values for LCM on MacOS and Refactor by @jeff-hykin in #1065
* SHM Transport basic fixes by @leshy in #1041
* commented out Mem Transport test case by @leshy in #1077
* Docs/advanced streams update 2 by @leshy in #1078
* Fix more tests by @paul-nechifor in #1071
* feat: navigation docker updates from bona_local_dev by @baishibona in #1081
* Fix missing dependencies by @Kaweees in #1085
* Release readme fixes by @spomichter in #1076

## New Contributors
* @baishibona made their first contribution in #1081

**Full Changelog**: v0.0.7...v0.0.8
…HTTPS from SSH, get_data change, LFS changes

v0.0.9 Release Patch: Git clone change to HTTPS from SSH, get_data change, LFS changes
Release v0.0.10: Manipulation Stack, MuJoCo Simulation, DDS Transport, Web and Native Visualization via Rerun


## Highlights

88+ commits, 20 contributors, 700+ files changed.

The TLDR: **a complete manipulation stack**, **MuJoCo simulation**, **DDS transport**, and **a rewritten visualization pipeline**. Agents are no longer bolted on top — they're refactored as native modules with direct stream access. The entire ROS message dependency has been removed from core DimOS, and we've added VR, phone, and arm teleoperation stacks. You can now vibecode a pick-and-place task from natural language to motor commands. Installation has been significantly streamlined — no more direnv, simpler setup, and the web viewer is now the default.

---

## 🚀 New Features

### Simulation
- **MuJoCo simulation module** — Run any DimOS blueprint in simulation with no hardware. Supports xArm and Unitree embodiments, parses MJCF/URDF for robot properties, monotonic clock timing (no `time.sleep`). `dimos --simulation run unitree-go2` ([#1035](#1035)) by @jca0
- **Simulation teleop blueprints** — Added simulation teleop blueprints for Piper, xArm6, and xArm7. ([#1308](#1308)) by @mustafab0

### Manipulation
- **Modular manipulation stack** — Full planning stack with Drake: FK/IK solvers (Jacobian + Drake optimization), RRT path planning, world model with obstacle monitoring, multi-robot management. xArm6/7 and Piper support. ([#1079](#1079)) by @mustafab0
- **Joint servo and cartesian controllers** — Joint position/velocity controllers and cartesian IK task with Pinocchio solver. PoseStamped stream input for real-time control. ([#1116](#1116)) by @mustafab0
- **GraspGen integration** — Grasp generation via Docker-hosted GPU model. Lazy container startup, thread-safe init, RPC `generate_grasps()` returns ranked PoseArray. ([#1119](#1119), [#1234](#1234)) by @JalajShuklaSS
- **Gripper control** — Gripper RPC methods on control coordinator, exposed adapter property for custom implementations. ([#1213](#1213)) by @mustafab0
- **Detection3D and Object support** — Object input topics, TF support on manipulation module, pointcloud-to-convex-hull for Drake imports. ([#1236](#1236)) by @mustafab0
- **Agentic pick and place** — Reimplemented manipulation skills for agent-driven pick-and-place workflows. ([#1237](#1237)) by @mustafab0

### Teleoperation
- **Quest VR teleoperation** — Full WebXR + Deno bridge stack. Quest controller data (pose, trigger, grip) streamed to DimOS modules. Monitor-style locking for control loops. ([#1215](#1215)) by @ruthwikdasyam
- **Phone teleoperation** — Control Go2 from your phone with a web-based teleop interface. ([#1280](#1280)) by @ruthwikdasyam
- **Arm teleop with Pinocchio IK** — Single and dual arm teleoperation using Pinocchio inverse kinematics. Blueprints for xArm, Piper, and dual configurations. ([#1246](#1246)) by @ruthwikdasyam

### Transports & Infrastructure
- **DDS transport protocol** — CycloneDDS transport with configurable QoS (high-throughput and reliable profiles). Optional install, benchmark integration. ([#1174](#1174)) by @Kaweees
- **Pubsub pattern subscriptions** — Glob and regex pattern matching for topic subscriptions. `subscribe_all()` for bridge-style consumers. Topic type encoding in channel strings (`/topic#module.ClassName`). ([#1114](#1114)) by @leshy
- **LCM raw bytes passthrough** — Skip `lcm_encode()` when message is already bytes. ([#1223](#1223)) by @leshy
- **Unified TimeSeriesStore** — Pluggable backends (InMemory, SQLite, Pickle, PostgreSQL) with SortedKeyList for O(log n) operations. Replaces the old replay system and TimestampedCollection. Collection API with slice, range, and streaming methods. ([#1080](#1080)) by @leshy
- **DimosROS benchmark tests** — Benchmark suite for ROS transport performance. ([#1087](#1087)) by @leshy

### Navigation
- **FASTLIO2 support** — Hardware-verified localization with arm64 support. Docker deployment with FAR Planner, terrain analysis, and bagfile playback mode. Builds or-tools from source on arm64. ([#1149](#1149)) by @baishibona
- **Native Livox + FASTLIO2 module** — First-class DimOS native module for Livox Mid-360 lidar with FASTLIO2 localization. ([#1235](#1235)) by @leshy

### Visualization
- **RerunBridge module and CLI** — New bridge that subscribes to all LCM messages and logs those with `to_rerun()` to Rerun viewer. GlobalConfig singleton, web viewer support. Replaces the old rerun initialization system. ([#1154](#1154)) by @leshy
- **Webcam rerun visualization** — Camera module logs to Rerun with pinhole projection for 3D visualization. ([#1117](#1117)) by @ruthwikdasyam
- **Default viewer switched to rerun-web** — Browser-based viewer is now the default for broader compatibility. No native viewer install needed. ([#1324](#1324)) by @spomichter

### Agents
- **Agent refactor** — Restructured agent module with cleaner imports and global config integration. ([#1211](#1211)) by @paul-nechifor
- **Timestamp knowledge** — Agents now have timestamp awareness in prompts for temporal reasoning. ([#1093](#1093)) by @ClaireBookworm
- **Observe skill** — Go2 can now observe (capture and describe) its environment via agent skill. ([#1109](#1109)) by @paul-nechifor

### Platform & Hardware
- **G1 without ROS** — Unitree G1 blueprints decoupled from ROS dependency. Lazy imports for fast startup. ([#1221](#1221)) by @jeff-hykin
- **ARM (aarch64) support** — DimOS runs on ARM hardware. Platform-conditional dependencies, open3d source builds for arm64. ([#1229](#1229)) by @jeff-hykin
- **Universal joint/hardware schema** — `HardwareComponent` dataclass with `JointState`, `JointName` type aliases. Backend registry with auto-discovery for SDK adapters. ([#1040](#1040), [#1067](#1067)) by @mustafab0

---

## 🔧 Improvements

- **Optional Dask** — Start without Dask using `--no-dask` flag. Startup time reduced from ~60s to ~45s. ([#1111](#1111), [#1232](#1232)) by @paul-nechifor
- **RPC rework** — Renamed `ModuleBlueprint` → `_BlueprintAtom`, `ModuleBlueprintSet` → `Blueprint`, `ModuleConnection` → `Stream`. Added `ModuleRef`, improved type hints throughout. ([#1143](#1143)) by @jeff-hykin
- **Image class simplification** — Rewritten as pure NumPy dataclass. Removed CUDA backend, unused methods (solve_pnp, csrt_tracker), and image_impls/ directory. ([#1161](#1161)) by @leshy
- **Odometry message cleanup** — Simplified Odometry message type. ([#1256](#1256)) by @leshy
- **Remove all ROS message dependencies** — Purged ROS message types from core DimOS. Refactored rosnav to use ROSTransport. Removed dead ROS bridge code. ([#1230](#1230)) by @alexlin2
- **Removed bad function serialization** — Eliminated unnecessary serialization of Python functions. ([#1121](#1121)) by @paul-nechifor
- **Benchmark IEC units** — Switched bandwidth benchmarks from SI to IEC units for accuracy. ([#1147](#1147)) by @leshy
- **Pubsub typing improvements** — Thread-safety locks on `subscribe_new_topics` and `subscribe_all`. Proper type params across pubsub stack. ([#1153](#1153)) by @leshy
- **Autogenerated blueprint list** — Blueprints are now auto-discovered and listed. ([#1100](#1100)) by @paul-nechifor
- **Generic Buttons message** — Renamed `QuestButtons` to `Buttons` with generic field names for cross-platform teleop. ([#1261](#1261)) by @ruthwikdasyam
- **Dev container uses ros-dev image** — `./bin/dev` now runs the ROS-enabled dev image. ([#1170](#1170)) by @leshy
- **LSP support** — Added python-lsp-server and python-lsp-ruff to dev dependencies. ([#1169](#1169)) by @leshy
- **Lazy-load pyrealsense2** — RealSense camera module uses lazy imports to avoid errors in simulation environments without the SDK. ([#1309](#1309)) by @spomichter
- **Removed unused mmcv and mmengine** — Dead Detic dependencies removed, eliminating slow source builds from install. ([#1319](#1319)) by @spomichter
- **Simplified installation** — Removed direnv requirement, streamlined install instructions across all platforms. ([#1315](#1315)) by @spomichter
- **DDS extra excluded from --all-extras** — `cyclonedds` requires a source build, so `dds` is now excluded from `uv sync --all-extras` by default. ([#1318](#1318)) by @spomichter
- **Nix pre-commit skip** — Skip pre-commit install if hooks already exist. ([#1162](#1162)) by @leshy
- **Removed base-requirements** — Consolidated dependency management. ([#1098](#1098)) by @paul-nechifor
- **Removed old graspnet** — Cleaned up deprecated graspnet version. ([#1248](#1248)) by @paul-nechifor
- **Code cleanup** — Removed `tofix` markers ([#1216](#1216)), fixed ruff issues ([#1112](#1112)), removed old README_installation.md ([#1101](#1101)) by @paul-nechifor

---

## 🐛 Bug Fixes

- Fix LFS updating (move from .local to venv) ([#1090](#1090)) by @jeff-hykin
- Launch hotfixes: git clone HTTPS, get_data main branch ([#1091](#1091)) by @spomichter
- Fix camera demo not showing in Rerun ([#1148](#1148)) by @jeff-hykin
- Default to rerun native viewer ([#1099](#1099)) by @Nabla7
- Fix exploration blocking agent loop ([#1258](#1258)) by @paul-nechifor
- Fix person-follow blocking agent loop ([#1278](#1278)) by @paul-nechifor
- Skip metric3d tests on unsupported xformers GPUs (Blackwell compute capability >9.0) ([#1225](#1225)) by @leshy
- Fix manipulation tests ([#1218](#1218), [#1247](#1247)) by @jeff-hykin, @paul-nechifor
- Fix control coordinator e2e test ([#1212](#1212)) by @mustafab0
- Fix xarm7-sim broken e2e tests ([#1294](#1294)) by @paul-nechifor
- Pin langchain to restore supported providers ([#1241](#1241)) by @spomichter
- Fix missing library dependencies in Nix flake ([#1240](#1240)) by @Kaweees
- Fix discord invite link ([#1122](#1122)) by @spomichter
- macOS edgecase fix ([#1096](#1096)) by @jeff-hykin
- Fix second N in logo ([#1250](#1250)) by @jeff-hykin
- Fix Unitree Go2 minor issues ([#1307](#1307)) by @paul-nechifor
- Fix broken tests ([#1305](#1305)) by @ruthwikdasyam
- Fix `uv sync` for some macOS systems ([#1322](#1322)) by @jeff-hykin
- Fix mmcv install ([#1313](#1313)) by @paul-nechifor
- Fix mypy issues ([#1150](#1150), [#1167](#1167), [#1257](#1257)) by @leshy, @paul-nechifor, @jeff-hykin
- Fix Nix install uv pip extras ([#1321](#1321)) by @spomichter

---

## 📚 Documentation

- **Major docs overhaul** — New README with feature grid, hardware table, quickstart. Navigation, transports, data streams, and agent docs. ([#1295](#1295)) by @leshy
- **Day 1 docs** — Comprehensive getting started guide, development docs, contributing guide, architecture overview. Executable blueprint docs via md-babel-py. ([#1064](#1064)) by @jeff-hykin
- **Arm integration guide** — How-to for integrating new robotic arms with DimOS. ([#1238](#1238)) by @mustafab0
- **MCP documentation update** — Updated MCP install and usage instructions. ([#1251](#1251)) by @Kaweees
- **Docker docs** — First pass on Docker deployment documentation. ([#1151](#1151)) by @leshy
- **Transports documentation** — Encode/decode mixins, SHM examples, ROS/DDS transport docs. ([#1107](#1107)) by @leshy
- **Rerun API examples** — Updated examples for the new RerunBridge API. ([#1262](#1262)) by @jeff-hykin
- **PR template** added ([#1172](#1172)) by @christiefhyang
- **Simplified install instructions** — Removed direnv, streamlined across all platforms. ([#1315](#1315)) by @spomichter
- **Python example restored** — Added back the Python usage example. ([#1317](#1317)) by @jeff-hykin
- **Nix install updated** — Replaced uv with pip for Nix compatibility. ([#1326](#1326)) by @ruthwikdasyam
- **README improvements** ([#1311](#1311)) by @paul-nechifor
- **Simplified writing docs** — Consolidated writing_docs to a single markdown file. ([#1254](#1254)) by @jeff-hykin

---

## 🏗️ CI & Build

- **ci-complete gate** — Dynamic branch protection via single aggregated status check. MD-only PRs no longer blocked. ([#1279](#1279)) by @spomichter
- **Path-based test filtering** — Test jobs fully skip (no container spin-up) when no relevant code changed. ([#1284](#1284), [#1286](#1286)) by @spomichter
- **Navigation docker build workflow** — CI builds for the ROS navigation stack. ([#1259](#1259)) by @spomichter
- **CUDA test marker** — `@pytest.mark.cuda` for GPU-dependent tests. ([#1220](#1220)) by @jeff-hykin
- **e2e test marker** — Marked end-to-end tests for selective CI runs. ([#1110](#1110)) by @paul-nechifor
- **pytest stdin fix** — Added `-s` to default addopts for LCM autoconf compatibility. ([#1320](#1320)) by @spomichter

---

## ⚠️ Breaking Changes

- **RPC renames**: `ModuleBlueprint` → `_BlueprintAtom`, `ModuleBlueprintSet` → `Blueprint`, `ModuleConnection` → `Stream` ([#1143](#1143))
- **Image class rewrite**: `CudaImage` and `NumpyImage` removed. Image is now a pure NumPy dataclass. Methods like `solve_pnp`, `csrt_tracker`, `from_depth`, `to_depth_meters` removed. ([#1161](#1161))
- **ROS messages removed from core**: All `to_ros`/`from_ros` conversion methods removed. Use `ROSTransport` instead. ([#1230](#1230))
- **QuestButtons → Buttons**: Renamed with generic field names. ([#1261](#1261))
- **RerunBridge replaces old rerun init**: `dimos.dashboard.rerun_init` removed. Use `RerunBridgeModule` or the `rerun-bridge` CLI. ([#1154](#1154))
- **Unitree directory restructuring**: `unitree_go2` → `unitree/go2`, `unitree_g1` → `unitree/g1`. Blueprint names updated. ([#1221](#1221))
- **Default viewer is now rerun-web**: Use `--viewer-backend rerun` to restore native viewer. ([#1324](#1324))

---

## Quickstart

```bash
# Install
uv pip install dimos[base,unitree]

# Try it (no hardware needed)
# NOTE: First run downloads ~2.4 GB from LFS
dimos --replay run unitree-go2

# Simulate
uv pip install dimos[base,unitree,sim]
dimos --simulation run unitree-go2
```

---

## New Contributors 🎉

- @ruthwikdasyam — Quest VR teleoperation, phone teleop, arm teleop, webcam rerun viz
- @JalajShuklaSS — GraspGen integration
- @jca0 — MuJoCo simulation module
- @christiefhyang — PR template

---

**Full Changelog**: [v0.0.9...v0.0.10](v0.0.9...v0.0.10)
docs(go2): add getting started guide (#1339)
Comment on lines +230 to +258
reconnect = False
if _is_container_running(config, self._container_name):
if config.docker_reconnect_container:
logger.info(f"Reconnecting to running container: {self._container_name}")
reconnect = True
else:
logger.info(f"Stopping existing container: {self._container_name}")
_run(
[_docker_bin(config), "stop", self._container_name],
timeout=DOCKER_STOP_TIMEOUT,
)

if not reconnect:
_remove_container(config, self._container_name)
cmd = self._build_docker_run_command()
logger.info(f"Starting docker container: {self._container_name}")
r = _run(cmd, timeout=DOCKER_RUN_TIMEOUT)
if r.returncode != 0:
raise RuntimeError(
f"Failed to start container.\nSTDOUT:\n{r.stdout}\nSTDERR:\n{r.stderr}"
)
self.rpc.start()
self._running = True
# docker run -d returns before Module.__init__ finishes in the container,
# so we poll until the RPC server is reachable before returning.
self._wait_for_rpc()
except Exception:
with suppress(Exception):
self._cleanup()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reconnect mode destroys the pre-existing container on cleanup failure

When docker_reconnect_container=True and the existing container is reached (reconnect = True), the _remove_container / docker run steps are correctly skipped. However, if _wait_for_rpc() subsequently raises (e.g. RPC timeout), the except Exception block calls _cleanup(), which unconditionally runs both docker stop <name> and docker rm -f <name> — permanently destroying the container the caller wanted to preserve.

# _cleanup() always does:
_run([_docker_bin(self.config), "stop", self._container_name], ...)
_remove_container(self.config, self._container_name)

A user who sets docker_reconnect_container=True almost certainly wants the container to survive a failed reconnect attempt. Consider tracking whether __init__ created the container itself (vs reconnected to one that was already running) and only stopping/removing it in _cleanup() when DockerModule was responsible for starting it:

self._owns_container: bool = not reconnect   # set just before rpc.start()

# _cleanup():
if self._owns_container:
    with suppress(Exception):
        _run([_docker_bin(self.config), "stop", self._container_name], ...)
    with suppress(Exception):
        _remove_container(self.config, self._container_name)

Comment on lines +135 to +140
def start_all_modules(self) -> None:
modules = list(self._deployed_modules.values())
if isinstance(self._client, WorkerManager):
with ThreadPoolExecutor(max_workers=len(modules)) as executor:
list(executor.map(lambda m: m.start(), modules))
else:
for module in modules:
module.start()
if not modules:
raise ValueError("No modules deployed. Call deploy() before start_all_modules().")
with ThreadPoolExecutor(max_workers=len(modules)) as executor:
list(executor.map(lambda m: m.start(), modules))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start_all_modules silently discards all-but-first exception

executor.map returns an ordered iterator; list() re-raises the first exception it encounters and discards all later ones. When two or more module.start() calls fail concurrently (e.g. two Docker RPC calls time out), only the first failure is surfaced to the caller — the rest are silently swallowed, making diagnosis very hard.

DockerModule.start() does call self.stop() on its own failure, so containers are cleaned up individually, but the exception itself is lost.

Consider replacing with safe_thread_map (already available in this PR) and an on_errors handler that re-raises an ExceptionGroup containing all failures:

def _start_one(m):
    m.start()

def _on_start_errors(outcomes, successes, errors):
    raise ExceptionGroup("start_all_modules failed", errors)

safe_thread_map(modules, _start_one, _on_start_errors)

@jeff-hykin jeff-hykin marked this pull request as ready for review March 7, 2026 10:44
…ll.sh

- Include docker_build_ssh in build hash so toggling SSH triggers rebuild
- Capture stderr on build failure for actionable error messages
- Change _get_image_build_hash to take cfg instead of raw docker_bin str
- Remove private names from __all__ in docker_build.py
- Add helpful TypeError when DockerModule payload isn't JSON-serializable
- Replace ThreadPoolExecutor.map in start_all_modules with safe_thread_map
  to surface all failures via ExceptionGroup instead of losing all but first
- Restore scripts/install.sh and README.md (accidentally removed)
- Add intent comments on deploy_parallel and safe_thread_map design choices
Comment on lines +261 to +263
except Exception:
with suppress(Exception):
self._cleanup()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_cleanup() destroys pre-existing container when docker_reconnect_container=True

This was flagged in the previous thread and the request was to recheck. The issue is still present in the current code.

When docker_reconnect_container=True is set, a pre-existing container is intentionally preserved (reconnect = True). However, if _wait_for_rpc() subsequently times out or raises, the except Exception block calls _cleanup(), which unconditionally runs docker stop <name> and docker rm -f <name>, permanently destroying the container the caller wanted to preserve.

_cleanup() has no knowledge of whether it created the container or reconnected to one. A simple fix is to track this with an instance variable:

self._owns_container: bool = False  # set to True only when DockerModule created the container

Then in _cleanup(), gate the stop/rm on self._owns_container. This way a failed reconnect attempt leaves the container running exactly as the user expects.

Comment on lines 336 to 345
@@ -290,10 +345,10 @@ def __getattr__(self, name: str) -> Any:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docker pull runs silently — no user feedback for potentially 10-minute operation

_run() captures both stdout and stderr with capture_output=True, so the entire pull progress (layer download bars, etc.) is swallowed until either success or failure. With a default docker_pull_timeout of 600 seconds, users will see zero output for up to 10 minutes and may assume the process has hung.

Consider using subprocess.run without capture_output (or with stdout=None) so pull progress is streamed to the terminal, while still capturing stderr for the error message:

result = subprocess.run(
    [config.docker_bin, "pull", config.docker_image],
    text=True,
    stderr=subprocess.PIPE,
    timeout=config.docker_pull_timeout,
)
if result.returncode != 0:
    raise RuntimeError(
        f"Failed to pull image '{config.docker_image}'.\nSTDERR:\n{result.stderr}"
    )

This matches the pattern already used in build_image() for streaming build output.

jeff-hykin and others added 19 commits March 7, 2026 14:31
Release v0.0.11

82 PRs, 10 contributors, 396 files changed.

This release brings a production CLI, MCP tooling, temporal memory, and first-class support for coding agents. Dask has been removed. The entire stack now runs from `dimos run` through `dimos stop`.

### Agent-Native Development

DimOS is now built to be driven by coding agents. Point OpenClaw, Claude Code, or Cursor at [AGENTS.md](AGENTS.md) and they can build, run, and debug Dimensional applications using the CLI and MCP interfaces directly.

- **AGENTS.md** — comprehensive onboarding doc: architecture, CLI reference, skill rules, blueprint quick-reference. Your agent reads this and starts coding.
- **MCP server** — all `@skill` methods exposed as HTTP tools. External agents call `dimos mcp call relative_move --arg forward=0.5` or connect via JSON-RPC.
- **MCP CLI** — `dimos mcp list-tools`, `dimos mcp call`, `dimos mcp status`, `dimos mcp modules`
- **Agent context logging** — MCP tool calls and agent messages logged to per-run JSONL for debugging and replay.

### CLI & Daemon

Full process lifecycle — no more Ctrl-C in tmux.

- `dimos run --daemon` — background execution with health checks and run registry
- `dimos stop [--force]` — graceful shutdown with SIGTERM → SIGKILL fallback
- `dimos restart` — replays the original CLI arguments
- `dimos status` — PID, blueprint, uptime, MCP port
- `dimos log -f` — structured per-run logs with follow, JSON output, filtering
- `dimos show-config` — resolved GlobalConfig with source tracing

### Temporal-Spatial Memory

Robots in physical space ingest hours of video and lidar. Temporal-spatial memory gives them a human-like understanding of the world — causal object relationships, entity tracking through time and physical space, and the ability to answer complex temporal queries:

*Who spends the most time in the kitchen? What time on average do I wake up? Which set of switches toggles the main lights? Who was at the office at 9am last Thursday?*

Traditional frame-level embeddings (CLIP, ViT) lose temporal context and don't scale beyond a handful of frames. Video transformers are expensive and don't operate in RGB-D. Dimensional agents work with video + lidar natively, tracking entities across hours and days.

```bash
dimos --replay --replay-dir unitree_go2_office_walk2 run unitree-go2-temporal-memory
```

### Interactive Viewer

Custom Rerun fork (`dimos-viewer`) is now the default. Click-to-navigate: click a point in the 3D view → PointStamped → A* planner → robot moves.

- Camera | 3D split layout on Go2, G1, and drone blueprints
- Native keyboard teleop in the viewer
- `--viewer rerun|rerun-web|rerun-connect|foxglove|none`

### Drone Support

Drone blueprints modernized to match Go2 composition pattern. `drone-basic` and `drone-agentic` work with replay, Rerun, and the full CLI.

```bash
dimos --replay run drone-basic
dimos --replay run drone-agentic
```

### More

- **Go2 fleet control** — multi-robot with `--robot-ips` (#1487)
- **Replay `--replay-dir`** — select dataset, loops by default (#1519, #1494)
- **Interactive install** — `curl -fsSL .../install.sh | bash` (#1395)
- **Nix on non-Debian Linux** (#1472)
- **Remove Dask** — native worker pool (#1365)
- **Remove asyncio dependency** (#1367)
- **Perceive loop** — continuous observation module for agents (#1411)
- **Worker resource monitor** — `dtop` TUI (#1378)
- **G1 agent wiring fix** (#1518)
- **Rerun rate limiting** — prevents viewer OOM on continuous streams (#1509, #1521)
- **RotatingFileHandler** — prevents unbounded log growth (#1492)
- **Test coverage** (#1397), draft PR CI skip (#1398), manipulation test fixes (#1522)

### Breaking Changes

- `--viewer-backend` renamed to `--viewer`
- Dask removed — blueprints using Dask workers need migration to native worker pool
- Default viewer changed from `rerun-web` to `rerun` (native dimos-viewer)

### Contributors

@spomichter, @PaulNechifor, @ruthwikdasyam, @summeryang, @MustafaBhadsorawala, @leshy, @sambull, @JeffHykin, @RadientBrain

## Contributor License Agreement

- [x] I have read and approved the [CLA](https://github.com/dimensionalOS/dimos/blob/main/CLA.md).
chore: bump version to 0.0.11 (#1529)
docs: README additions: temporal-memory fix, formatting (#1535)
Merged origin/dev changes while preserving Docker support:
- module_coordinator.py: Use dev's deploy signature with global_config
  while keeping Docker module routing
- worker_manager.py: Keep error handling cleanup in deploy_parallel,
  add Iterable import from dev

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- docker_worker_manager: accept ModuleSpec format, pass global_config
- module_coordinator: add type: ignore for ModuleBase→Module cast
- worker_manager: convert Iterable to list for len() check
- test_docker_deployment: fix Path import, update test assertions for
  new global_config signature

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merged origin/dev, resolving import conflict in docker_runner.py:
- Updated LCMRPC import path to dimos.protocol.rpc.pubsubrpc (from dev)
- Kept ModuleProxyProtocol import and lazy docker_build imports (PR intent)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-existing mypy errors: onnxruntime is excluded from install
(--no-extra cuda) so import-not-found needs to be ignored alongside
import-untyped.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove comment section markers (dashed lines) that violate the
  no-section-markers test policy
- Remove .venv symlink from git tracking (already in .gitignore)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants