-
Notifications
You must be signed in to change notification settings - Fork 22
Description
Review Plan
This is the plan for reviewing and merging PR #2326 (port node launcher from Python to Rust).
The key principle: this is a code port/refactor, not new logic. Splitting into smaller PRs would create overhead and lose focus on that fact.
Steps
- Make sure basic flows are working (partially done)
- Update scripting and testing infra (partially done)
- Do a relatively quick review to locate main logical changes and areas of interest — done via AI-assisted analysis (see PR Analysis below)
- Address any critical issues and merge the code as-is
- Priority: verify EmitEvent payload format parity (Finding 3 in analysis)
- Create follow-up issues to address relevant pieces (listed below)
- Address follow-up issues
- After all is done, remove old Python launcher
PR Analysis
Summary
PR #2326 replaces the Python-based TEE launcher (tee_launcher/) with a Rust implementation (crates/tee-launcher/). The launcher is responsible for securely pulling, validating, and starting the MPC node Docker container — optionally inside a TEE (Intel TDX via dstack).
62 files changed (+2514 / -2134)
Key Architectural Changes
- Config format:
.envkey=value pairs -> TOML with[launcher_config]and[mpc_node_config]sections - MPC config delivery:
docker run --env MPC_*flags -> TOML file at/mnt/shared/mpc-config.tomlviastart-with-config-file - Container launch:
docker runwith CLI flags ->docker compose up -dwith rendered templates - dstack interaction:
curlto unix socket (GetQuote + EmitEvent) ->dstack-sdkRust crate (emit_event()) - TEE config to node:
MPC_IMAGE_HASH+MPC_LATEST_ALLOWED_HASH_FILEenv vars -> Injected[tee]TOML section with authority type, quote_upload_url, image_hash, file path - Python launcher deleted: All files under
tee_launcher/removed - Deployment configs migrated:
.conffiles replaced with.toml, files moved todeployment/cvm-deployment/ - CI updated: New jobs for launcher non-TEE check and reproducible Docker image build/verification
Functionality Comparison: Equivalent (no gaps)
| Area | Details |
|---|---|
| Image hash selection logic | Both: override > approved file (newest first) > default |
| Multi-platform manifest resolution | Both use deque/VecDeque, filter amd64/linux, queue child digests |
| Image validation flow | Both: registry token -> manifest fetch -> docker pull by digest -> docker inspect -> compare config digest |
| Retry/backoff | Both: exponential, factor 1.5, cap 60s, configurable max attempts + timeout + interval |
| Container security | Both: no-new-privileges:true, /tapp:ro, same named volumes |
| Approved hashes JSON format | Both use {"approved_hashes": [...]} key, NonEmpty constraint, newest-first ordering |
| RTMR3 extension | Both emit event "mpc-image-digest" with image hash. Skipped in non-TEE |
| Container removal | Both: docker rm -f mpc-node before launch, errors ignored |
DOCKER_CONTENT_TRUST |
Both require =1 |
Intentional Differences (by design)
| Area | Python | Rust | Risk |
|---|---|---|---|
| Config format | .env key=value |
TOML with sections | None -- better structure |
| MPC config delivery | --env MPC_* flags on docker run |
TOML file at /mnt/shared/mpc-config.toml |
None -- eliminates env var attack surface |
| Container launch | docker run with CLI flags |
docker compose up -d with template |
None -- equivalent |
| dstack client | curl to unix socket (GetQuote + EmitEvent) |
dstack-sdk Rust crate (emit_event only) |
See findings |
| TEE config to node | MPC_IMAGE_HASH + MPC_LATEST_ALLOWED_HASH_FILE as env vars |
Injects [tee] section into TOML |
Improvement -- richer metadata |
Findings: Potential Gaps and Behavioral Differences
Finding 1: GetQuote call removed
- Python calls
GetQuotebeforeEmitEvent; Rust only callsemit_event()via dstack SDK - Risk: Low. GetQuote was likely a sanity check.
emit_event()will fail if socket is unreachable.
Finding 2: dstack socket existence check removed
- Python explicitly checks
is_unix_socket()before TEE operations; Rust has no pre-check - Risk: Low -- error still surfaces, just with less descriptive message.
Finding 3: EmitEvent payload format difference
- Python sends bare hex string (without
sha256:prefix); Rust sendsimage_hash.as_ref().to_vec()as bytes - Risk: Medium. RTMR3 measurements may differ between launchers for the same image. Could break attestation verification.
Finding 4: Env var passthrough completely removed
- Python passes
RUST_LOG,RUST_BACKTRACE,NEAR_BOOT_NODES, andMPC_*as--envflags - Rust passes zero env vars (except
DSTACK_ENDPOINTin TEE via compose template) - Risk: High if not handled. Must confirm node reads these from TOML config or they're set another way.
Finding 5: MPC_IMAGE_HASH and MPC_LATEST_ALLOWED_HASH_FILE env vars removed
- Python sets these as container env vars; Rust injects via
[tee]TOML section - Risk: Low if node reads from TOML. Verify no code still reads from env.
Finding 6: No env var sanitization -- different threat model
- Python had 14+ validation steps (control chars, LD_PRELOAD, length limits, denied keys)
- Rust only validates reserved
[tee]key. Arbitrary TOML passed through. - Risk: Low. TOML file on volume has no shell interpretation. MPC node never accepted
p2p_private_keyoraccount_skfrom config anyway (keys always generated internally).
Finding 7: Auth token URL hardcoded to Docker Hub
- Both Python and Rust have same limitation. No gap.
Test Coverage Comparison
| Category | Python | Rust | Gap? |
|---|---|---|---|
| Config parsing | 5 | 11 | Rust has more |
| Port/host validation | 3 | 4 | Rust has more |
| Env var security | 14 | 0 | N/A -- no env passthrough in Rust |
| LD_PRELOAD prevention | 5 | 1 | N/A -- no shell passthrough |
| Image hash selection | 5 | 6 | Rust has more |
| Platform parsing | 3 | 0 | N/A -- clap validates |
| RTMR3 / dstack | 4 | 0 | Gap |
| Docker command/compose | 2 | 8 | Rust has more |
| Config interception | 0 | 9 | Rust-only, well tested |
| Manifest parsing | 0 | 5 | Rust has more |
| Registry resolution | 1 | 6 | Rust has more |
| Full E2E flow | 2 | 1 | Gap |
| Total | ~61 | 46 |
Meaningful test gaps in Rust:
- No dstack/RTMR3 failure tests (Medium)
- No full E2E flow test (Medium)
- No retry/backoff behavior test (Low)
Follow-Up Issues to Create
After merging PR #2326, the following issues should be created and addressed:
Attestation
- Verify EmitEvent payload parity between Python and Rust launchers -- Confirm
image_hash.as_ref().to_vec()produces same bytes as Python'sget_bare_digest(). If different, RTMR3 measurements won't match, breaking attestation continuity. (Finding 3) - Add dstack/RTMR3 failure path tests -- Test that
emit_event()errors propagate asDstackEmitEventFailed. Cover: socket missing, emit failure, NONTEE skip. (Test gap)
Testing
- Add E2E flow test for Rust launcher -- Mock filesystem + registry + docker commands and run through
run()to verify complete orchestration. Python hadtest_main_nontee_builds_expected_mpc_docker_cmdequivalent. (Test gap) - Add retry/backoff behavior tests -- Verify exponential backoff sequence and max attempt enforcement in registry manifest resolution. (Test gap, low priority)
Config & Env Vars
- Confirm
RUST_LOG/RUST_BACKTRACEpassthrough works -- These were previously passed as container env vars by Python launcher. Verify MPC node reads them from TOML config file or they're set another way. (Finding 4) - Verify no MPC node code reads
MPC_IMAGE_HASHfrom env -- Grep forstd::env::var("MPC_IMAGE_HASH")andMPC_LATEST_ALLOWED_HASH_FILEenv reads. These are now in[tee]config section. (Finding 5)
Code Structure & Quality
- Refactor
main.rsinto smaller logical modules -- The file is ~1,279 lines mixing orchestration, registry interaction, validation, compose rendering, and config injection. Recommended split:registry.rs--RegistryInfotrait,DockerRegistry,get_manifest_digest()(~450 lines)validation.rs--validate_image_hash(), docker pull/inspect logic (~150 lines)compose.rs--render_compose_file(),launch_mpc_container()(~150 lines)selection.rs--select_image_hash()(~100 lines)config.rs--intercept_node_config(),insert_reserved()(~100 lines)main.rs-- entry pointsmain()andrun()only (~100-150 lines)
- Replace
expect()/unwrap()calls in production code with proper error handling -- 6 instances in non-test code that could panic:main.rs:307-- Bearer token header parse (panics on malformed registry token)main.rs:475-- Docker inspect output digest parse (panics on unexpected output)main.rs:173-- TeeConfig TOML serializationmain.rs:144-- mpc_node_config TOML serializationmain.rs:504-- Port list JSON serializationmain.rs:138-- IMAGE_DIGEST_FILE path parse
- Fix duplicate error messages in
LauncherError--DockerRunFailedandDockerRunFailedExitStatus(error.rs lines 18 & 24) have identical#[error()]messages, making it impossible to distinguish them in logs. - Use
DockerSha256Digesttype forManifestEntry::digest-- CurrentlyString(docker_types.rs line 33), inconsistent withManifestConfig::digestwhich correctly uses the typed wrapper. Could allow malformed digests to propagate. - Add validation for
LauncherConfigRPC parameters --rpc_request_timeout_secs,rpc_request_interval_secs,rpc_max_attempts(types.rs lines 62-66) have no bounds checking. Zero values cause silent failures. ConsiderNonZeroU64/NonZeroU32or avalidate()method. - Add manifest traversal depth limit --
get_manifest_digest()(main.rs line 302) processes tags from a deque that multi-platform manifests add to. Theoretically unbounded. Add aMAX_MANIFEST_DEPTHconstant.
Security
- Consider adding dstack socket pre-check in TEE mode -- Python explicitly verified socket existence before TEE operations for a clear error message. Rust could add a similar check for better UX. (Finding 2)
Cleanup
- Remove old Python launcher -- Delete
tee_launcher/directory, remove Python test dependencies, clean upflake.nixPYTHONPATH reference. (Step 7 of plan) - Address TODO(tee-launcher: remove hard-coded Docker Hub registry URL from auth flow #2479): registry auth endpoint -- Currently hardcoded to Docker Hub (
auth.docker.io). If non-Docker-Hub registries are needed, theRegistryInfotrait needs a configurable auth endpoint. (main.rs line 252) - Remove or document unused
HostEntrystruct -- (types.rs line 74) is defined and tested but doesn't appear in any function signatures or runtime code paths.
Created Follow-Up Issues
Attestation
- [rust-launcher] Verify EmitEvent payload parity between Python and Rust launchers #2603 Verify EmitEvent payload parity between Python and Rust launchers
- [rust-launcher] Add dstack/RTMR3 failure path tests #2604 Add dstack/RTMR3 failure path tests
Testing
- [rust-launcher] Add E2E flow test for launcher orchestration #2605 Add E2E flow test for launcher orchestration
- [rust-launcher] Add retry/backoff behavior tests for registry manifest resolution #2606 Add retry/backoff behavior tests for registry manifest resolution
Config & Env Vars
- [rust-launcher] Confirm RUST_LOG / RUST_BACKTRACE passthrough to MPC node #2607 Confirm RUST_LOG / RUST_BACKTRACE passthrough to MPC node
- [rust-launcher] Verify no MPC node code reads MPC_IMAGE_HASH from env #2608 Verify no MPC node code reads MPC_IMAGE_HASH from env
Code Structure & Quality
- [rust-launcher] Refactor main.rs into smaller logical modules #2609 Refactor main.rs into smaller logical modules
- [rust-launcher] Replace expect()/unwrap() calls in production code with proper error handling #2602 Replace expect()/unwrap() calls in production code with proper error handling
- [rust-launcher] Fix duplicate error messages in LauncherError #2610 Fix duplicate error messages in LauncherError
- [rust-launcher] Use DockerSha256Digest type for ManifestEntry::digest #2611 Use DockerSha256Digest type for ManifestEntry::digest
- [rust-launcher] Add validation for LauncherConfig RPC parameters #2612 Add validation for LauncherConfig RPC parameters
- [rust-launcher] Add manifest traversal depth limit #2613 Add manifest traversal depth limit
Security
- [rust-launcher] Add dstack socket pre-check in TEE mode #2614 Add dstack socket pre-check in TEE mode
Cleanup
- [rust-launcher] Remove old Python launcher #2615 Remove old Python launcher
- [rust-launcher] Address TODO(#2479): configurable registry auth endpoint #2616 Address TODO(tee-launcher: remove hard-coded Docker Hub registry URL from auth flow #2479): configurable registry auth endpoint
- [rust-launcher] Remove or document unused HostEntry struct #2617 Remove or document unused HostEntry struct