
Conversation

@nishu-builder
Contributor

@nishu-builder nishu-builder commented Jan 16, 2026

Cross-account isolation for the job runner: jobs run in the eval account with no Observatory access, using S3 presigned URLs for all data exchange.

  • Presigned URL mode in the episode runner
  • Dispatcher generates presigned URLs when EVAL_S3_BUCKET is configured (a minimal sketch follows this list)
  • Watcher reads results from S3
  • LocalStack + an eval-jobs namespace for local testing
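
For reviewers new to the flow, here is a minimal sketch of the dispatcher side, assuming a boto3 client and an illustrative jobs/<job_id>/ key layout; apart from EVAL_S3_BUCKET, the names below are hypothetical and not this PR's code.

# Illustrative sketch only: key layout and helper name are assumptions.
import os

import boto3

def make_job_presigned_urls(job_id: str, expires_in: int = 3600) -> dict[str, str]:
    """Presign GET/PUT URLs so the eval-account runner needs no Observatory or AWS credentials."""
    bucket = os.environ["EVAL_S3_BUCKET"]
    s3 = boto3.client("s3")
    return {
        # The runner downloads its job spec via a GET URL...
        "spec_url": s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": f"jobs/{job_id}/spec.json"},
            ExpiresIn=expires_in,
        ),
        # ...and uploads results.json via a PUT URL, which the watcher later reads back.
        "results_url": s3.generate_presigned_url(
            "put_object",
            Params={"Bucket": bucket, "Key": f"jobs/{job_id}/results.json"},
            ExpiresIn=expires_in,
        ),
    }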

Nishad and others added 13 commits January 16, 2026 00:10
- Add LocalStack process to process-compose.yaml
- Add eval-jobs namespace alongside jobs namespace
- Make JOB_NAMESPACE configurable via config
- Add --s3 flag to server and watcher commands (see the wiring sketch below)
- Update help text with S3 mode instructions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
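
A rough sketch of how the --s3 flag and configurable namespace from the commit above might be wired; only the --s3 flag and the JOB_NAMESPACE name come from this PR, everything else is hypothetical.

# Hypothetical wiring sketch; only --s3 and JOB_NAMESPACE are taken from the PR.
import argparse
import os

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="server / watcher entry point")
    # Opt into presigned-URL mode (LocalStack locally, real S3 when deployed).
    parser.add_argument(
        "--s3",
        action="store_true",
        help="exchange job specs and results via S3 presigned URLs",
    )
    return parser.parse_args()

def job_namespace() -> str:
    # Eval jobs get their own namespace; the default value here is illustrative.
    return os.environ.get("JOB_NAMESPACE", "eval-jobs")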
Server uses host.docker.internal:4566 so presigned URLs work from pods.
Watcher uses localhost:4566 since it runs on the host. (See the endpoint sketch below.)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
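
In code, the endpoint split described in the commit above would look roughly like this; the LocalStack test credentials and region are placeholder assumptions.

# Sketch of the LocalStack endpoint split; credentials/region are dummy test values.
import boto3

# Server/dispatcher signs URLs against host.docker.internal so they resolve from inside pods.
server_s3 = boto3.client(
    "s3",
    endpoint_url="http://host.docker.internal:4566",
    aws_access_key_id="test",
    aws_secret_access_key="test",
    region_name="us-east-1",
)

# Watcher runs directly on the host, so it talks to LocalStack via localhost.
watcher_s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:4566",
    aws_access_key_id="test",
    aws_secret_access_key="test",
    region_name="us-east-1",
)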
@nishu-builder nishu-builder changed the title Update job runner spec with cross-account architecture design Job runner cross-account isolation Jan 16, 2026
@nishu-builder nishu-builder marked this pull request as ready for review January 16, 2026 23:36


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3679fe6746


Comment on lines +51 to +55
policy_s3_paths: list[str] = []

urls = generate_job_presigned_urls(
    job_id=job.id,
    policy_s3_paths=policy_s3_paths,


P1: Populate policy URIs before generating presigned URLs

When EVAL_S3_BUCKET is enabled, policy_s3_paths is hard-coded to an empty list and immediately passed into generate_job_presigned_urls, which then overwrites the job spec’s policy_uris with an empty list. In S3 mode this means every job spec sent to the runner has no policies, so PureSingleEpisodeJob validation fails (assignments are out of range) and the job cannot run. This needs to derive policy S3 paths from the existing job spec (or preserve its policy_uris) before generating the presigned URLs.
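
One possible shape of the fix, sketched under the assumption that the spec is reachable as job.spec and exposes the policy_uris mentioned above; everything other than policy_uris and generate_job_presigned_urls is illustrative.

# Sketch only: the job.spec accessor and s3:// filtering are assumptions, not the PR's code.
def policy_s3_paths_for(job) -> list[str]:
    """Derive policy S3 paths from the existing job spec instead of an empty list,
    so generate_job_presigned_urls does not overwrite policy_uris with []."""
    return [uri for uri in job.spec.policy_uris if uri.startswith("s3://")]

# The dispatcher call site would then become something like:
# policy_s3_paths = policy_s3_paths_for(job)
# urls = generate_job_presigned_urls(job_id=job.id, policy_s3_paths=policy_s3_paths, ...)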


Comment on lines 224 to +227
if phase == "Succeeded":
    results = read_results_from_s3(job_id)
    if results:
        _update_job_with_results(stats_client, job_id, results)


P1: Persist episode_id when consuming S3 results

In S3 mode, the watcher reads results.json from S3 and writes that dict directly into JobRequest.result, but the results payload is a PureSingleEpisodeResult (rewards/stats/steps) and does not include an episode_id. Downstream components (e.g., tournament scoring via JobRequest.episode_id) rely on result["episode_id"], so completed S3-mode jobs will never produce episodes or scores. The watcher should call write_single_episode_to_observatory (or otherwise set episode_id) when it consumes S3 results.
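
A sketch of the suggested change in the watcher, assuming write_single_episode_to_observatory can accept the S3 results and return the new episode's id (its actual signature is not shown in this PR):

# Sketch only: the signature and return value of write_single_episode_to_observatory are assumed.
if phase == "Succeeded":
    results = read_results_from_s3(job_id)
    if results:
        # Persist the episode first so downstream consumers can rely on result["episode_id"].
        episode_id = write_single_episode_to_observatory(stats_client, job_id, results)
        results["episode_id"] = episode_id
        _update_job_with_results(stats_client, job_id, results)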


@nishu-builder
Contributor Author

Replaced with stacked PRs #4944-#4949
