Job runner cross-account isolation #4940
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add LocalStack process to process-compose.yaml
- Add eval-jobs namespace alongside jobs namespace
- Make JOB_NAMESPACE configurable via config
- Add --s3 flag to server and watcher commands
- Update help text with S3 mode instructions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Server uses host.docker.internal:4566 so presigned URLs work from pods. Watcher uses localhost:4566 since it runs on the host.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
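A minimal sketch of the endpoint split described in the commit message above. The host:port values come from that message; the `http://` scheme, the role names, and the `localstack_endpoint` helper are assumptions made for illustration and are not taken from the PR.

```python
# Host:port values come from the commit message; the http:// scheme and the
# role names are assumptions made for this sketch.
LOCALSTACK_ENDPOINTS = {
    # The server presigns URLs that pods must resolve, so it uses the Docker host alias.
    "server": "http://host.docker.internal:4566",
    # The watcher runs on the host itself, so localhost is reachable.
    "watcher": "http://localhost:4566",
}


def localstack_endpoint(role: str) -> str:
    """Return the LocalStack endpoint for a given process role (hypothetical helper)."""
    try:
        return LOCALSTACK_ENDPOINTS[role]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}") from None
```

The point of the split is that a presigned URL embeds the endpoint host, so the URL has to be generated against a hostname the consumer (a pod) can actually reach.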
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3679fe6746
policy_s3_paths: list[str] = []

urls = generate_job_presigned_urls(
    job_id=job.id,
    policy_s3_paths=policy_s3_paths,
Populate policy URIs before generating presigned URLs
When EVAL_S3_BUCKET is enabled, policy_s3_paths is hard-coded to an empty list and immediately passed into generate_job_presigned_urls, which then overwrites the job spec’s policy_uris with an empty list. In S3 mode this means every job spec sent to the runner has no policies, so PureSingleEpisodeJob validation fails (assignments are out of range) and the job cannot run. This needs to derive policy S3 paths from the existing job spec (or preserve its policy_uris) before generating the presigned URLs.
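One way the suggested fix could look, as a sketch only. It assumes the job spec is a plain dict with a `policy_uris` list and that presigning is done by a caller-supplied `presign` function; both are hypothetical stand-ins, and the real signature of `generate_job_presigned_urls` is not shown in this PR excerpt.

```python
from typing import Callable
from urllib.parse import urlparse


def presign_policy_uris(job_spec: dict, presign: Callable[[str], str]) -> dict:
    """Return a copy of the spec with s3:// policy URIs presigned.

    Hypothetical helper: the list keeps its length and order, so any
    index-based assignments into policy_uris remain in range.
    """
    uris = job_spec.get("policy_uris", [])
    signed = [
        presign(uri) if urlparse(uri).scheme == "s3" else uri  # leave non-S3 URIs untouched
        for uri in uris
    ]
    # Copy rather than mutate, so the stored job spec is not overwritten.
    return {**job_spec, "policy_uris": signed}
```

Deriving the list from the existing spec, rather than starting from an empty list, is what keeps the runner-side validation from seeing a spec with no policies.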
if phase == "Succeeded":
    results = read_results_from_s3(job_id)
    if results:
        _update_job_with_results(stats_client, job_id, results)
Persist episode_id when consuming S3 results
In S3 mode, the watcher reads results.json from S3 and writes that dict directly into JobRequest.result, but the results payload is a PureSingleEpisodeResult (rewards/stats/steps) and does not include an episode_id. Downstream components (e.g., tournament scoring via JobRequest.episode_id) rely on result["episode_id"], so completed S3-mode jobs will never produce episodes or scores. The watcher should call write_single_episode_to_observatory (or otherwise set episode_id) when it consumes S3 results.
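A rough sketch of how the watcher might attach an `episode_id` before storing the result. The `write_episode` callable is a hypothetical stand-in for `write_single_episode_to_observatory`, whose actual signature and return value are not shown here; the sketch assumes it returns the new episode's ID.

```python
from typing import Callable, Optional


def consume_s3_results(
    results: Optional[dict],
    write_episode: Callable[[dict], str],
) -> Optional[dict]:
    """Return results enriched with an episode_id, or None if there is nothing to store.

    Hypothetical helper: write_episode is assumed to persist the episode
    and return its ID.
    """
    if not results:
        return None
    enriched = dict(results)  # copy so the payload read from S3 is not mutated
    if "episode_id" not in enriched:
        enriched["episode_id"] = write_episode(results)
    return enriched
```

The enriched dict, rather than the raw `results.json` payload, is what would then be passed on to the job update so that downstream consumers of `result["episode_id"]` find the key.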

Cross-account isolation for the job runner: jobs run in the eval account with no Observatory access and use S3 presigned URLs for all data exchange.