feat: new module function design#81

Merged
seankhl merged 12 commits into main from feat/new-design on Mar 24, 2026

seankhl (Collaborator) commented Mar 23, 2026

No description provided.

In the rush-py tutorial, the `size` field is currently informational only (often 0) but is intended to carry the output size in bytes in future revisions.

- In rush-py, `exess.fetch_outputs` converts the main EXESS JSON output into Python dataclasses in memory. `exess.save_outputs` downloads the raw output objects to the local workspace and returns an `ExessSavedResult` with `calc` and optional `exports` path fields. For export-heavy runs, the second output is stored as a compressed archive in the object store; `save_outputs` decompresses it and extracts the HDF5 file automatically.
+ In rush-py, `run.fetch()` converts the main EXESS JSON output into Python dataclasses in memory. `run.save()` downloads the raw output objects to the local workspace and returns an `exess.ResultPaths` object with `calc` and optional `exports` path fields. For export-heavy runs, the second output is stored as a compressed archive in the object store; `save()` decompresses it and extracts the HDF5 file automatically.

Contributor

note to self: the thing about compressed archives isn't clear to me or gemini, might need more info (or if all auto taken care of for the user, might be able to remove that part entirely)

Collaborator Author

Definitely a docs TODO to figure out what to explain and how for this and similar outputs that carry over some implementation details.

- When `convert_hdf5_to_json=True` is set on the EXESS run, `save_outputs` saves the exported data as JSON instead of HDF5. Example JSON structure:
+ When `convert_hdf5_to_json=True` is set on the EXESS run, `run.save()` saves the exported data as JSON instead of HDF5. Example JSON structure:

Contributor

aren't the results already in json in calc? still not clear to me what goes to calc and what goes to exports

Collaborator Author

`calc` -> outputs we always get from energy calcs
`exports` -> outputs requested via `ExportKeywords`; empty if no exports were requested

They could be merged since it's a little bit of an implementation detail, but I think we should just update the docs to be clear about this (and in the exports tutorial).
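
A minimal sketch of the calc/exports split described in this reply. `ResultPaths` and `describe` below are hypothetical stand-ins written for illustration, not the rush-py classes; only the `calc` and optional `exports` field names come from this thread, and the file names are made up.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ResultPaths:
    """Hypothetical stand-in for the object returned by run.save()."""

    calc: Path                   # always present: main energy-calculation output
    exports: Path | None = None  # only set when exports were requested


def describe(paths: ResultPaths) -> str:
    """Summarise which files a saved run produced."""
    if paths.exports is None:
        return f"calc={paths.calc.name}, no exports requested"
    return f"calc={paths.calc.name}, exports={paths.exports.name}"


# Illustrative paths only; real names depend on the workspace and run.
energy_only = ResultPaths(calc=Path("workspace/calc.json"))
with_exports = ResultPaths(
    calc=Path("workspace/calc.json"),
    exports=Path("workspace/exports.h5"),
)
print(describe(energy_only))
print(describe(with_exports))
```

Modelling `exports` as optional keeps the "no exports requested" case explicit for callers, which is one way the docs could make the distinction clear.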

### Basic usage

- The rush-py EXESS wrapper accepts the same topology input format (JSON). Most users should prefer the module-specific wrappers and only use `exess.exess(...)` when they need to set the EXESS driver explicitly:
+ The rush-py EXESS wrapper accepts the same topology input format (JSON). Most users should prefer the module-specific wrappers and only use `exess.calculate(...)` when they need to set the EXESS driver explicitly:

Contributor

note to self: same topology input format as what?

Contributor

note to self: check if we explain what EXESS drivers are

### Outputs and object store paths

- Rush returns outputs as object store references (UUID paths plus format info). Use the EXESS output helpers to download the results:
+ Rush returns outputs as object store references (UUID paths plus format info). Use the run or result-reference helpers to download the results:

Contributor

this feels like we're doubling up on info

Collaborator Author

A little; this can definitely be cleaned up to just mention the run itself.

Also provided is a `save_json` function that allows saving a dict as JSON, by default into the workspace directory, for convenient parallel usage with `save_object`.

## Output Saving Helpers
The Rush client module provides `client.upload_object` and `client.save_object`, which allow for uploading and saving `RushObject` instances to the Rush object store to and from local filesystem paths. Also, each module's `ResultRef` class provides `ResultRef.fetch()` and `ResultRef.save()` functions. `fetch()` returns a module's results directly in memory; `save()` writes them into the workspace directory and takes arguments that configure how the saved files are named.

Contributor

what is save_object for now that we have fetch and save fns? are there any objects that aren't run results that someone might need to download?

Collaborator Author

These operate on individual objects rather than on entire output sets. save_object() is used internally by each module's .save() function to save each part of the output.

Collaborator Author

May be worth just removing mention of them everywhere except a dedicated section to avoid confusion.

Contributor

gotcha

seankhl (Collaborator, Author) commented Mar 23, 2026

I'm thinking some kind of master table might be useful:

| thing | type | fetch to local variable(s) | save to disk | upload to object store |
|---|---|---|---|---|
| run | `RushRun` | `run.fetch()` | `run.save()` | x |
| result reference | `module.ResultRef` | `result_ref.fetch()` | `result_ref.save()` | x |
| single object store reference | `RushObject` | `object.fetch()` | `object.save()` | `Object.upload(path)` |

Go from:

  • RushRun to module.ResultRef via run.collect()
  • RushRun to module.Result via run.fetch()
  • RushRun to module.ResultPaths via run.save()
  • module.ResultRef to module.Result via result.fetch()
  • module.ResultRef to module.ResultPaths via result.save()
  • RushObject to dict (JSON object) or bytes (binary object) via object.fetch()
  • RushObject to Path via object.save()
  • Path to RushObject via RushObject.upload(path)
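
The first few transitions in the list above can be sketched with stand-in classes. This is an illustration under stated assumptions, not the rush-py implementation: the class bodies, the `energy` field, the dict-based fake object store, and the `wait_for_ref` callable are all hypothetical; only the names `RushRun`, `ResultRef`, `Result`, `collect()`, and `fetch()` come from this thread. The `save()` paths would follow the same delegation shape.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Generic, TypeVar


@dataclass
class Result:
    """Stand-in for module.Result: parsed data held in memory."""

    energy: float


@dataclass
class ResultRef:
    """Stand-in for module.ResultRef: a remote reference, nothing downloaded."""

    object_path: str
    store: dict[str, dict]  # fake object store, keyed by path (assumption)

    def fetch(self) -> Result:
        # "Download" (here: a dict lookup) and parse into the in-memory type.
        return Result(energy=self.store[self.object_path]["energy"])


R = TypeVar("R")


class RushRun(Generic[R]):
    """Stand-in for the generic run handle; R is the module's ResultRef type."""

    def __init__(self, wait_for_ref: Callable[[], R]) -> None:
        self._wait_for_ref = wait_for_ref

    def collect(self) -> R:
        # A real client would wait for the run to finish; here we just call.
        return self._wait_for_ref()

    def fetch(self):
        # Shorthand for collect() followed by fetch() on the reference.
        return self.collect().fetch()


store = {"uuid-1234": {"energy": -76.4}}
run: RushRun[ResultRef] = RushRun(lambda: ResultRef("uuid-1234", store))
ref = run.collect()   # RushRun -> ResultRef
result = run.fetch()  # RushRun -> Result
print(ref.object_path, result.energy)
```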

seankhl marked this pull request as ready for review March 24, 2026 00:04
seankhl requested review from kayleigh222 and ryanswrt March 24, 2026 00:11

ryanswrt (Contributor) left a comment

Minor nit but otherwise lgtm

### Asynchronous Runs

- Rush modules can take a long time, so by default run asynchronously: the function that triggers the run will return once the run is submitted. In order to obtain the output synchronously for this same call, pass `collect=True` as we've done above. You can also collect the run later:
+ Rush modules can take a long time, so by default they return a `RushRun` handle as soon as the run is submitted. To wait for completion and get the result, call `.fetch()`:

Contributor

Might want to be clear on fetch/save semantics here.

Suggested change
- Rush modules can take a long time, so by default they return a `RushRun` handle as soon as the run is submitted. To wait for completion and get the result, call `.fetch()`:
+ Rush modules can take a long time, so by default they return a `RushRun` handle as soon as the run is submitted. To wait for completion and get the result as a variable, call `.fetch()`; to wait and save it as a file, call `.save()`:

Collaborator Author

The idea was for the first run to be ultra simple. Maybe what we should have here is a line like "check here (w/ a link) for other ways to access the results of a run" or something.
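
For readers of this thread, the submit-then-wait semantics under discussion can be illustrated with a local thread pool. This is only an analogy: `RunHandle`, `slow_energy`, `done()`, and the `timeout` argument are hypothetical names invented here, and nothing below reflects how rush-py talks to the Rush service.

```python
from __future__ import annotations

import time
from concurrent.futures import Future, ThreadPoolExecutor


class RunHandle:
    """Hypothetical stand-in for a run handle, backed by a local thread."""

    def __init__(self, future: Future) -> None:
        self._future = future

    def done(self) -> bool:
        # Non-blocking check of whether the job has finished.
        return self._future.done()

    def fetch(self, timeout: float | None = None):
        # Blocks until the job finishes; a timeout error is raised if
        # `timeout` seconds pass first.
        return self._future.result(timeout=timeout)


def slow_energy(n_atoms: int) -> float:
    time.sleep(0.2)  # pretend this is a long remote calculation
    return -0.5 * n_atoms


executor = ThreadPoolExecutor(max_workers=1)
handle = RunHandle(executor.submit(slow_energy, 4))  # returns immediately
print("submitted; done yet?", handle.done())
energy = handle.fetch()  # waits for completion, then returns the value
print("energy:", energy)
executor.shutdown()
```

The point of the sketch is the shape of the call sequence: submission hands back a handle right away, and the blocking happens only when the result is requested.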

seankhl changed the base branch from feat/structured-outputs to main March 24, 2026 18:57
seankhl and others added 9 commits March 24, 2026 13:59
…ructure

Replace the overload-based collect=True/False pattern with a consistent
RushRun[ResultRef] design across all modules. Every submission function
now returns a RushRun handle; users call .fetch() for parsed results,
.save() for disk paths, or .collect() for the lightweight ResultRef.

Key changes:
- Convert exess to a package (exess/) merging energy, optimization, and
  qmmm under one namespace with domain-specific verb names:
  exess.energy(), exess.optimization(), exess.qmmm()
- Port all modules to the new pattern: nnxtb.energy(), pbsa.solvation_energy(),
  mmseqs2.search(), boltz.fold(), auto3d.generate(), prepare_protein.prepare(),
  prepare_complex.prepare()
- Three-tier result types per module: ResultRef (remote refs, no download),
  Result (parsed in-memory), ResultPaths (saved to disk)
- Move RushObject and ObjectID into client.py; implement RushObject.save()
  with save_object() as a thin wrapper
- Extract TRCRef and TRCPaths into _trc.py for shared TRC fetch/save logic
- Support multi-model PDBs in prepare_protein (returns list[TRC])
- Make boltz and auto3d return lazy iterators from .fetch()/.save()
- Require path, size, and format keys in RushObject.from_dict()
- Drop all backward-compat aliases; this is a major release

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
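
One item in the commit message above, requiring `path`, `size`, and `format` keys in `RushObject.from_dict()`, could look roughly like the following. `ObjectRef` is a hypothetical stand-in; the field types, the choice of `ValueError`, and the error wording are assumptions rather than the behaviour of the real `RushObject`.

```python
from __future__ import annotations

from dataclasses import dataclass

REQUIRED_KEYS = ("path", "size", "format")


@dataclass(frozen=True)
class ObjectRef:
    """Hypothetical stand-in for RushObject; field types are assumptions."""

    path: str
    size: int
    format: str

    @classmethod
    def from_dict(cls, raw: dict) -> ObjectRef:
        # Reject the input up front if any required key is absent.
        missing = [key for key in REQUIRED_KEYS if key not in raw]
        if missing:
            raise ValueError(f"object reference is missing keys: {missing}")
        return cls(path=raw["path"], size=raw["size"], format=raw["format"])


ok = ObjectRef.from_dict({"path": "uuid-1234", "size": 0, "format": "json"})

try:
    ObjectRef.from_dict({"path": "uuid-1234"})
except ValueError as err:
    error_message = str(err)
    print(error_message)
```
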
- Rename client.RushRun → RushRunInfo to not shadow the generic RushRun
- Fix fragexess collect=False to submit jobs and return RushRun handles
- Call super().__init__() in _ComplexRun to properly initialize base class
- Export FragmentRef and RushRunInfo/fetch_run_info from rush.__init__

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- prepare.protein() replaces prepare_protein.prepare()
- prepare.protein_ligand() replaces prepare_complex.prepare()
- Shared types (ResultRef, _upload_trc) live in prepare/_protein.py
- Update all test imports and example tutorials/READMEs
- Fix fragexess InteractionEnergyResultRef → exess.ResultRef type error

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- boltz: parse samples into TRCRef + RushObject in from_raw_output,
  remove _fetch_trc_output helper
- auto3d: parse conformers into TRCRef in from_raw_output, fix
  _unwrap_raw single-element collapse bug, rename raw → _inputs
- mmseqs2: rename outputs → msas, simplify Result/ResultPaths to NewType
- Add __getitem__/__len__/__iter__ to ResultRef types w/ list internals
  (boltz, auto3d, mmseqs2, prepare)
- Update all output helper tests for new monkeypatch paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: OpenAI Codex (GPT-5.4 High) <codex@openai.com>
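
The `__getitem__`/`__len__`/`__iter__` addition mentioned in the commit message above amounts to exposing the sequence protocol on a wrapper around a list. A minimal sketch, assuming the wrapped items are plain strings; `ListResultRef` is a hypothetical name, and the real ResultRef types hold richer references.

```python
from __future__ import annotations

from typing import Iterator


class ListResultRef:
    """Hypothetical ResultRef that wraps a list of per-item references."""

    def __init__(self, items: list[str]) -> None:
        self._items = list(items)  # copy so callers cannot mutate internals

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int) -> str:
        return self._items[index]

    def __iter__(self) -> Iterator[str]:
        return iter(self._items)


refs = ListResultRef(["uuid-a", "uuid-b", "uuid-c"])
first = refs[0]       # indexing
count = len(refs)     # length
as_list = list(refs)  # iteration
print(first, count, as_list)
```
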
We automatically convert to and upload the necessary data.
Also, document its `max_wait_time` parameter.
seankhl merged commit 7b691fe into main Mar 24, 2026
2 checks passed
seankhl deleted the feat/new-design branch March 24, 2026 20:04