@djriffle

This pull request adds an optional reference dataset to the CLI workflow, for users who need to work with both a primary and a reference dataset. The changes touch command-line options, user prompts, resource management, and sandbox setup so that both datasets are handled correctly.

Support for Reference Dataset:

  • Added a new --reference-dataset (-ref) CLI option that lets users specify an optional reference dataset file, with corresponding updates to user prompts and context handling.
  • Introduced a static in-container path (SANDBOX_REF_DATA_PATH) for the reference dataset, ensuring consistent handling within the sandbox environment.
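
A minimal sketch of how the optional flag and the static in-container path could fit together. The PR does not show the CLI framework or the actual path values, so `argparse`, the `--dataset` flag, both path strings, and the `build_context` helper are assumptions for illustration only; only `--reference-dataset`, `-ref`, and `SANDBOX_REF_DATA_PATH` come from the description above.

```python
import argparse
from pathlib import Path

# Hypothetical in-container paths; the real values are not given in the PR.
SANDBOX_DATA_PATH = "/workspace/dataset"
SANDBOX_REF_DATA_PATH = "/workspace/reference_dataset"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Dataset CLI (illustrative)")
    # Assumed flag for the primary dataset.
    parser.add_argument("--dataset", "-d", type=Path, required=True,
                        help="Primary dataset file")
    # Optional reference dataset, as named in the PR description.
    parser.add_argument("--reference-dataset", "-ref", type=Path, default=None,
                        help="Optional reference dataset file")
    return parser


def build_context(args: argparse.Namespace) -> dict:
    """Expose in-container paths; the reference entry exists only if requested."""
    context = {"dataset_path": SANDBOX_DATA_PATH}
    if args.reference_dataset is not None:
        context["reference_dataset_path"] = SANDBOX_REF_DATA_PATH
    return context
```

Because the reference path inside the container is fixed, downstream code can rely on one location regardless of where the file lives on the host, and omitting the flag leaves the context unchanged.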

Resource and Sandbox Management:

  • Updated resource collection and mounting logic to include the reference dataset when one is provided, so it is available inside the sandbox container.
  • Modified sandbox setup and data copying so that the primary and reference datasets are bind mounted or copied as needed on the Docker and Singularity backends.
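
The mounting step can be sketched roughly as follows. This is not the code from the PR: the helper names and the in-container path values are hypothetical, and read-only mounts are an assumption. The only real interfaces used are Docker's `-v host:container[:opts]` flag and Singularity's `--bind src:dest[:opts]` flag.

```python
import os
from typing import List, Optional, Tuple

# Hypothetical in-container paths; the real values are not given in the PR.
SANDBOX_DATA_PATH = "/workspace/dataset"
SANDBOX_REF_DATA_PATH = "/workspace/reference_dataset"


def collect_mounts(dataset: str,
                   reference: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return (host_path, container_path) pairs for every dataset provided."""
    mounts = [(os.path.abspath(dataset), SANDBOX_DATA_PATH)]
    if reference is not None:
        # The reference dataset is mounted only when the user supplied one.
        mounts.append((os.path.abspath(reference), SANDBOX_REF_DATA_PATH))
    return mounts


def docker_args(mounts: List[Tuple[str, str]]) -> List[str]:
    """Translate mounts into `docker run -v` arguments (read-only is assumed)."""
    args: List[str] = []
    for host, container in mounts:
        args += ["-v", f"{host}:{container}:ro"]
    return args


def singularity_args(mounts: List[Tuple[str, str]]) -> List[str]:
    """Translate mounts into `singularity exec --bind` arguments."""
    args: List[str] = []
    for host, container in mounts:
        args += ["--bind", f"{host}:{container}:ro"]
    return args
```

Keeping mount collection separate from the backend-specific flag formatting means the optional reference dataset is handled in one place, and each backend only decides how to express the same list.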

User Experience Improvements:

  • Reworded the dataset prompts to make the distinction between the primary and reference datasets explicit.

Together, these changes support workflows that require both a primary and a reference dataset. Because the reference dataset is optional, existing invocations that pass only a primary dataset continue to work.

@djriffle djriffle merged commit 0552704 into main Aug 22, 2025
1 of 2 checks passed