Merged
26 commits
e1bcd88
Initial implementation
tomdurrant Sep 30, 2025
6722b64
Polish and tests
tomdurrant Sep 30, 2025
3537db0
Replaced subprocess docker calls with docker python library
tomdurrant Sep 11, 2025
dbdc2be
Update tests/integration/test_docker_backend.py
tomdurrant Sep 11, 2025
9d357b6
Fixed failing test
tomdurrant Sep 11, 2025
e953b15
log_box imported twice
rafa-guedes Oct 1, 2025
7e691c1
Replace deprecated utcnow
rafa-guedes Oct 1, 2025
ac72904
Suppressing numpy incompatibility warnings
rafa-guedes Oct 2, 2025
7432dcc
Definitive fix for the numpy warning in the tests
rafa-guedes Oct 2, 2025
2499cf0
Run ruff across the repo
rafa-guedes Oct 3, 2025
ccde632
Add to extra dependencies the remote dependencies for cloudpathlib
rafa-guedes Oct 9, 2025
33c149d
Merge branch 'main' into slurm-backend
tomdurrant Oct 20, 2025
5367959
Added slurm examples
tomdurrant Oct 20, 2025
516d1e8
Clened up example backends
tomdurrant Oct 20, 2025
c0e1898
Fixed loging in dockers
tomdurrant Oct 20, 2025
d2260e5
Added basic backed run examples
tomdurrant Oct 20, 2025
d6c9d93
fixed testing
tomdurrant Oct 20, 2025
5bedde0
Address comments in PR
tomdurrant Oct 29, 2025
2000dd5
Address incomplete implementation of command in slurm config
tomdurrant Dec 5, 2025
1c8cedf
Resolved conflicts with main
tomdurrant Dec 5, 2025
1a9f3dd
added missing imports, command parameter is required
benjaminleighton Dec 11, 2025
7c530c1
remove duplicates in pyproj toml
benjaminleighton Dec 11, 2025
594c801
fixed warning / error on slashes in f string
benjaminleighton Dec 11, 2025
c711630
fixed tabs to spaces
benjaminleighton Dec 11, 2025
c253153
Changed integration tests to unit tests with mocks
tomdurrant Dec 12, 2025
023e277
removed gpu requirements, must specify output directory for basic tes…
benjaminleighton Dec 15, 2025
400 changes: 400 additions & 0 deletions examples/backends/05_slurm_backend_run.py

Large diffs are not rendered by default.

167 changes: 62 additions & 105 deletions examples/backends/README.md
@@ -1,129 +1,86 @@
# Backend Examples
# ROMPY SLURM Backend Examples

This directory contains examples demonstrating how to use ROMPY's backend configuration system to execute models in different environments.
This directory contains examples of how to use ROMPY with SLURM for HPC cluster execution.

## Overview
## Examples

ROMPY uses Pydantic-based backend configurations to provide type-safe, validated execution parameters for different environments. This system enables precise control over model execution while maintaining flexibility and extensibility.
### 05_slurm_backend_run.py
A comprehensive tutorial showing different ways to configure and use the SLURM backend:

## Available Examples
- Basic SLURM execution
- Advanced SLURM configuration with multiple parameters
- Custom commands on SLURM
- Creating configurations from dictionaries
- Configuration validation

### 1. Basic Local Run (`01_basic_local_run.py`)
Demonstrates the simplest use case:
- Local execution with `LocalConfig`
- Basic timeout and command configuration
- No-op postprocessing

### 2. Docker Run (`02_docker_run.py`)
Shows Docker container execution:
- Using pre-built Docker images
- Volume mounting for data access
- Environment variable configuration
- Resource limits (CPU, memory)

### 3. Custom Postprocessor (`03_custom_postprocessor.py`)
Illustrates custom postprocessing:
- Creating custom postprocessor classes
- Processing model outputs after execution
- Error handling and result reporting

### 4. Complete Workflow (`04_complete_workflow.py`)
Demonstrates a full workflow:
- Model execution with local backend
- Custom postprocessing with file analysis
- Comprehensive logging and error handling

## Backend Configuration Types

### LocalConfig
For execution on the local system:
```python
from rompy.backends import LocalConfig

config = LocalConfig(
    timeout=3600,  # 1 hour
    command="python run_model.py",
    env_vars={"OMP_NUM_THREADS": "4"},
    shell=True,
    capture_output=True
)
```

### DockerConfig
For execution in Docker containers:
```python
from rompy.backends import DockerConfig

config = DockerConfig(
    image="python:3.9-slim",
    cpu=2,
    memory="2g",
    timeout=7200,
    volumes=["/data:/app/data:rw"],
    env_vars={"MODEL_CONFIG": "production"}
)
Run the example:
```bash
python 05_slurm_backend_run.py
```

## Running the Examples
### basic_model_run.py
Creates a basic ModelRun configuration that can be used to test different backend configurations. This provides a consistent model configuration that works across all backends.

Each example can be run directly:
### test_backends_with_modelrun.py
Demonstrates using the basic ModelRun with different backend configurations (Local, Docker, SLURM). This example shows how the same model run can be configured to work across different execution environments.

Run the example:
```bash
# Basic local execution
python 01_basic_local_run.py
python test_backends_with_modelrun.py
```

# Docker execution (requires Docker)
python 02_docker_run.py
## Configuration Files

# Custom postprocessing
python 03_custom_postprocessor.py
### slurm_backend.yml
A basic configuration file for running jobs on SLURM with minimal parameters.

# Complete workflow
python 04_complete_workflow.py
```
### slurm_backend_examples.yml
A collection of different SLURM configuration examples:
- Basic SLURM configuration
- Advanced GPU job configuration
- High-memory job configuration
- Custom working directory configuration

## Key Features

- **Type Safety**: All configurations are validated using Pydantic
- **IDE Support**: Full autocompletion and inline documentation
- **Flexibility**: Easy to extend with custom backends and postprocessors
- **Error Handling**: Clear validation errors and execution feedback
- **Serialization**: Configurations can be saved/loaded as YAML/JSON

## Configuration Validation
The ROMPY SLURM backend supports:

Backend configurations provide comprehensive validation:
- Timeout values must be between 60 and 86400 seconds
- Working directories must exist if specified
- Docker image names must follow valid conventions
- Volume mounts must reference existing host paths
- **Resource allocation**: Specify nodes, tasks, and CPU cores
- **Queue/partition selection**: Run on different SLURM partitions
- **Time limits**: Set job time limits in HH:MM:SS format
- **Environment variables**: Set environment variables for your job
- **Job notifications**: Email notifications on job start/end/failure
- **Custom commands**: Run custom commands instead of the default model run
- **Additional SLURM options**: Pass any additional SLURM options via `additional_options`
- **GPU resources**: Support for GPU allocation via `--gres` options
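
The options listed above correspond closely to standard `sbatch` directives. The following is a minimal sketch of how such a mapping could look, not ROMPY's actual implementation: `SlurmOptions` and `render_sbatch_header` are hypothetical names introduced for illustration, and the field names are assumed to mirror the `SlurmConfig` parameters used elsewhere in this README.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SlurmOptions:
    """Hypothetical stand-in for the README's SlurmConfig fields (not rompy's class)."""

    queue: Optional[str] = None  # SLURM partition
    nodes: int = 1
    ntasks: int = 1
    cpus_per_task: int = 1
    time_limit: str = "01:00:00"  # HH:MM:SS
    account: Optional[str] = None
    additional_options: List[str] = field(default_factory=list)


def render_sbatch_header(opts: SlurmOptions) -> str:
    """Render the options as #SBATCH directive lines for a job script."""
    lines = ["#!/bin/bash"]
    if opts.queue:
        lines.append(f"#SBATCH --partition={opts.queue}")
    lines.append(f"#SBATCH --nodes={opts.nodes}")
    lines.append(f"#SBATCH --ntasks={opts.ntasks}")
    lines.append(f"#SBATCH --cpus-per-task={opts.cpus_per_task}")
    lines.append(f"#SBATCH --time={opts.time_limit}")
    if opts.account:
        lines.append(f"#SBATCH --account={opts.account}")
    # Extra flags (for example a --gres GPU request) are passed through unchanged.
    for extra in opts.additional_options:
        lines.append(f"#SBATCH {extra}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        render_sbatch_header(
            SlurmOptions(
                queue="gpu",
                nodes=2,
                ntasks=8,
                cpus_per_task=4,
                time_limit="02:00:00",
                account="research_project",
                additional_options=["--gres=gpu:v100:2"],
            )
        )
    )
```

Passing extra flags through verbatim keeps the sketch small while still covering options that have no dedicated field, which is the role `additional_options` appears to play here.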

## Best Practices
## Usage

1. **Set appropriate timeouts** based on your model complexity
2. **Use environment variables** for sensitive configuration
3. **Validate configurations** before execution
4. **Handle errors gracefully** in your postprocessors
5. **Use resource limits** appropriately in Docker configurations
To use the SLURM backend in your application:

## Output Structure
```python
from rompy.backends import SlurmConfig
from rompy.model import ModelRun

# Create SLURM configuration
config = SlurmConfig(
    queue="gpu",  # SLURM partition
    nodes=2,  # Number of nodes
    ntasks=8,  # Number of tasks
    cpus_per_task=4,  # CPU cores per task
    time_limit="02:00:00",  # Time limit
    account="research_project",  # Account for billing
    additional_options=["--gres=gpu:v100:2"],  # GPU allocation
)

All examples create output in the `./output` directory with the following structure:
# Create and run your model
model = ModelRun(...)
model.run(backend=config)
```
output/
├── <run_id>/
│ ├── INPUT # Generated model input file
│ ├── datasets/ # Placeholder for input datasets
│ ├── outputs/ # Placeholder for model outputs
│ └── <additional files> # Any files created during execution
```

## Extending the Examples

You can extend these examples by:
- Creating custom backend configurations
- Implementing custom postprocessors
- Adding new execution environments
- Integrating with workflow orchestration systems
## Validation

For more detailed information, see the [Backend Configurations documentation](../../docs/source/backend_configurations.rst).
The SLURM backend includes comprehensive validation:
- Time limit format validation (HH:MM:SS)
- Bounds checking for nodes, CPUs, etc.
- Required field validation
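
The checks listed above can be illustrated with plain Python. This is only a sketch under stated assumptions: the README does not show ROMPY's actual validation rules, so the regular expression, the lower bound of 1, and the helper names `validate_time_limit` and `validate_positive_int` are hypothetical.

```python
import re

# Assumed format: one or more hour digits, then two-digit minutes and seconds (00-59).
_TIME_LIMIT_PATTERN = re.compile(r"\d+:[0-5]\d:[0-5]\d")


def validate_time_limit(value: str) -> str:
    """Return the value unchanged if it looks like HH:MM:SS, else raise ValueError."""
    if not _TIME_LIMIT_PATTERN.fullmatch(value):
        raise ValueError(f"time_limit must be in HH:MM:SS format, got {value!r}")
    return value


def validate_positive_int(name: str, value: int) -> int:
    """Bounds check for counts such as nodes, ntasks, or cpus_per_task (assumed >= 1)."""
    if not isinstance(value, int) or value < 1:
        raise ValueError(f"{name} must be an integer >= 1, got {value!r}")
    return value


if __name__ == "__main__":
    print(validate_time_limit("02:00:00"))
    print(validate_positive_int("nodes", 2))
```

In a Pydantic-based configuration, checks like these would typically live in field validators so that an invalid value fails when the configuration object is constructed rather than when the job is submitted.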
57 changes: 57 additions & 0 deletions examples/backends/basic_model_run.py
@@ -0,0 +1,57 @@
#!/usr/bin/env python3
"""
Basic ModelRun Configuration for Backend Testing

This script creates a simple ModelRun configuration that can be used to test
different backend configurations (local, docker, slurm).
"""

import tempfile
from datetime import datetime
from pathlib import Path

from rompy.core.time import TimeRange
from rompy.model import ModelRun


def create_basic_model_run():
    """
    Create a basic model run configuration for testing backends.
    This creates a minimal model run that can execute a simple command
    using different backends.
    """
    # Create a temporary directory for output
    temp_dir = Path(tempfile.mkdtemp(prefix="rompy_test_"))

    # Create a basic model run
    model_run = ModelRun(
        run_id="test_backend_run",
        period=TimeRange(
            start=datetime(2023, 1, 1),
            end=datetime(2023, 1, 2),
            interval="1H",
        ),
        output_dir=temp_dir,
        delete_existing=True,
    )

    return model_run


if __name__ == "__main__":
    # Create the basic model run
    model = create_basic_model_run()

    print("Basic ModelRun Configuration Created")
    print("="*40)
    print(f"Run ID: {model.run_id}")
    print(f"Output Directory: {model.output_dir}")
    print(f"Time Period: {model.period.start} to {model.period.end}")
    print(f"Time Interval: {model.period.interval}")
    print(f"Delete Existing: {model.delete_existing}")
    print()
    print("This basic configuration can be used to test different backends.")
    print("For example:")
    print(" - Local backend: Executes commands on the local machine")
    print(" - Docker backend: Runs commands in Docker containers")
    print(" - SLURM backend: Submits jobs to HPC clusters")