Implement Well-Defined Return Schemas Using Pydantic Models #15

## Description

Currently, Rompy uses dictionaries to return results from various functions such as `postprocess()`, `pipeline()`, and other operations. While functional, this approach lacks type safety, clear documentation, and consistency across the codebase. This issue proposes implementing well-defined return schemas using Pydantic models to improve API consistency, developer experience, and maintainability.

### Current State
Rompy already uses Pydantic extensively for configuration objects (e.g., `ModelRun`, `LocalConfig`, `DockerConfig`, etc.) but does not use Pydantic for return values. Functions like `postprocess()` and `pipeline()` return dictionaries with varying structures:

```python
def postprocess(self, processor: str = "noop", **kwargs) -> Dict[str, Any]:
    return {
        "success": True,
        "message": "No postprocessing requested - validation only",
        "run_id": model_run.run_id,
        "output_dir": str(check_dir),
        "validated": validate_outputs,
    }
```

### Advantages of Implementing Structured Response Schemas

1. **Type Safety**: Pydantic models validate field types when a result is constructed, so a malformed return value fails where it is created rather than later in calling code.

2. **Documentation**: Schema definitions serve as self-documenting code, making it clear what fields are returned and their expected types.

3. **Consistency**: Standardized return types across the codebase improve API predictability and reduce cognitive load for developers.

4. **IDE Support**: Better autocomplete and type hints in development environments.

5. **API Evolution**: Easier to track changes to return values and maintain backward compatibility.

6. **Validation**: Built-in validation for complex return structures ensures data integrity.
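To make the type-safety and validation points concrete, the sketch below contrasts a plain dictionary with a small Pydantic model. `DemoResult` is a hypothetical, trimmed-down model used only for illustration (it is not part of Rompy), and the example assumes Pydantic v2.

```python
from pydantic import BaseModel, ValidationError


class DemoResult(BaseModel):
    """Hypothetical, trimmed-down result model (not part of Rompy)."""

    success: bool
    run_id: str


# A dictionary accepts any shape, so a misspelled key goes unnoticed
# until some caller looks up the correct key.
as_dict = {"sucess": True, "run_id": "run_001"}
typo_detected = "success" in as_dict  # False: the misspelling slipped through

# The model reports the same mistake as soon as the result is built,
# because the required field `success` is missing.
try:
    DemoResult(sucess=True, run_id="run_001")
    error_fields = []
except ValidationError as exc:
    # Each error entry records the location of the offending field.
    error_fields = [err["loc"][0] for err in exc.errors()]
```

With the dictionary, the typo is only discovered when a consumer reads `success`; with the model, construction itself fails and names the missing field.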

### Specific Use Cases and Examples

#### 1. Postprocess Result Schema
Current return in `postprocess()`:
```python
return {
    "success": True,
    "message": "No postprocessing requested - validation only",
    "run_id": model_run.run_id,
    "output_dir": str(check_dir),
    "validated": validate_outputs,
}
```

Proposed Pydantic schema:
```python
from pydantic import BaseModel, Field
from typing import Optional, Dict, Any

class PostprocessResult(BaseModel):
    success: bool = Field(..., description="Whether the postprocessing was successful")
    run_id: str = Field(..., description="The run ID associated with the postprocessing")
    output_dir: str = Field(..., description="Path to the output directory")
    validated: bool = Field(..., description="Whether validation was performed")
    message: Optional[str] = Field(None, description="Optional message about the operation")
    error: Optional[str] = Field(None, description="Error message if success is False")
    metadata: Optional[Dict[str, Any]] = Field(None, description="Additional metadata")
```
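A minimal usage sketch, assuming Pydantic v2: the dictionary currently returned by `postprocess()` can be validated into the proposed model, and `model_dump()` recovers a plain dictionary for callers that still need one. The `run_id` and path values are placeholders, and the field descriptions are omitted for brevity.

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel


class PostprocessResult(BaseModel):
    # Same fields as the proposed schema, without the descriptions.
    success: bool
    run_id: str
    output_dir: str
    validated: bool
    message: Optional[str] = None
    error: Optional[str] = None
    metadata: Optional[Dict[str, Any]] = None


# Shape of the dictionary returned today (placeholder values).
legacy = {
    "success": True,
    "message": "No postprocessing requested - validation only",
    "run_id": "run_001",
    "output_dir": "/tmp/run_001",
    "validated": True,
}

# Validate the existing dictionary into the typed model.
result = PostprocessResult.model_validate(legacy)

# Convert back to a dictionary, dropping fields that were never set.
round_trip = result.model_dump(exclude_none=True)
```

Here `result.success` and `result.output_dir` are available as typed attributes, while `round_trip` carries the same keys and values as the original dictionary.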

#### 2. Pipeline Result Schema
Current return in `pipeline()`:
```python
return {
    "success": False,
    "run_id": model_run.run_id,
    "stages_completed": [],
    "run_backend": run_backend,
    "processor": processor,
    # ... more fields depending on execution path
}
```

Proposed Pydantic schema:
```python
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field

class PipelineStage(str, Enum):
    GENERATE = "generate"
    RUN = "run"
    POSTPROCESS = "postprocess"

class PipelineResult(BaseModel):
    success: bool = Field(..., description="Whether the pipeline completed successfully")
    run_id: str = Field(..., description="The run ID associated with the pipeline")
    stages_completed: List[PipelineStage] = Field(
        default_factory=list, description="List of completed pipeline stages"
    )
    run_backend: str = Field(..., description="Backend used for the run stage")
    processor: str = Field(..., description="Processor used for the postprocess stage")
    staging_dir: Optional[str] = Field(None, description="Path to the staging directory")
    run_success: Optional[bool] = Field(None, description="Whether the run stage was successful")
    postprocess_results: Optional[Dict[str, Any]] = Field(
        None, description="Results from the postprocess stage"
    )
    stage: Optional[str] = Field(None, description="Stage where failure occurred, if any")
    message: Optional[str] = Field(None, description="Optional message about the operation")
    error: Optional[str] = Field(None, description="Error message if success is False")
    start_time: Optional[datetime] = Field(None, description="Pipeline start time")
    end_time: Optional[datetime] = Field(None, description="Pipeline end time")
    duration_seconds: Optional[float] = Field(None, description="Pipeline execution duration in seconds")
```
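The following sketch shows how a failed pipeline might be reported with this schema, assuming Pydantic v2. It uses a trimmed copy of the proposed model; the backend name, processor name, timestamps, and error text are placeholders rather than real Rompy output.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, Field


class PipelineStage(str, Enum):
    GENERATE = "generate"
    RUN = "run"
    POSTPROCESS = "postprocess"


class PipelineResult(BaseModel):
    # Trimmed copy of the proposed schema, descriptions omitted.
    success: bool
    run_id: str
    stages_completed: List[PipelineStage] = Field(default_factory=list)
    run_backend: str
    processor: str
    stage: Optional[str] = None
    error: Optional[str] = None
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    duration_seconds: Optional[float] = None


start = datetime(2024, 1, 1, 12, 0, 0)
end = start + timedelta(seconds=90)

failed = PipelineResult(
    success=False,
    run_id="run_001",
    stages_completed=["generate"],  # plain strings are coerced to PipelineStage
    run_backend="local",
    processor="noop",
    stage="run",
    error="Model run failed",
    start_time=start,
    end_time=end,
    duration_seconds=(end - start).total_seconds(),
)
```

Because `stages_completed` is typed as a list of `PipelineStage`, an unknown stage name such as `"deploy"` is rejected at construction time instead of being stored silently.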

#### 3. Model Run Result Schema
`ModelRun.run()` currently returns a boolean, but it could return structured data instead:
```python
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, Field

class ModelRunResult(BaseModel):
    success: bool = Field(..., description="Whether the model run was successful")
    run_id: str = Field(..., description="The run ID")
    output_dir: str = Field(..., description="Path to output directory")
    start_time: datetime = Field(..., description="Model run start time")
    end_time: datetime = Field(..., description="Model run end time")
    duration_seconds: float = Field(..., description="Model run duration in seconds")
    backend_used: str = Field(..., description="Backend used for execution")
    workspace_dir: Optional[str] = Field(None, description="Workspace directory used")
    message: Optional[str] = Field(None, description="Optional message about the run")
    error: Optional[str] = Field(None, description="Error message if success is False")
```
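One way such a result could be produced is sketched below, assuming Pydantic v2. `timed_run` and its `execute` callable are hypothetical stand-ins for whatever `ModelRun.run()` does internally; they only illustrate how the timing and error fields might be filled in.

```python
from datetime import datetime, timezone
from typing import Callable, Optional

from pydantic import BaseModel


class ModelRunResult(BaseModel):
    # Same fields as the proposed schema, descriptions omitted.
    success: bool
    run_id: str
    output_dir: str
    start_time: datetime
    end_time: datetime
    duration_seconds: float
    backend_used: str
    workspace_dir: Optional[str] = None
    message: Optional[str] = None
    error: Optional[str] = None


def timed_run(
    run_id: str, output_dir: str, backend: str, execute: Callable[[], bool]
) -> ModelRunResult:
    """Hypothetical helper: call `execute` and wrap its outcome in a result."""
    start = datetime.now(timezone.utc)
    error = None
    try:
        success = execute()
    except Exception as exc:  # report the failure instead of propagating it
        success, error = False, str(exc)
    end = datetime.now(timezone.utc)
    return ModelRunResult(
        success=success,
        run_id=run_id,
        output_dir=output_dir,
        start_time=start,
        end_time=end,
        duration_seconds=(end - start).total_seconds(),
        backend_used=backend,
        error=error,
    )


ok = timed_run("run_001", "/tmp/run_001", "local", lambda: True)
```

Whether exceptions should be captured in `error` or re-raised is a design decision for the maintainers; the sketch captures them only to show how the field would be used.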

### Implementation Recommendations

1. **Create Response Schema Module**: Create a new module `src/rompy/core/responses.py` to house all response Pydantic models.

2. **Gradual Migration**: Implement the new schemas incrementally, starting with the most critical return values (pipeline, postprocess).

3. **Backward Compatibility**: Initially maintain both dictionary and Pydantic return options with a configuration flag, then deprecate dictionary returns in a future major version.

4. **Update Type Hints**: Update all function signatures to return the appropriate Pydantic models.

5. **Documentation**: Update docstrings to reflect the new structured return values.

6. **Testing**: Add tests to verify the structure and validation of response schemas.
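Recommendation 3 could look roughly like the sketch below, assuming Pydantic v2. The `as_dict` flag name, the free-function form of `postprocess`, and the trimmed `PostprocessResult` are assumptions made for illustration; the real method signature may differ.

```python
from typing import Any, Dict, Union

from pydantic import BaseModel


class PostprocessResult(BaseModel):
    # Trimmed copy of the proposed schema.
    success: bool
    run_id: str
    output_dir: str
    validated: bool


def postprocess(
    run_id: str, output_dir: str, as_dict: bool = True
) -> Union[PostprocessResult, Dict[str, Any]]:
    """Hypothetical stand-in for the real method; `as_dict` is an assumed flag."""
    result = PostprocessResult(
        success=True, run_id=run_id, output_dir=output_dir, validated=True
    )
    # Default to the legacy dictionary until dictionary returns are deprecated.
    return result.model_dump() if as_dict else result


legacy_style = postprocess("run_001", "/tmp/run_001")
typed_style = postprocess("run_001", "/tmp/run_001", as_dict=False)
```

Building the model first and converting with `model_dump()` means both return styles pass through the same validation, so the two cannot drift apart during the transition.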

### Potential Impact on Codebase

1. **Breaking Changes**: Replacing dictionary returns is a breaking change for users who index into the returned dictionaries (e.g., `result["success"]`).

2. **Improved Reliability**: Better type safety will reduce runtime errors related to unexpected return value structures.

3. **Code Clarity**: More explicit return value definitions will improve code maintainability.

4. **Migration Effort**: Existing code that processes return values will need to be updated to work with Pydantic models instead of raw dictionaries.
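One possible way to soften the migration is a temporary shim that keeps `result["key"]` working while warning callers to switch to attribute access. The sketch below assumes Pydantic v2; `DictAccessMixin` is a hypothetical name and is not an existing Rompy or Pydantic class.

```python
import warnings
from typing import Any

from pydantic import BaseModel


class DictAccessMixin:
    """Hypothetical shim: allow result["key"] during a deprecation period."""

    def __getitem__(self, key: str) -> Any:
        warnings.warn(
            "Dictionary-style access is deprecated; use attribute access instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Only declared model fields are reachable through the shim.
        if key in type(self).model_fields:
            return getattr(self, key)
        raise KeyError(key)


class PostprocessResult(DictAccessMixin, BaseModel):
    # Trimmed copy of the proposed schema.
    success: bool
    run_id: str


result = PostprocessResult(success=True, run_id="run_001")
new_style = result.run_id  # preferred access after migration
```

Existing code such as `result["run_id"]` would continue to return the same value, with a `DeprecationWarning` pointing at the call site.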

### References to Existing Code Patterns

The codebase already extensively uses Pydantic for configuration objects (e.g., `LocalConfig`, `DockerConfig`, `ModelRun`, etc.) in `/src/rompy/backends/config.py` and `/src/rompy/core/types.py`. This proposal extends the same pattern to return values, maintaining consistency with the existing architecture.

The `RompyBaseModel` class in `/src/rompy/core/types.py` already provides a foundation for Pydantic models in the codebase with custom configuration settings like `protected_namespaces=()` and `extra="forbid"`.
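As a rough illustration of how those settings would carry over to response models, the sketch below defines a stand-in base class with the same two options, assuming Pydantic v2. `ResponseBase` is hypothetical; in Rompy the response models would presumably inherit from `RompyBaseModel` instead.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class ResponseBase(BaseModel):
    """Hypothetical stand-in for RompyBaseModel with the settings named above."""

    model_config = ConfigDict(protected_namespaces=(), extra="forbid")


class PostprocessResult(ResponseBase):
    # Trimmed copy of the proposed schema.
    success: bool
    run_id: str


# With extra="forbid", an unexpected (here: misspelled) field is an error
# rather than being silently dropped.
try:
    PostprocessResult(success=True, run_id="run_001", sucess=True)
    rejected = False
except ValidationError as exc:
    rejected = any(err["type"] == "extra_forbidden" for err in exc.errors())
```

Inheriting `extra="forbid"` means a response can never carry undeclared keys, which keeps the returned structure aligned with the documented schema.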

## Implementation Steps

1. Create response schema models
2. Update function return types
3. Update documentation and examples
4. Add tests for response schemas
5. Update CLI to handle Pydantic responses
6. Provide migration guide for users
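For step 5, a CLI could serialize any response model to JSON and derive its exit status from the `success` field. The sketch below assumes Pydantic v2; `render` and `exit_code` are hypothetical helper names, not existing Rompy CLI functions.

```python
import json
from typing import Optional

from pydantic import BaseModel


class PostprocessResult(BaseModel):
    # Trimmed copy of the proposed schema.
    success: bool
    run_id: str
    error: Optional[str] = None


def render(result: BaseModel) -> str:
    """Hypothetical helper: serialize a response model for CLI output."""
    return result.model_dump_json(indent=2, exclude_none=True)


def exit_code(result: PostprocessResult) -> int:
    """Hypothetical helper: map the `success` field to a process exit status."""
    return 0 if result.success else 1


failed = PostprocessResult(success=False, run_id="run_001", error="missing output")
payload = json.loads(render(failed))
status = exit_code(failed)
```

Because serialization goes through the model, the CLI output stays in sync with the schema without hand-written formatting for each command.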

This enhancement will significantly improve the developer experience and API consistency of the Rompy library while maintaining its existing Pydantic-based configuration architecture.
