
Implement Well-Defined Return Schemas Using Pydantic Models #15

@tomdurrant

Description

Rompy currently returns plain dictionaries from functions such as postprocess() and pipeline(). While functional, this approach lacks type safety, clear documentation, and consistency across the codebase. This issue proposes well-defined return schemas built on Pydantic models to improve API consistency, developer experience, and maintainability.

Current State

Rompy already uses Pydantic extensively for configuration objects (e.g., ModelRun, LocalConfig, DockerConfig) but not for return values. Functions like postprocess() and pipeline() return dictionaries whose structure varies with the execution path:

def postprocess(self, processor: str = "noop", **kwargs) -> Dict[str, Any]:
    return {
        "success": True,
        "message": "No postprocessing requested - validation only",
        "run_id": model_run.run_id,
        "output_dir": str(check_dir),
        "validated": validate_outputs,
    }

Advantages of Implementing Structured Response Schemas

  1. Type Safety: Pydantic models validate field types when a result is constructed, so a malformed return value fails at its source instead of surfacing later in calling code.

  2. Documentation: Schema definitions serve as self-documenting code, making it clear what fields are returned and their expected types.

  3. Consistency: Standardized return types across the codebase improve API predictability and reduce cognitive load for developers.

  4. IDE Support: Better autocomplete and type hints in development environments.

  5. API Evolution: Easier to track changes to return values and maintain backward compatibility.

  6. Validation: Built-in validation for complex return structures ensures data integrity.
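The validation behaviour described in the list above can be sketched as follows. This is a minimal illustration assuming Pydantic v2; ExampleResult is a hypothetical, trimmed-down model and not part of Rompy.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class ExampleResult(BaseModel):
    """Hypothetical result model used only for illustration."""

    success: bool
    run_id: str
    message: Optional[str] = None


# A well-formed result is accepted and exposes typed attributes.
ok = ExampleResult(success=True, run_id="run_001")

# A malformed result is rejected when it is built, not when it is consumed.
try:
    ExampleResult(success="not-a-bool", run_id="run_001")
    validation_failed = False
except ValidationError as exc:
    validation_failed = True
    # Each error entry records which field failed validation.
    bad_fields = [err["loc"][0] for err in exc.errors()]
```

With a plain dictionary, the same mistake would pass silently and only show up wherever the value is first interpreted.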

Specific Use Cases and Examples

1. Postprocess Result Schema

Current return in postprocess():

return {
    "success": True,
    "message": "No postprocessing requested - validation only",
    "run_id": model_run.run_id,
    "output_dir": str(check_dir),
    "validated": validate_outputs,
}

Proposed Pydantic schema:

from pydantic import BaseModel, Field
from typing import Optional, Dict, Any

class PostprocessResult(BaseModel):
    success: bool = Field(..., description="Whether the postprocessing was successful")
    run_id: str = Field(..., description="The run ID associated with the postprocessing")
    output_dir: str = Field(..., description="Path to the output directory")
    validated: bool = Field(..., description="Whether validation was performed")
    message: Optional[str] = Field(None, description="Optional message about the operation")
    error: Optional[str] = Field(None, description="Error message if success is False")
    metadata: Optional[Dict[str, Any]] = Field(None, description="Additional metadata")
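A short usage sketch for the proposed schema, assuming Pydantic v2. The run ID and output path are placeholder values, and field descriptions are omitted for brevity.

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel


class PostprocessResult(BaseModel):
    # Same fields as the proposed schema, without descriptions.
    success: bool
    run_id: str
    output_dir: str
    validated: bool
    message: Optional[str] = None
    error: Optional[str] = None
    metadata: Optional[Dict[str, Any]] = None


result = PostprocessResult(
    success=True,
    run_id="example_run",       # placeholder value
    output_dir="/tmp/example",  # placeholder value
    validated=False,
    message="No postprocessing requested - validation only",
)

# Attribute access replaces string-keyed lookups such as result["success"].
succeeded = result.success

# Callers that still need a dictionary can request one explicitly;
# exclude_none drops the optional fields that were never set.
as_dict = result.model_dump(exclude_none=True)
```

The optional fields default to None, so the success path does not have to supply error or metadata.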

2. Pipeline Result Schema

Current return in pipeline():

return {
    "success": False,
    "run_id": model_run.run_id,
    "stages_completed": [],
    "run_backend": run_backend,
    "processor": processor,
    # ... more fields depending on execution path
}

Proposed Pydantic schema:

from pydantic import BaseModel, Field
from enum import Enum
from datetime import datetime
from typing import Any, Dict, List, Optional

class PipelineStage(str, Enum):
    GENERATE = "generate"
    RUN = "run"
    POSTPROCESS = "postprocess"

class PipelineResult(BaseModel):
    success: bool = Field(..., description="Whether the pipeline completed successfully")
    run_id: str = Field(..., description="The run ID associated with the pipeline")
    stages_completed: List[PipelineStage] = Field(
        default_factory=list, description="List of completed pipeline stages"
    )
    run_backend: str = Field(..., description="Backend used for the run stage")
    processor: str = Field(..., description="Processor used for the postprocess stage")
    staging_dir: Optional[str] = Field(None, description="Path to the staging directory")
    run_success: Optional[bool] = Field(None, description="Whether the run stage was successful")
    postprocess_results: Optional[Dict[str, Any]] = Field(
        None, description="Results from the postprocess stage"
    )
    stage: Optional[str] = Field(None, description="Stage where failure occurred, if any")
    message: Optional[str] = Field(None, description="Optional message about the operation")
    error: Optional[str] = Field(None, description="Error message if success is False")
    start_time: Optional[datetime] = Field(None, description="Pipeline start time")
    end_time: Optional[datetime] = Field(None, description="Pipeline end time")
    duration_seconds: Optional[float] = Field(None, description="Pipeline execution duration in seconds")
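The sketch below shows two properties of this schema under Pydantic v2: plain strings in stages_completed are coerced to PipelineStage members, and an unknown stage name is rejected. The model is trimmed to a few fields, and the timestamps are made-up values.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class PipelineStage(str, Enum):
    GENERATE = "generate"
    RUN = "run"
    POSTPROCESS = "postprocess"


class PipelineResult(BaseModel):
    # Trimmed to the fields needed for this illustration.
    success: bool
    run_id: str
    stages_completed: List[PipelineStage] = Field(default_factory=list)
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    duration_seconds: Optional[float] = None


start = datetime(2024, 1, 1, 12, 0, 0)  # made-up timestamp
end = start + timedelta(seconds=90)

result = PipelineResult(
    success=True,
    run_id="example_run",
    stages_completed=["generate", "run"],  # strings are coerced to enum members
    start_time=start,
    end_time=end,
    duration_seconds=(end - start).total_seconds(),
)

# An unknown stage name is rejected instead of being stored silently.
try:
    PipelineResult(success=False, run_id="example_run", stages_completed=["deploy"])
    unknown_stage_rejected = False
except ValidationError:
    unknown_stage_rejected = True
```

Here the caller computes duration_seconds; deriving it inside the model from start_time and end_time would be an alternative design.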

3. Model Run Result Schema

ModelRun.run() currently returns a boolean; it could instead return structured data:

from pydantic import BaseModel, Field
from datetime import datetime
from typing import Optional

class ModelRunResult(BaseModel):
    success: bool = Field(..., description="Whether the model run was successful")
    run_id: str = Field(..., description="The run ID")
    output_dir: str = Field(..., description="Path to output directory")
    start_time: datetime = Field(..., description="Model run start time")
    end_time: datetime = Field(..., description="Model run end time")
    duration_seconds: float = Field(..., description="Model run duration in seconds")
    backend_used: str = Field(..., description="Backend used for execution")
    workspace_dir: Optional[str] = Field(None, description="Workspace directory used")
    message: Optional[str] = Field(None, description="Optional message about the run")
    error: Optional[str] = Field(None, description="Error message if success is False")
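One way a boolean-returning run could be wrapped to produce this model is sketched below. The run_with_result helper and its parameters are hypothetical and do not exist in Rompy; the sketch assumes Pydantic v2.

```python
from datetime import datetime, timezone
from typing import Callable, Optional

from pydantic import BaseModel


class ModelRunResult(BaseModel):
    # Same fields as the proposed schema, without descriptions.
    success: bool
    run_id: str
    output_dir: str
    start_time: datetime
    end_time: datetime
    duration_seconds: float
    backend_used: str
    workspace_dir: Optional[str] = None
    message: Optional[str] = None
    error: Optional[str] = None


def run_with_result(
    run_id: str, output_dir: str, backend_used: str, execute: Callable[[], bool]
) -> ModelRunResult:
    """Hypothetical wrapper: time a boolean-returning run and report it as a model."""
    start = datetime.now(timezone.utc)
    error = None
    try:
        success = execute()
    except Exception as exc:  # record the failure instead of propagating it
        success = False
        error = str(exc)
    end = datetime.now(timezone.utc)
    return ModelRunResult(
        success=success,
        run_id=run_id,
        output_dir=output_dir,
        start_time=start,
        end_time=end,
        duration_seconds=(end - start).total_seconds(),
        backend_used=backend_used,
        error=error,
    )


def _failing_run() -> bool:
    raise RuntimeError("backend unavailable")


ok = run_with_result("example_run", "/tmp/example", "local", lambda: True)
failed = run_with_result("example_run", "/tmp/example", "local", _failing_run)
```

Whether exceptions should be captured in the error field or re-raised is a design decision this issue leaves open; the sketch captures them only to show the field in use.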

Implementation Recommendations

  1. Create Response Schema Module: Create a new module src/rompy/core/responses.py to house all response Pydantic models.

  2. Gradual Migration: Implement the new schemas incrementally, starting with the most critical return values (pipeline, postprocess).

  3. Backward Compatibility: Initially maintain both dictionary and Pydantic return options with a configuration flag, then deprecate dictionary returns in a future major version.

  4. Update Type Hints: Update all function signatures to return the appropriate Pydantic models.

  5. Documentation: Update docstrings to reflect the new structured return values.

  6. Testing: Add tests to verify the structure and validation of response schemas.
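The backward-compatibility recommendation above could look roughly like this. The as_dict parameter is a hypothetical stand-in for the configuration flag, and this postprocess() is a simplified placeholder rather than the real method; Pydantic v2 is assumed.

```python
import warnings
from typing import Any, Dict, Optional, Union

from pydantic import BaseModel


class PostprocessResult(BaseModel):
    success: bool
    run_id: str
    output_dir: str
    validated: bool
    message: Optional[str] = None


def postprocess(
    run_id: str, output_dir: str, as_dict: bool = False
) -> Union[PostprocessResult, Dict[str, Any]]:
    """Placeholder for the real method; `as_dict` is a hypothetical flag."""
    result = PostprocessResult(
        success=True, run_id=run_id, output_dir=output_dir, validated=False
    )
    if as_dict:
        # Legacy path: warn, then hand back the old dictionary shape.
        warnings.warn(
            "Dictionary return values are deprecated; use PostprocessResult.",
            DeprecationWarning,
            stacklevel=2,
        )
        return result.model_dump()
    return result
```

Building the model first and converting on the legacy path means both return forms pass through the same validation.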

Potential Impact on Codebase

  1. Breaking Changes: Once dictionary returns are removed, this will be a breaking change for users who rely on specific dictionary structures in return values.

  2. Improved Reliability: Better type safety will reduce runtime errors related to unexpected return value structures.

  3. Code Clarity: More explicit return value definitions will improve code maintainability.

  4. Migration Effort: Existing code that processes return values will need to be updated to work with Pydantic models instead of raw dictionaries.

References to Existing Code Patterns

The codebase already uses Pydantic extensively for configuration objects (e.g., LocalConfig, DockerConfig, ModelRun) in /src/rompy/backends/config.py and /src/rompy/core/types.py. This proposal extends the same pattern to return values, maintaining consistency with the existing architecture.

The RompyBaseModel class in /src/rompy/core/types.py already provides a foundation for Pydantic models in the codebase with custom configuration settings like protected_namespaces=() and extra="forbid".
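To show what those two settings would mean for response models, here is a minimal sketch assuming Pydantic v2. ResponseBase is a hypothetical stand-in that only copies the two settings named above; it is not the actual RompyBaseModel, which may configure more.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class ResponseBase(BaseModel):
    """Hypothetical stand-in for RompyBaseModel."""

    model_config = ConfigDict(protected_namespaces=(), extra="forbid")


class PostprocessResult(ResponseBase):
    # Trimmed to two fields for illustration.
    success: bool
    run_id: str


# With extra="forbid", a misspelled or unexpected key is an error
# rather than being silently ignored.
try:
    PostprocessResult(success=True, run_id="example_run", sucess=True)
    typo_rejected = False
except ValidationError:
    typo_rejected = True
```

Inheriting response models from the existing base class would give them this strictness without extra configuration.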

Implementation Steps

  1. Create response schema models
  2. Update function return types
  3. Update documentation and examples
  4. Add tests for response schemas
  5. Update CLI to handle Pydantic responses
  6. Provide migration guide for users
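For the CLI step above, one possible approach is to serialize the response model to JSON and derive the exit status from its success field. The render_result and exit_code helpers are hypothetical names, not existing Rompy CLI functions; Pydantic v2 is assumed.

```python
from typing import Optional

from pydantic import BaseModel


class PipelineResult(BaseModel):
    # Trimmed to a few fields for illustration.
    success: bool
    run_id: str
    error: Optional[str] = None


def render_result(result: BaseModel) -> str:
    """Serialize any response model to indented JSON for terminal output."""
    return result.model_dump_json(indent=2, exclude_none=True)


def exit_code(result: PipelineResult) -> int:
    """Map the `success` field to a conventional process exit status."""
    return 0 if result.success else 1


result = PipelineResult(success=False, run_id="example_run", error="run stage failed")
print(render_result(result))
```

Because render_result accepts any BaseModel, the same helper could serve every response schema the CLI needs to print.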

This enhancement should improve the developer experience and API consistency of the Rompy library while staying consistent with its existing Pydantic-based configuration architecture.

Labels: enhancement (New feature or request)