@benedekaibas
Enhance Type Safety and YAML Schema Validation

Description

This PR introduces a structured validation layer for loading YAML configuration files, specifically addressing the "Any" loophole that allows malformed data to cause runtime crashes.

The Problem

Previously, GatorGrade loaded YAML data as an Any type. If a user provided a list where a string was expected (e.g., in a description field), the malformed value passed silently through main.py and parse_config.py and only surfaced as a cryptic TypeError deep inside the recursive logic of in_file_path.py.

The Solution

  • Validation Layer: Implemented a chain of validation steps (using Pydantic models) that checks the structure and types of the YAML data immediately after loading.
  • Better Error Message: The program now exits with a clear, user-friendly error message at the parsing stage if the configuration is invalid, preventing deep-logic crashes.
  • Concrete Type Hinting: Replaced Any with structured types/models in in_file_path.py. This enables static analysis tools like Mypy to verify the code's logic during development.
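
For reviewers who want a concrete picture, the validation step described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the GatorCheck model here is a one-field stand-in (only description is confirmed by this description), validate_checks is a hypothetical helper name, and the input is assumed to be the already-parsed YAML with a list of checks at the top level.

```python
from typing import Any, List

from pydantic import BaseModel, ConfigDict, ValidationError


class GatorCheck(BaseModel):
    """Hypothetical stand-in model; only `description` is confirmed by this PR."""

    # Assumption: tolerate other keys so this sketch only enforces `description`.
    model_config = ConfigDict(extra="allow")
    description: str


def validate_checks(raw: Any, source: str) -> List[GatorCheck]:
    """Validate already-parsed YAML data, exiting early with a readable message."""
    header = (
        f"Configuration Error in: {source}\n"
        "The YAML data does not match the required format.\n"
    )
    # Assumption: the top level of the configuration is a list of checks.
    if not isinstance(raw, list):
        raise SystemExit(header + "Details: expected a list of checks at the top level")
    try:
        # Each item is checked against the model right after loading.
        return [GatorCheck.model_validate(item) for item in raw]
    except ValidationError as err:
        # Stop at the parsing stage instead of crashing deep in later logic.
        raise SystemExit(header + f"Details: {err}") from err


checks = validate_checks([{"description": "Has a README"}], "example.yml")
print(checks[0].description)
```

Raising SystemExit with a message makes the interpreter print that message and exit with a non-zero status, which is one simple way to get the early, readable failure described above.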

I have chosen Pydantic because:

  • Its models serve as both runtime validators and static type hints. This directly solves the Any type issue while providing the schema enforcement we need.
  • Pydantic has excellent support for type checkers, ensuring that the safety gained at the parsing stage carries through to the rest of the application.
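
The dual role mentioned in the first point can be illustrated with a small sketch. The GatorCheck class below is again a hypothetical one-field stand-in and shout is an illustrative helper, neither taken from this PR; the point is only that the same class rejects bad data at runtime and gives a type checker a concrete attribute type.

```python
from pydantic import BaseModel, ValidationError


class GatorCheck(BaseModel):
    # Hypothetical stand-in: only `description` is known from this PR.
    description: str


def shout(check: GatorCheck) -> str:
    # Static role: a type checker sees `description` as str, so .upper() is
    # accepted, while check.description.append(...) would be flagged.
    return check.description.upper()


# Runtime role: building the model validates the incoming data.
valid = GatorCheck.model_validate({"description": "Check for a README"})
print(shout(valid))

try:
    GatorCheck.model_validate({"description": ["not", "a", "string"]})
except ValidationError as err:
    print(f"rejected with {err.error_count()} error(s)")
```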

Running uv run gatorgrade --config yaml_type_safety_test.yml with my changes applied produces the following result:

Configuration Error in: yaml_type_safety_test.yml
The YAML data does not match the required format.
Details: 1 validation error for GatorCheck
description
  Input should be a valid string [type=string_type, input_value=['This', 'is', 'a', 'list', 'not', 'a', 'string'], input_type=list]
    For further information visit https://errors.pydantic.dev/2.12/v/string_type
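
The pieces of that message (field location, error type, offending input) are also available in structured form through Pydantic's ValidationError.errors(). As a hedged sketch of how a shorter message could be built from them, assuming a one-field stand-in for GatorCheck and a hypothetical describe_errors helper that is not part of this PR:

```python
from typing import List

from pydantic import BaseModel, ValidationError


class GatorCheck(BaseModel):
    # Hypothetical stand-in: only `description` is known from this PR.
    description: str


def describe_errors(err: ValidationError) -> List[str]:
    """Turn each validation error into one short, readable line."""
    lines = []
    for detail in err.errors():
        # `loc` is a tuple naming the path to the offending field.
        location = ".".join(str(part) for part in detail["loc"])
        received = type(detail["input"]).__name__
        lines.append(f"{location}: {detail['msg']} (got {received})")
    return lines


try:
    GatorCheck.model_validate({"description": ["This", "is", "a", "list"]})
except ValidationError as err:
    print("\n".join(describe_errors(err)))
```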

With the typed models in place, static type checkers can now analyze this code as well.

To demonstrate, I added the following snippet to gatorgrade/input/in_file_path.py as it was before my changes and ran mypy on it:

def add_checks_to_list(
    path: Optional[str], data_list: List[Any], reformatted_data: List[Any]
) -> None:
    current_path = path
    for check_item in data_list:
        # INTRODUCED TYPE ERROR:
        # Trying to call a string method on the item.
        # Since check_item is 'Any', Mypy remains silent.
        print(check_item.lower()) 

        ddict = check_item # Assuming it's already a dict in the old version
        for item in ddict:
            pass

The output of mypy: Success: no issues found in 1 source file

This example shows that before my changes, mypy could not detect type-related errors in this code, because every item was typed as Any.

I then added the same "buggy" code on top of my changes, with this result:

def add_checks_to_list(
    path: Optional[str], data_list: List[GatorCheck], reformatted_data: List[Any]
) -> None:
    current_path = path
    for check_item in data_list:
        # INTRODUCED TYPE ERROR:
        # I am still trying to call '.lower()' on the object.
        # Mypy now knows check_item is a 'GatorCheck' model and flags this.
        print(check_item.lower()) 

        # Another Logic Error: Trying to treat a string attribute as a list
        check_item.description.append("This will fail")

        ddict = check_item.model_dump()
        for item in ddict:
            # ... existing recursive logic remains the same ...
            if isinstance(ddict[item], list):
                pass

The output of mypy:

in_file_path.py:56: error: "GatorCheck" has no attribute "lower"  [attr-defined]
in_file_path.py:59: error: "str" has no attribute "append"  [attr-defined]
Found 2 errors in 1 file (checked 1 source file)

Type-related errors in the codebase are now detectable. To reproduce these results, copy the first example into in_file_path.py on the main repository and the second into in_file_path.py on my fork, then run the latest version of mypy on each, as I did.

So far, I have only changed in_file_path.py, to show that simply replacing Any with a concrete model lets the current implementation benefit from static analysis immediately: type checkers can catch type-related errors that were previously invisible during development.

Linked Issues

closes: #175

Type of Change

  • Feature
  • Bug fix
  • Documentation

Contributors

Reminder

  • All GitHub Actions should be in a passing state before any pull request is merged.
  • All PRs must be reviewed by at least one team member and one member of the Integration team!
  • Any issues this PR closes are tagged in the description!

Development

Successfully merging this pull request may close these issues.

Type checkers cannot analyze configuration files because YAML is treated as Any
