Walkthrough

Introduces a new ONNX validation module for time-series forecasting, supporting model download, inference, and metrics; adds config and environment files; provides an SVR-to-ONNX training utility; updates the base interfaces and validation runner for task-scoped modules and test mode; adds an end-to-end ONNX test; and adds a VSCode setting.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller
    participant ValidationRunner
    participant ONNXModule
    participant HFHub
    participant HTTP as DataSource (HTTP)
    participant ORT as ONNX Runtime
    Caller->>ValidationRunner: perform_validation(task_id, module="onnx", input_data)
    ValidationRunner->>ONNXModule: instantiate (per task_id, config)
    ValidationRunner->>ONNXModule: validate(input_data, task_id)
    ONNXModule->>HFHub: download model (repo_id, filename, rev)
    HFHub-->>ONNXModule: model path
    ONNXModule->>ORT: create InferenceSession(model)
    ONNXModule->>HTTP: GET test_data_url
    HTTP-->>ONNXModule: CSV data
    ONNXModule->>ONNXModule: preprocess/split features, target
    ONNXModule->>ORT: run inference(features)
    ORT-->>ONNXModule: predictions
    ONNXModule->>ONNXModule: compute metrics (mae, rmse, mape, ...)
    ONNXModule-->>ValidationRunner: ONNXMetrics
    ValidationRunner-->>Caller: result
```

```mermaid
sequenceDiagram
    participant Test as test_onnx.py
    participant Runner as ValidationRunner(test_mode=True)
    participant Module as ONNXModule
    participant Mocks as Mocked deps
    Test->>Mocks: patch requests.get, FedLedger
    Test->>Runner: perform_validation(...)
    alt error occurs
        Module-->>Runner: raise Recoverable/ValueError
        Runner-->>Test: return None (no exit due to test_mode)
    else success
        Module-->>Runner: ONNXMetrics
        Runner-->>Test: metrics
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 14
🔭 Outside diff range comments (1)
validator/modules/base.py (1)
30-35: Validate return type consistency with base implementation

The method signature indicates it returns `BaseMetrics`, but the base implementation returns `None` (via `pass`). This could cause type checking issues. Consider either making it abstract or providing a proper base implementation.

```diff
-    @abstractmethod
     def validate(self, data: BaseInputData, **kwargs) -> BaseMetrics:
         """
         Download/prep the repo/revision, run validation, and return metrics
         parsed into a Pydantic model.
         """
-        pass
+        raise NotImplementedError("Subclasses must implement validate()")
```
🧹 Nitpick comments (12)
configs/onnx.json (1)
7-7: Add trailing newline to JSON file

JSON files should end with a newline character for better compatibility with various tools and to follow common conventions.

```diff
   "evaluation_metrics": ["mae", "rmse", "mape", "smape"],
   "output_dir": "/tmp/onnx_validation"
-}
+}
+
```

validator/modules/base.py (1)
17-21: Consider making task_type a class-level constant

The `task_type` attribute should be defined as a class-level constant (e.g., `task_type: ClassVar[str]`) since it's meant to identify the module type rather than being an instance attribute.

```diff
+from typing import ClassVar
 from abc import ABC, abstractmethod
 from pydantic import BaseModel


 class BaseValidationModule(ABC):
     config_schema: type[BaseConfig]
     input_data_schema: type[BaseInputData]
     metrics_schema: type[BaseMetrics]
-    task_type: str
+    task_type: ClassVar[str]
```

validator/modules/onnx/train_svm.py (2)
129-131: Redundant traceback import

The `traceback` module is imported twice (lines 129 and 161). Consider importing it once at the module level.

Add at the top of the file with other imports:

```diff
 import pandas as pd
 import numpy as np
+import traceback
 from sklearn.svm import SVR
```

Then remove the local imports:

```diff
 except Exception as e:
     print(f"Error: Could not convert to ONNX format: {e}")
-    import traceback
-    traceback.print_exc()
```

And:

```diff
 except Exception as e:
     print(f"Error during training: {e}")
-    import traceback
-    traceback.print_exc()
```
143-145: Consider using sys.exit for error conditions

When the training data is not found, the function returns silently. Consider using `sys.exit(1)` to indicate failure more explicitly.

```diff
+    import sys
+
     if not csv_path.exists():
         print(f"Error: Training data not found at {csv_path}")
-        return
+        sys.exit(1)
```

validator/modules/onnx/__init__.py (5)
4-4: Remove unused imports flagged by static analysis

Both imports are unused and flagged by Ruff (F401). Drop them to keep the module clean.

```diff
-from pydantic import BaseModel
@@
-import os
```

Also applies to: 14-14
92-93: Force CPU provider to avoid environment-specific provider issues

Explicitly set providers to ensure consistent behavior (e.g., avoid CUDA provider selection on systems without GPU).

```diff
-        self.model = ort.InferenceSession(model_path)
+        self.model = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
```
74-79: Remove unused `data_processor` attribute and related cleanup

It's never used; removing it simplifies the module and avoids confusion.

```diff
 def __init__(self, config: ONNXConfig, **kwargs):
     """Initialize the OONX validation module"""
     self.config = config
     self.model = None
-    self.data_processor = None
@@
 def cleanup(self):
     """Clean up resources"""
     if self.model is not None:
         del self.model
         self.model = None
-    if self.data_processor is not None:
-        del self.data_processor
-        self.data_processor = None
```

Also applies to: 284-287
243-271: Minor: Validate output and shape alignment before metric computation

Double-check shapes to avoid subtle mismatches before computing metrics. Optional, but protects against unexpected ONNX output shapes.
You already verify lengths match pre-inference; after inference, add an assert (or explicit check) that predictions shape matches ground truth shape exactly before flattening and computing metrics. If desired, I can provide a small helper to enforce this.
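For reference, a minimal sketch of such a helper (the function name and error type are illustrative, not taken from the module):

```python
import numpy as np


def align_predictions(predictions, targets) -> tuple[np.ndarray, np.ndarray]:
    """Flatten ONNX output and ground truth, failing loudly on any size mismatch."""
    preds = np.asarray(predictions)
    truth = np.asarray(targets)
    if preds.size != truth.size:
        raise ValueError(
            f"Prediction shape {preds.shape} does not match target shape {truth.shape}"
        )
    return preds.reshape(-1), truth.reshape(-1)
```

Calling this once before metric computation keeps the shape check in a single place.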
33-46: Naming/consistency: keep module "task_type" consistent with input "task_type"

The module declares task_type = "time_series_forecasting" while ONNXInputData carries a free-form task_type string consumed in metric selection. Consider either:
- Normalize incoming task_type values to a known set; or
- Drive metric selection purely from config/required_metrics for determinism.
Also applies to: 66-73
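For the first option (normalizing incoming values), a rough sketch — the alias table and accepted set are illustrative assumptions, not values taken from the module:

```python
KNOWN_TASK_TYPES = {"time_series_forecasting"}

# Map free-form inputs onto the canonical task_type the module declares.
TASK_TYPE_ALIASES = {
    "timeseries": "time_series_forecasting",
    "time-series-forecasting": "time_series_forecasting",
    "forecasting": "time_series_forecasting",
}


def normalize_task_type(raw: str) -> str:
    value = raw.strip().lower().replace(" ", "_")
    value = TASK_TYPE_ALIASES.get(value, value)
    if value not in KNOWN_TASK_TYPES:
        raise ValueError(f"Unsupported task_type: {raw!r}")
    return value
```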
test_onnx.py (3)
1-1: Remove unused imports to satisfy Ruff and keep test lean

These imports are unused.

```diff
-import numpy as np
@@
-from validator.modules.onnx import (
-    ONNXValidationModule,
-    ONNXConfig,
-    ONNXInputData,
-    ONNXMetrics,
-)
+from validator.modules.onnx import ONNXInputData
```

Also applies to: 13-17
57-62: Prefer groupby+transform for rolling features to preserve index alignment

Using rolling().mean().values relies on positional alignment. transform keeps index alignment explicit and is more idiomatic.

```diff
-    for window in [3, 7, 14]:
-        df[f"number_sold_rolling_{window}"] = (
-            df.groupby(["store", "product"])["number_sold"]
-            .rolling(window=window)
-            .mean()
-            .values
-        )
+    for window in [3, 7, 14]:
+        df[f"number_sold_rolling_{window}"] = (
+            df.groupby(["store", "product"])["number_sold"]
+            .transform(lambda s: s.rolling(window=window).mean())
+        )
```
153-170: Strengthen assertions instead of print-based checks

Replace prints with assertions to make the test actionable and CI-friendly. For example, assert metrics is not None and individual metrics are finite/non-negative.
If you’d like, I can provide a pytest-style version that asserts MAE/RMSE are finite and within a loose bound.
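A rough sketch of what those assertions could look like — it assumes the returned metrics object exposes `mae` and `rmse` attributes, and the bound is arbitrary:

```python
import math


def assert_reasonable_metrics(metrics, loose_bound: float = 1e6) -> None:
    """Sanity-check forecasting metrics: present, finite, non-negative, loosely bounded."""
    assert metrics is not None, "validation returned no metrics"
    for name in ("mae", "rmse"):
        value = float(getattr(metrics, name))
        assert math.isfinite(value), f"{name} should be finite, got {value}"
        assert value >= 0.0, f"{name} should be non-negative, got {value}"
        assert value < loose_bound, f"{name} unexpectedly large: {value}"
```

Calling this at the end of test_onnx_validation_works in place of the prints keeps the test CI-friendly.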
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (7)
- `.DS_Store` is excluded by `!**/.DS_Store`
- `validator/.DS_Store` is excluded by `!**/.DS_Store`
- `validator/modules/.DS_Store` is excluded by `!**/.DS_Store`
- `validator/modules/onnx/.DS_Store` is excluded by `!**/.DS_Store`
- `validator/modules/onnx/demo_data/.DS_Store` is excluded by `!**/.DS_Store`
- `validator/modules/onnx/demo_data/test.csv` is excluded by `!**/*.csv`
- `validator/modules/onnx/demo_data/train.csv` is excluded by `!**/*.csv`
📒 Files selected for processing (8)
- `.vscode/settings.json` (1 hunks)
- `configs/onnx.json` (1 hunks)
- `test_onnx.py` (1 hunks)
- `validator/modules/base.py` (3 hunks)
- `validator/modules/onnx/__init__.py` (1 hunks)
- `validator/modules/onnx/environment.yml` (1 hunks)
- `validator/modules/onnx/train_svm.py` (1 hunks)
- `validator/validation_runner.py` (5 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
validator/modules/onnx/environment.yml
[warning] 11-11: wrong indentation: expected 6 but found 4
(indentation)
[error] 12-12: no new line character at the end of file
(new-line-at-end-of-file)
🪛 Ruff (0.12.2)
validator/modules/onnx/train_svm.py
19-19: f-string without any placeholders
Remove extraneous f prefix
(F541)
test_onnx.py
1-1: numpy imported but unused
Remove unused import: numpy
(F401)
13-13: validator.modules.onnx.ONNXValidationModule imported but unused
Remove unused import
(F401)
14-14: validator.modules.onnx.ONNXConfig imported but unused
Remove unused import
(F401)
16-16: validator.modules.onnx.ONNXMetrics imported but unused
Remove unused import
(F401)
validator/modules/onnx/__init__.py
4-4: pydantic.BaseModel imported but unused
(F401)
14-14: os imported but unused
(F401)
99-99: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
153-153: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
277-277: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🔇 Additional comments (4)
validator/modules/base.py (1)
5-6: LGTM! Good use of frozen models for immutability

Using `frozen=True` on the base classes ensures configuration and data integrity throughout the validation pipeline, preventing accidental mutations.

validator/validation_runner.py (3)
9-9: Public import surface looks good

Exposing only BaseValidationModule, BaseInputData, BaseMetrics simplifies dependency boundaries.
25-26: test_mode flag addition is appropriate

This enables non-exiting behavior during tests; good addition for testability.
72-72: Passing task_id into validate aligns with per-task module isolation

Looks correct with the updated module API.
```json
{
    "githubPullRequests.ignoredPullRequestBranches": [
        "main"
    ]
}
```

(no newline at end of file)
💡 Verification agent
🧩 Analysis chain
Consider adding this file to .gitignore
IDE-specific configuration files like .vscode/settings.json are typically excluded from version control as they contain user-specific preferences. Consider adding .vscode/ to .gitignore to avoid committing personal workspace settings.
🏁 Script executed:
```bash
#!/bin/bash
# Description: Check if .vscode is already in .gitignore

# Check if .gitignore exists and if it contains .vscode
if [ -f .gitignore ]; then
  echo "=== Current .gitignore content related to .vscode ==="
  grep -E "^\.vscode|^\*\.vscode" .gitignore || echo "No .vscode entry found in .gitignore"
else
  echo ".gitignore file does not exist"
fi

# Check if there are other VS Code settings files in the repository
echo -e "\n=== Other VS Code configuration files in the repository ==="
fd -t f . .vscode --exec echo "{}"
```

Length of output: 462
Action: Add .vscode/ to .gitignore or remove committed workspace settings
Verified: .vscode/settings.json is tracked in the repo and .gitignore does not contain a .vscode entry. Workspace-specific VS Code settings are typically not committed.
Files to address:
- .vscode/settings.json — decide whether this should be shared. If not, remove it from the repo.
- .gitignore — add an entry to ignore VS Code workspace settings.
Suggested change (add to .gitignore):
```diff
+.vscode/
```
If you want to remove the tracked file after updating .gitignore:
- git rm --cached .vscode/settings.json
- git commit -m "Remove user VS Code settings and ignore .vscode/"
🤖 Prompt for AI Agents
.vscode/settings.json around lines 1 to 5: workspace-specific VS Code settings
are committed but should not be tracked; add ".vscode/" to .gitignore to ignore
workspace settings, then remove the tracked file from the index with git rm
--cached .vscode/settings.json and commit (suggest commit message: "Remove user
VS Code settings and ignore .vscode/"), or if these settings must be shared
intentionally, remove the .vscode/ entry from .gitignore and keep the file
committed.
```python
def test_onnx_validation_works(mock_requests, mock_fedledger):
    """Test that ONNX validation can complete successfully using real HuggingFace model"""

    test_csv = load_and_preprocess_demo_data()
    if test_csv is None:
        print("Failed to load demo data")
        return False

    # Mock API
    mock_api = MagicMock()
    mock_api.list_tasks.return_value = [
        {"id": 1, "task_type": "onnx", "title": "Test", "data": {}}
    ]
    mock_api.mark_assignment_as_failed = MagicMock()
    mock_fedledger.return_value = mock_api

    # Mock HTTP requests for CSV data (use real HuggingFace download for model)
    def mock_get_side_effect(url):
        response = MagicMock()
        response.raise_for_status.return_value = None
        response.text = test_csv  # CSV contains both features and target
        return response

    mock_requests.side_effect = mock_get_side_effect

    runner = ValidationRunner(
```
🛠️ Refactor suggestion
Make the test fully offline and deterministic by mocking model download/inference
The test currently performs a real HuggingFace download and actual ONNX inference, which is brittle in CI/offline environments. Mock hf_hub_download to a local test artifact and mock onnxruntime.InferenceSession.run to return plausible outputs.
Example (concise):
```python
from unittest.mock import MagicMock, patch

@patch("validator.validation_runner.FedLedger")
@patch("requests.get")
@patch("validator.modules.onnx.hf_hub_download", return_value="dummy.onnx")
@patch("validator.modules.onnx.ort.InferenceSession")
def test_onnx_validation_works(mock_sess, mock_hf, mock_requests, mock_fedledger):
    # Configure InferenceSession mock to output a vector matching input rows
    sess = MagicMock()
    sess.get_inputs.return_value = [MagicMock(name="input_0")]
    # Suppose test_features has shape (N, F); return (N, 1) predictions
    def run_side_effect(_, feed):
        x = next(iter(feed.values()))
        return [x[:, :1]]
    sess.run.side_effect = run_side_effect
    mock_sess.return_value = sess
    ...
```

Also applies to: 129-145
```python
    @abstractmethod
    def __init__(self, config: BaseConfig, **kwargs):
        """
        """.
```
Fix incomplete docstring
The docstring is incomplete: it is empty and its closing quotes carry a stray trailing period.
```diff
-        """.
+        """
```
🤖 Prompt for AI Agents
In validator/modules/base.py around line 25 the module/class/function docstring
is incomplete and currently contains only '""'. Replace it with a complete,
descriptive triple-quoted docstring that briefly explains the purpose and
responsibilities of the module/class/function, any important
parameters/attributes or return values if applicable, and include a short
example or notes if helpful; ensure proper formatting ("""...""") and
punctuation so linters and documentation tools pick it up.
```python
        except Exception as e:
            print(f"Error loading model from {model_repo_id}: {e}")
            raise ValueError(f"Failed to load ONNX model: {e}")
```
🛠️ Refactor suggestion
Preserve original exception context with raise ... from e
Exception chaining will significantly improve debuggability and aligns with Ruff B904.
```diff
@@
-        except Exception as e:
-            print(f"Error loading model from {model_repo_id}: {e}")
-            raise ValueError(f"Failed to load ONNX model: {e}")
+        except Exception as e:
+            print(f"Error loading model from {model_repo_id}: {e}")
+            raise ValueError(f"Failed to load ONNX model: {e}") from e
@@
-        except Exception as e:
-            print(f"Error during inference: {e}")
-            raise ValueError(f"ONNX inference failed: {e}")
+        except Exception as e:
+            print(f"Error during inference: {e}")
+            raise ValueError(f"ONNX inference failed: {e}") from e
@@
-        except Exception as e:
-            if isinstance(e, InvalidTimeSeriesDataException):
-                raise e
-            else:
-                raise ValueError(f"Validation failed: {str(e)}")
+        except Exception as e:
+            if isinstance(e, InvalidTimeSeriesDataException):
+                raise e
+            else:
+                raise ValueError(f"Validation failed: {str(e)}") from e
```

Also applies to: 151-154, 273-278
🧰 Tools
🪛 Ruff (0.12.2)
99-99: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
In validator/modules/onnx/__init__.py around lines 97-100 (and likewise update
occurrences at 151-154 and 273-278), the code catches exceptions and re-raises a
new ValueError without preserving the original exception context; change the
re-raise to use exception chaining (raise ValueError("Failed to load ONNX model:
{e}") from e) so the original exception is attached to the new one, ensuring
full traceback and better debuggability.
```python
        response = requests.get(data_url)
        response.raise_for_status()
```
🛠️ Refactor suggestion
Add HTTP timeouts and raise recoverable exceptions on CSV load failures
Avoid hanging on network calls and surface data issues as recoverable to align with the module’s error semantics.
```diff
@@
-        response = requests.get(data_url)
+        response = requests.get(data_url, timeout=30)
         response.raise_for_status()
@@
-        except Exception as e:
-            print(f"Error loading CSV data: {e}")
-            return None, None
+        except requests.RequestException as e:
+            # Network-related issues: recoverable after retry
+            raise InvalidTimeSeriesDataException(f"Error fetching CSV data from {data_url}: {e}") from e
+        except pd.errors.ParserError as e:
+            # Bad CSV: recoverable if data source is fixed
+            raise InvalidTimeSeriesDataException(f"Failed to parse CSV data from {data_url}: {e}") from e
+        except Exception as e:
+            raise InvalidTimeSeriesDataException(f"Unexpected error loading CSV data: {e}") from e
```

Also applies to: 130-133
🤖 Prompt for AI Agents
In validator/modules/onnx/__init__.py around lines 107-109 and 130-133, network
calls to requests.get should include a timeout and CSV parsing failures should
be converted into the module's recoverable exception type: add a reasonable
timeout argument to requests.get (e.g. timeout=10) and wrap the call in
try/except catching requests.Timeout and requests.RequestException to raise the
module's RecoverableError (or the existing recoverable exception class used by
this module) with a descriptive message; likewise wrap CSV load/parsing code in
try/except catching parsing errors (e.g. csv.Error, pandas errors, ValueError)
and raise the same RecoverableError so data issues are surfaced as recoverable
rather than causing an uncaught crash.
```python
    print(f"First few rows:")
    print(df.head())
```
Remove unnecessary f-string prefix
Line 19 uses an f-string without any placeholders.
```diff
-    print(f"First few rows:")
+    print("First few rows:")
```
🧰 Tools
🪛 Ruff (0.12.2)
19-19: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
In validator/modules/onnx/train_svm.py around lines 19 to 20, the first print
uses an f-string without placeholders; remove the unnecessary f prefix so it
becomes a normal string literal (i.e., change print(f"First few rows:") to
print("First few rows:")) and keep the following print(df.head()) unchanged.
```python
    for window in [3, 7, 14]:
        df[f"number_sold_rolling_{window}"] = (
            df.groupby(["store", "product"])["number_sold"]
            .rolling(window=window)
            .mean()
            .values
        )
```
Potential alignment issues with rolling window calculations
Using .values after a grouped rolling operation can cause misalignment if the groups have different sizes or if the DataFrame order changes. Consider using .reset_index(drop=True) or .transform() for safer assignment.
```diff
 for window in [3, 7, 14]:
-    df[f"number_sold_rolling_{window}"] = (
-        df.groupby(["store", "product"])["number_sold"]
-        .rolling(window=window)
-        .mean()
-        .values
-    )
+    df[f"number_sold_rolling_{window}"] = df.groupby(["store", "product"])["number_sold"].transform(
+        lambda x: x.rolling(window=window, min_periods=1).mean()
+    )
```
🤖 Prompt for AI Agents
In validator/modules/onnx/train_svm.py around lines 40 to 46, the code uses
.values after a grouped rolling mean which can misalign results across groups;
replace that pattern with a group-aligned transform such as
df[f"number_sold_rolling_{window}"] =
df.groupby(["store","product"])["number_sold"].transform(lambda x:
x.rolling(window=window).mean()) so the rolling means stay aligned to the
original index and you avoid using .values.
```python
    if X.shape[0] > 100000:
        subset_size = 100000
        indices = np.random.choice(X.shape[0], subset_size, replace=False)
        X = X.iloc[indices]
        y = y.iloc[indices]
        print(f"Using {subset_size} samples for training")
```
🛠️ Refactor suggestion
Set random seed for reproducibility
The random sampling for large datasets should use a fixed seed to ensure reproducible results across different training runs.
```diff
 if X.shape[0] > 100000:
     subset_size = 100000
+    np.random.seed(42)  # Set seed for reproducibility
     indices = np.random.choice(X.shape[0], subset_size, replace=False)
     X = X.iloc[indices]
     y = y.iloc[indices]
     print(f"Using {subset_size} samples for training")
```
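Alternatively, a generator-based sketch that avoids touching global NumPy state — the SEED constant is illustrative, and X/y are the frames built earlier in the script:

```python
import numpy as np

SEED = 42

if X.shape[0] > 100000:
    subset_size = 100000
    rng = np.random.default_rng(SEED)  # local, reproducible generator
    indices = rng.choice(X.shape[0], subset_size, replace=False)
    X = X.iloc[indices]
    y = y.iloc[indices]
    print(f"Using {subset_size} samples for training")
```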
🤖 Prompt for AI Agents
In validator/modules/onnx/train_svm.py around lines 84 to 89 the random sampling
uses np.random.choice without a fixed seed which makes training
non-reproducible; modify the sampling to use a deterministic RNG (for example
create rng = np.random.default_rng(SEED) with a chosen constant SEED and call
rng.choice(X.shape[0], subset_size, replace=False)) or set np.random.seed(SEED)
immediately before np.random.choice, and apply the returned indices to X and y
as before, then keep the same print message.
```diff
         # Map task_id to module instance
         self.task_id_to_module: dict[str, BaseValidationModule] = {}
         for task_id in self.task_ids:
             config = load_config_for_task(task_id, self.module, module_cls.config_schema)
-            self.module_config_to_module.setdefault(config, module_cls(config=config))
-            self.task_id_to_module[task_id] = self.module_config_to_module[config]
+            # Create a new module instance for each task_id
+            self.task_id_to_module[task_id] = module_cls(config=config)
```
Critical: model_validate_json misused on dict in run(); should use model_validate
The call in run() uses model_validate_json on a Python dict, which will error. It should be model_validate(merged_data), or serialize to JSON string first.
Here’s the minimal fix (outside this hunk):
```python
# In run(), line where input_data is constructed:
# Before:
input_data = module_obj.input_data_schema.model_validate_json(merged_data)
# After:
input_data = module_obj.input_data_schema.model_validate(merged_data)
```

Optionally, if the upstream API returns a JSON string, then pass that to `model_validate_json` instead.
🤖 Prompt for AI Agents
In validator/validation_runner.py around lines 58 to 64, the code calls
model_validate_json on a Python dict which will raise an error; replace that
call with model_validate(merged_data) so the dict is validated directly (or, if
you actually have a JSON string from upstream, pass that string to
model_validate_json instead). Ensure the change is made where input_data is
constructed in run().
🛠️ Refactor suggestion
Normalize task_id keys to strings to avoid dict misses and type drift
self.task_ids is annotated as list[str], but tests pass ints. Normalize keys on both store and lookup to prevent subtle bugs.
```diff
-        # Map task_id to module instance
-        self.task_id_to_module: dict[str, BaseValidationModule] = {}
-        for task_id in self.task_ids:
-            config = load_config_for_task(task_id, self.module, module_cls.config_schema)
-            # Create a new module instance for each task_id
-            self.task_id_to_module[task_id] = module_cls(config=config)
+        # Map task_id to module instance
+        self.task_id_to_module: dict[str, BaseValidationModule] = {}
+        for task_id in self.task_ids:
+            config = load_config_for_task(task_id, self.module, module_cls.config_schema)
+            # Create a new module instance for each task_id
+            self.task_id_to_module[str(task_id)] = module_cls(config=config)
```

Outside this hunk, update lookups accordingly:
```python
# perform_validation(...)
module_obj = self.task_id_to_module[str(task_id)]

# run()
module_obj = self.task_id_to_module[str(task_id)]
```

🤖 Prompt for AI Agents
In validator/validation_runner.py around lines 58 to 64, task_id keys are stored
using the original task_id type which can be int in tests and string elsewhere,
causing lookup misses; normalize task_id to str when creating the mapping and
update all lookups to use str(task_id) (e.g., in perform_validation and run) so
the dict always uses string keys and lookups consistently convert incoming
task_id values to strings.
Fix invalid import of Base from unittest.mock (module import will fail)
There is no Base in unittest.mock. This will raise ImportError on import, breaking the runner.
Outside this hunk, remove or correct the import:
```python
# Remove this invalid line near the top of the file:
# from unittest.mock import Base
```

🤖 Prompt for AI Agents
In validator/validation_runner.py near the top of the file (around lines 1–20)
there is an invalid import `from unittest.mock import Base` which does not exist
and will raise ImportError; remove that import line, and if a symbol named Base
is actually required elsewhere, replace it with the correct source (or define it
locally) and update usages accordingly so the module imports succeed.
```diff
+                if not self.test_mode:
+                    sys.exit(1)
+                return None
             except (RuntimeError, ValueError) as e:
                 logger.error(e)
                 self.api.mark_assignment_as_failed(assignment_id)
-                sys.exit(1)
+                if not self.test_mode:
+                    sys.exit(1)
+                return None
```
🛠️ Refactor suggestion
Unify retry behavior and failure marking across exception types
Currently (RuntimeError, ValueError) mark the assignment as failed immediately (even on first attempt) and return, while generic Exception retries up to 3 times. Align this to only mark as failed on the final attempt.
```diff
-            except (RuntimeError, ValueError) as e:
-                logger.error(e)
-                self.api.mark_assignment_as_failed(assignment_id)
-                if not self.test_mode:
-                    sys.exit(1)
-                return None
+            except (RuntimeError, ValueError) as e:
+                logger.error(e)
+                if attempt == 2:
+                    self.api.mark_assignment_as_failed(assignment_id)
+                    if not self.test_mode:
+                        sys.exit(1)
+                    return None
+                # Try again on next loop iteration
+                continue
```
🤖 Prompt for AI Agents
In validator/validation_runner.py around lines 77 to 85, the current except
block for (RuntimeError, ValueError) immediately marks the assignment as failed
and returns on the first hit while generic Exception retries up to 3 times;
change this so RuntimeError/ValueError follow the same retry loop and only call
self.api.mark_assignment_as_failed when the final attempt is exhausted.
Concretely: remove the immediate mark_assignment_as_failed and return from that
except branch, let the loop continue (or re-raise to be handled by the shared
retry logic), and when attempts reach the max retry count (the same check used
for generic Exception) call self.api.mark_assignment_as_failed(assignment_id)
and then, if not self.test_mode, call sys.exit(1) before returning None to keep
behavior consistent.
Summary by CodeRabbit
New Features
Tests
Chores