
eval: add typed training config to Hydra and feedback requests#38

Merged
kfallah merged 4 commits into main from eval-trainingconfig-hydra
Feb 25, 2026

Conversation

@kfallah
Owner

@kfallah kfallah commented Feb 25, 2026

Summary

  • add a nested training section to eval Hydra config defaults
  • add explicit eval-side training schema and convert it once into runtime TrainingConfig
  • enforce strict type invariance for runtime (HarnessConfig.training must be TrainingConfig)
  • pass typed training config through every FeedbackItem generated by eval runner
  • update eval docs and tests for nested overrides (training.is_clip, training.learning_rate)
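The flow in these bullets can be sketched roughly as follows. Class and field names mirror the PR description (EvalTrainingConfig, TrainingConfig, HarnessConfig, and the fields listed for base.yaml), but the defaults and the exact error wording are illustrative, not the repo's actual values:

```python
from dataclasses import dataclass, field


@dataclass
class TrainingConfig:
    """Runtime training config consumed by the server (illustrative defaults)."""
    learning_rate: float = 1e-4
    alpha: float = 0.5
    is_clip: bool = True
    max_grad_norm: float = 1.0
    kl_reg_weight: float = 0.0
    teacher_top_k: int = 20


@dataclass
class EvalTrainingConfig:
    """Eval-side schema that Hydra populates from the YAML training section."""
    learning_rate: float = 1e-4
    alpha: float = 0.5
    is_clip: bool = True
    max_grad_norm: float = 1.0
    kl_reg_weight: float = 0.0
    teacher_top_k: int = 20


@dataclass
class HarnessConfig:
    training: TrainingConfig = field(default_factory=TrainingConfig)

    def __post_init__(self) -> None:
        # strict type invariance: reject dicts or eval-side configs at runtime
        if not isinstance(self.training, TrainingConfig):
            raise TypeError(
                f"HarnessConfig.training must be TrainingConfig, "
                f"got {type(self.training)!r}"
            )


def build_harness_config(eval_training: EvalTrainingConfig) -> HarnessConfig:
    """Convert the eval-side schema once into the runtime TrainingConfig."""
    if not isinstance(eval_training, EvalTrainingConfig):
        raise TypeError(f"expected EvalTrainingConfig, got {type(eval_training)!r}")
    training = TrainingConfig(
        learning_rate=eval_training.learning_rate,
        alpha=eval_training.alpha,
        is_clip=eval_training.is_clip,
        max_grad_norm=eval_training.max_grad_norm,
        kl_reg_weight=eval_training.kl_reg_weight,
        teacher_top_k=eval_training.teacher_top_k,
    )
    return HarnessConfig(training=training)
```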

Validation

  • uv run pytest tests/test_eval_config.py tests/test_eval_runner.py -q
  • 23 passed

Summary by CodeRabbit

  • New Features

    • Added configurable training parameters (learning rate, alpha, clipping, max-grad-norm, KL weight, teacher_top_k) and runtime validation for training settings.
    • CLI and programmatic interfaces now support overriding nested training hyperparameters and explicit output directory.
  • Configuration

    • Default steps_per_batch increased from 1 to 4.
    • Default output directory now uses a timestamped path when not overridden.
  • Tests

    • Added tests covering training config validation, overrides, and mismatch rejection.

@coderabbitai

coderabbitai bot commented Feb 25, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough


A training configuration system is introduced across the evaluation module, adding typed EvalTrainingConfig and HarnessConfig dataclasses, implementing validation logic, and propagating training settings through feedback payloads during execution.

Changes

  • Configuration Schema & Type Definitions — claas/eval/types.py, claas/eval/configs/base.yaml: Introduces new EvalTrainingConfig and HarnessConfig dataclasses. EvalTrainingConfig is added to EvalConfig with a default factory; HarnessConfig is the runtime-facing config with a training: TrainingConfig field. The base YAML gains a training section (learning_rate, alpha, is_clip, max_grad_norm, kl_reg_weight, teacher_top_k).
  • Configuration Processing — claas/eval/config.py: Adds imports for TrainingConfig and EvalTrainingConfig. build_harness_config now validates the type of eval_cfg.training and constructs a TrainingConfig instance from EvalTrainingConfig fields before assignment.
  • Runtime Execution — claas/eval/runner.py: Adds a TrainingConfig import and runtime type validation. Extracts training_cfg with a type check before the main loop, serializes and includes the training config in metadata, and propagates training_cfg to each FeedbackItem payload.
  • Documentation & Examples — claas/eval/README.md: Updates config documentation with the new training section, increases steps_per_batch from 1 to 4, and expands CLI override examples to demonstrate training hyperparameter overrides.
  • Test Coverage — tests/test_eval_config.py, tests/test_eval_runner.py: Adds assertions that the training field is a TrainingConfig, validates training-specific fields (is_clip, learning_rate), tests config overrides for training parameters, introduces a type-invariance test rejecting dicts, and verifies HarnessConfig defaults to a typed TrainingConfig.

Sequence Diagram

sequenceDiagram
    participant Config as Config File (YAML)
    participant Loader as Config Loader
    participant Validator as build_harness_config()
    participant Runner as runner.py
    participant Feedback as FeedbackItem
    
    Config->>Loader: Load EvalConfig with training section
    Loader->>Validator: Pass EvalConfig (with EvalTrainingConfig)
    Validator->>Validator: Type check: training is EvalTrainingConfig?
    Validator->>Validator: Construct TrainingConfig from fields
    Validator->>Runner: Return HarnessConfig (with TrainingConfig)
    Runner->>Runner: Type check: config.training is TrainingConfig?
    Runner->>Runner: Extract training_cfg variable
    Runner->>Runner: Serialize training config to metadata.json
    Runner->>Feedback: Create FeedbackItem with training_cfg
    Feedback->>Feedback: Include training config in payload
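The runner-side half of this sequence might look roughly like the following. FeedbackItem's real fields aren't shown on this page, so the payload shape here is illustrative:

```python
import dataclasses
import json
from dataclasses import dataclass, field


@dataclass
class TrainingConfig:
    learning_rate: float = 1e-4
    is_clip: bool = True


@dataclass
class HarnessConfig:
    training: TrainingConfig = field(default_factory=TrainingConfig)


@dataclass
class FeedbackItem:
    prompt: str
    training: TrainingConfig


config = HarnessConfig()

# runtime type check before the main loop
training_cfg = config.training
if not isinstance(training_cfg, TrainingConfig):
    raise TypeError(
        f"HarnessConfig.training must be TrainingConfig, got {type(training_cfg)!r}"
    )

# serialize the training config into run metadata
metadata = json.dumps({"training": dataclasses.asdict(training_cfg)})

# propagate the same typed config into every feedback item
items = [FeedbackItem(prompt=p, training=training_cfg) for p in ("a", "b")]
```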

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A training config hops into place,
With types and fields to embrace,
From schema to runner, it flows so sweet,
Each feedback now has a training beat! 🎯

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 55.56%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'eval: add typed training config to Hydra and feedback requests' accurately summarizes the main changes: a typed training configuration in the eval module's Hydra setup, propagated through feedback requests.


@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
claas/eval/config.py (1)

30-37: Consider collapsing manual training field mapping to reduce schema drift.

Field-by-field copy is easy to miss when EvalTrainingConfig/TrainingConfig evolves.

♻️ Suggested simplification
-    training = TrainingConfig(
-        learning_rate=eval_cfg.training.learning_rate,
-        alpha=eval_cfg.training.alpha,
-        is_clip=eval_cfg.training.is_clip,
-        max_grad_norm=eval_cfg.training.max_grad_norm,
-        kl_reg_weight=eval_cfg.training.kl_reg_weight,
-        teacher_top_k=eval_cfg.training.teacher_top_k,
-    )
+    training = TrainingConfig(**dataclasses.asdict(eval_cfg.training))
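As a quick illustration of the suggested unpacking (with stand-in dataclasses, since only a fragment of config.py is quoted here): field-name alignment is what makes this safe, because an extra or renamed field raises TypeError at construction instead of being silently dropped.

```python
import dataclasses


@dataclasses.dataclass
class EvalTrainingConfig:
    learning_rate: float = 1e-4
    is_clip: bool = True


@dataclasses.dataclass
class TrainingConfig:
    learning_rate: float = 1e-4
    is_clip: bool = True


eval_training = EvalTrainingConfig(learning_rate=3e-5)

# one-line conversion: schema drift between the two classes now fails loudly
training = TrainingConfig(**dataclasses.asdict(eval_training))
```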
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/eval/config.py` around lines 30 - 37, Replace the manual field-by-field
construction of TrainingConfig by unpacking the training sub-config directly to
avoid schema drift: construct TrainingConfig by passing the EvalTrainingConfig
instance's mapping (e.g., using
eval_cfg.training.dict()/eval_cfg.training.model_dump() or
dataclasses.asdict(eval_cfg.training), or a TrainingConfig.from_... helper) and
let Python unpack those keys into TrainingConfig(**...). Keep the symbols
TrainingConfig and eval_cfg.training and ensure you use the appropriate method
for the config type (pydantic/dataclass) and include exclude_unset/exclude_none
if needed to preserve intended defaults.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 8745ad6 and 5abfd7b.

📒 Files selected for processing (7)
  • claas/eval/README.md
  • claas/eval/config.py
  • claas/eval/configs/base.yaml
  • claas/eval/runner.py
  • claas/eval/types.py
  • tests/test_eval_config.py
  • tests/test_eval_runner.py

@kfallah
Owner Author

kfallah commented Feb 25, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d5aa29cf5b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 363 to 365
    training_cfg = config.training
    if not isinstance(training_cfg, TrainingConfig):
        raise TypeError(f"EvalConfig.training must be TrainingConfig, got {type(training_cfg)!r}")


P2 — Validate training overrides before starting the eval loop

This new runtime check only enforces that config.training is a TrainingConfig, not that its values are valid. Invalid Hydra overrides (for example training.is_clip=0) therefore make every /v1/feedback call fail with 422, while the loop continues under the existing except httpx.HTTPError path and still records step outputs. In that scenario the run appears to progress but applies no distillation updates, which can silently invalidate experiment conclusions; the eval side should fail fast on invalid training values before entering the step loop.
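A fail-fast check along these lines could run on the eval side before the step loop. This is a hypothetical sketch: the server-side _validate_training_config's exact rules aren't shown on this page, so the bounds below are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    learning_rate: float = 1e-4
    max_grad_norm: float = 1.0
    teacher_top_k: int = 20


def validate_training_config(cfg: TrainingConfig) -> None:
    """Raise before the eval loop starts, instead of letting every
    /v1/feedback call fail with 422 inside the loop."""
    if not isinstance(cfg, TrainingConfig):
        raise TypeError(f"expected TrainingConfig, got {type(cfg)!r}")
    if cfg.learning_rate <= 0:
        raise ValueError(f"learning_rate must be positive, got {cfg.learning_rate}")
    if cfg.max_grad_norm <= 0:
        raise ValueError(f"max_grad_norm must be positive, got {cfg.max_grad_norm}")
    if cfg.teacher_top_k < 1:
        raise ValueError(f"teacher_top_k must be >= 1, got {cfg.teacher_top_k}")
```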


@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
tests/test_api.py (1)

275-327: Consider parameterizing boundary tests across all validated training fields.

Lines 275–327 currently exercise is_clip and mismatch checks; a small parametrized matrix for learning_rate, alpha, max_grad_norm, kl_reg_weight, and teacher_top_k would better protect _validate_training_config from regressions.

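A sketch of that matrix, with a stand-in validator since _validate_training_config itself isn't reproduced on this page; only the pytest.mark.parametrize pattern is the point:

```python
import pytest


def validate(field: str, value) -> None:
    """Stand-in for _validate_training_config's per-field checks."""
    numeric_min = {"learning_rate": 0.0, "alpha": 0.0, "max_grad_norm": 0.0,
                   "kl_reg_weight": 0.0, "teacher_top_k": 1}
    if field == "is_clip":
        if not isinstance(value, bool):
            raise ValueError(f"is_clip must be a bool, got {value!r}")
    elif value < numeric_min[field]:
        raise ValueError(f"{field} out of range: {value!r}")


@pytest.mark.parametrize(
    ("field", "bad_value"),
    [
        ("learning_rate", -1.0),
        ("alpha", -0.5),
        ("max_grad_norm", -1.0),
        ("kl_reg_weight", -0.1),
        ("teacher_top_k", 0),
        ("is_clip", "yes"),  # wrong type rather than out of range
    ],
)
def test_rejects_bad_training_value(field, bad_value):
    # each invalid value should be rejected with the field named in the error
    with pytest.raises(ValueError, match=field):
        validate(field, bad_value)
```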
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_api.py` around lines 275 - 327, Extend the existing tests that
call _validate_training_config by parameterizing the invalid-value and mismatch
checks currently in test_feedback_rejects_invalid_training_config and
test_feedback_rejects_mismatched_training_config: add pytest.mark.parametrize
over the fields
["is_clip","learning_rate","alpha","max_grad_norm","kl_reg_weight","teacher_top_k"]
and for each field send a request with an out-of-bound/invalid value (e.g.,
wrong type or negative where not allowed) and assert 422 with the field name in
the error, and add a second parametrized test that submits two requests with
differing valid values for the same field and assert 400 with detail "all
requests must use the same training config"; keep using
_mock_config(monkeypatch) and TestClient(web_app) and reference the existing
test function names to replace/augment so the new parameterized tests exercise
_validate_training_config for every listed training field.


📥 Commits

Reviewing files that changed from the base of the PR and between 5abfd7b and 1efe702.

📒 Files selected for processing (11)
  • claas/api.py
  • claas/core/types.py
  • claas/eval/README.md
  • claas/eval/__main__.py
  • claas/eval/config.py
  • claas/eval/configs/base.yaml
  • claas/eval/runner.py
  • claas/eval/types.py
  • tests/test_api.py
  • tests/test_eval_config.py
  • tests/test_eval_runner.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • claas/eval/configs/base.yaml
  • claas/eval/README.md
  • tests/test_eval_runner.py

@kfallah kfallah merged commit ff7fda8 into main Feb 25, 2026
3 checks passed
@kfallah kfallah deleted the eval-trainingconfig-hydra branch February 25, 2026 18:36