
eval: add typed training config to Hydra and feedback requests#38

Merged
kfallah merged 4 commits into main from eval-trainingconfig-hydra
Feb 25, 2026

Conversation

@kfallah
Owner

@kfallah kfallah commented Feb 25, 2026

Summary

  • add a nested training section to eval Hydra config defaults
  • add explicit eval-side training schema and convert it once into runtime TrainingConfig
  • enforce strict type invariance for runtime (HarnessConfig.training must be TrainingConfig)
  • pass typed training config through every FeedbackItem generated by eval runner
  • update eval docs and tests for nested overrides (training.is_clip, training.learning_rate)
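The flow in these bullets can be sketched roughly as follows. Class and field names mirror the PR description (EvalTrainingConfig, TrainingConfig, HarnessConfig, and the fields listed for base.yaml), but the defaults and the exact error wording are illustrative, not the repo's actual values:

```python
from dataclasses import dataclass, field


@dataclass
class TrainingConfig:
    """Runtime training config consumed by the server (illustrative defaults)."""
    learning_rate: float = 1e-4
    alpha: float = 0.5
    is_clip: bool = True
    max_grad_norm: float = 1.0
    kl_reg_weight: float = 0.0
    teacher_top_k: int = 20


@dataclass
class EvalTrainingConfig:
    """Eval-side schema that Hydra populates from the YAML training section."""
    learning_rate: float = 1e-4
    alpha: float = 0.5
    is_clip: bool = True
    max_grad_norm: float = 1.0
    kl_reg_weight: float = 0.0
    teacher_top_k: int = 20


@dataclass
class HarnessConfig:
    training: TrainingConfig = field(default_factory=TrainingConfig)

    def __post_init__(self) -> None:
        # strict type invariance: reject dicts or eval-side configs at runtime
        if not isinstance(self.training, TrainingConfig):
            raise TypeError(
                f"HarnessConfig.training must be TrainingConfig, "
                f"got {type(self.training)!r}"
            )


def build_harness_config(eval_training: EvalTrainingConfig) -> HarnessConfig:
    """Convert the eval-side schema once into the runtime TrainingConfig."""
    if not isinstance(eval_training, EvalTrainingConfig):
        raise TypeError(f"expected EvalTrainingConfig, got {type(eval_training)!r}")
    training = TrainingConfig(
        learning_rate=eval_training.learning_rate,
        alpha=eval_training.alpha,
        is_clip=eval_training.is_clip,
        max_grad_norm=eval_training.max_grad_norm,
        kl_reg_weight=eval_training.kl_reg_weight,
        teacher_top_k=eval_training.teacher_top_k,
    )
    return HarnessConfig(training=training)
```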

Validation

  • uv run pytest tests/test_eval_config.py tests/test_eval_runner.py -q
  • 23 passed

Summary by CodeRabbit

  • New Features

    • Added configurable training parameters (learning rate, alpha, clipping, max-grad-norm, KL weight, teacher_top_k) and runtime validation for training settings.
    • CLI and programmatic interfaces now support overriding nested training hyperparameters and explicit output directory.
  • Configuration

    • Default steps_per_batch increased from 1 to 4.
    • Default output directory now uses a timestamped path when not overridden.
  • Tests

    • Added tests covering training config validation, overrides, and mismatch rejection.

@coderabbitai

coderabbitai bot commented Feb 25, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough


A training configuration system is introduced across the evaluation module, adding typed EvalTrainingConfig and HarnessConfig dataclasses, implementing validation logic, and propagating training settings through feedback payloads during execution.

Changes

  • Configuration Schema & Type Definitions — claas/eval/types.py, claas/eval/configs/base.yaml: Introduces new EvalTrainingConfig and HarnessConfig dataclasses. EvalTrainingConfig is added to EvalConfig with a default factory; HarnessConfig is the runtime-facing config with a training: TrainingConfig field. The base YAML gains a training section (learning_rate, alpha, is_clip, max_grad_norm, kl_reg_weight, teacher_top_k).
  • Configuration Processing — claas/eval/config.py: Adds imports for TrainingConfig and EvalTrainingConfig. build_harness_config now validates the type of eval_cfg.training and constructs a TrainingConfig instance from EvalTrainingConfig fields before assignment.
  • Runtime Execution — claas/eval/runner.py: Adds a TrainingConfig import and runtime type validation. Extracts training_cfg with a type check before the main loop, serializes and includes the training config in metadata, and propagates training_cfg to each FeedbackItem payload.
  • Documentation & Examples — claas/eval/README.md: Updates config documentation with the new training section, increases steps_per_batch from 1 to 4, and expands CLI override examples to demonstrate training hyperparameter overrides.
  • Test Coverage — tests/test_eval_config.py, tests/test_eval_runner.py: Adds assertions that the training field is a TrainingConfig, validates training-specific fields (is_clip, learning_rate), tests config overrides for training parameters, introduces a type-invariance test rejecting dicts, and verifies HarnessConfig defaults to a typed TrainingConfig.

Sequence Diagram

sequenceDiagram
    participant Config as Config File (YAML)
    participant Loader as Config Loader
    participant Validator as build_harness_config()
    participant Runner as runner.py
    participant Feedback as FeedbackItem
    
    Config->>Loader: Load EvalConfig with training section
    Loader->>Validator: Pass EvalConfig (with EvalTrainingConfig)
    Validator->>Validator: Type check: training is EvalTrainingConfig?
    Validator->>Validator: Construct TrainingConfig from fields
    Validator->>Runner: Return HarnessConfig (with TrainingConfig)
    Runner->>Runner: Type check: config.training is TrainingConfig?
    Runner->>Runner: Extract training_cfg variable
    Runner->>Runner: Serialize training config to metadata.json
    Runner->>Feedback: Create FeedbackItem with training_cfg
    Feedback->>Feedback: Include training config in payload
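The runner-side half of this sequence might look roughly like the following. FeedbackItem's real fields aren't shown on this page, so the payload shape here is illustrative:

```python
import dataclasses
import json
from dataclasses import dataclass, field


@dataclass
class TrainingConfig:
    learning_rate: float = 1e-4
    is_clip: bool = True


@dataclass
class HarnessConfig:
    training: TrainingConfig = field(default_factory=TrainingConfig)


@dataclass
class FeedbackItem:
    prompt: str
    training: TrainingConfig


config = HarnessConfig()

# runtime type check before the main loop
training_cfg = config.training
if not isinstance(training_cfg, TrainingConfig):
    raise TypeError(
        f"HarnessConfig.training must be TrainingConfig, got {type(training_cfg)!r}"
    )

# serialize the training config into run metadata
metadata = json.dumps({"training": dataclasses.asdict(training_cfg)})

# propagate the same typed config into every feedback item
items = [FeedbackItem(prompt=p, training=training_cfg) for p in ("a", "b")]
```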

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A training config hops into place,
With types and fields to embrace,
From schema to runner, it flows so sweet,
Each feedback now has a training beat! 🎯

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 55.56%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'eval: add typed training config to Hydra and feedback requests' accurately summarizes the main changes: a typed training configuration in the eval module's Hydra setup, propagated through feedback requests.


@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
claas/eval/config.py (1)

30-37: Consider collapsing manual training field mapping to reduce schema drift.

Field-by-field copy is easy to miss when EvalTrainingConfig/TrainingConfig evolves.

♻️ Suggested simplification
-    training = TrainingConfig(
-        learning_rate=eval_cfg.training.learning_rate,
-        alpha=eval_cfg.training.alpha,
-        is_clip=eval_cfg.training.is_clip,
-        max_grad_norm=eval_cfg.training.max_grad_norm,
-        kl_reg_weight=eval_cfg.training.kl_reg_weight,
-        teacher_top_k=eval_cfg.training.teacher_top_k,
-    )
+    training = TrainingConfig(**dataclasses.asdict(eval_cfg.training))
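As a quick illustration of the suggested unpacking (with stand-in dataclasses, since only a fragment of config.py is quoted here): field-name alignment is what makes this safe, because an extra or renamed field raises TypeError at construction instead of being silently dropped.

```python
import dataclasses


@dataclasses.dataclass
class EvalTrainingConfig:
    learning_rate: float = 1e-4
    is_clip: bool = True


@dataclasses.dataclass
class TrainingConfig:
    learning_rate: float = 1e-4
    is_clip: bool = True


eval_training = EvalTrainingConfig(learning_rate=3e-5)

# one-line conversion: schema drift between the two classes now fails loudly
training = TrainingConfig(**dataclasses.asdict(eval_training))
```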
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/eval/config.py` around lines 30 - 37, Replace the manual field-by-field
construction of TrainingConfig by unpacking the training sub-config directly to
avoid schema drift: construct TrainingConfig by passing the EvalTrainingConfig
instance's mapping (e.g., using
eval_cfg.training.dict()/eval_cfg.training.model_dump() or
dataclasses.asdict(eval_cfg.training), or a TrainingConfig.from_... helper) and
let Python unpack those keys into TrainingConfig(**...). Keep the symbols
TrainingConfig and eval_cfg.training and ensure you use the appropriate method
for the config type (pydantic/dataclass) and include exclude_unset/exclude_none
if needed to preserve intended defaults.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 8745ad6 and 5abfd7b.

📒 Files selected for processing (7)
  • claas/eval/README.md
  • claas/eval/config.py
  • claas/eval/configs/base.yaml
  • claas/eval/runner.py
  • claas/eval/types.py
  • tests/test_eval_config.py
  • tests/test_eval_runner.py

@kfallah
Owner Author

kfallah commented Feb 25, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d5aa29cf5b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 363 to 365
    training_cfg = config.training
    if not isinstance(training_cfg, TrainingConfig):
        raise TypeError(f"EvalConfig.training must be TrainingConfig, got {type(training_cfg)!r}")


P2 — Validate training overrides before starting the eval loop

This new runtime check only enforces that config.training is a TrainingConfig, not that its values are valid. Invalid Hydra overrides (for example training.is_clip=0) therefore make every /v1/feedback call fail with 422, while the loop continues under the existing except httpx.HTTPError path and still records step outputs. In that scenario the run appears to progress but applies no distillation updates, which can silently invalidate experiment conclusions; the eval side should fail fast on invalid training values before entering the step loop.
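A fail-fast check along these lines could run on the eval side before the step loop. This is a hypothetical sketch: the server-side _validate_training_config's exact rules aren't shown on this page, so the bounds below are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    learning_rate: float = 1e-4
    max_grad_norm: float = 1.0
    teacher_top_k: int = 20


def validate_training_config(cfg: TrainingConfig) -> None:
    """Raise before the eval loop starts, instead of letting every
    /v1/feedback call fail with 422 inside the loop."""
    if not isinstance(cfg, TrainingConfig):
        raise TypeError(f"expected TrainingConfig, got {type(cfg)!r}")
    if cfg.learning_rate <= 0:
        raise ValueError(f"learning_rate must be positive, got {cfg.learning_rate}")
    if cfg.max_grad_norm <= 0:
        raise ValueError(f"max_grad_norm must be positive, got {cfg.max_grad_norm}")
    if cfg.teacher_top_k < 1:
        raise ValueError(f"teacher_top_k must be >= 1, got {cfg.teacher_top_k}")
```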


@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
tests/test_api.py (1)

275-327: Consider parameterizing boundary tests across all validated training fields.

Lines 275–327 currently exercise is_clip and mismatch checks; a small parametrized matrix for learning_rate, alpha, max_grad_norm, kl_reg_weight, and teacher_top_k would better protect _validate_training_config from regressions.

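A sketch of that matrix, with a stand-in validator since _validate_training_config itself isn't reproduced on this page; only the pytest.mark.parametrize pattern is the point:

```python
import pytest


def validate(field: str, value) -> None:
    """Stand-in for _validate_training_config's per-field checks."""
    numeric_min = {"learning_rate": 0.0, "alpha": 0.0, "max_grad_norm": 0.0,
                   "kl_reg_weight": 0.0, "teacher_top_k": 1}
    if field == "is_clip":
        if not isinstance(value, bool):
            raise ValueError(f"is_clip must be a bool, got {value!r}")
    elif value < numeric_min[field]:
        raise ValueError(f"{field} out of range: {value!r}")


@pytest.mark.parametrize(
    ("field", "bad_value"),
    [
        ("learning_rate", -1.0),
        ("alpha", -0.5),
        ("max_grad_norm", -1.0),
        ("kl_reg_weight", -0.1),
        ("teacher_top_k", 0),
        ("is_clip", "yes"),  # wrong type rather than out of range
    ],
)
def test_rejects_bad_training_value(field, bad_value):
    # each invalid value should be rejected with the field named in the error
    with pytest.raises(ValueError, match=field):
        validate(field, bad_value)
```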
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_api.py` around lines 275 - 327, Extend the existing tests that
call _validate_training_config by parameterizing the invalid-value and mismatch
checks currently in test_feedback_rejects_invalid_training_config and
test_feedback_rejects_mismatched_training_config: add pytest.mark.parametrize
over the fields
["is_clip","learning_rate","alpha","max_grad_norm","kl_reg_weight","teacher_top_k"]
and for each field send a request with an out-of-bound/invalid value (e.g.,
wrong type or negative where not allowed) and assert 422 with the field name in
the error, and add a second parametrized test that submits two requests with
differing valid values for the same field and assert 400 with detail "all
requests must use the same training config"; keep using
_mock_config(monkeypatch) and TestClient(web_app) and reference the existing
test function names to replace/augment so the new parameterized tests exercise
_validate_training_config for every listed training field.


📥 Commits

Reviewing files that changed from the base of the PR and between 5abfd7b and 1efe702.

📒 Files selected for processing (11)
  • claas/api.py
  • claas/core/types.py
  • claas/eval/README.md
  • claas/eval/__main__.py
  • claas/eval/config.py
  • claas/eval/configs/base.yaml
  • claas/eval/runner.py
  • claas/eval/types.py
  • tests/test_api.py
  • tests/test_eval_config.py
  • tests/test_eval_runner.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • claas/eval/configs/base.yaml
  • claas/eval/README.md
  • tests/test_eval_runner.py

@kfallah kfallah merged commit ff7fda8 into main Feb 25, 2026
3 checks passed
@kfallah kfallah deleted the eval-trainingconfig-hydra branch February 25, 2026 18:36