
Move multi-step training into TrainingConfig with per-step IS correction (#39)

Open

kfallah wants to merge 3 commits into main from feature/multi-step-training

Conversation

@kfallah kfallah commented Feb 25, 2026

Summary

  • move multi-step training controls (steps_per_batch, feedback_repetitions) from eval-owned settings into TrainingConfig
  • remove eval-side sub-step loop and pass typed training config through FeedbackItem in each /v1/feedback request
  • execute multi-step updates inside training engines (local/modal + tinker)
  • recompute behavior-policy logprobs after each optimizer step for off-policy importance reweighting
  • include engine metadata (steps_per_batch_applied, per-step metrics) and wire eval sub_step_count to that metadata
  • update eval Hydra schema/config/docs and related tests

Key Implementation Notes

  • added strict TrainingConfig fields:
    • steps_per_batch
    • feedback_repetitions
  • introduced Hydra-safe EvalTrainingConfig and convert to runtime TrainingConfig in build_harness_config
  • tinker engine now refreshes student logprobs between steps using save_weights_and_get_sampling_client_async
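
The per-step IS correction above can be sketched as a plain loop (an illustrative sketch only; `train_step` and `recompute_logprobs` are hypothetical stand-ins for the engine's loss/optimizer step and logprob recomputation, not functions from this repo):

```python
import math

def multi_step_update(behavior_logprobs, steps_per_batch, train_step, recompute_logprobs):
    """Run several optimizer steps on one cached batch, refreshing the
    behavior-policy logprobs after each step so importance ratios stay
    honest as the policy drifts away from the data it sampled."""
    metrics = []
    for step in range(steps_per_batch):
        # Logprobs of the cached responses under the current adapter state.
        current = recompute_logprobs()
        # Per-sample importance ratios: exp(logpi_current - logpi_behavior).
        ratios = [math.exp(c - b) for c, b in zip(current, behavior_logprobs)]
        loss = train_step(ratios)  # loss + backward + optimizer update
        metrics.append({
            "step": step,
            "mean_is_ratio": sum(ratios) / len(ratios),
            "loss": loss,
        })
        # After the weights change, the reference logprobs are recomputed
        # so the next step's ratios are measured against the fresh policy.
        behavior_logprobs = recompute_logprobs()
    return metrics
```

In this sketch, the engine-reported steps_per_batch_applied would simply be len(metrics).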

Validation

  • uv run ruff check claas/ tests/ --fix
  • uv run pytest tests/ -q -m "not integration"
    • result: 109 passed, 26 skipped, 5 deselected
  • uv run ty check
    • unresolved-import diagnostics for heavy runtime deps (torch, tinker, transformers) are expected in this environment

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for multi-step training per batch with configurable steps_per_batch parameter
    • Added feedback_repetitions configuration option for enhanced training control
    • New metric steps_per_batch_applied tracks actual steps executed per batch
  • Documentation

    • Updated configuration structure to use nested training block for training-specific parameters
  • Refactor

    • Reorganized configuration hierarchy to consolidate training settings under dedicated training section


coderabbitai bot commented Feb 25, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

📝 Walkthrough


This PR implements multi-step gradient updates within single batches and feedback repetition control. Configuration is restructured to nest training parameters under a training block, new dataclasses manage config state (EvalTrainingConfig, HarnessConfig), and training loops in distillation and tinker engine are refactored to support multiple optimizer steps per batch with recomputed importance weights.

Changes

  • Configuration Type System (claas/core/types.py, claas/eval/types.py): Added steps_per_batch and feedback_repetitions fields to TrainingConfig with bounds [1, 32] and [1, 16]. Introduced the new EvalTrainingConfig dataclass for Hydra-managed training settings and the HarnessConfig dataclass for runtime config with a strict TrainingConfig field. Added steps_per_batch_applied tracking fields to LocalDistillMetrics and TinkerDistillMetrics.
  • Configuration Structure (claas/eval/config.py, claas/eval/configs/base.yaml, claas/eval/README.md): Reorganized YAML configs to nest steps_per_batch and feedback_repetitions under a training: block. Updated build_harness_config to convert the training dict to a TrainingConfig instance, changing the config access path from top-level to training.*.
  • Distillation Training (claas/training/distillation.py): Introduced a PreparedSample TypedDict and refactored the distill() method to run multi-step training loops over config.steps_per_batch iterations, with per-step loss computation, gradient updates, and student logprobs recomputed via the new _compute_student_response_logprobs() helper. Replaced the single-batch forward pass with accumulated prepared samples and per-step metrics tracking.
  • Tinker Engine Training (claas/training/engine/tinker/engine.py): Added a PreparedSample TypedDict for sample data representation. Refactored the distillation flow into a multi-step training loop controlled by steps_per_batch, with per-step optimizer updates and behavior logprob recomputation. Added _compute_student_logprobs_for_batch() and _compute_student_logprobs_for_sample() helpers. Updated the _build_sample_datum signature to accept prepared samples and per-sample student logprobs.
  • Evaluation Runner (claas/eval/runner.py): Modified _submit_feedback to return steps_per_batch_applied in metrics and to submit a single request per step. Updated FeedbackItem to include a training field. Extended StepResult with a sub_step_count field (default 1) for steps.jsonl persistence and loading.
  • Test Coverage (tests/test_eval_config.py, tests/test_eval_runner.py, tests/test_tinker_engine.py): Updated config tests to validate the nested training.steps_per_batch path and reject top-level overrides. Added assertions for steps_per_batch_applied fields in TinkerDistillMetrics and LocalDistillMetrics. Updated HarnessConfig tests to use the TrainingConfig field. Added a new multi-step test validating behavior logprob recomputation across optimizer steps.
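
The Hydra-to-runtime conversion described for build_harness_config can be sketched with stand-in dataclasses (the real types live in claas/core/types.py and claas/eval/types.py and use Pydantic validation; everything below is illustrative, including the default values):

```python
from dataclasses import dataclass

@dataclass
class EvalTrainingConfig:
    # Hydra-safe defaults; these must stay in sync with the runtime TrainingConfig.
    steps_per_batch: int = 1
    feedback_repetitions: int = 1

@dataclass
class TrainingConfig:
    steps_per_batch: int = 1
    feedback_repetitions: int = 1

def build_harness_config(training: EvalTrainingConfig) -> TrainingConfig:
    """Convert the Hydra-managed eval dataclass into the runtime config,
    so engines receive one typed object instead of loose top-level keys."""
    return TrainingConfig(
        steps_per_batch=training.steps_per_batch,
        feedback_repetitions=training.feedback_repetitions,
    )
```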

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Trainer as DistillationTrainer
    participant PreparedSamples as Prepared Samples
    participant Step as Step Loop
    participant StudentModel as Student Model
    participant Optimizer as Optimizer

    Trainer->>PreparedSamples: Validate & accumulate samples<br/>(full_ids, response_ids, logprobs)
    PreparedSamples-->>Trainer: PreparedSample list

    Trainer->>Step: For each step in steps_per_batch
    Step->>StudentModel: Compute student response logprobs<br/>(current adapter state)
    StudentModel-->>Step: per_step_logprobs
    Step->>Step: Build SDPOLossInput from<br/>prepared samples + new logprobs
    Step->>Step: Compute per-step loss<br/>(distill_loss, kl_reg, clip)
    Step->>Optimizer: Backward & gradient update<br/>with clipping
    Optimizer-->>Step: updated model state
    Step->>StudentModel: Recompute behavior_logprobs<br/>for next step
    StudentModel-->>Step: updated logprobs
    Step-->>Trainer: step metrics & updated state

    Trainer->>Trainer: Aggregate per-step metrics<br/>steps_per_batch_applied
    Trainer-->>Trainer: Return per-step results<br/>& tokens processed
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hops and loops through batches bright,
Steps per batch bring multi-height,
Recompute the logprobs true,
Feedback repeated, fresh and new!
Gradients dance in nested care,
Training loops flopping everywhere!

🚥 Pre-merge checks: ✅ 3 passed
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title 'Move multi-step training into TrainingConfig with per-step IS correction' directly and clearly describes the primary changes: reorganizing multi-step training configuration into a TrainingConfig object and implementing per-step importance sampling correction.
  • Docstring Coverage ✅ Passed: Docstring coverage is 85.19%, above the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 1

🧹 Nitpick comments (5)
claas/training/engine/tinker/engine.py (2)

239-241: Lambda used for averaging — minor style nit.

The avg lambda is re-created each loop iteration. Consider extracting it before the loop or using a local function.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/training/engine/tinker/engine.py` around lines 239 - 241, The averaging
lambda avg is being recreated each loop iteration; extract it as a local helper
function or define it once before the loop to avoid recreating the closure
repeatedly. Replace the inline lambda assignment avg = lambda key: ... with a
named function (e.g., def avg(key): return sum(m[key] for m in sample_metrics) /
n) or move the lambda definition above the loop where sample_metrics and n are
available, and update all uses of avg (referenced as avg and sample_metrics in
this block) accordingly.

218-261: Multi-step loop with Tinker SDK: correct but note the cost of intermediate weight saves.

The flow is sound: build datums → forward/backward → optimizer step → recompute logprobs. The save_weights_and_get_sampling_client_async call at line 257 is required by Tinker's architecture to get a sampling client with updated weights, but it means each intermediate step (all except the last) triggers a full weight save. For steps_per_batch > 2, this could be a latency concern.

Worth documenting this tradeoff or considering whether Tinker offers a lighter-weight way to get an updated sampling client without a full checkpoint save.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/training/engine/tinker/engine.py` around lines 218 - 261, The loop
calls training_client.save_weights_and_get_sampling_client_async inside the step
loop (see save_weights_and_get_sampling_client_async, steps_per_batch and
training_client) which triggers a full weight save on every intermediate step
and can cause latency when steps_per_batch > 2; update the code to either (a)
document this tradeoff just above the loop and in the function docstring, or (b)
add a configurable behavior (e.g., a flag like save_intermediate_weights) so you
only call save_weights_and_get_sampling_client_async for steps where it’s
necessary (or avoid it until the final step), and, if the Tinker SDK offers a
lighter alternative to get an updated sampling client, switch to that API
instead.
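
Option (b) from the comment above could look roughly like this (a sketch; `save_intermediate_weights` is a hypothetical flag, and `forward_backward_and_step` is a stand-in for the engine's real per-step training calls, since the actual Tinker SDK calls need the SDK):

```python
import asyncio

async def run_multi_step(training_client, steps_per_batch, save_intermediate_weights=True):
    """Run the per-batch step loop, skipping the expensive intermediate
    weight save when the caller opts out of per-step logprob refreshes."""
    for step in range(steps_per_batch):
        # Stand-in for building datums, forward/backward, and the optimizer step.
        await training_client.forward_backward_and_step()
        is_last = step == steps_per_batch - 1
        if not is_last and save_intermediate_weights:
            # Full weight save per intermediate step: this is the latency
            # cost the review flags for steps_per_batch > 2.
            await training_client.save_weights_and_get_sampling_client_async()
```

With steps_per_batch of 3, the default path performs two intermediate saves; disabling the flag performs none, trading per-step IS freshness for latency.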
tests/test_tinker_engine.py (1)

91-122: Consider parameterizing mock save paths for multi-step scenarios.

The mock_training_client fixture returns a fixed save_result.path = "tinker://checkpoints/step-1" regardless of the checkpoint name passed to save_state_async. This works for current tests, but if future tests need to assert the saved path reflects the actual step, the fixture would need to be updated. Not a blocker.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_tinker_engine.py` around lines 91 - 122, The fixture
mock_training_client currently returns a fixed save_result.path
("tinker://checkpoints/step-1") for save_state_async; change it so
save_state_async uses an AsyncMock side_effect that builds and returns a
MagicMock whose .path is derived from the checkpoint name/step passed into
save_state_async (e.g., include the step id or checkpoint name from the method
args), and do the same for save_weights_for_sampler_async/sampler_save.path if
needed; update references to save_result and sampler_save in the fixture to be
created inside the side_effects so tests that call
mock_training_client.save_state_async(...) will receive a result object with a
path that reflects the input.
claas/training/distillation.py (1)

38-47: PreparedSample name collision with tinker engine.

Both claas/training/distillation.py and claas/training/engine/tinker/engine.py define a PreparedSample TypedDict with different fields (torch.Tensor-based vs. list-based). This works fine since they're module-private, but could cause confusion when navigating the codebase or in IDE symbol search.

Consider naming one of them more specifically (e.g., LocalPreparedSample or TinkerPreparedSample) to disambiguate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/training/distillation.py` around lines 38 - 47, The TypedDict
PreparedSample in claas/training/distillation.py collides by name with another
PreparedSample in claas/training/engine/tinker/engine.py; rename this TypedDict
to a more specific name (e.g., DistillationPreparedSample or
LocalPreparedSample) and update all local type annotations and imports in
claas/training/distillation.py that reference PreparedSample (functions, return
types, variables) to use the new name so the module remains unambiguous while
preserving the same fields and behavior.
claas/eval/types.py (1)

80-91: Risk of default drift between EvalTrainingConfig and TrainingConfig.

EvalTrainingConfig manually duplicates field names and defaults from the Pydantic TrainingConfig (in claas/core/types.py). If a default changes in one but not the other, eval runs will silently use stale values. Consider adding a test or a factory that asserts parity.

💡 Example: add a parity test
```python
# tests/test_eval_config.py (or similar)
import dataclasses

from claas.core.types import TrainingConfig
from claas.eval.types import EvalTrainingConfig


def test_eval_training_config_defaults_match():
    """Ensure EvalTrainingConfig defaults stay in sync with TrainingConfig."""
    runtime = TrainingConfig()
    hydra = EvalTrainingConfig()
    for f in dataclasses.fields(hydra):
        assert getattr(hydra, f.name) == getattr(runtime, f.name), (
            f"Default mismatch on '{f.name}': "
            f"EvalTrainingConfig={getattr(hydra, f.name)} vs "
            f"TrainingConfig={getattr(runtime, f.name)}"
        )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/eval/types.py` around lines 80 - 91, EvalTrainingConfig duplicates
defaults from the Pydantic TrainingConfig which can drift; add a parity test
that instantiates TrainingConfig and EvalTrainingConfig and asserts all field
defaults match (use dataclasses.fields on EvalTrainingConfig and compare
getattr(hydra, name) == getattr(runtime, name)), e.g. add
tests/test_eval_config.py to fail CI if any default on EvalTrainingConfig
diverges from TrainingConfig; alternatively implement a factory that constructs
EvalTrainingConfig from TrainingConfig to guarantee parity and update usages to
call that factory instead of hardcoding defaults.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 8745ad6 and 762072d.

📒 Files selected for processing (11)
  • claas/core/types.py
  • claas/eval/README.md
  • claas/eval/config.py
  • claas/eval/configs/base.yaml
  • claas/eval/runner.py
  • claas/eval/types.py
  • claas/training/distillation.py
  • claas/training/engine/tinker/engine.py
  • tests/test_eval_config.py
  • tests/test_eval_runner.py
  • tests/test_tinker_engine.py

Comment on lines +84 to 93
```python
            steps_per_batch_applied=metadata["steps_per_batch_applied"],
        )

    return LocalDistillMetrics(
        distill_loss=metadata.get("distill_loss"),
        kl_reg=metadata.get("kl_reg"),
        mean_is_ratio=metadata.get("mean_is_ratio"),
        clip_fraction=metadata.get("clip_fraction"),
        steps_per_batch_applied=metadata["steps_per_batch_applied"],
    )
```

⚠️ Potential issue | 🟠 Major

Use .get() with a default for steps_per_batch_applied to match resilience of other fields.

Lines 84 and 92 use hard metadata["steps_per_batch_applied"] access. If an older or third-party engine omits this new field, both branches raise KeyError. While the caller catches KeyError (line 409), that discards the entire metrics object — losing distill_loss, kl_mean, etc. for the step.

This is especially inconsistent in the local branch (lines 88–92), where every other field uses .get().

🛡️ Proposed fix — use `.get()` with default 1
```diff
             batch_size=metadata["batch_size"],
-            steps_per_batch_applied=metadata["steps_per_batch_applied"],
+            steps_per_batch_applied=metadata.get("steps_per_batch_applied", 1),
         )

     return LocalDistillMetrics(
         distill_loss=metadata.get("distill_loss"),
         kl_reg=metadata.get("kl_reg"),
         mean_is_ratio=metadata.get("mean_is_ratio"),
         clip_fraction=metadata.get("clip_fraction"),
-        steps_per_batch_applied=metadata["steps_per_batch_applied"],
+        steps_per_batch_applied=metadata.get("steps_per_batch_applied", 1),
     )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/eval/runner.py` around lines 84 - 93, The code currently indexes
metadata["steps_per_batch_applied"] in the LocalDistillMetrics return path which
can raise KeyError and drop the entire metrics object; change that access to use
metadata.get("steps_per_batch_applied", 1) so LocalDistillMetrics is constructed
with a default of 1 when the field is absent (keep identical pattern used for
other fields like distill_loss, kl_reg, mean_is_ratio, clip_fraction) to make
the metrics construction resilient.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 762072da79

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…aining

# Conflicts:
#	claas/core/types.py
#	claas/eval/README.md
#	claas/eval/config.py
#	claas/eval/configs/base.yaml
#	claas/eval/runner.py
#	claas/eval/types.py
#	tests/test_eval_config.py
#	tests/test_eval_runner.py

kfallah commented Feb 25, 2026

@codex review

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
claas/eval/runner.py (1)

84-84: ⚠️ Potential issue | 🟠 Major

Use safe default access for steps_per_batch_applied to avoid dropping metrics.

Line 84 and Line 92 still use metadata["steps_per_batch_applied"]. If omitted by an engine, this raises KeyError, and the catch path discards the entire SDPO metrics object for that step.

Suggested fix
```diff
     if config.mode == "tinker" and "adv_mean" in metadata:
         return TinkerDistillMetrics(
@@
-            steps_per_batch_applied=metadata["steps_per_batch_applied"],
+            steps_per_batch_applied=metadata.get("steps_per_batch_applied", 1),
         )
@@
     return LocalDistillMetrics(
         distill_loss=metadata.get("distill_loss"),
         kl_reg=metadata.get("kl_reg"),
         mean_is_ratio=metadata.get("mean_is_ratio"),
         clip_fraction=metadata.get("clip_fraction"),
-        steps_per_batch_applied=metadata["steps_per_batch_applied"],
+        steps_per_batch_applied=metadata.get("steps_per_batch_applied", 1),
     )
```

Also applies to: 92-92

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/eval/runner.py` at line 84, Replace direct indexing of
metadata["steps_per_batch_applied"] with a safe lookup that supplies a sensible
default (e.g., metadata.get("steps_per_batch_applied", 1)) to avoid raising
KeyError and dropping the SDPO metrics object; update both occurrences that
reference steps_per_batch_applied in claass.eval.runner (the two places around
the current lines using metadata["steps_per_batch_applied"]) so downstream logic
receives the fallback value when the engine omits the key.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 762072d and e2331f7.

📒 Files selected for processing (7)
  • claas/core/types.py
  • claas/eval/README.md
  • claas/eval/configs/base.yaml
  • claas/eval/runner.py
  • claas/eval/types.py
  • tests/test_eval_config.py
  • tests/test_eval_runner.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • claas/core/types.py
  • claas/eval/types.py
  • tests/test_eval_config.py

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e2331f72b5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

```python
    max_grad_norm: float = 1.0
    kl_reg_weight: float = 0.0
    teacher_top_k: int = 100
    steps_per_batch: int = 4
```


P1: Enforce positive steps_per_batch in TrainingConfig

The newly added steps_per_batch field has no lower-bound validation, but both multi-step trainers now assume at least one iteration and unconditionally read step_metrics[-1] (claas/training/distillation.py and claas/training/engine/tinker/engine.py). As a result, sending training.steps_per_batch=0 is currently accepted and then crashes /v1/feedback with a server error instead of a clean 4xx validation failure, which can break eval runs by turning every feedback update into a failed request.

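One way to make the bound explicit is a check at construction time (shown dependency-free below; in the repo's Pydantic TrainingConfig the equivalent would be a ge=1 constraint on the field, and the [1, 32] range here mirrors the bounds stated in the walkthrough):

```python
def validate_steps_per_batch(value: int) -> int:
    """Reject out-of-range step counts up front so /v1/feedback can return
    a clean validation error instead of crashing later on step_metrics[-1]."""
    if not 1 <= value <= 32:
        raise ValueError(f"steps_per_batch must be in [1, 32], got {value}")
    return value
```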

