Fix batch training padding mismatch handling by H-Chris233 · Pull Request #22 · H-Chris233/RustGPT-Chinese

H-Chris233 · 2026-01-15T12:43:41Z

Motivation

Batch training fed padded Batch inputs to the model, while create_training_batches built targets from the real token lengths. The resulting logits/target length mismatch led to incorrect gradient application in train_monitored_batch.
Because this is an educational project, document the per-sample real-length slicing approach so learners understand why PAD tokens must not participate in training.
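
The mismatch described above can be illustrated with a minimal sketch. It is not code from this repository: PAD_ID, pad_right, and lengths_mismatch are hypothetical names, right-padding is assumed, and the forward pass is assumed to emit one output row per input token.

```rust
/// Hypothetical PAD token id, chosen only for this illustration.
const PAD_ID: usize = 0;

/// Right-pad `tokens` with PAD_ID up to `max_len` (never truncates).
fn pad_right(tokens: &[usize], max_len: usize) -> Vec<usize> {
    let mut out = tokens.to_vec();
    if out.len() < max_len {
        out.resize(max_len, PAD_ID);
    }
    out
}

/// True when a forward pass over `input` (assumed to yield one row per
/// input token) would not line up with `targets`.
fn lengths_mismatch(input: &[usize], targets: &[usize]) -> bool {
    input.len() != targets.len()
}
```

With three real tokens padded to a batch length of six, the padded input would produce six rows against three targets, while the unpadded input lines up.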

Description

In src/llm.rs, make train_monitored_batch compute a per-sample real_len from input_batch.attention_mask and slice the input tokens to tokens[0..real_len], so that the forward and backward passes operate only on real tokens; collect the per-sample batch_targets accordingly.
Add a safety check that skips malformed samples when log_probs.nrows() != target_ids.len(), which avoids crashing or applying mismatched gradients.
Remove the previous approach of zeroing gradients at PAD positions, because samples are now sliced to exclude PAD before gradient computation.
Update docs/批量训练与动态掩码.md to explain the real-length slicing rule and why PAD must not participate in gradient computation.
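
The slicing rule and the safety check listed above might look roughly like the following sketch. It is not the code from src/llm.rs: the mask is assumed to be a plain row of f32 values with 1.0 for real tokens and 0.0 for PAD, padding is assumed to be on the right, and real_len, real_tokens, forward_rows, and usable_samples are hypothetical helpers. The forward pass is replaced by a stand-in that only reports how many rows it would produce.

```rust
/// Count real tokens in one attention-mask row.
/// Assumption: 1.0 marks a real token, 0.0 marks PAD, padding is on the right.
fn real_len(mask_row: &[f32]) -> usize {
    mask_row.iter().filter(|&&m| m > 0.5).count()
}

/// Slice a right-padded token row down to tokens[0..real_len].
fn real_tokens<'a>(tokens: &'a [usize], mask_row: &[f32]) -> &'a [usize] {
    let n = real_len(mask_row).min(tokens.len());
    &tokens[..n]
}

/// Stand-in for the model's forward pass: it only reports the number of
/// log-prob rows, assumed to be one per input token.
fn forward_rows(input: &[usize]) -> usize {
    input.len()
}

/// Return the indices of samples whose row count matches their target count.
/// Malformed samples are skipped instead of crashing or producing gradients.
fn usable_samples(
    batch_tokens: &[Vec<usize>],
    mask: &[Vec<f32>],
    targets: &[Vec<usize>],
) -> Vec<usize> {
    let mut kept = Vec::new();
    for (i, ((tokens, mask_row), target_ids)) in
        batch_tokens.iter().zip(mask).zip(targets).enumerate()
    {
        let input = real_tokens(tokens, mask_row);
        if forward_rows(input) != target_ids.len() {
            continue; // safety check: skip the mismatched sample
        }
        kept.push(i);
    }
    kept
}
```

Because each sample is sliced before the forward pass, no PAD position ever reaches the gradient computation, which is why a separate step that zeroes PAD gradients is no longer needed in this scheme.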

Testing

No automated tests were executed as part of this change.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3afa7178c7

src/llm.rs

Fix batch training padding mismatch

3afa717

H-Chris233 added the codex label Jan 15, 2026 — with ChatGPT Codex Connector

chatgpt-codex-connector bot reviewed Jan 15, 2026

Review thread on src/llm.rs: resolved

H-Chris233 merged commit f40ceae into main Jan 15, 2026
2 checks passed

H-Chris233 merged 1 commit into main from codex/fix-training-logic-errors-and-improve-documentation-blh261