Conversation

@Astro36 Astro36 commented Nov 26, 2025

Description

The self.scaling parameter is created with torch.empty and then used without being initialized.
This is not a problem during fine-tuning, because pretrained checkpoints supply valid scaling values.
But when training from scratch, the undefined values can cause numerical instability and lead to NaN outputs.
Initializing self.scaling with torch.randn gives well-defined initial values and improves training stability.
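For context, here is a minimal sketch of the pattern this change targets. ScaledLayer and its dim argument are hypothetical stand-ins for illustration, not the actual module in the repository.

```python
import torch
import torch.nn as nn


class ScaledLayer(nn.Module):
    """Minimal sketch of the initialization pattern (hypothetical module)."""

    def __init__(self, dim: int):
        super().__init__()
        # Before: torch.empty leaves the parameter uninitialized. Fine-tuning
        # hides this because checkpoint weights overwrite it, but training
        # from scratch starts from arbitrary memory contents and can produce NaNs.
        # self.scaling = nn.Parameter(torch.empty(dim))

        # After: torch.randn provides well-defined initial values.
        self.scaling = nn.Parameter(torch.randn(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.scaling
```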

List of changes

  • Initialize self.scaling with torch.randn.

For reviewers

  • No functional change; only the parameter initialization is fixed.

Initialize scaling parameter with random values.

google-cla bot commented Nov 26, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up-to-date status, view the checks section at the bottom of the pull request.
