Adding a thread safe RNG utility function#1529

Open
divyanshk wants to merge 1 commit into meta-pytorch:main from divyanshk:export-D93776060

Conversation

@divyanshk
Contributor

@divyanshk divyanshk commented Feb 19, 2026

Equivalent unit test on pytorch/pytorch: pytorch/pytorch#169116


Summary:
[Identical version of the PR, but with alterations to keep backward compatibility with torchdata's StatefulDataLoader]

This includes part of the changes in pytorch/pytorch#161044

When using PyTorch's DataLoader with thread-based workers, all worker threads share the same global random number generator (RNG) state. This creates a race condition: multiple threads may call random functions like torch.randint() or torch.rand() simultaneously, leading to non-reproducible results.

torch.thread_safe_generator() solves this by returning a thread-local generator when called from within a DataLoader thread worker.

This PR:

  • This PR only includes the public utility function that returns the RNG. The RNG itself will be populated by the thread DataLoader PR linked above, so this PR adds no new functionality yet: the function returns None because no RNG state is populated for thread workers (there are no thread workers right now; they will land with PR#161044).
  • The function is landed separately to enable integration with Torchvision random transforms.
  • It also refactors WorkerInfo in worker.py to be a frozen dataclass.
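The intended behavior can be sketched with stdlib primitives. This is a minimal illustration, not the actual PyTorch implementation: `random.Random` stands in for `torch.Generator`, and the `_worker_generators` registry and `_register_worker` helper are hypothetical names for whatever the thread DataLoader PR will populate.

```python
import random
import threading

# Hypothetical per-thread registry that a thread-based DataLoader
# would fill in when it spawns its worker threads.
_worker_generators = threading.local()


def thread_safe_generator():
    """Return this thread's generator, or None when the calling
    thread is not a registered DataLoader thread worker."""
    return getattr(_worker_generators, "gen", None)


def _register_worker(seed):
    # Hypothetical hook: called once per worker thread by the loader.
    _worker_generators.gen = random.Random(seed)


results = {}


def worker(seed):
    _register_worker(seed)
    gen = thread_safe_generator()
    # Each worker draws from its own generator, so results are
    # reproducible regardless of thread scheduling.
    results[seed] = [gen.randint(0, 99) for _ in range(3)]


threads = [threading.Thread(target=worker, args=(s,)) for s in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never registered a generator, so it gets None --
# matching the behavior this PR ships before PR#161044 lands.
assert thread_safe_generator() is None
```

Because each generator is seeded per worker and stored in thread-local storage, concurrent draws no longer race on a shared global RNG state.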

Differential Revision: D93776060
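The WorkerInfo refactor mentioned in the last bullet can be illustrated as follows. The field set here (`id`, `num_workers`, `seed`) is assumed for illustration; the real fields live in worker.py.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class WorkerInfo:
    # Assumed fields for illustration; see worker.py for the real set.
    id: int
    num_workers: int
    seed: int


info = WorkerInfo(id=0, num_workers=4, seed=1234)

# A frozen dataclass rejects attribute mutation after construction,
# so worker metadata cannot be accidentally changed mid-epoch.
try:
    info.id = 1
    mutated = True
except FrozenInstanceError:
    mutated = False

assert info.id == 0 and not mutated
```

Freezing the dataclass makes WorkerInfo hashable and safe to share across threads, which matters once thread-based workers exist.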

@meta-codesync

meta-codesync bot commented Feb 19, 2026

@divyanshk has exported this pull request. If you are a Meta employee, you can view the originating Diff in D93776060.

@meta-cla meta-cla bot added the CLA Signed label Feb 19, 2026
divyanshk added a commit to divyanshk/data that referenced this pull request Feb 19, 2026
Summary:

[Identical version of [PR](pytorch/pytorch#172659) but with alterations to keep BC with torchdata statefuldataloader]

This includes part of the changes in pytorch/pytorch#161044

When using PyTorch's DataLoader with thread-based workers, all worker threads share the same global random number generator (RNG) state. This creates a race condition: multiple threads may call random functions like torch.randint() or torch.rand() simultaneously, leading to non-reproducible results.

`torch.thread_safe_generator()` solves this by returning a thread-local generator when called from within a DataLoader thread worker.

This PR:
* This PR only includes the public utility function that returns the RNG. The RNG itself will be populated by the thread DataLoader PR linked above, so this PR adds no new functionality yet: the function returns `None` because no RNG state is populated for thread workers (there are no thread workers right now; they will land with PR#161044).
* The function is landed separately to enable integration with Torchvision random transforms.
* It also refactors `WorkerInfo` in `worker.py` to be a frozen dataclass.

Differential Revision: D93776060
divyanshk added a commit to divyanshk/pytorch that referenced this pull request Feb 19, 2026
Summary:
X-link: meta-pytorch/data#1529

[Identical version of [PR](pytorch#172659) but with alterations to keep BC with torchdata statefuldataloader]

This includes part of the changes in pytorch#161044

When using PyTorch's DataLoader with thread-based workers, all worker threads share the same global random number generator (RNG) state. This creates a race condition: multiple threads may call random functions like torch.randint() or torch.rand() simultaneously, leading to non-reproducible results.

`torch.thread_safe_generator()` solves this by returning a thread-local generator when called from within a DataLoader thread worker.

This PR:
* This PR only includes the public utility function that returns the RNG. The RNG itself will be populated by the thread DataLoader PR linked above, so this PR adds no new functionality yet: the function returns `None` because no RNG state is populated for thread workers (there are no thread workers right now; they will land with PR#161044).
* The function is landed separately to enable integration with Torchvision random transforms.
* It also refactors `WorkerInfo` in `worker.py` to be a frozen dataclass.

Test Plan: contbuild & OSS CI

Differential Revision: D93776060