Handle small tactical/openings NPZ batches #102
Pull Request Overview
This PR addresses the handling of small tactical and openings NPZ batch files to prevent runtime errors when sampling more data than available. The changes enable robust sampling from undersized curriculum data sources and skip empty data files.
Key changes include:
- Clamping tactical and openings sampling to the available curriculum data size
- Enabling replacement sampling only when the batch size exceeds the available data
- Adding empty data validation to skip malformed outputs
- Including regression tests for undersized NPZ file scenarios
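The clamping and replacement rules listed above can be sketched roughly as follows. This is a minimal illustration, not the code in azchess/data_manager.py: the helper name `sample_indices`, its signature, and the choice to return `None` for an empty file are assumptions made for the example. Only the three assignment lines mirror the diff under review.

```python
import numpy as np


def sample_indices(total_positions, batch_size, rng):
    """Hypothetical helper: choose row indices for one curriculum batch."""
    if total_positions == 0:
        # Assumed behaviour: an empty file yields no batch and is skipped.
        return None
    # Same three assignments as in the reviewed diff.
    draw_size = min(batch_size, total_positions)
    replace = batch_size > total_positions
    target_size = batch_size if replace else draw_size
    # Sample with replacement only when the file is smaller than the batch.
    return rng.choice(total_positions, size=target_size, replace=replace)


rng = np.random.default_rng(0)
small = sample_indices(3, 8, rng)    # undersized file: indices repeat
large = sample_indices(10, 4, rng)   # enough data: indices are unique
```

In this sketch `target_size` equals `batch_size` in both branches, because `draw_size` is only used when `batch_size <= total_positions`. The caller therefore always receives a full batch unless the file is empty.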
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_data_manager_batches.py | Adds regression tests that verify batch sampling from undersized tactical and openings NPZ files |
| azchess/data_manager.py | Updates tactical and openings batch methods to handle small files with proper size validation and replacement sampling |
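A regression test for the undersized-file case might look something like the sketch below. It does not reproduce tests/test_data_manager_batches.py: the loader `load_small_batch`, the NPZ key names `positions` and `values`, and the array shapes are all hypothetical stand-ins, since the real data manager API is not shown in this review.

```python
import os
import tempfile

import numpy as np


def load_small_batch(path, batch_size, rng):
    """Hypothetical stand-in for a tactical/openings batch method."""
    with np.load(path) as data:
        positions = data["positions"]  # assumed key name
        values = data["values"]        # assumed key name
    total = positions.shape[0]
    if total == 0:
        return None  # assumed: empty files are skipped
    replace = batch_size > total
    idx = rng.choice(total, size=batch_size, replace=replace)
    return positions[idx], values[idx]


def test_undersized_npz_fills_batch():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tactical_small.npz")
        # Write only 3 rows, then request a batch of 8.
        np.savez(
            path,
            positions=np.zeros((3, 4), dtype=np.float32),
            values=np.arange(3, dtype=np.float32),
        )
        batch = load_small_batch(path, 8, np.random.default_rng(0))
    assert batch is not None
    positions, values = batch
    assert positions.shape == (8, 4)
    assert values.shape == (8,)
    assert set(values.tolist()) <= {0.0, 1.0, 2.0}


def test_empty_npz_is_skipped():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "openings_empty.npz")
        np.savez(
            path,
            positions=np.zeros((0, 4), dtype=np.float32),
            values=np.zeros((0,), dtype=np.float32),
        )
        batch = load_small_batch(path, 8, np.random.default_rng(0))
    assert batch is None
```

Both functions follow pytest's naming convention, so they would be collected automatically if placed in a test module.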
    draw_size = min(batch_size, total_positions)
    replace = batch_size > total_positions
    target_size = batch_size if replace else draw_size
[nitpick] The name draw_size is misleading: it holds the sample size used when replacement is disabled, not the number of items actually drawn. Consider renaming it to sample_size_no_replace or clamped_size for clarity.
Suggested change:
    - draw_size = min(batch_size, total_positions)
    - replace = batch_size > total_positions
    - target_size = batch_size if replace else draw_size
    + sample_size_no_replace = min(batch_size, total_positions)
    + replace = batch_size > total_positions
    + target_size = batch_size if replace else sample_size_no_replace
    draw_size = min(batch_size, total_positions)
    replace = batch_size > total_positions
    target_size = batch_size if replace else draw_size
[nitpick] The name draw_size is misleading: it holds the sample size used when replacement is disabled, not the number of items actually drawn. Consider renaming it to sample_size_no_replace or clamped_size for clarity.
Suggested change:
    - draw_size = min(batch_size, total_positions)
    - replace = batch_size > total_positions
    - target_size = batch_size if replace else draw_size
    + clamped_size = min(batch_size, total_positions)
    + replace = batch_size > total_positions
    + target_size = batch_size if replace else clamped_size
Summary
Testing
https://chatgpt.com/codex/tasks/task_e_68e6f991ddcc83239164246fad36d7cc