Refactor training batch sampling and add coverage#101

Merged
lukifer23 merged 1 commit into master from codex/refactor-datamanager-for-mini-batch-training
Oct 12, 2025

Conversation

@lukifer23
Owner

Summary

  • refactor DataManager.get_training_batch to draw from dedicated external and self-play shard iterators
  • ensure batch assembly falls back to the remaining source when one iterator is exhausted
  • add a unit test that verifies the external mixing ratio on a seeded replay set

Testing

  • pytest tests/test_data_manager.py

https://chatgpt.com/codex/tasks/task_e_68e6f98fd1d48323a4726f10a01a695d
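The mixing-with-fallback behaviour described in the summary could be sketched roughly as follows (a simplified illustration with hypothetical names, not the actual DataManager implementation):

```python
import random
from itertools import chain

def mixed_batches(external_iter, selfplay_iter, batch_size, external_share):
    """Draw samples from two iterators at a target ratio, falling back to
    whichever source still has data when the other is exhausted (sketch)."""
    n_ext = max(1, round(batch_size * external_share)) if external_share > 0 else 0
    n_sp = batch_size - n_ext
    while True:
        batch = []
        # Primary draws at the target ratio.
        for it, n in ((external_iter, n_ext), (selfplay_iter, n_sp)):
            for _ in range(n):
                sample = next(it, None)
                if sample is None:
                    break
                batch.append(sample)
        # Fallback: top up the batch from whichever source has samples left.
        while len(batch) < batch_size:
            sample = next(chain(external_iter, selfplay_iter), None)
            if sample is None:
                break
            batch.append(sample)
        if len(batch) < batch_size:
            return  # not enough data left for a full batch
        random.shuffle(batch)
        yield batch
```

With 3 external and 10 self-play samples and a 30% target at batch size 4, this yields three full batches, each containing one external sample, then stops rather than emitting a short batch.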

Copilot AI review requested due to automatic review settings October 12, 2025 21:54
@lukifer23 lukifer23 merged commit 0bdf49d into master Oct 12, 2025
1 check failed
@lukifer23 lukifer23 deleted the codex/refactor-datamanager-for-mini-batch-training branch October 12, 2025 21:54

Copilot AI left a comment


Pull Request Overview

This PR refactors the training batch sampling mechanism in DataManager to improve data source mixing and adds comprehensive test coverage. The refactoring changes from a single-loop batch generation approach to dedicated iterators for external and self-play data sources with proper fallback handling.

  • Replaces batch-level sampling with individual sample iterators for better mixing control
  • Implements fallback logic to fill batches when one iterator is exhausted
  • Adds unit test to verify the 30% external data mixing ratio target

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File                        Description
azchess/data_manager.py     Refactored get_training_batch to use separate iterators; added _iter_shard_samples helper method
tests/test_data_manager.py  Added new test file with comprehensive test for external/self-play data balancing


```python
    return

if len(combined) < batch_size:
    return
```

Copilot AI Oct 12, 2025


Early return when combined samples are less than batch_size may cause incomplete batches to be yielded. This could lead to training instability with variable batch sizes. Consider either padding to exact batch_size or ensuring minimum batch requirements are met.

Suggested change

```diff
-return
+# Pad with dummy samples to ensure consistent batch size
+if len(combined) == 0:
+    return
+# Create dummy sample based on the first sample's structure
+first_sample = combined[0]
+state_shape = first_sample[0].shape
+policy_shape = first_sample[1].shape
+value_shape = () if np.isscalar(first_sample[2]) else np.shape(first_sample[2])
+legal_mask_shape = first_sample[3].shape if len(first_sample) > 3 and first_sample[3] is not None else None
+num_to_pad = batch_size - len(combined)
+for _ in range(num_to_pad):
+    dummy_state = np.zeros(state_shape, dtype=first_sample[0].dtype)
+    dummy_policy = np.zeros(policy_shape, dtype=first_sample[1].dtype)
+    dummy_value = np.zeros(value_shape, dtype=np.float32)
+    if legal_mask_shape is not None:
+        dummy_legal_mask = np.zeros(legal_mask_shape, dtype=first_sample[3].dtype)
+    else:
+        dummy_legal_mask = None
+    combined.append((dummy_state, dummy_policy, dummy_value, dummy_legal_mask))
```
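The other option the comment mentions, enforcing a minimum batch requirement instead of zero-padding, might look like this (a hedged sketch; names are illustrative, and zero-valued dummy samples are avoided because they could skew gradients):

```python
def complete_batches(samples, batch_size, min_fill=1.0):
    """Yield batches of batch_size; emit a final partial batch only if it
    meets the minimum fill fraction, otherwise drop it (illustrative)."""
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch and len(batch) >= int(batch_size * min_fill):
        yield batch
```

With `min_fill=1.0` the trailing remainder is silently dropped; with a lower threshold, a mostly-full final batch still reaches the trainer.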

Comment on lines +304 to +310

```diff
+if external_ratio > 0 and external_batch_size == 0:
+    external_batch_size = 1
+    selfplay_batch_size = max(0, batch_size - external_batch_size)
-for shard_path in shard_paths:
-    try:
-        # Memory-map to reduce RSS and speed IO
-        with np.load(shard_path, mmap_mode='r') as data:
-            states, policies, values, legal_mask_all, ssl_targets = self._extract_training_arrays(data)
+if selfplay_ratio > 0 and selfplay_batch_size == 0:
+    selfplay_batch_size = 1
+    external_batch_size = max(0, batch_size - selfplay_batch_size)
```

Copilot AI Oct 12, 2025


The logic for ensuring minimum batch sizes when ratios are non-zero is duplicated and could lead to inconsistent state. Consider consolidating this logic into a single validation step or extracting it into a helper method.
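Consolidating the duplicated clamping into one helper, as the comment suggests, could look like this (a sketch with a hypothetical function name, not the repository's code):

```python
def split_batch(batch_size: int, external_ratio: float) -> tuple[int, int]:
    """Split a batch between external and self-play sources, guaranteeing
    each source with a non-zero share at least one slot (illustrative)."""
    external = int(round(batch_size * external_ratio))
    # Clamp into range and reserve a slot for external data if its ratio > 0.
    external = min(batch_size, max(external, 1 if external_ratio > 0 else 0))
    selfplay = batch_size - external
    # Symmetric guarantee for self-play when rounding consumed the whole batch.
    if external_ratio < 1.0 and selfplay == 0 and batch_size > 1:
        selfplay = 1
        external = batch_size - 1
    return external, selfplay
```

Computing both sizes in one place removes the risk of the two clamp branches fighting each other and leaving an inconsistent split.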

```python
batch = next(generator)
states = batch[0]
total_samples += states.shape[0]
external_samples += int(np.sum(np.isclose(states[:, 0, 0, 0], 2.0)))
```

Copilot AI Oct 12, 2025


Using magic number 2.0 to identify external samples makes the test brittle. Consider using a named constant like EXTERNAL_FILL_VALUE = 2.0 to make the test logic clearer and more maintainable.
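Applying the suggestion, the test could hoist the sentinel into a module-level constant (the constant and helper names are illustrative):

```python
import numpy as np

# Fill value stamped into plane 0 of external (e.g. Stockfish-derived)
# samples by the test fixture; naming it documents the intent.
EXTERNAL_FILL_VALUE = 2.0

def count_external(states: np.ndarray) -> int:
    """Count samples in a batch whose first plane carries the external marker."""
    return int(np.sum(np.isclose(states[:, 0, 0, 0], EXTERNAL_FILL_VALUE)))
```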


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

```python
# Separate external and self-play shards for balanced sampling
external_shards = [s for s in valid_shards if s.source and 'stockfish' in s.source]
selfplay_shards = [s for s in valid_shards if s.source == 'selfplay' or not s.source]

# Calculate balanced proportions (ensure external data gets fair representation)
total_external_samples = sum(s.sample_count for s in external_shards)
total_selfplay_samples = sum(s.sample_count for s in selfplay_shards)

# Target 30% external data, 70% self-play for balanced learning
target_external_ratio = 0.3

# Adjust ratios based on available data
if total_external_samples == 0:
    external_ratio = 0.0
    selfplay_ratio = 1.0
elif total_selfplay_samples == 0:
    external_ratio = 1.0
    selfplay_ratio = 0.0
else:
    external_ratio = min(target_external_ratio, total_external_samples / (total_external_samples + total_selfplay_samples))
    selfplay_ratio = 1.0 - external_ratio

if not external_shards:
    external_batch_size = 0
    selfplay_batch_size = batch_size
elif not selfplay_shards:
    external_batch_size = batch_size
    selfplay_batch_size = 0
else:
    external_batch_size = int(round(batch_size * external_ratio))
    external_batch_size = min(batch_size, external_batch_size)
    selfplay_batch_size = batch_size - external_batch_size

if external_ratio > 0 and external_batch_size == 0:
    external_batch_size = 1
    selfplay_batch_size = max(0, batch_size - external_batch_size)
if selfplay_ratio > 0 and selfplay_batch_size == 0:
    selfplay_batch_size = 1
    external_batch_size = max(0, batch_size - selfplay_batch_size)

external_iter = self._iter_shard_samples(external_shards) if external_shards else None
selfplay_iter = self._iter_shard_samples(selfplay_shards) if selfplay_shards else None
if external_iter is None and selfplay_iter is None:
    raise RuntimeError("No valid training data available")
```

P1: External shard filter ignores non-stockfish sources

The new batching logic only classifies shards as external when source contains 'stockfish' and as self-play when the source is 'selfplay' or empty. Shards imported via import_replay_dir often carry other labels such as 'external' or teacher:* (see orchestrator.import_replay_dir), but they are now dropped from both lists. If a run relies solely on these external/teacher shards, get_training_batch will raise RuntimeError("No valid training data available") even though data exists, and when mixed with self-play shards those external shards are never sampled. This regression prevents training pipelines that ingest non-stockfish external data from working.
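A broader classification along the lines Codex describes might treat anything not explicitly labelled self-play as external (a hedged sketch; the `source` and shard fields follow the snippet quoted above, and the set name is illustrative):

```python
SELFPLAY_LABELS = {"selfplay", ""}

def classify_shards(valid_shards):
    """Treat 'stockfish', 'external', 'teacher:*', etc. as external;
    only unlabelled or explicit self-play shards count as self-play,
    so no shard is dropped from both lists."""
    external, selfplay = [], []
    for s in valid_shards:
        if (s.source or "") in SELFPLAY_LABELS:
            selfplay.append(s)
        else:
            external.append(s)
    return external, selfplay
```

Because every shard lands in exactly one bucket, runs that rely solely on teacher or imported-replay shards would no longer hit the `RuntimeError("No valid training data available")` path.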

