
Ensure inference fallback respects dynamic policy size #95

Merged
lukifer23 merged 1 commit into master from codex/refactor-fallback-policy-width-derivation
Sep 22, 2025

Conversation

@lukifer23
Owner

Summary

  • add a helper to derive the policy tensor width from shared memory resources
  • use the derived width when building fallback policy logits on the server and client paths
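A minimal sketch of what such a helper could look like. Only the idea of a width-derivation helper comes from this PR (it mentions a `_get_policy_width_from_resource()` function); the attribute names and internals below are assumptions for illustration:

```python
# Hypothetical sketch: derive the policy tensor width from a shared-memory
# resource descriptor instead of hardcoding it. The attribute name
# "policy_shape" is assumed, not taken from the PR.
def get_policy_width_from_resource(resource) -> int:
    """Return the last dimension of the resource's policy shape, or 0 if unknown."""
    shape = getattr(resource, "policy_shape", None)
    if not shape:
        return 0
    try:
        return int(shape[-1])
    except (TypeError, ValueError):
        return 0


class FakeResource:
    # batch x move-encoding width, as used by chess policy heads
    policy_shape = (64, 4672)


print(get_policy_width_from_resource(FakeResource()))  # 4672
print(get_policy_width_from_resource(object()))        # 0 (no shape information)
```

Returning 0 when the shape is missing mirrors the behavior the review below flags: callers must then decide what a non-positive width means.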

Testing

  • python - <<'PY' ...

https://chatgpt.com/codex/tasks/task_e_68d1bc54bd888323b9feb57ff68200d9

Copilot AI review requested due to automatic review settings September 22, 2025 21:23
@lukifer23 merged commit c075bb0 into master Sep 22, 2025
1 check failed
@lukifer23 deleted the codex/refactor-fallback-policy-width-derivation branch September 22, 2025 21:23

Copilot AI left a comment


Pull Request Overview

This PR ensures that inference fallback logic respects dynamic policy tensor sizes rather than using hardcoded dimensions. The changes add a helper function to dynamically derive policy width from shared memory resources and update both server and client paths to use this derived width when creating fallback policy logits.

  • Added _get_policy_width_from_resource() helper function to extract policy tensor dimensions
  • Updated server fallback logic to derive policy width from worker resources
  • Updated client fallback logic to use dynamic policy width instead of hardcoded 4672


Comment on lines +96 to +98

        return int(width)
    except (TypeError, ValueError):
        return 0

Copilot AI Sep 22, 2025


The function returns 0 for invalid width values, but later code uses this as a tensor dimension. A zero-width tensor would cause runtime errors. Consider returning a sensible default width or raising an exception to fail fast.
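One way to implement the reviewer's "sensible default" suggestion, as a sketch. The value 4672 is the hardcoded width this PR is replacing; the function name is hypothetical:

```python
DEFAULT_POLICY_WIDTH = 4672  # the previously hardcoded width mentioned in this PR


def policy_width_or_default(width, default: int = DEFAULT_POLICY_WIDTH) -> int:
    """Coerce width to a positive int, falling back to a known-good default
    instead of propagating 0 into tensor shapes."""
    try:
        w = int(width)
    except (TypeError, ValueError):
        return default
    return w if w > 0 else default


print(policy_width_or_default("4672"))  # 4672
print(policy_width_or_default(None))    # 4672 (TypeError -> default)
print(policy_width_or_default(0))       # 4672 (non-positive -> default)
```

The alternative the reviewer mentions, raising an exception instead of substituting a default, trades graceful degradation for an earlier, more visible failure.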

Comment on lines +505 to +507

    policy_logits_np = np.zeros(
        (batch_size, policy_width), dtype=np.float32
    )

Copilot AI Sep 22, 2025


When policy_width is 0 (from the helper function), this creates a tensor with shape (batch_size, 0) which will likely cause errors in downstream code expecting valid policy dimensions.
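The concern is easy to reproduce: NumPy constructs a zero-width array without complaint, so the failure only surfaces downstream, for example when reducing over the move axis:

```python
import numpy as np

batch_size, policy_width = 8, 0  # policy_width = 0, as the helper can return
policy_logits_np = np.zeros((batch_size, policy_width), dtype=np.float32)
print(policy_logits_np.shape)  # (8, 0) -- construction succeeds silently

# The error only appears when downstream code reduces over the empty axis:
try:
    policy_logits_np.argmax(axis=1)
except ValueError as exc:
    print("argmax failed:", exc)
```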

    if policy_width <= 0:
        self.logger.debug(
            "Falling back to zero-width policy logits due to missing shape information"
        )

Copilot AI Sep 22, 2025


Similar to the server path, creating a zero-width policy tensor when policy_width is 0 will cause runtime errors. The fallback should ensure a valid tensor dimension.

Suggested change (clamp the width after the logging call's closing parenthesis):

        )
        policy_width = max(1, policy_width)
