Ensure inference fallback respects dynamic policy size #95
Conversation
Pull Request Overview
This PR ensures that inference fallback logic respects dynamic policy tensor sizes rather than using hardcoded dimensions. The changes add a helper function to dynamically derive policy width from shared memory resources and update both server and client paths to use this derived width when creating fallback policy logits.
- Added `_get_policy_width_from_resource()` helper function to extract policy tensor dimensions
- Updated server fallback logic to derive policy width from worker resources
- Updated client fallback logic to use dynamic policy width instead of hardcoded 4672
```python
        return int(width)
    except (TypeError, ValueError):
        return 0
```
The function returns 0 for invalid width values, but later code uses this as a tensor dimension. A zero-width tensor would cause runtime errors. Consider returning a sensible default width or raising an exception to fail fast.
```python
policy_logits_np = np.zeros(
    (batch_size, policy_width), dtype=np.float32
)
```
When policy_width is 0 (from the helper function), this creates a tensor with shape (batch_size, 0) which will likely cause errors in downstream code expecting valid policy dimensions.
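The failure mode is easy to reproduce: NumPy happily constructs a `(batch_size, 0)` array, but any consumer that selects an action along the policy axis breaks immediately:

```python
import numpy as np

batch_size, policy_width = 4, 0
policy_logits_np = np.zeros((batch_size, policy_width), dtype=np.float32)
print(policy_logits_np.shape)  # (4, 0)

# Downstream code that picks an action per row fails on the empty axis:
try:
    np.argmax(policy_logits_np, axis=1)
except ValueError as exc:
    print("argmax failed:", exc)
```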
```python
if policy_width <= 0:
    self.logger.debug(
        "Falling back to zero-width policy logits due to missing shape information"
    )
```
Similar to the server path, creating a zero-width policy tensor when policy_width is 0 will cause runtime errors. The fallback should ensure a valid tensor dimension.
```python
    )
)
policy_width = max(1, policy_width)
```
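The `max(1, policy_width)` clamp in the diff guarantees at least a width-1 placeholder distribution even when shape information is missing. A small sketch of the resulting behavior (the wrapper function name is illustrative):

```python
import numpy as np

def make_fallback_logits(batch_size: int, derived_width: int) -> np.ndarray:
    # Clamp so downstream consumers never see a zero-width policy axis.
    policy_width = max(1, derived_width)
    return np.zeros((batch_size, policy_width), dtype=np.float32)

print(make_fallback_logits(2, 4672).shape)  # (2, 4672)
print(make_fallback_logits(2, 0).shape)     # (2, 1)
```

A width-1 tensor is still semantically wrong for a real policy head, but unlike a zero-width one it keeps operations such as `argmax` from raising, which is what the fallback path needs.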
Summary
Testing
https://chatgpt.com/codex/tasks/task_e_68d1bc54bd888323b9feb57ff68200d9