Conversation
Signed-off-by: oliver könig <okoenig@nvidia.com>
Force-pushed d136f6d to 6130f07
/claude review
```python
choice_data["kv_cache_staleness"] = result["kv_cache_staleness"]
choice_data["num_evictions"] = sum(
    1 for e in result["events"] if e.get("type") == "EVICT"
)
```
Code Review
Bug: response shape change for previously-optional fields
The old code only included policy_staleness, kv_cache_staleness, and num_evictions in the response when they had meaningful values. The new code unconditionally includes them:

- `policy_staleness`/`kv_cache_staleness` will be `None` for any request where those tensors were never populated (they're `Optional[torch.Tensor] = None` on `DynamicInferenceRequest`).
- `num_evictions` will be `0` even when there were no eviction events.
This is a breaking change to the response schema — consumers that treat the absence of these keys as "not applicable" will now always see them. If always-present fields are intentional, that's fine, but it should be explicit and clients should be updated.
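If preserving the old contract is the goal, one option is to attach these fields only when they carry meaningful values. A minimal sketch, assuming the `choice_data`/`result` dict shapes from the snippet above (the helper name `attach_telemetry` is hypothetical):

```python
# Hypothetical helper: only attach optional telemetry fields when they have
# meaningful values, preserving the "absent means not applicable" contract.
def attach_telemetry(choice_data: dict, result: dict) -> dict:
    if result.get("policy_staleness") is not None:
        choice_data["policy_staleness"] = result["policy_staleness"]
    if result.get("kv_cache_staleness") is not None:
        choice_data["kv_cache_staleness"] = result["kv_cache_staleness"]
    # Only report evictions when at least one EVICT event occurred.
    num_evictions = sum(
        1 for e in result.get("events", []) if e.get("type") == "EVICT"
    )
    if num_evictions:
        choice_data["num_evictions"] = num_evictions
    return choice_data
```

With this shape, consumers that key off the absence of these fields keep working, while populated requests still see the new values.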
Potential TypeError from chat_template_kwargs overlap
```python
prompt_tokens = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    tools=req.get("tools", None),
    **req.get("chat_template_kwargs", {}),
)
```

If a caller passes `tokenize`, `add_generation_prompt`, or `tools` inside `chat_template_kwargs`, Python raises `TypeError: got multiple values for keyword argument`. There is no validation guarding against this; consider an explicit check, or stripping overlapping keys with a warning.
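The "strip with a warning" option could look like the following sketch. The reserved-key set mirrors the keyword arguments the endpoint passes explicitly; the helper name `safe_chat_template_kwargs` is an assumption, not existing code:

```python
import warnings

# Keys the endpoint already passes explicitly to apply_chat_template;
# user-supplied duplicates would cause a TypeError at call time.
RESERVED_KEYS = {"tokenize", "add_generation_prompt", "tools"}

def safe_chat_template_kwargs(kwargs: dict) -> dict:
    """Drop reserved keys from user-supplied chat_template_kwargs, warning once."""
    overlap = RESERVED_KEYS & kwargs.keys()
    if overlap:
        warnings.warn(
            f"Ignoring reserved chat_template_kwargs keys: {sorted(overlap)}"
        )
    return {k: v for k, v in kwargs.items() if k not in RESERVED_KEYS}
```

The endpoint would then splat `**safe_chat_template_kwargs(req.get("chat_template_kwargs", {}))` instead of the raw dict, turning a hard 500 into a logged warning.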
Missing test coverage
No tests cover the chat_completions endpoint response shape for these changed fields. The existing tests in tests/unit_tests/inference/engines/test_dynamic_engine.py exercise DynamicInferenceRequest directly, not the endpoint. Missing coverage:
- `policy_staleness`/`kv_cache_staleness` always present (possibly `None`) in the response
- `num_evictions` always `0` when there are no EVICT events
- `chat_template_kwargs` forwarding, including the overlap TypeError case
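The overlap case in particular is cheap to test without the real endpoint or tokenizer. A hedged sketch (the stub tokenizer and `build_prompt` wrapper are stand-ins for the actual endpoint code, which is not shown here):

```python
import pytest

class StubTokenizer:
    """Minimal stand-in for the HF tokenizer used by the endpoint."""
    def apply_chat_template(self, messages, tokenize, add_generation_prompt,
                            tools=None, **kwargs):
        return [0, 1, 2]

def build_prompt(tokenizer, req):
    # Mirrors the endpoint call quoted in the review above.
    return tokenizer.apply_chat_template(
        req["messages"],
        tokenize=True,
        add_generation_prompt=True,
        tools=req.get("tools", None),
        **req.get("chat_template_kwargs", {}),
    )

def test_overlapping_chat_template_kwargs_raise_typeerror():
    # Duplicating an explicitly-passed keyword must raise at the call site.
    req = {"messages": [], "chat_template_kwargs": {"tokenize": False}}
    with pytest.raises(TypeError):
        build_prompt(StubTokenizer(), req)
```

Analogous tests over the response dict would pin down the always-present-field behavior, whichever way the schema question is resolved.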
🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/22685751754

🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/22685854327

🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/22692993456

🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/22695852416
What does this PR do?
Contribution process
```mermaid
flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]
```

Pre-checks

Core 0.8)

Code review
The following process is enforced via the CODEOWNERS file for changes into `megatron/core`. For changes outside of `megatron/core`, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch
Feel free to message or tag @mcore-oncall to help accelerate your merge into `main`. The less complex your PR is, the faster it will be approved and merged!
(Step 1): Add PR label

Add the `Expert Review` label when your PR is ready for review.

(Step 2): Collect the expert reviewers' reviews

Final Review might get declined if these requirements are not fulfilled.

(Step 3): Final Review

Add the `Final Review` label.

(Optional Step 4): Cherry-pick into release branch
If this PR also needs to be merged into `core_r*` release branches, after this PR has been merged, select `Cherry-pick` to open a new PR into the release branch.

For MRs into `dev` branch
The proposed review process for the `dev` branch is under active discussion. MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR
Any member of `core-adlr` and `core-nemo` will be able to merge your PR.