-
Notifications
You must be signed in to change notification settings - Fork 31
Add ReasoningFormat detection and automatic polyfills #89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
docker run --rm \
-v "$PWD":/src:ro \
-v "$PWD/build-docker":/src/build \
-w /src \
"$(echo "
FROM ghcr.io/astral-sh/uv:debian-slim
RUN apt-get update && apt-get install -y build-essential libcurl4-openssl-dev cmake clang-tidy
" | docker build . -q -f - )" \
bash -c "
cmake -B build -DCMAKE_BUILD_TYPE=Debug -DMINJA_SANITIZER=address && \
cmake --build build -j --config Debug && \
ctest --test-dir build -j -C Debug --output-on-failure
"
Fixes #4 - Fix parsing of values (nested method calls on function calls, e.g. `foo(x).bar(y)`) - Fix tool call capability detection - Tolerate `ensure_ascii` arg in `tojson` with support in Python jinja2 testing harness (supersedes google#84 - thanks @cnaples79 - & google#69 - thanks @rouseabout ),
Minimax has a different format for tools, so need this one more case. --------- Co-authored-by: Olivier Chafik <olivier.chafik@gmail.com>
Used among others in SmolVLM template Edit: Noticed that the `capitalize` function is actually not working correctly, added fix. Fixes ggml-org/llama.cpp#17871
the chat template in unsloth/Qwen3-Next-80B-A3B-Thinking-GGUF uses `|
first`
```
{%- set reasoning_content = ((content.split('</think>')|first).rstrip('\n').split('<think>')|last).lstrip('\n') %} {%- set content = (content.split('</think>')|last).lstrip('\n') %}
```
Co-authored-by: zhaobin <zhaobin@icbench.com>
Co-authored-by: Olivier Chafik <olivier.chafik@gmail.com>
## Summary This PR fixes multiple CI issues to get all builds passing on Windows, macOS, and Linux. ## Changes ### Workflow Fixes - **Branch trigger**: Changed from `master` to `main` - **Sanitizer exclusions**: Added exclusions for MSVC ARM64 builds (address/thread/undefined sanitizers not supported) ### Build Fixes - **Disabled clang-tidy for address sanitizer builds**: Avoids GCC `-Wno-maybe-uninitialized` flag incompatibility with clang-tidy - **Disabled cppcheck on Windows**: Fixes `std.cfg` not found error - **Added `-Wa,-mbig-obj` for MinGW Debug builds**: Fixes COFF section limit exceeded error (>65535 sections) ### Python/Encoding Fixes - **Added `PYTHONIOENCODING=utf-8`** to Configure and Test steps for Windows Unicode support - **Added `encoding='utf-8'`** to all file operations in `fetch_templates_and_goldens.py` - **Added `newline='\n'`** to force Unix line endings in generated files ### Test Fixes - **Normalize actual template output**: Apply `normalize_newlines()` to actual output in tests - **Windows blank line workaround**: Added `collapse_blank_lines()` for Windows due to a known issue where C++ minja outputs fewer newlines than Python Jinja2 (tracked in #16) ## Related Issues - #16 - Windows: C++ minja outputs fewer newlines than Python Jinja2 ## Test Plan - [x] All 28 CI jobs pass (Windows, macOS, Linux with various sanitizers and build types) 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
## Summary Implements support for DeepSeek V3.2's DSML (Domain Specific Markup Language) format, superseeds #11 (cc/ @hksdpc255) DeepSeek V3.2 doesn't provide a Jinja template but uses a custom Python encoding with DSML format: ```xml <|DSML|parameter name="key" string="true">value</|DSML|parameter> ``` ## Changes - **Simplified argument needle detection**: Changed from specific patterns (`"argument_needle":`, `="argument_needle"`) to broader `"argument_needle"` pattern which matches both JSON keys and DSML attribute values - **Local .jinja file support**: Fetch script now handles local `.jinja` files in MODEL_IDS (for synthetic test templates) - **Synthetic template**: Added `synthetic-deepseek-v3.2-dsml.jinja` replicating V3.2's Python encoding logic (from `encoding_dsv32.py`) - **Integrated testing**: Added synthetic template to MODEL_IDS, generates 3 test cases (simple, system, tool_use) ## Test plan - [x] All 248 tests pass - [x] Capability detection correctly identifies DSML format (`supports_tool_calls: true`, `requires_object_arguments: true`) - [x] Synthetic template tests pass for all contexts Closes #11 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Fixes #16 Looks like `std::regex_replace()` does not respect anchors, at least not in Windows. **Minimal reproducing example (Microsoft (R) C/C++ Optimizing Compiler Version 19.44.35221 for x64)** ```cpp #include <iostream> #include <regex> int main() { auto text = "\nthis contains\n\nmultiple\nline\n\nbreaks\n\n"; std::cout << "== Leading ==\n"; auto bad = std::regex_replace(text, std::regex(R"(^\s)"), ""); std::cout << "Bad: " << bad << "\n"; std::cout << "==\n"; std::string good = text; good.erase(0, good.find_first_not_of(" \t\r\n")); std::cout << "Good: " << good << "\n"; std::cout << "==\n"; std::cout << "== Trailing ==\n"; bad = std::regex_replace(text, std::regex(R"(\s$)"), ""); std::cout << "Bad: " << bad << "\n"; std::cout << "==\n"; good = text; auto pos = good.find_last_not_of(" \t\n\r\f\v"); good.resize(pos == std::string::npos ? 0 : pos + 1); std::cout << "Good: " << good << "\n"; std::cout << "==\n"; } ``` ``` == Leading == Bad: this contains multiple line breaks == Good: this contains multiple line breaks == == Trailing == Bad: this contains multiple line breaks == Good: this contains multiple line breaks == ``` Passes all the tests, excluding the gated templates I don't have. ``` $ ctest -R test-supported-template -j 24 ... 100% tests passed, 0 tests failed out of 220 Total Test time (real) = 32.38 sec The following tests did not run: 11 - test-supported-template-google-gemma-7b-it (Skipped) 12 - test-supported-template-CohereForAI-c4ai-command-r-plus (Skipped) 14 - test-supported-template-meta-llama-Llama-3.2-3B-Instruct (Skipped) 15 - test-supported-template-meta-llama-Llama-3.1-8B-Instruct (Skipped) 16 - test-supported-template-meta-llama-Meta-Llama-3-8B-Instruct (Skipped) 18 - test-supported-template-meta-llama-Llama-2-7b-chat-hf (Skipped) 54 - test-supported-template-CohereForAI-aya-expanse-8b (Skipped) 55 - test-supported-template-databricks-dbrx-instruct (Skipped) ```
- Add supports_thinking flag to detect reasoning_content field support - Add supports_disable_thinking, supports_reasoning_only, supports_reasoning_with_content flags - Add reasoning_requires_tools flag for templates that only reason with tools - Add tests for Qwen3-235B-A22B-Thinking-2507 and GLM-4.6 - Add model IDs: DeepSeek-V3.1, granite-3.3-2b-instruct, GLM-4.7 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…cture ThinkingPattern detection & polyfills: - Add polyfill logic to transform reasoning_content to template's native format - Support for THOUGHT_FIELD (MiniCPM3), THINKING_FIELD (GPT-OSS), TOOL_PLAN_FIELD (Command-R7B) - Add CONTENT_BLOCK patterns (Ministral/Apertus) with improved detection - Improved content block detection: reject stringified output by checking for structural markers - Add supports_clear_thinking detection for templates like GLM-4.7 Test infrastructure: - Add test metadata (_test_metadata) to context JSON files for template-independent validation - Add expected_strings/forbidden_strings checks to test-supported-template.cpp - Support conditional checks: expected_strings_if_supports_thinking, _system_role, _tool_calls, _tool_responses - Add ThinkingPattern capability tests to test-capabilities.cpp New reasoning test contexts: - reasoning_only.json - basic reasoning content - reasoning_multi_turn.json - multi-turn conversation with reasoning - reasoning_position_based.json - position-based visibility - reasoning_clear_thinking.json - clear_thinking flag behavior - reasoning_with_tools.json - reasoning with tool calls - reasoning_disabled.json - enable_thinking=false 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add the missing collapse_blank_lines function and regex include that was lost during the rebase conflict resolution. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The template is already in MODEL_IDS and gets downloaded to build/tests/ during cmake configure. No need to commit it separately. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
API renames for consistency: - ThinkingPattern → ReasoningFormat - REASONING_CONTENT_FIELD → REASONING_CONTENT - thinking_pattern → reasoning_format - supports_thinking → supports_reasoning - supports_clear_thinking → supports_reasoning_visibility New behavior detection probes (computed via template rendering): - supports_reasoning_without_content: Can emit reasoning with empty content - supports_reasoning_with_content: Can emit both reasoning and content - respects_enable_reasoning: Template honors enable_thinking=false Added tool_plan_reasoning.json test context for TOOL_PLAN_FIELD format. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The name directly matches the input flag (clear_thinking).
… tojson separators - Rename `requires_typed_content` to `requires_typed_content_blocks` for clarity - Rename ReasoningFormat enum values: - REASONING_CONTENT → REASONING_CONTENT_FIELD - CONTENT_BLOCK_THINKING → THINKING_CONTENT_BLOCK - CONTENT_BLOCK_THOUGHTS → THOUGHTS_CONTENT_BLOCK - Add `tojson(separators=...)` support (used by Kimi K2 template) - Add Kimi K2 (moonshotai/Kimi-K2-Instruct) to test suite - Add capabilities tests for reasoning_requires_tools behavior - Add stringification checks to test contexts 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
ochafik
added a commit
to ochafik/llama.cpp
that referenced
this pull request
Dec 30, 2025
- Rename requires_typed_content → requires_typed_content_blocks - Rename ReasoningFormat enum values for clarity - Add tojson(separators=...) support for Kimi K2 template - Sync from google/minja#89 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ochafik
added a commit
to ochafik/llama.cpp
that referenced
this pull request
Dec 30, 2025
- Rename requires_typed_content → requires_typed_content_blocks - Rename ReasoningFormat enum values for clarity - Add tojson(separators=...) support for Kimi K2 template - Sync from google/minja#89 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
This PR adds automatic detection of reasoning/thinking format support in chat templates, enabling automatic polyfills when needed.
Key Changes
ReasoningFormat enum with detection for 6 different formats:
REASONING_CONTENT_FIELD-message.reasoning_contentfield (Qwen3, GLM-4.6/4.7)THINKING_CONTENT_BLOCK-message.content[].type == "thinking"(Ministral, DeepSeek-R1)THOUGHTS_CONTENT_BLOCK-message.content[].type == "thoughts"(Apertus, Kimi K2)THOUGHT_FIELD-message.thoughtfield (MiniCPM3)TOOL_PLAN_FIELD-message.tool_planfield (Command-R7B)THINKING_FIELD-message.thinkingfield (GPT-OSS-120B)Automatic polyfills: When a template supports reasoning but uses a non-canonical format, the polyfill system automatically converts
reasoning_contentto the template's native formatCapability detection flags:
supports_reasoning- Template supports some form of reasoningreasoning_requires_tools- Reasoning only works with tool_calls (Command-R7B, TOOL_PLAN_FIELD)supports_reasoning_without_content/supports_reasoning_with_contentrespects_enable_reasoning- Template responds toenable_thinking=falsesupports_clear_thinking- GLM-4.7's reasoning visibility controlrequires_typed_content_blocks- Template expects content as[{type: "text", text: ...}]New model support: Added Kimi K2 (
moonshotai/Kimi-K2-Instruct) withTHOUGHTS_CONTENT_BLOCKformattojson separators support: Added
tojson(separators=(',', ':'))for compact JSON output (used by Kimi K2)llama.cpp Integration
This enables llama.cpp to:
reasoning_contentto native formatscommon_chat_msg_parser_oaicompat.cpp- Only needs to handlereasoning_contentfieldchat-peg-parser.cpp- Can simplify content block parsingTest plan
🤖 Generated with Claude Code