Add ReasoningFormat detection and automatic polyfills #89

ochafik · 2025-12-30T22:12:52Z

Summary

This PR adds automatic detection of reasoning/thinking format support in chat templates, enabling automatic polyfills when needed.

Key Changes

ReasoningFormat enum with detection for 6 different formats:
- REASONING_CONTENT_FIELD - message.reasoning_content field (Qwen3, GLM-4.6/4.7)
- THINKING_CONTENT_BLOCK - message.content[].type == "thinking" (Ministral, DeepSeek-R1)
- THOUGHTS_CONTENT_BLOCK - message.content[].type == "thoughts" (Apertus, Kimi K2)
- THOUGHT_FIELD - message.thought field (MiniCPM3)
- TOOL_PLAN_FIELD - message.tool_plan field (Command-R7B)
- THINKING_FIELD - message.thinking field (GPT-OSS-120B)
Automatic polyfills: When a template supports reasoning but uses a non-canonical format, the polyfill system automatically converts reasoning_content to the template's native format
Capability detection flags:
- supports_reasoning - Template supports some form of reasoning
- reasoning_requires_tools - Reasoning only works with tool_calls (Command-R7B, TOOL_PLAN_FIELD)
- supports_reasoning_without_content / supports_reasoning_with_content
- respects_enable_reasoning - Template responds to enable_thinking=false
- supports_clear_thinking - GLM-4.7's reasoning visibility control
- requires_typed_content_blocks - Template expects content as [{type: "text", text: ...}]
New model support: Added Kimi K2 (moonshotai/Kimi-K2-Instruct) with THOUGHTS_CONTENT_BLOCK format
tojson separators support: Added tojson(separators=(',', ':')) for compact JSON output (used by Kimi K2)

llama.cpp Integration

This enables llama.cpp to:

Automatically detect reasoning format from model templates
Apply polyfills to convert reasoning_content to native formats
Simplify parsers: With polyfills converting output to canonical format, the following parsers could be simplified:
- common_chat_msg_parser_oaicompat.cpp - Only needs to handle reasoning_content field
- chat-peg-parser.cpp - Can simplify content block parsing

Test plan

All 880 minja tests pass
All llama.cpp chat tests pass (test-chat, test-chat-template, test-chat-parser, test-chat-peg-parser)
Verified Kimi K2 template works with tool calling

🤖 Generated with Claude Code

docker run --rm \ -v "$PWD":/src:ro \ -v "$PWD/build-docker":/src/build \ -w /src \ "$(echo " FROM ghcr.io/astral-sh/uv:debian-slim RUN apt-get update && apt-get install -y build-essential libcurl4-openssl-dev cmake clang-tidy " | docker build . -q -f - )" \ bash -c " cmake -B build -DCMAKE_BUILD_TYPE=Debug -DMINJA_SANITIZER=address && \ cmake --build build -j --config Debug && \ ctest --test-dir build -j -C Debug --output-on-failure "

@cnaples79

Fixes #4 - Fix parsing of values (nested method calls on function calls, e.g. `foo(x).bar(y)`) - Fix tool call capability detection - Tolerate `ensure_ascii` arg in `tojson` with support in Python jinja2 testing harness (supersedes google#84 - thanks @cnaples79 - & google#69 - thanks @rouseabout ),

…ent)

Minimax has a different format for tools, so need this one more case. --------- Co-authored-by: Olivier Chafik <olivier.chafik@gmail.com>

Used among others in SmolVLM template Edit: Noticed that the `capitalize` function is actually not working correctly, added fix. Fixes ggml-org/llama.cpp#17871

the chat template in unsloth/Qwen3-Next-80B-A3B-Thinking-GGUF uses `| first` ``` {%- set reasoning_content = ((content.split('</think>')|first).rstrip('\n').split('<think>')|last).lstrip('\n') %} {%- set content = (content.split('</think>')|last).lstrip('\n') %} ``` Co-authored-by: zhaobin <zhaobin@icbench.com> Co-authored-by: Olivier Chafik <olivier.chafik@gmail.com>

## Summary This PR fixes multiple CI issues to get all builds passing on Windows, macOS, and Linux. ## Changes ### Workflow Fixes - **Branch trigger**: Changed from `master` to `main` - **Sanitizer exclusions**: Added exclusions for MSVC ARM64 builds (address/thread/undefined sanitizers not supported) ### Build Fixes - **Disabled clang-tidy for address sanitizer builds**: Avoids GCC `-Wno-maybe-uninitialized` flag incompatibility with clang-tidy - **Disabled cppcheck on Windows**: Fixes `std.cfg` not found error - **Added `-Wa,-mbig-obj` for MinGW Debug builds**: Fixes COFF section limit exceeded error (>65535 sections) ### Python/Encoding Fixes - **Added `PYTHONIOENCODING=utf-8`** to Configure and Test steps for Windows Unicode support - **Added `encoding='utf-8'`** to all file operations in `fetch_templates_and_goldens.py` - **Added `newline='\n'`** to force Unix line endings in generated files ### Test Fixes - **Normalize actual template output**: Apply `normalize_newlines()` to actual output in tests - **Windows blank line workaround**: Added `collapse_blank_lines()` for Windows due to a known issue where C++ minja outputs fewer newlines than Python Jinja2 (tracked in #16) ## Related Issues - #16 - Windows: C++ minja outputs fewer newlines than Python Jinja2 ## Test Plan - [x] All 28 CI jobs pass (Windows, macOS, Linux with various sanitizers and build types) 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

@hksdpc255

## Summary Implements support for DeepSeek V3.2's DSML (Domain Specific Markup Language) format, superseeds #11 (cc/ @hksdpc255) DeepSeek V3.2 doesn't provide a Jinja template but uses a custom Python encoding with DSML format: ```xml <｜DSML｜parameter name="key" string="true">value</｜DSML｜parameter> ``` ## Changes - **Simplified argument needle detection**: Changed from specific patterns (`"argument_needle":`, `="argument_needle"`) to broader `"argument_needle"` pattern which matches both JSON keys and DSML attribute values - **Local .jinja file support**: Fetch script now handles local `.jinja` files in MODEL_IDS (for synthetic test templates) - **Synthetic template**: Added `synthetic-deepseek-v3.2-dsml.jinja` replicating V3.2's Python encoding logic (from `encoding_dsv32.py`) - **Integrated testing**: Added synthetic template to MODEL_IDS, generates 3 test cases (simple, system, tool_use) ## Test plan - [x] All 248 tests pass - [x] Capability detection correctly identifies DSML format (`supports_tool_calls: true`, `requires_object_arguments: true`) - [x] Synthetic template tests pass for all contexts Closes #11 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

Fixes #16 Looks like `std::regex_replace()` does not respect anchors, at least not in Windows. **Minimal reproducing example (Microsoft (R) C/C++ Optimizing Compiler Version 19.44.35221 for x64)** ```cpp #include <iostream> #include <regex> int main() { auto text = "\nthis contains\n\nmultiple\nline\n\nbreaks\n\n"; std::cout << "== Leading ==\n"; auto bad = std::regex_replace(text, std::regex(R"(^\s)"), ""); std::cout << "Bad: " << bad << "\n"; std::cout << "==\n"; std::string good = text; good.erase(0, good.find_first_not_of(" \t\r\n")); std::cout << "Good: " << good << "\n"; std::cout << "==\n"; std::cout << "== Trailing ==\n"; bad = std::regex_replace(text, std::regex(R"(\s$)"), ""); std::cout << "Bad: " << bad << "\n"; std::cout << "==\n"; good = text; auto pos = good.find_last_not_of(" \t\n\r\f\v"); good.resize(pos == std::string::npos ? 0 : pos + 1); std::cout << "Good: " << good << "\n"; std::cout << "==\n"; } ``` ``` == Leading == Bad: this contains multiple line breaks == Good: this contains multiple line breaks == == Trailing == Bad: this contains multiple line breaks == Good: this contains multiple line breaks == ``` Passes all the tests, excluding the gated templates I don't have. ``` $ ctest -R test-supported-template -j 24 ... 100% tests passed, 0 tests failed out of 220 Total Test time (real) = 32.38 sec The following tests did not run: 11 - test-supported-template-google-gemma-7b-it (Skipped) 12 - test-supported-template-CohereForAI-c4ai-command-r-plus (Skipped) 14 - test-supported-template-meta-llama-Llama-3.2-3B-Instruct (Skipped) 15 - test-supported-template-meta-llama-Llama-3.1-8B-Instruct (Skipped) 16 - test-supported-template-meta-llama-Meta-Llama-3-8B-Instruct (Skipped) 18 - test-supported-template-meta-llama-Llama-2-7b-chat-hf (Skipped) 54 - test-supported-template-CohereForAI-aya-expanse-8b (Skipped) 55 - test-supported-template-databricks-dbrx-instruct (Skipped) ```

- Add supports_thinking flag to detect reasoning_content field support - Add supports_disable_thinking, supports_reasoning_only, supports_reasoning_with_content flags - Add reasoning_requires_tools flag for templates that only reason with tools - Add tests for Qwen3-235B-A22B-Thinking-2507 and GLM-4.6 - Add model IDs: DeepSeek-V3.1, granite-3.3-2b-instruct, GLM-4.7 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

…cture ThinkingPattern detection & polyfills: - Add polyfill logic to transform reasoning_content to template's native format - Support for THOUGHT_FIELD (MiniCPM3), THINKING_FIELD (GPT-OSS), TOOL_PLAN_FIELD (Command-R7B) - Add CONTENT_BLOCK patterns (Ministral/Apertus) with improved detection - Improved content block detection: reject stringified output by checking for structural markers - Add supports_clear_thinking detection for templates like GLM-4.7 Test infrastructure: - Add test metadata (_test_metadata) to context JSON files for template-independent validation - Add expected_strings/forbidden_strings checks to test-supported-template.cpp - Support conditional checks: expected_strings_if_supports_thinking, _system_role, _tool_calls, _tool_responses - Add ThinkingPattern capability tests to test-capabilities.cpp New reasoning test contexts: - reasoning_only.json - basic reasoning content - reasoning_multi_turn.json - multi-turn conversation with reasoning - reasoning_position_based.json - position-based visibility - reasoning_clear_thinking.json - clear_thinking flag behavior - reasoning_with_tools.json - reasoning with tool calls - reasoning_disabled.json - enable_thinking=false 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add the missing collapse_blank_lines function and regex include that was lost during the rebase conflict resolution. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The template is already in MODEL_IDS and gets downloaded to build/tests/ during cmake configure. No need to commit it separately. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

API renames for consistency: - ThinkingPattern → ReasoningFormat - REASONING_CONTENT_FIELD → REASONING_CONTENT - thinking_pattern → reasoning_format - supports_thinking → supports_reasoning - supports_clear_thinking → supports_reasoning_visibility New behavior detection probes (computed via template rendering): - supports_reasoning_without_content: Can emit reasoning with empty content - supports_reasoning_with_content: Can emit both reasoning and content - respects_enable_reasoning: Template honors enable_thinking=false Added tool_plan_reasoning.json test context for TOOL_PLAN_FIELD format. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The name directly matches the input flag (clear_thinking).

… tojson separators - Rename `requires_typed_content` to `requires_typed_content_blocks` for clarity - Rename ReasoningFormat enum values: - REASONING_CONTENT → REASONING_CONTENT_FIELD - CONTENT_BLOCK_THINKING → THINKING_CONTENT_BLOCK - CONTENT_BLOCK_THOUGHTS → THOUGHTS_CONTENT_BLOCK - Add `tojson(separators=...)` support (used by Kimi K2 template) - Add Kimi K2 (moonshotai/Kimi-K2-Instruct) to test suite - Add capabilities tests for reasoning_requires_tools behavior - Add stringification checks to test contexts 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

google-cla · 2025-12-30T22:13:01Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

- Rename requires_typed_content → requires_typed_content_blocks - Rename ReasoningFormat enum values for clarity - Add tojson(separators=...) support for Kimi K2 template - Sync from google/minja#89 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

ochafik and others added 26 commits November 2, 2025 17:52

build & test w/ sanitizers

e75dff5

fix bad patch

03a6c98

Add tiny reserves in value ctor (+ use emplace to avoid some copies)

2a42ba8

drop unused enable_shared_from_this

844eae8

Update minja.hpp

dc245f5

Update minja.hpp

6641257

Update build.yml

bd30364

Add missing capabilities tests (tool call id & requires non null cont…

0c55c36

…ent)

fix sanitizer exclusion in github workflow matrix

41f9022

Update README.md

4891676

Support MiniMax tool call format (#7)

c755506

Minimax has a different format for tools, so need this one more case. --------- Co-authored-by: Olivier Chafik <olivier.chafik@gmail.com>

Dedupe test templates (#8)

911b645

Add capitalize filter and fix method (#12)

9744121

Used among others in SmolVLM template Edit: Noticed that the `capitalize` function is actually not working correctly, added fix. Fixes ggml-org/llama.cpp#17871

Revert supports_reasoning_visibility → supports_clear_thinking

1e39bb3

The name directly matches the input flag (clear_thinking).

ochafik closed this Dec 30, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add ReasoningFormat detection and automatic polyfills #89

Add ReasoningFormat detection and automatic polyfills #89

Uh oh!

ochafik commented Dec 30, 2025

Uh oh!

google-cla bot commented Dec 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Add ReasoningFormat detection and automatic polyfills #89

Add ReasoningFormat detection and automatic polyfills #89

Uh oh!

Conversation

ochafik commented Dec 30, 2025

Summary

Key Changes

llama.cpp Integration

Test plan

Uh oh!

google-cla bot commented Dec 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants