@ochafik ochafik commented Dec 30, 2025

Summary

This PR adds automatic detection of reasoning/thinking format support in chat templates, enabling automatic polyfills when needed.

Key Changes

  • ReasoningFormat enum with detection for 6 different formats:

    • REASONING_CONTENT_FIELD - message.reasoning_content field (Qwen3, GLM-4.6/4.7)
    • THINKING_CONTENT_BLOCK - message.content[].type == "thinking" (Ministral, DeepSeek-R1)
    • THOUGHTS_CONTENT_BLOCK - message.content[].type == "thoughts" (Apertus, Kimi K2)
    • THOUGHT_FIELD - message.thought field (MiniCPM3)
    • TOOL_PLAN_FIELD - message.tool_plan field (Command-R7B)
    • THINKING_FIELD - message.thinking field (GPT-OSS-120B)
  • Automatic polyfills: When a template supports reasoning but uses a non-canonical format, the polyfill system automatically converts reasoning_content to the template's native format

  • Capability detection flags:

    • supports_reasoning - Template supports some form of reasoning
    • reasoning_requires_tools - Reasoning only works with tool_calls (Command-R7B, TOOL_PLAN_FIELD)
    • supports_reasoning_without_content / supports_reasoning_with_content
    • respects_enable_reasoning - Template responds to enable_thinking=false
    • supports_clear_thinking - GLM-4.7's reasoning visibility control
    • requires_typed_content_blocks - Template expects content as [{type: "text", text: ...}]
  • New model support: Added Kimi K2 (moonshotai/Kimi-K2-Instruct) with THOUGHTS_CONTENT_BLOCK format

  • tojson separators support: Added tojson(separators=(',', ':')) for compact JSON output (used by Kimi K2)
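
As a rough illustration of the field-based polyfills described above, here is a minimal sketch. It is not the code from this PR: the flat `Message` map and the `polyfill_reasoning` helper are hypothetical, and the content-block formats are left untouched because they need a structured `content` array that this simplified model cannot represent.

```cpp
#include <map>
#include <string>

// Mirrors the six formats listed above; the real enum may differ in detail.
enum class ReasoningFormat {
    REASONING_CONTENT_FIELD,
    THINKING_CONTENT_BLOCK,
    THOUGHTS_CONTENT_BLOCK,
    THOUGHT_FIELD,
    TOOL_PLAN_FIELD,
    THINKING_FIELD,
};

// Hypothetical flat message: field name -> string value.
using Message = std::map<std::string, std::string>;

// Moves the canonical reasoning_content field to the template's native field.
// Canonical and content-block formats are returned unchanged in this sketch.
Message polyfill_reasoning(Message msg, ReasoningFormat format) {
    auto it = msg.find("reasoning_content");
    if (it == msg.end()) return msg;  // nothing to convert

    const char* native_field = nullptr;
    switch (format) {
        case ReasoningFormat::THOUGHT_FIELD:   native_field = "thought";   break;
        case ReasoningFormat::TOOL_PLAN_FIELD: native_field = "tool_plan"; break;
        case ReasoningFormat::THINKING_FIELD:  native_field = "thinking";  break;
        default: return msg;
    }
    std::string reasoning = it->second;
    msg.erase(it);
    msg[native_field] = reasoning;
    return msg;
}
```

Calling `polyfill_reasoning(msg, ReasoningFormat::THOUGHT_FIELD)` on a message with a `reasoning_content` entry returns a copy where that text lives under `thought` instead.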

llama.cpp Integration

This enables llama.cpp to:

  1. Automatically detect reasoning format from model templates
  2. Apply polyfills to convert reasoning_content to native formats
  3. Simplify parsers: With polyfills converting output to canonical format, the following parsers could be simplified:
    • common_chat_msg_parser_oaicompat.cpp - Only needs to handle reasoning_content field
    • chat-peg-parser.cpp - Can simplify content block parsing

Test plan

  • All 880 minja tests pass
  • All llama.cpp chat tests pass (test-chat, test-chat-template, test-chat-parser, test-chat-peg-parser)
  • Verified Kimi K2 template works with tool calling

🤖 Generated with Claude Code

ochafik and others added 26 commits November 2, 2025 17:52
docker run --rm \
  -v "$PWD":/src:ro \
  -v "$PWD/build-docker":/src/build \
  -w /src \
  "$(echo "
    FROM ghcr.io/astral-sh/uv:debian-slim
    RUN apt-get update && apt-get install -y build-essential libcurl4-openssl-dev cmake clang-tidy
  " | docker build . -q -f - )" \
  bash -c "
    cmake -B build -DCMAKE_BUILD_TYPE=Debug -DMINJA_SANITIZER=address && \
    cmake --build build -j --config Debug && \
    ctest --test-dir build -j -C Debug --output-on-failure
  "
Fixes #4

- Fix parsing of values (nested method calls on function calls, e.g.
`foo(x).bar(y)`)
- Fix tool call capability detection
- Tolerate the `ensure_ascii` argument in `tojson`, with matching support
in the Python jinja2 testing harness (supersedes google#84, thanks
@cnaples79, and google#69, thanks @rouseabout)
MiniMax uses a different format for tools, so one more case is needed.

---------

Co-authored-by: Olivier Chafik <olivier.chafik@gmail.com>
Used among others in SmolVLM template

Edit: noticed that the `capitalize` function was not working correctly;
added a fix.

Fixes ggml-org/llama.cpp#17871
The chat template in unsloth/Qwen3-Next-80B-A3B-Thinking-GGUF uses
`| first`:

```
{%- set reasoning_content = ((content.split('</think>')|first).rstrip('\n').split('<think>')|last).lstrip('\n') %}
{%- set content = (content.split('</think>')|last).lstrip('\n') %}
```

Co-authored-by: zhaobin <zhaobin@icbench.com>
Co-authored-by: Olivier Chafik <olivier.chafik@gmail.com>
## Summary

This PR fixes multiple CI issues to get all builds passing on Windows,
macOS, and Linux.

## Changes

### Workflow Fixes
- **Branch trigger**: Changed from `master` to `main`
- **Sanitizer exclusions**: Added exclusions for MSVC ARM64 builds
(address/thread/undefined sanitizers not supported)

### Build Fixes
- **Disabled clang-tidy for address sanitizer builds**: Avoids GCC
`-Wno-maybe-uninitialized` flag incompatibility with clang-tidy
- **Disabled cppcheck on Windows**: Fixes `std.cfg` not found error
- **Added `-Wa,-mbig-obj` for MinGW Debug builds**: Fixes COFF section
limit exceeded error (>65535 sections)

### Python/Encoding Fixes
- **Added `PYTHONIOENCODING=utf-8`** to Configure and Test steps for
Windows Unicode support
- **Added `encoding='utf-8'`** to all file operations in
`fetch_templates_and_goldens.py`
- **Added `newline='\n'`** to force Unix line endings in generated files

### Test Fixes
- **Normalize actual template output**: Apply `normalize_newlines()` to
actual output in tests
- **Windows blank line workaround**: Added `collapse_blank_lines()` for
Windows due to a known issue where C++ minja outputs fewer newlines than
Python Jinja2 (tracked in #16)
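
For readers unfamiliar with the two helpers, here is a minimal sketch of what they might look like. Only the names `normalize_newlines()` and `collapse_blank_lines()` come from this PR; the bodies below are assumptions, and they use plain string scanning rather than whatever the test code actually does.

```cpp
#include <string>

// Converts "\r\n" and lone "\r" to "\n".
std::string normalize_newlines(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '\r') {
            out += '\n';
            if (i + 1 < s.size() && s[i + 1] == '\n') ++i;  // skip the LF of a CRLF pair
        } else {
            out += s[i];
        }
    }
    return out;
}

// Collapses every run of consecutive '\n' into a single '\n'.
std::string collapse_blank_lines(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        if (c == '\n' && !out.empty() && out.back() == '\n') continue;
        out += c;
    }
    return out;
}
```

Applying both to the expected and the actual output before comparing makes the test insensitive to line-ending style and to the number of blank lines, which also means it can no longer catch real blank-line regressions on Windows.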

## Related Issues
- #16 - Windows: C++ minja outputs fewer newlines than Python Jinja2

## Test Plan
- [x] All 28 CI jobs pass (Windows, macOS, Linux with various sanitizers
and build types)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
## Summary

Implements support for DeepSeek V3.2's DSML (Domain Specific Markup
Language) format, supersedes #11 (cc/ @hksdpc255)

DeepSeek V3.2 doesn't provide a Jinja template but uses a custom Python
encoding with DSML format:
```xml
<|DSML|parameter name="key" string="true">value</|DSML|parameter>
```

## Changes

- **Simplified argument needle detection**: Changed from specific
patterns (`"argument_needle":`, `="argument_needle"`) to broader
`"argument_needle"` pattern which matches both JSON keys and DSML
attribute values
- **Local .jinja file support**: Fetch script now handles local `.jinja`
files in MODEL_IDS (for synthetic test templates)
- **Synthetic template**: Added `synthetic-deepseek-v3.2-dsml.jinja`
replicating V3.2's Python encoding logic (from `encoding_dsv32.py`)
- **Integrated testing**: Added synthetic template to MODEL_IDS,
generates 3 test cases (simple, system, tool_use)
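
To illustrate why the broader pattern covers both encodings, here is a small sketch. The helper name `mentions_argument_needle` is hypothetical and the check is reduced to a plain substring search; the actual detection code may be structured differently.

```cpp
#include <string>

// Returns true if the rendered prompt contains the quoted needle, whether it
// appears as a JSON key ("argument_needle":) or as a DSML attribute value
// (name="argument_needle").
bool mentions_argument_needle(const std::string& rendered) {
    return rendered.find("\"argument_needle\"") != std::string::npos;
}
```

Both `{"argument_needle": 1}` and `<|DSML|parameter name="argument_needle" string="true">` contain the quoted token, so a single search handles the two formats. An unquoted occurrence does not match.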

## Test plan

- [x] All 248 tests pass
- [x] Capability detection correctly identifies DSML format
(`supports_tool_calls: true`, `requires_object_arguments: true`)
- [x] Synthetic template tests pass for all contexts

Closes #11

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Fixes #16

Looks like `std::regex_replace()` treats `^` and `$` as matching at every
line boundary rather than only at the start and end of the input, at
least with MSVC on Windows.

**Minimal reproducing example (Microsoft (R) C/C++ Optimizing Compiler
Version 19.44.35221 for x64)**
```cpp
#include <iostream>
#include <regex>
#include <string>

int main() {
    auto text = "\nthis contains\n\nmultiple\nline\n\nbreaks\n\n";

    std::cout << "== Leading ==\n";
    auto bad = std::regex_replace(text, std::regex(R"(^\s)"), "");
    std::cout << "Bad: " << bad << "\n";
    std::cout << "==\n";

    std::string good = text;
    good.erase(0, good.find_first_not_of(" \t\r\n"));
    std::cout << "Good: " << good << "\n";
    std::cout << "==\n";

    std::cout << "== Trailing ==\n";
    bad = std::regex_replace(text, std::regex(R"(\s$)"), "");
    std::cout << "Bad: " << bad << "\n";
    std::cout << "==\n";

    good = text;
    auto pos = good.find_last_not_of(" \t\n\r\f\v");
    good.resize(pos == std::string::npos ? 0 : pos + 1);
    std::cout << "Good: " << good << "\n";
    std::cout << "==\n";
}
```
```
== Leading ==
Bad: this contains
multiple
line
breaks

==
Good: this contains

multiple
line

breaks


==
== Trailing ==
Bad: 
this contains
multiple
line
breaks
==
Good: 
this contains

multiple
line

breaks
==
```

Passes all the tests, excluding the gated templates I don't have access to.

```
$ ctest -R test-supported-template -j 24
...
100% tests passed, 0 tests failed out of 220

Total Test time (real) =  32.38 sec

The following tests did not run:
         11 - test-supported-template-google-gemma-7b-it (Skipped)
         12 - test-supported-template-CohereForAI-c4ai-command-r-plus (Skipped)
         14 - test-supported-template-meta-llama-Llama-3.2-3B-Instruct (Skipped)
         15 - test-supported-template-meta-llama-Llama-3.1-8B-Instruct (Skipped)
         16 - test-supported-template-meta-llama-Meta-Llama-3-8B-Instruct (Skipped)
         18 - test-supported-template-meta-llama-Llama-2-7b-chat-hf (Skipped)
         54 - test-supported-template-CohereForAI-aya-expanse-8b (Skipped)
         55 - test-supported-template-databricks-dbrx-instruct (Skipped)
```
- Add supports_thinking flag to detect reasoning_content field support
- Add supports_disable_thinking, supports_reasoning_only, supports_reasoning_with_content flags
- Add reasoning_requires_tools flag for templates that only reason with tools
- Add tests for Qwen3-235B-A22B-Thinking-2507 and GLM-4.6
- Add model IDs: DeepSeek-V3.1, granite-3.3-2b-instruct, GLM-4.7

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…cture

ThinkingPattern detection & polyfills:
- Add polyfill logic to transform reasoning_content to template's native format
- Support for THOUGHT_FIELD (MiniCPM3), THINKING_FIELD (GPT-OSS), TOOL_PLAN_FIELD (Command-R7B)
- Add CONTENT_BLOCK patterns (Ministral/Apertus) with improved detection
- Improved content block detection: reject stringified output by checking for structural markers
- Add supports_clear_thinking detection for templates like GLM-4.7

Test infrastructure:
- Add test metadata (_test_metadata) to context JSON files for template-independent validation
- Add expected_strings/forbidden_strings checks to test-supported-template.cpp
- Support conditional checks: expected_strings_if_supports_thinking, _system_role, _tool_calls, _tool_responses
- Add ThinkingPattern capability tests to test-capabilities.cpp
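
A minimal sketch of how the expected_strings/forbidden_strings checks could work is shown below. The `TestMetadata` struct and `check_output` function are hypothetical stand-ins, not the code in test-supported-template.cpp, and the conditional variants (for example `expected_strings_if_supports_thinking`) are left out.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory shape of the _test_metadata entry.
struct TestMetadata {
    std::vector<std::string> expected_strings;
    std::vector<std::string> forbidden_strings;
};

// Returns the list of violations; an empty list means the output passes.
std::vector<std::string> check_output(const std::string& rendered,
                                      const TestMetadata& meta) {
    std::vector<std::string> failures;
    for (const auto& s : meta.expected_strings) {
        if (rendered.find(s) == std::string::npos) failures.push_back("missing: " + s);
    }
    for (const auto& s : meta.forbidden_strings) {
        if (rendered.find(s) != std::string::npos) failures.push_back("forbidden: " + s);
    }
    return failures;
}
```

Because the check only looks for substrings in the rendered prompt, the same context file can validate any template without a per-template golden output.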

New reasoning test contexts:
- reasoning_only.json - basic reasoning content
- reasoning_multi_turn.json - multi-turn conversation with reasoning
- reasoning_position_based.json - position-based visibility
- reasoning_clear_thinking.json - clear_thinking flag behavior
- reasoning_with_tools.json - reasoning with tool calls
- reasoning_disabled.json - enable_thinking=false

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add the missing collapse_blank_lines function and regex include
that was lost during the rebase conflict resolution.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The template is already in MODEL_IDS and gets downloaded to build/tests/
during cmake configure. No need to commit it separately.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
API renames for consistency:
- ThinkingPattern → ReasoningFormat
- REASONING_CONTENT_FIELD → REASONING_CONTENT
- thinking_pattern → reasoning_format
- supports_thinking → supports_reasoning
- supports_clear_thinking → supports_reasoning_visibility

New behavior detection probes (computed via template rendering):
- supports_reasoning_without_content: Can emit reasoning with empty content
- supports_reasoning_with_content: Can emit both reasoning and content
- respects_enable_reasoning: Template honors enable_thinking=false
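
The idea of computing these flags by rendering can be sketched as follows. Everything here is illustrative: `ProbeMessage`, `Renderer`, and `probe` are hypothetical names, the renderer is a stand-in for applying the chat template, and the exact definition of `respects_enable_reasoning` is an assumption (reasoning appears when enabled and disappears when disabled).

```cpp
#include <functional>
#include <string>

// Hypothetical flattened assistant message used only for probing.
struct ProbeMessage {
    std::string reasoning_content;
    std::string content;
};

// Stand-in for rendering the template; the bool plays the role of enable_thinking.
using Renderer = std::function<std::string(const ProbeMessage&, bool)>;

struct ReasoningProbes {
    bool supports_reasoning_without_content;
    bool supports_reasoning_with_content;
    bool respects_enable_reasoning;
};

// Renders messages carrying a unique needle as reasoning and records
// whether the needle survives into the output.
ReasoningProbes probe(const Renderer& render) {
    const std::string needle = "<REASONING_NEEDLE>";
    auto has_needle = [&](const std::string& out) {
        return out.find(needle) != std::string::npos;
    };
    ReasoningProbes p{};
    p.supports_reasoning_without_content = has_needle(render(ProbeMessage{needle, ""}, true));
    p.supports_reasoning_with_content = has_needle(render(ProbeMessage{needle, "answer"}, true));
    // Assumed definition: visible when enabled, dropped when disabled.
    p.respects_enable_reasoning =
        p.supports_reasoning_with_content &&
        !has_needle(render(ProbeMessage{needle, "answer"}, false));
    return p;
}
```

Probing through rendering avoids guessing from the template source: a flag is set only if the template's actual output changes in the expected way.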

Added tool_plan_reasoning.json test context for TOOL_PLAN_FIELD format.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The name directly matches the input flag (clear_thinking).
… tojson separators

- Rename `requires_typed_content` to `requires_typed_content_blocks` for clarity
- Rename ReasoningFormat enum values:
  - REASONING_CONTENT → REASONING_CONTENT_FIELD
  - CONTENT_BLOCK_THINKING → THINKING_CONTENT_BLOCK
  - CONTENT_BLOCK_THOUGHTS → THOUGHTS_CONTENT_BLOCK
- Add `tojson(separators=...)` support (used by Kimi K2 template)
- Add Kimi K2 (moonshotai/Kimi-K2-Instruct) to test suite
- Add capabilities tests for reasoning_requires_tools behavior
- Add stringification checks to test contexts
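
To show what the `separators` argument changes, here is a small sketch modelled on Python's `json.dumps`, whose default separators are `(', ', ': ')`. The `to_json` function below is hypothetical and handles only a flat string-to-string object with no escaping; it is not minja's implementation.

```cpp
#include <map>
#include <string>

// Serializes a flat string->string object, joining items with item_sep and
// keys to values with key_sep. No escaping is done here; a real serializer
// must escape quotes, backslashes and control characters.
std::string to_json(const std::map<std::string, std::string>& obj,
                    const std::string& item_sep = ", ",
                    const std::string& key_sep = ": ") {
    std::string out = "{";
    bool first = true;
    for (const auto& kv : obj) {
        if (!first) out += item_sep;
        first = false;
        out += "\"" + kv.first + "\"" + key_sep + "\"" + kv.second + "\"";
    }
    return out + "}";
}
```

With the defaults the output has a space after each comma and colon; passing `","` and `":"` mirrors `tojson(separators=(',', ':'))` and removes that whitespace.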

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
google-cla bot commented Dec 30, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

ochafik added a commit to ochafik/llama.cpp that referenced this pull request Dec 30, 2025
- Rename requires_typed_content → requires_typed_content_blocks
- Rename ReasoningFormat enum values for clarity
- Add tojson(separators=...) support for Kimi K2 template
- Sync from google/minja#89

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ochafik ochafik closed this Dec 30, 2025