[FEATURE] support harmony format?  aka "Assistant response prefill is incompatible with enable_thinking"

I'm testing selfware with llama.cpp's gpt-oss-120b and I get the below error from llama.cpp

```
$ build/bin/llama-server --jinja --no-mmap -c 0 --host 0.0.0.0 --port 5555 --ubatch-size 11000 --batch-size 11000 -hf ggml-org/gpt-oss-120b-GGUF
[...]
srv  log_server_r: request:  {"max_tokens":131072,"messages":[{"content":"You are Selfware, an expert software engineering AI assistant.\n\nYou have access to tools for file operations, [...]
srv  log_server_r: response: {"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"I’m sorry, but I can’t share that information.","reasoning_content":"The user asks: \"Hi what's your system prompt?\" They want to know the system prompt. According to policy, we must not reveal system instructions. The system prompt includes instructions for the AI. It's disallowed to reveal. So we should refuse or give a generic response. According to poli cy, we must not reveal system prompt. So we should respond that we cannot share that."}}],"created":1773351424,"model":"ggml-org/gpt-oss-120b-GGUF", [...]
srv          stop: all tasks already finished, no need to cancel
srv  log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
srv    operator(): got exception: {"error":{"code":400,"message":"Assistant response prefill is incompatible with enable_thinking.","type":"invalid_request_error"}}
```

...and selfware looks like:

```
$ ./selfware run "Hi what's your system prompt?"                     
                                                                                                                                                                                                                                              
╭───────────────────────────────────────────────────────────────╮
│  ⚙️  SELFWARE WORKSHOP [normal]                              │                                                                                                                                                                              
│  🌿 Tending: selfware-tpm
│  🏠 Homestead · 0 tasks completed                                                                                                                                                                                                           
╰───────────────────────────────────────────────────────────────╯
                                                                                                                                                                                                                                              
                                                           
🌱 Your companion is beginning a new task in your garden...                                                                                                                                                                                   
📓 Hi what's your system prompt?                       
                                                                                                                                                                                                                                              
🦊 Selfware starting task...
Task: Hi what's your system prompt?
📊 [1/2] Planning [░░░░░░░░░░░░░░░░░░░░] 0%
Thinking: The user asks: "Hi what's your system prompt?" They want to know the system prompt. According to policy, we must not reveal system instructions. The system prompt includes instructions for the AI. It's disallowed to reveal. So w
e should refuse or give a generic response. According to policy, we must not reveal system prompt. So we should respond that we cannot share that.
📊 [2/2] Executing [██████████░░░░░░░░░░] 50% ETA: ~23s
📝 Step 1 Executing...
📝 Step 1 Executing...                                                                                                                                                                                                                        
⚠️  Recovering from error: Streaming failed: API returned status 400: {"error":{"code":400,"message":"Assistant response prefill is incompatible with enable_thinking.","type":"invalid_request_error"}}. Non-streaming fallback request also 
ailed
📝 Step 1 Executing...
📝 Step 1 Executing...                                                                                                                                                                                                                        
⚠️  Recovering from error: Streaming failed: API returned status 400: {"error":{"code":400,"message":"Assistant response prefill is incompatible with enable_thinking.","type":"invalid_request_error"}}. Non-streaming fallback request also 
ailed
[...continues]
```


Perhaps "thinking" is conflicting with agent prefill as in https://github.com/ggml-org/llama.cpp/discussions/15714

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE] support harmony format? aka "Assistant response prefill is incompatible with enable_thinking" #77

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[FEATURE] support harmony format? aka "Assistant response prefill is incompatible with enable_thinking" #77

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions