UI-TARS desktop should be able to:
- send normal computer-use requests to Cash's backend
- optionally force `baseline` or `grounded` mode for demo/debugging
- receive backend metadata about workflow status and feedback needs
- submit post-run feedback with the executed action trace
Frontend does not decide whether a workflow is new or seen. Frontend does not perform query-similarity matching. Backend owns routing, similarity, retrieval, and fallback behavior.
POST /v1/chat/completions

The desktop app sends standard OpenAI-compatible chat completion requests from Electron main.
Optional headers used by UI-TARS:
X-Session-Id: <generated-agent-session-id>
X-Force-Workflow-Mode: baseline | grounded

Notes:
- `X-Session-Id` is already sent by the app and can be used for backend tracing/correlation.
- `X-Force-Workflow-Mode` is only sent when the user enables the force-mode toggle in VLM Settings.
- If `X-Force-Workflow-Mode` is absent, backend should use normal auto-routing.
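
The header rules in the notes above can be sketched as a small helper. This is a minimal illustration, not code from the UI-TARS repository: the function name `buildWorkflowHeaders` and the `ForceMode` type are assumptions introduced here.

```typescript
// Hypothetical helper; names are illustrative and not taken from the UI-TARS codebase.
type ForceMode = "baseline" | "grounded";

function buildWorkflowHeaders(
  sessionId: string,
  forceMode?: ForceMode,
): Record<string, string> {
  // X-Session-Id is always sent so the backend can trace/correlate requests.
  const headers: Record<string, string> = { "X-Session-Id": sessionId };
  // The override is attached only when the force-mode toggle is enabled;
  // an absent header tells the backend to use normal auto-routing.
  if (forceMode !== undefined) {
    headers["X-Force-Workflow-Mode"] = forceMode;
  }
  return headers;
}
```

Leaving the header out entirely, rather than sending something like `auto`, keeps the default path identical to a plain OpenAI-compatible request.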
Example:
{
"model": "cuakg-default",
"messages": [
{
"role": "system",
"content": "<system prompt with action space definition>"
},
{
"role": "user",
"content": [
{ "type": "text", "text": "Find the Uber receipt in Downloads and create an expense report spreadsheet" },
{ "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }
]
}
],
"max_tokens": 65535,
"temperature": 0,
"top_p": 0.7
}

The exact messages array will include the running UI-TARS conversation and screenshots.
Return normal OpenAI-compatible chat completion fields, plus optional top-level backend_meta.
Example:
{
"id": "chatcmpl_123",
"object": "chat.completion",
"created": 1761062400,
"model": "cuakg-default",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Thought: I should open Downloads\nAction: click(start_box='(0.12, 0.55)')"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 1234,
"completion_tokens": 56,
"total_tokens": 1290
},
"backend_meta": {
"workflowStatus": "new_workflow",
"needsFeedback": true,
"retrievalConfidence": 0.41,
"runId": "run_abc123",
"effectiveMode": "grounded"
}
}

All `backend_meta` fields are optional, but these are the agreed names if provided:
{
"workflowStatus": "new_workflow | seen_workflow | seen_but_low_confidence",
"needsFeedback": true,
"retrievalConfidence": 0.82,
"runId": "run_abc123",
"effectiveMode": "auto | baseline | grounded"
}

Semantics:
- `workflowStatus`
  - `new_workflow`: backend believes this workflow is new
  - `seen_workflow`: backend recognizes this workflow confidently
  - `seen_but_low_confidence`: backend found a similar workflow but wants more signal
- `needsFeedback`
  - frontend uses this to make the post-run feedback prompt more prominent
- `retrievalConfidence`
  - optional numeric score for observability/debugging
- `runId`
  - backend-generated run identifier for correlating later feedback
- `effectiveMode`
  - backend's effective mode after applying auto-routing or force override
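
Because every `backend_meta` field is optional, a client has to read the object defensively. The sketch below shows one way to do that; the `BackendMeta` interface mirrors the agreed field names, while `parseBackendMeta` and the decision to silently drop unrecognized values are assumptions made for illustration.

```typescript
type WorkflowStatus = "new_workflow" | "seen_workflow" | "seen_but_low_confidence";
type EffectiveMode = "auto" | "baseline" | "grounded";

interface BackendMeta {
  workflowStatus?: WorkflowStatus;
  needsFeedback?: boolean;
  retrievalConfidence?: number;
  runId?: string;
  effectiveMode?: EffectiveMode;
}

const WORKFLOW_STATUSES: readonly string[] = [
  "new_workflow",
  "seen_workflow",
  "seen_but_low_confidence",
];
const EFFECTIVE_MODES: readonly string[] = ["auto", "baseline", "grounded"];

// Hypothetical parser: reads top-level backend_meta from a parsed response body.
// Assumption: missing or malformed fields are dropped rather than treated as errors.
function parseBackendMeta(responseBody: unknown): BackendMeta {
  if (typeof responseBody !== "object" || responseBody === null) return {};
  const raw = (responseBody as Record<string, unknown>)["backend_meta"];
  if (typeof raw !== "object" || raw === null) return {};
  const src = raw as Record<string, unknown>;
  const meta: BackendMeta = {};

  const status = src["workflowStatus"];
  if (typeof status === "string" && WORKFLOW_STATUSES.indexOf(status) !== -1) {
    meta.workflowStatus = status as WorkflowStatus;
  }
  const needs = src["needsFeedback"];
  if (typeof needs === "boolean") meta.needsFeedback = needs;

  const confidence = src["retrievalConfidence"];
  if (typeof confidence === "number") meta.retrievalConfidence = confidence;

  const runId = src["runId"];
  if (typeof runId === "string") meta.runId = runId;

  const mode = src["effectiveMode"];
  if (typeof mode === "string" && EFFECTIVE_MODES.indexOf(mode) !== -1) {
    meta.effectiveMode = mode as EffectiveMode;
  }
  return meta;
}
```

Returning an empty object when `backend_meta` is absent lets callers treat a plain OpenAI-compatible backend and a metadata-aware backend the same way.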
If header X-Force-Workflow-Mode: baseline is present:
- bypass graph usage
- run the normal non-grounded flow
If header X-Force-Workflow-Mode: grounded is present:
- run grounded behavior
- backend decides whether and how graph knowledge is used
If the force header is absent:
- backend owns routing completely
- frontend should not assume whether the workflow is new or seen
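
On the backend side, the three cases above reduce to a small decision on the header value. The following is a sketch under stated assumptions, not the backend's actual implementation: `parseForceModeHeader`, `planRouting`, and the `RoutingPlan` shape are hypothetical, and treating unknown or differently-cased values leniently is a choice this document does not specify.

```typescript
type ForceMode = "baseline" | "grounded";

type RoutingPlan =
  | { kind: "forced"; mode: ForceMode; graphAllowed: boolean }
  | { kind: "auto" };

// Returns the forced mode, or undefined when the backend should auto-route.
// Assumptions: matching is case-insensitive and ignores surrounding whitespace,
// and unrecognized values are treated as if the header were absent.
function parseForceModeHeader(value: string | undefined): ForceMode | undefined {
  if (value === undefined) return undefined;
  const normalized = value.trim().toLowerCase();
  if (normalized === "baseline" || normalized === "grounded") return normalized;
  return undefined;
}

function planRouting(headerValue: string | undefined): RoutingPlan {
  const forced = parseForceModeHeader(headerValue);
  // Header absent: backend owns routing completely.
  if (forced === undefined) return { kind: "auto" };
  // baseline bypasses graph usage; grounded leaves graph use to the backend.
  return { kind: "forced", mode: forced, graphAllowed: forced === "grounded" };
}
```

Note that `graphAllowed: true` only permits graph usage in grounded mode; whether and how graph knowledge is actually used remains a backend decision.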
POST /v1/feedback

The desktop app submits this request from Electron main after the run ends and the user clicks thumbs up/down.
Content-Type: application/json
Authorization: Bearer <vlm_api_key> # sent when configured
X-Force-Workflow-Mode: baseline | grounded # sent only when force mode is enabled

Example:
{
"session_id": "local-session-id",
"run_id": "run_abc123",
"instruction": "Find the Uber receipt in Downloads and create an expense report spreadsheet",
"feedback": "positive",
"timestamp": "2026-03-21T14:30:00Z",
"action_trace": [
{
"step": 1,
"action_type": "click",
"thought": "Open Finder",
"action_inputs": { "start_box": "(0.45, 0.32)" },
"reflection": null
},
{
"step": 2,
"action_type": "click",
"thought": "Open Downloads",
"action_inputs": { "start_box": "(0.12, 0.55)" },
"reflection": null
}
],
"total_steps": 2,
"status": "end",
"mode": "grounded",
"workflow_status": "new_workflow",
"retrieval_confidence": 0.41
}

- `session_id`
  - stable app-side session identifier
- `run_id`
  - backend-generated identifier from `backend_meta.runId` when available
- `feedback`
  - `positive` or `negative`
- `action_trace`
  - flattened action list with globally increasing step indices
- `status`
  - terminal run status from UI-TARS
- `mode`
  - if force mode is active, frontend sends the forced value
  - otherwise frontend sends `backend_meta.effectiveMode` when available, else `auto`
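
Two of the field rules above involve a little logic: numbering the flattened `action_trace` and choosing the `mode` value. A minimal sketch follows. The helper names are hypothetical, and the assumption that actions arrive grouped per turn (one list per model response) is made only to illustrate the flattening; the app's internal representation may differ.

```typescript
type ForceMode = "baseline" | "grounded";
type EffectiveMode = "auto" | ForceMode;

interface ActionStep {
  action_type: string;
  thought: string;
  action_inputs: Record<string, string>;
  reflection: string | null;
}

interface TraceEntry extends ActionStep {
  step: number;
}

// Flattens per-turn action lists into one trace whose step index is 1-based
// and increases across turns instead of restarting in each turn.
function flattenActionTrace(turns: ActionStep[][]): TraceEntry[] {
  const trace: TraceEntry[] = [];
  for (const turn of turns) {
    for (const action of turn) {
      trace.push({ step: trace.length + 1, ...action });
    }
  }
  return trace;
}

// Forced value wins; otherwise use the backend's reported effective mode;
// fall back to "auto" when neither is available.
function resolveFeedbackMode(
  forceMode: ForceMode | undefined,
  effectiveMode: EffectiveMode | undefined,
): EffectiveMode {
  return forceMode ?? effectiveMode ?? "auto";
}
```

With this numbering, `total_steps` in the payload is simply the length of the flattened trace.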
Any 2xx response is acceptable.
Preferred response:
{
"ok": true
}

- The feedback card appears after terminal states.
- If `backend_meta.needsFeedback` is `true`, frontend highlights the prompt.
- If `workflowStatus` is `new_workflow` or `seen_but_low_confidence`, frontend also highlights the prompt.
- Even for `seen_workflow`, frontend still allows feedback for future improvements.
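
The highlighting rules above amount to a single predicate over `backend_meta`. The sketch below is illustrative: `shouldHighlightFeedback` and `FeedbackSignals` are hypothetical names, and treating a missing `backend_meta` as "do not highlight" is an assumption.

```typescript
interface FeedbackSignals {
  needsFeedback?: boolean;
  workflowStatus?: string;
}

// True when the feedback prompt should be made more prominent.
// The card itself is still shown after every terminal state, including
// seen_workflow; this only controls the highlight.
function shouldHighlightFeedback(meta: FeedbackSignals | undefined): boolean {
  if (meta === undefined) return false;
  return (
    meta.needsFeedback === true ||
    meta.workflowStatus === "new_workflow" ||
    meta.workflowStatus === "seen_but_low_confidence"
  );
}
```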
If backend wants frontend to react to workflow state, the metadata must be returned as top-level backend_meta in the API response body.
Frontend does not parse these signals from the model's text output.