
Add OpenClaw blog #536

Open
B-Step62 wants to merge 3 commits into mlflow:main from B-Step62:openclaw

Conversation

@B-Step62
Collaborator

No description provided.

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
@github-actions

github-actions bot commented Mar 23, 2026

🚀 Netlify Preview Deployed!

Preview URL: https://pr-536--test-mlflow-website.netlify.app

Details

PR: #536
Build Action: https://github.com/mlflow/mlflow-website/actions/runs/23468306709
Deploy Action: https://github.com/mlflow/mlflow-website/actions/runs/23468361085

This preview will be updated automatically on new commits.

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

That's it. Start OpenClaw as usual and use it normally. Once the integration is enabled, tracing is automatic. Every agent run generates a trace that gets recorded in your MLflow server. There is no need to modify your skills, tool definitions, or agent configuration.

Open `http://localhost:5000` in your browser and you'll see traces appear as your OpenClaw agent runs.
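For readers following along, a minimal local setup might look like the following. This is a sketch using MLflow's standard CLI and environment variable; the OpenClaw-side integration switch itself is not shown here.

```shell
# Start a local MLflow tracking server on the default port.
# Once running, the trace UI is available at http://localhost:5000.
mlflow server --host 127.0.0.1 --port 5000 &

# Point the agent's process at the server via MLflow's standard
# tracking-URI environment variable.
export MLFLOW_TRACKING_URI=http://localhost:5000
```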
Contributor


Nit: Can we improve the image resolution slightly?

Traces are useful for debugging, but their real value is closing the feedback loop between you and your agent.

When you review a trace and notice the agent did something well or poorly, you can [record that feedback in MLflow](https://mlflow.org/docs/latest/genai/tracing/collect-user-feedback.html) as a structured annotation on the trace or session (a group of traces sharing a conversation ID): for example, a thumbs-down on a trace that used the wrong tool, or a note on a session where the agent missed context from an earlier message. Over time, this builds up a labeled dataset of what your agent gets right and what it gets wrong. That dataset becomes the basis for everything that follows: evaluating new skill versions, tuning prompts, and understanding which types of requests your agent handles reliably.

Contributor

@dbczumar dbczumar Mar 24, 2026


Can we mention that folks can set up recurring / realtime monitoring for openclaw to detect problematic behaviors?

Edit: saw this is mentioned below - not sure why, but I kind of expected it to be mentioned here. a screenshot with some assessments on it would be nice to show.

Contributor

@dbczumar dbczumar left a comment


LGTM with 2 small suggestions - thanks @B-Step62 !

@@ -0,0 +1,113 @@
---
title: "From Black Box to Observable: Tracing OpenClaw with MLflow"
Collaborator


title: "From Black Box to Observability: Tracing OpenClaw with MLflow"

Collaborator


This would be synonymous with saying from obscurity to clarity.


You might think tracing is only for production systems with SLAs and uptime requirements. But personal agents have their own version of the same problem: you're relying on the agent to do real work for you, and when it gets something wrong, you need to understand what happened so you can fix it.

Consider a few scenarios that are hard to debug without traces. You ask your OpenClaw agent to summarize this week's AI news and draft a brief. The summary is shallow and misses the biggest story. Was the web search tool returning poor results? Did the model ignore relevant results during summarization? Did it hit the context window limit and silently drop content? You ask it to reschedule a meeting based on your calendar, and it picks the wrong time slot. Did the calendar tool return stale data? Did the model misinterpret the constraint you gave it? You won't know from the chat reply alone.
Collaborator


Did the model misinterpret the constraint you gave it? These are vital questions. You won't know from the chat reply alone. However, tracing can help you.
