
Add OpenClaw blog #536

Open
B-Step62 wants to merge 3 commits into mlflow:main from B-Step62:openclaw

Conversation

@B-Step62
Collaborator

No description provided.

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
@github-actions

github-actions bot commented Mar 23, 2026

🚀 Netlify Preview Deployed!

Preview URL: https://pr-536--test-mlflow-website.netlify.app

Details

PR: #536
Build Action: https://github.com/mlflow/mlflow-website/actions/runs/23468306709
Deploy Action: https://github.com/mlflow/mlflow-website/actions/runs/23468361085

This preview will be updated automatically on new commits.

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

That's it. Start OpenClaw as usual and use it normally. Once the integration is enabled, tracing is automatic. Every agent run generates a trace that gets recorded in your MLflow server. There is no need to modify your skills, tool definitions, or agent configuration.

Open `http://localhost:5000` in your browser and you'll see traces appear as your OpenClaw agent runs.
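For readers following along, a minimal local setup might look like the following. This is a sketch using MLflow's standard CLI and environment variable; the OpenClaw-side integration switch itself is not shown here.

```shell
# Start a local MLflow tracking server on the default port.
# Once running, the trace UI is available at http://localhost:5000.
mlflow server --host 127.0.0.1 --port 5000 &

# Point the agent's process at the server via MLflow's standard
# tracking-URI environment variable.
export MLFLOW_TRACKING_URI=http://localhost:5000
```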
Contributor


Nit: Can we improve the image resolution slightly?

Traces are useful for debugging, but their real value is closing the feedback loop between you and your agent.

When you review a trace and notice the agent did something well or poorly, you can [record that feedback in MLflow](https://mlflow.org/docs/latest/genai/tracing/collect-user-feedback.html) as a structured annotation on the trace or session (a group of traces sharing a conversation ID): for example, a thumbs-down on a trace that used the wrong tool, or a note on a session where the agent missed context from an earlier message. Over time, this builds up a labeled dataset of what your agent gets right and what it gets wrong. That dataset becomes the basis for everything that follows: evaluating new skill versions, tuning prompts, and understanding which types of requests your agent handles reliably.

Contributor

@dbczumar dbczumar Mar 24, 2026


Can we mention that folks can set up recurring / realtime monitoring for openclaw to detect problematic behaviors?

Edit: saw this is mentioned below - not sure why, but I kind of expected it to be mentioned here. a screenshot with some assessments on it would be nice to show.

Contributor

@dbczumar dbczumar left a comment


LGTM with 2 small suggestions - thanks @B-Step62 !

@@ -0,0 +1,113 @@
---
title: "From Black Box to Observable: Tracing OpenClaw with MLflow"
Collaborator


title: "From Black Box to Observability: Tracing OpenClaw with MLflow"

Collaborator


This would be synonymous with saying from obscurity to clarity.


You might think tracing is only for production systems with SLAs and uptime requirements. But personal agents have their own version of the same problem: you're relying on the agent to do real work for you, and when it gets something wrong, you need to understand what happened so you can fix it.

Consider a few scenarios that are hard to debug without traces. You ask your OpenClaw agent to summarize this week's AI news and draft a brief. The summary is shallow and misses the biggest story. Was the web search tool returning poor results? Did the model ignore relevant results during summarization? Did it hit the context window limit and silently drop content? You ask it to reschedule a meeting based on your calendar, and it picks the wrong time slot. Did the calendar tool return stale data? Did the model misinterpret the constraint you gave it? You won't know from the chat reply alone.
Collaborator


Did the model misinterpret the constraint you gave it? These are vital questions. You won't know from the chat reply alone. However, tracing can help you.
