Add blog: Tracking and Debugging AI Safety Evaluations with Inspect AI and MLflow by debu-sinha · Pull Request #533 · mlflow/mlflow-website

debu-sinha · 2026-03-21T20:49:11Z

Closes #532

Summary

Blog post covering the inspect-mlflow package (v0.3.1 on PyPI, 42 tests, 82% coverage), which provides MLflow tracking, tracing, and assessment hooks for Inspect AI evaluations.

What's in the post

Problem: debugging Inspect AI evals means reading raw JSON logs
Tracking hook: hierarchical runs, per-sample metrics, token usage, artifacts
Tracing hook: span tree with LLM, TOOL, and EVALUATOR spans
Assessments: eval scores logged as MLflow trace assessments via log_feedback()
Scout import: pull MLflow traces into Scout transcript databases for safety analysis
Architecture diagram showing the data flow (tracking, tracing, and Scout paths)
Calculator tool example demonstrating TOOL spans in agent evaluations
Getting Started with pip install and environment variables
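The tracing hook's output is easiest to picture as a tree. The sketch below is a hypothetical, standard-library-only model of one tool-using sample, similar in spirit to the calculator example: solver work made of LLM and TOOL spans, followed by an EVALUATOR span for the scorer. The Span class, the CHAIN label used for grouping spans, and the span names are all assumptions for illustration. This is not MLflow's span API, and it is not necessarily the exact nesting inspect-mlflow produces.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified span model for illustration only; it is not
# MLflow's Span class or inspect-mlflow's internal representation.
@dataclass
class Span:
    name: str
    span_type: str  # assumed labels: "CHAIN", "LLM", "TOOL", "EVALUATOR"
    children: list["Span"] = field(default_factory=list)

    def add(self, child: "Span") -> "Span":
        """Attach a child span and return it so calls can be nested."""
        self.children.append(child)
        return child


def render(span: Span, depth: int = 0) -> list[str]:
    """Depth-first walk producing one indented line per span."""
    lines = [f"{'  ' * depth}{span.name} [{span.span_type}]"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


def sample_trace() -> Span:
    """Build the shape of one tool-using sample: solver work, then scoring."""
    root = Span("sample", "CHAIN")
    solver = root.add(Span("solver", "CHAIN"))
    solver.add(Span("model.generate", "LLM"))     # model decides to call the tool
    solver.add(Span("calculator", "TOOL"))        # tool execution
    solver.add(Span("model.generate", "LLM"))     # model reads the tool result
    root.add(Span("scorer: match", "EVALUATOR"))  # scoring the final answer
    return root


if __name__ == "__main__":
    print("\n".join(render(sample_trace())))
```

Running the script prints each span on its own line, indented by its depth in the tree.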

Assets

architecture-diagram.png (matches TruLens blog diagram style)
screenshot-tracking.png (task run with 17 metrics)
screenshot-assessments.png (Traces table with inspect/match assessment column)
screenshot-tracing.png (span tree with solver/scorer hierarchy)
screenshot-llm-detail.png (LLM span with token counts and response)
inspect-mlflow-thumbnail.png (MLflow + AISI branding)

Package details

PyPI: https://pypi.org/project/inspect-mlflow/ (v0.3.1)
Docs: https://inspect-mlflow.readthedocs.io
Tests: 42 tests, 82% code coverage
CI: 4 GitHub Actions workflows + pre-commit
Features: Tracking hook, tracing hook, assessments via log_feedback, Scout import source
Auto-registration: entry points, no code changes needed

Context

4 PRs merged the same day by JJ Allaire (lead developer of Inspect AI, creator of RStudio): #3433, #3483, #3492, #3548 (+2,401 lines, 0 changes requested)
JJ suggested a standalone package: #3547
Hooks referenced as examples on the Inspect AI Extensions page
Referenced in Scout docs (added by JJ)
Combined ecosystem: MLflow 31M + Inspect AI 16M = 47M+ monthly PyPI downloads

cc @B-Step62

…I and MLflow

Covers the inspect-mlflow package that provides MLflow tracking and tracing hooks for Inspect AI evaluations. Includes an architecture diagram, screenshots of the MLflow UI, and a tool-using eval example.

Closes mlflow#532

Signed-off-by: debu-sinha <debusinha2009@gmail.com>

Signed-off-by: debu-sinha <debusinha2009@gmail.com>


github-actions · 2026-03-23T17:02:43Z

🚀 Netlify Preview Deployed!

Preview URL: https://pr-533--test-mlflow-website.netlify.app

Details

PR: #533
Build Action: https://github.com/mlflow/mlflow-website/actions/runs/23448373945
Deploy Action: https://github.com/mlflow/mlflow-website/actions/runs/23449730413

This preview will be updated automatically on new commits.


Changed "requested" to "suggested" for JJ's package proposal (he suggested it in response to our issue, not requested independently). Changed "Inspect AI lead" to "lead developer of Inspect AI" which is verifiable from commit history (2,454 contributions, 3x next highest). Signed-off-by: debu-sinha <debusinha2009@gmail.com>


Trim implementation details (autolog gating, no-scipy claim), label all diagram arrows, fix font sizes for readability at 750px, update annotation contrast for dark mode, add Vector Institute contributor credit in Provenance. Signed-off-by: debu-sinha <debusinha2009@gmail.com>

Signed-off-by: debu-sinha <debusinha2009@gmail.com>

debu-sinha added 5 commits March 21, 2026 16:48

Remove unused SVG source files from blog PR

5a58314

Signed-off-by: debu-sinha <debusinha2009@gmail.com>

Add trace assessments section with screenshot

4c93b6f

Signed-off-by: debu-sinha <debusinha2009@gmail.com>

Address review feedback: buffer text between images, remove promotion…

bb7dbaf

…al line Signed-off-by: debu-sinha <debusinha2009@gmail.com>

Add ReadTheDocs link to inspect-mlflow blog

9ecf43c

Point documentation link to inspect-mlflow.readthedocs.io instead of GitHub repo, and add ReadTheDocs to the Resources section. Signed-off-by: debu-sinha <debusinha2009@gmail.com>

debu-sinha added 6 commits March 23, 2026 13:14

Fix inaccurate Extensions page claim in Provenance

23cce1f

The hooks are referenced as examples on the Extensions page, not listed as a package. Updated wording to match what the page shows. Signed-off-by: debu-sinha <debusinha2009@gmail.com>

Update inspect-mlflow blog with v0.7.0 features

760c2d7

Add evaluation comparison section, artifact tables and autolog mentions, Vector Institute / NRC Canada contributor credit. Signed-off-by: debu-sinha <debusinha2009@gmail.com>

Fix diagram label overlap and legend truncation

a8cdec2

Signed-off-by: debu-sinha <debusinha2009@gmail.com>

Trim implementation details and fix artifacts list consistency

8a291be

Signed-off-by: debu-sinha <debusinha2009@gmail.com>

debu-sinha wants to merge 11 commits into mlflow:main from debu-sinha:blog/inspect-mlflow-integration