
Add blog: Tracking and Debugging AI Safety Evaluations with Inspect AI and MLflow#533

Open
debu-sinha wants to merge 11 commits into mlflow:main from debu-sinha:blog/inspect-mlflow-integration


@debu-sinha (Contributor) commented Mar 21, 2026

Closes #532

Summary

Blog post covering the inspect-mlflow package (v0.3.1 on PyPI, 42 tests, 82% test coverage), which provides MLflow tracking, tracing, and assessment hooks for Inspect AI evaluations.

What's in the post

  • Problem: debugging Inspect AI evals means reading raw JSON logs
  • Tracking hook: hierarchical runs, per-sample metrics, token usage, artifacts
  • Tracing hook: span tree with LLM, TOOL, and EVALUATOR spans
  • Assessments: eval scores logged as MLflow trace assessments via log_feedback()
  • Scout import: pull MLflow traces into Scout transcript databases for safety analysis
  • Architecture diagram showing the data flow (tracking, tracing, and Scout paths)
  • Calculator tool example demonstrating TOOL spans in agent evaluations
  • Getting Started section covering pip install and environment-variable configuration

Assets

  • architecture-diagram.png (matches TruLens blog diagram style)
  • screenshot-tracking.png (task run with 17 metrics)
  • screenshot-assessments.png (Traces table with inspect/match assessment column)
  • screenshot-tracing.png (span tree with solver/scorer hierarchy)
  • screenshot-llm-detail.png (LLM span with token counts and response)
  • inspect-mlflow-thumbnail.png (MLflow + AISI branding)

Package details

Context

  • 4 PRs merged same-day by JJ Allaire (lead developer of Inspect AI, creator of RStudio): #3433, #3483, #3492, #3548 (+2,401 lines, 0 changes requested)
  • JJ suggested a standalone package: #3547
  • Hooks referenced as examples on the Inspect AI Extensions page
  • Referenced in Scout docs (added by JJ)
  • Combined ecosystem: MLflow 31M + Inspect AI 16M = 47M+ monthly PyPI downloads

cc @B-Step62

…I and MLflow

Covers the inspect-mlflow package that provides MLflow tracking and
tracing hooks for Inspect AI evaluations. Includes architecture
diagram, screenshots of MLflow UI, and tool-using eval example.

Closes mlflow#532

Signed-off-by: debu-sinha <debusinha2009@gmail.com>
…al line

Signed-off-by: debu-sinha <debusinha2009@gmail.com>
Point documentation link to inspect-mlflow.readthedocs.io instead of
the GitHub repo, and add ReadTheDocs to the Resources section.

Signed-off-by: debu-sinha <debusinha2009@gmail.com>
@github-actions

🚀 Netlify Preview Deployed!

Preview URL: https://pr-533--test-mlflow-website.netlify.app

Details

PR: #533
Build Action: https://github.com/mlflow/mlflow-website/actions/runs/23448373945
Deploy Action: https://github.com/mlflow/mlflow-website/actions/runs/23449730413

This preview will be updated automatically on new commits.

The hooks are referenced as examples on the Extensions page, not
listed as a package. Updated wording to match what the page shows.

Signed-off-by: debu-sinha <debusinha2009@gmail.com>
Changed "requested" to "suggested" for JJ's package proposal (he
suggested it in response to our issue rather than requesting it
independently). Changed "Inspect AI lead" to "lead developer of
Inspect AI", which is verifiable from commit history (2,454
contributions, 3x the next-highest contributor).

Signed-off-by: debu-sinha <debusinha2009@gmail.com>
Add evaluation comparison section, artifact tables, autolog
mentions, and Vector Institute / NRC Canada contributor credit.

Signed-off-by: debu-sinha <debusinha2009@gmail.com>
Trim implementation details (autolog gating, no-scipy claim),
label all diagram arrows, fix font sizes for readability at
750px, update annotation contrast for dark mode, add Vector
Institute contributor credit in Provenance.

Signed-off-by: debu-sinha <debusinha2009@gmail.com>

Development

Successfully merging this pull request may close these issues.

Blog Post Submission: Tracking and Debugging AI Safety Evaluations with Inspect AI and MLflow

1 participant