Add blog: Tracking and Debugging AI Safety Evaluations with Inspect AI and MLflow #533
Open
debu-sinha wants to merge 11 commits into mlflow:main
…I and MLflow Covers the inspect-mlflow package that provides MLflow tracking and tracing hooks for Inspect AI evaluations. Includes architecture diagram, screenshots of MLflow UI, and tool-using eval example. Closes mlflow#532 Signed-off-by: debu-sinha <debusinha2009@gmail.com>
Signed-off-by: debu-sinha <debusinha2009@gmail.com>
Signed-off-by: debu-sinha <debusinha2009@gmail.com>
…al line Signed-off-by: debu-sinha <debusinha2009@gmail.com>
Point documentation link to inspect-mlflow.readthedocs.io instead of GitHub repo, and add ReadTheDocs to the Resources section. Signed-off-by: debu-sinha <debusinha2009@gmail.com>
🚀 Netlify Preview Deployed! Preview URL: https://pr-533--test-mlflow-website.netlify.app (PR: #533). This preview will be updated automatically on new commits.
The hooks are referenced as examples on the Extensions page, not listed as a package. Updated wording to match what the page shows. Signed-off-by: debu-sinha <debusinha2009@gmail.com>
Changed "requested" to "suggested" for JJ's package proposal (he suggested it in response to our issue, not requested independently). Changed "Inspect AI lead" to "lead developer of Inspect AI" which is verifiable from commit history (2,454 contributions, 3x next highest). Signed-off-by: debu-sinha <debusinha2009@gmail.com>
Add evaluation comparison section, artifact tables and autolog mentions, Vector Institute / NRC Canada contributor credit. Signed-off-by: debu-sinha <debusinha2009@gmail.com>
Trim implementation details (autolog gating, no-scipy claim), label all diagram arrows, fix font sizes for readability at 750px, update annotation contrast for dark mode, add Vector Institute contributor credit in Provenance. Signed-off-by: debu-sinha <debusinha2009@gmail.com>
Signed-off-by: debu-sinha <debusinha2009@gmail.com>
Signed-off-by: debu-sinha <debusinha2009@gmail.com>
Closes #532
Summary
Blog post covering the inspect-mlflow package (v0.3.1 on PyPI, 42 tests, 82% coverage), which provides MLflow tracking, tracing, and assessment hooks for Inspect AI evaluations.

What's in the post
log_feedback()

Assets
- architecture-diagram.png (matches TruLens blog diagram style)
- screenshot-tracking.png (task run with 17 metrics)
- screenshot-assessments.png (Traces table with inspect/match assessment column)
- screenshot-tracing.png (span tree with solver/scorer hierarchy)
- screenshot-llm-detail.png (LLM span with token counts and response)
- inspect-mlflow-thumbnail.png (MLflow + AISI branding)

Package details
Context
#3433, #3483, #3492, #3548 (+2,401 lines, 0 changes requested)
cc @B-Step62