Add vitrine integration and research session skills #30
Merged
7e50350 to 8bce591
…rer, redaction stub
Build the data layer that converts Python objects into storable card descriptors and artifacts. Fully unit-testable without a server, browser, or network.
- CardType enum, CardDescriptor/CardProvenance/DisplayEvent dataclasses
- ArtifactStore with Parquet/JSON/image storage and DuckDB-powered table paging
- Renderer dispatching DataFrame, str, dict, and fallback types
- Redactor stub with full signatures and env var config (pass-through for now)
- 92 tests covering all components
Starlette + WebSocket + REST server with artifact-backed card rendering. Python API (show/start/stop/clear/section) auto-starts on first show(). CLI `m4 display` subcommand with port and mode options.
Transform the display server from ephemeral to persistent. show() discovers an existing server via PID file and pushes cards over HTTP.
- Add /api/health, /api/command, /api/shutdown endpoints to server
- Add Bearer token auth for mutating endpoints
- Add PID file management (.server.json) with stale detection
- Add _run_standalone() entry point with signal handling
- Add client-side discovery: _discover_server(), _health_check()
- show/clear/section push via HTTP when remote server is found
- Add stop_server() and server_status() public API
- CLI: replace --mode with --stop/--status flags
- Session directory cleaned up on stop_server()
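The "PID file management with stale detection" item above can be illustrated with a minimal sketch. It assumes the PID file is a small JSON document with `pid` and `port` keys; those keys and the helper names `is_pid_alive` and `read_server_info` are hypothetical and are not taken from the commit. It is POSIX-only because it relies on signal 0.

```python
import json
import os
from pathlib import Path
from typing import Optional


def is_pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_server_info(pid_file: Path) -> Optional[dict]:
    """Return the recorded server info, or None if it is missing or stale."""
    try:
        info = json.loads(pid_file.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    pid = info.get("pid") if isinstance(info, dict) else None
    if not isinstance(pid, int) or pid <= 0 or not is_pid_alive(pid):
        pid_file.unlink(missing_ok=True)  # stale or malformed entry: clear it
        return None
    return info
```

A client would call `read_server_info()` before deciding whether to spawn a new server; a health check over HTTP (as the commit describes) would still be needed to confirm the PID really belongs to a display server.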
… filter, sort
- Dark/light theme toggle persisted in localStorage
- Card collapse/expand, pin toggle, and copy-to-clipboard buttons
- Run filter dropdown in header for filtering cards by run_id
- Column header sort delegating to server-side REST endpoint
- Improved WebSocket reconnect with exponential backoff and dedup
- Provenance line now includes dataset alongside source and timestamp
- Card count in footer shows visible vs total when filtering
…ering
Renderer:
- Plotly Figure → JSON artifact with full spec inlined in preview
- matplotlib Figure → sanitized SVG artifact with base64 preview
- Type detection without hard imports (checks module + class name)
- SVG sanitization: strip <script> tags, event attributes, 2MB cap
- Title inference from Plotly layout and matplotlib suptitle
Frontend:
- renderPlotly() with lazy Plotly.js loading on first chart card
- renderImage() for matplotlib SVG/PNG display
- Plotly point click/selection fires events over WebSocket
- Responsive chart resize on window resize
- Theme-aware Plotly colors (updates on dark/light toggle)
Infrastructure:
- Vendor Plotly.js v2.35.2 for offline-first operation
- Artifact endpoint serves SVG/PNG with correct Content-Types
- 29 new tests covering both renderers and SVG sanitization
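"Type detection without hard imports" in the commit above could look roughly like the sketch below. It inspects only the class name and module of the object's type, so neither Plotly nor matplotlib has to be importable. The function name `detect_figure_kind` and the returned labels are assumptions for illustration, not the renderer's actual code.

```python
from typing import Optional


def detect_figure_kind(obj: object) -> Optional[str]:
    """Classify obj as 'plotly' or 'matplotlib' using class metadata only."""
    # Walk the MRO so subclasses of a Figure class are recognised as well.
    for klass in type(obj).__mro__:
        module = getattr(klass, "__module__", "") or ""
        if klass.__name__ != "Figure":
            continue
        if module.startswith("plotly."):
            return "plotly"
        if module.startswith("matplotlib."):
            return "matplotlib"
    return None
```

The design choice is that an optional plotting library is never imported just to run an `isinstance` check, which keeps both libraries optional dependencies.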
…e search/export
- Cap table cards at ~10 visible rows with vertical scroll (max-height 360px)
- Keep header row sticky during vertical scroll with proper z-index
- Add server-side search, sort, pagination, and CSV export endpoints
- Add row click detail panel and Copy TSV support
Add blocking show(wait=True) that returns DisplayResponse with user action and optional selected data. Implement event system with on_event() callbacks, request queue (pending_requests/acknowledge), and send-to-agent flow from the browser. Server gains new endpoints for requests, events, and long-poll response resolution. Frontend adds response UI panel, row selection checkboxes, send-to-agent popover, in-place card updates with flash animation, and request badge.
Replace session-directory storage with a RunManager that persists runs
across server restarts. Each run gets its own ArtifactStore under
display/runs/, enabling cross-run queries, run listing, deletion, and
age-based cleanup. Remove the clear functionality in favor of run
lifecycle management.
- Add RunManager with auto-run creation, card index, request queue
- Add CLI commands: --list, --clean for run management
- Add REST endpoints: GET /api/runs, DELETE /api/runs/{id}
- Update server and Python API to resolve stores via RunManager
- Preserve run data on server stop (only PID file cleaned up)
- Update frontend to load runs from API and remove clear button
…port scan
_find_port() auto-incremented through 7741-7750 with no cross-process guard, so concurrent agents spawned duplicate servers.
Fix:
- Add fcntl.flock on .server.lock in both _run_standalone() (subprocess entry point) and _ensure_started() (client-side discovery)
- Add port-range health scan fallback so servers without PID files are still detected before spawning a new one
- _run_standalone() exits early if another server exists (PID file or port scan), preventing the subprocess itself from becoming an orphan
- PID file is written inside the lock, eliminating the window where no PID file exists but a server is starting
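The locking fix above can be sketched as follows. This is a simplified illustration, assuming a POSIX system (`fcntl` is not available on Windows); the helper names `server_lock` and `claim_server_slot` and the PID file contents are hypothetical. Only the file names `.server.lock` and `.server.json` come from the commits.

```python
import fcntl
import json
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def server_lock(lock_path: Path) -> Iterator[None]:
    """Hold an exclusive advisory lock on lock_path for the with-block."""
    with open(lock_path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def claim_server_slot(state_dir: Path, port: int) -> bool:
    """Write the PID file under the lock; return False if one already exists."""
    pid_file = state_dir / ".server.json"
    with server_lock(state_dir / ".server.lock"):
        if pid_file.exists():
            return False  # another process claimed the slot first
        # Written while the lock is held, so no other process can observe
        # "no PID file" while this server is still starting up.
        pid_file.write_text(json.dumps({"pid": os.getpid(), "port": port}))
        return True
```

A real implementation would also treat a PID file that points at a dead process as absent; that check is omitted here to keep the locking pattern visible.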
Add date-grouped run history dropdown with inline delete, metadata bar, auto-select of active run, and run separators in the all-runs view. Remove auth requirement from run delete endpoint (localhost-only with browser confirmation dialog).
Replace browser-native confirm() with a styled in-page confirmation
modal for run deletion. Add click-to-edit on the run title in the
metadata bar with real-time name validation and a
PATCH /api/runs/{id}/rename backend endpoint.
marked.js and plotly.js were loaded after the card HTML, so inline scripts that convert markdown to HTML would find `typeof marked` undefined and fall through to plain escaped text.
…h workflow
Add m4-display skill with full API reference (show, section, interaction patterns, run management, export). Update m4-api and m4-research skills with display cross-references and save-then-show patterns. Add Display Output section to CLAUDE.md establishing behavioral defaults. Add PROVENANCE.yaml for all three skills.
…er UI
Implement HTML and JSON export for display runs:
- export() in __init__.py replaces NotImplementedError stub with working
implementation supporting run_id filtering and format selection
- Server adds GET /api/export and /api/runs/{id}/export endpoints
- CLI adds --export, --format, and --run flags to `m4 display`
- Browser UI adds export dropdown (HTML, JSON, Print) in header and
per-run export buttons in the run dropdown
- Print-optimized @media styles for clean browser printing
10 field primitives (Dropdown, MultiSelect, Slider, RangeSlider, Checkbox, Toggle, RadioGroup, TextInput, DateRange, NumberInput) grouped via Form class. Supports blocking show(Form([...]), wait=True) returning a DisplayResponse with .values dict, hybrid data+controls cards via controls= parameter on show(), freeze rendering for confirmed forms, and self-contained HTML export.
Full rebrand of the display system:
- Package path: m4.display → m4.vitrine
- CLI command: m4 display → m4 vitrine
- Skill: m4-display → m4-vitrine
- Env vars: M4_DISPLAY_* → M4_VITRINE_*
- WebSocket messages: display.event → vitrine.event
- On-disk data dir: {m4_data}/display/ → {m4_data}/vitrine/
- All user-visible strings, docs, and test assertions updated
Apply the UI.md visual refactor to the live vitrine UI and exported HTML output.
- Introduce neobrutalist design tokens, typography, borders, and hard-shadow interactions
- Add typed card headers with per-type color backgrounds and icon badges
- Update card rendering logic for decision-state header typing and icon mapping
- Align export HTML renderer and inline CSS with the new visual system
- Add font loading links for Space Grotesk, Inter, and DM Mono
- Fix Plotly figure serialization to handle numpy arrays in customdata fields
- Replace native HTML date inputs with custom calendar picker component
- Replace native select elements with custom dropdown component
- Add comprehensive styling for custom form controls matching neobrutalist design
- Add test coverage for Plotly Express customdata serialization
…verage
- Add warning logs to _push_remote, _poll_remote_response, get_selection for failure scenarios; differentiate HTTPError vs URLError
- Apply snapshot-under-lock pattern to all functions reading module globals
- Add __post_init__ validation to all form field types (range, options, defaults) and Form (field name uniqueness)
- Add DisplayResponse.CONFIRM/SKIP/TIMEOUT/ERROR constants
- Extend DisplayHandle with run_id parameter
- Add Plotly spec size cap (5MB) with data array truncation
- Add selection persistence (debounced JSON save/load on server restart)
- Enhance health endpoint with uptime, version, run_count
- Frontend: SECTION badge, Plotly resize debounce, error toasts, blur race fix, aria-labels, select-all banner, constants extraction
- Add test_forms.py (50 tests) covering serialization, rendering, blocking flow, controls, export, WebSocket, validation
- Add error logging tests, selection persistence tests, export endpoint tests, concurrent blocking tests to existing test files
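The `__post_init__` validation item above (range and default checks on fields, name uniqueness on the form) follows a common dataclass pattern. The sketch below uses stand-in classes `SliderSketch` and `FormSketch` with assumed attribute names; it is not the signature of the real Slider or Form classes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SliderSketch:
    """Stand-in for a slider field; attribute names are assumptions."""
    name: str
    min: float
    max: float
    default: Optional[float] = None

    def __post_init__(self) -> None:
        if self.min >= self.max:
            raise ValueError(f"{self.name}: min must be less than max")
        if self.default is None:
            self.default = self.min  # fall back to the lower bound
        elif not self.min <= self.default <= self.max:
            raise ValueError(f"{self.name}: default outside [min, max]")


@dataclass
class FormSketch:
    """Stand-in for a form that groups fields and rejects duplicate names."""
    fields: List[SliderSketch]

    def __post_init__(self) -> None:
        names = [f.name for f in self.fields]
        dupes = sorted({n for n in names if names.count(n) > 1})
        if dupes:
            raise ValueError(f"duplicate field names: {dupes}")
```

Validating at construction time means a malformed form fails in the agent's Python process, before anything is serialized and sent to the browser.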
Reframes the skill description around vitrine's role as the agent's research journal — documenting decisions, rationale, and findings as persistent cards. Adds comprehensive examples for forms, quick actions, hybrid controls, progressive updates, provenance tracking, and the full research session pattern.
Break the 5,451-line monolith into 19 focused files loaded via <link> and <script defer> tags. No build step, no bundler: plain static files served by the existing StaticFiles mount.
CSS (6 files): base, cards, tables, forms, neo, print
JS (13 files): state, theme, status, runs, export-ui, renderers, plotly, tables, forms, responses, cards, websocket, init
- Remove IIFE wrapper; each JS file uses 'use strict' at top level
- Integrate selectRun hash update directly (eliminate monkey-patch)
- Use addEventListener for Plotly theme re-color (eliminate monkey-patch)
- Relocate agentStatusEl and date utilities to state.js
The organizing unit is a study (a research question investigated over one or more conversations), not a run. Storage moves from m4_data/vitrine/ to .vitrine/ at the project root since it's research output, not dataset infrastructure.
- RunManager → StudyManager, run_manager.py → study_manager.py
- run_id → study across types, renderer, artifacts, server, export, CLI
- API routes /api/runs → /api/studies
- Frontend: runs.js → studies.js, all DOM IDs and CSS classes updated
- Storage: runs.json → studies.json, runs/ → studies/
- Auto-migration moves old storage layout on first access
- All tests updated (389 pass)
Studies can now register an output directory for research artifacts (CSVs, scripts, images, etc.). Files are browsable in the live UI, included in JSON/HTML exports, and served via new API endpoints with preview support for tabular, text, and image files.
PID file (.vitrine/.server.json) is now the sole authority for finding a running server. Port scanning across 7741-7750 risked connecting to a different project's server when multiple projects run concurrently.
Add __all__ with all 29 public names and promote DisplayResponse, DisplayEvent to top-level imports so agents can use a single `from m4.vitrine import *` instead of per-module imports.
…vitrine-api
Drop the m4- prefix for clearer, more descriptive skill names. /research and /vitrine still work as shorthand triggers via fuzzy matching.
addCard() always called renderForm() for decision cards, even when they already had response_action and response_values from a prior decision. The frozen summary (renderFrozenForm) was only applied as a one-time DOM manipulation at confirm time in sendResponse(), so reloading the page re-rendered the interactive form instead. Now addCard() and updateCard() check for existing response data and render the frozen summary directly. Also adds the missing case 'decision' to updateCard's body re-render switch.
The plotly.js file had a stray backslash escape (\!) on line 135, which is invalid JavaScript syntax. This caused the entire file to fail parsing, leaving renderPlotly undefined. Any plotly card silently failed to render because the ReferenceError was swallowed by the WebSocket message handler.
Also update research session skill to stop saving redundant .html files for charts (vitrine cards already store the interactive Plotly spec).
Switch the ToC type badge from ! (red) to ✓ (green) when a decision card has been answered, matching the icon transition on the card itself. Also fix detection of already-responded decisions loaded from disk.
- Figures now save to output_dir/plots/ as .png (no .html)
- Every vitrine plot card must include a description= parameter
marked.js was lazy-loaded with a race condition where only the first markdown card's callback fired — all subsequent cards stayed stuck with basicMarkdown() which has no GFM table support. Load marked.js eagerly in index.html and add proper table styles to .markdown-body.
Replaces ad-hoc file artifact instructions with a structured script-first workflow (write → run → show) and explicit output directory layout (scripts/, data/, plots/). Scripts are self-contained and independently runnable, making research sessions fully reproducible.
Re-invoke updateStudyMetadataBar after studies fetch completes so metadata shows correctly when the page loads in live mode. Exclude dismissed cards from the table of contents scroll list.
set_status() was ephemeral and not worth the API surface. Replace the status-bar "waiting for response" indicator with an audio chime that plays when a decision card arrives, providing a clearer notification even when the browser tab is in the background.
Dispatch: spawn headless agents for reproduce/report tasks on studies. Reproduce runs execute in a sandboxed copy of the output directory to protect original files. Agent output streams into a vitrine card via stream-json parsing with debounced updates.
Action palette: replaces the export dropdown with a ⌘K command palette that surfaces export, dispatch, and print actions. Keyboard navigable.
Plotly: fix chart overflow by measuring card-body width explicitly and using ResizeObserver per chart instead of relying on autosize.
Also adds export-report and reproduce-study skills.
- Fix response.data() using wrong store in multi-study sessions by prioritizing resolved sel_store over study_manager fallback
- ask() now returns typed text when researcher writes a free-text answer instead of clicking a button; placeholder updated in UI
- Add progress() context manager for long computations: shows auto-completing/failing status cards via MARKDOWN + replace=
- Update clinical-research-session skill to use fig.write_json() for reloadable Plotly plots instead of .png/.html
Adds a `vitrine` entry point independent of the `m4` CLI, with start, stop, restart, and status subcommands.
Refactor agent dispatch from immediate-start to create-then-run:
create_agent_card() renders a config form (model, budget, instructions),
run_agent() starts the process after researcher review. New REST endpoints
(/api/studies/{study}/agents, /api/agents/{card_id}/run, /api/agents/{card_id}).
Frontend renders three agent states (config form, running terminal with
live timer and auto-scroll, compact completed view). Adds concurrency
limit, orphan reconciliation on server restart, cancel with output
preservation, and incremental terminal updates without jitter.
- Track token usage (input/output) and cost across agent stream events
- Display usage badge (tokens, context %, cost) in agent card headers
- Add alive indicator with pulsing dot and rotating thinking messages
- Show inactivity warning after 2 min without new output
- Add dispatch watchdog to detect dead agent PIDs missed by stream monitor
- Move _is_pid_alive to dispatch module for shared access
- Add tests for confirm(), wait_for(), ask() timeout, export wrappers, agent REST endpoints, CLI, and dispatch system
- Soft-delete cards: delete button, server-side persistence, undo snackbar with 5s timeout, slide-out/in animations for delete and restore
- TOC trash section: collapsed list of deleted cards with restore action
- Dismiss animations: fade out/in when hiding/showing cards
- Exclude deleted cards from exports, study context, and card counts
- Agent card: sync header and terminal pulsing dots via animation-delay
- Agent card: move tokens/ctx usage info into terminal alive strip
- Remove set_status() references from docs
- Escape file paths in DuckDB SQL queries to prevent injection via single quotes
- Escape ILIKE wildcards (%, _) in search to prevent unintended pattern matching
- Fix subprocess stderr deadlock by merging stderr into stdout
- Guard against None auth token in remote server retry
- Validate offset/limit query params (non-negative, bounded)
- Lock _event_callbacks and _selections for thread-safe access
- Log swallowed exceptions in event polling instead of silent discard
- Reject empty/whitespace-only annotations
- Fix form export to use actual response values instead of field defaults
- Fix ask() to return empty string message instead of falling through to action
- Fix _poll_remote_response to return "error" for connection failures (not "timeout")
- Use context manager for file lock to prevent descriptor leaks
- Strip javascript: URIs in SVG sanitizer
- Fix liveMode race: server sends replay_done sentinel, frontend waits for it
- Add jitter to WebSocket reconnect backoff
- Remove dead code: pinned field, clear(), _current_study, _selection_cooldowns, executeAction(), empty DOMContentLoaded handler, trivial constant tests
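The first two items above (quoting file paths and escaping ILIKE wildcards) can be sketched in plain Python. The helper names `quote_sql_string` and `escape_like`, and the choice of backslash as the escape character, are assumptions for illustration; the commit does not show the actual helpers.

```python
def quote_sql_string(value: str) -> str:
    """Render value as a single-quoted SQL literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def escape_like(term: str, escape_char: str = "\\") -> str:
    """Escape %, _ and the escape character so term matches literally."""
    # Escape the escape character first so later replacements are not doubled.
    out = term.replace(escape_char, escape_char * 2)
    return out.replace("%", escape_char + "%").replace("_", escape_char + "_")


# Building a query: the path becomes a safely quoted literal, and the search
# term is bound as a parameter after wildcard escaping.
path = "/data/o'brien/table.parquet"
sql = (
    f"SELECT * FROM read_parquet({quote_sql_string(path)}) "
    "WHERE note ILIKE ? ESCAPE '\\'"
)
params = ["%" + escape_like("50%_off") + "%"]
```

The escaped term only takes effect if the query declares the same escape character in an `ESCAPE` clause, as in the `sql` string here; the column name `note` is a placeholder.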
…rage
- Implement redact_dataframe() and enforce_row_limit() in redaction.py
- Extract shared utilities into _utils.py (PID check, path escaping, health check, constants)
- Lock module-level state in on_event, _poll_remote_events, list_annotations, stop
- Atomic meta.json writes via tempfile + os.replace in study_manager.py
- Deduplicate annotation card-lookup in server.py (_get_card_annotations helper)
- Fix _sanitize_search regex: SQL comment syntax (--) now correctly rejected
- Add tests for _sanitize_search, _update_agent_card, run_agent, _stream_monitor
- Replace redaction test stubs with real redact_dataframe/enforce_row_limit tests
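The "atomic meta.json writes via tempfile + os.replace" item above refers to a standard pattern: write to a temporary file in the same directory, then rename it over the target. A minimal sketch, with the hypothetical helper name `write_json_atomic`:

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write payload to path so readers never see a partially written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must be on the same filesystem for os.replace to be atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic swap of old and new contents
    except BaseException:
        try:
            os.unlink(tmp_name)  # do not leave a stray temp file behind
        except FileNotFoundError:
            pass
        raise
```

If the process dies mid-write, the previous `meta.json` stays intact and only an orphaned `.tmp` file may remain.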
The backend changes (dispatch config, paper workspace functions, server whitelist, and dispatch tests) were included in the previous commit. This adds the remaining pieces:
- Action palette entry for "Draft Paper" in actions.js
- draft-paper SKILL.md with IMRAD structure, auto-generated Methods from decision trail, and supplementary appendices
- Server endpoint test for paper task creation
- Skills index updated with draft-paper entry
Expand the rotating message list with 13 domain-specific words (Annotating, Auditing, Bootstrapping, Calibrating, Charting, Correlating, Curating, Hypothesizing, Incubating, Pipetting, Stratifying, Titrating, Triaging) and slow rotation to ~30s.
When a researcher selects an option in a decision form, the description for that option is now resolved and displayed in frozen form views (browser + HTML export), study context decisions, and via a new values_detailed property on DisplayResponse.
HTML/HTM files in study output directories now open in a sandboxed iframe instead of displaying raw source code.
The claude CLI refuses to start when it detects the CLAUDECODE environment variable from a parent session. Strip it from the subprocess environment. Also terminate running agents when their card is deleted.
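Stripping a variable from a child's environment, as described above, can be done by passing a filtered copy of `os.environ` to `subprocess`. The sketch below uses a Python child process as a stand-in for the real CLI; the helper name `child_env` is hypothetical.

```python
import os
import subprocess
import sys
from typing import Dict, Iterable


def child_env(strip: Iterable[str] = ("CLAUDECODE",)) -> Dict[str, str]:
    """Return a copy of the current environment without the named variables."""
    blocked = set(strip)
    return {k: v for k, v in os.environ.items() if k not in blocked}


# Stand-in child process that reports whether it can see the variable.
probe = "import os; print(os.environ.get('CLAUDECODE'))"
result = subprocess.run(
    [sys.executable, "-c", probe],
    env=child_env(),
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,  # merged streams, so one pipe cannot fill and block
    text=True,
    check=True,
)
```

Building a new mapping leaves the parent's own environment untouched, so the enclosing session keeps the variable while the child starts without it.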
Vitrine (display system) is being extracted into its own package. Removes the full vitrine module, its tests, CLI commands, and dependent skills (draft-paper, export-report, reproduce-study, vitrine-api). Updates package exports and dependencies accordingly.
Strip the agent dispatch configuration from m4.__init__ now that vitrine handles its own dispatch internally. Bump vitrine to latest.
setup-uv@v5 already creates the virtual environment when python-version is specified, causing the explicit uv venv step to fail.
357513c to e5e3980
Summary
- `show()` at the m4 package level for convenient access
- `clinical-research-session` skill (replaces `m4-research`) with structured research workflow and provenance tracking
- `m4-api` skill with provenance metadata
- `[research]` optional extras (scikit-learn, lifelines, statsmodels)

(vitrine was developed on this branch and then moved out into its own repository, which is why there are so many commits :))
Test plan
- `from m4 import show` works in a fresh environment