Add vitrine integration and research session skills by hannesill · Pull Request #30 · hannesill/m4

hannesill · 2026-02-14T02:46:46Z

Summary

Integrate vitrine as an external dependency for live research display (tables, charts, decision cards in browser)
Re-export show() at the m4 package level for convenient access
Add clinical-research-session skill (replaces m4-research) with structured research workflow and provenance tracking
Add m4-api skill with provenance metadata
Add [research] optional extras (scikit-learn, lifelines, statsmodels)
Remove stale benchmark conversation files

(vitrine was developed on this branch and then moved out into its own repository, which is why there are so many commits :))

Test plan

All 700 tests pass
Ruff linting passes
Verify `from m4 import show` works in a fresh environment

…rer, redaction stub

Build the data layer that converts Python objects into storable card descriptors and artifacts. Fully unit-testable without a server, browser, or network.

- CardType enum, CardDescriptor/CardProvenance/DisplayEvent dataclasses
- ArtifactStore with Parquet/JSON/image storage and DuckDB-powered table paging
- Renderer dispatching DataFrame, str, dict, and fallback types
- Redactor stub with full signatures and env var config (pass-through for now)
- 92 tests covering all components

Starlette + WebSocket + REST server with artifact-backed card rendering. Python API (show/start/stop/clear/section) auto-starts on first show(). CLI `m4 display` subcommand with port and mode options.

Transform the display server from ephemeral to persistent. show() discovers an existing server via PID file and pushes cards over HTTP.

- Add /api/health, /api/command, /api/shutdown endpoints to server
- Add Bearer token auth for mutating endpoints
- Add PID file management (.server.json) with stale detection
- Add _run_standalone() entry point with signal handling
- Add client-side discovery: _discover_server(), _health_check()
- show/clear/section push via HTTP when remote server is found
- Add stop_server() and server_status() public API
- CLI: replace --mode with --stop/--status flags
- Session directory cleaned up on stop_server()
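The stale-detection step for the PID file can be sketched as follows. This is a minimal illustration, not vitrine's implementation: the helper names `is_pid_alive` and `read_server_info` and the `pid`/`port` keys are assumptions, and the liveness probe relies on POSIX `os.kill(pid, 0)` semantics.

```python
import json
import os
from pathlib import Path
from typing import Optional


def is_pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX semantics)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_server_info(pid_file: Path) -> Optional[dict]:
    """Return the PID-file contents, or None if the file is missing or stale.

    A stale file (its recorded process is gone) is deleted as a side effect.
    The file layout ({"pid": ..., "port": ...}) is an assumed example.
    """
    try:
        info = json.loads(pid_file.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if not is_pid_alive(int(info.get("pid", -1))):
        pid_file.unlink(missing_ok=True)
        return None
    return info
```

A client would call `read_server_info()` before deciding whether to push cards over HTTP or start a new server.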

… filter, sort

- Dark/light theme toggle persisted in localStorage
- Card collapse/expand, pin toggle, and copy-to-clipboard buttons
- Run filter dropdown in header for filtering cards by run_id
- Column header sort delegating to server-side REST endpoint
- Improved WebSocket reconnect with exponential backoff and dedup
- Provenance line now includes dataset alongside source and timestamp
- Card count in footer shows visible vs total when filtering

…ering

Renderer:
- Plotly Figure → JSON artifact with full spec inlined in preview
- matplotlib Figure → sanitized SVG artifact with base64 preview
- Type detection without hard imports (checks module + class name)
- SVG sanitization: strip <script> tags, event attributes, 2MB cap
- Title inference from Plotly layout and matplotlib suptitle

Frontend:
- renderPlotly() with lazy Plotly.js loading on first chart card
- renderImage() for matplotlib SVG/PNG display
- Plotly point click/selection fires events over WebSocket
- Responsive chart resize on window resize
- Theme-aware Plotly colors (updates on dark/light toggle)

Infrastructure:
- Vendor Plotly.js v2.35.2 for offline-first operation
- Artifact endpoint serves SVG/PNG with correct Content-Types
- 29 new tests covering both renderers and SVG sanitization
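The SVG sanitization rules listed above (strip script tags, strip event attributes, enforce a size cap) could look roughly like this. It is a regex-based sketch under stated assumptions: the function name `sanitize_svg` is hypothetical, and raising on oversize input is a guess, since the commit does not say whether the real code rejects or truncates. Regex filtering is not a complete defence against hostile SVG.

```python
import re

MAX_SVG_BYTES = 2 * 1024 * 1024  # the 2MB cap mentioned in the commit message

# <script>...</script> blocks and self-closing <script/> elements.
_SCRIPT_RE = re.compile(
    r"<script\b.*?</script\s*>|<script\b[^>]*/>",
    re.IGNORECASE | re.DOTALL,
)
# Inline event handlers such as onload="..." or onclick='...'.
_EVENT_ATTR_RE = re.compile(
    r"""\s+on[a-z]+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def sanitize_svg(svg: str) -> str:
    """Return the SVG with script elements and on* attributes removed."""
    if len(svg.encode("utf-8")) > MAX_SVG_BYTES:
        # Assumption: oversize input is rejected rather than truncated.
        raise ValueError("SVG exceeds the size cap")
    svg = _SCRIPT_RE.sub("", svg)
    svg = _EVENT_ATTR_RE.sub("", svg)
    return svg
```

For example, `sanitize_svg('<svg onload="x()"><rect width="1"/></svg>')` yields `<svg><rect width="1"/></svg>`.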

…e search/export

- Cap table cards at ~10 visible rows with vertical scroll (max-height 360px)
- Keep header row sticky during vertical scroll with proper z-index
- Add server-side search, sort, pagination, and CSV export endpoints
- Add row click detail panel and Copy TSV support

Add blocking show(wait=True) that returns DisplayResponse with user action and optional selected data. Implement event system with on_event() callbacks, request queue (pending_requests/acknowledge), and send-to-agent flow from the browser. Server gains new endpoints for requests, events, and long-poll response resolution. Frontend adds response UI panel, row selection checkboxes, send-to-agent popover, in-place card updates with flash animation, and request badge.

Replace session-directory storage with a RunManager that persists runs across server restarts. Each run gets its own ArtifactStore under display/runs/, enabling cross-run queries, run listing, deletion, and age-based cleanup. Remove the clear functionality in favor of run lifecycle management.

- Add RunManager with auto-run creation, card index, request queue
- Add CLI commands: --list, --clean for run management
- Add REST endpoints: GET /api/runs, DELETE /api/runs/{id}
- Update server and Python API to resolve stores via RunManager
- Preserve run data on server stop (only PID file cleaned up)
- Update frontend to load runs from API and remove clear button

…port scan

_find_port() auto-incremented through 7741-7750 with no cross-process guard, so concurrent agents spawned duplicate servers.

Fix:
- Add fcntl.flock on .server.lock in both _run_standalone() (subprocess entry point) and _ensure_started() (client-side discovery)
- Add port-range health scan fallback so servers without PID files are still detected before spawning a new one
- _run_standalone() exits early if another server exists (PID file or port scan), preventing the subprocess itself from becoming an orphan
- PID file is written inside the lock, eliminating the window where no PID file exists but a server is starting
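The locking pattern described in this fix, where the existence check and the PID-file write both happen while holding an exclusive `fcntl.flock`, can be sketched as below. The helper names `server_lock` and `claim_server_slot` and the PID-file contents are illustrative assumptions. Stale-file handling and the port scan are omitted, and `fcntl` is only available on Unix-like systems.

```python
import fcntl
import json
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def server_lock(lock_path: Path):
    """Hold an exclusive advisory lock on lock_path for the with-block."""
    with open(lock_path, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def claim_server_slot(state_dir: Path, port: int) -> bool:
    """Write the PID file under the lock; return False if one already exists.

    Because the check and the write share one critical section, two
    processes cannot both conclude that no server is running.
    """
    pid_file = state_dir / ".server.json"
    with server_lock(state_dir / ".server.lock"):
        if pid_file.exists():
            return False
        pid_file.write_text(json.dumps({"pid": os.getpid(), "port": port}))
        return True
```

Only the process for which `claim_server_slot()` returns True would go on to bind the port; the others would connect to the existing server instead.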

Add date-grouped run history dropdown with inline delete, metadata bar, auto-select of active run, and run separators in the all-runs view. Remove auth requirement from run delete endpoint (localhost-only with browser confirmation dialog).

Replace browser-native confirm() with a styled in-page confirmation modal for run deletion. Add click-to-edit on the run title in the metadata bar with real-time name validation and a PATCH /api/runs/{id}/rename backend endpoint.

marked.js and plotly.js were loaded after the card HTML, so inline scripts that convert markdown to HTML would find `typeof marked` undefined and fall through to plain escaped text.

…h workflow Add m4-display skill with full API reference (show, section, interaction patterns, run management, export). Update m4-api and m4-research skills with display cross-references and save-then-show patterns. Add Display Output section to CLAUDE.md establishing behavioral defaults. Add PROVENANCE.yaml for all three skills.

…er UI

Implement HTML and JSON export for display runs:

- export() in __init__.py replaces NotImplementedError stub with working implementation supporting run_id filtering and format selection
- Server adds GET /api/export and /api/runs/{id}/export endpoints
- CLI adds --export, --format, and --run flags to `m4 display`
- Browser UI adds export dropdown (HTML, JSON, Print) in header and per-run export buttons in the run dropdown
- Print-optimized @media styles for clean browser printing

10 field primitives (Dropdown, MultiSelect, Slider, RangeSlider, Checkbox, Toggle, RadioGroup, TextInput, DateRange, NumberInput) grouped via Form class. Supports blocking show(Form([...]), wait=True) returning a DisplayResponse with .values dict, hybrid data+controls cards via controls= parameter on show(), freeze rendering for confirmed forms, and self-contained HTML export.

Full rebrand of the display system:

- Package path: m4.display → m4.vitrine
- CLI command: m4 display → m4 vitrine
- Skill: m4-display → m4-vitrine
- Env vars: M4_DISPLAY_* → M4_VITRINE_*
- WebSocket messages: display.event → vitrine.event
- On-disk data dir: {m4_data}/display/ → {m4_data}/vitrine/
- All user-visible strings, docs, and test assertions updated

Apply the UI.md visual refactor to the live vitrine UI and exported HTML output.

- Introduce neobrutalist design tokens, typography, borders, and hard-shadow interactions
- Add typed card headers with per-type color backgrounds and icon badges
- Update card rendering logic for decision-state header typing and icon mapping
- Align export HTML renderer and inline CSS with the new visual system
- Add font loading links for Space Grotesk, Inter, and DM Mono

- Fix Plotly figure serialization to handle numpy arrays in customdata fields
- Replace native HTML date inputs with custom calendar picker component
- Replace native select elements with custom dropdown component
- Add comprehensive styling for custom form controls matching neobrutalist design
- Add test coverage for Plotly Express customdata serialization

…verage

- Add warning logs to _push_remote, _poll_remote_response, get_selection for failure scenarios; differentiate HTTPError vs URLError
- Apply snapshot-under-lock pattern to all functions reading module globals
- Add __post_init__ validation to all form field types (range, options, defaults) and Form (field name uniqueness)
- Add DisplayResponse.CONFIRM/SKIP/TIMEOUT/ERROR constants
- Extend DisplayHandle with run_id parameter
- Add Plotly spec size cap (5MB) with data array truncation
- Add selection persistence (debounced JSON save/load on server restart)
- Enhance health endpoint with uptime, version, run_count
- Frontend: SECTION badge, Plotly resize debounce, error toasts, blur race fix, aria-labels, select-all banner, constants extraction
- Add test_forms.py (50 tests) covering serialization, rendering, blocking flow, controls, export, WebSocket, validation
- Add error logging tests, selection persistence tests, export endpoint tests, concurrent blocking tests to existing test files
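The `__post_init__` validation mentioned above (range and default checks on fields, name uniqueness on the form) might be structured like this. `SketchSlider` and `SketchForm` are hypothetical stand-ins written for illustration; they are not vitrine's `Slider` or `Form`, whose actual signatures are not shown in this pull request.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchSlider:
    """Hypothetical numeric field, used only to illustrate the validation."""

    name: str
    min: float
    max: float
    default: Optional[float] = None

    def __post_init__(self) -> None:
        # Range check: an empty or inverted range is rejected at construction.
        if self.min >= self.max:
            raise ValueError(f"{self.name}: min must be < max")
        # Default check: fall back to min, otherwise require it to be in range.
        if self.default is None:
            self.default = self.min
        elif not self.min <= self.default <= self.max:
            raise ValueError(f"{self.name}: default is outside the range")


@dataclass
class SketchForm:
    """Hypothetical field group that enforces unique field names."""

    fields: List[SketchSlider]

    def __post_init__(self) -> None:
        seen = set()
        for form_field in self.fields:
            if form_field.name in seen:
                raise ValueError(f"duplicate field name: {form_field.name!r}")
            seen.add(form_field.name)
```

Validating in `__post_init__` means a malformed form fails when the agent builds it, before anything is sent to the browser.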

Reframes the skill description around vitrine's role as the agent's research journal — documenting decisions, rationale, and findings as persistent cards. Adds comprehensive examples for forms, quick actions, hybrid controls, progressive updates, provenance tracking, and the full research session pattern.

Break the 5,451-line monolith into 19 focused files loaded via <link> and <script defer> tags. No build step, no bundler: plain static files served by the existing StaticFiles mount.

CSS (6 files): base, cards, tables, forms, neo, print
JS (13 files): state, theme, status, runs, export-ui, renderers, plotly, tables, forms, responses, cards, websocket, init

- Remove IIFE wrapper; each JS file uses 'use strict' at top level
- Integrate selectRun hash update directly (eliminate monkey-patch)
- Use addEventListener for Plotly theme re-color (eliminate monkey-patch)
- Relocate agentStatusEl and date utilities to state.js

The organizing unit is a study (a research question investigated over one or more conversations), not a run. Storage moves from m4_data/vitrine/ to .vitrine/ at the project root since it's research output, not dataset infrastructure.

- RunManager → StudyManager, run_manager.py → study_manager.py
- run_id → study across types, renderer, artifacts, server, export, CLI
- API routes /api/runs → /api/studies
- Frontend: runs.js → studies.js, all DOM IDs and CSS classes updated
- Storage: runs.json → studies.json, runs/ → studies/
- Auto-migration moves old storage layout on first access
- All tests updated (389 pass)

Studies can now register an output directory for research artifacts (CSVs, scripts, images, etc.). Files are browsable in the live UI, included in JSON/HTML exports, and served via new API endpoints with preview support for tabular, text, and image files.

PID file (.vitrine/.server.json) is now the sole authority for finding a running server. Port scanning across 7741-7750 risked connecting to a different project's server when multiple projects run concurrently.

Add __all__ with all 29 public names and promote DisplayResponse, DisplayEvent to top-level imports so agents can use a single `from m4.vitrine import *` instead of per-module imports.

…vitrine-api Drop the m4- prefix for clearer, more descriptive skill names. /research and /vitrine still work as shorthand triggers via fuzzy matching.

addCard() always called renderForm() for decision cards, even when they already had response_action and response_values from a prior decision. The frozen summary (renderFrozenForm) was only applied as a one-time DOM manipulation at confirm time in sendResponse(), so reloading the page re-rendered the interactive form instead. Now addCard() and updateCard() check for existing response data and render the frozen summary directly. Also adds the missing case 'decision' to updateCard's body re-render switch.

The plotly.js file contained a backslash-escaped exclamation mark (\!) on line 135, which is invalid JavaScript syntax. This caused the entire file to fail parsing, leaving renderPlotly undefined. Any plotly card silently failed to render because the ReferenceError was swallowed by the WebSocket message handler. Also update research session skill to stop saving redundant .html files for charts (vitrine cards already store the interactive Plotly spec).

Switch the ToC type badge from ! (red) to ✓ (green) when a decision card has been answered, matching the icon transition on the card itself. Also fix detection of already-responded decisions loaded from disk.

- Figures now save to output_dir/plots/ as .png (no .html)
- Every vitrine plot card must include a description= parameter

marked.js was lazy-loaded with a race condition where only the first markdown card's callback fired — all subsequent cards stayed stuck with basicMarkdown() which has no GFM table support. Load marked.js eagerly in index.html and add proper table styles to .markdown-body.

Replaces ad-hoc file artifact instructions with a structured script-first workflow (write → run → show) and explicit output directory layout (scripts/, data/, plots/). Scripts are self-contained and independently runnable, making research sessions fully reproducible.

Re-invoke updateStudyMetadataBar after studies fetch completes so metadata shows correctly when the page loads in live mode. Exclude dismissed cards from the table of contents scroll list.

set_status() was ephemeral and not worth the API surface. Replace the status-bar "waiting for response" indicator with an audio chime that plays when a decision card arrives, providing a clearer notification even when the browser tab is in the background.

Dispatch: spawn headless agents for reproduce/report tasks on studies. Reproduce runs execute in a sandboxed copy of the output directory to protect original files. Agent output streams into a vitrine card via stream-json parsing with debounced updates.

Action palette: replaces the export dropdown with a ⌘K command palette that surfaces export, dispatch, and print actions. Keyboard navigable.

Plotly: fix chart overflow by measuring card-body width explicitly and using ResizeObserver per chart instead of relying on autosize.

Also adds export-report and reproduce-study skills.

- Fix response.data() using wrong store in multi-study sessions by prioritizing resolved sel_store over study_manager fallback
- ask() now returns typed text when researcher writes a free-text answer instead of clicking a button; placeholder updated in UI
- Add progress() context manager for long computations; shows auto-completing/failing status cards via MARKDOWN + replace=
- Update clinical-research-session skill to use fig.write_json() for reloadable Plotly plots instead of .png/.html

Adds a `vitrine` entry point independent of the `m4` CLI, with start, stop, restart, and status subcommands.

Refactor agent dispatch from immediate-start to create-then-run: create_agent_card() renders a config form (model, budget, instructions), run_agent() starts the process after researcher review. New REST endpoints (/api/studies/{study}/agents, /api/agents/{card_id}/run, /api/agents/{card_id}). Frontend renders three agent states (config form, running terminal with live timer and auto-scroll, compact completed view). Adds concurrency limit, orphan reconciliation on server restart, cancel with output preservation, and incremental terminal updates without jitter.

- Track token usage (input/output) and cost across agent stream events
- Display usage badge (tokens, context %, cost) in agent card headers
- Add alive indicator with pulsing dot and rotating thinking messages
- Show inactivity warning after 2 min without new output
- Add dispatch watchdog to detect dead agent PIDs missed by stream monitor
- Move _is_pid_alive to dispatch module for shared access
- Add tests for confirm(), wait_for(), ask() timeout, export wrappers, agent REST endpoints, CLI, and dispatch system

- Soft-delete cards: delete button, server-side persistence, undo snackbar with 5s timeout, slide-out/in animations for delete and restore
- TOC trash section: collapsed list of deleted cards with restore action
- Dismiss animations: fade out/in when hiding/showing cards
- Exclude deleted cards from exports, study context, and card counts
- Agent card: sync header and terminal pulsing dots via animation-delay
- Agent card: move tokens/ctx usage info into terminal alive strip
- Remove set_status() references from docs

- Escape file paths in DuckDB SQL queries to prevent injection via single quotes
- Escape ILIKE wildcards (%, _) in search to prevent unintended pattern matching
- Fix subprocess stderr deadlock by merging stderr into stdout
- Guard against None auth token in remote server retry
- Validate offset/limit query params (non-negative, bounded)
- Lock _event_callbacks and _selections for thread-safe access
- Log swallowed exceptions in event polling instead of silent discard
- Reject empty/whitespace-only annotations
- Fix form export to use actual response values instead of field defaults
- Fix ask() to return empty string message instead of falling through to action
- Fix _poll_remote_response to return "error" for connection failures (not "timeout")
- Use context manager for file lock to prevent descriptor leaks
- Strip javascript: URIs in SVG sanitizer
- Fix liveMode race: server sends replay_done sentinel, frontend waits for it
- Add jitter to WebSocket reconnect backoff
- Remove dead code: pinned field, clear(), _current_study, _selection_cooldowns, executeAction(), empty DOMContentLoaded handler, trivial constant tests
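The first two items in this list (single-quote escaping for paths, wildcard escaping for ILIKE search) can be illustrated with plain string helpers. The function names below are hypothetical, the column name is assumed to come from trusted code, and the backslash escape character is a choice made for this sketch. Where the query API allows it, bound parameters are generally preferable to building literals by hand.

```python
def escape_sql_literal(value: str) -> str:
    """Double single quotes so the value is safe inside a '...' SQL literal."""
    return value.replace("'", "''")


def escape_like(term: str, escape_char: str = "\\") -> str:
    """Escape LIKE/ILIKE wildcards so the term matches literally.

    The escape character itself is handled first so that the escapes added
    for % and _ are not escaped a second time.
    """
    for ch in (escape_char, "%", "_"):
        term = term.replace(ch, escape_char + ch)
    return term


def ilike_clause(column: str, term: str) -> str:
    """Build a substring-match clause for a trusted column name."""
    pattern = escape_sql_literal("%" + escape_like(term) + "%")
    return f"{column} ILIKE '{pattern}' ESCAPE '\\'"
```

With these helpers a search for `50%` matches only rows containing the literal text `50%`, instead of every row containing `50`.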

…rage

- Implement redact_dataframe() and enforce_row_limit() in redaction.py
- Extract shared utilities into _utils.py (PID check, path escaping, health check, constants)
- Lock module-level state in on_event, _poll_remote_events, list_annotations, stop
- Atomic meta.json writes via tempfile + os.replace in study_manager.py
- Deduplicate annotation card-lookup in server.py (_get_card_annotations helper)
- Fix _sanitize_search regex: SQL comment syntax (--) now correctly rejected
- Add tests for _sanitize_search, _update_agent_card, run_agent, _stream_monitor
- Replace redaction test stubs with real redact_dataframe/enforce_row_limit tests
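The atomic-write pattern named above (temporary file plus `os.replace`) is sketched below. The function name `write_json_atomic` is an assumption made for illustration; only the tempfile-then-replace idea comes from the commit message.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write payload to path so readers never observe a partial file.

    The data goes to a temporary file in the same directory (so the final
    rename stays on one filesystem) and is then swapped in with os.replace.
    """
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(payload, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temporary file if anything failed before the swap.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If the process dies mid-write, the previous `meta.json` stays intact and only a leftover `.tmp` file remains.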

The backend changes (dispatch config, paper workspace functions, server whitelist, and dispatch tests) were included in the previous commit. This adds the remaining pieces:

- Action palette entry for "Draft Paper" in actions.js
- draft-paper SKILL.md with IMRAD structure, auto-generated Methods from decision trail, and supplementary appendices
- Server endpoint test for paper task creation
- Skills index updated with draft-paper entry

Expand the rotating message list with 13 domain-specific words (Annotating, Auditing, Bootstrapping, Calibrating, Charting, Correlating, Curating, Hypothesizing, Incubating, Pipetting, Stratifying, Titrating, Triaging) and slow rotation to ~30s.

When a researcher selects an option in a decision form, the description for that option is now resolved and displayed in frozen form views (browser + HTML export), study context decisions, and via a new values_detailed property on DisplayResponse.

HTML/HTM files in study output directories now open in a sandboxed iframe instead of displaying raw source code.

The claude CLI refuses to start when it detects the CLAUDECODE environment variable from a parent session. Strip it from the subprocess environment. Also terminate running agents when their card is deleted.
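Removing a variable from a child's environment can be done by passing a filtered copy of `os.environ` to `subprocess`. The sketch below assumes a hypothetical helper `child_env`; how vitrine actually launches the agent process is not shown in this pull request.

```python
import os
import subprocess
import sys
from typing import Dict, Iterable


def child_env(strip: Iterable[str] = ("CLAUDECODE",)) -> Dict[str, str]:
    """Return a copy of the current environment without the given variables."""
    blocked = set(strip)
    return {key: value for key, value in os.environ.items() if key not in blocked}


def run_without_marker(argv):
    """Run argv with the filtered environment and capture its text output."""
    return subprocess.run(
        argv,
        env=child_env(),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merged, so an unread stderr pipe cannot fill up
        text=True,
        check=False,
    )
```

The parent's own `os.environ` is left untouched, so the filtering only affects the spawned process.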

Vitrine (display system) is being extracted into its own package. Removes the full vitrine module, its tests, CLI commands, and dependent skills (draft-paper, export-report, reproduce-study, vitrine-api). Updates package exports and dependencies accordingly.

Strip the agent dispatch configuration from m4.__init__ now that vitrine handles its own dispatch internally. Bump vitrine to latest.

setup-uv@v5 already creates the virtual environment when python-version is specified, causing the explicit uv venv step to fail.

hannesill force-pushed the m4-display branch from 7e50350 to 8bce591 on February 14, 2026 02:58

hannesill added 29 commits February 13, 2026 22:05

feat(display): add Phase 1b server, Python API, and CLI command

7231954


fix(display): load vendored JS in <head> so markdown renders in exports

e22d669


Complete vitrine TODOs and align docs, context, and status flow

f11743b

Harden vitrine server lifecycle and status handling

38c8ff9

Polish vitrine file explorer UI and add syntax highlighting

8c8a04d

Restyle dark mode theme and toggle icon

3b2acd4

Remove port-scan fallback from vitrine server discovery

e36d5fa


Export all public vitrine API from __init__.py

bd02ff9


hannesill added 27 commits February 13, 2026 22:05

Rename skills: m4-research → clinical-research-session, m4-vitrine → …

a14cea2

…vitrine-api Drop the m4- prefix for clearer, more descriptive skill names. /research and /vitrine still work as shorthand triggers via fuzzy matching.

Update ToC badge to reflect decision card response state

2f47f9a


Update research skill: save plots to plots/ subdir, require descriptions

9587b59


Fix vitrine live mode metadata bar and ToC dismissed card handling

ffcf9fc


Add standalone vitrine CLI with restart command

eee79ca


Add inline card title rename and move usage to bottom bar

0c2b4c9

Render HTML files as web pages in file explorer preview

c1d48cf


Fix agent dispatch: strip CLAUDECODE env var and cancel on card delete

bb0038a


Remove vitrine dispatch config and update vitrine dependency

5091056


Fix CI: remove redundant uv venv step

ac717c7


Fix CI: remove redundant uv venv step from pre-commit workflow

e5e3980

hannesill force-pushed the m4-display branch from 357513c to e5e3980 on February 14, 2026 03:09

hannesill merged commit f1f526f into main Feb 14, 2026
6 checks passed

hannesill deleted the m4-display branch February 14, 2026 03:10

hannesill merged 87 commits into main from m4-display
