Add vitrine integration and research session skills#30

Merged
hannesill merged 87 commits into main from m4-display
Feb 14, 2026

@hannesill commented Feb 14, 2026

Summary

  • Integrate vitrine as an external dependency for live research display (tables, charts, decision cards in browser)
  • Re-export show() at the m4 package level for convenient access
  • Add clinical-research-session skill (replaces m4-research) with structured research workflow and provenance tracking
  • Add m4-api skill with provenance metadata
  • Add [research] optional extras (scikit-learn, lifelines, statsmodels)
  • Remove stale benchmark conversation files

(vitrine was developed on this branch and then moved out into its own repository, which is why there are so many commits :))

Test plan

  • All 700 tests pass
  • Ruff linting passes
  • Verify `from m4 import show` works in a fresh environment

…rer, redaction stub

Build the data layer that converts Python objects into storable card
descriptors and artifacts. Fully unit-testable without a server, browser,
or network.

- CardType enum, CardDescriptor/CardProvenance/DisplayEvent dataclasses
- ArtifactStore with Parquet/JSON/image storage and DuckDB-powered table paging
- Renderer dispatching DataFrame, str, dict, and fallback types
- Redactor stub with full signatures and env var config (pass-through for now)
- 92 tests covering all components

Starlette + WebSocket + REST server with artifact-backed card rendering.
Python API (show/start/stop/clear/section) auto-starts on first show().
CLI `m4 display` subcommand with port and mode options.

Transform the display server from ephemeral to persistent. show()
discovers an existing server via PID file and pushes cards over HTTP.

- Add /api/health, /api/command, /api/shutdown endpoints to server
- Add Bearer token auth for mutating endpoints
- Add PID file management (.server.json) with stale detection
- Add _run_standalone() entry point with signal handling
- Add client-side discovery: _discover_server(), _health_check()
- show/clear/section push via HTTP when remote server is found
- Add stop_server() and server_status() public API
- CLI: replace --mode with --stop/--status flags
- Session directory cleaned up on stop_server()
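
The PID-file discovery with stale detection listed above could look roughly like the following. This is a minimal sketch, not vitrine's actual code: `read_server_info` and `is_pid_alive` are hypothetical names, the `pid` and `port` keys are an assumed layout for `.server.json`, and the liveness check relies on POSIX `os.kill(pid, 0)` semantics.

```python
import json
import os
from pathlib import Path
from typing import Optional


def is_pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_server_info(pid_file: Path) -> Optional[dict]:
    """Return the PID-file contents, or None if missing, corrupt, or stale."""
    try:
        info = json.loads(pid_file.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    pid = info.get("pid") if isinstance(info, dict) else None
    if not isinstance(pid, int) or not is_pid_alive(pid):
        pid_file.unlink(missing_ok=True)  # stale entry: remove it
        return None
    return info
```

A client would call `read_server_info()` first and only spawn a new server when it returns `None`.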
… filter, sort

- Dark/light theme toggle persisted in localStorage
- Card collapse/expand, pin toggle, and copy-to-clipboard buttons
- Run filter dropdown in header for filtering cards by run_id
- Column header sort delegating to server-side REST endpoint
- Improved WebSocket reconnect with exponential backoff and dedup
- Provenance line now includes dataset alongside source and timestamp
- Card count in footer shows visible vs total when filtering
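
The exponential backoff used for WebSocket reconnects can be illustrated with a small delay calculation. The frontend itself is JavaScript; this sketch uses Python purely for illustration, and the function name, base delay, cap, and the choice of full jitter are assumptions rather than the values vitrine uses.

```python
import random


def reconnect_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before reconnect attempt `attempt` (0-based)."""
    ceiling = min(cap, base * 2 ** attempt)  # exponential growth, capped
    return random.uniform(0.0, ceiling)      # full jitter spreads out clients
```

Randomizing the delay keeps several browser tabs from hammering a restarting server at the same instant.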
…ering

Renderer:
- Plotly Figure → JSON artifact with full spec inlined in preview
- matplotlib Figure → sanitized SVG artifact with base64 preview
- Type detection without hard imports (checks module + class name)
- SVG sanitization: strip <script> tags, event attributes, 2MB cap
- Title inference from Plotly layout and matplotlib suptitle
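
Type detection without hard imports, as described in the list above, can be done by inspecting the module and class names on an object's MRO. The sketch below is an assumption about how that might look: `detect_kind` and its return labels are hypothetical, and only the top-level package name is compared so that neither Plotly nor matplotlib has to be installed.

```python
def detect_kind(obj) -> str:
    """Classify obj by module and class name, without importing its library."""
    if isinstance(obj, (str, dict)):
        return type(obj).__name__
    for cls in type(obj).__mro__:  # walking the MRO also catches subclasses
        root = (getattr(cls, "__module__", "") or "").split(".", 1)[0]
        if cls.__name__ == "Figure" and root == "plotly":
            return "plotly"
        if cls.__name__ == "Figure" and root == "matplotlib":
            return "matplotlib"
        if cls.__name__ == "DataFrame" and root == "pandas":
            return "dataframe"
    return "fallback"
```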

Frontend:
- renderPlotly() with lazy Plotly.js loading on first chart card
- renderImage() for matplotlib SVG/PNG display
- Plotly point click/selection fires events over WebSocket
- Responsive chart resize on window resize
- Theme-aware Plotly colors (updates on dark/light toggle)

Infrastructure:
- Vendor Plotly.js v2.35.2 for offline-first operation
- Artifact endpoint serves SVG/PNG with correct Content-Types
- 29 new tests covering both renderers and SVG sanitization
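
The SVG sanitization rules listed under Renderer (strip script tags and event attributes, enforce a 2MB cap) can be sketched as follows. This is a regex-based illustration only: `sanitize_svg` is a hypothetical name, raising on oversized input is an assumption about the cap's behavior, and a production sanitizer would more likely work on a parsed XML tree.

```python
import re

MAX_SVG_BYTES = 2 * 1024 * 1024  # the 2MB cap from the commit message

# <script>...</script> blocks and self-closing <script/> tags
_SCRIPT_RE = re.compile(
    r"<script\b.*?</script\s*>|<script\b[^>]*/>", re.IGNORECASE | re.DOTALL
)
# on* event attributes such as onload="..." or onclick='...'
_EVENT_ATTR_RE = re.compile(
    r"""\s+on[a-z]+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE
)


def sanitize_svg(svg: str) -> str:
    """Return svg with script tags and event-handler attributes removed."""
    if len(svg.encode("utf-8")) > MAX_SVG_BYTES:
        raise ValueError("SVG exceeds the 2MB cap")  # assumed behavior
    svg = _SCRIPT_RE.sub("", svg)
    return _EVENT_ATTR_RE.sub("", svg)
```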
…e search/export

- Cap table cards at ~10 visible rows with vertical scroll (max-height 360px)
- Keep header row sticky during vertical scroll with proper z-index
- Add server-side search, sort, pagination, and CSV export endpoints
- Add row click detail panel and Copy TSV support

Add blocking show(wait=True) that returns DisplayResponse with user
action and optional selected data. Implement event system with on_event()
callbacks, request queue (pending_requests/acknowledge), and
send-to-agent flow from the browser. Server gains new endpoints for
requests, events, and long-poll response resolution. Frontend adds
response UI panel, row selection checkboxes, send-to-agent popover,
in-place card updates with flash animation, and request badge.

Replace session-directory storage with a RunManager that persists runs
across server restarts. Each run gets its own ArtifactStore under
display/runs/, enabling cross-run queries, run listing, deletion, and
age-based cleanup. Remove the clear functionality in favor of run
lifecycle management.

- Add RunManager with auto-run creation, card index, request queue
- Add CLI commands: --list, --clean for run management
- Add REST endpoints: GET /api/runs, DELETE /api/runs/{id}
- Update server and Python API to resolve stores via RunManager
- Preserve run data on server stop (only PID file cleaned up)
- Update frontend to load runs from API and remove clear button
…port scan

_find_port() auto-incremented through 7741-7750 with no cross-process
guard, so concurrent agents spawned duplicate servers. Fix:

- Add fcntl.flock on .server.lock in both _run_standalone() (subprocess
  entry point) and _ensure_started() (client-side discovery)
- Add port-range health scan fallback so servers without PID files are
  still detected before spawning a new one
- _run_standalone() exits early if another server exists (PID file or
  port scan), preventing the subprocess itself from becoming an orphan
- PID file is written inside the lock, eliminating the window where no
  PID file exists but a server is starting
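
The locking fix above can be sketched as a check-then-spawn sequence performed entirely under an exclusive `fcntl.flock`. This is an illustration under assumptions: `startup_lock` and `ensure_server` are hypothetical names, and `discover` and `spawn` stand in for the real discovery and server-start logic (the latter is expected to write the PID file before returning). `fcntl` is POSIX-only.

```python
import fcntl
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def startup_lock(lock_path: Path):
    """Hold an exclusive advisory lock for the duration of the with-block."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other holders release
        yield fd
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)  # closing here avoids leaking the descriptor


def ensure_server(lock_path: Path, discover, spawn):
    """Return info for a running server, starting one only if none exists."""
    with startup_lock(lock_path):
        info = discover()      # re-check inside the lock
        if info is None:
            info = spawn()     # PID file is written while the lock is held
        return info
```

Because discovery and spawning happen inside the same critical section, two agents racing to start a server serialize on the lock and the second one finds the first one's PID file.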
Add date-grouped run history dropdown with inline delete, metadata bar,
auto-select of active run, and run separators in the all-runs view.
Remove auth requirement from run delete endpoint (localhost-only with
browser confirmation dialog).

Replace browser-native confirm() with a styled in-page confirmation
modal for run deletion. Add click-to-edit on the run title in the
metadata bar with real-time name validation and a
PATCH /api/runs/{id}/rename backend endpoint.

marked.js and plotly.js were loaded after the card HTML, so inline
scripts that convert markdown to HTML would find `typeof marked`
undefined and fall through to plain escaped text.
…h workflow

Add m4-display skill with full API reference (show, section, interaction
patterns, run management, export). Update m4-api and m4-research skills
with display cross-references and save-then-show patterns. Add Display
Output section to CLAUDE.md establishing behavioral defaults. Add
PROVENANCE.yaml for all three skills.
…er UI

Implement HTML and JSON export for display runs:
- export() in __init__.py replaces NotImplementedError stub with working
  implementation supporting run_id filtering and format selection
- Server adds GET /api/export and /api/runs/{id}/export endpoints
- CLI adds --export, --format, and --run flags to `m4 display`
- Browser UI adds export dropdown (HTML, JSON, Print) in header and
  per-run export buttons in the run dropdown
- Print-optimized @media styles for clean browser printing

10 field primitives (Dropdown, MultiSelect, Slider, RangeSlider,
Checkbox, Toggle, RadioGroup, TextInput, DateRange, NumberInput)
grouped via Form class. Supports blocking show(Form([...]), wait=True)
returning a DisplayResponse with .values dict, hybrid data+controls
cards via controls= parameter on show(), freeze rendering for
confirmed forms, and self-contained HTML export.

Full rebrand of the display system:
- Package path: m4.display → m4.vitrine
- CLI command: m4 display → m4 vitrine
- Skill: m4-display → m4-vitrine
- Env vars: M4_DISPLAY_* → M4_VITRINE_*
- WebSocket messages: display.event → vitrine.event
- On-disk data dir: {m4_data}/display/ → {m4_data}/vitrine/
- All user-visible strings, docs, and test assertions updated

Apply the UI.md visual refactor to the live vitrine UI and exported HTML output.

- Introduce neobrutalist design tokens, typography, borders, and hard-shadow interactions

- Add typed card headers with per-type color backgrounds and icon badges

- Update card rendering logic for decision-state header typing and icon mapping

- Align export HTML renderer and inline CSS with the new visual system

- Add font loading links for Space Grotesk, Inter, and DM Mono
- Fix Plotly figure serialization to handle numpy arrays in customdata fields
- Replace native HTML date inputs with custom calendar picker component
- Replace native select elements with custom dropdown component
- Add comprehensive styling for custom form controls matching neobrutalist design
- Add test coverage for Plotly Express customdata serialization
…verage

- Add warning logs to _push_remote, _poll_remote_response, get_selection
  for failure scenarios; differentiate HTTPError vs URLError
- Apply snapshot-under-lock pattern to all functions reading module globals
- Add __post_init__ validation to all form field types (range, options,
  defaults) and Form (field name uniqueness)
- Add DisplayResponse.CONFIRM/SKIP/TIMEOUT/ERROR constants
- Extend DisplayHandle with run_id parameter
- Add Plotly spec size cap (5MB) with data array truncation
- Add selection persistence (debounced JSON save/load on server restart)
- Enhance health endpoint with uptime, version, run_count
- Frontend: SECTION badge, Plotly resize debounce, error toasts,
  blur race fix, aria-labels, select-all banner, constants extraction
- Add test_forms.py (50 tests) covering serialization, rendering,
  blocking flow, controls, export, WebSocket, validation
- Add error logging tests, selection persistence tests, export endpoint
  tests, concurrent blocking tests to existing test files
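
The `__post_init__` validation described in the list above (range, options, defaults, and field name uniqueness) might look like this with dataclasses. The class names `Slider`, `Dropdown`, and `Form` come from the commit messages, but every attribute name and error message below is an assumption made for illustration, not vitrine's real definitions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slider:
    name: str
    min: float
    max: float
    default: Optional[float] = None

    def __post_init__(self) -> None:
        if self.min >= self.max:
            raise ValueError(f"{self.name}: min must be below max")
        if self.default is None:
            self.default = self.min
        elif not self.min <= self.default <= self.max:
            raise ValueError(f"{self.name}: default outside [min, max]")


@dataclass
class Dropdown:
    name: str
    options: List[str]
    default: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.options:
            raise ValueError(f"{self.name}: options must not be empty")
        if self.default is not None and self.default not in self.options:
            raise ValueError(f"{self.name}: default is not one of the options")


@dataclass
class Form:
    fields: list

    def __post_init__(self) -> None:
        names = [f.name for f in self.fields]
        duplicates = sorted({n for n in names if names.count(n) > 1})
        if duplicates:
            raise ValueError(f"duplicate field names: {duplicates}")
```

Validating at construction time means a malformed form fails in the agent's Python process rather than surfacing later as a broken card in the browser.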
Reframes the skill description around vitrine's role as the agent's
research journal — documenting decisions, rationale, and findings as
persistent cards. Adds comprehensive examples for forms, quick actions,
hybrid controls, progressive updates, provenance tracking, and the
full research session pattern.

Break the 5,451-line monolith into 19 focused files loaded via
<link> and <script defer> tags. No build step, no bundler — plain
static files served by the existing StaticFiles mount.

CSS (6 files): base, cards, tables, forms, neo, print
JS (13 files): state, theme, status, runs, export-ui, renderers,
plotly, tables, forms, responses, cards, websocket, init

- Remove IIFE wrapper; each JS file uses 'use strict' at top level
- Integrate selectRun hash update directly (eliminate monkey-patch)
- Use addEventListener for Plotly theme re-color (eliminate monkey-patch)
- Relocate agentStatusEl and date utilities to state.js

The organizing unit is a study (a research question investigated over
one or more conversations), not a run. Storage moves from m4_data/vitrine/
to .vitrine/ at the project root since it's research output, not dataset
infrastructure.

- RunManager → StudyManager, run_manager.py → study_manager.py
- run_id → study across types, renderer, artifacts, server, export, CLI
- API routes /api/runs → /api/studies
- Frontend: runs.js → studies.js, all DOM IDs and CSS classes updated
- Storage: runs.json → studies.json, runs/ → studies/
- Auto-migration moves old storage layout on first access
- All tests updated (389 pass)

Studies can now register an output directory for research artifacts (CSVs,
scripts, images, etc.). Files are browsable in the live UI, included in
JSON/HTML exports, and served via new API endpoints with preview support
for tabular, text, and image files.

PID file (.vitrine/.server.json) is now the sole authority for finding
a running server. Port scanning across 7741-7750 risked connecting to
a different project's server when multiple projects run concurrently.

Add __all__ with all 29 public names and promote DisplayResponse,
DisplayEvent to top-level imports so agents can use a single
`from m4.vitrine import *` instead of per-module imports.
…vitrine-api

Drop the m4- prefix for clearer, more descriptive skill names.
/research and /vitrine still work as shorthand triggers via fuzzy matching.

addCard() always called renderForm() for decision cards, even when
they already had response_action and response_values from a prior
decision. The frozen summary (renderFrozenForm) was only applied as
a one-time DOM manipulation at confirm time in sendResponse(), so
reloading the page re-rendered the interactive form instead.

Now addCard() and updateCard() check for existing response data and
render the frozen summary directly. Also adds the missing
case 'decision' to updateCard's body re-render switch.

The plotly.js file had a backslash-escaped exclamation mark (\!) on line
135, which is invalid JavaScript syntax. This caused the entire file to fail parsing,
leaving renderPlotly undefined. Any plotly card silently failed to render
— the ReferenceError was swallowed by the WebSocket message handler.

Also update research session skill to stop saving redundant .html files
for charts (vitrine cards already store the interactive Plotly spec).

Switch the ToC type badge from ! (red) to ✓ (green) when a decision
card has been answered, matching the icon transition on the card itself.
Also fix detection of already-responded decisions loaded from disk.

- Figures now save to output_dir/plots/ as .png (no .html)
- Every vitrine plot card must include a description= parameter

marked.js was lazy-loaded with a race condition where only the first
markdown card's callback fired — all subsequent cards stayed stuck with
basicMarkdown() which has no GFM table support. Load marked.js eagerly
in index.html and add proper table styles to .markdown-body.

Replaces ad-hoc file artifact instructions with a structured
script-first workflow (write → run → show) and explicit output
directory layout (scripts/, data/, plots/). Scripts are self-contained
and independently runnable, making research sessions fully reproducible.

Re-invoke updateStudyMetadataBar after studies fetch completes so metadata
shows correctly when the page loads in live mode. Exclude dismissed cards
from the table of contents scroll list.

set_status() was ephemeral and not worth the API surface. Replace the
status-bar "waiting for response" indicator with an audio chime that
plays when a decision card arrives, providing a clearer notification
even when the browser tab is in the background.

Dispatch: spawn headless agents for reproduce/report tasks on studies.
Reproduce runs execute in a sandboxed copy of the output directory to
protect original files. Agent output streams into a vitrine card via
stream-json parsing with debounced updates.

Action palette: replaces the export dropdown with a ⌘K command palette
that surfaces export, dispatch, and print actions. Keyboard navigable.

Plotly: fix chart overflow by measuring card-body width explicitly and
using ResizeObserver per chart instead of relying on autosize.

Also adds export-report and reproduce-study skills.

- Fix response.data() using wrong store in multi-study sessions by
  prioritizing resolved sel_store over study_manager fallback
- ask() now returns typed text when researcher writes a free-text
  answer instead of clicking a button; placeholder updated in UI
- Add progress() context manager for long computations — shows
  auto-completing/failing status cards via MARKDOWN + replace=
- Update clinical-research-session skill to use fig.write_json()
  for reloadable Plotly plots instead of .png/.html

Adds a `vitrine` entry point independent of the `m4` CLI, with
start, stop, restart, and status subcommands.

Refactor agent dispatch from immediate-start to create-then-run:
create_agent_card() renders a config form (model, budget, instructions),
run_agent() starts the process after researcher review. New REST endpoints
(/api/studies/{study}/agents, /api/agents/{card_id}/run, /api/agents/{card_id}).

Frontend renders three agent states (config form, running terminal with
live timer and auto-scroll, compact completed view). Adds concurrency
limit, orphan reconciliation on server restart, cancel with output
preservation, and incremental terminal updates without jitter.

- Track token usage (input/output) and cost across agent stream events
- Display usage badge (tokens, context %, cost) in agent card headers
- Add alive indicator with pulsing dot and rotating thinking messages
- Show inactivity warning after 2 min without new output
- Add dispatch watchdog to detect dead agent PIDs missed by stream monitor
- Move _is_pid_alive to dispatch module for shared access
- Add tests for confirm(), wait_for(), ask() timeout, export wrappers,
  agent REST endpoints, CLI, and dispatch system
- Soft-delete cards: delete button, server-side persistence, undo snackbar
  with 5s timeout, slide-out/in animations for delete and restore
- TOC trash section: collapsed list of deleted cards with restore action
- Dismiss animations: fade out/in when hiding/showing cards
- Exclude deleted cards from exports, study context, and card counts
- Agent card: sync header and terminal pulsing dots via animation-delay
- Agent card: move tokens/ctx usage info into terminal alive strip
- Remove set_status() references from docs

- Escape file paths in DuckDB SQL queries to prevent injection via single quotes
- Escape ILIKE wildcards (%, _) in search to prevent unintended pattern matching
- Fix subprocess stderr deadlock by merging stderr into stdout
- Guard against None auth token in remote server retry
- Validate offset/limit query params (non-negative, bounded)
- Lock _event_callbacks and _selections for thread-safe access
- Log swallowed exceptions in event polling instead of silent discard
- Reject empty/whitespace-only annotations
- Fix form export to use actual response values instead of field defaults
- Fix ask() to return empty string message instead of falling through to action
- Fix _poll_remote_response to return "error" for connection failures (not "timeout")
- Use context manager for file lock to prevent descriptor leaks
- Strip javascript: URIs in SVG sanitizer
- Fix liveMode race: server sends replay_done sentinel, frontend waits for it
- Add jitter to WebSocket reconnect backoff
- Remove dead code: pinned field, clear(), _current_study, _selection_cooldowns,
  executeAction(), empty DOMContentLoaded handler, trivial constant tests
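
The two escaping fixes at the top of this list (single quotes in file paths, and `%`/`_` wildcards in search terms) can be sketched as plain string helpers. This is a hedged illustration: all three function names are hypothetical, the backslash escape character is an assumption, and the generated SQL assumes DuckDB accepts an `ESCAPE` clause on `ILIKE` and a `?` placeholder for the pattern.

```python
LIKE_ESCAPE = "\\"  # assumed escape character for the ILIKE pattern


def sql_string_literal(value: str) -> str:
    """Quote value as a SQL string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def escape_like(term: str) -> str:
    """Escape the escape character first, then the % and _ wildcards."""
    out = term.replace(LIKE_ESCAPE, LIKE_ESCAPE * 2)
    return out.replace("%", LIKE_ESCAPE + "%").replace("_", LIKE_ESCAPE + "_")


def build_search(parquet_path: str, column: str, term: str):
    """Return (sql, params) for a case-insensitive substring search."""
    quoted_col = '"' + column.replace('"', '""') + '"'
    sql = (
        f"SELECT * FROM read_parquet({sql_string_literal(parquet_path)}) "
        f"WHERE CAST({quoted_col} AS VARCHAR) ILIKE ? ESCAPE '{LIKE_ESCAPE}'"
    )
    return sql, [f"%{escape_like(term)}%"]
```

The path has to be inlined because it is part of the table function call, which is why it gets literal quoting, while the search term travels as a bound parameter.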
…rage

- Implement redact_dataframe() and enforce_row_limit() in redaction.py
- Extract shared utilities into _utils.py (PID check, path escaping, health check, constants)
- Lock module-level state in on_event, _poll_remote_events, list_annotations, stop
- Atomic meta.json writes via tempfile + os.replace in study_manager.py
- Deduplicate annotation card-lookup in server.py (_get_card_annotations helper)
- Fix _sanitize_search regex: SQL comment syntax (--) now correctly rejected
- Add tests for _sanitize_search, _update_agent_card, run_agent, _stream_monitor
- Replace redaction test stubs with real redact_dataframe/enforce_row_limit tests
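
The atomic meta.json write via tempfile + os.replace mentioned above follows a well-known pattern, sketched here. `write_json_atomic` is a hypothetical helper name, and the `fsync` call and JSON formatting are assumptions; only the tempfile-then-`os.replace` structure comes from the commit message.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write payload to path so readers never observe a partial file."""
    path = Path(path)
    # The temp file must live in the same directory so os.replace stays
    # on one filesystem and is therefore atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # push the data to disk before the rename
        os.replace(tmp_name, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)  # do not leave temp files behind on failure
        raise
```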
The backend changes (dispatch config, paper workspace functions, server
whitelist, and dispatch tests) were included in the previous commit.
This adds the remaining pieces:

- Action palette entry for "Draft Paper" in actions.js
- draft-paper SKILL.md with IMRAD structure, auto-generated Methods
  from decision trail, and supplementary appendices
- Server endpoint test for paper task creation
- Skills index updated with draft-paper entry

Expand the rotating message list with 13 domain-specific words
(Annotating, Auditing, Bootstrapping, Calibrating, Charting,
Correlating, Curating, Hypothesizing, Incubating, Pipetting,
Stratifying, Titrating, Triaging) and slow rotation to ~30s.

When a researcher selects an option in a decision form, the description
for that option is now resolved and displayed in frozen form views
(browser + HTML export), study context decisions, and via a new
values_detailed property on DisplayResponse.

HTML/HTM files in study output directories now open in a sandboxed
iframe instead of displaying raw source code.

The claude CLI refuses to start when it detects the CLAUDECODE environment
variable from a parent session. Strip it from the subprocess environment.
Also terminate running agents when their card is deleted.
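
Stripping a variable from a child process's environment, as done for CLAUDECODE above, can be sketched like this. `child_env` is a hypothetical helper, and the child here is a plain Python one-liner rather than the claude CLI, whose flags are not shown in this PR.

```python
import os
import subprocess
import sys


def child_env(strip=("CLAUDECODE",), base=None) -> dict:
    """Copy the environment without the named variables; the parent is untouched."""
    source = os.environ if base is None else base
    return {key: value for key, value in source.items() if key not in strip}


if __name__ == "__main__":
    # The child reports whether it can still see the variable.
    result = subprocess.run(
        [sys.executable, "-c", "import os; print('CLAUDECODE' in os.environ)"],
        env=child_env(),
        capture_output=True,
        text=True,
    )
    print(result.stdout.strip())
```

Passing a filtered copy via `env=` leaves `os.environ` in the parent session unchanged.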
Vitrine (display system) is being extracted into its own package.
Removes the full vitrine module, its tests, CLI commands, and
dependent skills (draft-paper, export-report, reproduce-study,
vitrine-api). Updates package exports and dependencies accordingly.

Strip the agent dispatch configuration from m4.__init__ now that
vitrine handles its own dispatch internally. Bump vitrine to latest.

setup-uv@v5 already creates the virtual environment when python-version
is specified, causing the explicit uv venv step to fail.

@hannesill merged commit f1f526f into main Feb 14, 2026
6 checks passed
@hannesill deleted the m4-display branch February 14, 2026 03:10