Last updated: 2025-12-22
This document is intentionally spec-like: it should be unambiguous enough that two different implementers could build compatible systems and get the same pass/fail results.
We are building an application/component similar to FiftyOne’s embeddings panel (“51”) and Apple’s Embedding Atlas: an interactive embedding scatterplot that stays smooth at very large N.
Key twist: we must support multiple geometries, starting with:
- Euclidean 2D scatter
- Hyperbolic embeddings in the Poincaré disk, with correct hyperbolic navigation and selection semantics (no “Euclidean hacks”)
We will implement this in two layers:
- a naive but accurate implementation (ground truth)
- a high-performance WebGL implementation (candidate)
We will iterate the candidate quickly by continuously comparing it to the reference (correctness harness) and tracking performance regressions (benchmark harness).
Tech direction: WebGL first, then port the candidate path to WebGPU later (keeping the same API and harness).
These are the intended defaults based on the current direction:
- Language: TypeScript
- Candidate renderer: WebGL2
- Future: add a WebGPU candidate behind the same interface
- UI framework: optional (vanilla or React are both acceptable)
If we later change any of these, the API contracts and harness requirements stay the same.
This document specifies a separate module / codebase (a “viz lab”) whose purpose is to build and validate the embedding visualizer across multiple geometries.
The lab must contain:
- A naive reference implementation (correctness-first; may be slow)
- A performance-oriented candidate implementation (fast; must match reference)
- A comparison + benchmarking harness that can prove the candidate is both correct and faster
The lab must support at least two geometry modes:
- Euclidean 2D scatter
- Hyperbolic Poincaré disk scatter with mathematically correct camera pan + hover/picking + lasso selection
The primary requirement is objective verification: the system must be able to say, with evidence, whether the candidate is matching the reference and whether it is improving performance.
The most important part is how we’ll converge on correctness + performance.
We will implement in this order:

Phase 1: Euclidean reference
- Implement Euclidean rendering + interactions in the simplest way (Canvas2D is fine)
- Implement trace record/replay
- Implement correctness diffs (hover, selection set, view checkpoints)

Exit criteria: the Euclidean reference is deterministic under trace replay.

Phase 2: Euclidean candidate
- Implement a WebGL2 renderer and interaction pipeline for Euclidean
- Make it match the Euclidean reference under the harness

Exit criteria: the candidate matches the reference and is measurably faster on at least one large dataset.

Phase 3: Hyperbolic reference
- Implement Poincaré disk transforms + camera navigation semantics
- Implement hyperbolic hover + unprojected lasso selection correctly
- Validate with projection probes + trace replay

Exit criteria: the hyperbolic reference is deterministic and “feels right” for pan/zoom (matching common embedding tools, but with correct hyperbolic math).

Phase 4: Hyperbolic candidate
- Implement the WebGL2 hyperbolic path (GPU transforms + fast picking/selection as needed)
- Make it match the hyperbolic reference under the harness

Exit criteria: the hyperbolic candidate matches the reference and meets performance targets.

Phase 5: WebGPU port
- Keep the same contracts/harness
- Add a WebGPU candidate path (feature-flagged), validated against the same reference
We want an interactive embedding visualization component that can handle large point clouds (commonly 100k–20M+ points) and supports multiple geometries.
The intent is not limited to hyperbolic space. In the future, we may support additional geometry/camera models (e.g. spherical projections and other non-Euclidean navigation semantics).
High-performance visualization pipelines (WebGL/WebGPU, GPU shaders, spatial indices, workers) are easy to get fast but hard to get correct, especially when:
- the camera transform is not Euclidean (e.g. hyperbolic pan)
- selection is defined in a different space than rendering (e.g. unprojected lasso)
This lab provides a controlled environment where:
- the naive implementation is treated as “ground truth”
- the optimized implementation is iterated until it matches the ground truth under objective checks
This lab is intended to:
- validate geometry-specific math (hyperbolic now, spherical later)
- validate interaction semantics (hover, lasso, camera)
- measure performance and regressions
- generate replayable interaction traces usable for regression testing
This is not production app code. It is a correctness + performance testbed.
- Provide a framework where an optimized renderer can be iterated repeatedly until:
- it is measurably faster than the naive reference
- it matches the reference outputs within defined tolerances
- Support both:
- Euclidean interactions (pan, zoom, hover, lasso)
- Hyperbolic interactions (correct hyperbolic pan, display zoom, hover, unprojected lasso)
- Make comparisons objective via:
- PNG/screenshot outputs
- numeric projection checks
- interaction trace replay and result comparison
The interaction model should feel like common embedding viewers (e.g. FiftyOne embeddings panel / Embedding Atlas):
- drag to pan
- scroll/wheel to zoom
- lasso/box selection modes
- shift to add to selection (where applicable)
- double-click to clear selection (where applicable)
- Not a production UI/UX application
- Not a dataset management system
- Not a backend service (unless added for CI automation)
For each geometry (Euclidean and Hyperbolic), implement:
- Reference (naive) implementation
- Candidate (optimized) implementation
Both must implement the same interface and semantics.
Must support:
- Side-by-side mode (reference vs candidate)
- A/B toggle mode (swap implementations in the same viewport)
- Visual diff mode (pixel differences)
- Numeric diffs for math and interaction results
Must compute and report:
- FPS / frame times (p50/p95/p99)
- Lasso selection compute time
- Hover/picking compute time
- End-to-end interaction latency under trace replay
Must record and replay:
- pointer events
- wheel events
- modifier key states
- lasso polylines
- resize events
- tool/mode changes (pan vs lasso)
Also support the common “embedding viewer ergonomics” that appear in practice:
- selection add/remove modifiers (shift/cmd)
- double-click to clear
- optional reset-to-fit control
You may rename folders as desired, but keep a clean separation between:
- shared core definitions
- reference impl
- candidate impl
- runner/comparison logic
Suggested layout:
viz-lab/
  README.md
  src/
    core/
      types.ts
      rng.ts
      dataset.ts
      view_state.ts
      interaction_trace.ts
    perf/
      timers.ts
      stats.ts
    comparison/
      image_capture.ts
      image_diff.ts
      numeric_diff.ts
      set_diff.ts
    math/
      euclidean.ts
      poincare.ts
    selection/
      point_in_polygon.ts
    impl_reference/
      euclidean_reference.ts
      hyperbolic_reference.ts
    impl_candidate/
      euclidean_candidate.ts
      hyperbolic_candidate.ts
    runner/
      scenario_runner.ts
      trace_replay.ts
      report_writer.ts
    ui/ (optional)
      app.tsx
      panels/
Define a single interface in core/types.ts that both reference and candidate implementations must satisfy.
Each implementation must support:
- init(canvas: HTMLCanvasElement, opts: InitOptions): void
- setDataset(dataset: Dataset): void
- setView(view: ViewState): void
- getView(): ViewState
- render(): void
- resize(width: number, height: number): void
- destroy(): void
Both must support the same semantics:
- pan(deltaX: number, deltaY: number, modifiers: Modifiers): void
- zoom(anchorX: number, anchorY: number, wheelDelta: number, modifiers: Modifiers): void
- hitTest(screenX: number, screenY: number): HitResult | null
- lassoSelect(screenPolyline: Float32Array): SelectionResult
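As one way to pin this down, here is a sketch of what core/types.ts could look like. The method signatures mirror the contract above; the exact fields of Dataset, ViewState, and the result types are illustrative placeholders, not mandated shapes:

```ts
// A possible shape for core/types.ts (sketch). Field names are illustrative.
export interface Dataset {
  positions: Float32Array; // length 2N, interleaved (x, y)
  labels: Uint16Array;
  ids?: Uint32Array; // optional; indices are implicit otherwise
}

export interface ViewState {
  geometry: "euclidean" | "poincare";
  // Euclidean: center + scale; hyperbolic: isometry params + display zoom.
  params: Float64Array;
}

export interface Modifiers { shift: boolean; ctrl: boolean; meta: boolean; alt: boolean; }
export interface HitResult { index: number; screenDistance: number; }
export interface SelectionResult { indices: Uint32Array; elapsedMs: number; }
export interface InitOptions { devicePixelRatio: number; }

export interface EmbeddingRenderer {
  init(canvas: HTMLCanvasElement, opts: InitOptions): void;
  setDataset(dataset: Dataset): void;
  setView(view: ViewState): void;
  getView(): ViewState;
  render(): void;
  resize(width: number, height: number): void;
  destroy(): void;

  pan(deltaX: number, deltaY: number, modifiers: Modifiers): void;
  zoom(anchorX: number, anchorY: number, wheelDelta: number, modifiers: Modifiers): void;
  hitTest(screenX: number, screenY: number): HitResult | null;
  lassoSelect(screenPolyline: Float32Array): SelectionResult;
}
```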
Interaction semantics (explicit, to match common tools):
- Pan: drag-to-pan. Pan is anchor-invariant: if the user drags with the pointer at screen location $s$, the data point under $s$ should remain under $s$ throughout the drag.
  - Euclidean: this corresponds to a simple translation in data-space.
  - Hyperbolic: this must be implemented via a Poincaré-disk isometry update (not Euclidean translation).
- Zoom: scroll/wheel to zoom, also anchor-invariant around the cursor location (made concrete in the sketch after this list).
- Selection modes: support at least lasso (polygon) and optionally box selection.
- Selection modifiers: shift adds to selection; cmd/ctrl toggles individual point selection (recommended). Exact bindings can be configurable, but must be recorded in traces.
- Clear selection: double-click clears selection (match common embedding tools).
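To make anchor-invariance concrete for the Euclidean case, here is a sketch of a zoom update that keeps the data point under the cursor fixed. The center/scale view parameterization and the exponential wheel mapping are assumptions, not mandated representations:

```ts
// Euclidean anchor-invariant zoom (sketch). The view maps data -> screen as
// screen = (data - center) * scale + viewportCenter. Keeping the data point
// under the cursor fixed pins the anchor across the scale change.
interface EuclideanView { centerX: number; centerY: number; scale: number; }

function zoomAtAnchor(
  view: EuclideanView,
  anchorX: number, anchorY: number,   // cursor position, screen px
  viewportW: number, viewportH: number,
  wheelDelta: number,
): EuclideanView {
  const factor = Math.exp(-wheelDelta * 0.001); // exponential zoom feels uniform
  const newScale = view.scale * factor;
  // Data point currently under the anchor:
  const dataX = (anchorX - viewportW / 2) / view.scale + view.centerX;
  const dataY = (anchorY - viewportH / 2) / view.scale + view.centerY;
  // Solve for the center that keeps (dataX, dataY) under (anchorX, anchorY):
  return {
    scale: newScale,
    centerX: dataX - (anchorX - viewportW / 2) / newScale,
    centerY: dataY - (anchorY - viewportH / 2) / newScale,
  };
}
```

The same invariant defines hyperbolic zoom, except the solve happens in the isometry/display-zoom parameters rather than in center/scale.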
Given:
- the same dataset (generated with fixed seed)
- the same initial view state
- the same interaction trace
- the same canvas size and device pixel ratio settings
Then:
- selection outputs must match (per defined rules)
- hover outputs must match (per defined rules)
- view state checkpoints must match (within tolerance)
- rendered output must match (within tolerance)
Determinism notes (make these explicit in code):
- Force devicePixelRatio handling to be deterministic: either (a) run all comparisons at DPR=1 in headless, or (b) explicitly record DPR in the trace/session config and scale canvases accordingly.
- Any randomness (dataset gen, jitter, sampling) must be seeded and recorded.
- For performance benchmarking, determinism is less strict, but trace semantics must match.
Tolerance policy (default):
- Numeric comparisons are authoritative; image diffs are supportive.
- Express projection/view tolerances in screen pixels (e.g. $\le 0.5$ px p99) and/or in disk coordinates (e.g. $\le 10^{-6}$ in model units) where appropriate.
Implement deterministic dataset generation (seeded PRNG).
Inputs:
- seed
- N
- labelCount
- geometryMode: "euclidean" | "poincare"
Outputs:
- positions: Float32Array, length 2N, interleaved (x, y)
- labels: Uint16Array
- optional ids (or implicit indices)
Include at least:
- N = 10_000
- N = 100_000
- N = 1_000_000
- a hyperbolic stress dataset: many points close to the disk boundary (e.g. radius in [0.9, 0.999])
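A minimal sketch of a generator satisfying these inputs, using mulberry32 (a common small seeded PRNG; any seeded PRNG works as long as the seed is recorded). The radiusRange parameter is an assumption added to cover the boundary stress dataset:

```ts
// Deterministic dataset generation (sketch). mulberry32 is a small, widely
// used 32-bit seeded PRNG; the same seed always yields the same dataset.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function generateDataset(
  seed: number, n: number, labelCount: number,
  geometryMode: "euclidean" | "poincare",
  radiusRange: [number, number] = [0, 0.999], // stress dataset: [0.9, 0.999]
) {
  const rand = mulberry32(seed);
  const positions = new Float32Array(2 * n);
  const labels = new Uint16Array(n);
  const [rMin, rMax] = radiusRange;
  for (let i = 0; i < n; i++) {
    if (geometryMode === "poincare") {
      const theta = 2 * Math.PI * rand();
      const r = rMin + (rMax - rMin) * rand(); // point stays inside the disk
      positions[2 * i] = r * Math.cos(theta);
      positions[2 * i + 1] = r * Math.sin(theta);
    } else {
      positions[2 * i] = rand() * 2 - 1;
      positions[2 * i + 1] = rand() * 2 - 1;
    }
    labels[i] = Math.floor(rand() * labelCount);
  }
  return { positions, labels };
}
```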
The reference implementation defines the “truth” behavior. It may be slow.
Implement the simplest accurate approach (Canvas2D recommended).
Required behaviors:
- render all points
- Euclidean pan and zoom
- hover: brute-force nearest in screen space
- lasso: brute-force point-in-polygon over all points
This must implement correct hyperbolic navigation.
Separate:
- hyperbolic camera state (an isometry representation)
- display zoom scalar
We will choose a concrete, deterministic isometry representation in code (e.g. Möbius transform parameters; see the sketch after the projection pipeline below) with:
- applyCamera(p: vec2_poincare) -> vec2_poincare
- invertCamera(p: vec2_poincare) -> vec2_poincare
The critical requirement is that pan/zoom invariants match the Interaction Contract (anchor-invariant pan/zoom).
For each Poincaré point:
- map to an internal model for transforms
- apply isometry
- map back to Poincaré
- apply display zoom and map to screen
- draw
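As a concrete (but not mandated) choice of isometry representation: a Möbius transform of the unit disk with "translation" parameter $a$ ($|a| < 1$) and rotation $\theta$ is $T(z) = e^{i\theta}\,\frac{z + a}{1 + \bar{a}z}$, which maps the disk to itself and has a closed-form inverse. A sketch:

```ts
// Poincaré-disk isometry as a Möbius transform (sketch). Points are complex
// numbers z = x + iy with |z| < 1.
interface Complex { re: number; im: number; }
const cMul = (p: Complex, q: Complex): Complex =>
  ({ re: p.re * q.re - p.im * q.im, im: p.re * q.im + p.im * q.re });
const cAdd = (p: Complex, q: Complex): Complex => ({ re: p.re + q.re, im: p.im + q.im });
const cDiv = (p: Complex, q: Complex): Complex => {
  const d = q.re * q.re + q.im * q.im;
  return { re: (p.re * q.re + p.im * q.im) / d, im: (p.im * q.re - p.re * q.im) / d };
};

interface DiskIsometry { a: Complex; theta: number; } // requires |a| < 1

// T(z) = e^{i*theta} * (z + a) / (1 + conj(a) * z)
function applyCamera(cam: DiskIsometry, z: Complex): Complex {
  const rot = { re: Math.cos(cam.theta), im: Math.sin(cam.theta) };
  const num = cAdd(z, cam.a);
  const den = cAdd({ re: 1, im: 0 }, cMul({ re: cam.a.re, im: -cam.a.im }, z));
  return cMul(rot, cDiv(num, den));
}

// Inverse: z = (e^{-i*theta} * w - a) / (1 - conj(a) * e^{-i*theta} * w)
function invertCamera(cam: DiskIsometry, w: Complex): Complex {
  const rotInv = { re: Math.cos(-cam.theta), im: Math.sin(-cam.theta) };
  const u = cMul(rotInv, w);
  const num = cAdd(u, { re: -cam.a.re, im: -cam.a.im });
  const den = cAdd({ re: 1, im: 0 }, cMul({ re: -cam.a.re, im: cam.a.im }, u));
  return cDiv(num, den);
}
```

Composing two such transforms yields another one of the same family, which is what makes an incremental, anchor-invariant pan update possible.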
Pan must update the hyperbolic isometry (it is not Euclidean translation). Pan is anchor-invariant as defined above.
- lasso is drawn in screen space (polyline)
- transform lasso vertices back to data-space Poincaré using inverse camera mapping
- brute-force point-in-polygon in data space for all points
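Putting these lasso steps together (a sketch; it reuses Complex, DiskIsometry, and invertCamera from the isometry sketch above, and assumes a screenToDisk helper that inverts the display-zoom/viewport mapping):

```ts
// Standard even-odd ray-casting test (selection/point_in_polygon.ts).
// poly is a flattened polyline [x0,y0, x1,y1, ...].
function pointInPolygon(px: number, py: number, poly: Float32Array): boolean {
  let inside = false;
  const n = poly.length / 2;
  for (let i = 0, j = n - 1; i < n; j = i++) {
    const xi = poly[2 * i], yi = poly[2 * i + 1];
    const xj = poly[2 * j], yj = poly[2 * j + 1];
    if ((yi > py) !== (yj > py) &&
        px < ((xj - xi) * (py - yi)) / (yj - yi) + xi) {
      inside = !inside;
    }
  }
  return inside;
}

declare function screenToDisk(sx: number, sy: number): Complex; // assumed helper

function lassoSelectHyperbolic(
  cam: DiskIsometry, positions: Float32Array, screenPolyline: Float32Array,
): Uint32Array {
  // 1. Unproject each lasso vertex: screen -> viewed disk -> data-space disk.
  const dataPoly = new Float32Array(screenPolyline.length);
  for (let i = 0; i < screenPolyline.length; i += 2) {
    const viewed = screenToDisk(screenPolyline[i], screenPolyline[i + 1]);
    const z = invertCamera(cam, viewed);
    dataPoly[i] = z.re;
    dataPoly[i + 1] = z.im;
  }
  // 2. Brute-force test every data point against the unprojected polygon.
  const selected: number[] = [];
  for (let k = 0; k * 2 < positions.length; k++) {
    if (pointInPolygon(positions[2 * k], positions[2 * k + 1], dataPoly)) selected.push(k);
  }
  return Uint32Array.from(selected);
}
```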
Define hover as:
- nearest point in screen pixel distance after projection
- brute force over all points is acceptable
Tie-break rule (make explicit): if multiple points are at the same minimal screen distance, select the one with the lowest index. Whatever rule we pick must be deterministic and identical across reference and candidate.
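A brute-force reference hitTest consistent with this definition might look like the following (a sketch; project, mapping a point index to screen coordinates, is assumed, and the hover radius is an illustrative default):

```ts
// Reference hover (sketch): nearest point in screen-pixel distance, with a
// deterministic lowest-index tie-break. `project` (data -> screen) is assumed.
declare function project(index: number): { x: number; y: number };

function hitTest(
  screenX: number, screenY: number, n: number, maxRadiusPx = 8,
): number | null {
  let best = -1;
  let bestD2 = maxRadiusPx * maxRadiusPx;
  for (let i = 0; i < n; i++) {
    const p = project(i);
    const dx = p.x - screenX, dy = p.y - screenY;
    const d2 = dx * dx + dy * dy;
    if (d2 < bestD2) { best = i; bestD2 = d2; } // strict < keeps lowest index on ties
  }
  return best >= 0 ? best : null;
}
```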
The candidate implementation is optimized for performance but must match reference semantics.
The candidate may use:
- WebGL/WebGPU rendering
- workers
- spatial indices
- shader-based transforms
It must still:
- expose the same interface
- match reference outputs within tolerances
- be benchmarked by the harness
Important: we will not duplicate code per-geometry unless needed. Prefer shared core logic:
- shared trace format + runner
- shared selection set diffing
- shared dataset loaders/generators
- geometry-specific camera/math modules
- renderer-specific code (Canvas2D reference vs WebGL candidate)
For each implementation:
- set dataset + view state
- render
- export image:
  - canvas.toBlob("image/png") or canvas.toDataURL("image/png")
- store artifacts with deterministic naming:
artifacts/{geometry}/{dataset}/{impl}/{scenario}/{frameIndex}.png
Use a headless browser test runner to automate exact comparisons:
- open a test page with a deterministic viewport
- load dataset
- replay a trace
- capture screenshots at checkpoints or every N frames
Artifacts:
- reference.png, candidate.png, diff.png
- metrics.json, comparison.json
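One way to drive the capture (a sketch assuming Playwright; the test-page URL and the window.vizLab replay/checkpoint hooks are hypothetical and must be provided by the page):

```ts
// Headless capture driver (sketch, assuming Playwright). The page is expected
// to expose hypothetical window.vizLab hooks for replay and checkpoint seeking.
import { chromium } from "playwright";

async function captureScenario(url: string, outPrefix: string): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage({
    viewport: { width: 1280, height: 720 },
    deviceScaleFactor: 1, // DPR=1 per the determinism contract
  });
  await page.goto(url);
  // Replay the trace inside the page; the hypothetical hook resolves with the
  // number of checkpoints reached during replay.
  const checkpoints: number =
    await page.evaluate(() => (window as any).vizLab.replayTrace());
  for (let i = 0; i < checkpoints; i++) {
    await page.evaluate((idx) => (window as any).vizLab.seekCheckpoint(idx), i);
    await page.screenshot({ path: `${outPrefix}/${i}.png` });
  }
  await browser.close();
}
```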
Implement pixel-level comparison:
- same resolution required
- compute:
- max per-channel diff
- MAE or RMSE
- number of pixels over threshold
- generate a diff.png heatmap
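A sketch of these pixel metrics over two same-resolution RGBA buffers (e.g. from getImageData); the red-heatmap encoding is an illustrative choice:

```ts
// Pixel diff metrics (sketch). Inputs are RGBA byte arrays of equal length.
interface ImageDiffResult {
  maxChannelDiff: number;
  rmse: number;            // RMSE over all channels
  errorPixelCount: number; // pixels with any channel diff over `threshold`
}

function diffImages(
  a: Uint8ClampedArray, b: Uint8ClampedArray,
  threshold: number, heatmapOut?: Uint8ClampedArray,
): ImageDiffResult {
  if (a.length !== b.length) throw new Error("resolution mismatch");
  let maxDiff = 0, sumSq = 0, errorPixels = 0;
  for (let px = 0; px < a.length; px += 4) {
    let pixelMax = 0;
    for (let c = 0; c < 4; c++) {
      const d = Math.abs(a[px + c] - b[px + c]);
      if (d > pixelMax) pixelMax = d;
      sumSq += d * d;
    }
    if (pixelMax > maxDiff) maxDiff = pixelMax;
    if (pixelMax > threshold) errorPixels++;
    if (heatmapOut) { // red heatmap scaled by per-pixel max channel diff
      heatmapOut[px] = Math.min(255, pixelMax * 8);
      heatmapOut[px + 1] = 0; heatmapOut[px + 2] = 0; heatmapOut[px + 3] = 255;
    }
  }
  return {
    maxChannelDiff: maxDiff,
    rmse: Math.sqrt(sumSq / a.length),
    errorPixelCount: errorPixels,
  };
}
```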
Tolerances must be explicit and configurable.
Important: WebGL/WebGPU renderers may differ slightly from Canvas2D due to antialiasing and rasterization. The harness should treat image diffs as:
- a strong signal for gross mismatches
- not the sole correctness gate unless we control rendering settings tightly (e.g. disable MSAA, use identical point sprites, lock blending)
Implement:
- trace recording from live UI events
- trace replay with deterministic timing or fixed-step execution
Trace must include:
- pointer events (position, buttons)
- wheel events (delta)
- key modifiers
- mode changes (pan vs lasso)
- resize events
Store traces as JSON:
traces/{geometry}/{name}.json
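A possible trace schema (a sketch; the event kinds mirror the list above, while exact field names are illustrative):

```ts
// Trace schema (sketch). Timestamps are ms from trace start so replay can run
// with deterministic fixed-step timing.
type TraceEvent =
  | { t: number; kind: "pointer"; x: number; y: number; buttons: number }
  | { t: number; kind: "wheel"; x: number; y: number; delta: number }
  | { t: number; kind: "modifiers"; shift: boolean; ctrl: boolean; meta: boolean; alt: boolean }
  | { t: number; kind: "mode"; mode: "pan" | "lasso" }
  | { t: number; kind: "lasso"; polyline: number[] } // flattened [x0,y0, x1,y1, ...]
  | { t: number; kind: "resize"; width: number; height: number };

interface Trace {
  geometry: "euclidean" | "poincare";
  seed: number;              // dataset seed, recorded for determinism
  devicePixelRatio: number;  // recorded per the determinism contract
  initialView: number[];     // serialized ViewState params
  events: TraceEvent[];
}
```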
At a minimum compare:
Selection
- On “lasso end”:
- compare selected indices sets
- save differences (missing/excess indices)
Boundary rule (make explicit): decide whether “point on edge” counts as inside. Whatever we pick, encode it in the reference and document it here.
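The set comparison itself is straightforward (a sketch for comparison/set_diff.ts):

```ts
// Selection set diff (sketch): report indices the candidate is missing and
// indices it selected in excess, relative to the reference.
function diffSelections(reference: Uint32Array, candidate: Uint32Array) {
  const ref = new Set<number>(reference);
  const cand = new Set<number>(candidate);
  const missing: number[] = [];
  const excess: number[] = [];
  for (const i of ref) if (!cand.has(i)) missing.push(i);
  for (const i of cand) if (!ref.has(i)) excess.push(i);
  return { equal: missing.length === 0 && excess.length === 0, missing, excess };
}
```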
Hover
- Sample hover at fixed intervals (e.g. every 50ms during replay):
- compare hovered index
- log mismatches
View state checkpoints
- Compare view state:
- start
- after each pan/zoom completion
- end of trace
To isolate math differences quickly:
- choose a fixed set of “probe points” (e.g. 1,000 indices)
- compare project(point) results (screen x, y):
  - max error
  - p99 error
This is often more diagnostic than pixel diffs alone.
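A sketch of the probe comparison (both implementations expose a project function mapping a point index to screen coordinates; max and p99 correspond to the thresholds defined later):

```ts
// Projection probe comparison (sketch). Both implementations project the same
// fixed probe indices; we compare screen-space error with max and p99.
function compareProjections(
  probeIndices: Uint32Array,
  projectRef: (i: number) => { x: number; y: number },
  projectCand: (i: number) => { x: number; y: number },
) {
  const errors: number[] = [];
  for (const i of probeIndices) {
    const a = projectRef(i), b = projectCand(i);
    errors.push(Math.hypot(a.x - b.x, a.y - b.y));
  }
  errors.sort((p, q) => p - q);
  return {
    maxErrorPx: errors[errors.length - 1],
    p99ErrorPx: errors[Math.floor(0.99 * (errors.length - 1))],
  };
}
```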
Performance metrics to collect:
- frame times: p50/p95/p99
- FPS over windows (optional)
- lasso computation time (start → selection result)
- hover computation time (hitTest cost)
Also record (at least in development runs):
- GPU frame time (if available)
- memory usage snapshots (best-effort; optional)
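Percentile reporting can be a simple nearest-rank computation over the collected frame durations (a sketch for perf/stats.ts):

```ts
// Frame-time percentiles (sketch). Nearest-rank percentile over a sample of
// per-frame durations collected during trace replay.
function percentile(sortedMs: number[], p: number): number {
  const idx = Math.min(sortedMs.length - 1, Math.ceil((p / 100) * sortedMs.length) - 1);
  return sortedMs[Math.max(0, idx)];
}

function summarizeFrameTimes(frameMs: number[]) {
  const sorted = [...frameMs].sort((a, b) => a - b);
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    mean: frameMs.reduce((s, v) => s + v, 0) / frameMs.length,
  };
}
```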
For each dataset size and scenario:
- warm up for a fixed time
- replay trace for fixed duration
- collect metrics
- repeat multiple times
- output JSON summaries
Store artifacts:
artifacts/{geometry}/{dataset}/{impl}/{scenario}/metrics.json
Create standard traces per geometry, recorded once and reused.

Euclidean:
- gentle pan + zoom
- aggressive pan
- lasso small region
- lasso large region
- hover scrub across dense region

Hyperbolic:
- gentle hyperbolic pan
- pan near boundary (stress)
- display zoom in/out cycles
- lasso near center
- lasso near boundary
- hover scrub near boundary
Each scenario must be saved and replayable:
- traces/euclidean/*.json
- traces/hyperbolic/*.json
All thresholds must be configurable and documented.
Minimum required checks:
- selection set equality (exact unless a boundary tolerance rule is defined)
- hover equality (exact under sampling rule, with a documented tie-break policy)
- view state numeric parameters within tolerance
- image diff below thresholds (max diff, error pixel count, etc.)
Recommended default thresholds (edit to taste):
- Projection probe screen error: p99 $\le 0.5$ px, max $\le 2$ px
- View-state numeric tolerance: $\le 10^{-6}$ in model params, or derived screen error $\le 0.5$ px
- Image diff: max channel diff $\le 10$ and error pixels $\le 0.1\%$ (only if rendering settings are controlled)
Performance targets (to be made concrete):
- Define target devices and minimum acceptable interactivity.
- As a starting point, treat these as stretch goals:
- Euclidean: smooth pan/zoom at N=20,000,000
- Hyperbolic: smooth pan/zoom near boundary at N=20,000,000
The harness must report performance numerically so we can compare candidate versions over time.
For each geometry:
- Build and stabilize the reference implementation.
- Generate fixed datasets and record canonical traces.
- Build candidate v1.
- Run verification suite:
- selection diff
- hover diff
- view state diff
- numeric projection diff
- image diff
- If candidate fails:
- use projection probes and diff images to locate mismatches
- correct candidate behavior
- Once candidate passes:
- run performance benchmark suite
- Optimize candidate and repeat verification after each change.
Rule: the candidate should only be considered “better” when it both (a) passes correctness checks and (b) improves performance on the benchmark suite.