Data services refactoring: DataStore, VizServer, unified storage patterns #16

@pskeshu

Background

Spun off from #14 (device layer refactoring). The data services layer needs engineering attention independent of the hardware abstraction work.

Current State

  • DataStore (gently/core/data_store.py): UID-based persistence with multiple backend options (DatabrokerStore, TiledStore)
  • VizServer (gently/visualization/server.py): Serves volumes via HTTP, maintains its own caching
  • ImageManager (gently/agent/image_manager.py): Agent-side data access, bridges DataStore and agent tools

Areas Needing Work

1. DataStore Interface Cleanup

  • Current interface grew organically; some methods are diSPIM-specific
  • Need clear separation between:
    • Core operations (store, retrieve, delete, query)
    • Backend-specific implementations
    • Lineage/provenance tracking
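One possible shape for that separation, as a rough sketch rather than a description of the current code: the four core operations live on an abstract base class, each backend subclasses it, and lineage is tracked by a separate object. The names `CoreStore`, `InMemoryStore`, and `LineageTracker` are hypothetical and do not exist in gently today; the in-memory backend is only there to make the example runnable.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List
import uuid


class CoreStore(ABC):
    """Hypothetical backend-agnostic interface: core operations only."""

    @abstractmethod
    def store(self, data: Any, metadata: Dict[str, Any]) -> str:
        """Persist data and return its UID."""

    @abstractmethod
    def retrieve(self, uid: str) -> Any:
        """Return the data stored under uid."""

    @abstractmethod
    def delete(self, uid: str) -> None:
        """Remove the data stored under uid."""

    @abstractmethod
    def query(self, **filters: Any) -> List[str]:
        """Return UIDs whose metadata matches every filter."""


class InMemoryStore(CoreStore):
    """Toy backend used only to exercise the interface."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._meta: Dict[str, Dict[str, Any]] = {}

    def store(self, data: Any, metadata: Dict[str, Any]) -> str:
        uid = uuid.uuid4().hex
        self._data[uid] = data
        self._meta[uid] = dict(metadata)
        return uid

    def retrieve(self, uid: str) -> Any:
        return self._data[uid]

    def delete(self, uid: str) -> None:
        del self._data[uid]
        del self._meta[uid]

    def query(self, **filters: Any) -> List[str]:
        return [
            uid
            for uid, meta in self._meta.items()
            if all(meta.get(key) == value for key, value in filters.items())
        ]


class LineageTracker:
    """Provenance kept apart from storage: maps a UID to its parent UIDs."""

    def __init__(self) -> None:
        self._parents: Dict[str, List[str]] = {}

    def record(self, child_uid: str, parent_uids: List[str]) -> None:
        self._parents[child_uid] = list(parent_uids)

    def parents_of(self, uid: str) -> List[str]:
        return list(self._parents.get(uid, []))
```

With this split, diSPIM-specific helpers would sit in a layer above `CoreStore` instead of on the interface itself.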

2. Unified Storage Backend

Several storage patterns are currently in use:

  • TIFF files (raw volumes)
  • Zarr (chunked array storage)
  • Databroker (Bluesky event model)
  • In-memory caches

Questions to resolve:

  • Should we standardize on one format for volumes?
  • How do we handle format conversion transparently?
  • What's the right chunking strategy for large volumes?
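To make the chunking question concrete, here is a small back-of-the-envelope helper. It assumes a 200 × 2048 × 2048 volume of 16-bit voxels (the bit depth is an assumption, not stated in this issue), which comes to 1,677,721,600 bytes, about 1.56 GiB. `chunk_stats` is a hypothetical helper that reports how many chunks a layout produces and how many bytes must be read to serve one full XY slice.

```python
from math import ceil, prod
from typing import Dict, Tuple


def chunk_stats(
    shape: Tuple[int, int, int],
    chunks: Tuple[int, int, int],
    itemsize: int = 2,  # assumption: 16-bit voxels
) -> Dict[str, float]:
    """Summarize a (z, y, x) chunk layout for a volume of the given shape."""
    n_chunks = prod(ceil(s / c) for s, c in zip(shape, chunks))
    chunk_bytes = prod(chunks) * itemsize
    # A single XY slice intersects one chunk layer along z,
    # so only the y and x chunk counts matter.
    touched = prod(ceil(s / c) for s, c in zip(shape[1:], chunks[1:]))
    slice_bytes = shape[1] * shape[2] * itemsize
    read_bytes = touched * chunk_bytes
    return {
        "n_chunks": n_chunks,
        "chunk_bytes": chunk_bytes,
        "chunks_per_slice": touched,
        "slice_read_bytes": read_bytes,
        "read_amplification": read_bytes / slice_bytes,
    }


if __name__ == "__main__":
    shape = (200, 2048, 2048)
    for chunks in [(1, 2048, 2048), (16, 256, 256), (200, 2048, 2048)]:
        print(chunks, chunk_stats(shape, chunks))
```

Under these assumptions, one-slice-per-chunk reads exactly the 8 MiB a slice needs, while 16 × 256 × 256 chunks read 16 times that for a single slice but are better suited to sub-volume access. Which trade-off is right depends on whether VizServer slice requests or agent-side 3D analysis dominates.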

3. Streaming Access Patterns

For large volumes (200+ slices × 2048 × 2048):

  • Current: Load entire volume into memory
  • Target: Stream slices on demand, memory-map when possible
  • Affects: VizServer slice endpoints, agent analysis tools
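A minimal illustration of the target behavior, using only the standard library: the file is memory-mapped and only the bytes of the requested slice are copied out. It assumes an uncompressed, headerless raw file in C order (z, y, x) with 16-bit voxels; real TIFF or Zarr data would go through their own readers. `write_raw_volume` and `read_slice` are hypothetical helpers, not existing gently functions.

```python
import mmap
import os
import tempfile
from array import array


def write_raw_volume(path: str, depth: int, height: int, width: int) -> None:
    """Write a toy uint16 volume in (z, y, x) order; every voxel in slice z equals z."""
    with open(path, "wb") as f:
        for z in range(depth):
            f.write(array("H", [z] * (height * width)).tobytes())


def read_slice(path: str, z: int, height: int, width: int, itemsize: int = 2) -> bytes:
    """Return the raw bytes of slice z without reading the rest of the file."""
    slice_bytes = height * width * itemsize
    start = z * slice_bytes
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        if z < 0 or start + slice_bytes > len(mm):
            raise IndexError(f"slice {z} is out of range")
        # Slicing the map copies only this slice's pages into a bytes object.
        return mm[start:start + slice_bytes]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "volume.raw")
        write_raw_volume(path, depth=4, height=8, width=8)
        plane = array("H", read_slice(path, z=2, height=8, width=8))
        print(len(plane), set(plane))
```

A VizServer slice endpoint built this way would hold at most one slice in process memory per request instead of the whole volume.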

4. Garbage Collection / Retention Policies

  • When should old data be cleaned up?
  • Per-session retention vs. global policies
  • User-configurable cleanup (max age, max size, keep N per session)
  • Crash recovery: reconcile index with actual files on disk
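The three user-configurable knobs (max age, max size, keep N per session) could be combined roughly as follows. This is a sketch under assumed semantics: rules are applied in the order age, per-session count, total size, and the size rule evicts the oldest surviving entries first. `Entry`, `RetentionPolicy`, and `select_for_deletion` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Entry:
    uid: str
    session: str
    created: float  # Unix timestamp, seconds
    size: int       # bytes on disk


@dataclass
class RetentionPolicy:
    max_age_s: Optional[float] = None
    max_total_bytes: Optional[int] = None
    keep_per_session: Optional[int] = None


def select_for_deletion(
    entries: Iterable[Entry], policy: RetentionPolicy, now: float
) -> List[str]:
    """Return the UIDs the policy would delete; nothing is removed here."""
    entries = list(entries)
    doomed: Set[str] = set()

    # Rule 1: drop anything older than max_age_s.
    if policy.max_age_s is not None:
        doomed |= {e.uid for e in entries if now - e.created > policy.max_age_s}

    # Rule 2: within each session keep only the N newest entries.
    if policy.keep_per_session is not None:
        per_session: Dict[str, List[Entry]] = defaultdict(list)
        for e in entries:
            per_session[e.session].append(e)
        for group in per_session.values():
            group.sort(key=lambda e: e.created, reverse=True)
            doomed |= {e.uid for e in group[policy.keep_per_session:]}

    # Rule 3: evict oldest survivors until the total fits the byte budget.
    if policy.max_total_bytes is not None:
        survivors = sorted(
            (e for e in entries if e.uid not in doomed), key=lambda e: e.created
        )
        total = sum(e.size for e in survivors)
        for e in survivors:
            if total <= policy.max_total_bytes:
                break
            doomed.add(e.uid)
            total -= e.size

    return sorted(doomed)
```

Returning the candidate UIDs instead of deleting directly keeps the policy testable and lets a global policy and a per-session policy be evaluated separately before anything is removed.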

Relationship to #14

The SharedMemoryPool from #14 will become a key component here:

  • Pool handles hot data (recently acquired volumes)
  • DataStore indexes all data (hot and cold)
  • VizServer accesses through unified interface

This issue focuses on the DataStore/VizServer side of that integration.
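The hot/cold split might look like this from the caller's side. The SharedMemoryPool API is defined in #14 and is not reproduced here, so a plain dict stands in for the pool and a callable stands in for the cold-storage loader; `TieredVolumeAccess` is a hypothetical name.

```python
from typing import Callable, Dict, Tuple


class TieredVolumeAccess:
    """Look in the hot tier first, then fall back to cold storage."""

    def __init__(self, hot: Dict[str, bytes], load_cold: Callable[[str], bytes]) -> None:
        self._hot = hot              # stand-in for the shared-memory pool
        self._load_cold = load_cold  # stand-in for a DataStore read from disk

    def get(self, uid: str) -> Tuple[bytes, str]:
        """Return (data, tier) so callers and tests can see which tier served it."""
        if uid in self._hot:
            return self._hot[uid], "hot"
        return self._load_cold(uid), "cold"


if __name__ == "__main__":
    cold_files = {"vol-2": b"older volume"}
    access = TieredVolumeAccess({"vol-1": b"fresh volume"}, cold_files.__getitem__)
    print(access.get("vol-1"))
    print(access.get("vol-2"))
```

VizServer would then call a single `get` and never need to know whether a volume is still in the pool.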

Proposed Tasks

  • Audit current DataStore interface, document what's generic vs. hardware-specific
  • Design unified volume access API (works whether data is in memory, on disk, or remote)
  • Implement streaming/memory-mapped access for large volumes
  • Add configurable retention policies
  • Update VizServer to use new data access patterns
  • Add crash recovery (index reconciliation on startup)
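For the last task, index reconciliation on startup could begin with a comparison like the one below. It assumes the index maps each UID to a file name inside a single flat data directory, which may not match the real layout; `reconcile` is a hypothetical helper, and what to do with the results (drop index rows, re-index orphans, or delete them) is left open.

```python
from pathlib import Path
from typing import Dict, Set, Tuple


def reconcile(index: Dict[str, str], data_dir: Path) -> Tuple[Set[str], Set[Path]]:
    """Compare the index against the directory contents.

    Returns (missing, orphans):
      missing: UIDs whose file is no longer on disk
      orphans: files on disk that no index entry refers to
    """
    on_disk = {p for p in data_dir.iterdir() if p.is_file()}
    indexed = {data_dir / name: uid for uid, name in index.items()}
    missing = {uid for path, uid in indexed.items() if path not in on_disk}
    orphans = on_disk - set(indexed)
    return missing, orphans
```

A crash between writing a file and updating the index shows up as an orphan; a crash between deleting a file and updating the index shows up as a missing UID.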

Files Involved

File                              Role
gently/core/data_store.py         Primary data persistence
gently/visualization/server.py    Volume serving
gently/agent/image_manager.py     Agent data access
gently/core/memory_pool.py        SharedMemoryPool (from #14)

cc @subindevs @pskeshu
