CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

toile is a Python package for working with astrocyte dynamics data. It processes TIFF image stacks (particularly OME-TIFF format) from microscopy recordings and exports them to WebDataset format for machine learning workflows.

Development Commands

Package Management

This project uses uv for dependency management:

# Install dependencies (including dev dependencies)
uv sync --all-extras --dev

# Install package in editable mode
uv pip install -e .

Running Tests

# Run all tests
uv run pytest

# Run tests with coverage
uv run pytest --cov

Building and Publishing

# Build distribution
uv build

# Publish to PyPI (requires UV_PUBLISH_TOKEN)
uv publish

Running the CLI

# As a module
python -m toile

# Using the installed command
toile

# Export TIFF frames to WebDataset format
toile export frames <input> <output> [--stem <name>] [--uint8] [--verbose]

Architecture

Core Data Flow

TIFF Import (tiff_import.py): Loads TIFF stacks from microscopy recordings
- Supports OME-TIFF metadata extraction using tifffile and xmltodict
- Parses frame-level metadata (position, timing, UUIDs)
- Handles multi-channel recordings
- Optional uint8 normalization for ML pipelines
Schema Definitions (schema.py): Data structures built on atdata (a PackableSample framework)
- Movie: Container for full TIFF stacks with metadata
- Frame: Individual image frames with metadata
- SliceRecordingFrame: Experimental session frames with mouse/slice identifiers
- ImageSample: Simplified image data for ML
- Uses atdata.lens for data transformations between representations
Export Pipeline (export.py): Converts TIFF data to WebDataset format
- Processes individual TIFF files or batch configs (YAML)
- Writes to sharded tar archives using webdataset library
- Configurable shard sizes (default 850MB, or 38MB for PDS/Bluesky compatibility)
- Supports glob patterns for batch processing

CLI Structure

The CLI uses Typer for command routing:

Main app in __init__.py with subcommands
export subcommand group in export.py
- export frames: Convert TIFFs to per-frame WebDataset archives
- export test-frames: Generate synthetic test datasets

Key Dependencies

atdata: Data structure framework for packable samples
webdataset: Efficient dataset format for ML pipelines
scikit-image & tifffile: TIFF file handling
typer: CLI framework
xmltodict: OME metadata parsing

Filename Parsing

The tiff_import.py module includes a flexible filename parsing system:

_make_filename_parser(): Creates custom parsers from template + transforms
Built-in transforms: identity, float, int, split_age_sex, date_compact
Used to extract experimental metadata from file naming conventions

Metadata Collation

OME-TIFF metadata is extracted and normalized:

_collate_frame_metadata(): Per-frame position, timing, UUID data
_collate_metadata(): Image-level acquisition date, physical scales, channel info
Metadata flows through Movie → Frame → WebDataset samples

Testing

Tests are located in the tests/ directory (currently empty). When adding tests:

Use pytest as the test framework
Test coverage is enabled with pytest-cov
CI runs tests on push to main and release branches via GitHub Actions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CLAUDE.md

Project Overview

Development Commands

Package Management

Running Tests

Building and Publishing

Running the CLI

Architecture

Core Data Flow

CLI Structure

Key Dependencies

Filename Parsing

Metadata Collation

Testing

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

CLAUDE.md

Project Overview

Development Commands

Package Management

Running Tests

Building and Publishing

Running the CLI

Architecture

Core Data Flow

CLI Structure

Key Dependencies

Filename Parsing

Metadata Collation

Testing