Releases: forecast-bio/atdata

v0.7.0b1

26 Feb 18:01

v0.7.0b1 Pre-release

[0.7.0b1] - 2026-02-26

Added

  • Client-side AppView integration: Atmosphere client now supports XRPC queries (xrpc_query()) and procedures (xrpc_procedure()) routed through a configurable AppView service. Schema, lens, label, and record loaders automatically use AppView for listing, search, and resolution when available, falling back to client-side pagination otherwise. New has_appview property and AppViewRequiredError/AppViewUnavailableError exceptions for clean error handling (GH#74)
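
The "prefer AppView, fall back to client-side listing" behavior in the entry above might look roughly like the following sketch. It is illustrative only: FakeClient, list_schemas(), list_records_client_side(), the "example.listSchemas" NSID, and every signature shown are assumptions made for the example. Only the names has_appview, xrpc_query, and AppViewUnavailableError come from the release notes.

```python
class AppViewUnavailableError(Exception):
    """Stand-in for the exception named in the release notes."""


class FakeClient:
    """Hypothetical client; the real Atmosphere client's signatures may differ."""

    def __init__(self, appview_up):
        # None means no AppView is configured; True/False means configured
        # and currently reachable/unreachable.
        self._appview_up = appview_up
        self._records = [{"name": "ImageSample"}, {"name": "TextSample"}]

    @property
    def has_appview(self):
        return self._appview_up is not None

    def xrpc_query(self, nsid, params=None):
        if not self._appview_up:
            raise AppViewUnavailableError(nsid)
        return {"schemas": list(self._records)}

    def list_records_client_side(self):
        # Placeholder for paginating repo records on the client.
        return list(self._records)


def list_schemas(client):
    """Prefer the AppView; fall back to client-side listing."""
    if client.has_appview:
        try:
            return client.xrpc_query("example.listSchemas")["schemas"], "appview"
        except AppViewUnavailableError:
            pass  # fall through to the client-side path
    return client.list_records_client_side(), "client"


via_appview = list_schemas(FakeClient(appview_up=True))
via_fallback = list_schemas(FakeClient(appview_up=False))
unconfigured = list_schemas(FakeClient(appview_up=None))
```

Returning the source alongside the result is a choice made for this sketch so that the path taken is visible to the caller.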

Changed

  • AppView-aware loaders/publishers: SchemaLoader, LabelLoader, DatasetLoader, LensLoader and their corresponding publishers now prefer AppView XRPC endpoints when configured, with automatic graceful fallback to client-side com.atproto.repo workarounds (GH#50)
  • load_dataset() atmosphere parameter: New optional atmosphere kwarg passes an Atmosphere client (and its AppView) through to AT URI resolution (GH#50)
  • AbstractIndex deprecation: AbstractIndex protocol is deprecated in favor of direct Index usage; a backward-compatible __getattr__ shim emits DeprecationWarning on import (GH#40)
  • Unified search API: New SearchBackend protocol with LocalSearchBackend and AppViewSearchBackend implementations, SearchAggregator for multi-backend queries, and Index.search() integration (GH#33)
  • Lens verification workflow: New VerificationPublisher/VerificationLoader for science.alt.dataset.lensVerification records, with LexCodeHash and LexLensVerification Python types (GH#34)
  • Lens schema version compatibility: LensPublisher.publish() now accepts source_schema_version and target_schema_version parameters; LexLensRecord updated with corresponding fields (GH#34)
  • Array format support: New serialization helpers for sparse matrices (scipy), structured arrays, Arrow tensors (pyarrow), safetensors, and DataFrames (pandas/Parquet). Codegen and pipeline updated to recognize new shim $ref types. Optional dependency groups added to pyproject.toml (GH#76)
  • NDArray v1.1.0 annotations: Schema codegen supports optional dtype, shape, and dimensionNames annotation fields from the v1.1.0 ndarray shim (GH#76)
  • Upstream lexicon sync: Added lensVerification.json, verificationMethod.json, programmingLanguage.json, and updated arrayFormat.json/lens.json from forecast-bio/atdata-lexicon
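
For readers unfamiliar with the technique behind the AbstractIndex deprecation entry above, a module-level __getattr__ shim (PEP 562) can be sketched as follows. The module name demo_pkg, the warning text, and the choice to return Index for the old name are assumptions for illustration and may not match what atdata does.

```python
import types
import warnings


class Index:
    """Stand-in for the concrete index class."""


def _module_getattr(name):
    # Called only when normal attribute lookup on the module fails.
    if name == "AbstractIndex":
        warnings.warn(
            "AbstractIndex is deprecated; use Index directly",
            DeprecationWarning,
            stacklevel=2,
        )
        return Index
    raise AttributeError(f"module 'demo_pkg' has no attribute {name!r}")


# Build a throwaway module so the example is self-contained. In a real
# package, __getattr__ would simply be defined at module top level.
demo_pkg = types.ModuleType("demo_pkg")
demo_pkg.Index = Index
demo_pkg.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolved = demo_pkg.AbstractIndex
```

Because the hook fires only for names that are missing from the module, accessing Index itself triggers no warning.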

Fixed

  • Blob URL parameter bug: Fixed incorrect parameter passing in blob URL construction within atmosphere record publishing
  • Fallback logging: Improved diagnostic logging when AppView is unavailable and client-side fallback is used

v0.6.0b1

22 Feb 22:50

v0.6.0b1 Pre-release

Added

  • Dataset manifest support: Optional manifests property on dataset entry records for per-shard metadata references, with ShardManifestRef and LensCodeRef Python mirror types (GH#62)
  • Handle-based schema resolution: get_schema() and get_schema_type() now accept @handle/TypeName@version format, resolving schemas by handle + name + optional semver instead of requiring raw AT-URIs (GH#61)
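
The @handle/TypeName@version format described above can be split into its parts with a few string operations. This is a minimal sketch, not the library's parser: SchemaRef and parse_schema_ref() are hypothetical names, and the validation rules are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SchemaRef:
    handle: str
    name: str
    version: Optional[str] = None  # None when no version was given


def parse_schema_ref(ref: str) -> SchemaRef:
    """Split '@handle/TypeName@version' into handle, name, and version."""
    if not ref.startswith("@"):
        raise ValueError(f"expected a leading '@': {ref!r}")
    handle, sep, rest = ref[1:].partition("/")
    if not sep or not handle or not rest:
        raise ValueError(f"expected '@handle/TypeName': {ref!r}")
    name, _, version = rest.partition("@")
    if not name:
        raise ValueError(f"missing type name: {ref!r}")
    return SchemaRef(handle, name, version or None)


pinned = parse_schema_ref("@foundation.ac/ImageSample@1.0.0")
unpinned = parse_schema_ref("@foundation.ac/ImageSample")
```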

Changed

  • Namespace rename: Lexicon namespace renamed from ac.foundation.dataset to science.alt.dataset across all source, tests, and documentation. Lexicon JSON files vendored from forecast-bio/atdata-lexicon with NSID-to-path directory structure. Lexicon loader updated to resolve NSIDs via path traversal. Added label and resolveLabel to LEXICON_IDS (GH#71)
  • Lexicon record → entry rename: The dataset record lexicon is renamed from ac.foundation.dataset.record to ac.foundation.dataset.entry throughout the codebase — lexicon files, Python types, collection constants, tests, and documentation (GH#63)
  • Schema version field rename: $atdataSchemaVersion renamed to atdataSchemaVersion (no $ prefix) to follow ATProto naming conventions for non-reserved properties (GH#65)
  • DID resolution refactor: Extracted Atmosphere.resolve_did() as a public method, deduplicating handle-to-DID resolution across schema, label, and record loaders
  • CI: Redis and Postgres container images pulled from AWS ECR Public Gallery (public.ecr.aws/docker/library/) instead of Docker Hub to avoid rate limits; replaced supercharge/redis-github-action with native service containers

Fixed

  • Manifest serialization: The manifests field now uses a truthiness check, consistent with the tags field pattern, preventing empty lists from being serialized as None

v0.5.1b1

17 Feb 03:10

v0.5.1b1 Pre-release

Fixed

  • Cross-account record reads: get_record() and list_records() now route reads for foreign DIDs through the public AppView instead of the authenticated PDS, fixing RecordNotFound errors when fetching records published by other users (e.g. schemas from foundation.ac while logged in as maxine.science)
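
The routing rule in this fix can be summarized as "read your own repo from your PDS, read everyone else's from the AppView". A minimal sketch, assuming a hypothetical helper pick_read_host() and placeholder URLs (neither appears in atdata):

```python
def pick_read_host(repo_did, session_did, pds_url, appview_url):
    """Return the host to read repo_did's records from."""
    # Records in the logged-in account's own repo live on its PDS.
    if session_did is not None and repo_did == session_did:
        return pds_url
    # Foreign DIDs (or unauthenticated reads) go through the AppView.
    return appview_url


PDS = "https://pds.example"          # hypothetical authenticated PDS
APPVIEW = "https://appview.example"  # hypothetical public AppView

own = pick_read_host("did:plc:me", "did:plc:me", PDS, APPVIEW)
foreign = pick_read_host("did:plc:other", "did:plc:me", PDS, APPVIEW)
anonymous = pick_read_host("did:plc:other", None, PDS, APPVIEW)
```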

v0.5.0b1

08 Feb 00:24

v0.5.0b1 Pre-release

[0.5.0b1] - 2026-02-07

Added

  • Typed content metadata: Datasets can now carry a metadata schema describing dataset-level properties (description, license, creation date) alongside the sample schema (#58)
  • Atmosphere label integration: Index.insert_dataset() publishes a label record alongside the dataset record, enabling @handle/name path resolution through get_label() and get_dataset() (#59)
  • E2E lens integration tests: Comprehensive tests for the lens publish → retrieve → execute lifecycle against a real ATProto PDS (#55)
  • E2E session management tests: Integration tests covering ATProto login, session export/import round-trip, unauthenticated reads, and error handling (#57)
  • XRPC query workaround docs: Documented client-side list_records() + filter patterns used as temporary workarounds pending AppView support (#56)

Changed

  • Schema lexicon rename: getLatestSchema renamed to resolveSchema to match resolveLabel semantics; added $atdataSchemaVersion property for format versioning (#53)
  • Schema API consolidation: decode_schema, load_schema, decode_schema_as, and schema_to_type consolidated into get_schema_type() with deprecation shims for old names (#54)
  • Build config cleanup: .chainlink/ and .claude/ directories excluded from sdist builds (#54)

Fixed

  • Lens discovery pagination: find_by_schemas() now pages through results with limit=100 and a cursor instead of requesting more than ATProto's list_records cap of 100 records per call
  • Truncated session test: Rewrote test_truncated_session_string_raises to handle the atproto SDK silently accepting truncated sessions via its 4-field backward compat path (MarshalX/atproto#656)
  • Exception chaining: Proper raise ... from exception chaining and best-effort label publish error handling
  • Content metadata validation: Metadata validated before writing files; dead code removed
  • Chainlink db protection: Added git hooks (.githooks/) to prevent worktree symlinks from overwriting the issues database on merge
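
The cursor-based paging mentioned in the lens discovery fix above follows a common pattern: request one page at the capped size, then keep following the returned cursor until none is returned. The sketch below uses a fake in-memory endpoint; fake_list_records() and list_all() are hypothetical names, and the real SDK call and its return shape may differ.

```python
def fake_list_records(records, limit=50, cursor=None):
    """Stand-in for a paged listing endpoint that caps limit at 100."""
    if limit > 100:
        raise ValueError("limit must be <= 100")
    start = int(cursor) if cursor else 0
    page = records[start:start + limit]
    next_start = start + len(page)
    # A cursor is returned only while more records remain.
    next_cursor = str(next_start) if next_start < len(records) else None
    return page, next_cursor


def list_all(records, page_size=100):
    """Collect every record by following the cursor until it runs out."""
    collected = []
    cursor = None
    while True:
        page, cursor = fake_list_records(records, limit=page_size, cursor=cursor)
        collected.extend(page)
        if cursor is None:
            return collected


all_records = list_all(list(range(250)))
```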

v0.4.1b2

05 Feb 22:20

v0.4.1b2 Pre-release

Fixed

  • Atmosphere schema codec: _convert_atmosphere_schema() now handles the flattened ATProto wire format where properties/required are at the top level of the schema dict, not nested under a schemaBody key. Previously, schemas fetched from a PDS produced types with zero fields.
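
To make the two layouts concrete, the sketch below reads properties/required from either a nested schemaBody or the flattened top level. schema_properties() and the sample dicts are hypothetical; the actual _convert_atmosphere_schema() logic is not shown in these notes and may differ.

```python
def schema_properties(schema):
    """Return (properties, required) for either schema layout."""
    body = schema.get("schemaBody")
    # Use the nested body when present, otherwise read the top level.
    source = body if isinstance(body, dict) else schema
    return source.get("properties", {}), source.get("required", [])


nested = {
    "name": "Sample",
    "schemaBody": {
        "properties": {"x": {"type": "integer"}},
        "required": ["x"],
    },
}
flattened = {
    "name": "Sample",
    "properties": {"x": {"type": "integer"}},
    "required": ["x"],
}

nested_fields = schema_properties(nested)
flattened_fields = schema_properties(flattened)
```

A reader that only looks under schemaBody would see no properties in the flattened dict, which matches the "zero fields" symptom described above.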

Full Changelog: https://github.com/forecast-bio/atdata/blob/main/CHANGELOG.md

v0.4.1b1

05 Feb 20:38

v0.4.1b1 Pre-release

Fixed

  • Atmosphere schema routing: get_schema() / get_schema_record() now handle at:// URI refs by delegating to the atmosphere backend instead of raising ValueError
  • Blob URL resolution: AtmosphereIndexEntry.data_urls resolves storageBlobs CIDs to PDS HTTP URLs via plc.directory (with caching), replacing the empty-list placeholder
  • Indexed path routing: load_dataset('@handle/dataset') now passes the full @handle/name path through to Index._resolve_prefix() so atmosphere routing works correctly
  • Atmosphere schema codec: schema_to_type() handles atmosphere JSON Schema format by converting to local field format before type reconstruction
  • Structural lens fallback: Dataset.as_type() falls back to structural field mapping when no registered lens exists between structurally compatible types (e.g. dynamic vs user-defined classes)
  • Blob shard checksums: SHA-256 digests from PDSBlobStore are now attached per-blob in BlobEntry.checksum instead of being buried in metadata.custom
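
The structural lens fallback above can be pictured as "same field names and types, so copy field by field". A minimal sketch using plain dataclasses; structurally_compatible(), as_type(), and the sample classes are hypothetical, and atdata's own compatibility rules may be stricter or looser.

```python
from dataclasses import dataclass, fields


def structurally_compatible(source_type, target_type):
    """True when both dataclasses declare the same field names and types."""
    def signature(cls):
        return {f.name: f.type for f in fields(cls)}
    return signature(source_type) == signature(target_type)


def as_type(sample, target_type):
    """Rebuild sample as target_type by copying matching fields."""
    if not structurally_compatible(type(sample), target_type):
        raise TypeError(
            f"{type(sample).__name__} is not structurally compatible "
            f"with {target_type.__name__}"
        )
    values = {f.name: getattr(sample, f.name) for f in fields(sample)}
    return target_type(**values)


@dataclass
class DynamicSample:  # e.g. a type reconstructed from a stored schema
    label: str
    score: float


@dataclass
class UserSample:  # e.g. a user-defined class with the same shape
    label: str
    score: float


@dataclass
class OtherSample:
    label: str


converted = as_type(DynamicSample("cat", 0.9), UserSample)
```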

Changed

  • Checksum extraction: Deduplicated checksum extraction logic into _extract_blob_checksums() helper with warning on count mismatch
  • Test nomenclature: Renamed test_integration_*.py to test_workflow_*.py across the test suite
  • CI workflow triggers: Updated workflow file references to match renamed test files

v0.4.0b2

05 Feb 04:25

v0.4.0b2 Pre-release

Added

  • Redis live integration tests: 16 tests covering entry CRUD, schema operations, label operations, lens operations, and concurrent access against a real Redis 7 service container
  • Redis service in CI: Integration workflow now includes a Redis 7 service container alongside PostgreSQL and MinIO

Full Changelog: https://github.com/forecast-bio/atdata/blob/main/CHANGELOG.md

v0.3.4b1

04 Feb 22:11
5c9cde8

v0.3.4b1 Pre-release

[0.3.4b1] - 2026-02-04

Added

  • Content checksums: Per-shard SHA-256 digests computed at write time across all storage backends (LocalDiskStore, S3DataStore, PDSBlobStore). Checksums are carried via ShardWriteResult and automatically merged into index entry metadata
  • verify_checksums(): Utility function to verify stored checksums against shard files on disk; remote URLs (s3://, at://, http://) are gracefully skipped
  • atdata verify CLI command: Verify content integrity of indexed datasets from the command line
  • AT URI support in load_dataset(): load_dataset("at://did:plc:abc/.../rkey") now fetches dataset records from ATProto and resolves storage (blobs, HTTP, S3) into streamable datasets with automatic schema decoding
  • Lens composition operators: @ (compose) and | (pipe) operators for chaining lenses, plus identity_lens() factory for pass-through transforms
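
The content checksum entries above describe verifying stored SHA-256 digests against shard files while skipping remote URLs. The sketch below shows that idea with the standard library only. sha256_of(), verify(), and the result strings are hypothetical and are not the signature of verify_checksums(); treating https:// as remote is also an assumption beyond the prefixes listed above.

```python
import hashlib
import os
import tempfile

# s3://, at://, and http:// are listed in the notes; https:// is assumed.
REMOTE_PREFIXES = ("s3://", "at://", "http://", "https://")


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large shards are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(checksums):
    """Map each shard location to 'ok', 'mismatch', or 'skipped'."""
    results = {}
    for location, expected in checksums.items():
        if location.startswith(REMOTE_PREFIXES):
            results[location] = "skipped"
        elif sha256_of(location) == expected:
            results[location] = "ok"
        else:
            results[location] = "mismatch"
    return results


with tempfile.TemporaryDirectory() as tmp:
    shard = os.path.join(tmp, "shard-000000.tar")
    with open(shard, "wb") as handle:
        handle.write(b"example shard bytes")
    remote = "s3://bucket/shard-000001.tar"
    stored = {shard: sha256_of(shard), remote: "not-checked"}
    report = verify(stored)
    # Simulate corruption after the digest was recorded.
    with open(shard, "ab") as handle:
        handle.write(b"!")
    report_after_change = verify(stored)
```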

v0.3.3b2

04 Feb 20:52
16d267d

v0.3.3b2 Pre-release

[0.3.3b2] - 2026-02-04

Testing

  • Coverage improvements (92% → 94%): 61 new tests across atmosphere client (swap_commit, model conversion fallbacks), DatasetLoader (HTTP/S3 storage paths, get_typed, to_dataset, checksum validation), DatasetPublisher (publish_with_s3), Redis/Postgres provider label CRUD, Redis schema edge cases (bytes decoding, legacy format), and lexicon loading/validation

Fixed

  • CI: Use cp -f in bench workflow to avoid interactive prompt on file overwrite

v0.3.3b1

04 Feb 20:34
73c404f

v0.3.3b1 Pre-release

[0.3.3b1] - 2026-02-04

Added

  • Dataset labels: Named, versioned pointers to dataset records — separating identity (CID-addressed) from naming (mutable labels). store_label(), get_label(), list_labels(), delete_label() across all index providers (SQLite, Redis, PostgreSQL)
  • Atmosphere label records: LabelPublisher and LabelLoader for publishing and resolving ac.foundation.dataset.label records on ATProto PDS, with ac.foundation.dataset.resolveLabel query lexicon
  • Label-aware load_dataset(): Path resolution now tries label lookup before falling back to dataset name, enabling load_dataset("@local/mnist") to resolve through labels
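
The separation of identity from naming described above can be illustrated with two dictionaries: immutable records keyed by content ID, and mutable labels that point at them. This is a toy model under stated assumptions. set_label(), resolve_path(), and the "cid-..." keys are hypothetical and do not reflect the provider APIs or real CID values.

```python
# Records are addressed by content ID and never change once stored.
records = {
    "cid-aaa": {"name": "mnist", "shards": 4},
    "cid-bbb": {"name": "mnist", "shards": 8},
}
# Labels are mutable (name, version) -> CID pointers.
labels = {}


def set_label(name, cid, version=None):
    """Point a label at a record, replacing any earlier target."""
    if cid not in records:
        raise KeyError(cid)
    labels[(name, version)] = cid


def resolve_path(name, version=None):
    """Try label lookup first, then fall back to the dataset name."""
    cid = labels.get((name, version))
    if cid is not None:
        return records[cid]
    for record in records.values():
        if record["name"] == name:
            return record
    raise KeyError(name)


set_label("mnist-latest", "cid-aaa")
first = resolve_path("mnist-latest")
set_label("mnist-latest", "cid-bbb")  # re-point the label; records untouched
second = resolve_path("mnist-latest")
by_name = resolve_path("mnist")       # no such label, so match by name
```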

Changed

  • Git flow: Adopted standard git flow branching model — develop as integration branch, feature/* from develop, release/* cut from develop. Updated /release, /feature, /publish, and /featree skills accordingly
  • Worktree chainlink sharing: /featree now symlinks .chainlink/issues.db to the base clone's copy so all worktrees share a single authoritative issue database