Releases: forecast-bio/atdata

v0.3.2b3

04 Feb 19:12
e908ab5

v0.3.2b3 Pre-release

Fixed

  • Atmosphere.upload_blob() TypeError: The timeout heuristic passed timeout= to Client.upload_blob(), which accepts only (data: bytes). Switched to the namespace method com.atproto.repo.upload_blob(), which forwards kwargs through to httpx.
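
The failure mode in this fix can be reproduced in miniature. The sketch below uses two hypothetical stand-in classes (HighLevelClient and NamespaceMethods are not the atproto SDK, and their return values are invented for illustration): one method accepts only the payload, the other forwards extra keyword arguments.

```python
class HighLevelClient:
    """Hypothetical stand-in: a convenience method that accepts only the payload."""

    def upload_blob(self, data: bytes) -> dict:
        return {"size": len(data)}


class NamespaceMethods:
    """Hypothetical stand-in: a lower-level method that forwards extra kwargs."""

    def upload_blob(self, data: bytes, **kwargs) -> dict:
        # In a real SDK the kwargs would travel on to the HTTP layer.
        return {"size": len(data), "forwarded": kwargs}


def upload_with_timeout(methods: NamespaceMethods, data: bytes, timeout: float) -> dict:
    """Route the timeout through the method that is able to accept it."""
    return methods.upload_blob(data, timeout=timeout)


# Passing timeout= to the strict method raises TypeError at call time.
try:
    HighLevelClient().upload_blob(b"abc", timeout=30.0)
    strict_call_failed = False
except TypeError:
    strict_call_failed = True

result = upload_with_timeout(NamespaceMethods(), b"abc", timeout=30.0)
```

The strict call fails before any work is done, which is why this kind of bug only surfaces when the timeout code path actually runs.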

Testing

  • ATProto SDK signature compatibility tests: New test_atproto_compat.py with 7 tests that instantiate a real atproto Client (with ClientRaw._invoke patched) to validate method signatures without network I/O. Covers upload_blob, create_record, list_records, get_record, delete_record, and export_session
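
A lighter-weight relative of these tests, shown here only as an illustration and not as the approach used in test_atproto_compat.py, is to ask Python whether a callable's signature can bind a given set of arguments. The sketch assumes a hypothetical FakeClient; accepts_call is a helper name introduced for this example.

```python
import inspect


def accepts_call(func, *args, **kwargs) -> bool:
    """Return True if func's signature can bind the given arguments."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError:
        return False
    return True


class FakeClient:
    """Hypothetical client used only to demonstrate the check."""

    def upload_blob(self, data: bytes) -> None:
        raise RuntimeError("never called: the check only inspects the signature")


client = FakeClient()
ok_positional = accepts_call(client.upload_blob, b"abc")
ok_timeout = accepts_call(client.upload_blob, b"abc", timeout=30.0)
```

Because bind() never invokes the function, the check needs no network I/O, but it also cannot catch problems inside the method body, which is one reason to prefer calling a real client with its transport patched.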

v0.3.2b2

04 Feb 08:38

v0.3.2b2 Pre-release

Added

  • Lexicon-mirror type system: StorageHttp, StorageS3, StorageBlobs, BlobEntry, ShardChecksum dataclasses that mirror ATProto lexicon definitions, with storage_from_record() union deserializer
  • ShardUploadResult: Typed return from PDSBlobStore.write_shards() carrying both AT URIs and blob ref dicts
  • Lexicon reference docs: Auto-generated documentation page for the ac.foundation.dataset.* namespace
  • Example docs: dataset-profiler, lens-graph, and query-cookbook with plots and interactive tabsets
  • Typed proxy DSL for manifest queries (foundation-ac #43)
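
A union deserializer of the kind described for storage_from_record() can be sketched as a dispatch on the record's "$type" field. Everything below is an assumption for illustration: the class names, field names, and "example.*" type strings are hypothetical and are not the atdata definitions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class HttpStorage:
    """Hypothetical variant: shards addressed by plain URLs."""
    urls: tuple


@dataclass(frozen=True)
class BlobStorage:
    """Hypothetical variant: shards addressed by embedded blob refs."""
    blob_refs: tuple


Storage = Union[HttpStorage, BlobStorage]

# Map each (hypothetical) "$type" tag to a function that builds the dataclass.
_DECODERS = {
    "example.storageHttp": lambda r: HttpStorage(urls=tuple(r["urls"])),
    "example.storageBlobs": lambda r: BlobStorage(blob_refs=tuple(r["blobs"])),
}


def storage_from_dict(record: dict) -> Storage:
    """Pick the dataclass that matches the record's "$type" discriminator."""
    type_tag = record.get("$type")
    try:
        decode = _DECODERS[type_tag]
    except KeyError:
        raise ValueError(f"unknown storage type: {type_tag!r}") from None
    return decode(record)


http = storage_from_dict(
    {"$type": "example.storageHttp", "urls": ["https://example.com/shard-000.tar"]}
)
```

Keeping the dispatch table in one place means an unrecognized record type fails loudly instead of being decoded as the wrong variant.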

Changed

  • DatasetPublisher refactored: Extracted _create_record() helper, fixing a bug where publish() used dataset.url instead of dataset.list_shards() for multi-shard datasets
  • PDSBlobStore.write_shards() returns ShardUploadResult instead of using a _last_blob_refs side-channel
  • Blob storage uploads: PDS blob uploads now use storageBlobs with embedded blob ref objects instead of string AT URIs in storageExternal, preventing PDS garbage collection of uploaded blobs
  • Replaced lexicon symlinks with real files
  • Guarded redis imports behind TYPE_CHECKING in index/_entry.py and index/_index.py
  • Standardized benchmark outputs to .benchmarks/ directory
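
Guarding an optional dependency behind TYPE_CHECKING generally looks like the sketch below. IndexEntry and its client parameter are hypothetical names, not the contents of index/_entry.py; the point is that the import is visible to type checkers but never executed at runtime.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Seen by type checkers only; never executed at runtime.
    from redis import Redis


class IndexEntry:
    """Hypothetical class; the string annotation is never evaluated at runtime."""

    def __init__(self, client: "Optional[Redis]" = None) -> None:
        self.client = client


# Works even when the redis package is not installed.
entry = IndexEntry()
```

The annotation must stay a string (or the module must defer annotation evaluation), otherwise the name Redis would be looked up at definition time and fail.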

Fixed

  • publish() multi-shard bug: was passing single URL instead of full shard list
  • Double-write eliminated in PDSBlobStore
  • Lens-graph example: removed float rounding in calibrate lens that broke law assertions
  • Unused imports and E402 violations in atmosphere module

Testing

  • Strengthened weak mock assertions with argument verification across 4 test files
  • Fixed misleading Unicode tests: they now use real emoji (🌍🎉🚀) and CJK characters (日本語テスト, 中文测试, 한국어시험) instead of ASCII placeholders
  • Exact shard count assertions instead of >= 2 bounds
  • Fixed self-referential assertion in test_publish_schema
  • Added content assertions for empty/corrupted shard recovery tests
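
The difference between a weak mock assertion and one with argument verification can be shown with unittest.mock alone. The functions publish and publish_buggy below are hypothetical, written to mimic a single-shard regression; they are not the atdata code.

```python
from unittest.mock import Mock


def publish(store, shards):
    """Hypothetical function under test: passes the full shard list to the store."""
    return store.write_shards(list(shards))


def publish_buggy(store, shards):
    """Hypothetical regression: passes only the first shard."""
    return store.write_shards([shards[0]])


shards = ["shard-000.tar", "shard-001.tar"]

# Correct implementation: the strong assertion passes.
good_store = Mock()
publish(good_store, shards)
good_store.write_shards.assert_called_once_with(shards)

# Buggy implementation: the weak check is satisfied, the strong one is not.
bad_store = Mock()
publish_buggy(bad_store, shards)
weak_check_passes = bad_store.write_shards.called
try:
    bad_store.write_shards.assert_called_once_with(shards)
    strong_check_passes = True
except AssertionError:
    strong_check_passes = False
```

Checking only .called confirms that something was invoked; assert_called_once_with also pins down what it was invoked with, so the regression is caught.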

v0.3.2b1

04 Feb 04:44
3aa426f

v0.3.2b1 Pre-release

What's Changed

Full Changelog: v0.3.1b1...v0.3.2b1

v0.3.1b1

03 Feb 18:40

v0.3.1b1 Pre-release

What's Changed

Added

  • Lexicon packaging: ATProto lexicon JSON files bundled in src/atdata/lexicons/ with importlib.resources access via atdata.lexicons.get_lexicon() and list_lexicons()
  • DatasetDict single-split proxy: When a DatasetDict has one split, .ordered(), .shuffled(), .list_shards(), and other Dataset methods are proxied directly
  • write_samples(manifest=True): Opt-in manifest generation during sample writing for query-based access
  • Example documentation: Five executable Quarto example docs covering typed pipelines, lens transforms, manifest queries, index workflows, and multi-split datasets
  • Bounds checking in bytes_to_array() for truncated/corrupted input buffers
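
Bounds checking for truncated input usually means validating every declared length against the bytes actually present before slicing. The sketch below assumes a hypothetical layout (a 4-byte little-endian length followed by the payload); it is not the format used by bytes_to_array(), and read_payload is a name introduced for this example.

```python
import struct

# Hypothetical header: one unsigned 32-bit little-endian payload length.
HEADER = struct.Struct("<I")


def read_payload(buf: bytes) -> bytes:
    """Return the payload, raising ValueError if the buffer is too short."""
    if len(buf) < HEADER.size:
        raise ValueError(f"buffer too short for header: {len(buf)} < {HEADER.size} bytes")
    (length,) = HEADER.unpack_from(buf, 0)
    end = HEADER.size + length
    if end > len(buf):
        raise ValueError(f"truncated payload: need {end} bytes, got {len(buf)}")
    return buf[HEADER.size:end]


intact = HEADER.pack(3) + b"abc"
truncated = intact[:-1]  # the last payload byte is missing
```

Without the second check, slicing a truncated buffer would silently return fewer bytes than the header promised, and the error would surface later in a less obvious place.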

Changed

  • AtmosphereClient → Atmosphere: Renamed, with factory classmethods Atmosphere.login() and Atmosphere.from_env(); AtmosphereClient remains as a deprecated alias
  • sampleSchema → schema: Lexicon record type renamed from ac.foundation.dataset.sampleSchema to ac.foundation.dataset.schema (clean break, no backward compat)
  • Module reorganization: local/ split into index/ (Index, entries, schema management) and stores/ (LocalDiskStore, S3DataStore); local/ remains as backward-compat re-export shim
  • CLI rename: atdata local subcommand renamed to atdata infra
  • Uniform Repository model: Index now treats "local" as a regular Repository, collapsing 3-way routing to 2-way (repo/atmosphere)
  • SampleBatch aggregation uses np.stack() instead of np.array(list(...)) for efficiency
  • NumPy scalar coercion in _make_packable: NumPy scalars are extracted to Python primitives before msgpack serialization
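
One common way to keep an old class name working after a rename is a thin subclass that warns on construction. The sketch below uses hypothetical names (Client and LegacyClient) and an assumed login(handle) signature; it is not how atdata implements the AtmosphereClient alias.

```python
import warnings


class Client:
    """Hypothetical class carrying the new name."""

    def __init__(self, handle: str) -> None:
        self.handle = handle

    @classmethod
    def login(cls, handle: str) -> "Client":
        # Factory classmethod; the signature is assumed for illustration.
        return cls(handle)


class LegacyClient(Client):
    """Hypothetical deprecated alias: behaves like Client but warns when created."""

    def __init__(self, *args, **kwargs) -> None:
        warnings.warn(
            "LegacyClient is deprecated; use Client instead",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old = LegacyClient("alice.example")

new = Client.login("alice.example")
```

Using stacklevel=2 points the warning at the caller's line, so users see where in their own code the old name is still used.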

Fixed

  • Schema round-trip in Index.write(): schemas with NDArray fields now survive publish/decode correctly
  • Duplicate force-include in pyproject.toml causing PyPI wheel upload rejection

Full Changelog: v0.3.0b2...v0.3.1b1

v0.3.0b2

02 Feb 20:35
45b66a4

v0.3.0b2 Pre-release

What's Changed

New Contributors

Full Changelog: https://github.com/forecast-bio/atdata/commits/v0.3.0b2