Releases: forecast-bio/atdata
v0.7.0b1
[0.7.0b1] - 2026-02-26
Added
- Client-side AppView integration: `Atmosphere` client now supports XRPC queries (`xrpc_query()`) and procedures (`xrpc_procedure()`) routed through a configurable AppView service. Schema, lens, label, and record loaders automatically use AppView for listing, search, and resolution when available, falling back to client-side pagination otherwise. New `has_appview` property and `AppViewRequiredError`/`AppViewUnavailableError` exceptions for clean error handling (GH#74)
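The "use AppView when available, otherwise fall back" behaviour can be pictured with a minimal sketch. It does not use atdata's classes: `list_with_fallback`, `AppViewUnavailable`, and `broken_appview` are hypothetical names chosen for illustration, and the real loaders may decide differently when to fall back.

```python
from typing import Callable, Optional


class AppViewUnavailable(Exception):
    """Hypothetical stand-in for an 'AppView could not be reached' error."""


def list_with_fallback(
    appview_query: Optional[Callable[[], list[str]]],
    client_side_scan: Callable[[], list[str]],
) -> tuple[list[str], str]:
    """Prefer the AppView query; fall back to a client-side scan.

    Returns the results plus a tag saying which path produced them.
    """
    if appview_query is not None:
        try:
            return appview_query(), "appview"
        except AppViewUnavailable:
            pass  # fall through to the slower client-side path
    return client_side_scan(), "client-side"


def broken_appview() -> list[str]:
    # Simulates a configured AppView that is currently down.
    raise AppViewUnavailable("service down")


print(list_with_fallback(lambda: ["a", "b"], lambda: ["a", "b"]))
print(list_with_fallback(broken_appview, lambda: ["a", "b"]))
print(list_with_fallback(None, lambda: ["a", "b"]))
```

Passing `None` for the AppView callable models a client with no AppView configured; both that case and an outage end up on the client-side path.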
Changed
- AppView-aware loaders/publishers: `SchemaLoader`, `LabelLoader`, `DatasetLoader`, `LensLoader` and their corresponding publishers now prefer AppView XRPC endpoints when configured, with automatic graceful fallback to client-side `com.atproto.repo` workarounds (GH#50)
- `load_dataset()` atmosphere parameter: New optional `atmosphere` kwarg passes an `Atmosphere` client (and its AppView) through to AT URI resolution (GH#50)
- AbstractIndex deprecation: `AbstractIndex` protocol is deprecated in favor of direct `Index` usage; a backward-compatible `__getattr__` shim emits `DeprecationWarning` on import (GH#40)
- Unified search API: New `SearchBackend` protocol with `LocalSearchBackend` and `AppViewSearchBackend` implementations, `SearchAggregator` for multi-backend queries, and `Index.search()` integration (GH#33)
- Lens verification workflow: New `VerificationPublisher`/`VerificationLoader` for `science.alt.dataset.lensVerification` records, with `LexCodeHash` and `LexLensVerification` Python types (GH#34)
- Lens schema version compatibility: `LensPublisher.publish()` now accepts `source_schema_version` and `target_schema_version` parameters; `LexLensRecord` updated with corresponding fields (GH#34)
- Array format support: New serialization helpers for sparse matrices (`scipy`), structured arrays, Arrow tensors (`pyarrow`), safetensors, and DataFrames (`pandas`/Parquet). Codegen and pipeline updated to recognize new shim `$ref` types. Optional dependency groups added to `pyproject.toml` (GH#76)
- NDArray v1.1.0 annotations: Schema codegen supports optional `dtype`, `shape`, and `dimensionNames` annotation fields from the v1.1.0 ndarray shim (GH#76)
- Upstream lexicon sync: Added `lensVerification.json`, `verificationMethod.json`, `programmingLanguage.json`, and updated `arrayFormat.json`/`lens.json` from `forecast-bio/atdata-lexicon`
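For readers unfamiliar with the `__getattr__` shim technique named in the AbstractIndex entry, here is a minimal sketch of a module-level `__getattr__` (PEP 562) that warns when a deprecated name is accessed. The module name `demo_pkg` and the choice to hand back `Index` for the old name are assumptions made for this example, not a description of atdata's actual shim.

```python
import types
import warnings


class Index:
    """Placeholder for the concrete class that replaces the deprecated name."""


def _make_module() -> types.ModuleType:
    # Build a throwaway module so the example runs in a single file.
    mod = types.ModuleType("demo_pkg")
    mod.Index = Index

    def __getattr__(name: str):
        # PEP 562: called only when normal attribute lookup on the module fails.
        if name == "AbstractIndex":
            warnings.warn(
                "AbstractIndex is deprecated; use Index directly",
                DeprecationWarning,
                stacklevel=2,
            )
            return Index  # assumption: the old name resolves to the new class
        raise AttributeError(f"module 'demo_pkg' has no attribute {name!r}")

    mod.__getattr__ = __getattr__
    return mod


demo_pkg = _make_module()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = demo_pkg.AbstractIndex  # triggers the shim

print(legacy is Index)
print(caught[0].category.__name__)
```

Because the hook only fires for names missing from the module namespace, access to `demo_pkg.Index` stays warning-free.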
Fixed
- Blob URL parameter bug: Fixed incorrect parameter passing in blob URL construction within atmosphere record publishing
- Fallback logging: Improved diagnostic logging when AppView is unavailable and client-side fallback is used
v0.6.0b1
Added
- Dataset manifest support: Optional `manifests` property on dataset entry records for per-shard metadata references, with `ShardManifestRef` and `LensCodeRef` Python mirror types (GH#62)
- Handle-based schema resolution: `get_schema()` and `get_schema_type()` now accept `@handle/TypeName@version` format, resolving schemas by handle + name + optional semver instead of requiring raw AT-URIs (GH#61)
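The `@handle/TypeName@version` shape can be illustrated with a small parser. This is only a sketch: the regular expression, the `SchemaRef` tuple, `parse_schema_ref`, and the type name `ImageSample` are hypothetical, and atdata's own parsing rules (for example, which characters a handle or version may contain) may differ.

```python
import re
from typing import NamedTuple, Optional

# Assumed grammar: "@" + handle + "/" + type name + optional "@" + version.
_REF = re.compile(
    r"^@(?P<handle>[^/@\s]+)/(?P<name>[^/@\s]+)(?:@(?P<version>[^/@\s]+))?$"
)


class SchemaRef(NamedTuple):
    handle: str
    name: str
    version: Optional[str]  # None when the reference omits a version


def parse_schema_ref(ref: str) -> SchemaRef:
    """Split a handle-based reference into its three parts."""
    match = _REF.match(ref)
    if match is None:
        raise ValueError(f"not a handle-based schema reference: {ref!r}")
    return SchemaRef(match["handle"], match["name"], match["version"])


print(parse_schema_ref("@maxine.science/ImageSample@1.0.0"))
print(parse_schema_ref("@maxine.science/ImageSample"))
```

A raw AT-URI does not match the pattern and raises `ValueError`, which leaves room for a caller to route such strings elsewhere.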
Changed
- Namespace rename: Lexicon namespace renamed from `ac.foundation.dataset` to `science.alt.dataset` across all source, tests, and documentation. Lexicon JSON files vendored from forecast-bio/atdata-lexicon with NSID-to-path directory structure. Lexicon loader updated to resolve NSIDs via path traversal. Added `label` and `resolveLabel` to `LEXICON_IDS` (GH#71)
- Lexicon record → entry rename: The dataset record lexicon is renamed from `ac.foundation.dataset.record` to `ac.foundation.dataset.entry` throughout the codebase: lexicon files, Python types, collection constants, tests, and documentation (GH#63)
- Schema version field rename: `$atdataSchemaVersion` renamed to `atdataSchemaVersion` (no `$` prefix) to follow ATProto naming conventions for non-reserved properties (GH#65)
- DID resolution refactor: Extracted `Atmosphere.resolve_did()` as a public method, deduplicating handle-to-DID resolution across schema, label, and record loaders
- CI: Redis and Postgres container images pulled from AWS ECR Public Gallery (`public.ecr.aws/docker/library/`) instead of Docker Hub to avoid rate limits; replaced `supercharge/redis-github-action` with native service containers
Fixed
- Manifest serialization: Manifests field now uses truthiness check consistent with the tags field pattern, preventing empty lists from being serialized as `None`
v0.5.1b1
Fixed
- Cross-account record reads: `get_record()` and `list_records()` now route reads for foreign DIDs through the public AppView instead of the authenticated PDS, fixing `RecordNotFound` errors when fetching records published by other users (e.g. schemas from `foundation.ac` while logged in as `maxine.science`)
v0.5.0b1
[0.5.0b1] - 2026-02-07
Added
- Typed content metadata: Datasets can now carry a metadata schema describing dataset-level properties (description, license, creation date) alongside the sample schema (#58)
- Atmosphere label integration: `Index.insert_dataset()` publishes a label record alongside the dataset record, enabling `@handle/name` path resolution through `get_label()` and `get_dataset()` (#59)
- E2E lens integration tests: Comprehensive tests for the lens publish → retrieve → execute lifecycle against a real ATProto PDS (#55)
- E2E session management tests: Integration tests covering ATProto login, session export/import round-trip, unauthenticated reads, and error handling (#57)
- XRPC query workaround docs: Documented client-side `list_records()` + filter patterns used as temporary workarounds pending AppView support (#56)
Changed
- Schema lexicon rename: `getLatestSchema` renamed to `resolveSchema` to match `resolveLabel` semantics; added `$atdataSchemaVersion` property for format versioning (#53)
- Schema API consolidation: `decode_schema`, `load_schema`, `decode_schema_as`, and `schema_to_type` consolidated into `get_schema_type()` with deprecation shims for old names (#54)
- Build config cleanup: `.chainlink/` and `.claude/` directories excluded from sdist builds (#54)
Fixed
- Lens discovery pagination: `find_by_schemas()` now paginates with `limit=100` + cursor instead of exceeding ATProto's `list_records` cap of 100
- Truncated session test: Rewrote `test_truncated_session_string_raises` to handle the atproto SDK silently accepting truncated sessions via its 4-field backward compat path (MarshalX/atproto#656)
- Exception chaining: Proper `raise ... from` exception chaining and best-effort label publish error handling
- Content metadata validation: Metadata validated before writing files; dead code removed
- Chainlink db protection: Added git hooks (`.githooks/`) to prevent worktree symlinks from overwriting the issues database on merge
v0.4.1b2
Fixed
- Atmosphere schema codec: `_convert_atmosphere_schema()` now handles the flattened ATProto wire format where `properties`/`required` are at the top level of the `schema` dict, not nested under a `schemaBody` key. Previously, schemas fetched from a PDS produced types with zero fields.
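The difference between the two layouts is easiest to see side by side. The helper below is a sketch, not the library's `_convert_atmosphere_schema()`: `extract_fields` is a hypothetical name, the sample field definition is made up, and accepting both layouts is an assumption about what a tolerant reader might do.

```python
from typing import Any


def extract_fields(schema: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Return (properties, required) from either schema layout."""
    # Nested layout: the field definitions sit under "schemaBody".
    body = schema.get("schemaBody")
    if isinstance(body, dict):
        return body.get("properties", {}), body.get("required", [])
    # Flattened wire layout: the same keys sit at the top level.
    return schema.get("properties", {}), schema.get("required", [])


flat = {"properties": {"image": {"type": "bytes"}}, "required": ["image"]}
nested = {"schemaBody": flat}

print(extract_fields(flat) == extract_fields(nested))
```

A reader that only looked under `schemaBody` would find nothing in the flattened form, which matches the "types with zero fields" symptom described in the entry.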
Full Changelog: https://github.com/forecast-bio/atdata/blob/main/CHANGELOG.md
v0.4.1b1
Fixed
- Atmosphere schema routing: `get_schema()`/`get_schema_record()` now handle `at://` URI refs by delegating to the atmosphere backend instead of raising `ValueError`
- Blob URL resolution: `AtmosphereIndexEntry.data_urls` resolves `storageBlobs` CIDs to PDS HTTP URLs via `plc.directory` (with caching), replacing the empty-list placeholder
- Indexed path routing: `load_dataset('@handle/dataset')` now passes the full `@handle/name` path through to `Index._resolve_prefix()` so atmosphere routing works correctly
- Atmosphere schema codec: `schema_to_type()` handles atmosphere JSON Schema format by converting to local field format before type reconstruction
- Structural lens fallback: `Dataset.as_type()` falls back to structural field mapping when no registered lens exists between structurally compatible types (e.g. dynamic vs user-defined classes)
- Blob shard checksums: SHA-256 digests from `PDSBlobStore` are now attached per-blob in `BlobEntry.checksum` instead of being buried in `metadata.custom`
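"Structural field mapping" in the lens fallback entry can be read as copying values across by field name when two types have the same shape. The sketch below shows that idea with plain dataclasses; `structural_cast`, `DynamicSample`, and `UserSample` are hypothetical, and the compatibility rule used here (identical field names) is an assumption that may be stricter or looser than what `Dataset.as_type()` applies.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Type, TypeVar

T = TypeVar("T")


def structural_cast(sample: Any, target: Type[T]) -> T:
    """Rebuild `sample` as `target` when both share the same field names."""
    if not (is_dataclass(sample) and is_dataclass(target)):
        raise TypeError("both source and target must be dataclasses")
    source_names = {f.name for f in fields(sample)}
    target_names = {f.name for f in fields(target)}
    if source_names != target_names:
        raise TypeError(f"fields differ: {sorted(source_names ^ target_names)}")
    # Copy values across by name; no registered conversion is needed.
    return target(**{name: getattr(sample, name) for name in target_names})


@dataclass
class DynamicSample:  # e.g. a type rebuilt from a stored schema
    image: bytes
    label: int


@dataclass
class UserSample:  # e.g. a class the user wrote by hand
    image: bytes
    label: int


@dataclass
class OtherSample:  # different shape, so the cast should refuse it
    text: str


print(structural_cast(DynamicSample(b"\x00", 3), UserSample))
```

Refusing mismatched shapes keeps the fallback from silently dropping or inventing fields.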
Changed
- Checksum extraction: Deduplicated checksum extraction logic into `_extract_blob_checksums()` helper with warning on count mismatch
- Test nomenclature: Renamed `test_integration_*.py` → `test_workflow_*.py` across the test suite
- CI workflow triggers: Updated workflow file references to match renamed test files
v0.4.0b2
Added
- Redis live integration tests: 16 tests covering entry CRUD, schema operations, label operations, lens operations, and concurrent access against a real Redis 7 service container
- Redis service in CI: Integration workflow now includes a Redis 7 service container alongside PostgreSQL and MinIO
Full changelog: https://github.com/forecast-bio/atdata/blob/main/CHANGELOG.md
v0.3.4b1
[0.3.4b1] - 2026-02-04
Added
- Content checksums: Per-shard SHA-256 digests computed at write time across all storage backends (`LocalDiskStore`, `S3DataStore`, `PDSBlobStore`). Checksums are carried via `ShardWriteResult` and automatically merged into index entry metadata
- `verify_checksums()`: Utility function to verify stored checksums against shard files on disk; remote URLs (`s3://`, `at://`, `http://`) are gracefully skipped
- `atdata verify` CLI command: Verify content integrity of indexed datasets from the command line
- AT URI support in `load_dataset()`: `load_dataset("at://did:plc:abc/.../rkey")` now fetches dataset records from ATProto and resolves storage (blobs, HTTP, S3) into streamable datasets with automatic schema decoding
- Lens composition operators: `@` (compose) and `|` (pipe) operators for chaining lenses, plus `identity_lens()` factory for pass-through transforms
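The checksum entries above describe hashing each shard with SHA-256 and later re-checking local files while skipping remote locations. The following is a minimal sketch of that idea using only the standard library; `sha256_of`, `verify`, the report strings, and the shard file names are hypothetical and do not reflect the signature or return value of atdata's `verify_checksums()`.

```python
import hashlib
import tempfile
from pathlib import Path

REMOTE_PREFIXES = ("s3://", "at://", "http://")  # schemes the entry lists as skipped


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large shards never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(checksums: dict[str, str]) -> dict[str, str]:
    """Map each shard location to 'ok', 'mismatch', or 'skipped'."""
    report = {}
    for location, expected in checksums.items():
        if location.startswith(REMOTE_PREFIXES):
            report[location] = "skipped"  # remote shards are not re-read here
        elif sha256_of(Path(location)) == expected:
            report[location] = "ok"
        else:
            report[location] = "mismatch"
    return report


with tempfile.TemporaryDirectory() as tmp:
    shard = Path(tmp) / "shard-000000.tar"
    shard.write_bytes(b"example shard contents")
    stored = {str(shard): sha256_of(shard), "s3://bucket/shard-000001.tar": "unused"}
    report = verify(stored)
    print(report)
```

Recording the digest at write time and recomputing it at verification time means any later change to a local shard shows up as a mismatch.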
v0.3.3b2
[0.3.3b2] - 2026-02-04
Testing
- Coverage improvements (92% → 94%): 61 new tests across atmosphere client (swap_commit, model conversion fallbacks), DatasetLoader (HTTP/S3 storage paths, `get_typed`, `to_dataset`, checksum validation), DatasetPublisher (publish_with_s3), Redis/Postgres provider label CRUD, Redis schema edge cases (bytes decoding, legacy format), and lexicon loading/validation
Fixed
- CI: Use `cp -f` in bench workflow to avoid interactive prompt on file overwrite
v0.3.3b1
[0.3.3b1] - 2026-02-04
Added
- Dataset labels: Named, versioned pointers to dataset records, separating identity (CID-addressed) from naming (mutable labels). `store_label()`, `get_label()`, `list_labels()`, `delete_label()` across all index providers (SQLite, Redis, PostgreSQL)
- Atmosphere label records: `LabelPublisher` and `LabelLoader` for publishing and resolving `ac.foundation.dataset.label` records on ATProto PDS, with `ac.foundation.dataset.resolveLabel` query lexicon
- Label-aware `load_dataset()`: Path resolution now tries label lookup before falling back to dataset name, enabling `load_dataset("@local/mnist")` to resolve through labels
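The split between identity and naming can be shown with a toy in-memory store. It borrows the four method names from the labels entry above, but the `LabelStore` class, the signatures, the `(name, version)` key, and the `cid-...` values are all assumptions made for this sketch and should not be read as atdata's provider interface.

```python
from typing import Optional


class LabelStore:
    """In-memory toy: (name, version) -> content identifier of a dataset record."""

    def __init__(self) -> None:
        self._labels: dict[tuple[str, Optional[str]], str] = {}

    def store_label(self, name: str, cid: str, version: Optional[str] = None) -> None:
        # Re-storing the same (name, version) repoints the label;
        # the record behind each CID is untouched.
        self._labels[(name, version)] = cid

    def get_label(self, name: str, version: Optional[str] = None) -> str:
        return self._labels[(name, version)]

    def list_labels(self) -> list[tuple[str, Optional[str]]]:
        return sorted(self._labels, key=lambda key: (key[0], key[1] or ""))

    def delete_label(self, name: str, version: Optional[str] = None) -> None:
        del self._labels[(name, version)]


store = LabelStore()
store.store_label("mnist", "cid-aaa")
store.store_label("mnist", "cid-bbb")  # same name now points at a newer record
store.store_label("mnist", "cid-aaa", version="1.0.0")  # pinned version keeps the old one

print(store.get_label("mnist"))
print(store.get_label("mnist", version="1.0.0"))
print(store.list_labels())
```

The name `mnist` moves to a new record without changing either CID, while the versioned label continues to resolve to the original.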
Changed
- Git flow: Adopted standard git flow branching model: `develop` as integration branch, `feature/*` from `develop`, `release/*` cut from `develop`. Updated `/release`, `/feature`, `/publish`, and `/featree` skills accordingly
- Worktree chainlink sharing: `/featree` now symlinks `.chainlink/issues.db` to the base clone's copy so all worktrees share a single authoritative issue database