
release: v0.4.0b1 #37

Merged
maxine-at-forecast merged 21 commits into main from release/v0.4.0b1
Feb 26, 2026

@maxine-at-forecast
Contributor

Summary

Test plan

  • 214 tests passing (unit + integration)
  • Lint clean (ruff)
  • Version bumped to 0.4.0b1
  • Lockfile updated

🤖 Generated with Claude Code

maxinelevesque and others added 21 commits February 22, 2026 14:40
… v0.2.1b1

Add forecast-bio/atdata-lexicon as a git submodule at lexicons/ using
HTTPS URL, pinned to the v0.2.1b1 tag. Update all CI checkout steps
to initialize submodules, and document the submodule in CLAUDE.md and
README.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ata-lexicon-repo

feat: point lexicons/ at forecast-bio/atdata-lexicon via git submodule
Add science.alt.dataset.sendInteractions POST endpoint that accepts
batches of download, citation, and derivative interaction events.
Validates AT-URIs, interaction types, and optional ISO 8601 timestamps,
then fires analytics events via the existing fire-and-forget infrastructure.

Also extends getEntryStats to surface per-entry interaction counts
(downloads, citations, derivatives) alongside existing view/search metrics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
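A minimal sketch of the per-item validation this commit describes. Only `datasetUri` and the three interaction types come from this PR. The `type` and `timestamp` field names, the `MAX_BATCH` value, and all helper names are assumptions, and the analytics dispatch that follows validation is omitted.

```python
from __future__ import annotations

from datetime import datetime

# Interaction types named in this commit; MAX_BATCH is a hypothetical value.
INTERACTION_TYPES = frozenset({"download", "citation", "derivative"})
MAX_BATCH = 100


def _is_at_uri(value: object) -> bool:
    """Loose shape check for at://<authority>/<collection>/<rkey>."""
    if not isinstance(value, str) or not value.startswith("at://"):
        return False
    parts = value[len("at://"):].split("/")
    return len(parts) == 3 and all(parts)


def _is_iso8601(value: object) -> bool:
    """Accept the ISO 8601 forms datetime.fromisoformat() understands."""
    if not isinstance(value, str):
        return False
    if value.endswith("Z"):
        # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
        value = value[:-1] + "+00:00"
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


def validate_interactions(items: object) -> list[dict]:
    """Validate a sendInteractions batch; raise ValueError on the first bad item."""
    if not isinstance(items, list) or not items:
        raise ValueError("interactions must be a non-empty list")
    if len(items) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} interactions per request")
    events = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            raise ValueError(f"item {i}: expected an object")
        uri, kind, ts = item.get("datasetUri"), item.get("type"), item.get("timestamp")
        if not _is_at_uri(uri):
            raise ValueError(f"item {i}: invalid datasetUri")
        if not isinstance(kind, str) or kind not in INTERACTION_TYPES:
            raise ValueError(f"item {i}: unknown interaction type")
        if ts is not None and not _is_iso8601(ts):
            raise ValueError(f"item {i}: timestamp is not ISO 8601")
        events.append({"datasetUri": uri, "type": kind, "timestamp": ts})
    return events
```

Failing the whole batch on the first bad item is a design choice made for the sketch. A handler could instead skip bad items and report them.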
Implement index provider registration and the getIndexSkeleton/getIndex
query endpoints following Bluesky's feed generator pattern. Third parties
can register curated dataset index endpoints; the AppView fetches URI
skeletons from them and hydrates entries from the local database.

- Add index_providers table to schema.sql
- Add upsert/query functions in database.py and a COLLECTION_TABLE_MAP entry
- Add row_to_index_provider serializer and response models in models.py
- Add getIndexSkeleton, getIndex, listIndexes query endpoints
- Add publishIndex procedure with HTTPS URL validation
- Route science.alt.dataset.index records through ingestion processor
- Add 18 tests covering happy paths, error cases, and ingestion

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
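The skeleton/hydration split can be sketched as a pure function: the provider returns only URIs, and the AppView resolves them against its own database. This sketch assumes that the skeleton's list key is `items` and that `load_entries` (hypothetical) performs one batched lookup keyed by AT-URI. Unknown URIs are dropped and the provider's ordering is kept.

```python
from __future__ import annotations

from typing import Callable


def hydrate_skeleton(
    skeleton: dict,
    load_entries: Callable[[list[str]], dict[str, dict]],
) -> dict:
    """Turn a provider's URI skeleton into full entries from the local DB."""
    uris = [
        item["uri"]
        for item in skeleton.get("items", [])
        if isinstance(item, dict) and isinstance(item.get("uri"), str)
    ]
    found = load_entries(uris)  # one batched lookup keyed by AT-URI
    # Keep the provider's ordering; silently drop URIs we have not indexed.
    result: dict = {"entries": [found[uri] for uri in uris if uri in found]}
    if skeleton.get("cursor"):
        result["cursor"] = skeleton["cursor"]  # pass pagination through unchanged
    return result
```

Because hydration only reads local rows, a provider can influence which entries appear and in what order, but it cannot inject record content.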
…ord counts, use UPSERT_FNS dispatch

- row_to_label() and row_to_lens() now include `did` field, consistent
  with row_to_entry() and row_to_schema()
- Extract _fetch_record_counts() helper so query_record_counts() and
  query_analytics_summary() share the same implementation
- Replace if/elif upsert chain in processor.py with db.UPSERT_FNS dict
  lookup; update test_ingestion.py to patch the dict directly
- Use parse_at_uri() in frontend dataset_detail() instead of manual
  string splitting
- Remove duplicate config fixture from test_analytics.py (conftest
  provides it)
- Add sendInteractions edge case tests: missing key, non-dict item,
  boundary at max batch size

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
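Here is roughly how the `UPSERT_FNS` dispatch replaces the if/elif chain. It also lets tests swap out a single handler with `unittest.mock.patch.dict`. The handler names and bodies below are hypothetical stand-ins; only the two collection NSIDs appear in this PR.

```python
from __future__ import annotations

from unittest import mock

written: list[tuple[str, dict]] = []


def upsert_entry(record: dict) -> None:  # stand-in for the real DB upsert
    written.append(("entry", record))


def upsert_index_provider(record: dict) -> None:  # stand-in
    written.append(("index", record))


# One lookup table instead of an if/elif chain; other collections elided.
UPSERT_FNS = {
    "science.alt.dataset.entry": upsert_entry,
    "science.alt.dataset.index": upsert_index_provider,
}


def process_record(collection: str, record: dict) -> bool:
    """Dispatch to the collection's upsert; return False if it isn't indexed."""
    upsert = UPSERT_FNS.get(collection)  # looked up at call time, so patching works
    if upsert is None:
        return False
    upsert(record)
    return True


def test_index_record_is_routed() -> None:
    fake = mock.Mock()
    # patch.dict swaps one entry and restores the dict afterwards.
    with mock.patch.dict(UPSERT_FNS, {"science.alt.dataset.index": fake}):
        assert process_record("science.alt.dataset.index", {"endpoint": "https://x"})
    fake.assert_called_once_with({"endpoint": "https://x"})
```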
…cal fixes

- Add did field to row_to_label and row_to_lens serializers for
  consistency with row_to_entry, row_to_schema, row_to_index_provider
- Add index_providers to query_active_publishers UNION query
- Fix import ordering in queries.py and procedures.py (PEP 8)
- Add row_to_index_provider unit tests to test_models.py
- Add index_providers table/indexes to integration test expectations
- Remove unnecessary mock patch from test_get_index_skeleton_invalid_uri

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…reaming

Introduces an in-memory broadcast event bus (changestream.py) that the
ingestion processor publishes to after successful upserts/deletes. A new
WebSocket endpoint at /xrpc/science.alt.dataset.subscribeChanges streams
these events to subscribers with cursor-based replay from a bounded buffer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
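A minimal sketch of a broadcast bus with cursor replay. It assumes monotonically increasing integer sequence numbers, a `deque` as the bounded buffer, and one `asyncio.Queue` per subscriber. The class shape, event fields, and sizes are assumptions, so the real changestream.py may differ. The WebSocket layer is omitted.

```python
from __future__ import annotations

import asyncio
import itertools
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    seq: int
    action: str  # e.g. "upsert" or "delete" (assumed vocabulary)
    uri: str


class ChangeStream:
    def __init__(self, buffer_size: int = 1000, queue_size: int = 100) -> None:
        self._seq = itertools.count(1)
        # Bounded replay buffer: the oldest events fall off automatically.
        self._buffer: deque[ChangeEvent] = deque(maxlen=buffer_size)
        self._subscribers: set[asyncio.Queue] = set()
        self._queue_size = queue_size

    def publish(self, action: str, uri: str) -> ChangeEvent:
        event = ChangeEvent(next(self._seq), action, uri)
        self._buffer.append(event)
        for queue in list(self._subscribers):
            try:
                queue.put_nowait(event)
            except asyncio.QueueFull:
                pass  # slow consumer: this sketch simply skips it
        return event

    def subscribe(self, cursor: int | None = None) -> tuple[list[ChangeEvent], asyncio.Queue]:
        """Register a live queue and return buffered events newer than `cursor`."""
        queue: asyncio.Queue = asyncio.Queue(maxsize=self._queue_size)
        self._subscribers.add(queue)
        replay = [] if cursor is None else [e for e in self._buffer if e.seq > cursor]
        return replay, queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)
```

A cursor older than the buffer's oldest event simply replays whatever is still held. The silent skip on `QueueFull` is the behavior that the hardening commit later in this PR replaces with dropped-subscriber tracking.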
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… display

Add KNOWN_ARRAY_FORMATS constant and ARRAY_FORMAT_LABELS for the six
recognized array format tokens (numpyBytes, parquetBytes, sparseBytes,
structuredBytes, arrowTensor, safetensors). Update row_to_schema() to
surface arrayFormat, dtype, shape, and dimensionNames from the schema
body as top-level fields. Update frontend templates (schema detail,
schemas list, profile, dataset detail) to display format and annotation
info when present. Update MCP server descriptions to mention new formats.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
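A sketch of what surfacing array annotations in `row_to_schema()` could look like. The six format tokens and the four field names come from this commit. The row shape (`uri`, `did`, `body`) and the function name are assumptions. How the real code treats an unrecognized `arrayFormat` is not stated, so this sketch passes values through unchanged.

```python
from __future__ import annotations

# The six array format tokens listed in this commit.
KNOWN_ARRAY_FORMATS = frozenset({
    "numpyBytes", "parquetBytes", "sparseBytes",
    "structuredBytes", "arrowTensor", "safetensors",
})
# Schema-body keys surfaced as top-level fields.
ARRAY_FIELDS = ("arrayFormat", "dtype", "shape", "dimensionNames")


def row_to_schema_sketch(row: dict) -> dict:
    """Hypothetical row shape: {"uri": str, "did": str, "body": dict}."""
    body = row.get("body") or {}
    result = {"uri": row["uri"], "did": row["did"], "body": body}
    for key in ARRAY_FIELDS:
        value = body.get(key)
        if value is not None:  # only surface annotations that are present
            result[key] = value
    return result
```

Templates can then test for a top-level key such as `shape` instead of digging into the schema body.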
…interactions-endpoint

feat: add sendInteractions XRPC procedure for usage telemetry
…ydration-getindexskeleton

# Conflicts:
#	src/atdata_app/ingestion/processor.py
#	src/atdata_app/xrpc/procedures.py
…etindexskeleton

Add skeleton/hydration pattern for third-party dataset indexes
…hange-stream-subscribechanges

# Conflicts:
#	CHANGELOG.md
…am-subscribechanges

feat: add subscribeChanges WebSocket endpoint for real-time change streaming
…darray-annotations

feat: add array format types and ndarray v1.1.0 annotations
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SSRF: validate endpoint URLs with DNS resolution and private IP
  blocking at fetch time (queries.py) and ingestion time (database.py)
- Auth: add service auth to sendInteractions endpoint
- Backpressure: track dropped subscribers in ChangeStream instead of
  silently losing events; close WebSocket with code 4000 on drop
- Subscriber limits: cap ChangeStream to 1000 subscribers, reject
  with WebSocket close code 1013 when full
- Replay dedup: track last replayed seq to avoid sending duplicate
  events from both replay buffer and live queue
- Keepalive: fix broken loop structure so timeout re-enters event loop
- Task GC: retain asyncio.Task references to prevent garbage collection
  of fire-and-forget analytics tasks
- Skeleton cap: enforce requested limit on items returned by external
  index providers
- Remove dead timestamp validation code from sendInteractions
- Sanitize error messages to avoid leaking internal URLs
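The SSRF item amounts to resolving the endpoint host and refusing any non-public address. Below is a standard-library sketch. The function name and the exact set of blocked ranges are assumptions; the real checks live in queries.py and database.py.

```python
from __future__ import annotations

import ipaddress
import socket
from urllib.parse import urlsplit


def assert_public_endpoint(url: str) -> None:
    """Raise ValueError unless every address `url`'s host resolves to is public."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("endpoint must be an https URL with a host")
    port = parts.port or 443
    try:
        infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        raise ValueError("endpoint host does not resolve") from exc
    for *_, sockaddr in infos:
        # Strip any IPv6 zone id ("fe80::1%eth0") before parsing.
        ip = ipaddress.ip_address(sockaddr[0].split("%", 1)[0])
        if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped:
            ip = ip.ipv4_mapped  # judge ::ffff:10.0.0.1 as 10.0.0.1
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_multicast or ip.is_reserved or ip.is_unspecified):
            raise ValueError("endpoint resolves to a non-public address")
```

Checking every resolved address matters because a hostname can return one public and one private record. Resolving here and then letting the HTTP client resolve again leaves a DNS-rebinding window. Pinning the connection to the checked address closes that window. Whether the PR does this is not stated.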

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- W2: Cap upstream skeleton response to 1 MiB to prevent memory
  exhaustion from malicious index providers
- W3: Guard query_get_entries with a 100-key limit to prevent
  unbounded OR-clause queries
- W4: Whitelist skeleton item fields to only 'uri', preventing
  injection of unexpected fields by upstream providers
- W6: Validate skeleton cursor passthrough (length cap, no null bytes)
- W8: Validate that sendInteractions datasetUri references
  science.alt.dataset.entry collection specifically
- W14: Prevent javascript:/data: URI XSS in storage URL href by
  only rendering http(s) URLs as clickable links
- W15: Guard template join filter with iterable checks to prevent
  crashes on malformed shape/dimensionNames data
- W16: Add missing ingestion test for index_providers collection
- Harden publishIndex URL validation: reject credentials and fragments
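Items W2, W4, and W6 together with the earlier skeleton cap can be sketched as two small functions. The first enforces the 1 MiB limit while the body streams in, rather than after it is fully buffered. The second keeps only `uri`, honors the requested limit, and vets the cursor. The `items` key, the cursor length cap, and both function names are assumptions. The chunk iterable could come from, for example, httpx's `response.iter_bytes()` inside `client.stream(...)`, but which HTTP client the app uses is not stated here.

```python
from __future__ import annotations

import json
from collections.abc import Iterable

MAX_SKELETON_BYTES = 1024 * 1024  # 1 MiB cap (W2)
MAX_CURSOR_LEN = 512  # hypothetical; the actual cap is not stated in this PR


def read_capped(chunks: Iterable[bytes], cap: int = MAX_SKELETON_BYTES) -> bytes:
    """Accumulate a streamed body, failing as soon as it exceeds `cap`."""
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
        if len(buf) > cap:
            raise ValueError("skeleton response exceeds size cap")
    return bytes(buf)


def sanitize_skeleton(raw: bytes, limit: int) -> dict:
    """Keep only `uri` on each item, honor `limit`, and vet the cursor."""
    data = json.loads(raw)
    items = data.get("items") if isinstance(data, dict) else None
    if not isinstance(items, list):
        raise ValueError("skeleton must contain an items list")
    clean = [
        {"uri": item["uri"]}  # W4: whitelist the single allowed field
        for item in items
        if isinstance(item, dict) and isinstance(item.get("uri"), str)
    ][: max(limit, 0)]  # enforce the caller's requested limit
    result: dict = {"items": clean}
    cursor = data.get("cursor")
    if cursor is not None:
        # W6: bounded length and no null bytes before passing it back out.
        if not isinstance(cursor, str) or len(cursor) > MAX_CURSOR_LEN or "\x00" in cursor:
            raise ValueError("invalid cursor from index provider")
        result["cursor"] = cursor
    return result
```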

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
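A minimal sketch of the hardened publishIndex URL check, using `urllib.parse.urlsplit`. It accepts only https, rejects credentials in the authority part, and rejects any fragment. The function name and the decision to return the URL unchanged are assumptions.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def validate_index_endpoint(url: str) -> str:
    """Reject non-https schemes, embedded credentials, fragments, and bad ports."""
    if not isinstance(url, str):
        raise ValueError("endpoint must be a string")
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("endpoint must use https")
    if "@" in parts.netloc:
        raise ValueError("endpoint must not embed credentials")
    if "#" in url:  # an unencoded '#' always starts a fragment, even an empty one
        raise ValueError("endpoint must not contain a fragment")
    if not parts.hostname:
        raise ValueError("endpoint must include a host")
    _ = parts.port  # raises ValueError for a malformed or out-of-range port
    return url
```

This is a syntactic check only. It complements, rather than replaces, the DNS-based private-address blocking applied at ingestion and fetch time.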
…l review

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
maxine-at-forecast merged commit 4aa1539 into main on Feb 26, 2026
18 checks passed
2 participants