Merged
… v0.2.1b1 Add forecast-bio/atdata-lexicon as a git submodule at lexicons/ using HTTPS URL, pinned to the v0.2.1b1 tag. Update all CI checkout steps to initialize submodules, and document the submodule in CLAUDE.md and README.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ata-lexicon-repo feat: point lexicons/ at forecast-bio/atdata-lexicon via git submodule
Add science.alt.dataset.sendInteractions POST endpoint that accepts batches of download, citation, and derivative interaction events. Validates AT-URIs, interaction types, and optional ISO 8601 timestamps, then fires analytics events via the existing fire-and-forget infrastructure. Also extends getEntryStats to surface per-entry interaction counts (downloads, citations, derivatives) alongside existing view/search metrics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement index provider registration and the getIndexSkeleton/getIndex query endpoints following Bluesky's feed generator pattern. Third parties can register curated dataset index endpoints; the AppView fetches URI skeletons from them and hydrates entries from the local database.

- Add index_providers table to schema.sql
- Add upsert/query functions in database.py with COLLECTION_TABLE_MAP entry
- Add row_to_index_provider serializer and response models in models.py
- Add getIndexSkeleton, getIndex, listIndexes query endpoints
- Add publishIndex procedure with HTTPS URL validation
- Route science.alt.dataset.index records through ingestion processor
- Add 18 tests covering happy paths, error cases, and ingestion

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
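The hydration half of the skeleton/hydration pattern reduces to: preserve the provider's ordering, and drop any URI the AppView has not ingested. A minimal sketch, assuming a dict-backed lookup in place of the real SQL hydration:

```python
# Sketch only: the real AppView fetches the skeleton over HTTP from the
# registered provider and hydrates from its database, not from a dict.
def hydrate_index(skeleton: list[dict], entries_by_uri: dict[str, dict]) -> list[dict]:
    """Keep the provider's order; silently skip unknown URIs."""
    hydrated = []
    for item in skeleton:
        entry = entries_by_uri.get(item.get("uri"))
        if entry is not None:
            hydrated.append(entry)
    return hydrated
```

Skipping unknown URIs (rather than erroring) mirrors the feed generator pattern: providers may reference records the AppView has not yet seen.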
…ord counts, use UPSERT_FNS dispatch

- row_to_label() and row_to_lens() now include `did` field, consistent with row_to_entry() and row_to_schema()
- Extract _fetch_record_counts() helper so query_record_counts() and query_analytics_summary() share the same implementation
- Replace if/elif upsert chain in processor.py with db.UPSERT_FNS dict lookup; update test_ingestion.py to patch the dict directly
- Use parse_at_uri() in frontend dataset_detail() instead of manual string splitting
- Remove duplicate config fixture from test_analytics.py (conftest provides it)
- Add sendInteractions edge case tests: missing key, non-dict item, boundary at max batch size

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
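The UPSERT_FNS refactor above is the standard dict-dispatch replacement for an if/elif chain. A minimal sketch with placeholder upsert functions (the real ones live in database.py; these names and bodies are illustrative):

```python
# Illustrative stand-ins for the real upsert functions.
def upsert_entry(record: dict):
    return ("entry", record)

def upsert_schema(record: dict):
    return ("schema", record)

# Dispatch table keyed by NSID collection; tests can patch this dict directly.
UPSERT_FNS = {
    "science.alt.dataset.entry": upsert_entry,
    "science.alt.dataset.schema": upsert_schema,
}

def process_record(collection: str, record: dict):
    fn = UPSERT_FNS.get(collection)
    if fn is None:
        return None  # unknown collection: ignore
    return fn(record)
```

Besides being shorter than an if/elif chain, the dict makes the set of handled collections data rather than control flow, which is what lets test_ingestion.py patch it directly.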
…cal fixes - Add did field to row_to_label and row_to_lens serializers for consistency with row_to_entry, row_to_schema, row_to_index_provider - Add index_providers to query_active_publishers UNION query - Fix import ordering in queries.py and procedures.py (PEP 8) - Add row_to_index_provider unit tests to test_models.py - Add index_providers table/indexes to integration test expectations - Remove unnecessary mock patch from test_get_index_skeleton_invalid_uri Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…reaming Introduces an in-memory broadcast event bus (changestream.py) that the ingestion processor publishes to after successful upserts/deletes. A new WebSocket endpoint at /xrpc/science.alt.dataset.subscribeChanges streams these events to subscribers with cursor-based replay from a bounded buffer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
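The cursor-replay half of this design can be sketched synchronously. This is a simplification: the real changestream.py is asyncio-based with per-subscriber queues; the class and method names here are illustrative.

```python
from collections import deque

class ReplayBuffer:
    """Bounded buffer of (seq, event) pairs for cursor-based replay."""

    def __init__(self, maxlen: int = 1000):
        self._buf = deque(maxlen=maxlen)  # old events fall off the front
        self._seq = 0

    def publish(self, event: dict) -> int:
        self._seq += 1
        self._buf.append((self._seq, event))
        return self._seq

    def replay_since(self, cursor: int) -> list[tuple[int, dict]]:
        """Events newer than the cursor that are still in the buffer."""
        return [(s, e) for s, e in self._buf if s > cursor]
```

A subscriber reconnecting with a cursor older than the buffer simply receives whatever survives, so the bound trades replay completeness for memory.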
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… display Add KNOWN_ARRAY_FORMATS constant and ARRAY_FORMAT_LABELS for the six recognized array format tokens (numpyBytes, parquetBytes, sparseBytes, structuredBytes, arrowTensor, safetensors). Update row_to_schema() to surface arrayFormat, dtype, shape, and dimensionNames from the schema body as top-level fields. Update frontend templates (schema detail, schemas list, profile, dataset detail) to display format and annotation info when present. Update MCP server descriptions to mention new formats. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
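The six format tokens and the surfaced field names come from the commit above; whether unknown format tokens are dropped or passed through is an assumption in this sketch.

```python
# The six recognized tokens, per the lexicon.
KNOWN_ARRAY_FORMATS = frozenset({
    "numpyBytes", "parquetBytes", "sparseBytes",
    "structuredBytes", "arrowTensor", "safetensors",
})

def surface_array_fields(schema_body: dict) -> dict:
    """Lift ndarray annotation fields out of a schema body to top level."""
    lifted = {k: schema_body[k]
              for k in ("arrayFormat", "dtype", "shape", "dimensionNames")
              if k in schema_body}
    # Assumption: only surface a format token the AppView can label.
    if lifted.get("arrayFormat") not in KNOWN_ARRAY_FORMATS:
        lifted.pop("arrayFormat", None)
    return lifted
```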
…interactions-endpoint feat: add sendInteractions XRPC procedure for usage telemetry
…ydration-getindexskeleton

# Conflicts:
#   src/atdata_app/ingestion/processor.py
#   src/atdata_app/xrpc/procedures.py
…etindexskeleton Add skeleton/hydration pattern for third-party dataset indexes
…hange-stream-subscribechanges

# Conflicts:
#   CHANGELOG.md
…am-subscribechanges feat: add subscribeChanges WebSocket endpoint for real-time change streaming
…darray-annotations feat: add array format types and ndarray v1.1.0 annotations
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SSRF: validate endpoint URLs with DNS resolution and private IP blocking at fetch time (queries.py) and ingestion time (database.py)
- Auth: add service auth to sendInteractions endpoint
- Backpressure: track dropped subscribers in ChangeStream instead of silently losing events; close WebSocket with code 4000 on drop
- Subscriber limits: cap ChangeStream to 1000 subscribers, reject with WebSocket close code 1013 when full
- Replay dedup: track last replayed seq to avoid sending duplicate events from both replay buffer and live queue
- Keepalive: fix broken loop structure so timeout re-enters event loop
- Task GC: retain asyncio.Task references to prevent garbage collection of fire-and-forget analytics tasks
- Skeleton cap: enforce requested limit on items returned by external index providers
- Remove dead timestamp validation code from sendInteractions
- Sanitize error messages to avoid leaking internal URLs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
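The SSRF check in the first item can be sketched with the standard library. This is a simplified version under assumptions (function name, HTTPS-only policy); note that resolve-then-fetch remains vulnerable to DNS rebinding unless the resolved IP is also pinned for the actual request.

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_endpoint(url: str) -> bool:
    """Reject non-HTTPS URLs and hosts resolving to private/loopback ranges."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable host
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        # Block every resolved address that is not publicly routable.
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

Checking every resolved address matters: an attacker-controlled name can return one public and one private A record.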
- W2: Cap upstream skeleton response to 1 MiB to prevent memory exhaustion from malicious index providers
- W3: Guard query_get_entries with a 100-key limit to prevent unbounded OR-clause queries
- W4: Whitelist skeleton item fields to only 'uri', preventing injection of unexpected fields by upstream providers
- W6: Validate skeleton cursor passthrough (length cap, no null bytes)
- W8: Validate that sendInteractions datasetUri references science.alt.dataset.entry collection specifically
- W14: Prevent javascript:/data: URI XSS in storage URL href by only rendering http(s) URLs as clickable links
- W15: Guard template join filter with iterable checks to prevent crashes on malformed shape/dimensionNames data
- W16: Add missing ingestion test for index_providers collection
- Harden publishIndex URL validation: reject credentials and fragments

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…l review Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- sendInteractions XRPC procedure for anonymous usage telemetry (download, citation, derivative events) (Usage telemetry: sendInteractions endpoint #21)
- getIndexSkeleton, getIndex, listIndexes, publishIndex (Skeleton/hydration for third-party dataset indexes (getIndexSkeleton) #20)
- subscribeChanges WebSocket endpoint for real-time change streaming with cursor replay (Real-time change stream: subscribeChanges endpoint #22)
- atdata-lexicon git submodule at lexicons/ pinned to v0.2.1b1 (Point lexicon consumption at forecast-bio/atdata-lexicon #27)
- UPSERT_FNS dispatch dict

Test plan
🤖 Generated with Claude Code