Authoritative ATProto lexicon definitions for the science.alt.dataset namespace: the protocol-level schemas for federated scientific datasets.
These lexicons define the record types, queries, and extensible tokens used by the atdata dataset federation protocol. They are independent of any specific language implementation.
See docs/spec.md for the full lexicon inventory, record relationships, and versioning policy.
The repository follows the ATProto convention of mapping NSIDs to directory paths:
lexicons/
  science/
    alt/
      dataset/
        entry.json      # science.alt.dataset.entry
        schema.json     # science.alt.dataset.schema
        ...
schemas/
  ndarray_shim.json     # JSON Schema (not an ATProto lexicon)
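Because the mapping is mechanical, tooling can derive a lexicon's file location from its NSID and recover the NSID from a path. The following is a minimal Python sketch, assuming the lexicons/ root shown above. nsid_to_lexicon_path and lexicon_path_to_nsid are hypothetical helper names and are not part of atdata:

```python
from pathlib import PurePosixPath


def nsid_to_lexicon_path(nsid: str, root: str = "lexicons") -> PurePosixPath:
    """Map an NSID to its lexicon file (hypothetical helper).

    Every dot-separated segment becomes a directory, except the last one,
    which becomes the JSON file name.
    """
    segments = nsid.split(".")
    # ATProto NSIDs have at least three non-empty segments.
    if len(segments) < 3 or not all(segments):
        raise ValueError(f"not a valid NSID: {nsid!r}")
    *dirs, name = segments
    return PurePosixPath(root, *dirs, f"{name}.json")


def lexicon_path_to_nsid(path: str, root: str = "lexicons") -> str:
    """Inverse mapping: strip the root directory and the .json suffix (hypothetical helper)."""
    rel = PurePosixPath(path).relative_to(root)
    if rel.suffix != ".json":
        raise ValueError(f"not a lexicon file: {path!r}")
    return ".".join(rel.with_suffix("").parts)


print(nsid_to_lexicon_path("science.alt.dataset.entry"))
# lexicons/science/alt/dataset/entry.json
print(lexicon_path_to_nsid("lexicons/science/alt/dataset/schema.json"))
# science.alt.dataset.schema
```

Files under schemas/ are outside the lexicons/ root. For them, lexicon_path_to_nsid raises ValueError, which matches their status as plain JSON Schema rather than lexicons.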
Raw git reference (recommended for most consumers):
git clone https://github.com/forecast-bio/atdata-lexicon.git
TypeScript codegen with lex-cli:
npx @atproto/lex-cli gen-api ./src/client ./lexicons/science/alt/dataset/*.json
Python (via atdata):
from atdata.lexicons import load_lexicon
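schema = load_lexicon("science.alt.dataset.entry")
To validate the lexicons locally, install the dependencies and run code generation against every lexicon file:
npm install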
npx lex gen-api --yes /tmp/validate-output ./lexicons/science/alt/dataset/*.json
CI runs this check on every push and pull request.
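The lex-cli code generation step remains the real check. Before running it, a quick stdlib-only Python pre-flight can catch structural slips. This sketch assumes only the top-level fields every ATProto lexicon file carries (lexicon set to 1, an id NSID, and a defs object) plus the NSID-to-path convention described earlier. check_lexicon_file and check_tree are hypothetical names, and the sketch does not validate the definitions inside defs:

```python
import json
from pathlib import Path


def check_lexicon_file(path: Path, root: Path) -> list[str]:
    """Return structural problems found in one lexicon file (empty list if none).

    Hypothetical pre-flight helper: checks only top-level fields, not the defs.
    """
    problems = []
    doc = json.loads(path.read_text(encoding="utf-8"))
    # By convention, the path under the root encodes the NSID.
    expected_id = ".".join(path.relative_to(root).with_suffix("").parts)
    if doc.get("lexicon") != 1:
        problems.append(f"{path}: 'lexicon' must be 1")
    if doc.get("id") != expected_id:
        problems.append(f"{path}: id {doc.get('id')!r} does not match path ({expected_id!r})")
    if not isinstance(doc.get("defs"), dict):
        problems.append(f"{path}: 'defs' must be an object")
    return problems


def check_tree(root: Path) -> list[str]:
    """Run check_lexicon_file over every *.json file below root."""
    problems = []
    for path in sorted(root.rglob("*.json")):
        problems.extend(check_lexicon_file(path, root))
    return problems


if __name__ == "__main__":
    for problem in check_tree(Path("lexicons")):
        print(problem)
```

Run from the repository root, the sketch prints one line per problem and nothing when the tree is clean.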
Lexicons are automatically published to the ATProto PDS when a GitHub release is created. To publish manually, run scripts/publish.sh with the GOAT_USERNAME and GOAT_PASSWORD environment variables set. See scripts/publish.sh for details.
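The exact records scripts/publish.sh writes are not covered here. A minimal sketch, assuming publication follows the standard ATProto lexicon scheme: each schema is stored as a com.atproto.lexicon.schema record whose record key is the NSID, and the NSID authority points at the publishing repository through a DNS TXT record (did=...) under _lexicon. plus the reversed authority domain. The helper names and did:plc:example are placeholders:

```python
# Assumes the standard ATProto lexicon publication scheme; helper names are
# hypothetical and did:plc:example is a placeholder DID.


def lexicon_dns_name(nsid: str) -> str:
    """DNS name expected to hold the `did=...` TXT record for this NSID's authority."""
    segments = nsid.split(".")
    if len(segments) < 3 or not all(segments):
        raise ValueError(f"not a valid NSID: {nsid!r}")
    authority = segments[:-1]  # drop the final name segment
    return "_lexicon." + ".".join(reversed(authority))


def lexicon_record_uri(did: str, nsid: str) -> str:
    """AT-URI of the published schema record (collection + record key = NSID)."""
    return f"at://{did}/com.atproto.lexicon.schema/{nsid}"


nsid = "science.alt.dataset.entry"
print(lexicon_dns_name(nsid))  # _lexicon.dataset.alt.science
print(lexicon_record_uri("did:plc:example", nsid))
# at://did:plc:example/com.atproto.lexicon.schema/science.alt.dataset.entry
```

Resolution is non-hierarchical. Every NSID that shares the authority science.alt.dataset resolves through the same _lexicon.dataset.alt.science record, so one TXT record covers the whole namespace.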
JSON Schema shim definitions are published to https://json-schema.alt.science/ on each release.
These define the serialization formats for array types (ndarray, sparse, arrow tensor, etc.).