forecast-bio/atdata-lexicon

atdata-lexicon

Authoritative ATProto lexicon definitions for the science.alt.dataset namespace: the protocol-level schema for federated scientific datasets.

These lexicons define the record types, queries, and extensible tokens used by the atdata dataset federation protocol. They are independent of any specific language implementation.

See docs/spec.md for the full lexicon inventory, record relationships, and versioning policy.

Directory structure

The layout follows the ATProto convention of mapping NSIDs to directory paths:

lexicons/
  science/
    alt/
      dataset/
        entry.json           # science.alt.dataset.entry
        schema.json          # science.alt.dataset.schema
        ...
schemas/
  ndarray_shim.json          # JSON Schema (not an ATProto lexicon)
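
The NSID-to-path convention shown above can be sketched in a few lines of Python. This is an illustrative helper only: the name nsid_to_path and the default root "lexicons" are assumptions made for this example, not part of atdata or any ATProto tooling.

```python
from pathlib import PurePosixPath


def nsid_to_path(nsid: str, root: str = "lexicons") -> PurePosixPath:
    """Map an NSID to the file path this repository layout implies.

    Hypothetical helper for illustration; not an atdata API.
    """
    segments = nsid.split(".")
    # An NSID has at least three dot-separated, non-empty segments.
    if len(segments) < 3 or any(not s for s in segments):
        raise ValueError(f"not a valid NSID: {nsid!r}")
    # Every segment but the last is a directory; the last names the file.
    return PurePosixPath(root, *segments[:-1], segments[-1] + ".json")


print(nsid_to_path("science.alt.dataset.entry"))
# lexicons/science/alt/dataset/entry.json
```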

Consuming these lexicons

Raw git reference (recommended for most consumers):

git clone https://github.com/forecast-bio/atdata-lexicon.git
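
Once the repository is cloned, the lexicon files are plain JSON and can be read without any ATProto tooling. The following is a minimal sketch, assuming the clone lives in a local directory you pass in; load_lexicons is a hypothetical helper name, and the only thing it relies on is the top-level "id" field that every ATProto lexicon document carries.

```python
import json
from pathlib import Path


def load_lexicons(repo_root: str) -> dict[str, dict]:
    """Read every JSON file under <repo_root>/lexicons, keyed by its "id".

    Hypothetical helper for illustration; not an atdata API.
    """
    lexicons: dict[str, dict] = {}
    for path in sorted(Path(repo_root, "lexicons").rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            doc = json.load(fh)
        # The "id" field holds the NSID of the lexicon document.
        lexicons[doc["id"]] = doc
    return lexicons
```

For example, load_lexicons("atdata-lexicon") would return a dictionary whose keys are the NSIDs found in the checkout. The schemas/ directory is deliberately not scanned, since those files are JSON Schema documents rather than lexicons.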

TypeScript codegen with lex-cli:

npx @atproto/lex-cli gen-api ./src/client ./lexicons/science/alt/dataset/*.json

Python (via atdata):

from atdata.lexicons import load_lexicon
schema = load_lexicon("science.alt.dataset.entry")

Validation

npm install
npx lex gen-api --yes /tmp/validate-output ./lexicons/science/alt/dataset/*.json

CI runs this on every push and pull request.
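
For a quick local pre-check before running the lex-cli command above, a structural sanity check can be sketched in Python. This is not the project's validation method and does not replace lex-cli. It only checks three things: that "lexicon" is 1, that "id" matches the NSID implied by the file path, and that "defs" is an object. The function name check_lexicon_file is an assumption made for this example.

```python
import json
from pathlib import Path


def check_lexicon_file(path: Path, root: Path) -> list[str]:
    """Return a list of problems found in one lexicon file (empty if none).

    Hypothetical helper for illustration; `root` is the lexicons/ directory.
    """
    problems: list[str] = []
    doc = json.loads(path.read_text(encoding="utf-8"))

    # The Lexicon language version field is currently fixed at 1.
    if doc.get("lexicon") != 1:
        problems.append(f"{path}: 'lexicon' should be 1")

    # The NSID implied by the path: science/alt/dataset/entry.json
    # becomes science.alt.dataset.entry.
    expected_id = ".".join(path.relative_to(root).with_suffix("").parts)
    if doc.get("id") != expected_id:
        problems.append(f"{path}: 'id' is {doc.get('id')!r}, expected {expected_id!r}")

    # Definitions live in a top-level "defs" object.
    if not isinstance(doc.get("defs"), dict):
        problems.append(f"{path}: 'defs' should be an object")

    return problems
```

Running this over every file under lexicons/ and failing on any non-empty result gives a fast first pass; the lex-cli run remains the check that CI uses.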

Publishing

Lexicons are automatically published to the ATProto PDS when a GitHub release is created. To publish manually, set the GOAT_USERNAME and GOAT_PASSWORD environment variables and run scripts/publish.sh; the script itself documents the details.

Schema Hosting

JSON Schema shim definitions are published to https://json-schema.alt.science/ on each release. These define the serialization formats for array types (ndarray, sparse, arrow tensor, etc.).

License

MIT
