Merged
42 commits
047c6ef
fix(types): change @packable return type to expose PackableSample met…
maxinelevesque Jan 28, 2026
2531f7d
Merge branch 'release/v0.2.3b1' into dev/maxine-at-forecast
maxinelevesque Jan 28, 2026
f6665a5
chore(dev): add just test/lint commands and regenerate documentation
maxinelevesque Jan 29, 2026
518e968
feat(index): add pluggable storage providers for Index (SQLite, Postg…
maxinelevesque Jan 29, 2026
65411e6
refactor: address adversarial review findings across codebase
maxinelevesque Jan 29, 2026
d23b465
Merge branch 'feature/gh-42-index-sql-backend' into dev/maxine-at-for…
maxinelevesque Jan 29, 2026
cdb2f47
feat(index): add Repository system, prefix routing, and default Index…
maxinelevesque Jan 29, 2026
50fea99
feat(index): default to SQLite provider instead of Redis for zero-dep…
maxinelevesque Jan 29, 2026
5854198
refactor(index): post-repository adversarial review cleanup
maxinelevesque Jan 29, 2026
bf5c372
Merge branch 'feature/synthesized-index' into dev/maxine-at-forecast
maxinelevesque Jan 29, 2026
fffbe6f
feat: add Dataset convenience methods and structured exception hierar…
maxinelevesque Jan 29, 2026
329718c
feat(cli): add inspect, schema show/diff, and preview commands (GH#38)
maxinelevesque Jan 29, 2026
592dad3
feat(cli): migrate from argparse to typer and fix auto-stub generatio…
maxinelevesque Jan 29, 2026
77c7951
Merge branch 'feature/gh-38-dev-experience' into dev/maxine-at-forecast
maxinelevesque Jan 29, 2026
8afa909
refactor(local): split local.py monolith into local/ package with foc…
maxinelevesque Jan 29, 2026
cd7f35f
refactor(local): remove LocalIndex factory, consolidate provider sele…
maxinelevesque Jan 30, 2026
70bc639
Merge branch 'feature/split-local' into dev/maxine-at-forecast
maxinelevesque Jan 30, 2026
ec48637
refactor: migrate type bounds from PackableSample to Packable protocol
maxinelevesque Jan 30, 2026
35789a3
Merge branch 'feature/use-packable-protocol' into dev/maxine-at-forecast
maxinelevesque Jan 30, 2026
24a949c
feat: add per-shard manifest and query system (GH#35)
maxinelevesque Jan 30, 2026
e1a8d2c
Merge branch 'feature/gh-35-manifest-and-query' into dev/maxine-at-fo…
maxinelevesque Jan 30, 2026
7d4e49d
feat(bench): add performance benchmark suite with pytest-benchmark
maxinelevesque Jan 30, 2026
df27ac4
test: add coverage tests for CLI, postgres provider, query, repositor…
maxinelevesque Jan 30, 2026
d189f5f
feat(bench): split benchmarks by category, add report rendering and C…
maxinelevesque Jan 30, 2026
0d39ef6
feat(bench): add query result iteration benchmarks
maxinelevesque Jan 30, 2026
68ee480
chore(docs): copy benchmark report into docs site during build
maxinelevesque Jan 30, 2026
23f718f
refactor: remove unused imports, trim docstrings, consolidate test sa…
maxinelevesque Jan 30, 2026
7a2bbaf
refactor: optimize array serialization, trim protocol docstrings, fix…
maxinelevesque Jan 30, 2026
c277dd2
Merge branch 'feature/performance-eval' into dev/maxine-at-forecast
maxinelevesque Jan 30, 2026
b6c39f3
Merge branch 'dev/maxine-at-forecast' into release/v0.3.0b1
maxinelevesque Jan 30, 2026
1c199b1
release: prepare v0.3.0b1
maxinelevesque Jan 30, 2026
3ad01d2
fix(lint): resolve ruff formatting and style errors for CI
maxinelevesque Jan 30, 2026
9346396
feat: add structured logging, partial failure handling, and testing u…
maxinelevesque Jan 30, 2026
3cea15c
Merge branch 'feature/gh-39-hardening' into release/v0.3.0b1
maxinelevesque Jan 30, 2026
33d7d9f
Updates to workspace vocabulary
maxinelevesque Jan 31, 2026
35e959b
ci: auto-commit benchmark report to docs and scope write permissions
maxinelevesque Jan 31, 2026
f9cf2fd
docs: update benchmark report [skip ci]
github-actions[bot] Jan 31, 2026
0ee0a3d
ci: fix duplicate CI runs for push+PR overlap
maxinelevesque Jan 31, 2026
dcf0e90
docs: update benchmark report [skip ci]
github-actions[bot] Jan 31, 2026
f3d45b4
ci: add actions:write permission to benchmark job for artifact upload
maxinelevesque Jan 31, 2026
5475b6e
Merge remote-tracking branch 'origin/release/v0.3.0b1' into release/v…
maxinelevesque Jan 31, 2026
3782607
Merge remote-tracking branch 'upstream/release/v0.3.0b1' into release…
maxinelevesque Jan 31, 2026
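Commit ec48637 ("migrate type bounds from PackableSample to Packable protocol") switches generic bounds from a concrete base class to a protocol. The following is a minimal sketch of that general technique, structural typing with `typing.Protocol`. It is not atdata's actual interface: the `Packable` method (`pack`) and the `Point` and `pack_all` names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Protocol, TypeVar, runtime_checkable


@runtime_checkable
class Packable(Protocol):
    """Hypothetical protocol: anything that can serialize itself to bytes."""

    def pack(self) -> bytes: ...


# The TypeVar is bounded by the protocol, not by a concrete base class.
P = TypeVar("P", bound=Packable)


def pack_all(samples: list[P]) -> list[bytes]:
    """Serialize every sample; any type that structurally matches Packable is accepted."""
    return [s.pack() for s in samples]


@dataclass
class Point:
    # Point does not inherit from Packable; it only implements pack().
    x: int
    y: int

    def pack(self) -> bytes:
        return json.dumps({"x": self.x, "y": self.y}).encode()


packed = pack_all([Point(1, 2), Point(3, 4)])
print(packed[0])                           # b'{"x": 1, "y": 2}'
print(isinstance(Point(0, 0), Packable))   # True: runtime_checkable only checks the method exists
```

Bounding by a protocol lets classes qualify without inheriting from a shared base, as long as they implement the required methods. Note that `runtime_checkable` checks only for the presence of methods, not their signatures.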
Binary file modified .chainlink/issues.db
61 changes: 58 additions & 3 deletions .github/workflows/uv-test.yml
@@ -4,13 +4,11 @@ on:
  push:
    branches:
      - main
      - release/*
  pull_request:
    branches:
      - main

permissions:
  contents: read
  actions: read

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
@@ -77,3 +75,60 @@ jobs:
        with:
          fail_ci_if_error: false
          token: ${{ secrets.CODECOV_TOKEN }}

  benchmark:
    name: Benchmarks
    runs-on: ubuntu-latest
    needs: [lint]
    permissions:
      contents: write
      actions: write
    steps:
      - uses: actions/checkout@v5

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.14"

      - name: Install uv
        uses: astral-sh/setup-uv@v6
        with:
          enable-cache: true

      - name: Install just
        uses: extractions/setup-just@v2

      - name: Install the project
        run: uv sync --locked --all-extras --dev

      - name: Start Redis
        uses: supercharge/redis-github-action@1.8.1
        with:
          redis-version: 7

      - name: Run benchmarks
        run: just bench

      - name: Copy report to docs
        run: |
          mkdir -p docs/benchmarks
          cp .bench/report.html docs/benchmarks/index.html

      - name: Commit updated benchmark docs
        if: github.event_name == 'push'
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add docs/benchmarks/index.html
          git diff --cached --quiet || git commit -m "docs: update benchmark report [skip ci]"
          git push

      - name: Upload benchmark report
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: benchmark-report
          path: |
            .bench/report.html
            .bench/*.json
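The `benchmark` job above runs `just bench` and then copies `.bench/report.html` into `docs/benchmarks/index.html`. The project's own renderer (`render_report.py`, #488) is not part of this diff, so the following is only a hypothetical sketch of that kind of step. It assumes the JSON layout pytest-benchmark writes with `--benchmark-json`: a top-level `benchmarks` list whose entries carry a `name` and a `stats` dict that includes `median` and `iqr`, in seconds. In the spirit of #492, it reports median and IQR rather than mean and standard deviation. The function names are illustrative.

```python
import html
import json
from pathlib import Path


def render_rows(report: dict) -> str:
    """Build HTML table rows (median and IQR, in seconds) from pytest-benchmark JSON."""
    rows = []
    # Sort by median so the fastest benchmarks come first.
    for bench in sorted(report["benchmarks"], key=lambda b: b["stats"]["median"]):
        stats = bench["stats"]
        rows.append(
            "<tr><td>{}</td><td>{:.6f}</td><td>{:.6f}</td></tr>".format(
                html.escape(bench["name"]), stats["median"], stats["iqr"]
            )
        )
    return "\n".join(rows)


def render_report(json_path: Path, html_path: Path) -> None:
    """Hypothetical renderer: read one benchmark JSON file and write a single HTML page."""
    report = json.loads(json_path.read_text())
    page = (
        "<html><body><table>\n"
        "<tr><th>Benchmark</th><th>Median (s)</th><th>IQR (s)</th></tr>\n"
        f"{render_rows(report)}\n"
        "</table></body></html>\n"
    )
    html_path.parent.mkdir(parents=True, exist_ok=True)
    html_path.write_text(page)


# Usage with an in-memory report shaped like pytest-benchmark output.
sample = {"benchmarks": [
    {"name": "test_write", "stats": {"median": 0.002, "iqr": 0.0004}},
    {"name": "test_read", "stats": {"median": 0.001, "iqr": 0.0002}},
]}
print(render_rows(sample))
# <tr><td>test_read</td><td>0.001000</td><td>0.000200</td></tr>
# <tr><td>test_write</td><td>0.002000</td><td>0.000400</td></tr>
```

Median and IQR are less sensitive than mean and standard deviation to an occasional slow round caused by CI noise. This is the usual reason to prefer them in benchmark reports.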
3 changes: 3 additions & 0 deletions .gitignore
@@ -52,6 +52,9 @@ MANIFEST
pip-log.txt
pip-delete-this-directory.txt

# Benchmark results
.bench/

# Unit test / coverage reports
htmlcov/
.tox/
6 changes: 6 additions & 0 deletions .vscode/settings.json
@@ -5,15 +5,19 @@
"atproto",
"creds",
"dtype",
"fastparquet",
"getattr",
"hgetall",
"hset",
"libipld",
"maxcount",
"minioadmin",
"msgpack",
"ndarray",
"NSID",
"ormsgpack",
"psycopg",
"pydantic",
"pypi",
"pyproject",
"pytest",
@@ -24,6 +28,8 @@
"schemamodels",
"shardlists",
"tariterators",
"tqdm",
"typer",
"unpackb",
"webdataset"
],
105 changes: 105 additions & 0 deletions CHANGELOG.md
@@ -25,6 +25,111 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
- **Comprehensive integration test suite**: 593 tests covering E2E flows, error handling, edge cases

### Changed
- Investigate upload-artifact not finding benchmark output (#512)
- Fix duplicate CI runs for push+PR overlap (#511)
- Scope contents:write permission to benchmark job only (#510)
- Add benchmark docs auto-commit to CI workflow (#509)
- Submit PR for v0.3.0b1 release to upstream/main (#508)
- Implement GH#39: Production hardening (observability, error handling, testing infra) (#504)
- Add pluggable structured logging via atdata.configure_logging (#507)
- Add PartialFailureError and shard-level error handling to Dataset.map (#506)
- Add atdata.testing module with mock clients, fixtures, and helpers (#505)
- Fix CI linting failures (20 ruff errors) (#503)
- Adversarial review: Post-benchmark suite assessment (#494)
- Remove redundant protocol docstrings that restate signatures (#500)
- Add missing unit tests for _type_utils.py (#499)
- Strengthen weak assertions (assert X is not None → value checks) (#498)
- Trim verbose exception constructor docstrings (#501)
- Analyze benchmark results for performance improvement opportunities (#502)
- Consolidate remaining duplicate sample types in test files (#497)
- Remove dead code: _repo_legacy.py legacy UUID field, unused imports (#496)
- Trim verbose docstrings in dataset.py and _index.py (#495)
- Benchmark report: replace mean/stddev with median/IQR, add per-sample columns (#492)
- Add parameter descriptions to benchmark suite with automatic report introspection (#491)
- HTML benchmark reports with CI integration (#487)
- Add bench + render step to CI on highest Python version only (#490)
- Update justfile bench commands to export JSON and render (#489)
- Create render_report.py script to convert JSON to HTML (#488)
- Increase test coverage for low-coverage modules (#480)
- Add providers/_postgres.py tests (mock-based) (#485)
- Add _stub_manager.py tests (#484)
- Add manifest/_query.py tests (#483)
- Add repository.py tests (#482)
- Add CLI tests (cli/__init__, diagnose, local, preview, schema) (#481)
- Check test coverage for CLI utils (#479)
- Add performance benchmark suite for atdata (#471)
- Verify benchmarks run (#478)
- Update pyproject.toml and justfile (#477)
- Create bench_atmosphere.py (#476)
- Create bench_query.py (#475)
- Create bench_dataset_io.py (#474)
- Create bench_index_providers.py (#473)
- Create benchmarks/conftest.py with shared fixtures (#472)
- Add per-shard manifest and query system (GH #35) (#462)
- Write unit and integration tests (#470)
- Integrate manifest into write path and Dataset.query() (#469)
- Implement QueryExecutor and SampleLocation (#468)
- Implement ManifestWriter (JSON + parquet) (#467)
- Implement ManifestBuilder (#465)
- Implement ShardManifest data model (#466)
- Implement aggregate collectors (categorical, numeric, set) (#464)
- Implement ManifestField annotation and resolve_manifest_fields() (#463)
- Migrate type annotations from PackableSample to Packable protocol (#461)
- Remove LocalIndex factory — consolidate to Index (#460)
- Split local.py monolith into local/ package (#452)
- Verify tests and lint pass (#459)
- Create __init__.py re-export facade and delete local.py (#458)
- Create _repo_legacy.py with deprecated Repo class (#457)
- Create _index.py with Index class and LocalIndex factory (#456)
- Create _s3.py with S3DataStore and S3 helpers (#455)
- Create _schema.py with schema models and helpers (#454)
- Create _entry.py with LocalDatasetEntry and constants (#453)
- Migrate CLI from argparse to typer (#449)
- Investigate test failures (#450)
- Fix ensure_stub receiving LocalSchemaRecord instead of dict (#451)
- GH#38: Developer experience improvements (#437)
- CLI: atdata preview command (#440)
- CLI: atdata schema show/diff commands (#439)
- CLI: atdata inspect command (#438)
- Dataset.__len__ and Dataset.select() for sample count and indexed access (#447)
- Dataset.to_pandas() and Dataset.to_dict() export methods (#446)
- Dataset.filter() and Dataset.map() streaming transforms (#445)
- Dataset.get(key) for keyed sample access (#442)
- Dataset.describe() summary statistics (#444)
- Dataset.schema property and column_names (#443)
- Dataset.head(n) and Dataset.__iter__ convenience methods (#441)
- Custom exception hierarchy with actionable error messages (#448)
- Adversarial review: Post-Repository consolidation assessment (#430)
- Remove backwards-compat dict-access methods from SchemaField and LocalSchemaRecord (#436)
- Add missing test coverage for Repository prefix routing edge cases and error paths (#435)
- Trim over-verbose docstrings in local.py module/class level (#434)
- Fix formally incorrect test assertions (batch_size, CID, brace notation) (#433)
- Consolidate duplicate test sample types across test files into conftest.py (#432)
- Consolidate duplicate entry-creation logic in Index (add_entry vs _insert_dataset_to_provider) (#431)
- Switch default Index provider from Redis to SQLite (#429)
- Consolidated Index with Repository system (#424)
- Phase 4: Deprecate AtmosphereIndex, update exports (#428)
- Phase 3: Default Index singleton and load_dataset integration (#427)
- Phase 2: Extend Index with repos/atmosphere params and prefix routing (#426)
- Phase 1: Create Repository dataclass and _AtmosphereBackend in repository.py (#425)
- Adversarial review: Post-IndexProvider pluggable storage assessment (#417)
- Convert TODO comments to tracked issues or remove (#422)
- Remove deprecated shard_list property references from docstrings (#421)
- Replace bare except in _stub_manager.py and cli/local.py with specific exceptions (#423)
- Tighten generic pytest.raises(Exception) to specific exception types in tests (#420)
- Replace assert statements with ValueError in production code (#419)
- Consolidate duplicated _parse_semver into _type_utils.py (#418)
- feat: Add SQLite/PostgreSQL index providers (GH #42) (#409)
- Update documentation and public API exports (#416)
- Add tests for all providers (#415)
- Refactor Index class to accept provider parameter (#414)
- Implement PostgresIndexProvider (#413)
- Implement SqliteIndexProvider (#412)
- Implement RedisIndexProvider (extract from Index class) (#411)
- Define IndexProvider protocol in _protocols.py (#410)
- Add just lint command to justfile (#408)
- Add SQLite/PostgreSQL providers for LocalIndex (in addition to Redis) (#407)
- Fix type hints for @atdata.packable decorator to show PackableSample methods (#406)
- Review GitHub workflows and recommend CI improvements (#405)
- Fix type signatures for Dataset.ordered and Dataset.shuffled (GH#28) (#404)
- Investigate quartodoc Example section rendering - missing CSS classes on pre/code tags (#401)
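Entry #506 above adds `PartialFailureError` and shard-level error handling to `Dataset.map`. That code is not included in this diff, so the following is only a hypothetical sketch of the general pattern. Each shard is processed independently, successful results are kept, and per-shard failures are recorded. A single exception carrying both is then raised. The constructor arguments, attribute names, and `map_shards` are illustrative assumptions, not atdata's API.

```python
from collections.abc import Callable, Iterable
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")


class PartialFailureError(Exception):
    """Hypothetical: some shards failed, others succeeded; both outcomes are kept."""

    def __init__(self, results: dict[str, list], errors: dict[str, Exception]) -> None:
        self.results = results
        self.errors = errors
        super().__init__(
            f"{len(errors)} of {len(results) + len(errors)} shards failed: {sorted(errors)}"
        )


def map_shards(shards: dict[str, Iterable[T]], fn: Callable[[T], R]) -> dict[str, list[R]]:
    """Apply fn to every sample, isolating each failure to the shard it occurs in."""
    results: dict[str, list[R]] = {}
    errors: dict[str, Exception] = {}
    for shard_id, samples in shards.items():
        try:
            # Materialize the whole shard so a mid-shard failure discards it entirely.
            results[shard_id] = [fn(s) for s in samples]
        except Exception as exc:  # one bad shard should not abort the others
            errors[shard_id] = exc
    if errors:
        raise PartialFailureError(results, errors)
    return results


shards = {"shard-000": [1, 2], "shard-001": [3, 0], "shard-002": [5]}
try:
    map_shards(shards, lambda x: 10 // x)
except PartialFailureError as err:
    print(err)          # 1 of 3 shards failed: ['shard-001']
    print(err.results)  # {'shard-000': [10, 5], 'shard-002': [2]}
```

With this design, a shard either contributes all of its results or none. A streaming implementation would also have to decide what to do with samples it yielded before the failure.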
6 changes: 4 additions & 2 deletions CLAUDE.md
@@ -46,8 +46,10 @@ uv build
Development tasks are managed with [just](https://github.com/casey/just), a command runner. Available commands:

```bash
# Build documentation (runs quartodoc + quarto)
just docs
just test # Run all tests with coverage
just test tests/test_dataset.py # Run specific test file
just lint # Run ruff check + format check
just docs # Build documentation (runs quartodoc + quarto)
```

The `justfile` is in the project root. Add new dev tasks there rather than creating shell scripts.
Empty file added benchmarks/__init__.py