foundation-ac
diff --git a/.chainlink/issues.db b/.chainlink/issues.db
64 KB
diff --git a/.github/workflows/uv-test.yml b/.github/workflows/uv-test.yml
Lines changed: 58 additions & 3 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 3 additions & 0 deletions
diff --git a/.vscode/settings.json b/.vscode/settings.json
Lines changed: 6 additions & 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 105 additions & 0 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 4 additions & 2 deletions
diff --git a/benchmarks/__init__.py b/benchmarks/__init__.py
@@ -4,13 +4,11 @@ on:
   push:
     branches:
       - main
-      - release/*
   pull_request:
-    branches:
-      - main
 
 permissions:
   contents: read
+  actions: read
 
 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
@@ -77,3 +75,60 @@ jobs:
         with:
           fail_ci_if_error: false
           token: ${{ secrets.CODECOV_TOKEN }}
+
+  benchmark:
+    name: Benchmarks
+    runs-on: ubuntu-latest
+    needs: [lint]
+    permissions:
+      contents: write
+      actions: write
+    steps:
+      - uses: actions/checkout@v5
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.14"
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+        with:
+          enable-cache: true
+
+      - name: Install just
+        uses: extractions/setup-just@v2
+
+      - name: Install the project
+        run: uv sync --locked --all-extras --dev
+
+      - name: Start Redis
+        uses: supercharge/redis-github-action@1.8.1
+        with:
+          redis-version: 7
+
+      - name: Run benchmarks
+        run: just bench
+
+      - name: Copy report to docs
+        run: |
+          mkdir -p docs/benchmarks
+          cp .bench/report.html docs/benchmarks/index.html
+
+      - name: Commit updated benchmark docs
+        if: github.event_name == 'push'
+        run: |
+          git config user.name "github-actions[bot]"
+          git config user.email "github-actions[bot]@users.noreply.github.com"
+          git add docs/benchmarks/index.html
+          git diff --cached --quiet || git commit -m "docs: update benchmark report [skip ci]"
+          git push
+
+      - name: Upload benchmark report
+        uses: actions/upload-artifact@v4
+        if: always()
+        with:
+          name: benchmark-report
+          path: |
+            .bench/report.html
+            .bench/*.json
@@ -52,6 +52,9 @@ MANIFEST
 pip-log.txt
 pip-delete-this-directory.txt
 
+# Benchmark results
+.bench/
+
 # Unit test / coverage reports
 htmlcov/
 .tox/
 
@@ -5,15 +5,19 @@
         "atproto",
         "creds",
         "dtype",
+        "fastparquet",
         "getattr",
         "hgetall",
         "hset",
+        "libipld",
         "maxcount",
         "minioadmin",
         "msgpack",
         "ndarray",
         "NSID",
         "ormsgpack",
+        "psycopg",
+        "pydantic",
         "pypi",
         "pyproject",
         "pytest",
@@ -24,6 +28,8 @@
         "schemamodels",
         "shardlists",
         "tariterators",
+        "tqdm",
+        "typer",
         "unpackb",
         "webdataset"
     ],  
 
@@ -25,6 +25,111 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 - **Comprehensive integration test suite**: 593 tests covering E2E flows, error handling, edge cases
 
 ### Changed
+- Investigate upload-artifact not finding benchmark output (#512)
+- Fix duplicate CI runs for push+PR overlap (#511)
+- Scope contents:write permission to benchmark job only (#510)
+- Add benchmark docs auto-commit to CI workflow (#509)
+- Submit PR for v0.3.0b1 release to upstream/main (#508)
+- Implement GH#39: Production hardening (observability, error handling, testing infra) (#504)
+- Add pluggable structured logging via atdata.configure_logging (#507)
+- Add PartialFailureError and shard-level error handling to Dataset.map (#506)
+- Add atdata.testing module with mock clients, fixtures, and helpers (#505)
+- Fix CI linting failures (20 ruff errors) (#503)
+- Adversarial review: Post-benchmark suite assessment (#494)
+- Remove redundant protocol docstrings that restate signatures (#500)
+- Add missing unit tests for _type_utils.py (#499)
+- Strengthen weak assertions (assert X is not None → value checks) (#498)
+- Trim verbose exception constructor docstrings (#501)
+- Analyze benchmark results for performance improvement opportunities (#502)
+- Consolidate remaining duplicate sample types in test files (#497)
+- Remove dead code: _repo_legacy.py legacy UUID field, unused imports (#496)
+- Trim verbose docstrings in dataset.py and _index.py (#495)
+- Benchmark report: replace mean/stddev with median/IQR, add per-sample columns (#492)
+- Add parameter descriptions to benchmark suite with automatic report introspection (#491)
+- HTML benchmark reports with CI integration (#487)
+- Add bench + render step to CI on highest Python version only (#490)
+- Update justfile bench commands to export JSON and render (#489)
+- Create render_report.py script to convert JSON to HTML (#488)
+- Increase test coverage for low-coverage modules (#480)
+- Add providers/_postgres.py tests (mock-based) (#485)
+- Add _stub_manager.py tests (#484)
+- Add manifest/_query.py tests (#483)
+- Add repository.py tests (#482)
+- Add CLI tests (cli/__init__, diagnose, local, preview, schema) (#481)
+- Check test coverage for CLI utils (#479)
+- Add performance benchmark suite for atdata (#471)
+- Verify benchmarks run (#478)
+- Update pyproject.toml and justfile (#477)
+- Create bench_atmosphere.py (#476)
+- Create bench_query.py (#475)
+- Create bench_dataset_io.py (#474)
+- Create bench_index_providers.py (#473)
+- Create benchmarks/conftest.py with shared fixtures (#472)
+- Add per-shard manifest and query system (GH #35) (#462)
+- Write unit and integration tests (#470)
+- Integrate manifest into write path and Dataset.query() (#469)
+- Implement QueryExecutor and SampleLocation (#468)
+- Implement ManifestWriter (JSON + parquet) (#467)
+- Implement ManifestBuilder (#465)
+- Implement ShardManifest data model (#466)
+- Implement aggregate collectors (categorical, numeric, set) (#464)
+- Implement ManifestField annotation and resolve_manifest_fields() (#463)
+- Migrate type annotations from PackableSample to Packable protocol (#461)
+- Remove LocalIndex factory — consolidate to Index (#460)
+- Split local.py monolith into local/ package (#452)
+- Verify tests and lint pass (#459)
+- Create __init__.py re-export facade and delete local.py (#458)
+- Create _repo_legacy.py with deprecated Repo class (#457)
+- Create _index.py with Index class and LocalIndex factory (#456)
+- Create _s3.py with S3DataStore and S3 helpers (#455)
+- Create _schema.py with schema models and helpers (#454)
+- Create _entry.py with LocalDatasetEntry and constants (#453)
+- Migrate CLI from argparse to typer (#449)
+- Investigate test failures (#450)
+- Fix ensure_stub receiving LocalSchemaRecord instead of dict (#451)
+- GH#38: Developer experience improvements (#437)
+- CLI: atdata preview command (#440)
+- CLI: atdata schema show/diff commands (#439)
+- CLI: atdata inspect command (#438)
+- Dataset.__len__ and Dataset.select() for sample count and indexed access (#447)
+- Dataset.to_pandas() and Dataset.to_dict() export methods (#446)
+- Dataset.filter() and Dataset.map() streaming transforms (#445)
+- Dataset.get(key) for keyed sample access (#442)
+- Dataset.describe() summary statistics (#444)
+- Dataset.schema property and column_names (#443)
+- Dataset.head(n) and Dataset.__iter__ convenience methods (#441)
+- Custom exception hierarchy with actionable error messages (#448)
+- Adversarial review: Post-Repository consolidation assessment (#430)
+- Remove backwards-compat dict-access methods from SchemaField and LocalSchemaRecord (#436)
+- Add missing test coverage for Repository prefix routing edge cases and error paths (#435)
+- Trim over-verbose docstrings in local.py module/class level (#434)
+- Fix formally incorrect test assertions (batch_size, CID, brace notation) (#433)
+- Consolidate duplicate test sample types across test files into conftest.py (#432)
+- Consolidate duplicate entry-creation logic in Index (add_entry vs _insert_dataset_to_provider) (#431)
+- Switch default Index provider from Redis to SQLite (#429)
+- Consolidated Index with Repository system (#424)
+- Phase 4: Deprecate AtmosphereIndex, update exports (#428)
+- Phase 3: Default Index singleton and load_dataset integration (#427)
+- Phase 2: Extend Index with repos/atmosphere params and prefix routing (#426)
+- Phase 1: Create Repository dataclass and _AtmosphereBackend in repository.py (#425)
+- Adversarial review: Post-IndexProvider pluggable storage assessment (#417)
+- Convert TODO comments to tracked issues or remove (#422)
+- Remove deprecated shard_list property references from docstrings (#421)
+- Replace bare except in _stub_manager.py and cli/local.py with specific exceptions (#423)
+- Tighten generic pytest.raises(Exception) to specific exception types in tests (#420)
+- Replace assert statements with ValueError in production code (#419)
+- Consolidate duplicated _parse_semver into _type_utils.py (#418)
+- feat: Add SQLite/PostgreSQL index providers (GH #42) (#409)
+- Update documentation and public API exports (#416)
+- Add tests for all providers (#415)
+- Refactor Index class to accept provider parameter (#414)
+- Implement PostgresIndexProvider (#413)
+- Implement SqliteIndexProvider (#412)
+- Implement RedisIndexProvider (extract from Index class) (#411)
+- Define IndexProvider protocol in _protocols.py (#410)
+- Add just lint command to justfile (#408)
+- Add SQLite/PostgreSQL providers for LocalIndex (in addition to Redis) (#407)
+- Fix type hints for @atdata.packable decorator to show PackableSample methods (#406)
 - Review GitHub workflows and recommend CI improvements (#405)
 - Fix type signatures for Dataset.ordered and Dataset.shuffled (GH#28) (#404)
 - Investigate quartodoc Example section rendering - missing CSS classes on pre/code tags (#401)
 
@@ -46,8 +46,10 @@ uv build
 Development tasks are managed with [just](https://github.com/casey/just), a command runner. Available commands:
 
 ```bash
-# Build documentation (runs quartodoc + quarto)
-just docs
+just test              # Run all tests with coverage
+just test tests/test_dataset.py  # Run specific test file
+just lint              # Run ruff check + format check
+just docs              # Build documentation (runs quartodoc + quarto)
 ```
 
 The `justfile` is in the project root. Add new dev tasks there rather than creating shell scripts.
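
Review note on the CHANGELOG entry for #492 (benchmark report: median/IQR instead of mean/stddev): the sketch below illustrates why median and interquartile range are usually preferred for noisy timing data. It is not the project's `render_report.py`. The function name `summarize` and the sample timings are hypothetical, and the quartile method is an assumption, since the diff does not show which one the report uses.

```python
import statistics


def summarize(timings):
    """Return (median, iqr) for a list of per-round timings in seconds.

    Hypothetical helper for illustration only; the quartile method
    ("inclusive") is an assumption, not taken from the PR.
    """
    if len(timings) < 2:
        raise ValueError("need at least two timings to compute an IQR")
    # quantiles(n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(timings, n=4, method="inclusive")
    return statistics.median(timings), q3 - q1


# One slow outlier round (100.0 s) among otherwise similar rounds.
rounds = [1.0, 2.0, 3.0, 4.0, 100.0]
median, iqr = summarize(rounds)
mean = statistics.mean(rounds)
print(f"median={median} iqr={iqr} mean={mean}")
```

With these sample values the single outlier pulls the mean far above the typical round, while the median and IQR stay close to the bulk of the measurements, which is the usual argument for reporting them in benchmark tables.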