
Commit 43a0204

Merge pull request #44 from forecast-bio/release/v0.3.0b1
release: atdata v0.3.0b1
2 parents 2e6a93d + 3782607 commit 43a0204


155 files changed (+16482 / -12752 lines)

Some content is hidden: GitHub collapses part of large commits by default, so only a subset of the changed files is shown below.


.chainlink/issues.db

64 KB
Binary file not shown.

.github/workflows/uv-test.yml

Lines changed: 58 additions & 3 deletions
```diff
@@ -4,13 +4,11 @@ on:
   push:
     branches:
       - main
-      - release/*
   pull_request:
-    branches:
-      - main
 
 permissions:
   contents: read
+  actions: read
 
 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
@@ -77,3 +75,60 @@ jobs:
         with:
           fail_ci_if_error: false
           token: ${{ secrets.CODECOV_TOKEN }}
+
+  benchmark:
+    name: Benchmarks
+    runs-on: ubuntu-latest
+    needs: [lint]
+    permissions:
+      contents: write
+      actions: write
+    steps:
+      - uses: actions/checkout@v5
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.14"
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+        with:
+          enable-cache: true
+
+      - name: Install just
+        uses: extractions/setup-just@v2
+
+      - name: Install the project
+        run: uv sync --locked --all-extras --dev
+
+      - name: Start Redis
+        uses: supercharge/redis-github-action@1.8.1
+        with:
+          redis-version: 7
+
+      - name: Run benchmarks
+        run: just bench
+
+      - name: Copy report to docs
+        run: |
+          mkdir -p docs/benchmarks
+          cp .bench/report.html docs/benchmarks/index.html
+
+      - name: Commit updated benchmark docs
+        if: github.event_name == 'push'
+        run: |
+          git config user.name "github-actions[bot]"
+          git config user.email "github-actions[bot]@users.noreply.github.com"
+          git add docs/benchmarks/index.html
+          git diff --cached --quiet || git commit -m "docs: update benchmark report [skip ci]"
+          git push
+
+      - name: Upload benchmark report
+        uses: actions/upload-artifact@v4
+        if: always()
+        with:
+          name: benchmark-report
+          path: |
+            .bench/report.html
+            .bench/*.json
```
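The trigger change in the first hunk most likely corresponds to the CHANGELOG entry "Fix duplicate CI runs for push+PR overlap (#511)". Before this change, a push to a `release/*` branch with an open PR into `main` fired two runs: one for `push` and one for `pull_request`. The concurrency group `${{ github.workflow }}-${{ github.ref }}` could not merge them, because `github.ref` is `refs/heads/<branch>` for the push but `refs/pull/<number>/merge` for the PR.

The Python sketch below models that reasoning. The trigger tables, the `concurrency_groups` helper and the workflow name `CI` are hypothetical. `fnmatch` also only approximates GitHub's branch-glob rules.

```python
from fnmatch import fnmatchcase

# Hypothetical models of the `on:` filters before and after this commit.
# None means "no branch filter": the event fires for every branch.
OLD_TRIGGERS = {"push": ["main", "release/*"], "pull_request": ["main"]}
NEW_TRIGGERS = {"push": ["main"], "pull_request": None}


def branch_matches(patterns, branch):
    # Simplification: fnmatch's "*" also matches "/", unlike GitHub's glob syntax.
    return patterns is None or any(fnmatchcase(branch, p) for p in patterns)


def concurrency_groups(triggers, head, base, pr_number, workflow="CI"):
    """Concurrency groups started by one push to `head` while a PR head -> base is open.

    Mirrors `group: ${{ github.workflow }}-${{ github.ref }}`; `workflow` is a
    hypothetical workflow name.
    """
    groups = []
    # push events filter on the pushed branch; github.ref is refs/heads/<branch>
    if branch_matches(triggers["push"], head):
        groups.append(f"{workflow}-refs/heads/{head}")
    # pull_request events filter on the PR's base branch; github.ref is refs/pull/<n>/merge
    if branch_matches(triggers["pull_request"], base):
        groups.append(f"{workflow}-refs/pull/{pr_number}/merge")
    return groups


if __name__ == "__main__":
    print(concurrency_groups(OLD_TRIGGERS, "release/v0.3.0b1", "main", 44))
    print(concurrency_groups(NEW_TRIGGERS, "release/v0.3.0b1", "main", 44))
```

With the old filters, the release branch for PR #44 yields two distinct groups, so both runs proceed. With the new filters, only the `pull_request` run remains.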

.gitignore

Lines changed: 3 additions & 0 deletions
```diff
@@ -52,6 +52,9 @@ MANIFEST
 pip-log.txt
 pip-delete-this-directory.txt
 
+# Benchmark results
+.bench/
+
 # Unit test / coverage reports
 htmlcov/
 .tox/
```

.vscode/settings.json

Lines changed: 6 additions & 0 deletions
```diff
@@ -5,15 +5,19 @@
         "atproto",
         "creds",
         "dtype",
+        "fastparquet",
         "getattr",
         "hgetall",
         "hset",
+        "libipld",
         "maxcount",
         "minioadmin",
         "msgpack",
         "ndarray",
         "NSID",
         "ormsgpack",
+        "psycopg",
+        "pydantic",
         "pypi",
         "pyproject",
         "pytest",
@@ -24,6 +28,8 @@
         "schemamodels",
         "shardlists",
         "tariterators",
+        "tqdm",
+        "typer",
         "unpackb",
         "webdataset"
     ],
```
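The spell-checker word list appears to be kept in case-insensitive alphabetical order. For example, "NSID" sits between "ndarray" and "ormsgpack", and each new entry lands in its sorted position.

Assuming that ordering is intentional, here is a minimal Python sketch of maintaining it when adding words. `add_words` is a hypothetical helper, not part of any cSpell or VS Code tooling.

```python
import bisect


def add_words(words, new_words):
    """Return `words` with `new_words` inserted in case-insensitive sorted order.

    Assumes `words` is already sorted by str.casefold(); words that differ
    only in case from an existing entry are skipped.
    """
    result = list(words)
    keys = [w.casefold() for w in result]
    for word in new_words:
        key = word.casefold()
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            continue  # already present, ignoring case
        keys.insert(i, key)
        result.insert(i, word)
    return result


if __name__ == "__main__":
    before = ["atproto", "creds", "dtype", "getattr", "hgetall", "hset",
              "maxcount", "ndarray", "NSID", "ormsgpack", "pypi"]
    print(add_words(before, ["fastparquet", "libipld", "psycopg", "pydantic"]))
```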

CHANGELOG.md

Lines changed: 105 additions & 0 deletions
```diff
@@ -25,6 +25,111 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 - **Comprehensive integration test suite**: 593 tests covering E2E flows, error handling, edge cases
 
 ### Changed
+- Investigate upload-artifact not finding benchmark output (#512)
+- Fix duplicate CI runs for push+PR overlap (#511)
+- Scope contents:write permission to benchmark job only (#510)
+- Add benchmark docs auto-commit to CI workflow (#509)
+- Submit PR for v0.3.0b1 release to upstream/main (#508)
+- Implement GH#39: Production hardening (observability, error handling, testing infra) (#504)
+- Add pluggable structured logging via atdata.configure_logging (#507)
+- Add PartialFailureError and shard-level error handling to Dataset.map (#506)
+- Add atdata.testing module with mock clients, fixtures, and helpers (#505)
+- Fix CI linting failures (20 ruff errors) (#503)
+- Adversarial review: Post-benchmark suite assessment (#494)
+- Remove redundant protocol docstrings that restate signatures (#500)
+- Add missing unit tests for _type_utils.py (#499)
+- Strengthen weak assertions (assert X is not None → value checks) (#498)
+- Trim verbose exception constructor docstrings (#501)
+- Analyze benchmark results for performance improvement opportunities (#502)
+- Consolidate remaining duplicate sample types in test files (#497)
+- Remove dead code: _repo_legacy.py legacy UUID field, unused imports (#496)
+- Trim verbose docstrings in dataset.py and _index.py (#495)
+- Benchmark report: replace mean/stddev with median/IQR, add per-sample columns (#492)
+- Add parameter descriptions to benchmark suite with automatic report introspection (#491)
+- HTML benchmark reports with CI integration (#487)
+- Add bench + render step to CI on highest Python version only (#490)
+- Update justfile bench commands to export JSON and render (#489)
+- Create render_report.py script to convert JSON to HTML (#488)
+- Increase test coverage for low-coverage modules (#480)
+- Add providers/_postgres.py tests (mock-based) (#485)
+- Add _stub_manager.py tests (#484)
+- Add manifest/_query.py tests (#483)
+- Add repository.py tests (#482)
+- Add CLI tests (cli/__init__, diagnose, local, preview, schema) (#481)
+- Check test coverage for CLI utils (#479)
+- Add performance benchmark suite for atdata (#471)
+- Verify benchmarks run (#478)
+- Update pyproject.toml and justfile (#477)
+- Create bench_atmosphere.py (#476)
+- Create bench_query.py (#475)
+- Create bench_dataset_io.py (#474)
+- Create bench_index_providers.py (#473)
+- Create benchmarks/conftest.py with shared fixtures (#472)
+- Add per-shard manifest and query system (GH #35) (#462)
+- Write unit and integration tests (#470)
+- Integrate manifest into write path and Dataset.query() (#469)
+- Implement QueryExecutor and SampleLocation (#468)
+- Implement ManifestWriter (JSON + parquet) (#467)
+- Implement ManifestBuilder (#465)
+- Implement ShardManifest data model (#466)
+- Implement aggregate collectors (categorical, numeric, set) (#464)
+- Implement ManifestField annotation and resolve_manifest_fields() (#463)
+- Migrate type annotations from PackableSample to Packable protocol (#461)
+- Remove LocalIndex factory — consolidate to Index (#460)
+- Split local.py monolith into local/ package (#452)
+- Verify tests and lint pass (#459)
+- Create __init__.py re-export facade and delete local.py (#458)
+- Create _repo_legacy.py with deprecated Repo class (#457)
+- Create _index.py with Index class and LocalIndex factory (#456)
+- Create _s3.py with S3DataStore and S3 helpers (#455)
+- Create _schema.py with schema models and helpers (#454)
+- Create _entry.py with LocalDatasetEntry and constants (#453)
+- Migrate CLI from argparse to typer (#449)
+- Investigate test failures (#450)
+- Fix ensure_stub receiving LocalSchemaRecord instead of dict (#451)
+- GH#38: Developer experience improvements (#437)
+- CLI: atdata preview command (#440)
+- CLI: atdata schema show/diff commands (#439)
+- CLI: atdata inspect command (#438)
+- Dataset.__len__ and Dataset.select() for sample count and indexed access (#447)
+- Dataset.to_pandas() and Dataset.to_dict() export methods (#446)
+- Dataset.filter() and Dataset.map() streaming transforms (#445)
+- Dataset.get(key) for keyed sample access (#442)
+- Dataset.describe() summary statistics (#444)
+- Dataset.schema property and column_names (#443)
+- Dataset.head(n) and Dataset.__iter__ convenience methods (#441)
+- Custom exception hierarchy with actionable error messages (#448)
+- Adversarial review: Post-Repository consolidation assessment (#430)
+- Remove backwards-compat dict-access methods from SchemaField and LocalSchemaRecord (#436)
+- Add missing test coverage for Repository prefix routing edge cases and error paths (#435)
+- Trim over-verbose docstrings in local.py module/class level (#434)
+- Fix formally incorrect test assertions (batch_size, CID, brace notation) (#433)
+- Consolidate duplicate test sample types across test files into conftest.py (#432)
+- Consolidate duplicate entry-creation logic in Index (add_entry vs _insert_dataset_to_provider) (#431)
+- Switch default Index provider from Redis to SQLite (#429)
+- Consolidated Index with Repository system (#424)
+- Phase 4: Deprecate AtmosphereIndex, update exports (#428)
+- Phase 3: Default Index singleton and load_dataset integration (#427)
+- Phase 2: Extend Index with repos/atmosphere params and prefix routing (#426)
+- Phase 1: Create Repository dataclass and _AtmosphereBackend in repository.py (#425)
+- Adversarial review: Post-IndexProvider pluggable storage assessment (#417)
+- Convert TODO comments to tracked issues or remove (#422)
+- Remove deprecated shard_list property references from docstrings (#421)
+- Replace bare except in _stub_manager.py and cli/local.py with specific exceptions (#423)
+- Tighten generic pytest.raises(Exception) to specific exception types in tests (#420)
+- Replace assert statements with ValueError in production code (#419)
+- Consolidate duplicated _parse_semver into _type_utils.py (#418)
+- feat: Add SQLite/PostgreSQL index providers (GH #42) (#409)
+- Update documentation and public API exports (#416)
+- Add tests for all providers (#415)
+- Refactor Index class to accept provider parameter (#414)
+- Implement PostgresIndexProvider (#413)
+- Implement SqliteIndexProvider (#412)
+- Implement RedisIndexProvider (extract from Index class) (#411)
+- Define IndexProvider protocol in _protocols.py (#410)
+- Add just lint command to justfile (#408)
+- Add SQLite/PostgreSQL providers for LocalIndex (in addition to Redis) (#407)
+- Fix type hints for @atdata.packable decorator to show PackableSample methods (#406)
 - Review GitHub workflows and recommend CI improvements (#405)
 - Fix type signatures for Dataset.ordered and Dataset.shuffled (GH#28) (#404)
 - Investigate quartodoc Example section rendering - missing CSS classes on pre/code tags (#401)
```
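Entry #492 switches the benchmark report from mean/stddev to median/IQR. The motivation is robustness: a single slow round (a GC pause, a cold cache) drags the mean up and inflates the standard deviation, while the median and interquartile range barely move.

Below is a minimal sketch using Python's standard `statistics` module. The `summarize` function and its `samples_per_call` parameter (standing in for the "per-sample columns") are hypothetical and do not reflect atdata's actual report code. `statistics.quantiles` uses the "exclusive" interpolation method by default, so IQR values may differ slightly from tools that interpolate differently.

```python
import statistics


def summarize(timings_s, samples_per_call=1):
    """Summarize per-call timings (seconds) with median and IQR.

    `samples_per_call` is a hypothetical parameter: the number of samples one
    timed call processes, used to derive a per-sample median.
    """
    if len(timings_s) < 2:
        raise ValueError("need at least two timings to compute quartiles")
    median = statistics.median(timings_s)
    # Quartile cut points (Q1, Q2, Q3); the default "exclusive" method interpolates between ranks.
    q1, _, q3 = statistics.quantiles(timings_s, n=4)
    return {
        "median": median,
        "iqr": q3 - q1,
        "per_sample_median": median / samples_per_call,
    }


if __name__ == "__main__":
    rounds = [1.0, 1.1, 1.1, 1.2, 1.2, 1.3, 9.0]  # one slow outlier round
    print(summarize(rounds, samples_per_call=100))
    print("mean:", statistics.mean(rounds), "stdev:", statistics.stdev(rounds))
```

For those seven rounds, the median is 1.2 s and the IQR is about 0.2 s. The single 9.0 s outlier pulls the mean up to roughly 2.27 s.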

CLAUDE.md

Lines changed: 4 additions & 2 deletions
````diff
@@ -46,8 +46,10 @@ uv build
 Development tasks are managed with [just](https://github.com/casey/just), a command runner. Available commands:
 
 ```bash
-# Build documentation (runs quartodoc + quarto)
-just docs
+just test # Run all tests with coverage
+just test tests/test_dataset.py # Run specific test file
+just lint # Run ruff check + format check
+just docs # Build documentation (runs quartodoc + quarto)
 ```
 
 The `justfile` is in the project root. Add new dev tasks there rather than creating shell scripts.
````

benchmarks/__init__.py

Whitespace-only changes.
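This new `benchmarks/` package holds the benchmark suite. CHANGELOG entries #488 and #489 describe `just bench` exporting JSON and a `render_report.py` script that converts it into the `.bench/report.html` file the workflow publishes.

The sketch below illustrates that JSON-to-HTML step under an explicit assumption: that the suite uses pytest-benchmark. Its `--benchmark-json` output contains a `benchmarks` list whose entries carry a `name` and a `stats` dict that includes `median` and `iqr`. `render_table` is a hypothetical name, not the project's actual script.

```python
import html
import json


def render_table(report_json: str) -> str:
    """Render benchmark JSON (assumed pytest-benchmark layout) as an HTML table."""
    data = json.loads(report_json)
    rows = []
    # Sort by name so the rendered report is stable across runs.
    for bench in sorted(data["benchmarks"], key=lambda b: b["name"]):
        stats = bench["stats"]
        rows.append(
            "<tr><td>{}</td><td>{:.6f}</td><td>{:.6f}</td></tr>".format(
                html.escape(bench["name"]), stats["median"], stats["iqr"]
            )
        )
    header = "<tr><th>Benchmark</th><th>Median (s)</th><th>IQR (s)</th></tr>"
    return "\n".join(["<table>", header, *rows, "</table>"])


if __name__ == "__main__":
    sample = json.dumps({
        "benchmarks": [
            {"name": "test_write<shards>", "stats": {"median": 0.0012, "iqr": 0.0003}},
            {"name": "test_read", "stats": {"median": 0.0008, "iqr": 0.0001}},
        ]
    })
    print(render_table(sample))
```

Escaping benchmark names with `html.escape` matters because parametrized test IDs can contain characters such as `<` or `&`.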
