
Commit 82c78a9

ci: modernize GitHub Actions with caching, concurrency, lint checks, and trusted publishing

- Add uv caching (enable-cache: true) for faster CI runs
- Add concurrency control to cancel in-progress runs on new commits
- Add ruff lint job (check + format) targeting src/ and tests/
- Switch to --locked flag for reproducible dependency resolution
- Add fail-fast: false to test matrix to see all failures
- Enable Codecov coverage upload
- Switch PyPI publishing to trusted publishing (OIDC)
- Split publish workflow into build and publish jobs with artifacts
- Fix all ruff lint issues (unused imports, undefined names)
- Format entire codebase with ruff format (42 files)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 926f4ab commit 82c78a9


48 files changed: +2081 −1217 lines changed

.chainlink/issues.db

0 Bytes
Binary file not shown.
Lines changed: 40 additions & 21 deletions

@@ -1,5 +1,3 @@
-#
-
 name: Build and upload package to PyPI
 
 on:
@@ -9,38 +7,59 @@ on:
 
 permissions:
   contents: read
+  id-token: write
 
-jobs:
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
 
-  uv-build-release-pypi-publish:
-    name: "Build release distribution and publish to PyPI"
+jobs:
+  build:
+    name: Build release distribution
     runs-on: ubuntu-latest
-    environment:
-      name: pypi
-
+
     steps:
       - uses: actions/checkout@v5
-
-      - name: "Set up Python"
+
+      - name: Set up Python
         uses: actions/setup-python@v5
         with:
          python-version-file: "pyproject.toml"
 
       - name: Install uv
         uses: astral-sh/setup-uv@v6
-
+        with:
+          enable-cache: true
+
       - name: Install project
-        run: uv sync --all-extras --dev
-        # TODO Better to use --locked for author control over versions?
-        # run: uv sync --locked --all-extras --dev
-
+        run: uv sync --locked --all-extras --dev
+
       - name: Build release distributions
         run: uv build
-
-      - name: Publish to PyPI
-        env:
-          UV_PUBLISH_TOKEN: ${{ secrets.UV_PUBLISH_TOKEN }}
-        run: uv publish
 
+      - name: Upload dist artifacts
+        uses: actions/upload-artifact@v4
+        with:
+          name: dist
+          path: dist/
 
-##
+  publish:
+    name: Publish to PyPI
+    runs-on: ubuntu-latest
+    needs: build
+    environment:
+      name: pypi
+      url: https://pypi.org/project/atdata/
+
+    steps:
+      - name: Download dist artifacts
+        uses: actions/download-artifact@v4
+        with:
+          name: dist
+          path: dist/
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+
+      - name: Publish to PyPI
+        run: uv publish --trusted-publishing always dist/*
.github/workflows/uv-test.yml

Lines changed: 40 additions & 19 deletions

@@ -1,5 +1,3 @@
-#
-
 name: Run tests with `uv`
 
 on:
@@ -11,33 +9,60 @@ on:
     branches:
       - main
 
+permissions:
+  contents: read
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
 jobs:
-  uv-test:
-    name: Run tests
+  lint:
+    name: Lint
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v5
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+        with:
+          enable-cache: true
+
+      - name: Install the project
+        run: uv sync --locked --dev
+
+      - name: Run ruff check
+        run: uv run ruff check src/ tests/
+
+      - name: Run ruff format check
+        run: uv run ruff format --check src/ tests/
+
+  test:
+    name: Test (py${{ matrix.python-version }}, redis${{ matrix.redis-version }})
     runs-on: ubuntu-latest
     environment:
       name: test
     strategy:
+      fail-fast: false
       matrix:
-        python-version: [3.12, 3.13, 3.14]
+        python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
         redis-version: [6, 7]
 
     steps:
       - uses: actions/checkout@v5
 
-      - name: "Set up Python"
+      - name: Set up Python
         uses: actions/setup-python@v5
         with:
           python-version: ${{ matrix.python-version }}
-          # python-version-file: "pyproject.toml"
 
       - name: Install uv
         uses: astral-sh/setup-uv@v6
+        with:
+          enable-cache: true
 
       - name: Install the project
-        run: uv sync --all-extras --dev
-        # TODO Better to use --locked for author control over versions?
-        # run: uv sync --locked --all-extras --dev
+        run: uv sync --locked --all-extras --dev
 
       - name: Start Redis
         uses: supercharge/redis-github-action@1.8.1
@@ -47,12 +72,8 @@ jobs:
       - name: Run tests with coverage
         run: uv run pytest --cov=atdata --cov-report=xml --cov-report=term
 
-      # - name: Upload coverage to Codecov
-      #   uses: codecov/codecov-action@v5
-      #   with:
-      #     # file: ./coverage.xml # Claude hallucination -- fascinating!
-      #     fail_ci_if_error: false
-      #     token: ${{ secrets.CODECOV_TOKEN }}
-
-
-#
+      - name: Upload coverage to Codecov
+        uses: codecov/codecov-action@v5
+        with:
+          fail_ci_if_error: false
+          token: ${{ secrets.CODECOV_TOKEN }}

CHANGELOG.md

Lines changed: 1 addition & 0 deletions

@@ -25,6 +25,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 - **Comprehensive integration test suite**: 593 tests covering E2E flows, error handling, edge cases
 
 ### Changed
+- Review GitHub workflows and recommend CI improvements (#405)
 - Fix type signatures for Dataset.ordered and Dataset.shuffled (GH#28) (#404)
 - Investigate quartodoc Example section rendering - missing CSS classes on pre/code tags (#401)
 - Update all docstrings from Example: to Examples: format (#403)

src/atdata/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -88,4 +88,4 @@
 from . import atmosphere as atmosphere
 
 # CLI entry point
-from .cli import main as main
+from .cli import main as main

src/atdata/_cid.py

Lines changed: 6 additions & 2 deletions

@@ -64,7 +64,9 @@ def generate_cid(data: Any) -> str:
     # Build raw CID bytes:
     # CIDv1 = version(1) + codec(dag-cbor) + multihash
     # Multihash = code(sha256) + size(32) + digest
-    raw_cid_bytes = bytes([CID_VERSION_1, CODEC_DAG_CBOR, HASH_SHA256, SHA256_SIZE]) + sha256_hash
+    raw_cid_bytes = (
+        bytes([CID_VERSION_1, CODEC_DAG_CBOR, HASH_SHA256, SHA256_SIZE]) + sha256_hash
+    )
 
     # Encode to base32 multibase string
     return libipld.encode_cid(raw_cid_bytes)
@@ -87,7 +89,9 @@ def generate_cid_from_bytes(data_bytes: bytes) -> str:
     >>> cid = generate_cid_from_bytes(cbor_bytes)
     """
     sha256_hash = hashlib.sha256(data_bytes).digest()
-    raw_cid_bytes = bytes([CID_VERSION_1, CODEC_DAG_CBOR, HASH_SHA256, SHA256_SIZE]) + sha256_hash
+    raw_cid_bytes = (
+        bytes([CID_VERSION_1, CODEC_DAG_CBOR, HASH_SHA256, SHA256_SIZE]) + sha256_hash
+    )
     return libipld.encode_cid(raw_cid_bytes)
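The CID layout described in the comments above (version byte + codec byte + multihash) can be sketched without `libipld`. This is a hypothetical stand-in, not the project's actual encoder: it replaces `libipld.encode_cid` with the standard multibase base32 encoding for CIDv1 (`b` prefix, lowercase, padding stripped), built from Python's stdlib only.

```python
import base64
import hashlib

# Constants mirroring those used in _cid.py
CID_VERSION_1 = 0x01   # CIDv1 version byte
CODEC_DAG_CBOR = 0x71  # dag-cbor multicodec
HASH_SHA256 = 0x12     # sha2-256 multihash code
SHA256_SIZE = 0x20     # 32-byte digest length


def cid_from_bytes(data_bytes: bytes) -> str:
    """Build a CIDv1 string: version + codec + multihash(code, size, digest)."""
    digest = hashlib.sha256(data_bytes).digest()
    raw = bytes([CID_VERSION_1, CODEC_DAG_CBOR, HASH_SHA256, SHA256_SIZE]) + digest
    # Multibase base32: 'b' prefix, lowercase alphabet, '=' padding removed
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


cid = cid_from_bytes(b"hello")
```

The raw CID is 36 bytes (4 header bytes + 32-byte digest), so the multibase string is always 59 characters.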

src/atdata/_helpers.py

Lines changed: 7 additions & 5 deletions

@@ -22,7 +22,8 @@
 
 ##
 
-def array_to_bytes( x: np.ndarray ) -> bytes:
+
+def array_to_bytes(x: np.ndarray) -> bytes:
     """Convert a numpy array to bytes for msgpack serialization.
 
     Uses numpy's native ``save()`` format to preserve array dtype and shape.
@@ -37,10 +38,11 @@ def array_to_bytes( x: np.ndarray ) -> bytes:
         Uses ``allow_pickle=True`` to support object dtypes.
     """
     np_bytes = BytesIO()
-    np.save( np_bytes, x, allow_pickle = True )
+    np.save(np_bytes, x, allow_pickle=True)
     return np_bytes.getvalue()
 
-def bytes_to_array( b: bytes ) -> np.ndarray:
+
+def bytes_to_array(b: bytes) -> np.ndarray:
     """Convert serialized bytes back to a numpy array.
 
     Reverses the serialization performed by ``array_to_bytes()``.
@@ -54,5 +56,5 @@ def bytes_to_array( b: bytes ) -> np.ndarray:
     Note:
         Uses ``allow_pickle=True`` to support object dtypes.
     """
-    np_bytes = BytesIO( b )
-    return np.load( np_bytes, allow_pickle = True )
+    np_bytes = BytesIO(b)
+    return np.load(np_bytes, allow_pickle=True)

src/atdata/_hf_api.py

Lines changed: 8 additions & 4 deletions

@@ -46,7 +46,6 @@
 
 if TYPE_CHECKING:
     from ._protocols import AbstractIndex
-    from .local import S3DataStore
 
 ##
 # Type variables
@@ -77,6 +76,7 @@ class DatasetDict(Generic[ST], dict):
     >>> for split_name, dataset in ds_dict.items():
     ...     print(f"{split_name}: {len(dataset.shard_list)} shards")
     """
+
     # TODO The above has a line for "Parameters:" that should be "Type Parameters:"; this is a temporary fix for `quartodoc` auto-generation bugs.
 
     def __init__(
@@ -464,7 +464,7 @@ def _resolve_indexed_path(
         data_urls = entry.data_urls
 
         # Check if index has a data store
-        if hasattr(index, 'data_store') and index.data_store is not None:
+        if hasattr(index, "data_store") and index.data_store is not None:
            store = index.data_store
 
            # Import here to avoid circular imports at module level
@@ -638,7 +638,9 @@ def load_dataset(
        source, schema_ref = _resolve_indexed_path(path, index)
 
        # Resolve sample_type from schema if not provided
-       resolved_type: Type = sample_type if sample_type is not None else index.decode_schema(schema_ref)
+       resolved_type: Type = (
+           sample_type if sample_type is not None else index.decode_schema(schema_ref)
+       )
 
        # Create dataset from the resolved source (includes credentials if S3)
        ds = Dataset[resolved_type](source)
@@ -647,7 +649,9 @@ def load_dataset(
            # Indexed datasets are single-split by default
            return ds
 
-       return DatasetDict({"train": ds}, sample_type=resolved_type, streaming=streaming)
+       return DatasetDict(
+           {"train": ds}, sample_type=resolved_type, streaming=streaming
+       )
 
    # Use DictSample as default when no type specified
    resolved_type = sample_type if sample_type is not None else DictSample

src/atdata/_protocols.py

Lines changed: 0 additions & 1 deletion

@@ -32,7 +32,6 @@
 from typing import (
     IO,
     Any,
-    ClassVar,
     Iterator,
     Optional,
     Protocol,

src/atdata/_schema_codec.py

Lines changed: 6 additions & 2 deletions

@@ -203,7 +203,9 @@ def schema_to_type(
         namespace={
             "__post_init__": lambda self: PackableSample.__post_init__(self),
             "__schema_version__": version,
-            "__schema_ref__": schema.get("$ref", None),  # Store original ref if available
+            "__schema_ref__": schema.get(
+                "$ref", None
+            ),  # Store original ref if available
         },
     )
 
@@ -239,7 +241,9 @@ def _field_type_to_stub_str(field_type: dict, optional: bool = False) -> str:
 
     if kind == "primitive":
         primitive = field_type.get("primitive", "str")
-        py_type = primitive  # str, int, float, bool, bytes are all valid Python type names
+        py_type = (
+            primitive  # str, int, float, bool, bytes are all valid Python type names
+        )
     elif kind == "ndarray":
         py_type = "NDArray[Any]"
     elif kind == "array":
