Fix partitioning regression#49
Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR modernizes the repository’s Python packaging and CI setup (moving from Poetry to PEP 621 + Hatchling and introducing uv-based CI), and updates the connector base class to support PortalApi dependency injection to improve testability.
Changes:
- Migrate packaging metadata to PEP 621 in
pyproject.tomland switch build backend to Hatchling; removepoetry.lock. - Add GitHub Actions CI workflow to install dependencies via
uvand run pytest. - Update
AbstractConnectorto accept an optionalPortalApiinstance and reuse it across concurrent connector tasks.
Reviewed changes
Copilot reviewed 4 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| pyproject.toml | Replaces Poetry config with PEP 621 project metadata, dependency declarations, and Hatchling build backend. |
| poetry.lock | Removed Poetry lockfile as part of packaging tool migration. |
| cdip_connector/core/connector_base.py | Adds PortalApi dependency injection and reuses the same portal instance when spawning concurrent tasks. |
| CLAUDE.md | Adds contributor guidance and architecture notes for Claude Code usage. |
| .github/workflows/ci.yml | Introduces a uv-based CI job that installs deps and runs tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for idx in range(0, len(integrations), self.concurrency): | ||
| tasks = [ | ||
| asyncio.ensure_future(self.__class__().extract_load(integration)) | ||
| asyncio.ensure_future(self.__class__(portal=self.portal).extract_load(integration)) | ||
| for integration in integrations[idx: idx + self.concurrency] | ||
| ] |
There was a problem hiding this comment.
main() constructs per-integration tasks with self.__class__(portal=self.portal). This will break any connector subclass that overrides __init__ without accepting a portal kwarg (a common pattern), making this a potentially breaking API change for downstream connectors. Consider instantiating without kwargs and then assigning portal, or provide a dedicated factory/clone method in the base class that injects the shared PortalApi without requiring subclass constructor changes.
| [project] | ||
| name = "cdip_connector" | ||
| version = "1.5.7" | ||
| version = "1.7.0" | ||
| description = "SMART Integrate Connector Library" |
There was a problem hiding this comment.
The PR title (“Fix partitioning regression”) doesn’t match the actual changes here (packaging migration to Hatch/PEP 621, new CI workflow, contributor docs, and connector DI). Please align the PR title (or description) with the scope of the changes so release notes/reviews aren’t misleading.
| - run: uv sync --group dev | ||
|
|
||
| - run: uv run pytest |
There was a problem hiding this comment.
CI runs uv sync --group dev without enforcing that uv.lock is up-to-date (e.g., via --locked/--frozen). That means dependency drift can slip in (pyproject changes not reflected in the lockfile) while CI still passes. Consider making CI fail when the lockfile is missing/out-of-sync.
| test: | ||
| runs-on: ubuntu-latest | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
|
|
||
| - uses: astral-sh/setup-uv@v5 | ||
|
|
There was a problem hiding this comment.
The workflow doesn’t pin a Python version. Since ubuntu-latest can change its default Python over time, CI results may unexpectedly change or fail even if the project hasn’t. Consider explicitly selecting a Python version (or a test matrix) consistent with requires-python.
| - **`cdip_settings.py`** — All configuration via environment variables using `environs`. Includes Keycloak auth settings, API endpoints, GCP settings, and job partitioning (JOB_COMPLETION_INDEX/COUNT). Can load from a custom env file via `CDIP_SDK_ENVFILE`. | ||
| - **`schemas/__init__.py`** — Re-exports all schemas from `gundi_core.schemas` for backward compatibility. | ||
| - **`routing.py`** — Pub/Sub topic name definitions for the observation processing pipeline. | ||
| - **`cloudstorage.py`** — Abstract `CloudStorage` with `GoogleCloudStorage` and `LocalStorage` implementations for camera trap image handling. |
There was a problem hiding this comment.
CLAUDE.md references a GoogleCloudStorage implementation, but the current class in cdip_connector/core/cloudstorage.py is named GoogleCouldStorage (note the missing “l”). This mismatch can mislead contributors; either update the doc to match the actual symbol name or rename the class for consistency.
| - **`cloudstorage.py`** — Abstract `CloudStorage` with `GoogleCloudStorage` and `LocalStorage` implementations for camera trap image handling. | |
| - **`cloudstorage.py`** — Abstract `CloudStorage` with `GoogleCouldStorage` and `LocalStorage` implementations for camera trap image handling. |
| "opentelemetry-propagator-gcp>=1.3.0,<2", | ||
| "opentelemetry-instrumentation-requests>=0.35b0,<1", | ||
| "opentelemetry-exporter-otlp-proto-http>=1.14.0,<2", | ||
| "opentelemetry-instrumentation-httpx>=0.35b0,<1", |
There was a problem hiding this comment.
cdip_connector imports and uses httpx (e.g., AsyncClient in connector_base.py), but httpx is not declared in [project].dependencies. Relying on it as a transitive dependency (e.g., via gundi-client) can break installs if upstream changes; add an explicit httpx requirement here.
| "opentelemetry-instrumentation-httpx>=0.35b0,<1", | |
| "opentelemetry-instrumentation-httpx>=0.35b0,<1", | |
| "httpx>=0.23.0,<1", |
This pull request modernizes the project by migrating from Poetry to PEP 621 (
pyproject.toml) metadata and Hatch for builds, introduces a GitHub Actions CI workflow, and makes the core connector base class easier to test by allowing dependency injection. It also adds a comprehensiveCLAUDE.mdguide for contributors using Claude Code.Build system and dependency management modernization:
pyproject.toml, using Hatch as the build backend and updating dependency and dev dependency declarations accordingly. This simplifies dependency management and aligns with modern Python packaging standards.Continuous integration improvements:
.github/workflows/ci.yml) to automate dependency installation, run tests, and ensure CI coverage using theuvpackage manager.Developer experience and documentation:
CLAUDE.md, a detailed guide for using Claude Code with this repository, covering project architecture, build/dev instructions, and connector patterns.Codebase maintainability and testability:
AbstractConnectorincdip_connector/core/connector_base.pyto accept an optionalPortalApiinstance, enabling easier dependency injection for testing and subclassing.PortalApiinstance is reused, improving consistency and testability.Minor improvements:
Optionalincdip_connector/core/connector_base.py.