Skip to content

πŸ”’ Harden DI container string type resolution: remove type builtin, consistent error handling#238

Open
Copilot wants to merge 26 commits intofix-eval-vulnerability-di-container-15906456217921985240from
copilot/sub-pr-229
Open

πŸ”’ Harden DI container string type resolution: remove type builtin, consistent error handling#238
Copilot wants to merge 26 commits intofix-eval-vulnerability-di-container-15906456217921985240from
copilot/sub-pr-229

Conversation

Copy link
Contributor

Copilot AI commented Mar 17, 2026

The _safe_eval_type method had two issues: type was included in safe_builtins (unnecessary and exploitable via type('X', (object,), {}) with ast.Call allowed), and error handling was inconsistent β€” SyntaxError returned None while TypeError from TypeValidator propagated up.

Changes

  • src/codeweaver/core/di/container.py

    • Removed type from safe_builtins; added inline comment explaining the exclusion
    • TypeValidator now raises TypeError (per TRY004); wrapped in try/except TypeError: return None β€” method now consistently returns None for all invalid inputs
    • Updated return type annotation to Any | None and docstring to match
  • tests/di/test_container_security.py

    • Added from __future__ import annotations, pytestmark, and per-function docstrings to match project conventions
    • Added test_type_builtin_not_exploitable verifying type('X', (object,), {}) is blocked
  • Pre-existing lint fixes (required to unblock push after rebase onto main)

    • node_type_parser.py: assert adapter is not None β†’ if adapter is None: return False
    • providers/config/clients/agent.py: removed stale ty: ignore[invalid-method-override]
    • providers/config/clients/multi.py: removed unused # type: ignore from else branches
    • scripts/model_data/mteb_to_codeweaver.py: added # ty: ignore[invalid-type-form] to Literal[Provider.X] members (ty doesn't recognize BaseEnum subclass members as valid Literal args)

πŸ“ Connect Copilot coding agent with Jira, Azure Boards or Linear to delegate work to Copilot in one click without leaving your project management tool.

bashandbone and others added 25 commits March 17, 2026 01:22
…ly (#227)

* Refactor: Remove bare `except Exception:` suppressing errors silently

Modifies `_resolve_indexer_from_container` in `src/codeweaver/server/agent_api/search/__init__.py` to capture exceptions as `e` and log them using `logger.warning`. This improves observability and code health by preventing silent failures without changing the return logic.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>

---------

Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The `combined_lifespan` assignment and associated comment in `src/codeweaver/server/lifespan.py` were unused dead code that provided a deprecated alias. Removing this improves code health and maintainability by keeping the module surface smaller and less confusing. All tests pass with no regressions.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
…imports and sys.path (#214)

* Initial plan

* Fix mteb_to_codeweaver.py: add KNOWN_ALIASES, fix init block, fix imports and sys.path

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>

* fix: .gitignore

---------

Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: Adam Poulemanos <adam@knit.li>
Updates test_error_recovery.py and test_health_monitoring.py to align
with the BaseEnum interface by using `.variable` on CircuitBreakerState
enum entries instead of `.value` or string literals. Also removed
the `# FIX:` comments regarding this change.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* perf: use tuple instead of list for enum membership check

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* fix(di): correctly resolve Union[None] type to None

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* fix(di): properly handle NoneType inside Union checking

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* perf: optimize membership check in _check_profile by using set instead of list

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* ci: add Copilot to allowed_bots in claude-code-action

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>

---------

Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* test(core): add unit tests for DiscoveredFile.absolute_path

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* fix(ci): add allowed_bots and allowed_non_write_users to PR review comment job

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* fix(ci): add allowed_bots and allowed_non_write_users to PR review comment job

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* fix(ci): add allowed_bots to Issue Assigned job

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* fix(ci): add allowed bots to all jobs

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* fix: handle missing optional imports safely in Python 3.14t and 3.13t

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* fix: correctly handle ImportError in Python 3.14t for `uuid_extensions`

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

---------

Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* perf: Optimize membership check for HTML block tags by using set literal

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* fix(di): allow type(None) in `_get_signature_and_hints` on Python 3.13

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* fix: restore duckduckgo types and handle telemetry safely

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* perf: Optimize membership check for HTML block tags by using set literal

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>

---------

Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* tests: fix mock_provider_lazy configuration in client factory integration tests

Added missing `__name__` and `return_value` assignments to `mock_provider_lazy` in Qdrant provider integration tests. This ensures the mocks are consistent with how they are used by the provider factory and matches the pattern used for other providers in the same test file.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* refactor: consolidate mock_provider_lazy setup and improve comments

- Implemented `make_lazy_provider_mock` helper to centralize mock configuration.
- Replaced brittle line-number comments with behavioral descriptions.
- Updated all Voyage and Qdrant provider tests to use the new helper.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Changed list membership check to a set to leverage Python's constant-folded frozenset optimization.
This provides O(1) lookup time instead of O(n) and avoids list creation at runtime.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* test: add coverage for force shutdown conditional branches

Added comprehensive unit tests for `_setup_signal_handler` in `src/codeweaver/main.py`
to cover conditional logic within the `force_shutdown_handler` closure.
The tests verify correct behavior for initial and subsequent SIGINT interruptions,
as well as standard error handling upon signal registration failure.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* test: add coverage for force shutdown conditional branches

Added comprehensive unit tests for `_setup_signal_handler` in `src/codeweaver/main.py`
to cover conditional logic within the `force_shutdown_handler` closure.
The tests verify correct behavior for initial and subsequent SIGINT interruptions,
as well as standard error handling upon signal registration failure.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* test: add coverage for force shutdown conditional branches

Added comprehensive unit tests for `_setup_signal_handler` in `src/codeweaver/main.py`
to cover conditional logic within the `force_shutdown_handler` closure.
The tests verify correct behavior for initial and subsequent SIGINT interruptions,
as well as standard error handling upon signal registration failure.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* test: add coverage for force shutdown conditional branches

Added comprehensive unit tests for `_setup_signal_handler` in `src/codeweaver/main.py`
to cover conditional logic within the `force_shutdown_handler` closure.
The tests verify correct behavior for initial and subsequent SIGINT interruptions,
as well as standard error handling upon signal registration failure.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>

* test: add coverage for force shutdown conditional branches

Added comprehensive unit tests for `_setup_signal_handler` in `src/codeweaver/main.py`
to cover conditional logic within the `force_shutdown_handler` closure.
The tests verify correct behavior for initial and subsequent SIGINT interruptions,
as well as standard error handling upon signal registration failure.

Also added Copilot to allowed_bots and allowed_non_write_users in `.github/workflows/claude.yml`
to prevent CI failures.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* test: add coverage for force shutdown conditional branches

Added comprehensive unit tests for `_setup_signal_handler` in `src/codeweaver/main.py`
to cover conditional logic within the `force_shutdown_handler` closure.
The tests verify correct behavior for initial and subsequent SIGINT interruptions,
as well as standard error handling upon signal registration failure.

Also fixed Copilot permissions in `.github/workflows/claude.yml` to resolve CI failure.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* test: add coverage for force shutdown conditional branches

Added comprehensive unit tests for `_setup_signal_handler` in `src/codeweaver/main.py`
to cover conditional logic within the `force_shutdown_handler` closure.
The tests verify correct behavior for initial and subsequent SIGINT interruptions,
as well as standard error handling upon signal registration failure.

Also fixed Copilot permissions in `.github/workflows/claude.yml` to resolve CI failure.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* test: add coverage for force shutdown conditional branches

Added comprehensive unit tests for `_setup_signal_handler` in `src/codeweaver/main.py`
to cover conditional logic within the `force_shutdown_handler` closure.
The tests verify correct behavior for initial and subsequent SIGINT interruptions,
as well as standard error handling upon signal registration failure.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

---------

Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* test: Add tests for `get_version` function

Added comprehensive unit tests for `get_version` in `tests/unit/test_init.py`. The tests verify that all mechanisms to retrieve the package version function as intended:
- Reading `codeweaver._version.__version__`
- Using `importlib.metadata.version`
- Falling back to `git describe`
- Handling missing `git` command
- Handling broader unexpected exceptions
All fallback scenarios now properly ensure the code returns the default `0.0.0` string or the valid version.

No functional code was changed, only new tests were introduced.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>

* test: Add tests for `get_version` function

Added unit tests for `get_version` in `tests/unit/test_init.py`. The tests verify that all mechanisms to retrieve the package version function as intended:
- Reading `codeweaver._version.__version__`
- Using `importlib.metadata.version`
- Falling back to `git describe`
- Handling missing `git` command
- Handling broader unexpected exceptions

Uses `monkeypatch` safely by keeping original references to avoid polluting the global namespace and Pytest internal packages. No functional code changes.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* test: Add tests for `get_version` function

Added comprehensive unit tests for `get_version` in `tests/unit/test_init.py`. The tests verify that all mechanisms to retrieve the package version function as intended:
- Reading `codeweaver._version.__version__`
- Using `importlib.metadata.version`
- Falling back to `git describe`
- Handling missing `git` command
- Handling broader unexpected exceptions

Uses `monkeypatch` safely by keeping original references to avoid polluting the global namespace and Pytest internal packages. No functional code changes.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* test: Add tests for `get_version` function

Added comprehensive unit tests for `get_version` in `tests/unit/test_init.py`. The tests verify that all mechanisms to retrieve the package version function as intended:
- Reading `codeweaver._version.__version__`
- Using `importlib.metadata.version`
- Falling back to `git describe`
- Handling missing `git` command
- Handling broader unexpected exceptions

Uses `monkeypatch` safely by keeping original references to avoid polluting the global namespace and Pytest internal packages. No functional code changes.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* test: add robust coverage for get_version fallbacks

Adds comprehensive tests for `codeweaver.get_version()` in `tests/unit/test_init.py`.
Properly mocks the module import mechanisms, importlib.metadata, and git subprocess calls
to ensure all fallback paths (file -> metadata -> git -> default) are tested.
Addresses PR feedback by isolating the `_original_version` scope and adding explanatory
comments about module reloading requirements during tests. Removes `test_investigate.py`.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* fix: spdx information

---------

Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* πŸ”’ Replace insecure pickle with JSON for node types cache

Mitigate insecure deserialization vulnerability by switching the
tree-sitter node types cache from pickle to JSON.

- Updated scripts/build/preprocess-node-types.py to serialize cache to JSON.
- Updated src/codeweaver/semantic/node_type_parser.py to load and validate
  the JSON cache using pydantic.TypeAdapter.
- Corrected build and CI artifact paths in mise.dev.toml.
- Updated documentation in src/codeweaver/semantic/data/__init__.py.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* πŸ”’ Switch node types cache to JSON and fix CI issues

- Switch tree-sitter node types cache from pickle to JSON to mitigate
  insecure deserialization (S301).
- Use Pydantic's TypeAdapter for safe validation and loading.
- Fix CI failures on Python 3.14 by using the project's internal uuid7
  utility instead of uuid_extensions.
- Clean up classification_result during cache loading to ensure fresh
  recomputation.
- Correct artifact paths in mise.dev.toml and documentation.

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* Potential fix for pull request finding 'Unused import'

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>

* Fix JSON node types cache: commit missing file and repair union deserialization (#235)

* Initial plan

* Fix JSON cache loading: resolve connection union, fix type errors, extract helper methods, add cache file

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* Address code review: clarify noqa comment, improve circular-import doc, simplify adapter guard

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>

---------

Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…, consistent None returns, test improvements

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
…, consistent None returns, test improvements

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
…, consistent None returns, test improvements, fix pre-existing lint issues

Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Copilot AI changed the title [WIP] [1590645] Fix dangerous eval() in DI container resolution πŸ”’ Harden DI container string type resolution: remove type builtin, consistent error handling Mar 17, 2026
Copilot AI requested a review from bashandbone March 17, 2026 01:25
@bashandbone bashandbone marked this pull request as ready for review March 17, 2026 02:42
Copilot AI review requested due to automatic review settings March 17, 2026 02:42
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR primarily hardens CodeWeaver’s DI container string-type resolution to reduce the eval attack surface and make invalid-type handling consistent, while also including a semantic node-types cache migration (pickle β†’ JSON), dependency bumps, and various lint/test cleanups.

Changes:

  • Hardened _safe_eval_type in the DI container (remove type builtin, unify invalid-input behavior to return None, and handle NoneType resolution).
  • Migrated semantic node-types cache loading/generation from pickle to JSON with Pydantic validation.
  • Updated dependencies/locks and applied assorted test/lint refactors and reliability tweaks across providers, scripts, and tests.

Reviewed changes

Copilot reviewed 44 out of 51 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
uv.lock Updates locked dependency versions (multiple provider/client libs).
pyproject.toml Updates runtime dependency constraints (e.g., cyclopts, rich, openai, qdrant-client, huggingface-hub).
mise.toml Adds hk tool version to dev toolchain.
mise.dev.toml Updates build artifact checks/add steps for JSON node-types cache path.
hk.pkl Raises allowed max size for added files (supports large generated artifacts).
.gitignore Adjusts ignored directories/files (exportify config handling).
.github/workflows/claude.yml Expands allowed bot/non-write users for the Claude workflow.
src/codeweaver/core/di/container.py Hardens type-string eval and normalizes invalid type handling; adds NoneType resolution.
tests/di/test_container_security.py Aligns with unit-test conventions and adds regression test blocking type(...) exploitation.
src/codeweaver/semantic/node_type_parser.py Switches node-types cache loader from pickle to JSON + TypeAdapter validation and reconstruction.
src/codeweaver/semantic/data/init.py Updates package docs to reference JSON cache artifact.
src/codeweaver/semantic/data/node_types_cache.json New/updated generated node-types JSON cache artifact.
src/codeweaver/semantic/data/node_types_cache.json.license Adds SPDX license sidecar for generated JSON cache.
scripts/build/preprocess-node-types.py Generates JSON cache instead of pickle during build preprocessing.
src/codeweaver/semantic/ast_grep.py Fixes has_package usage to align with its boolean contract.
src/codeweaver/providers/vector_stores/inmemory.py Ensures persistence parent directory exists before creating Qdrant client.
src/codeweaver/providers/reranking/providers/base.py Minor style/robustness tweak (tuple membership check).
src/codeweaver/providers/embedding/capabilities/base.py Fixes has_package boolean usage.
src/codeweaver/providers/config/providers.py Fixes has_package boolean usage for default provider selection.
src/codeweaver/providers/config/profiles.py Adjusts default data-provider selection based on ddgs availability.
src/codeweaver/providers/config/clients/multi.py Improves optional imports and renames FastEmbed ONNX providers field with aliasing.
src/codeweaver/providers/config/clients/agent.py Removes stale type-ignore and fixes has_package boolean usage in TYPE_CHECKING block.
src/codeweaver/providers/config/backup_models.py Fixes fastembed/sentence-transformers availability checks to use boolean has_package.
src/codeweaver/engine/chunker/delimiters/custom.py Replaces inline HTML block tag list with a constant frozenset.
src/codeweaver/core/utils/generation.py Refactors UUID7 generator import behavior (but introduces a regression in timestamp parsing).
src/codeweaver/core/types/service_cards.py Updates DuckDuckGo tool import path and related comment.
src/codeweaver/core/statistics.py Refines profile comparison logic (avoids list membership).
src/codeweaver/cli/commands/index.py Uses set membership for confirmation checks.
src/codeweaver/server/agent_api/search/init.py Improves warning logs with exception context (exc_info=True).
src/codeweaver/server/lifespan.py Removes deprecated backward-compatibility alias.
tests/unit/test_main.py Adds new unit tests for signal handler behavior (needs unit-test lint alignment).
tests/unit/test_init.py Adds new unit tests for version resolution logic (needs unit-test lint alignment).
tests/unit/core/test_discovery.py Adds unit tests for DiscoveredFile path resolution behavior.
tests/unit/core/telemetry/test_privacy.py Marks test class with pytest.mark.unit.
tests/unit/config/test_versioned_profile.py Adds pytest.mark.unit to test classes.
tests/unit/cli/test_httpx_lazy_import.py Adds additional pytest markers to the test class.
tests/unit/core/types/test_embedding_batch_info_intent.py Switches UUID7 generator import to internal utility.
tests/unit/core/types/test_chunk_embeddings_properties.py Switches UUID7 generator import to internal utility.
tests/integration/conftest.py Fixes has_package boolean usage for feature flags.
tests/integration/chunker/config/test_client_factory_integration.py Adds helper for lazy provider mocks to reduce duplication.
tests/integration/server/test_health_monitoring.py Uses CircuitBreakerState.*.variable instead of hardcoded strings.
tests/integration/server/test_error_recovery.py Uses CircuitBreakerState.*.variable instead of .value.
tests/integration/providers/env_registry/test_definitions.py Adds pytest.mark.integration to multiple test classes.
scripts/model_data/mteb_to_codeweaver.py Refactors script imports/types and path setup (but currently imports Provider from a non-existent module).
scripts/model_data/secondary_providers.json Adds model lists for secondary providers.
scripts/model_data/secondary_providers.json.license Adds SPDX license for secondary providers JSON.
scripts/model_data/hf-models.json Updates HF model metadata cache (but currently malformed JSON).
scripts/model_data/hf-models.json.license Adds SPDX license for hf-models.json.
scripts/build/generate-mcp-server-json.py Minor import ordering adjustment.
docs/plans/2026-02-24-provider-config-consolidation.md Updates has_package examples to boolean usage.

πŸ’‘ Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

]
}
},
"models": {},
Comment on lines +3928 to +3930
{
"models": {}
} No newline at end of file
Comment on lines +6 to +16
import signal

from unittest.mock import patch

import pytest

from codeweaver.main import _setup_signal_handler


def test_setup_signal_handler_first_interrupt():
"""Test that the first interrupt raises KeyboardInterrupt."""
Comment on lines +1 to +20
# SPDX-FileCopyrightText: 2026 Knitli Inc.
#
# SPDX-License-Identifier: MIT OR Apache-2.0

import importlib.metadata
import shutil
import subprocess
import sys

from unittest.mock import MagicMock

import pytest

import codeweaver


@pytest.fixture
def mock_no_version_file(monkeypatch):
"""Remove _version module from sys.modules and block re-import from disk.


from codeweaver.core import Provider
from codeweaver.providers import PartialCapabilities
from codeweaver.providers.provider import Provider
Comment on lines +87 to +90
def _safe_eval_type(self, type_str: str, globalns: dict[str, Any]) -> Any | None:
"""Safely evaluate a type string using AST validation.

Parses the type string into an AST, validates that it contains only safe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. This PR conflates multiple concerns which makes review difficult and rollback risky. The security fix itself is solid, but it should be separated from:

  1. Major dependency bumps (huggingface-hub 0.36β†’1.7, openai 2.17β†’2.28, etc.)
  2. Pickle→JSON cache migration
  3. Pre-existing lint fixes

Each deserves independent review and testing. If a dependency update causes issues, rolling back would also revert the security fix.

Comment on lines 50 to 55
if sys.version_info < (3, 14):
from uuid_extensions import time_ns, uuid_to_datetime

return uuid_to_datetime(uuid) if as_datetime else time_ns(uuid)
from uuid import uuid7

uuid = uuid7(uuid) if isinstance(uuid, str | int) else uuid
return (
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security Hardening Review

I've reviewed PR #238 which hardens the DI container's string type resolution. Here's my assessment:

βœ… Security Improvements (Excellent)

The core security changes in src/codeweaver/core/di/container.py are well-designed and effective:

  1. Removed type builtin (line 150-162): Correctly excludes type from safe_builtins to prevent dynamic class creation via type('X', (object,), {}). The inline comment clearly explains the security rationale.

  2. Consistent error handling: Changed from mixed error handling (SyntaxError β†’ None, TypeError propagated) to uniform TypeError β†’ None returns. This is safer and more predictable.

  3. Updated return type: Properly changed return annotation to Any | None with matching docstring.

  4. Test coverage: test_type_builtin_not_exploitable() (line 98-102) properly validates the exploit is blocked.

  5. Added None type handling (line 644-645): Good defensive addition for edge cases.

⚠️ Lint Fixes Review

Most lint fixes are reasonable improvements:

  • Replacing has_package(x) is not None with has_package(x) is correct (assuming has_package returns bool)
  • Logic fix in backup_models.py:57: Changed and to or - this is correct (FastEmbed available if either package exists)
  • Import guards with try/except in multi.py add safety
  • Using is for enum comparisons in statistics.py is good practice
  • FastEmbedClientOptions refactor to use model_copy(update=...) instead of object.__setattr__ is cleaner

🚨 Critical Issue: Invalid JSON

scripts/model_data/hf-models.json is malformed:

Line 216:  "models": {},
Line 217:  "models": {    ← Duplicate key\!
...
Line 3927: }
Line 3928: {              ← Missing comma, new object instead of continuation
Line 3929:   "models": {}
Line 3930: }

This breaks JSON parsing. The file needs to be fixed - either:

  1. Remove duplicate "models": {} entries
  2. Add proper comma/structure between objects
  3. Regenerate the file if it's auto-generated

Minor Concerns

  1. .gitignore changes (line 232-233): Changed from .gemini/ to .exportify/. Consider if this should be in a separate PR focused on configuration changes.

  2. Workflow changes: Adding jules[bot] and copilot to allowed users/bots - ensure this aligns with security policies for automated PR creation.

  3. hk.pkl: Increased max file size from 10MB to 15MB - ensure this won't allow accidentally committing large artifacts.

Recommendation

Request changes due to the malformed JSON file. The security improvements are solid, but the broken JSON will cause runtime errors and must be fixed before merging.

Once the JSON is fixed, this PR will be ready to merge. The security hardening is well-implemented and the test coverage is appropriate.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security Hardening Review

I've reviewed PR #238 which hardens the DI container's string type resolution. Here's my assessment:

βœ… Security Improvements (Excellent)

The core security changes in src/codeweaver/core/di/container.py are well-designed and effective:

  1. Removed type builtin (line 150-162): Correctly excludes type from safe_builtins to prevent dynamic class creation via type('X', (object,), {}). The inline comment clearly explains the security rationale.

  2. Consistent error handling: Changed from mixed error handling (SyntaxError β†’ None, TypeError propagated) to uniform TypeError β†’ None returns. This is safer and more predictable.

  3. Updated return type: Properly changed return annotation to Any | None with matching docstring.

  4. Test coverage: test_type_builtin_not_exploitable() (line 98-102) properly validates the exploit is blocked.

  5. Added None type handling (line 644-645): Good defensive addition for edge cases.

⚠️ Lint Fixes Review

Most lint fixes are reasonable improvements:

  • Replacing has_package(x) is not None with has_package(x) is correct (assuming has_package returns bool)
  • Logic fix in backup_models.py:57: Changed and to or - this is correct (FastEmbed available if either package exists)
  • Import guards with try/except in multi.py add safety
  • Using is for enum comparisons in statistics.py is good practice
  • FastEmbedClientOptions refactor to use model_copy(update=...) instead of object.__setattr__ is cleaner

🚨 Critical Issue: Invalid JSON

scripts/model_data/hf-models.json is malformed:

Line 216:  "models": {},
Line 217:  "models": {    ← Duplicate key\!
...
Line 3927: }
Line 3928: {              ← Missing comma, new object instead of continuation
Line 3929:   "models": {}
Line 3930: }

This breaks JSON parsing. The file needs to be fixed - either:

  1. Remove duplicate "models": {} entries
  2. Add proper comma/structure between objects
  3. Regenerate the file if it's auto-generated

Minor Concerns

  1. .gitignore changes (line 232-233): Changed from .gemini/ to .exportify/. Consider if this should be in a separate PR focused on configuration changes.

  2. Workflow changes: Adding jules[bot] and copilot to allowed users/bots - ensure this aligns with security policies for automated PR creation.

  3. hk.pkl: Increased max file size from 10MB to 15MB - ensure this won't allow accidentally committing large artifacts.

Recommendation

Requesting changes due to the malformed JSON file. The security improvements are solid, but the broken JSON will cause runtime errors and must be fixed before merging.

Once the JSON is fixed, this PR will be ready to merge. The security hardening is well-implemented and the test coverage is appropriate.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

I've reviewed PR #238 focusing on the security hardening changes to the DI container and the broader codebase modifications. Here's my detailed analysis:

βœ… Security Hardening (Primary Changes)

DI Container (src/codeweaver/core/di/container.py)

The security improvements to _safe_eval_type are excellent:

  1. Removal of type from safe_builtins (lines 148-149)

    • βœ… Correct decision - type is not needed for type annotation resolution
    • βœ… Prevents exploitation via type('X', (object,), {}) with ast.Call nodes
    • βœ… Well-documented inline comment explains the rationale
  2. Consistent error handling (lines 141-144)

    • βœ… Now wraps TypeValidator in try/except TypeError to return None
    • βœ… Matches the existing SyntaxError handling behavior
    • βœ… Method now consistently returns None for all invalid inputs
    • βœ… Updated return type annotation to Any | None (line 87)
  3. Test coverage (tests/di/test_container_security.py)

    • βœ… New test_type_builtin_not_exploitable verifies the fix (lines 98-103)
    • βœ… Added proper module docstring and from __future__ import annotations
    • βœ… Added pytestmark for test categorization

Minor improvement note: The eval() call at line 165 has a noqa: S307 comment, which is appropriate since the security is enforced through AST validation and restricted builtins.

⚠️ Concerns & Questions

1. Large scope - many unrelated changes
This PR includes 51 changed files with multiple concerns mixed together:

  • Security fixes (main focus)
  • Pre-existing lint fixes across multiple files
  • Workflow configuration changes (.github/workflows/claude.yml)
  • Dependency version updates (pyproject.toml)
  • File format changes (node_types_cache.pkl β†’ node_types_cache.json)
  • Multiple lint suppressions added

Recommendation: Consider splitting this into focused PRs for easier review and rollback if needed.

2. hf-models.json malformation (lines 216, 3928-3930)

"models": {},  // Line 216 - duplicate key
"models": {
  // ... 3700+ lines ...
}
{  // Line 3928 - Invalid JSON structure
    "models": {}
}

This file has invalid JSON with:

  • Duplicate "models" key at line 216
  • Orphaned object at the end (lines 3928-3930)

Action required: This will break JSON parsing and needs immediate fix.

3. Type ignore annotations in mteb_to_codeweaver.py (lines 375-381)

type HFModelProviders = Literal[
    Provider.FASTEMBED,  # ty: ignore[invalid-type-form]
    Provider.HUGGINGFACE_INFERENCE,  # ty: ignore[invalid-type-form]
    # ... etc
]

While the comment explains this is because ty doesn't recognize BaseEnum subclass members, consider:

  • Is this a ty bug that should be reported?
  • Could the type definition be restructured to avoid this?

4. FastEmbedClientOptions field aliasing (src/codeweaver/providers/config/clients/multi.py, lines 243-246)
The change from providers to onnx_providers with aliasing is good for clarity, but:

  • Ensure backward compatibility if this is a user-facing API
  • Consider deprecation notice if the old name was public

5. Import guards and type stubs (lines 46-65 in multi.py)
Multiple conditional imports with fallback to Any:

if has_package("fastembed") or has_package("fastembed_gpu"):
    try:
        from fastembed.common.types import OnnxProvider
    except ImportError:
        OnnxProvider = Any  # type: ignore[assignment, misc]

This pattern could hide import errors. Consider logging when falling back to Any.

βœ… Positive Changes

  1. Consistent use of has_package() - Many places changed from has_package(...) is not None to just has_package(...), which is cleaner
  2. Error handling improvements in container.py - More defensive checks for None
  3. Better code organization - Added docstrings and type hints in tests
  4. GitHub Actions updates - Added Jules bot to allowed users/bots

πŸ” Testing Status

CI checks show:

  • βœ… CLA check passed
  • βœ… Socket security passed
  • ⏳ Claude response pending

Recommendations

Before merging:

  1. Fix the hf-models.json syntax errors (critical)
  2. Verify the node_types_cache.json format change doesn't break existing deployments
  3. Run full test suite to ensure all 51 file changes are compatible
  4. Consider splitting unrelated changes into separate PRs

Security assessment: The core security fixes are sound and well-tested. The removal of type from safe builtins is the right call, and the consistent error handling is a good improvement.

Code quality: Generally good, but the PR scope is too broad for optimal review. The security fixes themselves are excellent.

Verdict

The security hardening changes are approved and ready to merge from a security perspective. However, the JSON syntax error must be fixed first, and I recommend addressing the scope concerns for future PRs.

Would you like me to prepare a fix for the hf-models.json issue?

@github-actions
Copy link
Contributor

I've reviewed this PR and I agree with the concern raised by Copilot about the PR scope.

Review Summary

Security Fix (DI Container) βœ…
The core security changes to container.py and test_container_security.py are excellent:

  • Removing type from safe_builtins closes a potential exploit vector (type('X', (object,), {}))
  • Consistent None return handling improves API predictability
  • The new test test_type_builtin_not_exploitable provides good coverage
  • Return type annotation Any | None accurately reflects behavior

Scope Concern ⚠️
However, this PR contains significantly more than the security fix described in the title/description:

  1. Major dependency bumps that could introduce breaking changes:

    • huggingface-hub: 0.36.2 β†’ 1.7.1 (major version bump)
    • openai: 2.17.0 β†’ 2.28.0
    • pydantic-ai-slim: 1.56.0 β†’ 1.68.0
    • cohere, qdrant-client, cyclopts, rich, and others
  2. Format migration: Pickle β†’ JSON for node types cache (in preprocess-node-types.py and node_type_parser.py)

  3. CI/workflow configuration changes to allowed bots/users

  4. Multiple pre-existing lint fixes across various files

Recommendations

  1. Split this PR into separate PRs:

    • Security fix (DI container hardening) - should be merged ASAP
    • Dependency updates - needs separate testing/validation
    • Cache format migration - deserves its own review for rollback strategy
    • Lint fixes - can be batched separately
  2. Update the PR description to comprehensively document all changes if keeping them together

  3. Consider rollback complexity: If any of the dependency updates cause issues in production, rolling back this atomic commit would also revert the security fix.

The security improvements are solid, but the PR scope makes review, testing, and potential rollback more risky than necessary.

@github-actions
Copy link
Contributor

🚨 Critical Issue Found: Invalid JSON in hf-models.json

I've confirmed that scripts/model_data/hf-models.json has invalid JSON syntax that will cause parsing failures:

Issues:

  1. Duplicate models key at line 216 (after provider_to_models block)
  2. Orphaned JSON object at the end of the file (lines 3928-3930)

Validation:

$ python3 -m json.tool scripts/model_data/hf-models.json
# Returns: Invalid JSON

Expected Structure:

{
  "provider_to_models": { ... },
  "models": { ... }
}

Current (Broken) Structure:

{
  "provider_to_models": { ... },
  "models": {},          // ❌ Empty duplicate
  "models": { ... },      // ❌ Duplicate key
}                          // ❌ Extra closing brace
{
  "models": {}            // ❌ Orphaned object
}

This needs to be fixed before merging to avoid runtime failures when this file is parsed.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Security Hardening for DI Container

Summary

This PR addresses important security vulnerabilities in the DI container's type evaluation mechanism and includes necessary test improvements and lint fixes. The changes are well-structured and thoroughly tested.

Key Security Improvements βœ…

1. Removed type builtin from safe_builtins

  • File: src/codeweaver/core/di/container.py:148-149
  • Excellent fix! The type builtin was unnecessary for type resolution and could be exploited via type('X', (object,), {}) to create dynamic classes
  • Clear inline comment explains the exclusion rationale
  • Test coverage added in test_type_builtin_not_exploitable

2. Consistent error handling

  • File: src/codeweaver/core/di/container.py:87,141-144
  • TypeError from TypeValidator is now caught and returns None
  • Previously SyntaxError returned None but TypeError propagated - now consistent
  • Return type properly updated to Any | None
  • Docstring accurately reflects behavior

3. Enhanced test coverage

  • tests/di/test_container_security.py follows project conventions with:
    • Module docstring βœ…
    • from __future__ import annotations βœ…
    • pytestmark for test categorization βœ…
    • Per-function docstrings βœ…
    • Return type annotations βœ…

Issue Found: test_main.py Linting πŸ”΄

File: tests/unit/test_main.py

Unit tests under tests/unit/** are not covered by the tests/*.py Ruff per-file ignores (see ruff.toml:179-197), so they must satisfy stricter linting rules:

Required changes:

  1. Add module docstring
  2. Add from __future__ import annotations
  3. Add return type annotations (-> None) to all test functions

Suggested fix:

# SPDX headers...

"""Unit tests for signal handler setup in main module.

This module tests the signal handler registration and interrupt handling
logic in the CodeWeaver main entry point.
"""

from __future__ import annotations

import signal

from unittest.mock import patch

import pytest

from codeweaver.main import _setup_signal_handler


def test_setup_signal_handler_first_interrupt() -> None:
    """Test that the first interrupt raises KeyboardInterrupt."""
    # ... rest of test

def test_setup_signal_handler_second_interrupt() -> None:
    """Test that the second interrupt exits immediately."""
    # ... rest of test

def test_setup_signal_handler_suppress_errors() -> None:
    """Test that ValueError and OSError are suppressed when setting the signal."""
    # ... rest of test

Pre-existing Lint Fixes βœ…

The PR includes several necessary lint fixes to unblock push after rebase:

  1. node_type_parser.py: Changed assert adapter is not None to proper conditional check
  2. providers/config/clients/agent.py: Removed stale ty:ignore[invalid-method-override]
  3. providers/config/clients/multi.py: Removed unused type ignores from else branches
  4. scripts/model_data/mteb_to_codeweaver.py: Added proper type ignore comments for Literal[Provider.X] members

All these are appropriate fixes that improve code quality.

Additional Observations

Performance Optimizations:

  • src/codeweaver/cli/commands/index.py:128: Changed list to set for membership check (good optimization!)
  • src/codeweaver/engine/chunker/delimiters/custom.py:360: Used frozenset for HTML block tags

Code Quality:

  • Clear comments explaining security decisions
  • Comprehensive test coverage for security-critical code
  • Docstrings properly updated to reflect behavior

Recommendation

Request Changes - The PR needs the test_main.py linting issue fixed before merge. Once that's addressed, this will be ready to merge.

Testing

  • Security tests are comprehensive and well-written
  • Test coverage for the type builtin exclusion is excellent
  • All tests follow project conventions (except test_main.py)

Great work on the security hardening! πŸ”’

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security Fix: Excellent βœ…

The core security improvements are well-implemented:

Primary Security Changes

  • βœ… Removal of type from safe_builtins (line 148-149): Critical fix that prevents type('X', (object,), {}) exploitation. The inline comment clearly explains the rationale.
  • βœ… Consistent error handling (line 141-144): TypeValidator now raises TypeError (wrapped to return None), making all error paths consistent. Good defensive programming.
  • βœ… Test coverage: test_type_builtin_not_exploitable specifically validates the security fix. Well done.

Security Assessment: The vulnerability is properly mitigated. The fix correctly identifies that type() is unnecessary for type annotation resolution and represents an exploitable surface when combined with allowed ast.Call nodes.


Critical Issue: Malformed JSON ❌

// scripts/model_data/hf-models.json line 216
  },
  "models": {},  // ← Duplicate key!
  "models": {

This file has:

  1. Duplicate "models": {} key at line 216
  2. Malformed closing structure (line 3929-3930)

This will break JSON parsing and must be fixed before merge.


Code Quality Observations

Good Changes βœ…

  1. multi.py: Using model_copy(update=...) instead of object.__setattr__ is proper Pydantic practice
  2. multi.py: Proper fallbacks for conditional imports (OnnxProvider, Module β†’ Any)
  3. Pickle β†’ JSON cache migration: Improves security (JSON can't execute code like pickle)
  4. Test formatting: Added proper module conventions for unit tests

Questions ⚠️

  1. Dependency bumps (pyproject.toml): Have these been tested for compatibility? Notable changes:

    • huggingface-hub: 0.36.2 β†’ >=1.7.1 (major version jump)
    • openai: 2.17.0 β†’ 2.28.0
    • pydantic-ai-slim: >=1.56.0 β†’ >=1.68.0
  2. File size limit increase (hk.pkl): 10MB β†’ 15MB - what's driving this need?

  3. Scope: This PR mixes security fixes, cache format migration, lint fixes, and dependency updates. Consider splitting for easier review/rollback.


Recommendation

Request Changes solely due to the JSON syntax error. Once that's fixed, the security improvements are solid and should merge.

The core changes to container.py:87-165 and test_container_security.py:98-103 are excellent defensive security work. πŸ‘

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review - Security Hardening PR

I've reviewed this pull request, focusing on the security improvements and code quality. Here's my detailed assessment:

βœ… Excellent Security Improvements

The core security changes in src/codeweaver/core/di/container.py are well-designed and effective:

  1. Removed type from safe_builtins (container.py:148-149)

    • βœ… Correctly prevents exploitation via type('X', (object,), {})
    • βœ… Well-documented with clear inline comment explaining the security rationale
    • βœ… Not needed for legitimate type annotation resolution
  2. Consistent error handling (container.py:141-144)

    • βœ… Changed from mixed behavior (SyntaxError β†’ None, TypeError propagated) to uniform TypeError β†’ None
    • βœ… Updated return type annotation to Any | None with matching docstring
    • βœ… More predictable and safer behavior
  3. Added NoneType handling (container.py:644-645)

    • βœ… Good defensive addition for edge case resolution
  4. Test coverage (tests/di/test_container_security.py)

    • βœ… New test_type_builtin_not_exploitable validates the exploit is blocked
    • βœ… Added proper pytest markers and module docstring
    • βœ… Follows project testing conventions

🚨 Critical Issues That Must Be Fixed

1. Invalid JSON syntax in hf-models.json

The file has two critical syntax errors:

Line 216-217: Duplicate "models" key

},
"models": {},     ← First declaration (empty)
"models": {       ← Duplicate key - invalid JSON\!
  "Alibaba-NLP/gte-modernbert-base": {

Line 3927-3929: Orphaned object at end of file

  }
}                  ← Should be end of file
{                  ← Missing comma / invalid structure
    "models": {}
}

Impact: This will cause JSON parsing to fail at runtime.

Fix: Remove the duplicate empty "models": {} on line 216 and the orphaned object at the end (lines 3928-3929).


2. Incorrect import path in mteb_to_codeweaver.py

Line 52: Imports from non-existent module

from codeweaver.providers.provider import Provider

Issue: The module codeweaver.providers.provider does not exist.

Correct import: Based on the codebase structure, Provider is defined in codeweaver.core.types.provider and re-exported via codeweaver.core:

from codeweaver.core import Provider

Evidence:

  • Provider class is defined in: src/codeweaver/core/types/provider.py:252
  • It's exported from: src/codeweaver/core/__init__.py (line 487, 960)
  • No file exists at: src/codeweaver/providers/provider.py

Impact: This script will fail with ModuleNotFoundError when executed.


⚠️ Additional Observations

Lint Fixes (Generally Good)

  • βœ… has_package(x) is not None β†’ has_package(x) - correct simplification
  • βœ… backup_models.py:57 changed and to or - correct logic (FastEmbed available if either package exists)
  • βœ… Import guards with try/except in multi.py add robustness
  • βœ… Using is for enum comparisons in statistics.py - best practice
  • βœ… FastEmbedClientOptions refactor to use model_copy(update=...) instead of object.__setattr__ - cleaner

Scope Concerns

This PR includes 51 changed files with multiple unrelated concerns:

  • Security fixes (primary focus) βœ…
  • Pickle β†’ JSON migration for node types cache
  • Dependency version updates
  • Workflow configuration changes
  • Multiple lint fixes across the codebase

While all changes appear reasonable, this broad scope makes review and potential rollback more challenging.

Type Annotations (mteb_to_codeweaver.py:375-381)

Multiple # ty: ignore[invalid-type-form] annotations added for Provider enum members in Literal types. The comment explains ty doesn't recognize BaseEnum subclass members - this is acceptable but consider reporting to the ty project.


πŸ“‹ Recommendations

Before merging:

  1. FIX the malformed JSON in hf-models.json (critical)
  2. FIX the incorrect import in mteb_to_codeweaver.py line 52 (critical)
  3. VERIFY the node_types_cache format migration doesn't break existing deployments
  4. RUN full test suite to ensure compatibility across all 51 changed files

For future PRs:
Consider splitting large multi-concern PRs into focused changes for easier review and safer rollback.


🎯 Verdict

Security improvements: βœ… Excellent - well-designed, properly tested, ready to merge

Blocking issues: ❌ Two critical bugs must be fixed first:

  • Invalid JSON syntax
  • Incorrect module import path

Once these two issues are resolved, this PR will be ready to merge. The security hardening is solid and the test coverage is appropriate.

@github-actions
Copy link
Contributor

Response to Review Comment on test_main.py

@copilot You're absolutely correct about the linting requirements! The tests/unit/** path is not covered by the tests/*.py per-file ignores in ruff.toml (lines 179-197), so unit tests need to follow stricter linting rules.

Required changes for tests/unit/test_main.py:

  1. Add module docstring (D100)
  2. Add from __future__ import annotations
  3. Add return type annotations (-> None) on all test functions (ANN201)

Here's the corrected version that should pass linting:

# SPDX-FileCopyrightText: 2025 Knitli Inc.
# SPDX-FileContributor: Adam Poulemanos <adam@knit.li>
#
# SPDX-License-Identifier: MIT OR Apache-2.0

"""Unit tests for signal handler setup in main module.

This module tests the signal handler registration and interrupt handling
logic in the CodeWeaver main entry point.
"""

from __future__ import annotations

import signal

from unittest.mock import patch

import pytest

from codeweaver.main import _setup_signal_handler


def test_setup_signal_handler_first_interrupt() -> None:
    """Test that the first interrupt raises KeyboardInterrupt."""
    with patch("signal.signal") as mock_signal:
        _setup_signal_handler()

        # Get the registered handler
        mock_signal.assert_called_with(signal.SIGINT, mock_signal.call_args[0][1])
        force_shutdown_handler = mock_signal.call_args[0][1]

        # Test first call raises KeyboardInterrupt
        with pytest.raises(KeyboardInterrupt):
            force_shutdown_handler(signal.SIGINT, None)

def test_setup_signal_handler_second_interrupt() -> None:
    """Test that the second interrupt exits immediately."""
    with patch("signal.signal") as mock_signal:
        _setup_signal_handler()

        force_shutdown_handler = mock_signal.call_args[0][1]

        # First call raises KeyboardInterrupt
        with pytest.raises(KeyboardInterrupt):
            force_shutdown_handler(signal.SIGINT, None)

        # Second call exits with 1
        with patch("sys.exit") as mock_exit, patch("codeweaver.main.logger.warning") as mock_warning:
            force_shutdown_handler(signal.SIGINT, None)

            mock_warning.assert_called_with("Force shutdown requested, exiting immediately...")
            mock_exit.assert_called_with(1)

def test_setup_signal_handler_suppress_errors() -> None:
    """Test that ValueError and OSError are suppressed when setting the signal."""
    # Test ValueError
    with patch("signal.signal", side_effect=ValueError):
        original_handler = _setup_signal_handler()
        assert original_handler is None

    # Test OSError
    with patch("signal.signal", side_effect=OSError):
        original_handler = _setup_signal_handler()
        assert original_handler is None

This should resolve the linting issues and make the tests compliant with the project's unit test standards. Once this is fixed, the PR will be ready to merge! πŸ”’

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security Fix: Excellent βœ…

The core security improvements are well-implemented:

Primary Security Changes

  • βœ… Removal of type from safe_builtins (line 148-149): Critical fix that prevents type('X', (object,), {}) exploitation. The inline comment clearly explains the rationale.
  • βœ… Consistent error handling (line 141-144): TypeValidator now raises TypeError (wrapped to return None), making all error paths consistent. Good defensive programming.
  • βœ… Test coverage: test_type_builtin_not_exploitable specifically validates the security fix. Well done.

Security Assessment: The vulnerability is properly mitigated. The fix correctly identifies that type() is unnecessary for type annotation resolution and represents an exploitable surface when combined with allowed ast.Call nodes.


Critical Issue: Malformed JSON ❌

// scripts/model_data/hf-models.json line 216
  },
  "models": {},  // ← Duplicate key!
  "models": {

This file has:

  1. Duplicate "models": {} key at line 216
  2. Malformed closing structure (line 3929-3930)

This will break JSON parsing and must be fixed before merge.


Code Quality Observations

Good Changes βœ…

  1. multi.py: Using model_copy(update=...) instead of object.__setattr__ is proper Pydantic practice
  2. multi.py: Proper fallbacks for conditional imports (OnnxProvider, Module β†’ Any)
  3. Pickle β†’ JSON cache migration: Improves security (JSON can't execute code like pickle)
  4. Test formatting: Added proper module conventions for unit tests

Questions ⚠️

  1. Dependency bumps (pyproject.toml): Have these been tested for compatibility? Notable changes:

    • huggingface-hub: 0.36.2 β†’ >=1.7.1 (major version jump)
    • openai: 2.17.0 β†’ 2.28.0
    • pydantic-ai-slim: >=1.56.0 β†’ >=1.68.0
  2. File size limit increase (hk.pkl): 10MB β†’ 15MB - what's driving this need?

  3. Scope: This PR mixes security fixes, cache format migration, lint fixes, and dependency updates. Consider splitting for easier review/rollback.


Recommendation

Request Changes solely due to the JSON syntax error. Once that's fixed, the security improvements are solid and should merge.

The core changes to container.py:87-165 and test_container_security.py:98-103 are excellent defensive security work. πŸ‘

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary - Requesting Changes

I've thoroughly reviewed this security hardening PR. While the core security improvements are excellent, there are two critical bugs that must be fixed before merging.


🚨 Critical Issues (Must Fix)

1. Invalid Import Path in scripts/model_data/mteb_to_codeweaver.py

File: scripts/model_data/mteb_to_codeweaver.py:52

Current (incorrect):

from codeweaver.providers.provider import Provider

Issue: The module codeweaver.providers.provider does not exist in the codebase. This script will fail with ModuleNotFoundError when executed.

Required fix:

from codeweaver.core import Provider

Verification:

  • Provider enum is defined in: src/codeweaver/core/types/provider.py:252
  • Re-exported from: src/codeweaver/core/__init__.py (lines 487, 960)
  • No file exists at: src/codeweaver/providers/provider.py

This issue was correctly identified by Copilot in comment #2944055716.


2. Malformed JSON in scripts/model_data/hf-models.json

Issue 1 - Duplicate key (lines 216-217):

},
"models": {},     ← Empty models dict
"models": {       ← Duplicate key - INVALID JSON\!
  "Alibaba-NLP/gte-modernbert-base": {

Issue 2 - Orphaned object (lines 3927-3929):

    ]
  }
}                  ← Should be end of file
{                  ← Invalid structure
    "models": {}
}

Impact: This file will fail JSON parsing at runtime.

Required fix:

  1. Remove the empty "models": {} on line 216
  2. Remove the orphaned object structure at the end (lines 3928-3929)

Verification:

python -m json.tool scripts/model_data/hf-models.json
# Currently fails with JSON syntax error

βœ… Excellent Security Improvements

The primary security changes in src/codeweaver/core/di/container.py are well-designed and properly implemented:

1. Removed type Builtin (lines 148-162)

  • βœ… Correctly prevents exploitation via type('X', (object,), {})
  • βœ… Well-documented with clear security rationale
  • βœ… Not needed for legitimate type annotation resolution

2. Consistent Error Handling (lines 141-144)

  • βœ… Changed from mixed behavior to uniform TypeError β†’ None
  • βœ… Updated return type annotation to Any | None
  • βœ… More predictable and safer

3. Test Coverage (tests/di/test_container_security.py)

  • βœ… New test_type_builtin_not_exploitable validates the fix
  • βœ… Proper pytest markers and documentation
  • βœ… Follows project conventions

4. NoneType Handling (container.py:644-645)

  • βœ… Good defensive addition for edge cases

⚠️ Additional Observations

Positive Changes

  • βœ… Consistent has_package() usage (removed unnecessary is not None)
  • βœ… Correct logic fix in backup_models.py:57 (changed and to or)
  • βœ… Import guards with try/except for robustness
  • βœ… Enum comparisons using is operator (best practice)
  • βœ… Cleaner FastEmbedClientOptions using model_copy()

Scope Note

This PR touches 51 files with multiple concerns:

  • Security hardening (primary)
  • Pickle β†’ JSON cache migration
  • Dependency updates
  • Workflow changes
  • Various lint fixes

While all changes appear reasonable, consider splitting future PRs for easier review and rollback.


πŸ“‹ Action Items

Before merging:

  1. βœ… Security improvements - ready to merge
  2. ❌ Fix import path in mteb_to_codeweaver.py line 52
  3. ❌ Fix JSON syntax in hf-models.json (remove duplicate key and orphaned object)
  4. ⚠️ Verify node_types_cache format migration doesn't break existing deployments
  5. ⚠️ Run full test suite

Verdict

Security hardening: Excellent work - well-designed, properly tested, and ready to merge.

Blocking issues: Two critical bugs prevent merging. Once the import path and JSON syntax are fixed, this PR will be ready to approve.

The security improvements are solid and address the vulnerability appropriately. Great job on the implementation and test coverage!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants