-
Notifications
You must be signed in to change notification settings - Fork 108
Open
Description
Problem Statement
Currently, each package manager (crates, pkgx, homebrew) implements its own diff_* method with significant code duplication. While the core diffing algorithm is identical across all package managers, each implementation duplicates the same priority-based deduplication logic, set operations, and dependency object creation.
Current State
Duplicated Logic Across Package Managers:
- Priority-based deduplication: All PMs handle conflicts when the same dependency appears with multiple types (runtime/build/test) by choosing the highest priority type
- Set-based diffing: All use actual - existing and existing - actual to calculate new/removed dependencies
- Cache interaction: All query the same cache structure (self.caches.dependencies.get(pkg_id, set()))
- Object creation: All create LegacyDependency objects with identical fields
- Error handling: All have similar error handling for missing dependencies in cache
Package Manager Differences:
The only differences between implementations are in the data extraction phase:
- Crates: Extracts dependencies from pkg.latest_version.dependencies
- Pkgx: Extracts from pkg.dependencies, pkg.build.dependencies, pkg.test.dependencies
- Homebrew: Extracts from JSON structure with build/runtime/test arrays
Solution: A Data Normalization Layer
Solving one problem generally requires moving it somewhere else. So, let's create an intermediate normalization step that converts package manager-specific data into a common format, allowing a single, shared diff implementation.
# 1. Common intermediate representation
@dataclass
class ParsedDependency:
name: str
dependency_type: DependencyType # Enum: RUNTIME, BUILD, TEST
@dataclass
class NormalizedPackage:
identifier: str # import_id, crate_id, etc.
dependencies: list[ParsedDependency]
# 2. Package manager-specific normalizers
def normalize_pkgx_package(pkg: PkgxPackage) -> NormalizedPackage
def normalize_crates_package(pkg: Crate) -> NormalizedPackage
def normalize_homebrew_package(pkg: dict) -> NormalizedPackage
# 3. Single, shared diff implementation
def diff_deps(
normalized_pkg: NormalizedPackage,
cache: Cache,
config: Config
) -> tuple[list[LegacyDependency], list[LegacyDependency]]PkgxPackage → normalize_pkgx() → NormalizedPackage → diff_deps() → results
Crate → normalize_crates() → NormalizedPackage → diff_deps() → results
HomebrewData → normalize_homebrew() → NormalizedPackage → diff_deps() → resultsBenefits
- Code Deduplication
- ~150 lines of identical logic consolidated into single implementation
- Priority handling logic defined once, used everywhere
- Set operations and cache interaction logic shared
- Consistency
- Identical behavior across all package managers
- Same priority handling: Runtime > Build > Test
- Consistent error messages and logging
- Maintainability
- Bug fixes apply to all package managers simultaneously
- Core algorithm changes in one place
- Package manager complexities isolated to normalizers
- Testability
- Core diff logic tested independently of package manager quirks
- Normalizers can be unit tested in isolation
- Easier to test edge cases once vs. three times
- Extensibility
- Adding new package managers requires only implementing a normalizer
- Core diff logic doesn't need to change
- Clear contract for what new package managers need to provide
Implementation Plan
Should ideally do it per diff method, but this for Diff Dependency, for example:
- Create Common Infrastructure
- Define ParsedDependency and NormalizedPackage data classes
- Create DependencyType enum (RUNTIME, BUILD, TEST)
- Implement shared diff_deps() function in core/
- Add comprehensive unit tests for shared logic
- Implement Normalizers
- Create normalize_crates_package() in package_managers/crates/
- Create normalize_pkgx_package() in package_managers/pkgx/
- Create normalize_homebrew_package() in package_managers/homebrew/
- Unit test each normalizer
- Refactor Package Managers
- Update CratesDiff.diff_deps() to use normalized approach
- Update PkgxDiff.diff_deps() to use normalized approach
- Update HomebrewDiff.diff_deps() to use normalized approach
- Ensure existing integration tests still pass
- Cleanup
- Remove old diff_deps implementations
- Update documentation
- Add integration tests for the new flow
Metadata
Metadata
Assignees
Labels
No labels