Skip to content

Standardize Dependency Diffing Logic Across Package Managers #126

@sanchitram1

Description

@sanchitram1

Problem Statement

Currently, each package manager (crates, pkgx, homebrew) implements its own diff_* method with significant code duplication. While the core diffing algorithm is identical across all package managers, each implementation duplicates the same priority-based deduplication logic, set operations, and dependency object creation.

Current State

Duplicated Logic Across Package Managers:

  • Priority-based deduplication: All PMs handle conflicts when the same dependency appears with multiple types (runtime/build/test) by choosing the highest priority type
  • Set-based diffing: All use actual - existing and existing - actual to calculate new/removed dependencies
  • Cache interaction: All query the same cache structure (self.caches.dependencies.get(pkg_id, set()))
  • Object creation: All create LegacyDependency objects with identical fields
  • Error handling: All have similar error handling for missing dependencies in cache

Package Manager Differences:

The only differences between implementations are in the data extraction phase:

  • Crates: Extracts dependencies from pkg.latest_version.dependencies
  • Pkgx: Extracts from pkg.dependencies, pkg.build.dependencies, pkg.test.dependencies
  • Homebrew: Extracts from JSON structure with build/runtime/test arrays

Solution: A Data Normalization Layer

Solving one problem generally requires moving it somewhere else. So, let's create an intermediate normalization step that converts package manager-specific data into a common format, allowing a single, shared diff implementation.

# 1. Common intermediate representation
@dataclass
class ParsedDependency:
    name: str
    dependency_type: DependencyType  # Enum: RUNTIME, BUILD, TEST

@dataclass 
class NormalizedPackage:
    identifier: str  # import_id, crate_id, etc.
    dependencies: list[ParsedDependency]

# 2. Package manager-specific normalizers
def normalize_pkgx_package(pkg: PkgxPackage) -> NormalizedPackage
def normalize_crates_package(pkg: Crate) -> NormalizedPackage  
def normalize_homebrew_package(pkg: dict) -> NormalizedPackage

# 3. Single, shared diff implementation
def diff_deps(
    normalized_pkg: NormalizedPackage, 
    cache: Cache, 
    config: Config
) -> tuple[list[LegacyDependency], list[LegacyDependency]]
PkgxPackage → normalize_pkgx() → NormalizedPackage → diff_deps() → results
Crate → normalize_crates() → NormalizedPackage → diff_deps() → results  
HomebrewData → normalize_homebrew() → NormalizedPackage → diff_deps() → results

Benefits

  1. Code Deduplication
  • ~150 lines of identical logic consolidated into single implementation
  • Priority handling logic defined once, used everywhere
  • Set operations and cache interaction logic shared
  1. Consistency
  • Identical behavior across all package managers
  • Same priority handling: Runtime > Build > Test
  • Consistent error messages and logging
  1. Maintainability
  • Bug fixes apply to all package managers simultaneously
  • Core algorithm changes in one place
  • Package manager complexities isolated to normalizers
  1. Testability
  • Core diff logic tested independently of package manager quirks
  • Normalizers can be unit tested in isolation
  • Easier to test edge cases once vs. three times
  1. Extensibility
  • Adding new package managers requires only implementing a normalizer
  • Core diff logic doesn't need to change
  • Clear contract for what new package managers need to provide

Implementation Plan

Should ideally do it per diff method, but this for Diff Dependency, for example:

  1. Create Common Infrastructure
  • Define ParsedDependency and NormalizedPackage data classes
  • Create DependencyType enum (RUNTIME, BUILD, TEST)
  • Implement shared diff_deps() function in core/
  • Add comprehensive unit tests for shared logic
  1. Implement Normalizers
  • Create normalize_crates_package() in package_managers/crates/
  • Create normalize_pkgx_package() in package_managers/pkgx/
  • Create normalize_homebrew_package() in package_managers/homebrew/
  • Unit test each normalizer
  1. Refactor Package Managers
  • Update CratesDiff.diff_deps() to use normalized approach
  • Update PkgxDiff.diff_deps() to use normalized approach
  • Update HomebrewDiff.diff_deps() to use normalized approach
  • Ensure existing integration tests still pass
  1. Cleanup
  • Remove old diff_deps implementations
  • Update documentation
  • Add integration tests for the new flow

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions