AGENTS.md

This file provides guidance to AI assistants like Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Diffly is a utility package for comparing Polars DataFrames and LazyFrames. It reports the differences between two datasets in detail, including row-level mismatches, missing rows, and column value changes.

Package Management

This repository uses the pixi package manager. Full documentation: https://pixi.prefix.dev/latest/llms-full.txt

All project commands must be run through pixi run (for example, pixi run test). After changing pixi.toml, run pixi lock to update the lockfile.

Development Commands

pixi run test                    # Run all unit tests
pixi run test-coverage           # Run tests with coverage report
pixi run pre-commit-run          # Run all pre-commit hooks (formatting, linting, type checks)
pixi run docs                    # Build Sphinx documentation
pixi run pytest tests/test_equal.py::test_equal  # Run a specific test

Required Before Each Commit

  1. Run pixi run test to ensure all tests pass
  2. Run pixi run pre-commit-run to format code and generate summary fixtures if needed

Architecture

Core Modules

  • comparison.py: Main logic with compare_frames() entry point and DataFrameComparison class
  • summary.py: Summary class for rich-formatted comparison reports
  • testing.py: assert_collection_equal() and assert_frame_equal() utilities
  • cli.py: Typer-based CLI for comparing parquet files
  • _conditions.py: Polars expressions for type-aware equality checks (floats with tolerance, temporal types)
  • _utils.py: Helper functions and tolerance defaults (ABS_TOL_DEFAULT=1e-08, REL_TOL_DEFAULT=1e-05)
  • _cache.py: @cached_method decorator for caching comparison results
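
As an illustration of the caching idea behind _cache.py, here is a minimal sketch of a method-caching decorator. It is not diffly's implementation: the storage attribute, the restriction to zero-argument methods, and the Comparison stand-in class are all assumptions made for this example.

```python
import functools


def cached_method(method):
    """Cache a zero-argument method's result on the instance (illustrative only)."""
    attr = f"_cached_{method.__name__}"  # hypothetical storage attribute

    @functools.wraps(method)
    def wrapper(self):
        # Compute once, then reuse the stored value on later calls.
        if not hasattr(self, attr):
            setattr(self, attr, method(self))
        return getattr(self, attr)

    return wrapper


class Comparison:  # hypothetical stand-in, not diffly's DataFrameComparison
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.calls = 0

    @cached_method
    def equal(self):
        self.calls += 1  # counts how often the body actually runs
        return self.left == self.right


comparison = Comparison([1, 2], [1, 2])
first, second = comparison.equal(), comparison.equal()
```

Calling equal() twice runs the method body only once; the second call returns the stored value.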

Key Design Patterns

  • Lazy Evaluation: Uses Polars LazyFrames internally for efficiency
  • Method Caching: Comparison results are cached (see @cached_method) to avoid recomputation
  • Per-Column Tolerances: A tolerance can be a single scalar applied to all columns or a dict[str, float | timedelta] keyed by column name
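
The per-column tolerance pattern can be sketched in plain Python as follows. Two things here are assumptions rather than documented diffly behavior: that a column missing from the mapping falls back to the default, and that closeness follows the numpy.isclose convention |left - right| <= abs_tol + rel_tol * |right|. The helper names resolve_tolerance and is_close are hypothetical.

```python
from collections.abc import Mapping

ABS_TOL_DEFAULT = 1e-08  # default quoted in this file
REL_TOL_DEFAULT = 1e-05  # default quoted in this file


def resolve_tolerance(tol, column, default):
    """Return the tolerance for one column (hypothetical helper).

    A mapping is looked up by column name, falling back to `default`
    (assumed behavior); a scalar applies to every column.
    """
    if isinstance(tol, Mapping):
        return tol.get(column, default)
    return tol


def is_close(left, right, abs_tol=ABS_TOL_DEFAULT, rel_tol=REL_TOL_DEFAULT):
    """numpy.isclose-style check; the convention diffly uses may differ."""
    return abs(left - right) <= abs_tol + rel_tol * abs(right)


abs_tol = {"price": 0.01}  # per-column mapping; "quantity" is not listed
price_tol = resolve_tolerance(abs_tol, "price", ABS_TOL_DEFAULT)
qty_tol = resolve_tolerance(abs_tol, "quantity", ABS_TOL_DEFAULT)

price_close = is_close(10.004, 10.0, abs_tol=price_tol)  # within 0.01
qty_close = is_close(10.004, 10.0, abs_tol=qty_tol)      # outside the default
```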

Public API

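import polars as pl
from diffly import compare_frames

# Small example frames that share the primary key column "id"
left = pl.DataFrame({"id": [1, 2], "value": [1.0, 2.0]})
right = pl.DataFrame({"id": [1, 2], "value": [1.0, 2.5]})

comparison = compare_frames(left, right, primary_key="id", abs_tol=1e-08, rel_tol=1e-05)
comparison.equal()           # Check if frames are equal
comparison.fraction_same()   # Match rates per column
comparison.summary()         # Rich-formatted report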

Code Standards

Style

  • Minimal comments - only where code is non-obvious
  • Commit titles follow Conventional Commits, with the first letter of the description capitalized, e.g. feat: Add new feature
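
The commit title convention can be checked with a short pattern. This is a rough sketch and not a hook shipped with this repository: the regular expression accepts any lowercase type, which is an assumption, and is_valid_title is a hypothetical name.

```python
import re

# type, optional (scope), optional "!", then ": " and a capitalized description
COMMIT_TITLE = re.compile(r"^[a-z]+(\([\w.-]+\))?!?: [A-Z].*$")


def is_valid_title(title):
    """Return True if the title matches the sketched convention."""
    return COMMIT_TITLE.match(title) is not None


ok = is_valid_title("feat: Add new feature")
lowercase = is_valid_title("feat: add new feature")
scoped = is_valid_title("fix(cli)!: Handle missing file")
```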

API Consistency Requirements

When modifying compare_frames() or summary() arguments:

  • diffly/cli.py must expose all arguments, with one exception: the CLI accepts only scalar tolerances, not per-column mappings
  • The functions in diffly/testing.py must support all arguments, plus check_dtypes
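
One way to keep these signatures in sync is to compare parameter names with inspect.signature. The sketch below uses stand-in functions with made-up signatures and defaults (only the parameter names primary_key, abs_tol, rel_tol, and check_dtypes come from this file), and missing_parameters is a hypothetical helper.

```python
import inspect


# Stand-ins with hypothetical signatures; the real functions live in diffly.
def compare_frames(left, right, primary_key=None, abs_tol=1e-08, rel_tol=1e-05): ...


def assert_frame_equal(
    left, right, primary_key=None, abs_tol=1e-08, rel_tol=1e-05, check_dtypes=True
): ...


def missing_parameters(reference, wrapper, extra=frozenset()):
    """Return parameters of `reference` (plus `extra`) that `wrapper` lacks."""
    expected = set(inspect.signature(reference).parameters) | set(extra)
    actual = set(inspect.signature(wrapper).parameters)
    return expected - actual


missing = missing_parameters(compare_frames, assert_frame_equal, extra={"check_dtypes"})
```

An empty result means the wrapper covers every argument; a test could assert this for each function in diffly/testing.py.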

Testing

  • Test fixtures for summary outputs are in tests/summary/fixtures/
  • Fixtures are auto-generated by pre-commit hooks when summary output changes
  • Use pixi run pytest -m generate to manually regenerate fixtures