This file provides guidance to AI assistants like Claude Code (claude.ai/code) when working with code in this repository.
Diffly is a utility package for comparing Polars DataFrames/LazyFrames with detailed analysis capabilities. It identifies differences between datasets including row-level mismatches, missing rows, and column value changes.
This repository uses the pixi package manager. Full documentation: https://pixi.prefix.dev/latest/llms-full.txt
All commands must be prefixed with pixi run. If you change pixi.toml, run pixi lock afterwards.
pixi run test # Run all unit tests
pixi run test-coverage # Run tests with coverage report
pixi run pre-commit-run # Run all pre-commit hooks (formatting, linting, type checks)
pixi run docs # Build Sphinx documentation
pixi run pytest tests/test_equal.py::test_equal # Run a specific test
- Run pixi run test to ensure all tests pass
- Run pixi run pre-commit-run to format code and generate summary fixtures if needed
- comparison.py: Main logic with compare_frames() entry point and DataFrameComparison class
- summary.py: Summary class for rich-formatted comparison reports
- testing.py: assert_collection_equal() and assert_frame_equal() utilities
- cli.py: Typer-based CLI for comparing parquet files
- _conditions.py: Polars expressions for type-aware equality checks (floats with tolerance, temporal types)
- _utils.py: Helper functions and tolerance defaults (ABS_TOL_DEFAULT=1e-08, REL_TOL_DEFAULT=1e-05)
- _cache.py: @cached_method decorator for caching comparison results
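To make the _cache.py entry concrete, here is a minimal sketch of how a per-instance method cache could work. It is not the decorator shipped in diffly: the implementation and the toy Comparison class are assumptions made for illustration, and only the name cached_method comes from this file.

```python
import functools
from typing import Any, Callable


def cached_method(method: Callable[..., Any]) -> Callable[..., Any]:
    """Cache a method's result per instance and per argument tuple.

    Hypothetical stand-in for the decorator in _cache.py; the real one may differ.
    """
    cache_attr = f"_cache_{method.__name__}"

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        # Store the cache on the instance so it is freed together with it.
        cache = self.__dict__.setdefault(cache_attr, {})
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = method(self, *args, **kwargs)
        return cache[key]

    return wrapper


class Comparison:
    """Toy class standing in for DataFrameComparison (assumption)."""

    def __init__(self, left: list, right: list) -> None:
        self.left = left
        self.right = right
        self.calls = 0  # counts how often the expensive body actually runs

    @cached_method
    def equal(self) -> bool:
        self.calls += 1
        return self.left == self.right


comparison = Comparison([1, 2, 3], [1, 2, 3])
first = comparison.equal()
second = comparison.equal()  # served from the cache, body is not re-run
```

Keeping the cache on the instance rather than in a module-level dictionary means two comparison objects never share results, which matters when the same method is called on different frame pairs.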
- Lazy Evaluation: Uses Polars LazyFrames internally for efficiency
- Method Caching: Results cached to avoid recomputation
- Per-Column Tolerances: Tolerances can be scalar or dict[str, float/timedelta] for per-column values
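The scalar-or-mapping tolerance described above can be sketched in plain Python. The helper names resolve_tolerance and values_close are hypothetical, the fallback to the default for columns missing from the mapping is an assumption, and math.isclose is used as a stand-in comparison; the Polars expressions in _conditions.py may use a different formula. Only floats are covered here, not timedelta tolerances.

```python
import math
from collections.abc import Mapping

# Defaults as listed for _utils.py in this file.
ABS_TOL_DEFAULT = 1e-08
REL_TOL_DEFAULT = 1e-05


def resolve_tolerance(tolerance, column: str, default: float) -> float:
    """Return the tolerance for one column (hypothetical helper, not diffly's API)."""
    if isinstance(tolerance, Mapping):
        # Assumption: columns missing from the mapping fall back to the default.
        return tolerance.get(column, default)
    return tolerance


def values_close(left: float, right: float, abs_tol: float, rel_tol: float) -> bool:
    """Stand-in check: abs(a - b) <= max(rel_tol * max(|a|, |b|), abs_tol)."""
    return math.isclose(left, right, rel_tol=rel_tol, abs_tol=abs_tol)


# A mapping sets a loose tolerance for "price" only.
abs_tol = {"price": 0.01}
price_tol = resolve_tolerance(abs_tol, "price", ABS_TOL_DEFAULT)
weight_tol = resolve_tolerance(abs_tol, "weight", ABS_TOL_DEFAULT)

price_match = values_close(10.000, 10.005, price_tol, REL_TOL_DEFAULT)
weight_match = values_close(10.000, 10.005, weight_tol, REL_TOL_DEFAULT)
```

With these inputs the same numeric difference counts as a match for the column with the loose tolerance and as a mismatch for the column that falls back to the default.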
from diffly import compare_frames
comparison = compare_frames(left, right, primary_key="id", abs_tol=1e-08, rel_tol=1e-05)
comparison.equal() # Check if frames are equal
comparison.fraction_same() # Match rates per column
comparison.summary() # Rich-formatted report
- Minimal comments - only where code is non-obvious
- Commit titles follow Conventional Commits with capitalized first letter:
feat: Add new feature
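The commit title rule can be checked mechanically. The following is a rough sketch only: the list of allowed types is an assumption (this file shows just feat), and the helper is_valid_title is hypothetical rather than part of the repository's tooling.

```python
import re

# Assumption: a typical set of commit types; only "feat" appears in this file.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore", "ci", "build", "perf")

TITLE_PATTERN = re.compile(
    rf"^(?:{'|'.join(ALLOWED_TYPES)})"  # commit type
    r"(?:\([\w.-]+\))?"  # optional scope in parentheses
    r"!?"  # optional breaking-change marker
    r": [A-Z].*$"  # colon, space, capitalized description
)


def is_valid_title(title: str) -> bool:
    """Return True if the title matches the convention sketched above."""
    return TITLE_PATTERN.match(title) is not None


ok = is_valid_title("feat: Add new feature")
lowercase = is_valid_title("feat: add new feature")
no_type = is_valid_title("Add new feature")
```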
When modifying compare_frames() or summary() arguments:
- diffly/cli.py must expose all arguments (except tolerances don't support mappings in CLI)
- diffly/testing.py functions must support all arguments plus check_dtypes
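One way to keep these signatures aligned is to compare parameter names with inspect.signature. The sketch below uses stand-in functions: their parameter lists are taken from the usage example in this file, the default for check_dtypes is assumed, and missing_parameters is a hypothetical helper, not an existing test in the repository.

```python
import inspect


def compare_frames(left, right, primary_key=None, abs_tol=1e-08, rel_tol=1e-05):
    """Stand-in with the parameters shown in this file; body omitted."""


def assert_frame_equal(
    left, right, primary_key=None, abs_tol=1e-08, rel_tol=1e-05, check_dtypes=True
):
    """Stand-in for the testing helper; the signature is an assumption."""


def missing_parameters(source, target) -> set:
    """Return parameter names that `source` accepts but `target` does not."""
    source_params = set(inspect.signature(source).parameters)
    target_params = set(inspect.signature(target).parameters)
    return source_params - target_params


# Every compare_frames argument should be accepted by the testing helper.
missing = missing_parameters(compare_frames, assert_frame_equal)
# The testing helper is expected to add exactly check_dtypes on top.
extra = missing_parameters(assert_frame_equal, compare_frames)
```

A check like this only compares names; it would not notice differing defaults or types, so it complements rather than replaces a manual review of cli.py and testing.py.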
- Test fixtures for summary outputs are in tests/summary/fixtures/
- Fixtures are auto-generated by pre-commit hooks when summary output changes
- Use pixi run pytest -m generate to manually regenerate fixtures