Conversation


@dcherian dcherian commented Dec 28, 2025

Here is some testing technology I'm in love with nowadays. I am optimistic it can help build some confidence in the array expression work. The intent here is to test correctness, not the optimizations (it would be fun to figure out property testing for the optimization rules).

It generates a random sequence of operations, applies them to both dask and numpy arrays, and then compares the results.

Run it with

python -m pytest dask/tests/test_stateful_array.py --hypothesis-verbosity=verbose --array-expr -s

to see a lot of informative output.

You'll need the hypothesis package installed.

It seems to be finding some edge-case bugs. The most severe one I've seen so far is a hang in broadcast. I'll add instructions to reproduce in a comment.
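For orientation, here's a minimal sketch of the pattern (illustrative only: the class name, starting array, and rules below are assumptions, not the actual contents of test_stateful_array.py):

```python
# Minimal sketch of the dask-vs-numpy state machine idea (illustrative;
# not the real test file).
import numpy as np
import dask.array as da
from dask.array.utils import assert_eq
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, invariant, rule


class ArrayStateMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        # Keep a numpy reference and a chunked dask mirror of it.
        self.numpy_array = np.arange(12, dtype="float64").reshape(3, 4)
        self.dask_array = da.from_array(self.numpy_array, chunks=(2, 2))

    @rule(data=st.data())
    def transpose(self, data):
        # Apply the same random axis permutation to both arrays.
        axes = tuple(data.draw(st.permutations(list(range(self.numpy_array.ndim)))))
        self.numpy_array = np.transpose(self.numpy_array, axes)
        self.dask_array = da.transpose(self.dask_array, axes)

    @rule(scalar=st.floats(-100, 100, allow_nan=False))
    def add_scalar(self, scalar):
        self.numpy_array = self.numpy_array + scalar
        self.dask_array = self.dask_array + scalar

    @invariant()
    def matches_numpy(self):
        # After every step, the dask array must still equal the numpy one.
        assert_eq(self.dask_array, self.numpy_array)


TestArrays = ArrayStateMachine.TestCase
```

Hypothesis drives arbitrary interleavings of the rules, checks the invariant after every step, and shrinks any failing sequence to a minimal reproducer.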

TODO:

  • add binary ops against self.numpy_array or self.dask_array.
  • vindex
  • oindex
  • shuffle / take

mrocklin and others added 30 commits December 17, 2025 09:15
- designs/array-expr.md: Architecture principles and patterns
- plans/array-expr-migration.md: Prioritized work list and TDD workflow
- .claude/skills/: Migration skill with phased guidance

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement structured array field access (e.g., x['a'], x[['a', 'b']])
using map_blocks, matching the traditional implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove module-level skips and add individual xfail markers for tests
that don't yet work with array-expr. This makes test coverage visible
and progress trackable.

- test_array_core.py: 221 pass, 240 xfail
- test_array_utils.py: 30 pass, 2 xfail
- test_rechunk.py: 79 pass, 3 xfail
- test_dispatch.py: kept skip (fundamental API difference)
- test_routines.py: kept skip (too many unimplemented)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements Squeeze expression class that removes length-1 dimensions.

- Added Squeeze class in _slicing.py with proper chunks/meta/layer
- Wired up da.squeeze() and Array.squeeze() methods
- Added tests to verify correctness
- Updated skill doc with 4-file wiring pattern

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove module-level skip and add individual xfail markers for
unimplemented functions. Now 25 tests pass with 700 xfailed.

Removed duplicate squeeze tests from _array_expr tests since the
main test_routines.py tests now run.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Changed signature from transpose(axes=None) to transpose(*axes)
- Added identity transpose optimization (return self)
- Removed xfail markers for passing transpose tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Implement Reshape and ReshapeLowered expression classes
- Wire up Array.reshape() method and da.reshape() function
- Remove xfail markers for reshape tests
- Update migration plan and skill docs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the numpy __array_function__ protocol for the array-expr
Array class, enabling np.* function calls like np.sum(dask_array) to
dispatch to dask.array functions.

- Add __array_function__ method to _collection.py
- Convert test_array_function.py from module-level skip to whitelist
- Extract shared test helper classes to conftest.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- New _linalg.py with tensordot, matmul, dot implementations
- Added Array.dot() method to _collection.py
- 47 tests now passing (previously xfailing)
- Updated migration plan with Tier 2 progress

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Implement ArgChunk expression class for arg reductions
- Add arg_reduction function for array-expr
- Wire up argmin, argmax, nanargmin, nanargmax
- Wire up cumsum, cumprod, nancumsum, nancumprod
- Add flatten() and ravel() methods to Array
- Add Array methods for argmin, argmax, cumsum, cumprod

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Enable median, nanmedian, percentile for array-expr
- Fix map_blocks to create synthetic meta when compute_meta fails
- Fix __dask_postpersist__ to use original array's meta (like legacy)
- Fix asanyarray to use is_arraylike check (excludes DataFrames)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements ravel, flatten, expand_dims, atleast_1d/2d/3d, broadcast_to, roll.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- vdot: vector dot product using ravel and conj
- diff: discrete differences with prepend/append support
- Fix BroadcastTo._meta for 0-d arrays

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Override _tree_repr_lines in ArrayExpr to show hierarchical expression
structure with indentation. Simplifies verbose repr patterns:
- Functions/partials → function name
- numpy dtypes → string (e.g., 'float64')
- Objects with memory addresses → <ClassName> or ...

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- gradient: uses map_overlap for numerical gradient computation
- compress: boolean selection (numpy array conditions only)
- searchsorted: binary search using blockwise

Also fixes Shuffle._name to use deterministic_token and removes
xfail markers from 49 tests now passing (matmul, tensordot, dot, etc.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Support for indexing dask arrays with boolean dask arrays:
- Full-dimensional boolean masks (d[bool_mask])
- 1D boolean arrays on specific dimensions (d[bool_1d, :])

Also adds general utilities:
- ChunksOverride: override chunk metadata for unknown sizes
- ConcatenateFinalize: finalize unknown-chunk arrays

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add HistogramBinned expression class for per-chunk histogram computation
- Add histogram function with bins, range, weights, density support
- Add outer function using flatten + blockwise
- Remove xfails for 17 passing histogram tests and outer tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Reorganize into 11 tiers based on complexity and impact
- Tier 1-2: Quick wins (~155 lines, 30+ tests) - stacking, flip, axis ops
- Tier 3: block (~27 tests)
- Tier 4: store/IO (~12 tests)
- Tier 5-9: Indexing, creation, histograms, selection, linalg
- Tier 10-11: Submodules and specialized ops
- Note that tiers are largely independent and parallelizable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- vstack, hstack, dstack: stack arrays along various axes
- flip, flipud, fliplr: reverse element order
- rot90: rotate arrays 90 degrees
- transpose: function wrapper for array transpose
- swapaxes, moveaxis, rollaxis: axis manipulation

All implementations use simple wrappers around existing transpose,
concatenate, and slicing infrastructure. 70 tests now passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement all Tier 2 operations from the migration plan:
- round/around, isclose, allclose, isnull/notnull, append
- count_nonzero, ndim, shape, result_type
- broadcast_arrays, unify_chunks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add FromDelayed and FromNpyStack expression classes
- Implement store function with map_blocks
- Fix ArrayBlockwiseDep handling in map_blocks
- Add Array.store() method
- 12 IO tests now passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix block function (was implemented but not exported)
- Add module-level skips for incompatible test modules:
  - test_fft.py (fft not implemented)
  - test_masked.py (ma not implemented)
  - test_atop.py (blockwise internals)
  - test_optimization.py (optimization internals)
  - test_shuffle.py (shuffle internals)
- Add xfails to test_reductions.py for unimplemented features
- Remove xfail from test_memmap (now passing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add vindex with VIndexArray expression class for point-wise vectorized
  indexing with broadcasting
- Add take function for indexing along an axis
- Remove 9 xfail markers from tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- eye: Eye expression class for identity matrices
- diag/diagonal: Diag1D, Diag2DSimple, Diagonal expression classes
- tri, tril, triu: triangle mask operations
- fromfunction, indices, meshgrid: grid creation
- pad: constant/edge/linear_ramp/empty modes (stat modes pending)
- tile: uses block

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Split _collection.py into focused modules where each file contains
both the expression class AND its collection-level function:

- _stack.py: Stack expr + stack()
- _concatenate.py: Concatenate/ConcatenateFinalize exprs + concatenate()
- _rechunk.py: adds rechunk()
- _reshape.py: adds ravel()
- manipulation/_transpose.py: Transpose expr + transpose(), swapaxes(), etc.

New subdirectories group related functions:
- core/: asarray, from_array, blockwise, elemwise
- manipulation/: transpose, flip, roll, expand
- stacking/: block, vstack, hstack, dstack
- routines/: diff, where

Updated designs/array-expr.md with file organization guidelines.

_collection.py reduced from ~2700 to ~1070 lines.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Implement histogramdd with expression class for N-D histogram
- histogram2d as wrapper around histogramdd
- digitize using map_blocks
- Add __bool__ and chunksize properties to Array class
- Fix is_valid_chunk_type import, range shadowing in histogram

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- trace: simple wrapper using diagonal().sum()
- einsum: full implementation with array-expr blockwise
- outer was already working

Removes 48 xfails (1 trace, 47 einsum tests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
mrocklin and others added 19 commits December 26, 2025 18:51
…n tests

- Enhanced expr-optimization skill doc with clearer testing guidance
- Updated array-expr skill to reference expr-optimization for testing patterns
- Added structural equality assertions to broadcast pushdown tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The base Expr._layer assumed npartitions, which is DataFrame-specific.
Array expressions have multi-dimensional chunking and were getting
confusing "no attribute npartitions" errors when _layer was called
on expressions without lowering first.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The Shuffle expression class existed but wasn't exported to the public API.
Added shuffle() wrapper function and Array.shuffle() method for expression mode.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When rechunk is called with target chunks that match the input's
existing chunks, return the original array directly instead of
creating a Rechunk expression. This preserves the array name and
avoids unnecessary graph nodes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Return merge_name instead of name from _compute_rechunk so that
subsequent steps correctly reference task names from previous steps.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Integer indices must remove dimensions even when the dimension is
size 1. Previously, size-1 dims were always converted to slice(None)
which incorrectly preserved the dimension.

Also adds regression tests for this fix and the previous rechunk fixes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Implements a comprehensive stateful test suite that compares Dask array operations against NumPy using Hypothesis's RuleBasedStateMachine. Tests run with array expression mode enabled and verify that Dask arrays remain equivalent to NumPy arrays through arbitrary sequences of operations.

Test operations:
- rechunk: Test chunking changes while preserving array values
- transpose: Test axis permutation
- binary_op: Test broadcasting arithmetic (+, *, -, /)
- basic_indexing: Test slicing and integer indexing
- broadcast: Test np.broadcast_to with valid target shapes

New hypothesis strategies in dask/tests/strategies.py:
- chunks: Generate valid chunk specs for a shape
- broadcastable_shape: Generate shapes broadcastable WITH target
- broadcastable_array: Generate arrays broadcastable WITH target
- broadcast_to_shape: Generate shapes that source can broadcast TO
- axis_strategy, slice_strategy, index_strategy: For reductions/slicing
- chunked_arrays: Generate (numpy_arr, dask_arr) pairs

Added hypothesis to test dependencies.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
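To make the strategy list above concrete, a chunk-spec generator could look roughly like this (an illustrative sketch; the actual dask/tests/strategies.py implementation may differ):

```python
# Rough sketch of a "chunks" strategy: split each dimension of `shape`
# into positive block sizes that sum to that dimension.
# Illustrative only; not the real strategies.py code.
import hypothesis.strategies as st


@st.composite
def chunks(draw, shape):
    spec = []
    for dim in shape:
        if dim == 0:
            # A zero-length dimension gets a single empty block.
            spec.append((0,))
            continue
        blocks = []
        remaining = dim
        while remaining > 0:
            size = draw(st.integers(min_value=1, max_value=remaining))
            blocks.append(size)
            remaining -= size
        spec.append(tuple(blocks))
    return tuple(spec)


# Example: chunks((4, 6)) is a strategy producing specs like ((2, 2), (3, 3)).
```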
Adds a new reduction rule that tests array reduction operations (sum, mean, std, var, min, max, prod, any, all) with:
- Random axis selection (None, single axis, or tuple of axes)
- Optional nan-skipping versions (nansum, nanmean, etc.)
- All operations compared against NumPy reference

New strategy in dask/tests/strategies.py:
- axes_strategy: Generates valid axis specifications for reductions

Also:
- Prioritize single-axis reductions over None in strategy sampling
- Set concise numpy print options for cleaner test output
- Reduced max_examples to 10 and stateful_step_count to 10 for reasonable test duration (~3s)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
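Each reduction step amounts to a check like the following (illustrative standalone version; the real rule draws the op and axis from the strategies and state machine described above):

```python
# Sketch of what a reduction step verifies (illustrative, not the real rule).
import numpy as np
import dask.array as da
from dask.array.utils import assert_eq


def check_reduction(numpy_array, dask_array, op="nansum", axis=None):
    # Apply the same named reduction to both arrays and compare.
    expected = getattr(np, op)(numpy_array, axis=axis)
    actual = getattr(da, op)(dask_array, axis=axis)
    assert_eq(actual, expected)


# Example usage against a small chunked array:
x = np.random.default_rng(0).normal(size=(4, 5))
check_reduction(x, da.from_array(x, chunks=(2, 3)), op="mean", axis=1)
```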
Adds a new scan rule that tests cumulative operations (cumsum, cumprod) with:
- Random single axis selection (scans require a single axis)
- Optional nan-skipping versions (nancumsum, nancumprod)
- All operations compared against NumPy reference
- Shape-preserving (unlike reductions which can reduce dimensions)

New strategy in dask/tests/strategies.py:
- scans: Generates scan operation names

Also renamed strategies for clarity:
- reduction_ops → reductions
- scan_ops → scans

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updates the binary_op rule to randomly choose between:
- Scalar values (float in range -100 to 100)
- Broadcastable arrays (existing behavior)

This tests that Dask correctly handles scalar operations just like NumPy.

Also refactored if-else assignments to single-line ternary expressions for conciseness:
- Scan op_name assignment
- Reduction op_name assignment
- Binary op dask conversion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Adds a persist rule that calls persist() on the Dask array, which:
- Forces computation and keeps result in memory
- No-op for NumPy arrays (already in memory)
- Tests that persist doesn't change array values

Reduced test parameters to 5 examples × 5 steps since persist operations
add significant overhead when called repeatedly during testing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updates the reduction rule to skip when any axis being reduced over has size 0.
This prevents errors with empty reductions.

Also adds warning suppression to persist() since it triggers computation
which can produce RuntimeWarnings (divide by zero, overflow, etc.).

The check converts axes to a tuple (handling int, tuple, and None cases)
and skips the reduction if any of the specific axes have size 0.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
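The guard described above boils down to a small helper along these lines (illustrative sketch, not the exact test code):

```python
# Sketch of the zero-size-axis guard (illustrative; not the exact test code).
def has_empty_axis(shape, axis):
    # Normalize int, tuple, and None axis specs to a tuple of axes.
    if axis is None:
        axes = tuple(range(len(shape)))
    elif isinstance(axis, int):
        axes = (axis,)
    else:
        axes = tuple(axis)
    return any(shape[ax] == 0 for ax in axes)


assert has_empty_axis((3, 0, 2), axis=1)
assert not has_empty_axis((3, 0, 2), axis=(0, 2))
```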
…ns strategy

Two changes:

1. Refactored all rules to use consistent `data=st.data()` parameter naming
   instead of varied names like `axes_data`, `other`, `index`, etc.
   This improves code consistency and readability.

2. Moved nan-version selection logic into the reductions composite strategy.
   The strategy now returns operation names directly ('sum', 'nansum', etc.)
   instead of requiring a separate use_nan_version parameter.

This simplifies the reduction rule by removing the nan_ops dictionary
and parameter, making the code cleaner and easier to maintain.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Two fixes for stateful test failures:

1. **Empty array reductions**: When axis=None (reduce over all axes),
   now check if ANY dimension has size 0 to avoid "zero-size array to
   reduction operation which has no identity" errors.

2. **Float32 precision**: Relaxed rtol to 1e-5 for float32 arrays
   (from default 1e-7) to handle numerical precision differences in
   edge cases like std([1.961, 1.961, 1.961]) which should be 0.0
   but produces tiny floating-point errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
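A dtype-aware comparison along these lines captures the second fix (illustrative sketch; rtol is assumed to be forwarded by assert_eq to the underlying allclose check, as in other dask tests):

```python
# Sketch of a dtype-aware comparison with relaxed float32 tolerance.
import numpy as np
from dask.array.utils import assert_eq


def compare(dask_array, numpy_array):
    # float32 accumulations can drift more than the default tolerance allows.
    rtol = 1e-5 if numpy_array.dtype == np.float32 else 1e-7
    assert_eq(dask_array, numpy_array, rtol=rtol)
```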
self.numpy_array = self.numpy_array[idx]
self.dask_array = self.dask_array[idx]

@precondition(lambda self: False)
@dcherian
Author

Remove this precondition to reproduce a hang in broadcast with --hypothesis-seed=307603228577777922830132553021337866664. YMMV with the seed, but the hang should reproduce even without it.

@mrocklin
Owner

mrocklin commented Dec 28, 2025

Oh fun. I'm excited to work with this (I'm pretty curious about workflows that might help AI be more helpful in developing software, and this seems potentially useful).

Question though: I'm getting a lot of output, only a small amount of which I suspect reflects genuine issues. Is there a good way to filter things here down to genuine failures? I'd love something that generated failing cases to feed into CC.

@mrocklin
Owner

Ah, I'm only just now noticing the --verbose and -s flags. Working through this...

@mrocklin
Owner

Yeah, this is cool. Thanks @dcherian . I'm looking forward to playing with this.

@dcherian
Author

dcherian commented Dec 29, 2025

Nice, glad you got oriented.

It is a random search through what could be a very large parameter space, so I'm thinking we probably want one rule per "layer".

PS: One blogpost I've found useful is https://hypothesis.works/articles/how-not-to-die-hard-with-hypothesis/

@mrocklin mrocklin force-pushed the array-expr branch 3 times, most recently from c9da9cf to 57fa657 on January 2, 2026 16:18
@dcherian dcherian closed this Jan 14, 2026