Conversation


@dcherian dcherian commented Dec 28, 2025

Here is some testing technology I'm in love with nowadays. I am optimistic it can help build some confidence in the array expression work. The intent here is to test correctness, not the optimizations (it would be fun to figure out property testing for the optimization rules).

It generates a random sequence of operations, applies them to both dask and numpy arrays, and then compares the results.

Run it with

python -m pytest dask/tests/test_stateful_array.py --hypothesis-verbosity=verbose --array-expr -s

to see a lot of informative output.

You'll need the hypothesis package installed.

It seems to be finding some edge-case bugs. The most severe one I've seen so far is a hang in broadcast. I'll add instructions to reproduce in a comment.
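For orientation, here's a minimal sketch of the pattern (illustrative only: the class name, starting array, and rules below are assumptions, not the actual contents of test_stateful_array.py):

```python
# Minimal sketch of the dask-vs-numpy state machine idea (illustrative;
# not the real test file).
import numpy as np
import dask.array as da
from dask.array.utils import assert_eq
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, invariant, rule


class ArrayStateMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        # Keep a numpy reference and a chunked dask mirror of it.
        self.numpy_array = np.arange(12, dtype="float64").reshape(3, 4)
        self.dask_array = da.from_array(self.numpy_array, chunks=(2, 2))

    @rule(data=st.data())
    def transpose(self, data):
        # Apply the same random axis permutation to both arrays.
        axes = tuple(data.draw(st.permutations(list(range(self.numpy_array.ndim)))))
        self.numpy_array = np.transpose(self.numpy_array, axes)
        self.dask_array = da.transpose(self.dask_array, axes)

    @rule(scalar=st.floats(-100, 100, allow_nan=False))
    def add_scalar(self, scalar):
        self.numpy_array = self.numpy_array + scalar
        self.dask_array = self.dask_array + scalar

    @invariant()
    def matches_numpy(self):
        # After every step, the dask array must still equal the numpy one.
        assert_eq(self.dask_array, self.numpy_array)


TestArrays = ArrayStateMachine.TestCase
```

Hypothesis drives arbitrary interleavings of the rules, checks the invariant after every step, and shrinks any failing sequence to a minimal reproducer.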

TODO:

  • add binary ops against self.numpy_array or self.dask_array.
  • vindex
  • oindex
  • shuffle / take

mrocklin and others added 30 commits December 17, 2025 09:15
- designs/array-expr.md: Architecture principles and patterns
- plans/array-expr-migration.md: Prioritized work list and TDD workflow
- .claude/skills/: Migration skill with phased guidance

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement structured array field access (e.g., x['a'], x[['a', 'b']])
using map_blocks, matching the traditional implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove module-level skips and add individual xfail markers for tests
that don't yet work with array-expr. This makes test coverage visible
and progress trackable.

- test_array_core.py: 221 pass, 240 xfail
- test_array_utils.py: 30 pass, 2 xfail
- test_rechunk.py: 79 pass, 3 xfail
- test_dispatch.py: kept skip (fundamental API difference)
- test_routines.py: kept skip (too many unimplemented)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements Squeeze expression class that removes length-1 dimensions.

- Added Squeeze class in _slicing.py with proper chunks/meta/layer
- Wired up da.squeeze() and Array.squeeze() methods
- Added tests to verify correctness
- Updated skill doc with 4-file wiring pattern

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove module-level skip and add individual xfail markers for
unimplemented functions. Now 25 tests pass with 700 xfailed.

Removed duplicate squeeze tests from _array_expr tests since the
main test_routines.py tests now run.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Changed signature from transpose(axes=None) to transpose(*axes)
- Added identity transpose optimization (return self)
- Removed xfail markers for passing transpose tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Implement Reshape and ReshapeLowered expression classes
- Wire up Array.reshape() method and da.reshape() function
- Remove xfail markers for reshape tests
- Update migration plan and skill docs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the numpy __array_function__ protocol for the array-expr
Array class, enabling np.* function calls like np.sum(dask_array) to
dispatch to dask.array functions.

- Add __array_function__ method to _collection.py
- Convert test_array_function.py from module-level skip to whitelist
- Extract shared test helper classes to conftest.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- New _linalg.py with tensordot, matmul, dot implementations
- Added Array.dot() method to _collection.py
- 47 tests now passing (previously xfailing)
- Updated migration plan with Tier 2 progress

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Implement ArgChunk expression class for arg reductions
- Add arg_reduction function for array-expr
- Wire up argmin, argmax, nanargmin, nanargmax
- Wire up cumsum, cumprod, nancumsum, nancumprod
- Add flatten() and ravel() methods to Array
- Add Array methods for argmin, argmax, cumsum, cumprod

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Enable median, nanmedian, percentile for array-expr
- Fix map_blocks to create synthetic meta when compute_meta fails
- Fix __dask_postpersist__ to use original array's meta (like legacy)
- Fix asanyarray to use is_arraylike check (excludes DataFrames)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements ravel, flatten, expand_dims, atleast_1d/2d/3d, broadcast_to, roll.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- vdot: vector dot product using ravel and conj
- diff: discrete differences with prepend/append support
- Fix BroadcastTo._meta for 0-d arrays

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Override _tree_repr_lines in ArrayExpr to show hierarchical expression
structure with indentation. Simplifies verbose repr patterns:
- Functions/partials → function name
- numpy dtypes → string (e.g., 'float64')
- Objects with memory addresses → <ClassName> or ...

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- gradient: uses map_overlap for numerical gradient computation
- compress: boolean selection (numpy array conditions only)
- searchsorted: binary search using blockwise

Also fixes Shuffle._name to use deterministic_token and removes
xfail markers from 49 tests now passing (matmul, tensordot, dot, etc.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Support for indexing dask arrays with boolean dask arrays:
- Full-dimensional boolean masks (d[bool_mask])
- 1D boolean arrays on specific dimensions (d[bool_1d, :])

Also adds general utilities:
- ChunksOverride: override chunk metadata for unknown sizes
- ConcatenateFinalize: finalize unknown-chunk arrays

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add HistogramBinned expression class for per-chunk histogram computation
- Add histogram function with bins, range, weights, density support
- Add outer function using flatten + blockwise
- Remove xfails for 17 passing histogram tests and outer tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Reorganize into 11 tiers based on complexity and impact
- Tier 1-2: Quick wins (~155 lines, 30+ tests) - stacking, flip, axis ops
- Tier 3: block (~27 tests)
- Tier 4: store/IO (~12 tests)
- Tier 5-9: Indexing, creation, histograms, selection, linalg
- Tier 10-11: Submodules and specialized ops
- Note that tiers are largely independent and parallelizable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- vstack, hstack, dstack: stack arrays along various axes
- flip, flipud, fliplr: reverse element order
- rot90: rotate arrays 90 degrees
- transpose: function wrapper for array transpose
- swapaxes, moveaxis, rollaxis: axis manipulation

All implementations use simple wrappers around existing transpose,
concatenate, and slicing infrastructure. 70 tests now passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement all Tier 2 operations from the migration plan:
- round/around, isclose, allclose, isnull/notnull, append
- count_nonzero, ndim, shape, result_type
- broadcast_arrays, unify_chunks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add FromDelayed and FromNpyStack expression classes
- Implement store function with map_blocks
- Fix ArrayBlockwiseDep handling in map_blocks
- Add Array.store() method
- 12 IO tests now passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix block function (was implemented but not exported)
- Add module-level skips for incompatible test modules:
  - test_fft.py (fft not implemented)
  - test_masked.py (ma not implemented)
  - test_atop.py (blockwise internals)
  - test_optimization.py (optimization internals)
  - test_shuffle.py (shuffle internals)
- Add xfails to test_reductions.py for unimplemented features
- Remove xfail from test_memmap (now passing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add vindex with VIndexArray expression class for point-wise vectorized
  indexing with broadcasting
- Add take function for indexing along an axis
- Remove 9 xfail markers from tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- eye: Eye expression class for identity matrices
- diag/diagonal: Diag1D, Diag2DSimple, Diagonal expression classes
- tri, tril, triu: triangle mask operations
- fromfunction, indices, meshgrid: grid creation
- pad: constant/edge/linear_ramp/empty modes (stat modes pending)
- tile: uses block

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Split _collection.py into focused modules where each file contains
both the expression class AND its collection-level function:

- _stack.py: Stack expr + stack()
- _concatenate.py: Concatenate/ConcatenateFinalize exprs + concatenate()
- _rechunk.py: adds rechunk()
- _reshape.py: adds ravel()
- manipulation/_transpose.py: Transpose expr + transpose(), swapaxes(), etc.

New subdirectories group related functions:
- core/: asarray, from_array, blockwise, elemwise
- manipulation/: transpose, flip, roll, expand
- stacking/: block, vstack, hstack, dstack
- routines/: diff, where

Updated designs/array-expr.md with file organization guidelines.

_collection.py reduced from ~2700 to ~1070 lines.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Implement histogramdd with expression class for N-D histogram
- histogram2d as wrapper around histogramdd
- digitize using map_blocks
- Add __bool__ and chunksize properties to Array class
- Fix is_valid_chunk_type import, range shadowing in histogram

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- trace: simple wrapper using diagonal().sum()
- einsum: full implementation with array-expr blockwise
- outer was already working

Removes 48 xfails (1 trace, 47 einsum tests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
mrocklin and others added 19 commits December 26, 2025 18:51
…n tests

- Enhanced expr-optimization skill doc with clearer testing guidance
- Updated array-expr skill to reference expr-optimization for testing patterns
- Added structural equality assertions to broadcast pushdown tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The base Expr._layer assumed npartitions, which is DataFrame-specific.
Array expressions have multi-dimensional chunking and were getting
confusing "no attribute npartitions" errors when _layer was called
on expressions without lowering first.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The Shuffle expression class existed but wasn't exported to the public API.
Added shuffle() wrapper function and Array.shuffle() method for expression mode.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When rechunk is called with target chunks that match the input's
existing chunks, return the original array directly instead of
creating a Rechunk expression. This preserves the array name and
avoids unnecessary graph nodes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Return merge_name instead of name from _compute_rechunk so that
subsequent steps correctly reference task names from previous steps.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Integer indices must remove dimensions even when the dimension is
size 1. Previously, size-1 dims were always converted to slice(None)
which incorrectly preserved the dimension.

Also adds regression tests for this fix and the previous rechunk fixes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Implements a comprehensive stateful test suite that compares Dask array operations against NumPy using Hypothesis's RuleBasedStateMachine. Tests run with array expression mode enabled and verify that Dask arrays remain equivalent to NumPy arrays through arbitrary sequences of operations.

Test operations:
- rechunk: Test chunking changes while preserving array values
- transpose: Test axis permutation
- binary_op: Test broadcasting arithmetic (+, *, -, /)
- basic_indexing: Test slicing and integer indexing
- broadcast: Test np.broadcast_to with valid target shapes

New hypothesis strategies in dask/tests/strategies.py:
- chunks: Generate valid chunk specs for a shape
- broadcastable_shape: Generate shapes broadcastable WITH target
- broadcastable_array: Generate arrays broadcastable WITH target
- broadcast_to_shape: Generate shapes that source can broadcast TO
- axis_strategy, slice_strategy, index_strategy: For reductions/slicing
- chunked_arrays: Generate (numpy_arr, dask_arr) pairs

Added hypothesis to test dependencies.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
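To make the strategy list above concrete, a chunk-spec generator could look roughly like this (an illustrative sketch; the actual dask/tests/strategies.py implementation may differ):

```python
# Rough sketch of a "chunks" strategy: split each dimension of `shape`
# into positive block sizes that sum to that dimension.
# Illustrative only; not the real strategies.py code.
import hypothesis.strategies as st


@st.composite
def chunks(draw, shape):
    spec = []
    for dim in shape:
        if dim == 0:
            # A zero-length dimension gets a single empty block.
            spec.append((0,))
            continue
        blocks = []
        remaining = dim
        while remaining > 0:
            size = draw(st.integers(min_value=1, max_value=remaining))
            blocks.append(size)
            remaining -= size
        spec.append(tuple(blocks))
    return tuple(spec)


# Example: chunks((4, 6)) is a strategy producing specs like ((2, 2), (3, 3)).
```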
Adds a new reduction rule that tests array reduction operations (sum, mean, std, var, min, max, prod, any, all) with:
- Random axis selection (None, single axis, or tuple of axes)
- Optional nan-skipping versions (nansum, nanmean, etc.)
- All operations compared against NumPy reference

New strategy in dask/tests/strategies.py:
- axes_strategy: Generates valid axis specifications for reductions

Also:
- Prioritize single-axis reductions over None in strategy sampling
- Set concise numpy print options for cleaner test output
- Reduced max_examples to 10 and stateful_step_count to 10 for reasonable test duration (~3s)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
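Each reduction step amounts to a check like the following (illustrative standalone version; the real rule draws the op and axis from the strategies and state machine described above):

```python
# Sketch of what a reduction step verifies (illustrative, not the real rule).
import numpy as np
import dask.array as da
from dask.array.utils import assert_eq


def check_reduction(numpy_array, dask_array, op="nansum", axis=None):
    # Apply the same named reduction to both arrays and compare.
    expected = getattr(np, op)(numpy_array, axis=axis)
    actual = getattr(da, op)(dask_array, axis=axis)
    assert_eq(actual, expected)


# Example usage against a small chunked array:
x = np.random.default_rng(0).normal(size=(4, 5))
check_reduction(x, da.from_array(x, chunks=(2, 3)), op="mean", axis=1)
```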
Adds a new scan rule that tests cumulative operations (cumsum, cumprod) with:
- Random single axis selection (scans require a single axis)
- Optional nan-skipping versions (nancumsum, nancumprod)
- All operations compared against NumPy reference
- Shape-preserving (unlike reductions which can reduce dimensions)

New strategy in dask/tests/strategies.py:
- scans: Generates scan operation names

Also renamed strategies for clarity:
- reduction_ops → reductions
- scan_ops → scans

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updates the binary_op rule to randomly choose between:
- Scalar values (float in range -100 to 100)
- Broadcastable arrays (existing behavior)

This tests that Dask correctly handles scalar operations just like NumPy.

Also refactored if-else assignments to single-line ternary expressions for conciseness:
- Scan op_name assignment
- Reduction op_name assignment
- Binary op dask conversion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Adds a persist rule that calls persist() on the Dask array, which:
- Forces computation and keeps result in memory
- No-op for NumPy arrays (already in memory)
- Tests that persist doesn't change array values

Reduced test parameters to 5 examples × 5 steps since persist operations
add significant overhead when called repeatedly during testing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updates the reduction rule to skip when any axis being reduced over has size 0.
This prevents errors with empty reductions.

Also adds warning suppression to persist() since it triggers computation
which can produce RuntimeWarnings (divide by zero, overflow, etc.).

The check converts axes to a tuple (handling int, tuple, and None cases)
and skips the reduction if any of the specific axes have size 0.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
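The guard described above boils down to a small helper along these lines (illustrative sketch, not the exact test code):

```python
# Sketch of the zero-size-axis guard (illustrative; not the exact test code).
def has_empty_axis(shape, axis):
    # Normalize int, tuple, and None axis specs to a tuple of axes.
    if axis is None:
        axes = tuple(range(len(shape)))
    elif isinstance(axis, int):
        axes = (axis,)
    else:
        axes = tuple(axis)
    return any(shape[ax] == 0 for ax in axes)


assert has_empty_axis((3, 0, 2), axis=1)
assert not has_empty_axis((3, 0, 2), axis=(0, 2))
```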
…ns strategy

Two changes:

1. Refactored all rules to use consistent `data=st.data()` parameter naming
   instead of varied names like `axes_data`, `other`, `index`, etc.
   This improves code consistency and readability.

2. Moved nan-version selection logic into the reductions composite strategy.
   The strategy now returns operation names directly ('sum', 'nansum', etc.)
   instead of requiring a separate use_nan_version parameter.

This simplifies the reduction rule by removing the nan_ops dictionary
and parameter, making the code cleaner and easier to maintain.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Two fixes for stateful test failures:

1. **Empty array reductions**: When axis=None (reduce over all axes),
   now check if ANY dimension has size 0 to avoid "zero-size array to
   reduction operation which has no identity" errors.

2. **Float32 precision**: Relaxed rtol to 1e-5 for float32 arrays
   (from default 1e-7) to handle numerical precision differences in
   edge cases like std([1.961, 1.961, 1.961]) which should be 0.0
   but produces tiny floating-point errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
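A dtype-aware comparison along these lines captures the second fix (illustrative sketch; rtol is assumed to be forwarded by assert_eq to the underlying allclose check, as in other dask tests):

```python
# Sketch of a dtype-aware comparison with relaxed float32 tolerance.
import numpy as np
from dask.array.utils import assert_eq


def compare(dask_array, numpy_array):
    # float32 accumulations can drift more than the default tolerance allows.
    rtol = 1e-5 if numpy_array.dtype == np.float32 else 1e-7
    assert_eq(dask_array, numpy_array, rtol=rtol)
```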
self.numpy_array = self.numpy_array[idx]
self.dask_array = self.dask_array[idx]

@precondition(lambda self: False)
@dcherian
Author

Remove this precondition to reproduce a hang in broadcast with --hypothesis-seed=307603228577777922830132553021337866664. YMMV with the seed, but the hang should reproduce even without it.

@mrocklin
Owner

mrocklin commented Dec 28, 2025

Oh fun. I'm excited to work with this (I'm pretty curious about workflows that might help AI be more helpful in developing software, and this seems potentially useful).

Question though: I'm getting a lot of output, only a small amount of which I suspect reflects genuine issues. Is there a good way to filter things here down to genuine failures? I'd love something that generated failing cases to feed into CC.

@mrocklin
Owner

Ah, I'm only just now noticing the --verbose and -s flags. Working through this...

@mrocklin
Owner

Yeah, this is cool. Thanks @dcherian . I'm looking forward to playing with this.

@dcherian
Author

dcherian commented Dec 29, 2025

Nice, glad you got oriented.

It is a random search through what could be a very large parameter space, so I'm thinking we probably want one rule per "layer".

PS: One blogpost I've found useful is https://hypothesis.works/articles/how-not-to-die-hard-with-hypothesis/

@mrocklin mrocklin force-pushed the array-expr branch 3 times, most recently from c9da9cf to 57fa657 on January 2, 2026 16:18
@dcherian dcherian closed this Jan 14, 2026