
@ParticularMiner commented Feb 15, 2022

There are often cases where one needs to create a dask array whose shape or size is not known at graph-construction time but is determined only later, at execution time.

As of now, dask does not directly support creating arrays whose shape or size is delayed.

One could, of course, simply compute the shape or size during graph construction, but this is often computationally expensive and inconvenient; one would rather defer all computation to the very end.
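To make the cost of the eager workaround concrete, here is a minimal sketch in plain dask (not dask-grblas). It assumes a hypothetical expensive upstream step, `expensive_source`, and a hypothetical counter, `calls`, that records how often that step runs. Because the output size depends on the data, it has to be forced with `.compute()` during graph construction, and the upstream work is then repeated when the full pipeline executes.

```python
import dask
import dask.array as da
import numpy as np

# Hypothetical counter: how many times the expensive upstream step actually runs
calls = {"n": 0}

def expensive_source():
    # Hypothetical stand-in for a costly upstream computation
    calls["n"] += 1
    return np.array([-2.0, 5.0, 0.5, -1.0, 3.0])

x = da.from_delayed(dask.delayed(expensive_source)(), shape=(5,), dtype=np.float64)

# The size of the next array depends on the data: the number of positive entries
n_lazy = (x > 0).sum()          # lazy 0-d dask array; its value is unknown here

# Eager workaround: force the size now, during graph construction
n = int(n_lazy.compute())       # first run of expensive_source
buf = da.zeros(n, chunks=2)     # graph construction can now continue

# At the end, executing the pipeline runs the upstream step a second time
total = (buf.sum() + x.sum()).compute()
```

Here `n` is 3 and `expensive_source` runs twice: once to learn the size and once more for the final result. Calling `persist()` on `x` would avoid the repeat, but only by materializing the data early, which is exactly the kind of premature work the deferred approach tries to avoid.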

With DOnion this becomes possible.

@ParticularMiner

As an example, the following statement:

```python
new_vector = Vector.from_values(rows, values)
```

introduces a problem in dask-grblas when `rows` is a dask array. The size of `new_vector` is `sz = rows.max() + 1`, which is not known while the graph for `new_vector` is being constructed unless `sz` is computed first. We would rather delay that computation until graph-execution time.
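In plain dask terms (a sketch, not dask-grblas code), `rows.max() + 1` is not a number at graph-construction time but a lazy 0-d dask array. Its value only exists after the graph runs, so anything that needs a concrete integer size, such as a constructor's `size` argument, must either trigger a compute or wait. The sample `rows` values below are made up for illustration.

```python
import dask.array as da
import numpy as np

# Made-up example row indices held in a dask array
rows = da.from_array(np.array([0, 3, 7, 3]), chunks=2)

# The would-be size of new_vector: largest row index plus one
sz = rows.max() + 1

# At graph-construction time, sz is only a lazy 0-d dask array, not an int
print(type(sz).__name__, sz.shape)

# Its value becomes available only when the graph is executed
size = int(sz.compute())
```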

dask-grblas solves this problem by introducing a new object, dubbed a "dOnion" ("dask Onion"), which delays the creation of the dask array associated with `new_vector` until graph-execution time, when `sz` can be determined. It does so by nesting the dask array within another dask array.
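The two-layer idea can be sketched with plain dask, under clearly stated simplifications: this is not the actual DOnion implementation, the outer layer here is a `dask.delayed` object rather than a dask array, and `build_inner` is a hypothetical helper that builds a dense NumPy vector in place of a GraphBLAS one. The outer layer stays lazy until execution; when it runs, `sz` is a concrete number and the inner dask array can finally be created with the right size.

```python
import dask
import dask.array as da
import numpy as np

def build_inner(sz, rows, values, chunk=4):
    # Hypothetical helper, run at execution time. dask.delayed has already
    # computed its dask arguments, so sz is a concrete scalar and rows/values
    # are NumPy arrays here.
    sz = int(sz)
    dense = np.zeros(sz, dtype=values.dtype)
    dense[rows] = values
    # Inner layer: a new dask array whose size is only known at this point
    return da.from_array(dense, chunks=chunk)

rows = da.from_array(np.array([0, 3, 7]), chunks=2)
values = da.from_array(np.array([1.0, 2.0, 3.0]), chunks=2)

# Outer layer: nothing is computed yet; rows.max() + 1 stays lazy
outer = dask.delayed(build_inner)(rows.max() + 1, rows, values)

# Peeling the onion takes two computes: the outer layer, then the inner one
inner = outer.compute()   # a dask array of shape (8,)
result = inner.compute()  # a NumPy array
```

The need for two separate `compute()` calls is one visible cost of nesting: the inner graph does not exist until the outer one has run, so dask cannot optimize or schedule them as a single graph.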

Note that this way of using dask has been strongly discouraged, with good reason, by the dask developers (see https://dask.discourse.group/t/dask-array-twice-delayed/380); however, it is the only way I currently see to circumvent the problem described above while still supporting all known dask-grblas operations on `new_vector`.
