@clayton-ho

Sped up the creation and filling of datasets (in Dataset.py), making the RSG over an order of magnitude faster for large datasets.
This requires the datavault server to provide a "shape" setting that returns the shape of the dataset.
Instead of initializing an empty numpy array and appending each successive data grab from the datavault, we now use the datavault's shape setting to preallocate a numpy array and then fill it with data.
Additionally, the size of each data grab has been increased from 100 to 1000, since most of the overhead arises from filling the dataset.
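
For reviewers, here is a minimal sketch of the difference between the two strategies. It is not the code in Dataset.py: `fetch_rows` is a hypothetical stand-in for one data grab from the datavault, and `source` is just an in-memory array playing the role of the stored dataset.

```python
import numpy as np


def fetch_rows(source, start, count):
    """Hypothetical stand-in for one data grab from the datavault."""
    return source[start:start + count]


def load_by_appending(source, chunk=100):
    """Old approach: start empty and append every grab."""
    data = np.empty((0, source.shape[1]))
    start = 0
    while start < source.shape[0]:
        rows = fetch_rows(source, start, chunk)
        # np.append allocates a new array and copies all existing rows each time.
        data = np.append(data, rows, axis=0)
        start += len(rows)
    return data


def load_preallocated(source, shape, chunk=1000):
    """New approach: allocate once from the known shape, then fill slices."""
    data = np.empty(shape)
    start = 0
    while start < shape[0]:
        rows = fetch_rows(source, start, chunk)
        if len(rows) == 0:
            break  # fewer rows available than the reported shape
        # Writing into a slice copies only the new rows.
        data[start:start + len(rows)] = rows
        start += len(rows)
    return data
```

Both functions return the same array; the appending version re-copies everything it has already loaded on every grab, which is why it gets so slow for large datasets.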

@clayton-ho

I realized that I was stupid and forgot to account for live datasets, in which case the preallocated array in the Dataset object cannot accommodate the growing dataset.

@clayton-ho

The live-dataset problem has been fixed in the latest commit, which also speeds up getData somewhat: the data arrays are now created when a Dataset object is instantiated, so we never have to check whether they exist.
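
As an illustration only, one common way to combine preallocation with a dataset that keeps growing is to allocate the buffer at instantiation and enlarge it when new rows no longer fit. The class below is a hypothetical sketch under that assumption; `GrowableData`, `add_rows`, and `min_capacity` are made-up names and this is not necessarily how the commit handles it.

```python
import numpy as np


class GrowableData:
    """Hypothetical container, not the Dataset class from this PR."""

    def __init__(self, n_columns, initial_rows=0, min_capacity=1000):
        # The buffer always exists after instantiation, so readers never
        # need to check whether the data array has been created yet.
        capacity = max(initial_rows, min_capacity)
        self._buffer = np.empty((capacity, n_columns))
        self._n_rows = 0

    def add_rows(self, rows):
        rows = np.atleast_2d(rows)
        needed = self._n_rows + len(rows)
        if needed > len(self._buffer):
            # Grow geometrically so repeated live updates stay cheap.
            new_capacity = max(needed, 2 * len(self._buffer))
            grown = np.empty((new_capacity, self._buffer.shape[1]))
            grown[:self._n_rows] = self._buffer[:self._n_rows]
            self._buffer = grown
        self._buffer[self._n_rows:needed] = rows
        self._n_rows = needed

    @property
    def data(self):
        # View of the filled part only; unused capacity is hidden.
        return self._buffer[:self._n_rows]
```

With this layout the initial load fills the preallocated buffer directly, and a live dataset only triggers a reallocation when it outgrows the current capacity.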
