@daurer daurer commented May 28, 2024

The way we currently update views can incur a huge overhead, which is especially problematic in two scenarios:

  • A single dataset (scan) with a very large number of views (e.g. electron ptychography)
  • A combination of datasets loaded as a large list of scans

To avoid such overheads, I am proposing the following changes:

  1. Move container reformatting to the outermost layer in new_data,
    i.e. reformat once after all scans and views have been loaded
  2. Keep track of changes that are made to particular views and only update if absolutely needed
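
The two changes above can be sketched roughly as follows. This is only an illustration of the idea, not the actual code in this PR: the `View` and `Container` classes, the `dirty` flag, the `move` method, and the counters are hypothetical stand-ins, and `new_data` here is a simplified free function that only borrows the name.

```python
class View:
    """Hypothetical stand-in for a view (not the real class)."""

    def __init__(self, coord):
        self.coord = coord
        self.dirty = True  # a freshly created view needs one update

    def move(self, coord):
        # Mark the view as changed only if the coordinate really differs.
        if coord != self.coord:
            self.coord = coord
            self.dirty = True


class Container:
    """Hypothetical stand-in for a container holding many views."""

    def __init__(self):
        self.views = []
        self.reformat_calls = 0  # counters exist only to show the effect
        self.view_updates = 0

    def reformat(self):
        self.reformat_calls += 1
        for view in self.views:
            if view.dirty:  # change 2: skip views that have not changed
                self.view_updates += 1
                view.dirty = False


def new_data(container, scans):
    """Load all scans first, then reformat once (change 1)."""
    for scan in scans:
        for coord in scan:
            container.views.append(View(coord))
    container.reformat()  # single call in the outermost layer
```

With this layout, loading several scans triggers a single `reformat` call, and a later `reformat` only touches views whose coordinates were actually modified.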

Example benchmarks of data loading times before and after applying the above changes:

Case 1

Electron Ptychography data with N frames of shape XxY
Data loading before: XX seconds
Data loading after: YY seconds
Speed increase Nx

Case 2

Ptychography data set with N scans, each with K frames of shape XxY, combined into a single reconstruction.
Data loading before: XX seconds
Data loading after: YY seconds
Speed increase Nx

@daurer daurer force-pushed the reduce-overhead-in-reformat branch from fca2450 to 1a3b1a8 May 28, 2024 15:27
@daurer daurer mentioned this pull request Jul 17, 2024