Speed-up virtual data processing #291

@sssangha

As discussed with @ehavazli @rzinke @bbuzz31 @dbekaert, we aim to benchmark virtual data processing with the following approaches (which are to be captured in PR #290):
1. Toggle the GDAL_DISABLE_READDIR_ON_OPEN option in the unwrapStitching.py code, which seems to be a significant bottleneck. Brett's preliminary benchmarking suggests a significant speed-up. He will update us with specific figures.

2. Where possible, use gdal.Info instead of gdal.Open to access metadata.

3. Update the function renderVRT (e.g. in finalize_metadata) to handle input cutlines/masks, so that successive gdal.Warp commands outside of the function are no longer needed. To support this, find a way to add a mask band that combines the cutline with other user-specified masks, such as a water mask, and so avoid the time-expensive gdal.Warp calls.

4. Add NUM_THREADS to gdal.Warp and, where applicable and appropriate, to other gdal calls.
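
For discussion, here is a rough sketch of how items 1, 2 and 4 (and the cutline part of item 3) might look with the GDAL Python bindings. It is not the code in PR #290. The helper names (`warp_kwargs`, `raster_size`, `warp_to_vrt`) are hypothetical, and `EMPTY_DIR` is only one of the values GDAL accepts for GDAL_DISABLE_READDIR_ON_OPEN; which value works best here is exactly what the benchmarking should tell us. GDAL is imported lazily so the option-building helper can be checked without it.

```python
def warp_kwargs(cutline_path=None, n_threads="ALL_CPUS"):
    """Build keyword arguments for gdal.Warp (hypothetical helper).

    Requests a VRT output and multithreaded warping (item 4). If a cutline
    file is given, it is applied inside the same warp call (part of item 3).
    """
    kwargs = {
        "format": "VRT",
        "multithread": True,
        "warpOptions": [f"NUM_THREADS={n_threads}"],
    }
    if cutline_path is not None:
        kwargs["cutlineDSName"] = cutline_path
        kwargs["cropToCutline"] = True
    return kwargs


def raster_size(path):
    """Return (width, height) via gdal.Info rather than gdal.Open (item 2)."""
    from osgeo import gdal  # lazy import: only needed when actually reading

    info = gdal.Info(path, format="json")  # parsed into a dict by the bindings
    width, height = info["size"]
    return width, height


def warp_to_vrt(src, dst, cutline_path=None, readdir_value="EMPTY_DIR"):
    """Warp src to a VRT with directory listing on open disabled (item 1)."""
    from osgeo import gdal

    key = "GDAL_DISABLE_READDIR_ON_OPEN"
    previous = gdal.GetConfigOption(key)
    gdal.SetConfigOption(key, readdir_value)
    try:
        return gdal.Warp(dst, src, **warp_kwargs(cutline_path))
    finally:
        # Restore the earlier setting so other code is unaffected.
        gdal.SetConfigOption(key, previous)
```

Keeping the options in one helper would make it easy to time runs with and without each setting while benchmarking.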

Longer-term action items to explore:
5. Create a dummy netcdf file with virtual string paths pointing to data/metadata layers (i.e. staging the data). I am using this script as a template: https://github.com/ACCESS-Cloud-Based-InSAR/DockerizedTopsApp/blob/dev/isce2_topsapp/packaging_utils/nc_packaging.py

6. Extract everything to VRT files from ARIA-tools.
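
As a starting point for items 5 and 6, the sketch below shows one way a remote netCDF layer could be addressed by a virtual path string and written out as a VRT. The function names, the example URL and the layer name are placeholders I made up for illustration, and whether the netCDF driver can read through `/vsicurl/` depends on the GDAL build, so treat this as an assumption to verify.

```python
def virtual_layer_path(url, layer):
    """Build a GDAL subdataset string for a layer in a remote netCDF file.

    Hypothetical helper: `url` and `layer` are supplied by the caller.
    """
    return f'NETCDF:"/vsicurl/{url}":{layer}'


def layer_to_vrt(url, layer, out_vrt):
    """Write a VRT that points at the remote layer without copying pixels."""
    from osgeo import gdal  # lazy import: only needed when writing the VRT

    return gdal.Translate(out_vrt, virtual_layer_path(url, layer), format="VRT")


if __name__ == "__main__":
    # Placeholder URL and layer name, for illustration only.
    print(virtual_layer_path("https://example.com/product.nc", "/some/layer"))
```

The same path strings could be what the dummy netCDF file in item 5 stores for each data/metadata layer.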

Labels: enhancement