Test and document using GDAL virtual file system handlers #444
dcdenu4 merged 8 commits into natcap:main
dcdenu4 left a comment:
Thanks @emlys, I found a couple of other places where we might want to add the vsi blurb too. One I couldn't really comment on is under the build_overviews function in geoprocessing.py, line 4670.
While thinking about testing, a few questions came to mind, and I was curious whether you'd come across them or thought about them.
- Where does GDAL save the needed downloaded data from URLs and how / when does GDAL garbage collect it? Is that something we need to be mindful of?
- I'd love to have some more comprehensive testing that could help us spot issues further in advance, but I know we don't want to burden our GHA runners all the time. It might be nice to start thinking about having a more comprehensive test runner / suite that we can run manually every once in a while or that runs periodically during off hours.
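One way the opt-in suite idea could look is sketched below with plain `unittest`. The environment variable `PGP_RUN_EXTENDED_TESTS`, the class name, and the example URL are hypothetical names made up for illustration, not existing pygeoprocessing settings. The point is only that slower or network-dependent tests are skipped unless someone (or a scheduled job) explicitly asks for them.

```python
import os
import unittest

# Hypothetical opt-in switch; not an existing pygeoprocessing setting.
RUN_EXTENDED = os.environ.get('PGP_RUN_EXTENDED_TESTS') == '1'


@unittest.skipUnless(RUN_EXTENDED, 'set PGP_RUN_EXTENDED_TESTS=1 to run')
class ExtendedVsiTests(unittest.TestCase):
    """Slow or network-dependent checks kept out of the default run."""

    def test_vsicurl_path(self):
        # Placeholder body: a real test would pass this path to a
        # pygeoprocessing function. The URL is an illustrative example.
        path = '/vsicurl/' + 'https://example.com/small_raster.tif'
        self.assertTrue(path.startswith('/vsicurl/https://'))


def run_suite():
    """Run the extended tests and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(ExtendedVsiTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With the variable unset, the whole class is reported as skipped, so the default run stays cheap. A manual or scheduled run would set the variable before invoking the tests.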
    lambda block: block * 2, [input_path], target_path)
numpy.testing.assert_array_equal(
    pygeoprocessing.raster_to_numpy_array(target_path),
    numpy.ones((9, 9), dtype=numpy.float32))
This got me thinking that it'd be nice to document what test_data/raster.tif is and its properties. A more descriptive name for that test file could help at the very least. As it stands, if I wanted to use it in another test function I wouldn't know what values to expect.
Hmm, I'm not sure exactly how to handle that. I renamed the files to small_raster.tif and small_vector.gpkg, so hopefully that helps a bit? But I don't want to be too specific: a name like small_raster_for_vsi_tests.tif would no longer make sense if we wanted to use the file in another test.
Would it be overkill to have a small README in the folder that describes the data? I'm mostly thinking that I don't know what the values are and therefore what I'd expect for some kind of output. Maybe we can address this in the future if we do more of this.
tests/test_geoprocessing.py
Outdated
def test_raster_map_vsicurl(self):
    """PGP: raster_map with vsicurl."""
    # Access test data hosted on github
    input_path = '/vsicurl/https://raw.githubusercontent.com/emlys/pygeoprocessing/feature/441/tests/test_data/raster.tif'
In playing around with this, do you have a sense for where GDAL downloads the data it needs from a hosted file? Are we sure we're cleaning that up in our tests?
I don't know that it downloads it anywhere; it may just be held in memory. I wouldn't expect it to leave any extra files around afterward.
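For reference on the caching question: as far as I understand the GDAL documentation, /vsicurl/ keeps fetched byte ranges in an in-memory LRU cache rather than writing files to disk, and the cache size is governed by the `CPL_VSIL_CURL_CACHE_SIZE` configuration option (in bytes). Below is a minimal sketch of capping it through the environment, which GDAL also reads config options from. The helper name `vsicurl_cache_env` and the 8 MiB figure are illustrative assumptions, not anything in this PR.

```python
import os


def vsicurl_cache_env(size_mib):
    """Return the env var setting that caps the /vsicurl/ cache.

    Hypothetical helper for illustration. GDAL expects the value of
    CPL_VSIL_CURL_CACHE_SIZE in bytes, so MiB is converted here.
    """
    return {'CPL_VSIL_CURL_CACHE_SIZE': str(size_mib * 1024 * 1024)}


# Assumed example value: cap the cache at 8 MiB for this process.
os.environ.update(vsicurl_cache_env(8))
```

To be safe, this should run before GDAL first touches a /vsicurl/ path. Inside an already-running process, `gdal.SetConfigOption` sets the same option, and `gdal.VSICurlClearCache()` can be called in a test's teardown to drop cached content explicitly.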
Co-authored-by: Doug <dcdenu4@gmail.com>
Thanks @dcdenu4! Here are my thoughts:
- Where does GDAL save the needed downloaded data from URLs and how / when does GDAL garbage collect it? Is that something we need to be mindful of?
In the docs I do not see any mention of downloaded data being permanently stored. I think it's safe to assume that this is abstracted away by the drivers. The documentation suggests that everything is cached and there is a reasonable limit on the cache size:
- I'd love to have some more comprehensive testing that could help us spot issues further in advance, but I know we don't want to burden our GHA runners all the time. It might be nice to start thinking about having a more comprehensive test runner / suite that we can run manually every once in a while or that runs periodically during off hours.
I think this could be addressed separately. I don't want to fall into the trap of testing GDAL functionality rather than pygeoprocessing functionality. VSI handlers are new to us, but not new to GDAL, so I think we can assume that they work as described and don't need to write our own comprehensive tests of that feature. I did also test locally running
Fixes #441