Merged
28 changes: 28 additions & 0 deletions docs/source/basic_usage.rst
@@ -99,3 +99,31 @@ first element is the filepath, and the second is the band index:

>>> raster_path_band_list = [(raster_a_path, 1), (raster_b_path, 1)]

Using GDAL's virtual file system handlers
*****************************************

``pygeoprocessing`` reads all input data using GDAL. In addition to "regular" files located on
your local file system, GDAL can read other types of files using virtual file system handlers.
Using virtual file system handlers, you can directly access files hosted on a remote server
over HTTP; files hosted with cloud storage services such as AWS S3 or Google Cloud Storage
(including access-controlled files); zip archives; and more. See the
`GDAL Virtual File Systems documentation <https://gdal.org/en/stable/user/virtual_file_systems.html>`_
for details.

To use a virtual file system handler, add the appropriate ``/vsi*/`` prefix (such as
``/vsicurl/`` or ``/vsizip/``) to the file path or URL. For example, the ``/vsicurl/`` prefix
tells GDAL to read a file from the given URL over HTTP or FTP. You can pass paths with VSI
prefixes directly to ``pygeoprocessing``. For example:

.. code::

    import pygeoprocessing

    # Read raster metadata directly from a remote GeoTIFF over HTTPS.
    pygeoprocessing.get_raster_info(
        '/vsicurl/https://storage.googleapis.com/natcap-data-cache/global/nasa-srtm-v3-1s/srtm-v3-1s.tif'
    )

Note that support for virtual file systems depends on the GDAL driver for each file format.
Paths with VSI prefixes work with most, but not all, file formats.
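As a minimal sketch of how the prefixes described above might be assembled, the helper below maps a few common location styles to VSI paths. The function name ``to_vsi_path`` and its ``archive_member`` parameter are hypothetical and are not part of ``pygeoprocessing`` or GDAL; only the ``/vsicurl/``, ``/vsis3/``, ``/vsigs/`` and ``/vsizip/`` prefixes themselves come from GDAL.

```python
from urllib.parse import urlparse


def to_vsi_path(location, archive_member=None):
    """Build a GDAL VSI path from a URL or local path (hypothetical helper).

    Paths that already start with ``/vsi`` are returned unchanged.
    """
    if location.startswith('/vsi'):
        return location

    parsed = urlparse(location)
    scheme = parsed.scheme.lower()
    if scheme in ('http', 'https', 'ftp'):
        # /vsicurl/ takes the full URL, scheme included.
        vsi_path = '/vsicurl/' + location
    elif scheme == 's3':
        # /vsis3/ takes bucket/key without the s3:// scheme.
        vsi_path = '/vsis3/' + parsed.netloc + parsed.path
    elif scheme == 'gs':
        # /vsigs/ takes bucket/key without the gs:// scheme.
        vsi_path = '/vsigs/' + parsed.netloc + parsed.path
    else:
        # Assume a regular local path; no prefix is needed.
        vsi_path = location

    if vsi_path.lower().endswith('.zip'):
        # Handlers can be chained: /vsizip/ wraps the path to the archive.
        vsi_path = '/vsizip/' + vsi_path
        if archive_member:
            vsi_path += '/' + archive_member
    return vsi_path


print(to_vsi_path('https://example.com/data/dem.tif'))
print(to_vsi_path('s3://example-bucket/rasters/dem.tif'))
print(to_vsi_path('https://example.com/data/archive.zip', 'dem.tif'))
```

The ``example.com`` URLs and the bucket name are placeholders. Access-controlled buckets also need credentials configured for GDAL, which this sketch does not cover.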

GDAL can also write out data to a remote location using virtual file system handlers. ``pygeoprocessing``
does not officially support this, but it may work in some cases. Note that, even if the output is
written to a remote location, many ``pygeoprocessing`` functions will write out temporary
intermediate results to the local file system.
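Because remote output is not officially supported, one conservative pattern is to read from a VSI path but keep targets on the local file system. The sketch below illustrates that idea only; ``is_vsi_path`` and ``local_target_path`` are hypothetical helper names, not ``pygeoprocessing`` functions.

```python
import os


def is_vsi_path(path):
    """Return True if ``path`` starts with a GDAL-style ``/vsi`` prefix."""
    return str(path).startswith('/vsi')


def local_target_path(input_path, workspace_dir, suffix='_slope'):
    """Derive a local output path from a (possibly remote) input path.

    The file name of the input is reused, with ``suffix`` inserted before
    the extension. ``.tif`` is assumed when the input has no extension.
    """
    base, ext = os.path.splitext(os.path.basename(input_path))
    return os.path.join(workspace_dir, f'{base}{suffix}{ext or ".tif"}')


remote_dem = '/vsicurl/https://example.com/data/dem.tif'  # placeholder URL
print(is_vsi_path(remote_dem))
print(local_target_path(remote_dem, 'workspace'))
```

The returned path could then be passed as a target argument, for example ``target_slope_path`` of ``calculate_slope``, while the VSI path is used only in the input path/band tuple.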
104 changes: 70 additions & 34 deletions src/pygeoprocessing/geoprocessing.py

Large diffs are not rendered by default.

15 changes: 10 additions & 5 deletions src/pygeoprocessing/geoprocessing_core.pyx
@@ -126,7 +126,8 @@ def _distance_transform_edt(

Parameters:
region_raster_path (string): path to a byte raster where region pixels
are indicated by a 1 and 0 otherwise.
are indicated by a 1 and 0 otherwise. Paths may use any
GDAL-supported scheme, including virtual file system /vsi schemes.
g_raster_path (string): path to a raster created by this call that
is used as the intermediate "g" variable described in Meijster
et al.
@@ -361,7 +362,8 @@ def calculate_slope(

Parameters:
base_elevation_raster_path_band (string): a path/band tuple to a
raster of height values. (path_to_raster, band_index)
raster of height values, as (path_to_raster, band_index). Paths may use
any GDAL-supported scheme, including virtual file system /vsi schemes.
target_slope_path (string): path to target slope raster; will be a
32 bit float GeoTIFF of same size/projection as calculate slope
with units of percent slope.
@@ -686,7 +688,8 @@ def raster_band_percentile(

Parameters:
base_raster_path_band (tuple): raster path band tuple to a raster
that is of any integer or real type.
that is of any integer or real type. Paths may use any
GDAL-supported scheme, including virtual file system /vsi schemes.
working_sort_directory (str): path to a directory that does not
exist or is empty. This directory will be used to create heapfiles
with sizes no larger than ``heap_buffer_size`` which are written in the
@@ -745,7 +748,8 @@ def _raster_band_percentile_int(

Parameters:
base_raster_path_band (tuple): raster path band tuple to a raster that
is of an integer type.
is of an integer type. Paths may use any GDAL-supported scheme,
including virtual file system /vsi schemes.
working_sort_directory (str): path to a directory that does not
exist or is empty. This directory will be used to create heapfiles
with sizes no larger than ``heap_buffer_size`` which are written in the
@@ -894,7 +898,8 @@ def _raster_band_percentile_double(

Parameters:
base_raster_path_band (tuple): raster path band tuple to raster that
is a real/float type.
is a real/float type. Paths may use any GDAL-supported scheme,
including virtual file system /vsi schemes.
working_sort_directory (str): path to a directory that does not
exist or is empty. This directory will be used to create heapfiles
with sizes no larger than ``heap_buffer_size`` which are written in the
3 changes: 2 additions & 1 deletion src/pygeoprocessing/multiprocessing/raster_calculator.py
@@ -398,7 +398,8 @@ def raster_calculator(
must have the same raster size. If only arrays are input, numpy
arrays must be broadcastable to each other and the final raster
size will be the final broadcast array shape. A value error is
raised if only "raw" inputs are passed.
raised if only "raw" inputs are passed. Paths may use any
GDAL-supported scheme, including virtual file system /vsi schemes.
local_op (function): a function that must take in as many parameters as
there are elements in ``base_raster_path_band_const_list``. The
parameters in ``local_op`` will map 1-to-1 in order with the values
3 changes: 2 additions & 1 deletion src/pygeoprocessing/routing/helper_functions.py
@@ -22,7 +22,8 @@ def extract_streams_d8(
Args:
flow_accum_raster_path_band (tuple): A (path, band) tuple indicating
the path to a D8 flow accumulation raster and the band index to
use.
use. Paths may use any GDAL-supported scheme, including virtual
file system /vsi schemes.
flow_threshold (number): The flow threshold. Flow accumulation values
greater than this threshold are considered stream pixels, values
less than this threshold are non-stream pixels.
71 changes: 50 additions & 21 deletions src/pygeoprocessing/routing/routing.pyx
@@ -146,7 +146,8 @@ cdef struct FlowPixelType:
cdef struct DecayingValue:
double decayed_value # The value, which will be progressively updated as it decays
double min_value # The minimum value before the Decaying Value should be ignored.



# This struct is used to track an intermediate flow pixel's last calculated
# direction and flow accumulation value so far (just like with FlowPixelType).
# Additionally, we track all of the decaying values from upstream that
@@ -667,7 +668,8 @@ def fill_pits(

Parameters:
dem_raster_path_band (tuple): a path, band number tuple indicating the
DEM used to calculate flow direction.
DEM used to calculate flow direction. Paths may use any GDAL-supported
scheme, including virtual file system /vsi schemes.
target_filled_dem_raster_path (str): path to the pit-filled DEM,
that's created by a call to this function. It is functionally a
single band copy of ``dem_raster_path_band`` with the pit pixels
@@ -1107,7 +1109,8 @@ def flow_dir_d8(
Parameters:
dem_raster_path_band (tuple): a path, band number tuple indicating the
DEM used to calculate flow direction. This DEM must not have hydrological
pits or else the target flow direction is undefined.
pits or else the target flow direction is undefined. Paths may use
any GDAL-supported scheme, including virtual file system /vsi schemes.
target_flow_dir_path (str): path to a byte raster created by this
call of same dimensions as ``dem_raster_path_band`` that has a value
indicating the direction of downhill flow. Values are defined as
@@ -1483,6 +1486,8 @@ def flow_accumulation_d8(
4 x 0
5 6 7

Paths may use any GDAL-supported scheme, including virtual
file system /vsi schemes.
target_flow_accum_raster_path (str): path to flow
accumulation raster created by this call. After this call, the
value of each pixel will be 1 plus the number of upstream pixels
@@ -1787,7 +1792,8 @@ def flow_dir_mfd(
Parameters:
dem_raster_path_band (tuple): a path, band number tuple indicating the
DEM used to calculate flow direction. This DEM must not have hydrological
pits or else the target flow direction will be undefined.
pits or else the target flow direction will be undefined. Paths may
use any GDAL-supported scheme, including virtual file system /vsi schemes.
target_flow_dir_path (str): path to a raster created by this call
of a 32 bit int raster of the same dimensions and projections as
``dem_raster_path_band[0]``. The value of the pixel indicates the
@@ -2285,7 +2291,8 @@ def flow_accumulation_mfd(
flow_dir_mfd_raster_path_band (tuple): a path, band number tuple
for a multiple flow direction raster generated from a call to
``flow_dir_mfd``. The format of this raster is described in the
docstring of that function.
docstring of that function. Paths may use any GDAL-supported
scheme, including virtual file system /vsi schemes.
target_flow_accum_raster_path (str): a path to a raster created by
a call to this function that is the same dimensions and projection
as ``flow_dir_mfd_raster_path_band[0]``. The value in each pixel is
@@ -2302,7 +2309,8 @@ def flow_accumulation_mfd(
weight. If ``None``, 1 is the default flow accumulation weight.
This raster must be the same dimensions as
``flow_dir_mfd_raster_path_band``. If a weight nodata pixel is
encountered it will be treated as a weight value of 0.
encountered it will be treated as a weight value of 0. Paths may
use any GDAL-supported scheme, including virtual file system /vsi schemes.
raster_driver_creation_tuple (tuple): a tuple containing a GDAL driver
name string as the first element and a GDAL creation options
tuple/list as the second. Defaults to a GTiff driver tuple
@@ -2576,19 +2584,23 @@ def distance_to_channel_d8(
4 x 0
5 6 7

Paths may use any GDAL-supported scheme, including virtual
file system /vsi schemes.
channel_raster_path_band (tuple): a path/band tuple of the same
dimensions and projection as ``flow_dir_d8_raster_path_band[0]``
that indicates where the channels in the problem space lie. A
channel is indicated if the value of the pixel is 1. Other values
are ignored.
are ignored. Paths may use any GDAL-supported scheme, including
virtual file system /vsi schemes.
target_distance_to_channel_raster_path (str): path to a raster
created by this call that has per-pixel distances from a given
pixel to the nearest downhill channel.
weight_raster_path_band (tuple): optional path and band number to a
raster that will be used as the per-pixel flow distance
weight. If ``None``, 1 is the default distance between neighboring
pixels. This raster must be the same dimensions as
``flow_dir_d8_raster_path_band``.
``flow_dir_d8_raster_path_band``. Paths may use any GDAL-supported
scheme, including virtual file system /vsi schemes.
raster_driver_creation_tuple (tuple): a tuple containing a GDAL driver
name string as the first element and a GDAL creation options
tuple/list as the second. Defaults to a GTiff driver tuple
@@ -2798,20 +2810,23 @@ def distance_to_channel_mfd(
flow_dir_mfd_raster_path_band (tuple): a path/band index tuple
indicating the raster that defines the mfd flow accumulation
raster for this call. This raster should be generated by a call
to ``pygeoprocessing.routing.flow_dir_mfd``.
to ``pygeoprocessing.routing.flow_dir_mfd``. Paths may use any
GDAL-supported scheme, including virtual file system /vsi schemes.
channel_raster_path_band (tuple): a path/band tuple of the same
dimensions and projection as ``flow_dir_mfd_raster_path_band[0]``
that indicates where the channels in the problem space lie. A
channel is indicated if the value of the pixel is 1. Other values
are ignored.
are ignored. Paths may use any GDAL-supported scheme, including
virtual file system /vsi schemes.
target_distance_to_channel_raster_path (str): path to a raster
created by this call that has per-pixel distances from a given
pixel to the nearest downhill channel.
weight_raster_path_band (tuple): optional path and band number to a
raster that will be used as the per-pixel flow distance
weight. If ``None``, 1 is the default distance between neighboring
pixels. This raster must be the same dimensions as
``flow_dir_mfd_raster_path_band``.
``flow_dir_mfd_raster_path_band``. Paths may use any GDAL-supported
scheme, including virtual file system /vsi schemes.
raster_driver_creation_tuple (tuple): a tuple containing a GDAL driver
name string as the first element and a GDAL creation options
tuple/list as the second. Defaults to a GTiff driver tuple
@@ -3095,9 +3110,11 @@ def extract_streams_mfd(
a stream. Values in this raster that are >= flow_threshold will
be classified as streams. This raster should be derived from
``dem_raster_path_band`` using
``pygeoprocessing.routing.flow_accumulation_mfd``.
``pygeoprocessing.routing.flow_accumulation_mfd``. Paths may use
any GDAL-supported scheme, including virtual file system /vsi schemes.
flow_dir_mfd_path_band (str): path to multiple flow direction
raster, required to join divergent streams.
raster, required to join divergent streams. Paths may use any
GDAL-supported scheme, including virtual file system /vsi schemes.
flow_threshold (float): the value in ``flow_accum_raster_path_band`` to
indicate where a stream exists.
target_stream_raster_path (str): path to the target stream raster.
@@ -3336,12 +3353,15 @@ def extract_strahler_streams_d8(

Args:
flow_dir_d8_raster_path_band (tuple): a path/band representing the D8
flow direction raster.
flow direction raster. Paths may use any GDAL-supported scheme,
including virtual file system /vsi schemes.
flow_accum_raster_path_band (tuple): a path/band representing the D8
flow accumulation raster represented by
``flow_dir_d8_raster_path_band``.
``flow_dir_d8_raster_path_band``. Paths may use any GDAL-supported
scheme, including virtual file system /vsi schemes.
dem_raster_path_band (tuple): a path/band representing the DEM used to
derive flow dir.
derive flow dir. Paths may use any GDAL-supported scheme, including
virtual file system /vsi schemes.
target_stream_vector_path (tuple): a single layer line vector created
by this function representing the stream segments extracted from
the above arguments. Contains the fields "order" and "parent" as
@@ -3982,7 +4002,9 @@ def _build_discovery_finish_rasters(
"""Generates a discovery and finish time raster for a given d8 flow path.

Args:
flow_dir_d8_raster_path_band (tuple): a D8 flow raster path band tuple
flow_dir_d8_raster_path_band (tuple): a D8 flow raster path band tuple.
Paths may use any GDAL-supported scheme, including virtual file
system /vsi schemes.
target_discovery_raster_path (str): path to a generated raster that
creates discovery time (i.e. what count the pixel is visited in)
target_finish_raster_path (str): path to generated raster that creates
@@ -4120,8 +4142,11 @@ def calculate_subwatershed_boundary(

Args:
d8_flow_dir_raster_path_band (tuple): raster/path band for d8 flow dir
raster
strahler_stream_vector_path (str): path to stream segment vector
raster. Paths may use any GDAL-supported scheme, including virtual
file system /vsi schemes.
strahler_stream_vector_path (str): path to stream segment vector. Paths
may use any GDAL-supported scheme, including virtual file system
/vsi schemes.
target_watershed_boundary_vector_path (str): path to created vector
of linestring for watershed boundaries. Contains the fields:

@@ -4138,6 +4163,8 @@ def calculate_subwatershed_boundary(
with underlying raster data that created the streams in
``strahler_stream_vector_path``.

Paths may use any GDAL-supported scheme, including virtual file
system /vsi schemes.
max_steps_per_watershed (int): maximum number of steps to take when
defining a watershed boundary. Useful if the DEM is large and
degenerate or some other user known condition to limit long large
@@ -4465,7 +4492,8 @@ def detect_lowest_drain_and_sink(dem_raster_path_band):

Args:
dem_raster_path_band (tuple): a raster/path band tuple to detect
sinks in.
sinks in. Paths may use any GDAL-supported scheme, including
virtual file system /vsi schemes.

Return:
(drain_pixel, drain_height, sink_pixel, sink_height) -
@@ -4575,7 +4603,8 @@ def detect_outlets(
Args:
flow_dir_raster_path_band (tuple): raster path/band tuple
indicating D8 or MFD flow direction created by
`routing.flow_dir_d8` or `routing.flow_dir_mfd`.
`routing.flow_dir_d8` or `routing.flow_dir_mfd`. Paths may use
any GDAL-supported scheme, including virtual file system /vsi schemes.
flow_dir_type (str): one of 'd8' or 'mfd' to indicate the
``flow_dir_raster_path_band`` is either a D8 or MFD flow
direction raster.
18 changes: 9 additions & 9 deletions src/pygeoprocessing/routing/watershed.pyx
@@ -645,15 +645,15 @@ def delineate_watersheds_d8(
d8_flow_dir_raster_path_band (tuple): A (path, band_id) tuple
to a D8 flow direction raster. This raster must be a tiled raster
with block sizes being a power of 2. The output watersheds vector
will have its spatial reference copied from this raster.
outflow_vector_path (str): The path to a vector on disk containing
features with valid geometries from which watersheds will be
delineated. Only those parts of the geometry that overlap valid
flow direction pixels will be included in the output watersheds
vector.
target_watersheds_vector_path (str): The path to a vector on disk
where the target watersheds will be stored. Must have the
extension ``.gpkg``.
will have its spatial reference copied from this raster. Paths may
use any GDAL-supported scheme, including virtual file system /vsi schemes.
outflow_vector_path (str): The path to a vector containing features
with valid geometries from which watersheds will be delineated.
Only those parts of the geometry that overlap valid flow direction
pixels will be included in the output watersheds vector. Paths may
use any GDAL-supported scheme, including virtual file system /vsi schemes.
target_watersheds_vector_path (str): The path to a vector where the
target watersheds will be stored. Must have the extension ``.gpkg``.
working_dir=None (str or None): The path to a directory on disk
within which various intermediate files will be stored. If None,
a folder will be created within the system's temp directory.