Commit 2e28c22

Clarify the bounding box behavior
1 parent 9467351 commit 2e28c22

1 file changed

+31
-13
lines changed

Geospatial.md

Lines changed: 31 additions & 13 deletions
@@ -94,6 +94,37 @@ Bounding box is defined as the thrift struct below in the representation of
 min/max value pair of coordinates from each axis. Note that X and Y Values are
 always present. Z and M are omitted for 2D geospatial instances.
 
+Writers should follow the guidelines below when calculating bounding boxes in
+the presence of invalid values. An invalid geospatial value refers to any of
+the following: `NaN`, `null`, `does not exist` (e.g., LINESTRING EMPTY), or
+`out of bounds` (e.g., `x < -180` or `x > 180` for `GEOGRAPHY` types):
+
+* X and Y: Skip any value where X or Y is invalid and continue processing the
+  remaining X or Y values. Do not produce a bounding box if all X or all Y
+  values are invalid.
+
+* Z: Skip any Z value that is invalid and continue processing the remaining
+  Z values. Omit Z from the bounding box if all Z values are invalid.
+
+* M: Skip any M value that is invalid and continue processing the remaining
+  M values. Omit M from the bounding box if all M values are invalid.
+
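The writer guidelines above can be sketched as follows. This is a minimal illustration, not the reference implementation: the `(x, y, z, m)` tuple layout, the helper names `is_valid` and `compute_bbox`, and the returned dict are all hypothetical. It also assumes that a coordinate contributes to X and Y only when both are valid, which is one possible reading of the X and Y rule, and it does not attempt antimeridian wraparound.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical layout: (x, y, z, m); None stands for null / "does not exist".
Coord = Tuple[Optional[float], Optional[float], Optional[float], Optional[float]]


def is_valid(v: Optional[float], lo: float = -math.inf, hi: float = math.inf) -> bool:
    """A value is valid if it is present, not NaN, and within [lo, hi]."""
    return v is not None and not math.isnan(v) and lo <= v <= hi


def compute_bbox(coords: Iterable[Coord], geography: bool = False) -> Optional[Dict[str, float]]:
    """Return a dict of bounds, or None when no bounding box should be produced."""
    x_lim = (-180.0, 180.0) if geography else (-math.inf, math.inf)
    y_lim = (-90.0, 90.0) if geography else (-math.inf, math.inf)
    xs, ys, zs, ms = [], [], [], []
    for x, y, z, m in coords:
        # Assumption: X and Y are kept or skipped together.
        if is_valid(x, *x_lim) and is_valid(y, *y_lim):
            xs.append(x)
            ys.append(y)
        # Z and M are checked independently of X/Y and of each other.
        if is_valid(z):
            zs.append(z)
        if is_valid(m):
            ms.append(m)
    if not xs:
        # No usable X/Y pair: do not produce a bounding box at all.
        return None
    bbox = {"xmin": min(xs), "xmax": max(xs), "ymin": min(ys), "ymax": max(ys)}
    if zs:  # omit Z entirely when every Z value is invalid
        bbox["zmin"], bbox["zmax"] = min(zs), max(zs)
    if ms:  # omit M entirely when every M value is invalid
        bbox["mmin"], bbox["mmax"] = min(ms), max(ms)
    return bbox
```

Returning `None` rather than a box filled with `NaN` keeps "no bounding box" distinct from "a box with unknown bounds", which matches the omit-instead-of-fill wording of the guidelines.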
+Readers should follow the guidelines below when examining bounding boxes:
+
+* No bounding box: No assumptions can be made about the presence or absence
+  of invalid values. Readers may need to load all individual coordinate
+  values for validation.
+
+* A bounding box is present:
+    * X and Y: X and Y of the bounding box must be present. Readers should
+      proceed using these values.
+    * Z: If Z of the bounding box is missing, readers should make no
+      assumptions about invalid values and may need to load individual
+      coordinates for validation.
+    * M: If M of the bounding box is missing, readers should make no
+      assumptions about invalid values and may need to load individual
+      coordinates for validation.
+
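On the reader side, the practical consequence is that a missing bounding box, or a missing Z or M range, cannot be used to skip data. A minimal sketch of that decision follows. The function name `may_skip` and the dict-shaped `bbox` are assumptions made for illustration. X is deliberately left out because `xmin` may be greater than `xmax`.

```python
from typing import Dict, Optional


def may_skip(bbox: Optional[Dict[str, float]], axis: str, lo: float, hi: float) -> bool:
    """Return True only when the bounds prove no value on `axis` lies in [lo, hi]."""
    if axis not in ("y", "z", "m"):
        raise ValueError("this sketch handles only the y, z, and m axes")
    if bbox is None:
        # No bounding box: nothing can be assumed, so the data must be read.
        return False
    lo_key, hi_key = axis + "min", axis + "max"
    if lo_key not in bbox or hi_key not in bbox:
        # Bounds for this axis were omitted: again, nothing can be assumed.
        return False
    # Skip only when the stored range and the query range do not overlap.
    return bbox[hi_key] < lo or bbox[lo_key] > hi
```

The function errs toward `False`: reading data that could have been skipped costs only time, while skipping data that should have been read produces wrong results.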
 For the X values only, xmin may be greater than xmax. In this case, an object
 in this bounding box may match if it contains an X such that `x >= xmin` OR
 `x <= xmax`. This wraparound occurs only when the corresponding bounding box
@@ -104,19 +135,6 @@ crosses the antimeridian line. In geographic terminology, the concepts of `xmin`
 For `GEOGRAPHY` types, X and Y values are restricted to the canonical ranges of
 [-180, 180] for X and [-90, 90] for Y.
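The wraparound rule for X can be illustrated with a small membership test. This is a sketch only, and the function name `x_in_bbox` is hypothetical: when `xmin <= xmax` the usual interval check applies, otherwise the box crosses the antimeridian and the two conditions are joined with OR.

```python
def x_in_bbox(x: float, xmin: float, xmax: float) -> bool:
    """Test whether x falls inside an X range that may wrap around."""
    if xmin <= xmax:
        # Ordinary box: a single contiguous interval.
        return xmin <= x <= xmax
    # Wrapped box: covers [xmin, 180] and [-180, xmax].
    return x >= xmin or x <= xmax
```

For example, with `xmin = 170` and `xmax = -170`, the longitudes 175 and -175 are inside the box while 0 is not.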
 
-To produce `GeospatialStatistics`, writers must omit zmin and zmax if and
-only if there are zero non-NaN Z values in the column chunk, and must omit mmin
-and mmax if and only if there are zero non-NaN M values. The bounding box must
-be omitted entirely if and only if there are zero non-NaN X values or zero
-non-NaN Y values in the column chunk. If Z or M values are missing, the writer
-may still include a bounding box using only the available dimensions.
-
-Readers may interpret the absence of a bounding box, zmin/zmax, or mmin/mmax as
-an indication that all corresponding values are null, and may use this
-information to skip data during predicate evaluation. For example, a reader may
-skip a row group if the bounding box is absent, indicating that all X and Y
-coordinates are null.
-
 ```thrift
 struct BoundingBox {
   1: required double xmin;
