Commit e83d03a

Clarify the bounding box behavior
1 parent 9467351 commit e83d03a

1 file changed: +30 -13 lines changed

Geospatial.md

Lines changed: 30 additions & 13 deletions
@@ -94,6 +94,36 @@ Bounding box is defined as the thrift struct below in the representation of
 min/max value pair of coordinates from each axis. Note that X and Y Values are
 always present. Z and M are omitted for 2D geospatial instances.

+Writers should follow the guidelines below when calculating bounding boxes in
+the presence of invalid values. An invalid geospatial value refers to any of
+the following: `NaN`, `null`, `does not exist` (e.g., LINESTRING EMPTY), or
+`out of bounds` (e.g., `x < -180` or `x > 180` for `GEOGRAPHY` types):
+
+* X and Y: Skip any invalid X or Y value and continue processing the remaining
+  X or Y values. Do not produce a bounding box if all X or all Y values are invalid.
+
+* Z: Skip any invalid Z value and continue processing the remaining Z values.
+  Omit Z from the bounding box if all Z values are invalid.
+
+* M: Skip any invalid M value and continue processing the remaining M values.
+  Omit M from the bounding box if all M values are invalid.
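The writer guidelines above can be sketched roughly as follows. This is an illustrative sketch, not the reference implementation: the names `is_valid`, `axis_range`, and `bounding_box` are hypothetical, coordinates are assumed to arrive as `(x, y, z, m)` tuples with `None` standing in for `null` or absent values, and each axis is filtered independently. It uses plain min/max and does not attempt the antimeridian wraparound case for X.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical input shape: (x, y, z, m); None models null / "does not exist".
Coordinate = Tuple[Optional[float], Optional[float], Optional[float], Optional[float]]


def is_valid(value: Optional[float], lo: Optional[float] = None,
             hi: Optional[float] = None) -> bool:
    """Return False for None, NaN, or values outside the optional [lo, hi] range."""
    if value is None or math.isnan(value):
        return False
    if lo is not None and value < lo:
        return False
    if hi is not None and value > hi:
        return False
    return True


def axis_range(values: Iterable[Optional[float]], lo: Optional[float] = None,
               hi: Optional[float] = None) -> Optional[Tuple[float, float]]:
    """Skip invalid values; return (min, max) of the rest, or None if none remain."""
    valid = [v for v in values if is_valid(v, lo, hi)]
    return (min(valid), max(valid)) if valid else None


def bounding_box(coords: Iterable[Coordinate],
                 geography: bool = False) -> Optional[Dict[str, float]]:
    """Build a bounding box dict, or return None when X or Y has no valid value."""
    coords = list(coords)
    # Canonical GEOGRAPHY ranges; no range check is applied otherwise.
    x_lim = (-180.0, 180.0) if geography else (None, None)
    y_lim = (-90.0, 90.0) if geography else (None, None)

    x_range = axis_range((c[0] for c in coords), *x_lim)
    y_range = axis_range((c[1] for c in coords), *y_lim)
    if x_range is None or y_range is None:
        return None  # all X or all Y values are invalid: no bounding box

    bbox = {"xmin": x_range[0], "xmax": x_range[1],
            "ymin": y_range[0], "ymax": y_range[1]}
    # Z and M are omitted independently when all of their values are invalid.
    for name, index in (("z", 2), ("m", 3)):
        rng = axis_range(c[index] for c in coords)
        if rng is not None:
            bbox[name + "min"], bbox[name + "max"] = rng
    return bbox
```

For example, a column chunk whose only X value is `NaN` yields `None`, while a chunk with some valid Z values and no valid M values yields a box with `zmin`/`zmax` but no `mmin`/`mmax`.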
+
+Readers should follow the guidelines below when examining bounding boxes:
+
+* No bounding box: No assumptions can be made about the presence or absence
+  of invalid values. Readers may need to load all individual coordinate
+  values for validation.
+
+* A bounding box is present:
+    * X and Y: X and Y of the bounding box must be present. Readers should
+      proceed using these values.
+    * Z: If Z is missing from the bounding box, readers should make no
+      assumptions about invalid values and may need to load individual
+      coordinates for validation.
+    * M: If M is missing from the bounding box, readers should make no
+      assumptions about invalid values and may need to load individual
+      coordinates for validation.
+
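One way a reader might apply these guidelines during predicate evaluation is sketched below. The function `may_skip` and the dict-based bounding box are hypothetical, chosen only for illustration. The sketch skips data only when the statistics prove that no valid value can fall in the queried range, and treats a missing bounding box or a missing Z/M range as "unknown" rather than "all null". X is left out here because it can wrap around the antimeridian.

```python
from typing import Dict, Optional


def may_skip(bbox: Optional[Dict[str, float]], axis: str,
             lo: float, hi: float) -> bool:
    """Return True only if bbox proves no valid value on `axis` lies in [lo, hi].

    `axis` is one of "y", "z", or "m" (hypothetical convention for this sketch).
    """
    if axis not in ("y", "z", "m"):
        raise ValueError("this sketch handles only the y, z, and m axes")
    if bbox is None:
        # No bounding box: nothing can be assumed, so the data must be read.
        return False
    min_key, max_key = axis + "min", axis + "max"
    if min_key not in bbox or max_key not in bbox:
        # Z or M missing: again no assumption, fall back to reading the data.
        return False
    # Range present: skip only when it is disjoint from the query range.
    return bbox[max_key] < lo or bbox[min_key] > hi
```

Returning `False` in the two "unknown" cases is the conservative choice: the reader loses a pruning opportunity but never drops rows that might match.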
 For the X values only, xmin may be greater than xmax. In this case, an object
 in this bounding box may match if it contains an X such that `x >= xmin` OR
 `x <= xmax`. This wraparound occurs only when the corresponding bounding box
@@ -104,19 +134,6 @@ crosses the antimeridian line. In geographic terminology, the concepts of `xmin`
 For `GEOGRAPHY` types, X and Y values are restricted to the canonical ranges of
 [-180, 180] for X and [-90, 90] for Y.
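The X wraparound rule can be illustrated with a short sketch. The helper names `x_in_bbox` and `x_range_overlaps` are hypothetical, and the query range is assumed not to wrap (`qmin <= qmax`).

```python
def x_in_bbox(x: float, xmin: float, xmax: float) -> bool:
    """Test a single X value against a bounding box that may wrap around."""
    if xmin <= xmax:
        return xmin <= x <= xmax
    # xmin > xmax: the box crosses the antimeridian.
    return x >= xmin or x <= xmax


def x_range_overlaps(qmin: float, qmax: float, xmin: float, xmax: float) -> bool:
    """Test a non-wrapping query range [qmin, qmax] against the box's X range."""
    if xmin <= xmax:
        return qmin <= xmax and qmax >= xmin
    # Wrapped box covers [xmin, 180] and [-180, xmax].
    return qmax >= xmin or qmin <= xmax
```

With `xmin = 170` and `xmax = -170`, the values 175 and -175 both match, while 0 does not.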

-To produce `GeospatialStatistics`, writers must omit zmin and zmax if and
-only if there are zero non-NaN Z values in the column chunk, and must omit mmin
-and mmax if and only if there are zero non-NaN M values. The bounding box must
-be omitted entirely if and only if there are zero non-NaN X values or zero
-non-NaN Y values in the column chunk. If Z or M values are missing, the writer
-may still include a bounding box using only the available dimensions.
-
-Readers may interpret the absence of a bounding box, zmin/zmax, or mmin/mmax as
-an indication that all corresponding values are null, and may use this
-information to skip data during predicate evaluation. For example, a reader may
-skip a row group if the bounding box is absent, indicating that all X and Y
-coordinates are null.
-
 ```thrift
 struct BoundingBox {
 1: required double xmin;
