From 90b1ca964358a49f48f04cec24763e2a407e3b62 Mon Sep 17 00:00:00 2001 From: Jia Yu Date: Wed, 16 Apr 2025 10:38:56 -0700 Subject: [PATCH 01/14] Add more explanation --- Geospatial.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/Geospatial.md b/Geospatial.md index 4be4a38e..f0dc6fb2 100644 --- a/Geospatial.md +++ b/Geospatial.md @@ -104,6 +104,19 @@ crosses the antimeridian line. In geographic terminology, the concepts of `xmin` For `GEOGRAPHY` types, X and Y values are restricted to the canonical ranges of [-180, 180] for X and [-90, 90] for Y. +When `GeospatialStatistics` is present, writers must omit zmin and zmax if and +only if there are zero non-NaN Z values in the column chunk, and must omit mmin +and mmax if and only if there are zero non-NaN M values. The bounding box must +be omitted entirely if and only if there are zero non-NaN X values or zero +non-NaN Y values in the column chunk. If Z or M values are missing, the writer +may still include a bounding box using only the available dimensions. + +Readers may interpret the absence of a bounding box, zmin/zmax, or mmin/mmax as +an indication that all corresponding values are null, and may use this +information to skip data during predicate evaluation. For example, a reader may +skip a row group if the bounding box is absent, indicating that all X and Y +coordinates are null. + ```thrift struct BoundingBox { 1: required double xmin; From 9467351e12932007f447f6c4eb196a2892569b72 Mon Sep 17 00:00:00 2001 From: Jia Yu Date: Thu, 17 Apr 2025 15:54:14 -0700 Subject: [PATCH 02/14] Update Geospatial.md Co-authored-by: Gang Wu --- Geospatial.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Geospatial.md b/Geospatial.md index f0dc6fb2..3eebeff8 100644 --- a/Geospatial.md +++ b/Geospatial.md @@ -104,7 +104,7 @@ crosses the antimeridian line. 
In geographic terminology, the concepts of `xmin` For `GEOGRAPHY` types, X and Y values are restricted to the canonical ranges of [-180, 180] for X and [-90, 90] for Y. -When `GeospatialStatistics` is present, writers must omit zmin and zmax if and +To produce `GeospatialStatistics`, writers must omit zmin and zmax if and only if there are zero non-NaN Z values in the column chunk, and must omit mmin and mmax if and only if there are zero non-NaN M values. The bounding box must be omitted entirely if and only if there are zero non-NaN X values or zero From e83d03abb5865c51e35b5b3c31adfaa7ca7a08f2 Mon Sep 17 00:00:00 2001 From: Jia Yu Date: Sun, 20 Apr 2025 23:53:00 -0700 Subject: [PATCH 03/14] Clarify the bounding box behavior --- Geospatial.md | 43 ++++++++++++++++++++++++++++++------------- 1 file changed, 30 insertions(+), 13 deletions(-) diff --git a/Geospatial.md b/Geospatial.md index 3eebeff8..8b8e0c2b 100644 --- a/Geospatial.md +++ b/Geospatial.md @@ -94,6 +94,36 @@ Bounding box is defined as the thrift struct below in the representation of min/max value pair of coordinates from each axis. Note that X and Y Values are always present. Z and M are omitted for 2D geospatial instances. +Writers should follow the guidelines below when calculating bounding boxes in +the presence of invalid values. An invalid geospatial value refers to any of +the following: `NaN`, `null`, `does not exist` (e.g., LINESTRING EMPTY), or +`out of bounds` (e.g., `x < -180` or `x > 180` for `GEOGRAPHY` types): + +* X and Y: Skip any invalid X or Y value and processing the remaining X or Y + values. Do not produce a bounding box if all X or all Y values are invalid. + +* Z: Skip any invalid Z value and continue processing the remaining Z values. + Omit Z from the bounding box if all Z values are invalid. + +* M: Skip any invalid M value and continue processing the remaining M values. + Omit M from the bounding box if all M values are invalid. 
+ +Readers should follow the guidelines below when examining bounding boxes: + +* No bounding box: No assumptions can be made about the presence or absence + of invalid values. Readers may need to load all individual coordinate + values for validation. + +* A bounding box is present: + * X and Y: X and Y of the bounding box must be present. Readers should + proceed using these values. + * Z: If Z of the bounding box are missing, readers should make no + assumptions about invalid values and may need to load individual + coordinates for validation. + * M: If M of the bounding box are missing, readers should make no + assumptions about invalid values and may need to load individual + coordinates for validation. + For the X values only, xmin may be greater than xmax. In this case, an object in this bounding box may match if it contains an X such that `x >= xmin` OR `x <= xmax`. This wraparound occurs only when the corresponding bounding box @@ -104,19 +134,6 @@ crosses the antimeridian line. In geographic terminology, the concepts of `xmin` For `GEOGRAPHY` types, X and Y values are restricted to the canonical ranges of [-180, 180] for X and [-90, 90] for Y. -To produce `GeospatialStatistics`, writers must omit zmin and zmax if and -only if there are zero non-NaN Z values in the column chunk, and must omit mmin -and mmax if and only if there are zero non-NaN M values. The bounding box must -be omitted entirely if and only if there are zero non-NaN X values or zero -non-NaN Y values in the column chunk. If Z or M values are missing, the writer -may still include a bounding box using only the available dimensions. - -Readers may interpret the absence of a bounding box, zmin/zmax, or mmin/mmax as -an indication that all corresponding values are null, and may use this -information to skip data during predicate evaluation. For example, a reader may -skip a row group if the bounding box is absent, indicating that all X and Y -coordinates are null. 
- ```thrift struct BoundingBox { 1: required double xmin; From 87ee1a6094b87eac25783e6b51fce4e2639e05e9 Mon Sep 17 00:00:00 2001 From: Jia Yu Date: Wed, 23 Apr 2025 11:21:12 -0700 Subject: [PATCH 04/14] Update Geospatial.md Co-authored-by: Gang Wu --- Geospatial.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Geospatial.md b/Geospatial.md index 8b8e0c2b..b5026478 100644 --- a/Geospatial.md +++ b/Geospatial.md @@ -99,7 +99,7 @@ the presence of invalid values. An invalid geospatial value refers to any of the following: `NaN`, `null`, `does not exist` (e.g., LINESTRING EMPTY), or `out of bounds` (e.g., `x < -180` or `x > 180` for `GEOGRAPHY` types): -* X and Y: Skip any invalid X or Y value and processing the remaining X or Y +* X and Y: Skip any invalid X or Y value and continue processing the remaining X or Y values. Do not produce a bounding box if all X or all Y values are invalid. * Z: Skip any invalid Z value and continue processing the remaining Z values. From 47725e2300392ad0df55e8eb756afcb4d883d943 Mon Sep 17 00:00:00 2001 From: Jia Yu Date: Wed, 23 Apr 2025 16:20:05 -0700 Subject: [PATCH 05/14] Clarify the definition of invalid values --- Geospatial.md | 43 +++++++++++++++++++++++++++++-------------- 1 file changed, 29 insertions(+), 14 deletions(-) diff --git a/Geospatial.md b/Geospatial.md index b5026478..00ea3afe 100644 --- a/Geospatial.md +++ b/Geospatial.md @@ -95,9 +95,7 @@ min/max value pair of coordinates from each axis. Note that X and Y Values are always present. Z and M are omitted for 2D geospatial instances. Writers should follow the guidelines below when calculating bounding boxes in -the presence of invalid values. An invalid geospatial value refers to any of -the following: `NaN`, `null`, `does not exist` (e.g., LINESTRING EMPTY), or -`out of bounds` (e.g., `x < -180` or `x > 180` for `GEOGRAPHY` types): +the presence of [invalid geospatial values](#invalid-geospatial-values). 
* X and Y: Skip any invalid X or Y value and continue processing the remaining X or Y values. Do not produce a bounding box if all X or all Y values are invalid. @@ -108,21 +106,23 @@ the following: `NaN`, `null`, `does not exist` (e.g., LINESTRING EMPTY), or * M: Skip any invalid M value and continue processing the remaining M values. Omit M from the bounding box if all M values are invalid. -Readers should follow the guidelines below when examining bounding boxes: +Readers should follow the guidelines below when examining bounding boxes. If +a bounding box is [invalid](#invalid-geospatial-values), it is treated as a `no +bounding box` case. -* No bounding box: No assumptions can be made about the presence or absence - of invalid values. Readers may need to load all individual coordinate +* No bounding box: No assumptions can be made about the presence or validity + of coordinate values. Readers may need to load all individual coordinate values for validation. * A bounding box is present: - * X and Y: X and Y of the bounding box must be present. Readers should - proceed using these values. - * Z: If Z of the bounding box are missing, readers should make no - assumptions about invalid values and may need to load individual - coordinates for validation. - * M: If M of the bounding box are missing, readers should make no - assumptions about invalid values and may need to load individual - coordinates for validation. + * X and Y: Both X and Y of the bounding box must be present. If either + is missing, the bounding box is invalid. + * Z: If Z of the bounding box is missing, readers should not assume + anything about the presence or validity of Z values and may need to + load individual coordinates for validation. + * M: If M of the bounding box is missing, readers should not assume + anything about the presence or validity of M values and may need to + load individual coordinates for validation. For the X values only, xmin may be greater than xmax. 
In this case, an object in this bounding box may match if it contains an X such that `x >= xmin` OR @@ -192,3 +192,18 @@ The axis order of the coordinates in WKB and bounding box stored in Parquet follows the de facto standard for axis order in WKB and is therefore always (x, y) where x is easting or longitude and y is northing or latitude. This ordering explicitly overrides the axis order as specified in the CRS. + +# Invalid geospatial values + +An invalid geospatial value refers to any of the following cases: + +* `null`: A null value in Parquet. +* A non-`null` value that are encoded in a valid WKB or bounding box format + but are not considered valid under this specification, including: + * `NaN`: Not a Number. For example, `POINT EMPTY` in WKB is represented by a + `Point` with each ordinate value set to an IEEE-754 quiet NaN value. + * `Empty geometries`: Geometries explicitly marked as empty in WKB using + indicators such as `numPoints`, `numRings`, or `numGeometries`. Examples + include `LINESTRING EMPTY` or `POLYGON EMPTY`. + * `Out-of-bounds coordinates`: Values that fall outside the valid range + for `GEOGRAPHY` types. For example, `x < -180` or `x > 180`. \ No newline at end of file From 2558f3136f6957e667c01a8457272a71bc3de035 Mon Sep 17 00:00:00 2001 From: Jia Yu Date: Fri, 25 Apr 2025 00:36:04 -0700 Subject: [PATCH 06/14] Simplify the definition of invalid values --- Geospatial.md | 52 +++++++++++++++++++++++++++------------------------ 1 file changed, 28 insertions(+), 24 deletions(-) diff --git a/Geospatial.md b/Geospatial.md index 00ea3afe..9a9d6005 100644 --- a/Geospatial.md +++ b/Geospatial.md @@ -95,28 +95,31 @@ min/max value pair of coordinates from each axis. Note that X and Y Values are always present. Z and M are omitted for 2D geospatial instances. Writers should follow the guidelines below when calculating bounding boxes in -the presence of [invalid geospatial values](#invalid-geospatial-values). +the presence of edge cases. 
-* X and Y: Skip any invalid X or Y value and continue processing the remaining X or Y - values. Do not produce a bounding box if all X or all Y values are invalid. +* `null` instance: Skip it and continue processing the remaining + geospatial instances. Do not produce a bounding box if all instances are null. +* Non-`null` instance with [invalid geospatial values](#invalid-geospatial-values): + * X and Y: Skip any invalid X or Y value and continue processing the + remaining X or Y values. Do not produce a bounding box if all X or all Y + values are invalid. -* Z: Skip any invalid Z value and continue processing the remaining Z values. - Omit Z from the bounding box if all Z values are invalid. + * Z: Skip any invalid Z value and continue processing the remaining Z values. + Omit Z from the bounding box if all Z values are invalid. -* M: Skip any invalid M value and continue processing the remaining M values. - Omit M from the bounding box if all M values are invalid. + * M: Skip any invalid M value and continue processing the remaining M values. + Omit M from the bounding box if all M values are invalid. -Readers should follow the guidelines below when examining bounding boxes. If -a bounding box is [invalid](#invalid-geospatial-values), it is treated as a `no -bounding box` case. +Readers should follow the guidelines below when examining bounding boxes. +Parquet does not permit `null` or `NaN` values in bounding boxes, whether at +the overall bounding box level or within individual coordinate fields. * No bounding box: No assumptions can be made about the presence or validity of coordinate values. Readers may need to load all individual coordinate values for validation. * A bounding box is present: - * X and Y: Both X and Y of the bounding box must be present. If either - is missing, the bounding box is invalid. + * X and Y: Both X and Y of the bounding box must be present. 
* Z: If Z of the bounding box is missing, readers should not assume anything about the presence or validity of Z values and may need to load individual coordinates for validation. @@ -195,15 +198,16 @@ ordering explicitly overrides the axis order as specified in the CRS. # Invalid geospatial values -An invalid geospatial value refers to any of the following cases: - -* `null`: A null value in Parquet. -* A non-`null` value that are encoded in a valid WKB or bounding box format - but are not considered valid under this specification, including: - * `NaN`: Not a Number. For example, `POINT EMPTY` in WKB is represented by a - `Point` with each ordinate value set to an IEEE-754 quiet NaN value. - * `Empty geometries`: Geometries explicitly marked as empty in WKB using - indicators such as `numPoints`, `numRings`, or `numGeometries`. Examples - include `LINESTRING EMPTY` or `POLYGON EMPTY`. - * `Out-of-bounds coordinates`: Values that fall outside the valid range - for `GEOGRAPHY` types. For example, `x < -180` or `x > 180`. \ No newline at end of file +An invalid geospatial value refers to the coordinate values of a non-`null` +geospatial instance that are encoded in a valid WKB format, but are not +considered valid values under this specification. While different WKB +readers may interpret such values differently, the resulting output should +be treated as invalid. + +* `NaN`: Not a Number. For example, `POINT EMPTY` in WKB is represented by a + `Point` with each ordinate value set to an IEEE-754 quiet NaN value. +* `Empty geometries`: Geometries explicitly marked as empty in WKB using + indicators such as `numPoints`, `numRings`, or `numGeometries`. Examples + include `LINESTRING EMPTY` or `POLYGON EMPTY`. +* `Out-of-bounds coordinates`: Values that fall outside the valid range + for `GEOGRAPHY` types. For example, `x < -180` or `x > 180`. 
\ No newline at end of file From 96962691b48b838cce318940bf8192ae906efcd2 Mon Sep 17 00:00:00 2001 From: Jia Yu Date: Sun, 27 Apr 2025 23:08:58 -0700 Subject: [PATCH 07/14] Add canonical values for invalid examples --- Geospatial.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/Geospatial.md b/Geospatial.md index 9a9d6005..3aa956c4 100644 --- a/Geospatial.md +++ b/Geospatial.md @@ -204,10 +204,12 @@ considered valid values under this specification. While different WKB readers may interpret such values differently, the resulting output should be treated as invalid. -* `NaN`: Not a Number. For example, `POINT EMPTY` in WKB is represented by a - `Point` with each ordinate value set to an IEEE-754 quiet NaN value. +* `NaN`: Not a Number. For example, a `Point` with no X and Y values in WKB is + represented by a `Point` with each ordinate value set to an IEEE-754 quiet + NaN value (hex: `01 01 00 00 00 00 00 00 00 00 00 00 f8 7f 00 00 00 00 00 00 f8 7f`). * `Empty geometries`: Geometries explicitly marked as empty in WKB using indicators such as `numPoints`, `numRings`, or `numGeometries`. Examples - include `LINESTRING EMPTY` or `POLYGON EMPTY`. + include `LineString` with no coordinates (hex: `01 02 00 00 00 00 00 00 + 00`) or `Polygon` with no coordinates (hex: `01 03 00 00 00 00 00 00 00`). * `Out-of-bounds coordinates`: Values that fall outside the valid range for `GEOGRAPHY` types. For example, `x < -180` or `x > 180`. 
\ No newline at end of file From 21d3d49c802b35653f05d18c0478bc2c397dcda9 Mon Sep 17 00:00:00 2001 From: Jia Yu Date: Tue, 29 Apr 2025 23:53:49 -0700 Subject: [PATCH 08/14] Update Geospatial.md Co-authored-by: Dewey Dunnington --- Geospatial.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/Geospatial.md b/Geospatial.md index 3aa956c4..4ef34f3b 100644 --- a/Geospatial.md +++ b/Geospatial.md @@ -205,8 +205,10 @@ readers may interpret such values differently, the resulting output should be treated as invalid. * `NaN`: Not a Number. For example, a `Point` with no X and Y values in WKB is - represented by a `Point` with each ordinate value set to an IEEE-754 quiet - NaN value (hex: `01 01 00 00 00 00 00 00 00 00 00 00 f8 7f 00 00 00 00 00 00 f8 7f`). + represented by a `Point` with each ordinate value set to an IEEE-754 + NaN value (e.g., hex: `01 01 00 00 00 00 00 00 00 00 00 00 f8 7f 00 00 00 00 00 00 f8 7f`). + NaN values in other geometry types are typically considered invalid geometries by other + libraries. * `Empty geometries`: Geometries explicitly marked as empty in WKB using indicators such as `numPoints`, `numRings`, or `numGeometries`. Examples include `LineString` with no coordinates (hex: `01 02 00 00 00 00 00 00 From 31f29a3467cc9b946db2ce7cf9cfc9814a4e158d Mon Sep 17 00:00:00 2001 From: Jia Yu Date: Wed, 30 Apr 2025 00:16:48 -0700 Subject: [PATCH 09/14] Clarify the special values --- Geospatial.md | 42 +++++++++++++++++++++--------------------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/Geospatial.md b/Geospatial.md index 4ef34f3b..9c6d8937 100644 --- a/Geospatial.md +++ b/Geospatial.md @@ -99,27 +99,27 @@ the presence of edge cases. * `null` instance: Skip it and continue processing the remaining geospatial instances. Do not produce a bounding box if all instances are null. 
-* Non-`null` instance with [invalid geospatial values](#invalid-geospatial-values): - * X and Y: Skip any invalid X or Y value and continue processing the +* Non-`null` instance with [special geospatial values](#special-geospatial-values): + * X and Y: Skip any special X or Y value and continue processing the remaining X or Y values. Do not produce a bounding box if all X or all Y - values are invalid. + values are special values. - * Z: Skip any invalid Z value and continue processing the remaining Z values. - Omit Z from the bounding box if all Z values are invalid. + * Z: Skip any special Z value and continue processing the remaining Z values. + Omit Z from the bounding box if all Z values are special values. - * M: Skip any invalid M value and continue processing the remaining M values. - Omit M from the bounding box if all M values are invalid. + * M: Skip any special M value and continue processing the remaining M values. + Omit M from the bounding box if all M values are special values. -Readers should follow the guidelines below when examining bounding boxes. -Parquet does not permit `null` or `NaN` values in bounding boxes, whether at -the overall bounding box level or within individual coordinate fields. +Readers should follow the guidelines below when examining bounding boxes. * No bounding box: No assumptions can be made about the presence or validity of coordinate values. Readers may need to load all individual coordinate values for validation. * A bounding box is present: - * X and Y: Both X and Y of the bounding box must be present. + * X and Y: Both X and Y of the bounding box must be present. If any of + `xmin`, `ymin`, `xmax`, or `ymax` is `NaN`, the bounding box is not + reliable and should not be used. * Z: If Z of the bounding box is missing, readers should not assume anything about the presence or validity of Z values and may need to load individual coordinates for validation. 
@@ -196,22 +196,22 @@ follows the de facto standard for axis order in WKB and is therefore always (x, y) where x is easting or longitude and y is northing or latitude. This ordering explicitly overrides the axis order as specified in the CRS. -# Invalid geospatial values +# Special geospatial values -An invalid geospatial value refers to the coordinate values of a non-`null` -geospatial instance that are encoded in a valid WKB format, but are not -considered valid values under this specification. While different WKB -readers may interpret such values differently, the resulting output should -be treated as invalid. +A special geospatial value refers to the coordinate values of a +non-`null` geospatial instance that should be excluded from bounding box +calculations. -* `NaN`: Not a Number. For example, a `Point` with no X and Y values in WKB is +* `NaN`: Not a Number. A `Point` with no X and Y values in WKB is represented by a `Point` with each ordinate value set to an IEEE-754 NaN value (e.g., hex: `01 01 00 00 00 00 00 00 00 00 00 00 f8 7f 00 00 00 00 00 00 f8 7f`). - NaN values in other geometry types are typically considered invalid geometries by other - libraries. + NaN values in other geometry types are typically considered invalid + geometries by other libraries. * `Empty geometries`: Geometries explicitly marked as empty in WKB using indicators such as `numPoints`, `numRings`, or `numGeometries`. Examples include `LineString` with no coordinates (hex: `01 02 00 00 00 00 00 00 00`) or `Polygon` with no coordinates (hex: `01 03 00 00 00 00 00 00 00`). * `Out-of-bounds coordinates`: Values that fall outside the valid range - for `GEOGRAPHY` types. For example, `x < -180` or `x > 180`. \ No newline at end of file + for `GEOGRAPHY` types. For example, `x < -180` or `x > 180`. +* Any invalid WKB representation of a geospatial instance, such as an empty + string. 
\ No newline at end of file From 7b24ac9d92ac545d4b3b5fc93c12fc3c99405760 Mon Sep 17 00:00:00 2001 From: Jia Yu Date: Wed, 30 Apr 2025 08:41:45 -0700 Subject: [PATCH 10/14] Update Geospatial.md Co-authored-by: Gang Wu --- Geospatial.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Geospatial.md b/Geospatial.md index 9c6d8937..f147569a 100644 --- a/Geospatial.md +++ b/Geospatial.md @@ -203,7 +203,7 @@ non-`null` geospatial instance that should be excluded from bounding box calculations. * `NaN`: Not a Number. A `Point` with no X and Y values in WKB is - represented by a `Point` with each ordinate value set to an IEEE-754 + represented by a `Point` with each coordinate value set to an IEEE-754 NaN value (e.g., hex: `01 01 00 00 00 00 00 00 00 00 00 00 f8 7f 00 00 00 00 00 00 f8 7f`). NaN values in other geometry types are typically considered invalid geometries by other libraries. From 0764b0cc245e31b6935f3a47a537609128d7f19d Mon Sep 17 00:00:00 2001 From: Jia Yu Date: Wed, 30 Apr 2025 08:50:57 -0700 Subject: [PATCH 11/14] Clarify the invalid values in Z and M of a bbox --- Geospatial.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/Geospatial.md b/Geospatial.md index f147569a..bd8f66be 100644 --- a/Geospatial.md +++ b/Geospatial.md @@ -120,12 +120,14 @@ Readers should follow the guidelines below when examining bounding boxes. * X and Y: Both X and Y of the bounding box must be present. If any of `xmin`, `ymin`, `xmax`, or `ymax` is `NaN`, the bounding box is not reliable and should not be used. - * Z: If Z of the bounding box is missing, readers should not assume - anything about the presence or validity of Z values and may need to - load individual coordinates for validation. - * M: If M of the bounding box is missing, readers should not assume - anything about the presence or validity of M values and may need to - load individual coordinates for validation. 
+ * Z: If Z of the bounding box is missing or either `zmin` or `zmax` is + `NaN`, readers should not assume anything about the presence or + validity of Z values and may need to load individual coordinates for + validation. + * M: If M of the bounding box is missing or either `mmin` or `mmax` is + `NaN`, readers should not assume anything about the presence or + validity of M values and may need to load individual coordinates for + validation. For the X values only, xmin may be greater than xmax. In this case, an object in this bounding box may match if it contains an X such that `x >= xmin` OR From 7b1899953cc2fc058a1c917b94c9819116753154 Mon Sep 17 00:00:00 2001 From: Jia Yu Date: Thu, 1 May 2025 07:18:03 -0700 Subject: [PATCH 12/14] Differentiate coordinates and axis values --- Geospatial.md | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/Geospatial.md b/Geospatial.md index bd8f66be..0e6ab125 100644 --- a/Geospatial.md +++ b/Geospatial.md @@ -86,7 +86,7 @@ The Z values introduce the third dimension coordinate. Usually they are used to indicate the height, or elevation. M values are an opportunity for a geospatial instance to express a fourth -dimension as a coordinate value. These values can be used as a linear reference +dimension as an axis value. These values can be used as a linear reference value (e.g., highway milepost value), a timestamp, or some other value as defined by the CRS. @@ -113,7 +113,7 @@ the presence of edge cases. Readers should follow the guidelines below when examining bounding boxes. * No bounding box: No assumptions can be made about the presence or validity - of coordinate values. Readers may need to load all individual coordinate + of geospatial values. Readers may need to load all individual coordinate values for validation. * A bounding box is present: @@ -200,12 +200,15 @@ ordering explicitly overrides the axis order as specified in the CRS. 
# Special geospatial values -A special geospatial value refers to the coordinate values of a -non-`null` geospatial instance that should be excluded from bounding box -calculations. +A special geospatial value refers to an individual axis value (e.g., X, Y, Z, +or M) within a coordinate of a non-`null` geospatial instance. These special +values are excluded from bounding box calculations. For example, in a +`LineString` instance with XY coordinates `[(1, 2), (NaN, 3), (4, 5)]`, the +`NaN` value on the X axis will be excluded from the bounding box calculation, +while all other axis values will be included. * `NaN`: Not a Number. A `Point` with no X and Y values in WKB is - represented by a `Point` with each coordinate value set to an IEEE-754 + represented by a `Point` with each axis value set to an IEEE-754 NaN value (e.g., hex: `01 01 00 00 00 00 00 00 00 00 00 00 f8 7f 00 00 00 00 00 00 f8 7f`). NaN values in other geometry types are typically considered invalid geometries by other libraries. From 3b9a069e0834886f4ce8a2b6399164f49fc7e3e6 Mon Sep 17 00:00:00 2001 From: Jia Yu Date: Sun, 4 May 2025 22:56:38 -0700 Subject: [PATCH 13/14] Reword axis values --- Geospatial.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/Geospatial.md b/Geospatial.md index 0e6ab125..e03dd7a2 100644 --- a/Geospatial.md +++ b/Geospatial.md @@ -85,10 +85,9 @@ associated with each point. The Z values introduce the third dimension coordinate. Usually they are used to indicate the height, or elevation. -M values are an opportunity for a geospatial instance to express a fourth -dimension as an axis value. These values can be used as a linear reference -value (e.g., highway milepost value), a timestamp, or some other value as defined -by the CRS. +M values are an opportunity for a geospatial instance to track a value in a +fourth dimension. 
These values can be used as a linear reference value (e.g., +highway milepost value), a timestamp, or some other value as defined by the CRS. Bounding box is defined as the thrift struct below in the representation of min/max value pair of coordinates from each axis. Note that X and Y Values are @@ -200,15 +199,15 @@ ordering explicitly overrides the axis order as specified in the CRS. # Special geospatial values -A special geospatial value refers to an individual axis value (e.g., X, Y, Z, +A special geospatial value refers to an individual scalar value (e.g., X, Y, Z, or M) within a coordinate of a non-`null` geospatial instance. These special values are excluded from bounding box calculations. For example, in a `LineString` instance with XY coordinates `[(1, 2), (NaN, 3), (4, 5)]`, the `NaN` value on the X axis will be excluded from the bounding box calculation, -while all other axis values will be included. +while all other scalar values will be included. * `NaN`: Not a Number. A `Point` with no X and Y values in WKB is - represented by a `Point` with each axis value set to an IEEE-754 + represented by a `Point` with each scalar value set to an IEEE-754 NaN value (e.g., hex: `01 01 00 00 00 00 00 00 00 00 00 00 f8 7f 00 00 00 00 00 00 f8 7f`). NaN values in other geometry types are typically considered invalid geometries by other libraries. From 2228707baf65c90ded98220ba22175614ee71938 Mon Sep 17 00:00:00 2001 From: Jia Yu Date: Mon, 12 May 2025 13:43:52 -0700 Subject: [PATCH 14/14] Shorten the description according to the Iceberg PR --- Geospatial.md | 63 +++++---------------------------------------------- 1 file changed, 6 insertions(+), 57 deletions(-) diff --git a/Geospatial.md b/Geospatial.md index e03dd7a2..597ba2b5 100644 --- a/Geospatial.md +++ b/Geospatial.md @@ -93,40 +93,12 @@ Bounding box is defined as the thrift struct below in the representation of min/max value pair of coordinates from each axis. Note that X and Y Values are always present. 
Z and M are omitted for 2D geospatial instances. -Writers should follow the guidelines below when calculating bounding boxes in -the presence of edge cases. - -* `null` instance: Skip it and continue processing the remaining - geospatial instances. Do not produce a bounding box if all instances are null. -* Non-`null` instance with [special geospatial values](#special-geospatial-values): - * X and Y: Skip any special X or Y value and continue processing the - remaining X or Y values. Do not produce a bounding box if all X or all Y - values are special values. - - * Z: Skip any special Z value and continue processing the remaining Z values. - Omit Z from the bounding box if all Z values are special values. - - * M: Skip any special M value and continue processing the remaining M values. - Omit M from the bounding box if all M values are special values. - -Readers should follow the guidelines below when examining bounding boxes. - -* No bounding box: No assumptions can be made about the presence or validity - of geospatial values. Readers may need to load all individual coordinate - values for validation. - -* A bounding box is present: - * X and Y: Both X and Y of the bounding box must be present. If any of - `xmin`, `ymin`, `xmax`, or `ymax` is `NaN`, the bounding box is not - reliable and should not be used. - * Z: If Z of the bounding box is missing or either `zmin` or `zmax` is - `NaN`, readers should not assume anything about the presence or - validity of Z values and may need to load individual coordinates for - validation. - * M: If M of the bounding box is missing or either `mmin` or `mmax` is - `NaN`, readers should not assume anything about the presence or - validity of M values and may need to load individual coordinates for - validation. +When calculating a bounding box, null or NaN values in a coordinate +dimension are skipped. For example, `POINT (1 NaN)` contributes a value to X +but no values to Y, Z, or M dimension of the bounding box. 
If a dimension has +only null or NaN values, that dimension is omitted from the bounding box. If +either the X or Y dimension is missing, then the bounding box itself is not +produced. For the X values only, xmin may be greater than xmax. In this case, an object in this bounding box may match if it contains an X such that `x >= xmin` OR @@ -196,26 +168,3 @@ The axis order of the coordinates in WKB and bounding box stored in Parquet follows the de facto standard for axis order in WKB and is therefore always (x, y) where x is easting or longitude and y is northing or latitude. This ordering explicitly overrides the axis order as specified in the CRS. - -# Special geospatial values - -A special geospatial value refers to an individual scalar value (e.g., X, Y, Z, -or M) within a coordinate of a non-`null` geospatial instance. These special -values are excluded from bounding box calculations. For example, in a -`LineString` instance with XY coordinates `[(1, 2), (NaN, 3), (4, 5)]`, the -`NaN` value on the X axis will be excluded from the bounding box calculation, -while all other scalar values will be included. - -* `NaN`: Not a Number. A `Point` with no X and Y values in WKB is - represented by a `Point` with each scalar value set to an IEEE-754 - NaN value (e.g., hex: `01 01 00 00 00 00 00 00 00 00 00 00 f8 7f 00 00 00 00 00 00 f8 7f`). - NaN values in other geometry types are typically considered invalid - geometries by other libraries. -* `Empty geometries`: Geometries explicitly marked as empty in WKB using - indicators such as `numPoints`, `numRings`, or `numGeometries`. Examples - include `LineString` with no coordinates (hex: `01 02 00 00 00 00 00 00 - 00`) or `Polygon` with no coordinates (hex: `01 03 00 00 00 00 00 00 00`). -* `Out-of-bounds coordinates`: Values that fall outside the valid range - for `GEOGRAPHY` types. For example, `x < -180` or `x > 180`. -* Any invalid WKB representation of a geospatial instance, such as an empty - string. 
\ No newline at end of file
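
Note on the final wording (PATCH 14/14): the rule that null or NaN values are skipped per dimension, that a dimension with no usable values is omitted, and that no bounding box is produced when X or Y is missing can be sketched as below. This is only an illustration under stated assumptions, not the reference writer: the function name `bounding_box`, the tuple-based input, and the dictionary output are hypothetical and are not part of the Parquet Thrift definitions. The antimeridian wraparound case (`xmin > xmax`) is deliberately not handled.

```python
import math

# Dimension order assumed for this sketch: each coordinate is a tuple
# (x, y), (x, y, z) or (x, y, z, m). Entries may be None or NaN.
DIMS = ("x", "y", "z", "m")


def bounding_box(coords):
    """Return a dict such as {"xmin": ..., "xmax": ..., ...}, or None.

    Hypothetical helper: skips None/NaN values per dimension, omits a
    dimension that has no usable values, and returns None when either
    X or Y has no usable values.
    """
    mins = {}
    maxs = {}
    for coord in coords:
        for dim, value in zip(DIMS, coord):
            # Skip null or NaN values; they contribute nothing.
            if value is None or math.isnan(value):
                continue
            mins[dim] = min(mins.get(dim, value), value)
            maxs[dim] = max(maxs.get(dim, value), value)

    # Without both X and Y, the bounding box itself is not produced.
    if "x" not in mins or "y" not in mins:
        return None

    bbox = {}
    for dim in DIMS:
        if dim in mins:  # dimensions with no usable values are omitted
            bbox[dim + "min"] = mins[dim]
            bbox[dim + "max"] = maxs[dim]
    return bbox
```

Under these assumptions, a column chunk containing only `POINT (1 NaN)` yields no bounding box, because the point contributes a value to X but none to Y.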