Skip to content

171: perf: Optimize st_has(z/m) using WKBBytesExecutor + Implement new WKBHeader#4

Closed
martin-augment wants to merge 37 commits intomainfrom
pr-171-2025-10-27-09-17-59
Closed

171: perf: Optimize st_has(z/m) using WKBBytesExecutor + Implement new WKBHeader#4
martin-augment wants to merge 37 commits intomainfrom
pr-171-2025-10-27-09-17-59

Conversation

@martin-augment
Copy link
Copy Markdown
Owner

171: To review by AI

petern48 and others added 30 commits September 30, 2025 18:33
Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 27, 2025

Walkthrough

The PR adds comprehensive support for WKB dimension inference by introducing a new WkbHeader parser in Rust geometry library, refactoring the STHasZm function to leverage fast-path byte parsing, and extending test coverage for Z and M dimensions across benchmarks and unit tests.

Changes

Cohort / File(s) Summary
Benchmark Tests
benchmarks/test_functions.py
Added two new parameterized benchmark test methods: test_st_hasm() and test_st_hasz() that execute spatial function queries across multiple engines and table collections.
Unit Tests
python/sedonadb/tests/functions/test_functions.py
Extended test coverage for test_st_geomfromwkb() and test_st_hasz() with cases for MULTIPOINT Z, MULTIPOINT ZM, and GEOMETRYCOLLECTION with nested Z dimensions.
Rust WKB Header Parser
rust/sedona-geometry/src/wkb_header.rs
Introduced new module with public WkbHeader struct implementing fast-path WKB header parsing, supporting geometry type identification, coordinate extraction, dimension inference, and EWKB SRID handling.
Rust Module Exports
rust/sedona-geometry/src/lib.rs
Exposed new public module wkb_header in crate root.
Rust Function Refactoring
rust/sedona-functions/src/st_haszm.rs
Replaced WkbExecutor with WkbBytesExecutor; reworked scalar invocation to infer geometry dimensions directly from raw WKB bytes using WkbHeader fast-path; extended tests for geometry collections and multipoint scenarios with dimension inference.
Test Fixtures
rust/sedona-testing/src/fixtures.rs
Added new public constant MULTIPOINT_WITH_INFERRED_Z_DIMENSION_WKB for testing dimension inference in multipoint geometries.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pr-171-2025-10-27-09-17-59

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude bot commented Oct 27, 2025

Pull Request Review: PR #4 - Optimize ST_Has(Z/M) using WKBBytesExecutor + Implement new WKBHeader

Overview

This PR optimizes the ST_HasZ and ST_HasM functions by implementing a fast-path WKB header parser that avoids full geometry parsing. The optimization switches from WkbExecutor to WkbBytesExecutor and introduces a new WkbHeader struct for efficient header-only parsing.

✅ Strengths

Performance Optimization

  • Excellent optimization approach: The new WkbHeader parser only reads the WKB header instead of parsing entire geometries, which should significantly improve performance for these dimension-checking functions.
  • Smart design: The lazy parsing approach with caching is well-designed for potential future use cases.
  • Proper use of WkbBytesExecutor: Switching to WkbBytesExecutor allows operating directly on raw WKB bytes without full deserialization.

Test Coverage

  • Comprehensive test suite: The wkb_header.rs file has excellent test coverage (lines 507-885) covering:
    • All geometry types (Point, LineString, Polygon, Multi*, GeometryCollection)
    • Empty geometries
    • Different dimensions (XY, XYZ, XYM, XYZM)
    • Edge cases like empty collections and nested dimensions
  • Benchmarks added: New benchmarks in test_functions.py for performance tracking.
  • Edge case testing: Good coverage of the MULTIPOINT with inferred Z dimension case.

Code Structure

  • Good separation of concerns: The WkbBuffer helper struct properly encapsulates buffer reading logic.
  • Proper error handling: Uses Result types and provides descriptive error messages.

🔴 Critical Issues

1. Debug Print Statements Left in Production Code

Location: rust/sedona-geometry/src/wkb_header.rs

Lines 68, 70, 83, 89, 90, 91, 243, 247 contain println! debug statements that should be removed before merging:

// Line 68-70
println!("geometry_type_id: {geometry_type_id:?}");
println!("size: {size}");
println!("buf: {:?}", &wkb_buffer.buf[wkb_buffer.offset..]);

// Line 83
println!("starting parse_dimensions buf: {:?}", &buf[i..]);

// Line 89-91
println!("first_xy_coord: buf: {:?}", &wkb_buffer.buf[wkb_buffer.offset..]);

// Line 243, 247
println!("first_geom_idx: buf: {:?}", &self.buf[self.offset..]);
println!("geometry_type_id: {geometry_type_id:?}");

Action Required: Remove all println! statements before merging.


2. Extensive Commented-Out Code

Location: rust/sedona-geometry/src/wkb_header.rs

Lines 54-141, 235-242, 300-373, 399-402, 432-436, 481-485 contain large blocks of commented-out code. This significantly reduces code readability and maintainability.

Action Required: Remove all commented-out code. If you need to reference the old implementation, it's available in git history.


3. TODO Comments in Production Code

Location: rust/sedona-geometry/src/wkb_header.rs

Lines 77, 79, 84, 88 contain "TODO: clean up later" comments:

let mut wkb_buffer = WkbBuffer::new(buf); // Reset: TODO: clean up later

Issue: These TODOs suggest incomplete refactoring. The current approach creates multiple WkbBuffer instances and repeatedly parses the same data.

Recommendation: Either:

  1. Complete the refactoring to use a single WkbBuffer with proper state management
  2. Remove the TODOs if the current approach is intentional (though it's inefficient)

⚠️ Bugs and Potential Issues

4. Potential Performance Issue: Redundant Buffer Parsing

Location: rust/sedona-geometry/src/wkb_header.rs:45-98

The try_new method creates 4 separate WkbBuffer instances and parses the same buffer multiple times:

let mut wkb_buffer = WkbBuffer::new(buf);  // Instance 1: parse type, srid, size
// ...
let mut wkb_buffer = WkbBuffer::new(buf);  // Instance 2: find first geom
// ...
let mut wkb_buffer = WkbBuffer::new(&buf[i..]);  // Instance 3: parse dimensions
// ...
let mut wkb_buffer = WkbBuffer::new(&buf[i..]);  // Instance 4: parse first xy

Impact: This negates some of the performance benefits of the optimization. You're parsing the header multiple times.

Recommendation: Refactor to use a single WkbBuffer instance and maintain state properly. The WkbBuffer already has an offset mechanism - use it!


5. Logic Error: Empty GeometryCollection Handling

Location: rust/sedona-geometry/src/wkb_header.rs:116-123 and rust/sedona-functions/src/st_haszm.rs:116-123

For empty geometry collections, the logic may not be correct:

// In WkbHeader::try_new
if let Some(first_geom_dimensions) = header.first_geom_dimensions() {
    dimensions = first_geom_dimensions;
} else {
    dimensions = top_level_dimensions;  // Falls back for empty collections
}

Test case analysis:

  • GEOMETRYCOLLECTION EMPTY - no top-level dimension, no first geom → should return false
  • GEOMETRYCOLLECTION Z EMPTY - has top-level Z dimension, no first geom → should return true

The current logic appears to handle this correctly, but it's not immediately obvious. Consider adding a comment explaining the empty collection behavior.


6. Inconsistent Behavior for Edge Cases

Location: rust/sedona-geometry/src/wkb_header.rs:62-67

For Point geometries, a dummy size of 1 is assigned. While this works, it's semantically incorrect - Points don't have a "size" in the same way other geometries do.

Recommendation: Document this behavior clearly in the size() method documentation, or consider returning Option<u32> where None indicates "not applicable" for Points.


🔧 Code Quality Issues

7. Lack of Documentation

Location: rust/sedona-geometry/src/wkb_header.rs

The WkbBuffer struct and its methods lack documentation. For a performance-critical component, comprehensive documentation is essential.

Action Required: Add doc comments explaining:

  • The purpose of WkbBuffer
  • The state management approach (offset, remaining, last_endian)
  • Why multiple buffer instances are created (if this is intentional)

8. Inconsistent Error Messages

Location: rust/sedona-geometry/src/wkb_header.rs

Some error messages are very specific:

"Invalid WKB: buffer too small. At offset: {}. Need 4 bytes."

While others are generic:

"Invalid WKB: buffer too small"  // in commented-out code

Recommendation: Ensure all error messages provide sufficient context for debugging.


9. Magic Numbers

Location: rust/sedona-geometry/src/wkb_header.rs:192

match self.geometry_type / 1000 {

The division by 1000 to extract dimension information should be documented or extracted to a constant with a clear name.

Recommendation:

// WKB dimension encoding: type = base_type + (dimension * 1000)
// where dimension is: 0=XY, 1=XYZ, 2=XYM, 3=XYZM
const WKB_DIMENSION_MULTIPLIER: u32 = 1000;
match self.geometry_type / WKB_DIMENSION_MULTIPLIER {

🎯 Performance Considerations

10. Memory Allocation

The current implementation is good - it doesn't allocate additional memory beyond the builder, which is optimal.

11. Branching

The dimension checking logic is clean and should be well-optimized by the compiler:

if dim_index == 2 {
    return Ok(Some(matches!(dimensions, Dimensions::Xyz | Dimensions::Xyzm)));
}

However, consider using a match statement for better clarity:

match dim_index {
    2 => Ok(Some(matches!(dimensions, Dimensions::Xyz | Dimensions::Xyzm))),
    3 => Ok(Some(matches!(dimensions, Dimensions::Xym | Dimensions::Xyzm))),
    _ => Ok(Some(false)),
}

🔒 Security Considerations

12. Buffer Bounds Checking

Status: ✅ Good

The bounds checking throughout is excellent. Every read operation checks remaining bytes:

if self.remaining < 4 {
    return Err(SedonaGeometryError::Invalid(format!(...)));
}

This prevents buffer overruns and should handle malformed WKB data safely.


📊 Test Coverage Assessment

Python Tests

  • ✅ Good: New test cases added for edge cases
  • ✅ Good: Benchmarks added for performance tracking

Rust Tests

  • ✅ Excellent: Comprehensive unit tests for WkbHeader
  • ✅ Good: Tests for st_haszm functions
  • ⚠️ Missing: No tests for malformed/invalid WKB inputs at the WkbHeader level

Recommendation: Add tests for invalid WKB inputs to ensure proper error handling:

  • Truncated WKB buffers
  • Invalid byte order markers
  • Invalid geometry type codes
  • Buffers that claim to have more data than they contain

📝 Documentation Gaps

13. Missing High-Level Documentation

The WkbHeader struct has good field-level documentation but lacks:

  • Overall usage examples
  • Performance characteristics
  • Thread-safety guarantees
  • Comparison with full WKB parsing

🎉 Summary

What's Great:

  • ✅ Solid optimization approach that should provide good performance improvements
  • ✅ Excellent test coverage for happy paths
  • ✅ Good error handling and bounds checking
  • ✅ Clean separation of concerns

Must Fix Before Merge:

  1. ❗ Remove all println! debug statements
  2. ❗ Remove all commented-out code
  3. ❗ Resolve or remove TODO comments
  4. ❗ Fix redundant buffer parsing (performance issue)

Recommended Improvements:

  1. 📝 Add documentation for WkbBuffer and its methods
  2. 📝 Document magic numbers and dimension encoding
  3. 🧪 Add error case tests for malformed WKB
  4. 🎨 Improve code clarity by removing the multiple buffer creation pattern

Overall Assessment:

This is a solid optimization with good test coverage, but it needs cleanup before merging. The redundant buffer parsing and debug code left in place suggest this might have been committed in a work-in-progress state. Once the critical issues are addressed, this will be a great performance improvement!


Recommended Actions:

  1. Clean up debug statements and commented code
  2. Refactor WkbHeader::try_new to use a single WkbBuffer instance
  3. Add documentation
  4. Run cargo clippy to catch any linting issues
  5. Run cargo fmt to ensure consistent formatting
  6. Verify all tests pass: cargo test
  7. Run benchmarks to validate performance improvements

Great work on the optimization approach! Looking forward to seeing this cleaned up and merged. 🚀


Review generated with Claude Code

Copy link
Copy Markdown

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 3 suggestions posted.


/// Returns the top-level dimension of the WKB
pub fn dimensions(&self) -> Result<Dimensions, SedonaGeometryError> {
let dimensions = match self.geometry_type / 1000 {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider masking out the SRID flag before deriving dimensions; using self.geometry_type / 1000 will misclassify EWKB with SRID set (unexpected code). This also applies to parse_dimensions() where code / 1000 is used.

🤖 React with 👍 or 👎 to let us know if the comment was useful.

// ));
// }
// let num_geometries = self.read_u32(&buf[5..9], byte_order)?;
let num_geometries = self.read_u32()?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For collections, the SRID (EWKB) flag should be handled before reading the count; here num_geometries is read before skipping SRID, so with SRID set this reads SRID instead of the count and throws off offsets.

🤖 React with 👍 or 👎 to let us know if the comment was useful.

if let Some(i) = first_geom_idx {
// first_geom_dimensions = Some(self.parse_dimensions(&buf[i..])?);

println!("starting parse_dimensions buf: {:?}", &buf[i..]);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover println! debug statements (e.g., here and below) will pollute stdout during normal runs; consider removing or gating them behind a debug flag. (also applies to other locations in this file)

🤖 React with 👍 or 👎 to let us know if the comment was useful.

@github-actions
Copy link
Copy Markdown

Findings

  • Bug: rust/sedona-geometry/src/wkb_header.rs:192 leaves the EWKB SRID flag in self.geometry_type before computing the dimension. For any SRID-annotated WKB (e.g. EWKB from PostGIS), self.geometry_type / 1000 becomes 536 870+ and the match falls into the error branch, so ST_HasZ/M now fail with Invalid WKB instead of returning a boolean. Mask the SRID bit before dividing (same issue in parse_dimensions at rust/sedona-geometry/src/wkb_header.rs:493 for nested reads).
  • Bug: rust/sedona-geometry/src/wkb_header.rs:263 reads the “number of geometries” before skipping the SRID bytes. In EWKB the SRID precedes that count, so SRID values (e.g. 4326) are misinterpreted as the geometry count. This causes empty collections with SRIDs to be treated as non-empty and then blows up when first_geom_idx tries to parse a non-existent child, propagating an error out of ST_HasZ/M. Move the SRID skip ahead of the count (or mask it out before reading).
  • Issue: rust/sedona-geometry/src/wkb_header.rs:243 & rust/sedona-geometry/src/wkb_header.rs:247 still contain println! debugging. These fire on every ST_HasZ/M invocation, flooding stdout and hurting performance. Please remove or gate them behind cfg(test)/logging.

Suggested follow-up

  1. Add a regression test that exercises EWKB with an SRID (both empty and non-empty collections) to ensure the fast path handles SRID correctly once fixed.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
rust/sedona-functions/src/st_haszm.rs (1)

163-233: Add tests for Z/M in non-first child to verify the current behavior.

The invoke_scalar function at lines 110-135 only inspects the first geometry in a collection via header.first_geom_dimensions(). This creates a potential gap: a GEOMETRYCOLLECTION (POINT (0 0), POINT Z (0 0 1)) would incorrectly return false for ST_HASZ() since only the first child (which lacks Z) is examined.

Add these test cases to ensure the implementation's handling of multi-child collections is intentional and correct:

         m_tester.assert_scalar_result_equals(result, ScalarValue::Boolean(Some(true)));
 
+        // Non-first child carries Z: verify current behavior
+        let result = z_tester
+            .invoke_wkb_scalar(Some("GEOMETRYCOLLECTION (POINT (0 0), POINT Z (0 0 1))"))
+            .unwrap();
+        z_tester.assert_scalar_result_equals(result, ScalarValue::Boolean(Some(true)));
+
+        // Non-first child carries M: verify current behavior
+        let result = m_tester
+            .invoke_wkb_scalar(Some("GEOMETRYCOLLECTION (POINT (0 0), POINT M (0 0 1))"))
+            .unwrap();
+        m_tester.assert_scalar_result_equals(result, ScalarValue::Boolean(Some(true)));
🧹 Nitpick comments (9)
rust/sedona-testing/src/fixtures.rs (1)

31-44: Fixture looks correct; bytes match a MultiPoint(XY) with a nested PointZ(1,2,3).

Nice targeted case for mixed-dimension inference.

Optionally mention EWKB thousand-offset (1001 = Point Z) in the doc comment for quicker auditing.

python/sedonadb/tests/functions/test_functions.py (1)

564-571: Add a case where Z/M appears in a non-first child.

E.g., GEOMETRYCOLLECTION (POINT (0 0), POINT Z (0 0 1)) should be True for ST_HasZ; similarly for M. This guards against first-child-only inference regressions.

rust/sedona-geometry/src/lib.rs (1)

27-27: Public exposure of wkb_header looks good.

Consider adding top-level module docs and (optionally) re-exporting key types (e.g., pub use wkb_header::WkbHeader) for discoverability.

rust/sedona-functions/src/st_haszm.rs (1)

109-110: Nit: comment mismatch.

This returns a boolean, not a “geometry type name”. Update the function comment.

rust/sedona-geometry/src/wkb_header.rs (5)

162-169: Base type extraction: prefer masking thousands over & 0x7 for clarity.

& 0x7 works by coincidence (codes 1..7), but base = (geometry_type & !SRID_FLAG_BIT) % 1000 communicates intent and avoids surprises if additional flags appear.

Apply:

-        // Only low 3 bits is for the base type, high bits include additional info
-        let code = self.geometry_type & 0x7;
+        // Strip flags and dimension offsets, then take base WKB type in 1..7
+        let code = (self.geometry_type & !SRID_FLAG_BIT) % 1000;

77-99: Avoid triple buffer re-inits and remove debug prints.

try_new recreates WkbBuffer 3x and prints raw buffers. Streamline by computing i = first_geom_idx once and reusing a single subslice; gate logging behind #[cfg(debug_assertions)] or log at debug.

Apply:

-        let mut wkb_buffer = WkbBuffer::new(buf); // Reset: TODO: clean up later
-        let first_geom_idx = wkb_buffer.first_geom_idx()?; // ERROR HERE
+        let mut wkb_buffer = WkbBuffer::new(buf);
+        let first_geom_idx = wkb_buffer.first_geom_idx()?;
         if let Some(i) = first_geom_idx {
-            println!("starting parse_dimensions buf: {:?}", &buf[i..]);
-            let mut wkb_buffer = WkbBuffer::new(&buf[i..]); // Reset: TODO: clean up later
+            let mut wkb_buffer = WkbBuffer::new(&buf[i..]);
             first_geom_dimensions = Some(wkb_buffer.parse_dimensions()?);
-            let mut wkb_buffer = WkbBuffer::new(&buf[i..]); // Reset: TODO: clean up later
-            println!(
-                "first_xy_coord: buf: {:?}",
-                &wkb_buffer.buf[wkb_buffer.offset..]
-            );
+            let mut wkb_buffer = WkbBuffer::new(&buf[i..]);
             (first_x, first_y) = wkb_buffer.first_xy_coord()?;
         } else {
             first_geom_dimensions = None;
             first_x = f64::NAN;
             first_y = f64::NAN;
         }

And remove/guard other println! calls in this file.


241-248: Strip debug prints from hot path.

println! in parsing paths will flood stdout and hurt perf. Replace with log::debug! or guard with #[cfg(debug_assertions)].

Example:

-        println!("first_geom_idx: buf: {:?}", &self.buf[self.offset..]);
+        #[cfg(debug_assertions)]
+        eprintln!("first_geom_idx: buf: {:?}", &self.buf[self.offset..]);
@@
-        println!("geometry_type_id: {geometry_type_id:?}");
+        #[cfg(debug_assertions)]
+        eprintln!("geometry_type_id: {geometry_type_id:?}");

Also applies to: 270-276


62-67: Clarify size semantics for Point or return Option<u32>.

size() returns 1 for Point but docs say “Not applicable for a point”. Consider returning Option<u32> or document “Point always returns 1”.

Possible API tweak:

  • Keep u32 and specify: “For Point, returns 1.”
  • Or change to Option<u32> where None = not applicable.

Also applies to: 171-178


507-885: Great test coverage; add SRID/EWKB cases to lock the fixes.

Current tests skip SRID. Add minimal EWKB byte fixtures to assert dimensions(), size(), and first_xy() with SRID present for single and collection geometries.

I can contribute a small helper to inject SRID into WKB:

fn with_srid(mut wkb: Vec<u8>, srid: u32) -> Vec<u8> {
    // wkb[0] = endian, wkb[1..5] = type; set SRID flag and insert srid after type
    let endian = wkb[0];
    let mut ty = u32::from_le_bytes(wkb[1..5].try_into().unwrap()); // writer defaults to LE
    ty |= SRID_FLAG_BIT;
    wkb[1..5].copy_from_slice(&ty.to_le_bytes());
    wkb.splice(5..5, srid.to_le_bytes()); // insert SRID at byte 5
    wkb
}

Then test e.g. with_srid(make_wkb("MULTIPOINT ((1 2))"), 4326) to verify both dimensions() and collection parsing.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c5731f and 4e3e525.

📒 Files selected for processing (6)
  • benchmarks/test_functions.py (1 hunks)
  • python/sedonadb/tests/functions/test_functions.py (1 hunks)
  • rust/sedona-functions/src/st_haszm.rs (6 hunks)
  • rust/sedona-geometry/src/lib.rs (1 hunks)
  • rust/sedona-geometry/src/wkb_header.rs (1 hunks)
  • rust/sedona-testing/src/fixtures.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
rust/sedona-functions/src/st_haszm.rs (1)
rust/sedona-geometry/src/wkb_header.rs (6)
  • new (222-229)
  • try_new (45-150)
  • dimensions (191-205)
  • dimensions (778-794)
  • first_geom_dimensions (208-210)
  • first_geom_dimensions (834-852)
benchmarks/test_functions.py (3)
python/sedonadb/python/sedonadb/testing.py (7)
  • SedonaDBSingleThread (347-353)
  • PostGISSingleThread (618-624)
  • DuckDBSingleThread (407-412)
  • execute_and_collect (156-168)
  • execute_and_collect (342-344)
  • execute_and_collect (403-404)
  • execute_and_collect (533-537)
python/sedonadb/tests/functions/test_functions.py (2)
  • test_st_hasm (601-603)
  • test_st_hasz (574-576)
benchmarks/test_bench_base.py (1)
  • _get_eng (147-161)
🪛 Ruff (0.14.1)
benchmarks/test_functions.py

184-184: Possible SQL injection vector through string-based query construction

(S608)


202-202: Possible SQL injection vector through string-based query construction

(S608)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: claude-review
  • GitHub Check: docs-and-deploy
  • GitHub Check: python ubuntu-latest
  • GitHub Check: codex
  • GitHub Check: docs
  • GitHub Check: build
  • GitHub Check: test
  • GitHub Check: ubuntu-latest (release)
  • GitHub Check: windows-latest (release)
  • GitHub Check: clippy
  • GitHub Check: macos-latest (release)
🔇 Additional comments (6)
benchmarks/test_functions.py (2)

170-205: Bench additions LGTM; consistent with existing style.

Engines/tables and query harness match the suite.


183-185: Static analysis S608 is low-risk here.

Tables come from a fixed enum; no untrusted input. Nothing to change.

If you later bench user-provided table names, prefer validated identifiers or prepared statements where supported.

Also applies to: 201-203

rust/sedona-functions/src/st_haszm.rs (3)

92-106: Good move to WkbBytesExecutor; minimizes allocation and copies.

Capacity pre-sizing and null handling are correct.


195-233: Nice coverage for empty collections with Z/M flags and mixed-level Z/M on collections.

These validate the intended inference shortcuts.

Please add a case where Z/M is present in a non-first child to prevent regressions (see diff below).


235-249: Great: inferred Z in MultiPoint with mixed header nesting is validated.

Complements the fast path well.

rust/sedona-geometry/src/wkb_header.rs (1)

297-377: Coordinate parsing is correct; XY subset reads fine for XYZ/M/ZM.

Logic correctly handles empty cases and consumes only X,Y for higher dimensions. No change required.

If you want, I can add tests for LINESTRING Z/M/ZM and POLYGON Z/M/ZM covering empty/non-empty first rings.

Comment on lines +109 to 138
/// Fast-path inference of geometry type name from raw WKB bytes
fn invoke_scalar(buf: &[u8], dim_index: usize) -> Result<Option<bool>> {
let header = WkbHeader::try_new(buf).map_err(|e| DataFusionError::External(Box::new(e)))?;
let top_level_dimensions = header
.dimensions()
.map_err(|e| DataFusionError::External(Box::new(e)))?;

// Infer dimension based on first coordinate dimension for cases where it differs from top-level
// e.g GEOMETRYCOLLECTION (POINT Z (1 2 3))
let dimensions;
if let Some(first_geom_dimensions) = header.first_geom_dimensions() {
dimensions = first_geom_dimensions;
} else {
dimensions = top_level_dimensions;
}

if dim_index == 2 {
return Ok(Some(matches!(
dimensions,
Dimensions::Xyz | Dimensions::Xyzm
)));
}
if dim_index == 3 {
return Ok(Some(matches!(
dimensions,
Dimensions::Xym | Dimensions::Xyzm
)));
}
Ok(Some(false))
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Correctness risk: only inspecting the first nested geometry can yield false negatives.

For collections/multis where the first child is XY but a later child is Z/M, this returns False even though the geometry “has Z/M”. Example: GEOMETRYCOLLECTION (POINT (0 0), POINT Z (0 0 1)).

  • Keep the current fast path, but when it evaluates to False for collections/multis, fall back to a lightweight scan of all nested headers (still bytes-only) and short-circuit on the first Z/M match.
  • If iterating headers isn’t available yet, add a slow-path fallback (e.g., previous WkbExecutor) only in these uncertain cases to retain semantics without regressing perf on the common path.

Minimal sketch:

 fn invoke_scalar(buf: &[u8], dim_index: usize) -> Result<Option<bool>> {
-    let header = WkbHeader::try_new(buf).map_err(|e| DataFusionError::External(Box::new(e)))?;
+    let header = WkbHeader::try_new(buf).map_err(|e| DataFusionError::External(Box::new(e)))?;
     let top_level_dimensions = header
         .dimensions()
         .map_err(|e| DataFusionError::External(Box::new(e)))?;

     let dimensions = header.first_geom_dimensions().unwrap_or(top_level_dimensions);

     let want = if dim_index == 2 {
         matches!(dimensions, Dimensions::Xyz | Dimensions::Xyzm)
     } else if dim_index == 3 {
         matches!(dimensions, Dimensions::Xym | Dimensions::Xyzm)
     } else {
         false
     };

-    Ok(Some(want))
+    if want {
+        return Ok(Some(true));
+    }
+
+    // Optional: only for collection/multi types with potential for nested dims
+    // Pseudocode—requires iterator support from WkbHeader/WkbBuffer:
+    // for child in header.iter_children(buf)? {
+    //     let dims = child.dimensions()?;
+    //     if (dim_index == 2 && matches!(dims, Dimensions::Xyz | Dimensions::Xyzm))
+    //        || (dim_index == 3 && matches!(dims, Dimensions::Xym | Dimensions::Xyzm)) {
+    //         return Ok(Some(true));
+    //     }
+    // }
+    Ok(Some(false))
 }

Add tests to cover non-first-child Z/M (see separate test suggestion below).


I'll verify this review comment by examining the actual implementation and related code to confirm whether the correctness risk is genuine.


🏁 Script executed:

#!/bin/bash
# Find the st_haszm.rs file and view its full content
fd -t f 'st_haszm\.rs' --exec cat {}

Length of output: 9008


🏁 Script executed:

#!/bin/bash
# Search for WkbHeader definition and first_geom_dimensions method
rg -n 'fn first_geom_dimensions' -A 5 -B 2

Length of output: 1300


🏁 Script executed:

#!/bin/bash
# Search for tests related to st_haszm to see what edge cases are covered
fd -t f 'test.*haszm|haszm.*test' -i --exec cat {}

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check for any geometry collection test cases with mixed dimensions
rg -n 'GEOMETRYCOLLECTION.*POINT.*Z|hasZ.*Collection|hasMember' -i --type rs

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Look for WkbHeader implementation and available methods
fd -t f '.*header.*\.rs|.*wkb.*\.rs' -i | head -20 | xargs grep -l 'struct WkbHeader\|impl.*WkbHeader' 2>/dev/null

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Search for any iteration over children in WkbHeader or similar structures
rg -n 'iter.*child|for.*child|children\(' --type rs | head -20

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Check if there are any other hasZ/hasM implementations or related logic
rg -n 'hasZ|hasM|Xyzm|first_geom' --type rs -B 2 -A 2 | head -40

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Search for the dimensions method and Dimensions enum
rg -n 'enum Dimensions|pub fn dimensions' --type rs -A 5

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Look for test files in the same directory
fd -t f '.*test.*\.rs|.*_test\.rs' rust/sedona-functions/src/ --exec ls -lh {}

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check what WKB types are considered collections/multis
rg -n 'Multi|Collection' --type rs | grep -i 'enum\|type\|const' | head -20

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Look for the exact invoke_scalar function and any related logic
rg -n 'fn invoke_scalar' -A 40 --type rs

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Check git history or comments around this function
rg -n 'first child|nested|collection' --type rs -i -B 2 -A 2 | head -50

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Search for WkbExecutor to understand the fallback mechanism mentioned
rg -n 'WkbExecutor|invoke_array' --type rs -B 2 -A 5 | head -40

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Check if there's logic that handles MultiGeometry or GeometryCollection specially
rg -n 'GeometryCollection|Multi.*Type|0xF|0x7|WkbType' --type rs -C 2 | head -60

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Look for any safeguards or special handling in dimensions parsing
rg -n 'pub fn dimensions|fn dimensions' --type rs -A 15 | head -60

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Check the structure and fields of WkbHeader
rg -n 'struct WkbHeader' -A 20 --type rs

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Look for the implementation details of first_geom_dimensions and top-level dimensions
rg -n 'first_geom_dimensions|top_level|nested_header' --type rs -B 3 -A 10

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Check if there's any recursive or iterative scanning logic already in place
rg -n 'loop|while|for.*in' --type rs --context=5 | grep -A 8 -B 2 'child\|geom\|header' | head -50

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Verify the actual test file if it exists
find rust/sedona-functions/src -name '*test*' -o -name '*tests*' | xargs ls -lh 2>/dev/null

Length of output: 1130


🏁 Script executed:

#!/bin/bash
# Look for mod tests or #[cfg(test)] blocks in st_haszm
rg -n '#\[cfg\(test\)\]|mod tests|#\[test\]' rust/sedona-functions/src/st_haszm.rs -A 30

Length of output: 2448


🏁 Script executed:

#!/bin/bash
# Check if there are integration tests that might cover this
find rust/sedona-functions -name '*test*.rs' -type f | xargs grep -l 'hasZ\|haszm\|POINT.*Z' 2>/dev/null

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for the full context around the invoke_scalar function to understand how it's called
rg -n 'fn invoke_scalar|fn invoke_array' --type rs -B 5 -A 50 | head -80

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Check if WkbBuffer or similar has methods for iterating through geometry components
rg -n 'impl.*WkbBuffer|struct WkbBuffer|fn.*children|fn.*iter' --type rs -A 8 | head -60

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Summarize WkbHeader available methods
rg -n 'impl.*WkbHeader' -A 100 --type rs | head -120

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Check the actual geometry type codes to understand collection detection
rg -n 'GeometryType|WkbGeomType|0x07|0x08|MULTI|COLLECTION' --type rs | head -30

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Search for any dimension array scanning or all-children checking logic
rg -n 'scan|all.*dim|iterate|traverse' --type rs -i | grep -v test | head -20

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Final check: look at the exact line range mentioned in the review
sed -n '109,138p' rust/sedona-functions/src/st_haszm.rs

Length of output: 1022


🏁 Script executed:

#!/bin/bash
# Check what top_level_dimensions actually returns and how it differs from first_geom_dimensions
rg -n 'top_level|first_geom_dimensions|dimensions\(' --type rs -B 3 -A 8 | head -80

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Look at the complete WkbHeader structure and all its methods
ast-grep --pattern 'impl WkbHeader { $$$ }'

Length of output: 12874


🏁 Script executed:

#!/bin/bash
# Check for GeometryCollection or Multi type detection at the top level
rg -n 'geometry_type|wkb_type|enum.*Wkb.*Type' --type rs | head -30

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Look for the exact semantics of what the function should return
rg -n 'hasZ|hasM|dimensions' rust/sedona-functions/src/st_haszm.rs -B 2 -A 2

Length of output: 1297


🏁 Script executed:

#!/bin/bash
# Check if there's documentation about the expected behavior
rg -n '//.*[Zz].*dimension|//.*collection|//.*nested' rust/sedona-functions/src/st_haszm.rs

Length of output: 361


🏁 Script executed:

#!/bin/bash
# Look for any comment that explains the choice to use first_geom_dimensions
rg -n 'first' rust/sedona-functions/src/st_haszm.rs -B 5 -A 5

Length of output: 652


🏁 Script executed:

#!/bin/bash
# Check for similar functions that might handle this differently
rg -n 'fn.*haszm|fn.*hasm|fn.*dims' --type rs -i | head -20

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Look at imports and dependencies to understand WkbHeader source
head -40 rust/sedona-functions/src/st_haszm.rs

Length of output: 1562


🏁 Script executed:

#!/bin/bash
# Search for WkbHeader in the broader codebase to find its definition
find rust -name '*.rs' -type f | xargs grep -l 'struct WkbHeader' | head -5

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Get WkbHeader definition file and content
fd 'lib\.rs|mod\.rs' rust/sedona-geos/src --exec grep -l 'WkbHeader' {} \; | head -3 | xargs cat 2>/dev/null | head -100

Length of output: 107


🏁 Script executed:

#!/bin/bash
# More targeted search for WkbHeader
rg -n 'pub struct WkbHeader|pub fn first_geom_dimensions|pub fn dimensions' --type rs -A 15

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Check if there's a geometry collection type discriminator
rg -n 'GeometryCollection\|MultiPoint\|MultiLineString\|MultiPolygon|0x07|0x08|0x0A|0x0B|0x0C' --type rs -B 2 -A 2 | head -50

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Look at the entire st_haszm.rs file to get full picture
wc -l rust/sedona-functions/src/st_haszm.rs && echo "---" && cat rust/sedona-functions/src/st_haszm.rs

Length of output: 9054


🏁 Script executed:

#!/bin/bash
# Check all files in the same directory
ls -la rust/sedona-functions/src/ | head -30

Length of output: 1863


🏁 Script executed:

#!/bin/bash
# Search for wkb-related modules
find rust -type f -name '*.rs' | xargs grep -l 'WkbHeader\|WkbBuffer' | head -10

Length of output: 76


🏁 Script executed:

#!/bin/bash
# Look at WkbHeader in sedona-core or common location
rg -n 'pub struct WkbHeader\|pub fn first_geom_dimensions' -A 25 --type rs

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Check the real behavior by looking at how first_geom_dimensions is used elsewhere
rg -n 'first_geom_dimensions' --type rs -B 3 -A 3

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Look for the specific test case mentioned in the review
rg -n 'GEOMETRYCOLLECTION.*POINT.*0.*0.*POINT.*Z.*0.*0.*1' --type rs -i

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Check for any test about mixed-dimension geometries
rg -n 'mixed|different.*dim|varying' rust/sedona-functions/src/st_haszm.rs -i

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Search for test modules
ast-grep --pattern '#[cfg(test)] mod $_ { $$$ }'

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for all test functions in related files
rg -n '#\[test\]' --type rs -A 3 | grep -E 'hasz|hasm|dimension' -i -B 1 -A 3

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Check the structure around invoke_scalar in context
sed -n '90,150p' rust/sedona-functions/src/st_haszm.rs

Length of output: 1876


🏁 Script executed:

#!/bin/bash
# Final verification: check what WkbHeader methods exist
rg -n 'pub fn' rust/sedona-functions/src/ --type rs | grep -i 'header\|dimension'

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Look for all references to this function to understand its usage
rg -n 'invoke_scalar' --type rs -B 2 -A 2

Length of output: 26


🏁 Script executed:

#!/bin_bash
# Check for geometry collection handling in invoke_array
rg -n 'invoke_array' --type rs -A 80 | head -100

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Look for the complete picture of how dimensions are handled for collections
rg -n 'pub fn dimensions\|pub fn first_geom_dimensions\|pub fn top_level_dimensions' --type rs -B 2 -A 20

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Understand the actual file structure
find rust -name '*.rs' | xargs grep -l 'struct WkbHeader' -A 20 | head -1 | xargs cat | head -150

Length of output: 5370


🏁 Script executed:

#!/bin/bash
# Check the entire sedona-common or sedona-core crate structure
find rust -name 'lib.rs' | xargs cat | grep -A 30 'WkbHeader\|dimensions' | head -100

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Try a different approach - find where WkbHeader is defined
find rust -name '*.rs' -exec grep -l 'pub struct WkbHeader' {} \;

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Get the full st_haszm.rs file and WkbHeader related code
cat rust/sedona-functions/src/st_haszm.rs

Length of output: 9008


🏁 Script executed:

#!/bin/bash
# Find WkbHeader definition
find rust -name '*.rs' -type f -exec grep -l 'pub struct WkbHeader\|impl WkbHeader' {} \;

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Search for first_geom_dimensions implementation and usage
rg -n 'first_geom_dimensions' --type rs -B 5 -A 10

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Look for test files related to haszm or dimensions
rg -n '#\[test\].*haszm|#\[test\].*dimension' --type rs -i -A 15

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Check if geometry collection types are handled specially anywhere
rg -n 'GeometryCollection\|MultiPoint|MultiLineString|0xF|wkb.*type' --type rs -i -B 2 -A 2 | head -60

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Search for invoke_array to understand the fallback mechanism
rg -n 'invoke_array|fn invoke' --type rs -B 3 -A 20 | head -80

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Look for any child iteration or nested geometry scanning
rg -n 'children\(|iter_geom|for.*child|while.*byte' --type rs -B 3 -A 5

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Check what top_level_dimensions returns vs first_geom_dimensions
rg -n 'fn dimensions\(|pub fn dimensions' --type rs -A 15 | head -50

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Look for geometry collection tests specifically
rg -n 'test.*collection|geometrycollection' --type rs -i -A 10 | head -60

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Check related WKB handling code
find rust -path '*/wkb*' -o -path '*/common*' -name '*.rs' | head -10 | xargs ls -lh

Length of output: 469


🏁 Script executed:

#!/bin/bash
# Get dimensions enum and related type info
rg -n 'enum Dimensions|pub enum.*Dim' --type rs -A 8

Length of output: 26


🏁 Script executed:

#!/bin/bash
# Look for comments explaining the logic around first geometry
rg -n 'first.*geom|first.*child|nested.*dim' rust/sedona-functions/src/st_haszm.rs -B 2 -A 2

Length of output: 265


🏁 Script executed:

#!/bin/bash
# Check the entire invoke_scalar implementation in detailed context
sed -n '100,145p' rust/sedona-functions/src/st_haszm.rs

Length of output: 1372


Issue verified: false negatives for geometry collections with Z/M only in non-first children.

The concern is valid. The function uses first_geom_dimensions(), which only inspects the first nested geometry. For a collection like GEOMETRYCOLLECTION (POINT (0 0), POINT Z (0 0 1)), it would return false for hasZ even though the geometry does contain Z. Tests confirm this gap—existing tests only cover cases where Z/M appears in the first child or at the top level.

To fix, implement the suggested approach: keep the fast path, but when it returns false for collection/multi types, scan all nested headers to verify no Z/M exists anywhere. This requires either:

  1. Adding child iteration support to WkbHeader/WkbBuffer (bytes-only, no full parsing), or
  2. Falling back to a slower full-geometry inspection only for uncertain cases

Add test case covering GEOMETRYCOLLECTION (POINT (0 0), POINT Z (0 0 1)) to prevent regression.

🤖 Prompt for AI Agents
In rust/sedona-functions/src/st_haszm.rs around lines 109 to 138, the fast-path
uses header.first_geom_dimensions() which misses Z/M in non-first children
(e.g., GEOMETRYCOLLECTION (POINT (0 0), POINT Z (0 0 1))) causing false
negatives; keep the current fast-path but when it would return false for
collection/multi types, scan all nested child headers to detect any Z/M and only
then return false—implement this by adding bytes-only child-header iteration to
WkbHeader/WkbBuffer (preferred) or, if that is not available, fall back to a
slower full-geometry inspection for those uncertain cases; also add a test for
GEOMETRYCOLLECTION (POINT (0 0), POINT Z (0 0 1)) to prevent regression.

Comment on lines +190 to +205
/// Returns the top-level dimension of the WKB
pub fn dimensions(&self) -> Result<Dimensions, SedonaGeometryError> {
let dimensions = match self.geometry_type / 1000 {
0 => Dimensions::Xy,
1 => Dimensions::Xyz,
2 => Dimensions::Xym,
3 => Dimensions::Xyzm,
_ => {
return Err(SedonaGeometryError::Invalid(format!(
"Unexpected code: {}",
self.geometry_type
)))
}
};
Ok(dimensions)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Mask EWKB SRID bit when deriving dimensions (current logic breaks with SRID).

dimensions() and parse_dimensions() divide the raw type by 1000 without clearing SRID_FLAG_BIT. With EWKB SRID set, code / 1000 yields a huge number, not 0..3. Fix by masking the flag first.

Apply:

@@
-    pub fn dimensions(&self) -> Result<Dimensions, SedonaGeometryError> {
-        let dimensions = match self.geometry_type / 1000 {
+    pub fn dimensions(&self) -> Result<Dimensions, SedonaGeometryError> {
+        let code = self.geometry_type & !SRID_FLAG_BIT;
+        let dimensions = match code / 1000 {
             0 => Dimensions::Xy,
             1 => Dimensions::Xyz,
             2 => Dimensions::Xym,
             3 => Dimensions::Xyzm,
             _ => {
                 return Err(SedonaGeometryError::Invalid(format!(
-                    "Unexpected code: {}",
-                    self.geometry_type
+                    "Unexpected code: {}",
+                    code
                 )))
             }
         };
         Ok(dimensions)
     }
@@
-    fn parse_dimensions(&mut self) -> Result<Dimensions, SedonaGeometryError> {
+    fn parse_dimensions(&mut self) -> Result<Dimensions, SedonaGeometryError> {
         self.read_endian()?;
-        let code = self.read_u32()?;
-        match code / 1000 {
+        let mut code = self.read_u32()?;
+        code &= !SRID_FLAG_BIT;
+        match code / 1000 {
             0 => Ok(Dimensions::Xy),
             1 => Ok(Dimensions::Xyz),
             2 => Ok(Dimensions::Xym),
             3 => Ok(Dimensions::Xyzm),
             _ => {
                 return Err(SedonaGeometryError::Invalid(format!(
                     "Unexpected code: {code:?}"
                 )))
             }
         }
     }

Also applies to: 479-505

🤖 Prompt for AI Agents
In rust/sedona-geometry/src/wkb_header.rs around lines 190 to 205 (and likewise
for the parse_dimensions code at lines ~479-505), the dimensions() function
currently divides self.geometry_type by 1000 without clearing the EWKB SRID flag
bit, which causes huge unexpected values when SRID is present; fix by first
masking out the SRID_FLAG_BIT from the raw type (e.g., let code =
self.geometry_type & !SRID_FLAG_BIT) and then divide or otherwise derive the
dimension code from that masked value, returning the appropriate Dimensions
variants and preserving existing error handling.

Comment on lines +262 to +276
// let num_geometries = self.read_u32(&buf[5..9], byte_order)?;
let num_geometries = self.read_u32()?;

if num_geometries == 0 {
return Ok(None);
// return Ok(Some(0))
}

// let mut i = 9;
if geometry_type & SRID_FLAG_BIT != 0 {
// i += 4;
// Skip the SRID
self.read_u32()?;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

SRID must be skipped before reading collection count; current order mis-parses EWKB.

In first_geom_idx, you read num_geometries before skipping SRID, so with EWKB the SRID is misread as the count and offsets drift.

Apply:

-                // let num_geometries = self.read_u32(&buf[5..9], byte_order)?;
-                let num_geometries = self.read_u32()?;
-
-                if num_geometries == 0 {
-                    return Ok(None);
-                }
-
-                // let mut i = 9;
-                if geometry_type & SRID_FLAG_BIT != 0 {
-                    // i += 4;
-                    // Skip the SRID
-                    self.read_u32()?;
-                }
+                // Skip SRID if present before reading the count
+                if geometry_type & SRID_FLAG_BIT != 0 {
+                    self.read_u32()?; // SRID
+                }
+                // Now read the number of geometries
+                let num_geometries = self.read_u32()?;
+                if num_geometries == 0 {
+                    return Ok(None);
+                }
 
                 // Recursive call to get the first geom of the first nested geometry
                 let mut nested_buffer = WkbBuffer::new(&self.buf[self.offset..]);
                 let off = nested_buffer.first_geom_idx()?;
                 if let Some(off) = off {
                     Ok(Some(self.offset + off))
                 } else {
                     Ok(None)
                 }

Also applies to: 280-288

🤖 Prompt for AI Agents
In rust/sedona-geometry/src/wkb_header.rs around lines 262-276 (and similarly
280-288), the code reads num_geometries before skipping the optional EWKB SRID,
causing the SRID to be interpreted as the collection count and corrupting
offsets; change the order so that if geometry_type & SRID_FLAG_BIT != 0 you call
self.read_u32()? to consume/skip the SRID first, and only then call
self.read_u32()? to read num_geometries (apply the same reorder in the other
block at 280-288).

@martin-augment
Copy link
Copy Markdown
Owner Author

. Debug Print Statements Left in Production Code

Location: rust/sedona-geometry/src/wkb_header.rs

Lines 68, 70, 83, 89, 90, 91, 243, 247 contain println! debug statements that should be removed before merging

value:good-to-have; category:bug; feedback:This is a Draft PR! The println statements will be removed for the final version!

@martin-augment
Copy link
Copy Markdown
Owner Author

2. Extensive Commented-Out Code

Location: rust/sedona-geometry/src/wkb_header.rs

Lines 54-141, 235-242, 300-373, 399-402, 432-436, 481-485 contain large blocks of commented-out code. This significantly reduces code readability and maintainability.

Action Required: Remove all commented-out code. If you need to reference the old implementation, it's available in git history.

value:good-to-have; category:bug; feedback:This is a Draft PR! The commented out code will be either enabled or removed for the final version!

@martin-augment
Copy link
Copy Markdown
Owner Author

3. TODO Comments in Production Code

Location: rust/sedona-geometry/src/wkb_header.rs

Lines 77, 79, 84, 88 contain "TODO: clean up later" comments:

value:good-to-have; category:bug; feedback:This is a Draft PR! The TODOs will be addressed for the final version!

@martin-augment
Copy link
Copy Markdown
Owner Author

martin-augment commented Oct 27, 2025

The target PR is a Draft and it seems it is not ready for review yet.
Closing this PR for now. I will open a new one once the target PR is ready for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants