97: feat(python/sedonadb): Expose random_geometry as a Python function#3
97: feat(python/sedonadb): Expose random_geometry as a Python function#3martin-augment wants to merge 1 commit intomainfrom
Conversation
WalkthroughA new public utility function Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant random_geometry
participant json as json module
participant sedonadb as SedonaDB
participant df as DataFrame
Caller->>random_geometry: call with parameters
random_geometry->>random_geometry: validate & normalize arguments
random_geometry->>json: build JSON payload
json-->>random_geometry: payload
random_geometry->>sedonadb: execute query
sedonadb-->>random_geometry: result set
random_geometry->>df: construct DataFrame
df-->>random_geometry: dataframe object
random_geometry->>random_geometry: return first n rows
random_geometry-->>Caller: DataFrame
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
| f"Expected bounds as [xmin, ymin, xmax, ymax] but got {bounds}" | ||
| ) | ||
|
|
||
| width = bounds[2] - bounds[0] |
There was a problem hiding this comment.
The validation error message doesn't match the check: the condition compares size_min to width/height, but the message mentions width/2 and height/2. Consider aligning the check or the message to avoid confusion.
🤖 React with 👍 or 👎 to let us know if the comment was useful.
There was a problem hiding this comment.
value:useful; category:bug; feedback:The AI reviewer is correct that the error message talks about half width/height while the check uses their full values. Either the check or the error message is wrong.
Pull Request Review: Expose random_geometry as a Python functionSummaryThis PR adds a Python wrapper function Code Quality and Best Practices✅ Strengths
|
Potential Bugs or Issues🐛 Issues Found
Performance Considerations⚡ Observations
|
Security Concerns🔒 Assessment
Test Coverage📊 Status:
|
Summary RecommendationsMust Fix (Breaking/Critical)
Should Fix (Quality/Maintainability)
Nice to Have (Polish)
ConclusionThis is a well-structured addition that provides a clean Python interface to existing functionality. The main areas for improvement are:
The code is safe from a security perspective and follows good Python practices overall. Overall Assessment: ✅ Approve with recommendations for improvements Great work on the type hints and parameter validation! 🎉 |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
python/sedonadb/python/sedonadb/testing.py (1)
53-56: Consider moving imports to module level.Imports are typically placed at the module level unless there's a specific reason for local imports (e.g., avoiding circular dependencies).
Apply this change:
Move these imports to the top of the file (after line 24):
import geoarrow.pyarrow as ga import pyarrow as pa +import json +import timeAnd remove them from inside the function:
) -> "sedonadb.dataframe.DataFrame": - import json - import time - import sedonadbNote: Keep the
sedonadbimport inside the function if it's needed to avoid circular import issues.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
python/sedonadb/python/sedonadb/testing.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
python/sedonadb/python/sedonadb/testing.py (3)
python/sedonadb/python/sedonadb/context.py (2)
connect(174-176)sql(155-171)python/sedonadb/python/sedonadb/dataframe.py (1)
limit(83-113)python/sedonadb/src/dataframe.rs (1)
limit(83-90)
🪛 Ruff (0.14.1)
python/sedonadb/python/sedonadb/testing.py
75-75: Avoid specifying long messages outside the exception class
(TRY003)
77-77: Avoid specifying long messages outside the exception class
(TRY003)
79-79: Avoid specifying long messages outside the exception class
(TRY003)
83-85: Avoid specifying long messages outside the exception class
(TRY003)
90-90: Avoid specifying long messages outside the exception class
(TRY003)
107-107: Possible SQL injection vector through string-based query construction
(S608)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: docs-and-deploy
- GitHub Check: python ubuntu-latest
- GitHub Check: codex
- GitHub Check: claude-review
🔇 Additional comments (3)
python/sedonadb/python/sedonadb/testing.py (3)
68-72: Clarify the size_max calculation logic.When
sizeis a single value,size_maxis set tosize + size / 1e3(0.1% larger). This creates a small range even when the user specifies a fixed size, which may be unexpected.Please clarify:
- Is this intentional behavior for the underlying
sd_random_geometryfunction?- Should users expecting a fixed size get exactly that, or is this small variance acceptable?
Consider either:
- Using
size_max = sizefor true fixed-size behavior, or- Documenting this behavior in the function docstring
105-108: Clarify why.limit(n)is needed whentarget_rowsis already set.The args dictionary includes
"target_rows": int(n)(line 101), but the query result is then limited tonrows again (line 108). Is this redundancy intentional as a safeguard, or can the.limit(n)call be removed?Please verify whether
sd_random_geometryguarantees exactlytarget_rowsresults, or if the additional limit is a necessary safety measure.
107-107: Consider the SQL injection concern flagged by static analysis.While
json.dumps(args)does properly escape values and all inputs are controlled (not from untrusted sources), the f-string query construction was flagged by the static analyzer. Given that this is a testing utility with controlled inputs, the risk is low, but consider whether parameterized queries or a safer query builder pattern would be more appropriate for consistency with security best practices.Based on static analysis findings.
| def random_geometry( | ||
| n: int = 1024, | ||
| geometry_type: Literal[ | ||
| "Point", | ||
| "LineString", | ||
| "Polygon", | ||
| "MultiPoint", | ||
| "MultiLineString", | ||
| "MultiPolygon", | ||
| "GeometryCollection", | ||
| ] = "Point", | ||
| *, | ||
| num_vertices: Union[int, Tuple[int, int]] = 4, | ||
| num_parts: Union[int, Tuple[int, int]] = (1, 3), | ||
| size: Union[float, Tuple[float, float]] = (5.0, 20.0), | ||
| bounds: Iterable[float] = (-170, -80, 170, 80), | ||
| hole_rate: float = 0.0, | ||
| empty_rate: float = 0.0, | ||
| null_rate: float = 0.0, | ||
| seed: Optional[int] = None, | ||
| ) -> "sedonadb.dataframe.DataFrame": |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Add a docstring to document the public API.
This public testing utility lacks documentation. Please add a docstring that explains the purpose, parameters, return value, and provides usage examples.
Consider adding documentation like:
def random_geometry(
n: int = 1024,
geometry_type: Literal[
"Point",
"LineString",
"Polygon",
"MultiPoint",
"MultiLineString",
"MultiPolygon",
"GeometryCollection",
] = "Point",
*,
num_vertices: Union[int, Tuple[int, int]] = 4,
num_parts: Union[int, Tuple[int, int]] = (1, 3),
size: Union[float, Tuple[float, float]] = (5.0, 20.0),
bounds: Iterable[float] = (-170, -80, 170, 80),
hole_rate: float = 0.0,
empty_rate: float = 0.0,
null_rate: float = 0.0,
seed: Optional[int] = None,
) -> "sedonadb.dataframe.DataFrame":
+ """Generate random geometric data for testing.
+
+ Args:
+ n: Number of geometries to generate
+ geometry_type: Type of geometry to generate
+ num_vertices: Number of vertices (int or (min, max) tuple)
+ num_parts: Number of parts for multi-geometries (int or (min, max) tuple)
+ size: Size of geometries (float or (min, max) tuple)
+ bounds: Bounding box as [xmin, ymin, xmax, ymax]
+ hole_rate: Probability of holes in polygons (0.0 to 1.0)
+ empty_rate: Probability of empty geometries (0.0 to 1.0)
+ null_rate: Probability of null geometries (0.0 to 1.0)
+ seed: Random seed for reproducibility (auto-generated if None)
+
+ Returns:
+ DataFrame with 'id' and 'geometry' columns
+
+ Examples:
+ >>> df = random_geometry(100, "Point", bounds=(-180, -90, 180, 90))
+ >>> df = random_geometry(50, "Polygon", num_vertices=(3, 10), size=(1.0, 5.0))
+ """📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def random_geometry( | |
| n: int = 1024, | |
| geometry_type: Literal[ | |
| "Point", | |
| "LineString", | |
| "Polygon", | |
| "MultiPoint", | |
| "MultiLineString", | |
| "MultiPolygon", | |
| "GeometryCollection", | |
| ] = "Point", | |
| *, | |
| num_vertices: Union[int, Tuple[int, int]] = 4, | |
| num_parts: Union[int, Tuple[int, int]] = (1, 3), | |
| size: Union[float, Tuple[float, float]] = (5.0, 20.0), | |
| bounds: Iterable[float] = (-170, -80, 170, 80), | |
| hole_rate: float = 0.0, | |
| empty_rate: float = 0.0, | |
| null_rate: float = 0.0, | |
| seed: Optional[int] = None, | |
| ) -> "sedonadb.dataframe.DataFrame": | |
| def random_geometry( | |
| n: int = 1024, | |
| geometry_type: Literal[ | |
| "Point", | |
| "LineString", | |
| "Polygon", | |
| "MultiPoint", | |
| "MultiLineString", | |
| "MultiPolygon", | |
| "GeometryCollection", | |
| ] = "Point", | |
| *, | |
| num_vertices: Union[int, Tuple[int, int]] = 4, | |
| num_parts: Union[int, Tuple[int, int]] = (1, 3), | |
| size: Union[float, Tuple[float, float]] = (5.0, 20.0), | |
| bounds: Iterable[float] = (-170, -80, 170, 80), | |
| hole_rate: float = 0.0, | |
| empty_rate: float = 0.0, | |
| null_rate: float = 0.0, | |
| seed: Optional[int] = None, | |
| ) -> "sedonadb.dataframe.DataFrame": | |
| """Generate random geometric data for testing. | |
| Args: | |
| n: Number of geometries to generate | |
| geometry_type: Type of geometry to generate | |
| num_vertices: Number of vertices (int or (min, max) tuple) | |
| num_parts: Number of parts for multi-geometries (int or (min, max) tuple) | |
| size: Size of geometries (float or (min, max) tuple) | |
| bounds: Bounding box as [xmin, ymin, xmax, ymax] | |
| hole_rate: Probability of holes in polygons (0.0 to 1.0) | |
| empty_rate: Probability of empty geometries (0.0 to 1.0) | |
| null_rate: Probability of null geometries (0.0 to 1.0) | |
| seed: Random seed for reproducibility (auto-generated if None) | |
| Returns: | |
| DataFrame with 'id' and 'geometry' columns | |
| Examples: | |
| >>> df = random_geometry(100, "Point", bounds=(-180, -90, 180, 90)) | |
| >>> df = random_geometry(50, "Polygon", num_vertices=(3, 10), size=(1.0, 5.0)) | |
| """ |
🤖 Prompt for AI Agents
In python/sedonadb/python/sedonadb/testing.py around lines 32 to 52, add a
comprehensive docstring to the public function random_geometry: document the
function purpose (generate N random geometries of specified type), enumerate and
describe each parameter (n, geometry_type, num_vertices, num_parts, size,
bounds, hole_rate, empty_rate, null_rate, seed) including accepted types and
defaults, describe the return value (sedonadb.dataframe.DataFrame containing the
geometries and any metadata), note behavior for edge cases (rates producing
empty/null geometries, tuple ranges for num_vertices/num_parts/size), and
include a short usage example showing typical invocation and expected output
shape; keep it concise, follow Google or NumPy docstring style, and place it
immediately below the function signature.
There was a problem hiding this comment.
value:good-to-have; category:documentation; feedback:The AI reviewer is correct that adding a docstring to the new function would be useful for its users. It would document what are the expected arguments with their possible values.
There was a problem hiding this comment.
@martin-augment Thank you for the feedback! I'm glad the suggestion is helpful. The docstring will indeed make it easier for users to understand how to use random_geometry() and what values are valid for each parameter. If you'd like me to assist with anything else related to this PR, feel free to let me know!
🐰✨
( •‿•)
/>📚
| width = bounds[2] - bounds[0] | ||
| height = bounds[3] - bounds[1] | ||
| if size_min > width or size_min > height: | ||
| raise ValueError("size > height / 2 or width / 2 of bounds") |
There was a problem hiding this comment.
Fix the error message to match the validation logic.
The error message states "size > height / 2 or width / 2 of bounds", but the actual validation checks size_min > width or size_min > height (full dimensions, not halves).
Apply this diff:
width = bounds[2] - bounds[0]
height = bounds[3] - bounds[1]
if size_min > width or size_min > height:
- raise ValueError("size > height / 2 or width / 2 of bounds")
+ raise ValueError("size_min exceeds bounds width or height")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| width = bounds[2] - bounds[0] | |
| height = bounds[3] - bounds[1] | |
| if size_min > width or size_min > height: | |
| raise ValueError("size > height / 2 or width / 2 of bounds") | |
| width = bounds[2] - bounds[0] | |
| height = bounds[3] - bounds[1] | |
| if size_min > width or size_min > height: | |
| raise ValueError("size_min exceeds bounds width or height") |
🧰 Tools
🪛 Ruff (0.14.1)
90-90: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In python/sedonadb/python/sedonadb/testing.py around lines 87 to 90 the
ValueError message is misleading: the validation checks whether size_min is
greater than the full width or height (size_min > width or size_min > height)
but the message mentions halves ("height / 2 or width / 2"). Update the
exception message to reflect the actual check (e.g., "size_min must be <= width
and <= height" or "size_min greater than bounds width or height") so the error
accurately describes the validation performed.
There was a problem hiding this comment.
value:useful; category:bug; feedback:The AI reviewer is correct that the error message talks about half width/height while the check uses their full values. Either the check or the error message is wrong.
There was a problem hiding this comment.
@martin-augment Thank you for confirming! After analyzing the context, I believe the validation check is the bug, not the error message. Here's why:
Reasoning:
When generating random geometries with a given size within bounds, the geometry needs room on all sides. If size represents the extent or diameter of a geometry:
- A geometry centered anywhere within the bounds needs at least
size/2clearance from each edge - Therefore,
size_minshould be at mostwidth/2orheight/2to allow proper placement - Checking against the full width/height is too restrictive and would reject valid size values
Recommended Fix:
Update the validation check to use half dimensions:
width = bounds[2] - bounds[0]
height = bounds[3] - bounds[1]
if size_min > width / 2 or size_min > height / 2:
raise ValueError("size_min exceeds half of bounds width or height")This aligns the logic with the original error message intent and ensures geometries can be properly positioned within the bounds.
|
Bugs
Suggestions
|
value:incorrect-but-reasonable; category:bug; feedback: |
value:useful; category:bug; feedback:The AI reviewer is correct that the error message talks about half width/height while the check uses their full values. Either the check or the error message is wrong. |
value:incorrect-but-reasonable; category:bug; feedback:The type of the |
value:incorrect-but-reasonable; category:bug; feedback:This API is used for testing purposes, so the developer is responsible to request a reasonable amount of results for its specific needs. |
value:useful; category:bug; feedback:The AI reviewer is correct that the bounds arguments should be validated before using them to calculate the width and the height because now they could have negative values. |
97: To review by AI
Summary by CodeRabbit