97: feat(python/sedonadb): Expose random_geometry as a Python function #3
base: main
    @@ -14,11 +14,11 @@
     # KIND, either express or implied. See the License for the
     # specific language governing permissions and limitations
     # under the License.
    -import os
     import math
    +import os
     import warnings
     from pathlib import Path
    -from typing import TYPE_CHECKING, List, Tuple
    +from typing import TYPE_CHECKING, Iterable, List, Literal, Optional, Tuple, Union
     
     import geoarrow.pyarrow as ga
     import pyarrow as pa
    @@ -29,6 +29,85 @@
     import sedonadb
     
     
    +def random_geometry(
    +    n: int = 1024,
    +    geometry_type: Literal[
    +        "Point",
    +        "LineString",
    +        "Polygon",
    +        "MultiPoint",
    +        "MultiLineString",
    +        "MultiPolygon",
    +        "GeometryCollection",
    +    ] = "Point",
    +    *,
    +    num_vertices: Union[int, Tuple[int, int]] = 4,
    +    num_parts: Union[int, Tuple[int, int]] = (1, 3),
    +    size: Union[float, Tuple[float, float]] = (5.0, 20.0),
    +    bounds: Iterable[float] = (-170, -80, 170, 80),
    +    hole_rate: float = 0.0,
    +    empty_rate: float = 0.0,
    +    null_rate: float = 0.0,
    +    seed: Optional[int] = None,
    +) -> "sedonadb.dataframe.DataFrame":
    +    import json
    +    import time
    +
    +    import sedonadb
    +
    +    if isinstance(num_vertices, tuple):
    +        num_vertices_min, num_vertices_max = num_vertices
    +    else:
    +        num_vertices_min = num_vertices_max = num_vertices
    +
    +    if isinstance(num_parts, tuple):
    +        num_parts_min, num_parts_max = num_parts
    +    else:
    +        num_parts_min = num_parts_max = num_parts
    +
    +    if isinstance(size, tuple):
    +        size_min, size_max = size
    +    else:
    +        size_min = size
    +        size_max = size + size / 1e3
    +
    +    if num_vertices_min > num_vertices_max:
    +        raise ValueError("num_vertices_min > num_vertices_max")
    +    if num_parts_min > num_parts_max:
    +        raise ValueError("num_parts_min > num_parts_max")
    +    if size_min > size_max:
    +        raise ValueError("size_min > size_max")
    +
    +    bounds = [float(b) for b in bounds]
    +    if len(bounds) != 4:
    +        raise ValueError(
    +            f"Expected bounds as [xmin, ymin, xmax, ymax] but got {bounds}"
    +        )
    +
    +    width = bounds[2] - bounds[0]

Review comment:
The validation error message doesn't match the check: the condition compares

Reply (Owner, Author):
value:useful; category:bug; feedback:The AI reviewer is correct that the error message talks about half width/height while the check uses their full values. Either the check or the error message is wrong.

    +    height = bounds[3] - bounds[1]
    +    if size_min > width or size_min > height:
    +        raise ValueError("size > height / 2 or width / 2 of bounds")

Comment on lines +87 to +90

Fix the error message to match the validation logic.

The error message states "size > height / 2 or width / 2 of bounds", but the actual validation checks

Apply this diff:

     width = bounds[2] - bounds[0]
     height = bounds[3] - bounds[1]
     if size_min > width or size_min > height:
    -    raise ValueError("size > height / 2 or width / 2 of bounds")
    +    raise ValueError("size_min exceeds bounds width or height")

🧰 Tools: 🪛 Ruff (0.14.1) 90-90: Avoid specifying long messages outside the exception class (TRY003)

Reply (Owner, Author):
value:useful; category:bug; feedback:The AI reviewer is correct that the error message talks about half width/height while the check uses their full values. Either the check or the error message is wrong.

Reply:
@martin-augment Thank you for confirming! After analyzing the context, I believe the validation check is the bug, not the error message. Here's why:

Reasoning:

Recommended Fix:

    width = bounds[2] - bounds[0]
    height = bounds[3] - bounds[1]
    if size_min > width / 2 or size_min > height / 2:
        raise ValueError("size_min exceeds half of bounds width or height")

This aligns the logic with the original error message intent and ensures geometries can be properly positioned within the bounds.
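
The two readings discussed in this thread can be compared side by side. The following is a minimal sketch only: `check_size_fits` and its `half_extent` flag are hypothetical names introduced for illustration and are not part of sedonadb or of this PR. It assumes bounds are given as `(xmin, ymin, xmax, ymax)`, as in the diff.

```python
from typing import Sequence


def check_size_fits(
    size_min: float, bounds: Sequence[float], *, half_extent: bool
) -> None:
    """Raise ValueError if size_min cannot fit inside bounds.

    Hypothetical helper for illustration; not part of sedonadb.
    """
    xmin, ymin, xmax, ymax = (float(b) for b in bounds)
    width = xmax - xmin
    height = ymax - ymin
    # half_extent=True follows the wording of the original error message;
    # half_extent=False follows the check as written in the PR.
    limit_w = width / 2 if half_extent else width
    limit_h = height / 2 if half_extent else height
    if size_min > limit_w or size_min > limit_h:
        what = "half of the bounds" if half_extent else "the bounds"
        raise ValueError(
            f"size_min ({size_min}) exceeds {what} "
            f"width ({limit_w}) or height ({limit_h})"
        )
```

With bounds of `(0, 0, 10, 10)` and `size_min=6.0`, the full-extent check accepts the input while the half-extent check rejects it, which is exactly the gap between the message and the condition that the reviewers point out. Which variant is correct depends on how the underlying generator places geometries, which this thread does not settle.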
    +
    +    args = {
    +        "bounds": bounds,
    +        "empty_rate": max(min(empty_rate, 1.0), 0.0),
    +        "geom_type": geometry_type,
    +        "null_rate": max(min(null_rate, 1.0), 0.0),
    +        "num_parts_range": [num_parts_min, num_parts_max],
    +        "polygon_hole_rate": max(min(hole_rate, 1.0), 0.0),
    +        "seed": int(seed) if seed is not None else round(time.time() * 1000),
    +        "size_range": [size_min, size_max],
    +        "target_rows": int(n),
    +        "vertices_per_linestring_range": [num_vertices_min, num_vertices_max],
    +    }
    +
    +    sd = sedonadb.connect()
    +    return sd.sql(
    +        f"SELECT id, geometry FROM sd_random_geometry('{json.dumps(args)}')"
    +    ).limit(int(n))
    +
    +
     def skip_if_not_exists(path: Path):
         """Skip a test using pytest.skip() if path does not exist
     
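
The argument handling in the function above repeats the same "scalar or (min, max) tuple" pattern and the same rate clamping several times. As a rough sketch of that logic in isolation, the helpers below (`as_range`, `clamp_rate`, and `build_payload` are hypothetical names, not part of sedonadb or this PR) reproduce the normalization and the JSON payload construction without touching the database. The sketch deliberately leaves out the PR's special case for a scalar `size`, where the upper bound is widened by `size / 1e3`.

```python
import json
from typing import Tuple, Union

Number = Union[int, float]


def as_range(
    value: Union[Number, Tuple[Number, Number]], name: str
) -> Tuple[Number, Number]:
    """Expand a scalar to (value, value) and validate a (min, max) tuple."""
    if isinstance(value, tuple):
        lo, hi = value
    else:
        lo = hi = value
    if lo > hi:
        raise ValueError(f"{name}_min > {name}_max")
    return lo, hi


def clamp_rate(rate: float) -> float:
    """Clamp a probability-like value into [0.0, 1.0]."""
    return max(min(rate, 1.0), 0.0)


def build_payload(num_vertices=4, num_parts=(1, 3), hole_rate=0.0, seed=0) -> str:
    """Build a JSON string shaped like a subset of the PR's args dict."""
    args = {
        "num_parts_range": list(as_range(num_parts, "num_parts")),
        "polygon_hole_rate": clamp_rate(hole_rate),
        "seed": int(seed),
        "vertices_per_linestring_range": list(as_range(num_vertices, "num_vertices")),
    }
    return json.dumps(args)
```

Note that, as in the PR, out-of-range rates are silently clamped rather than rejected, so `hole_rate=1.5` behaves like `1.0`. Passing an explicit `seed` keeps the payload identical across calls, whereas the PR's default derives the seed from the current time.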
🛠️ Refactor suggestion | 🟠 Major
Add a docstring to document the public API.
This public testing utility lacks documentation. Please add a docstring that explains the purpose, parameters, return value, and provides usage examples.
Consider adding documentation like:

     def random_geometry(
         n: int = 1024,
         geometry_type: Literal[
             "Point",
             "LineString",
             "Polygon",
             "MultiPoint",
             "MultiLineString",
             "MultiPolygon",
             "GeometryCollection",
         ] = "Point",
         *,
         num_vertices: Union[int, Tuple[int, int]] = 4,
         num_parts: Union[int, Tuple[int, int]] = (1, 3),
         size: Union[float, Tuple[float, float]] = (5.0, 20.0),
         bounds: Iterable[float] = (-170, -80, 170, 80),
         hole_rate: float = 0.0,
         empty_rate: float = 0.0,
         null_rate: float = 0.0,
         seed: Optional[int] = None,
     ) -> "sedonadb.dataframe.DataFrame":
    +    """Generate random geometric data for testing.
    +
    +    Args:
    +        n: Number of geometries to generate
    +        geometry_type: Type of geometry to generate
    +        num_vertices: Number of vertices (int or (min, max) tuple)
    +        num_parts: Number of parts for multi-geometries (int or (min, max) tuple)
    +        size: Size of geometries (float or (min, max) tuple)
    +        bounds: Bounding box as [xmin, ymin, xmax, ymax]
    +        hole_rate: Probability of holes in polygons (0.0 to 1.0)
    +        empty_rate: Probability of empty geometries (0.0 to 1.0)
    +        null_rate: Probability of null geometries (0.0 to 1.0)
    +        seed: Random seed for reproducibility (auto-generated if None)
    +
    +    Returns:
    +        DataFrame with 'id' and 'geometry' columns
    +
    +    Examples:
    +        >>> df = random_geometry(100, "Point", bounds=(-180, -90, 180, 90))
    +        >>> df = random_geometry(50, "Polygon", num_vertices=(3, 10), size=(1.0, 5.0))
    +    """

value:good-to-have; category:documentation; feedback:The AI reviewer is correct that adding a docstring to the new function would be useful for its users. It would document what are the expected arguments with their possible values.
@martin-augment Thank you for the feedback! I'm glad the suggestion is helpful. The docstring will indeed make it easier for users to understand how to use random_geometry() and what values are valid for each parameter. If you'd like me to assist with anything else related to this PR, feel free to let me know!