-
Notifications
You must be signed in to change notification settings - Fork 18
refactor: add helpers for clients #111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Signed-off-by: Chojan Shang <psiace@apache.org>
📝 WalkthroughWalkthroughCentralized client initialization and password handling; added BaseClient database management APIs (create/get/delete/list); moved SQL parameter rendering and query-fetch decision logic to shared utilities; embedded and server clients now delegate database operations to BaseClient and use unified SQL rendering/fetch logic. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/pyseekdb/client/sql_utils.py (1)
60-66: Risky auto-detection of hex strings.The heuristic at line 64 treats any even-length string containing only hex characters as a hex value for UNHEX. This can incorrectly transform legitimate string data (e.g., UUIDs without hyphens like
"abc123def456...", short hex-like words like"DEAD","CAFE").Consider requiring an explicit marker or separate method for hex values rather than auto-detecting based on content:
# Example false positive: user's document ID "DEADBEEF" would be treated as hex >>> stringifier.stringify_value("DEADBEEF") "UNHEX('DEADBEEF')" # Wrong! User wanted the literal string
🤖 Fix all issues with AI agents
In @src/pyseekdb/client/client_base.py:
- Around line 389-393: The get_database function constructs sql by interpolating
the name variable directly into the SQL string (sql) causing SQL injection;
change it to use a parameterized query or properly escape the identifier before
building the query (e.g., use the DB API parameter binding on the cursor.execute
call or an escape function provided by the connection) so that the SCHEMA_NAME
comparison is passed as a bound parameter instead of f"…{name}…"; update the
code around get_database where sql is defined and where the query is executed to
use the parameter tuple (or escape_string) rather than string interpolation.
In @src/pyseekdb/client/sql_utils.py:
- Around line 27-41: In render_sql_with_params, handle bytes explicitly and
avoid fragile sequential replace: detect bytes in params (e.g., convert to a
safe SQL literal by decoding with errors='replace' or converting to hex and then
escaping) instead of using str(param) which yields b'...'; then perform
substitution by splitting the original sql on "%s" and interleaving the
escaped/replaced parameter literals (or validate that count of "%s" matches
len(params)) to produce the final string, rather than repeatedly calling
rendered.replace("%s", ..., 1) which can be corrupted if a replacement contains
the substring "%s".
🧹 Nitpick comments (2)
src/pyseekdb/client/client_seekdb_embedded.py (1)
22-22: Unused import:is_query_sql.
is_query_sqlis imported but not directly used in this file. The query detection is handled viaself._should_fetch_results()which is inherited from the base class.Proposed fix
-from .sql_utils import is_query_sql, render_sql_with_params +from .sql_utils import render_sql_with_paramssrc/pyseekdb/client/client_base.py (1)
442-446: Consider using parameterized queries for LIMIT and OFFSET values.While type hints indicate
Optional[int]for both parameters, they are not enforced at runtime. Using f-string interpolation with these values works correctly for integers but would be more robust with parameterized queries. If type checking is added to the project later, consider refactoring to use parameter placeholders instead of f-string interpolation.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/pyseekdb/client/__init__.pysrc/pyseekdb/client/client_base.pysrc/pyseekdb/client/client_seekdb_embedded.pysrc/pyseekdb/client/client_seekdb_server.pysrc/pyseekdb/client/sql_utils.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-22T12:44:57.424Z
Learnt from: longbingljw
Repo: oceanbase/pyseekdb PR: 91
File: tests/integration_tests/test_collection_dml.py:36-36
Timestamp: 2025-12-22T12:44:57.424Z
Learning: In tests/integration_tests/test_collection_dml.py and similar test files, using `db_client._server._execute()` to run SQL commands directly (e.g., CREATE TABLE statements) is acceptable for integration tests that need custom table setup with specific vector index configurations.
Applied to files:
src/pyseekdb/client/client_seekdb_embedded.pysrc/pyseekdb/client/client_seekdb_server.py
📚 Learning: 2025-12-22T12:33:52.528Z
Learnt from: longbingljw
Repo: oceanbase/pyseekdb PR: 91
File: tests/integration_tests/test_admin_database_management.py:26-39
Timestamp: 2025-12-22T12:33:52.528Z
Learning: In tests/integration_tests/test_admin_database_management.py, using `isinstance(admin_client._server, pyseekdb.SeekdbEmbeddedClient)` to detect client type is acceptable. AdminClient should not expose a public `mode` property.
Applied to files:
src/pyseekdb/client/client_seekdb_embedded.pysrc/pyseekdb/client/__init__.pysrc/pyseekdb/client/client_seekdb_server.py
🧬 Code graph analysis (5)
src/pyseekdb/client/sql_utils.py (1)
src/pyseekdb/client/base_connection.py (1)
_execute(31-33)
src/pyseekdb/client/client_seekdb_embedded.py (2)
src/pyseekdb/client/sql_utils.py (2)
is_query_sql(15-24)render_sql_with_params(27-41)src/pyseekdb/client/client_base.py (1)
_should_fetch_results(1817-1821)
src/pyseekdb/client/__init__.py (3)
src/pyseekdb/client/collection.py (1)
get(364-428)src/pyseekdb/client/client_seekdb_embedded.py (1)
SeekdbEmbeddedClient(27-284)src/pyseekdb/client/client_seekdb_server.py (1)
RemoteServerClient(19-207)
src/pyseekdb/client/client_base.py (4)
src/pyseekdb/client/base_connection.py (3)
BaseConnection(7-65)_execute(31-33)_ensure_connection(16-18)src/pyseekdb/client/admin_client.py (9)
AdminAPI(34-94)create_database(41-49)create_database(116-118)get_database(52-63)get_database(120-122)delete_database(66-74)delete_database(124-126)list_databases(77-94)list_databases(128-135)src/pyseekdb/client/database.py (1)
Database(7-54)src/pyseekdb/client/sql_utils.py (1)
is_query_sql(15-24)
src/pyseekdb/client/client_seekdb_server.py (1)
src/pyseekdb/client/client_base.py (5)
create_database(365-377)get_database(379-408)delete_database(410-422)list_databases(424-463)_database_tenant(341-343)
🪛 Ruff (0.14.10)
src/pyseekdb/client/__init__.py
123-126: Avoid specifying long messages outside the exception class
(TRY003)
src/pyseekdb/client/client_base.py
341-341: Unused method argument: tenant
(ARG002)
390-392: Possible SQL injection vector through string-based query construction
(S608)
397-397: Avoid specifying long messages outside the exception class
(TRY003)
401-401: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: integration-test (server)
- GitHub Check: integration-test (oceanbase)
- GitHub Check: integration-test (embedded)
🔇 Additional comments (12)
src/pyseekdb/client/client_seekdb_server.py (2)
133-196: LGTM! Clean delegation to base class.The database management methods now properly delegate to the superclass, reducing code duplication while preserving the tenant context through the
_database_tenantoverride below.
198-203: LGTM! Proper tenant resolution for remote server mode.The
_database_tenantoverride correctly enforces the client's tenant for all database operations, with an appropriate warning when the caller specifies a different tenant.src/pyseekdb/client/sql_utils.py (1)
15-24: LGTM! Query detection helper.The function correctly identifies common query statement types. The empty string guard at line 16-17 prevents exceptions from
strip()on None.src/pyseekdb/client/__init__.py (3)
57-64: LGTM! Clean helper functions for password and path resolution.These helpers centralize the environment variable fallback and default path logic, reducing duplication between
Client()andAdminClient().
67-126: LGTM! Well-structured factory consolidation.The
_create_server_clienthelper cleanly centralizes the embedded vs. remote client selection logic. The late import of_PYLIBSEEKDB_AVAILABLEat line 113 appropriately avoids circular import issues while keeping the fallback behavior localized.
221-232: LGTM! Clean delegation to shared helpers.Both
Client()andAdminClient()now route through_resolve_passwordand_create_server_client, reducing code duplication while maintaining existing behavior.Also applies to: 293-305
src/pyseekdb/client/client_seekdb_embedded.py (2)
145-153: LGTM! Clean delegation to shared SQL utilities.The use of
render_sql_with_paramsand_should_fetch_resultsproperly centralizes the SQL handling logic, removing duplication from the embedded client.
236-280: LGTM! Consistent delegation pattern.The database management methods correctly delegate to the superclass, matching the pattern used in
RemoteServerClient.src/pyseekdb/client/client_base.py (4)
341-346: LGTM! Base tenant resolution with override hook.The base implementation returning
Noneallows subclasses to provide tenant-specific behavior while_database_contexthandles the logging format cleanly.
348-363: LGTM! Robust schema row parsing.The
_parse_schema_rowhelper handles both dict and tuple/list row formats, providing flexibility for different cursor implementations.
1571-1584: LGTM! Unified fetch decision logic.The
_should_fetch_resultsintegration in both context-manager and try/finally branches ensures consistent behavior for query vs non-query statements.
1817-1841: LGTM! Well-structured execution helpers.The
_should_fetch_resultsmethod provides a sensible fallback chain (cursor.description → SQL prefix detection), and_executeproperly handles both cursor management styles while returning results only for queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @src/pyseekdb/client/client_base.py:
- Around line 365-377: The create_database method interpolates the raw database
name into SQL and is vulnerable to identifier injection; fix it by validating
and/or escaping identifier characters before building the SQL (e.g., disallow
unsafe characters or replace any backtick with doubled backticks), add a
reusable helper like _escape_identifier(name: str) and use it when composing SQL
in create_database and the other affected routines noted (lines ~412-424), and
continue to call self._execute(sql) with the sanitized identifier to prevent
injection.
🧹 Nitpick comments (2)
src/pyseekdb/client/sql_utils.py (1)
36-38: Consider handling bytes decode errors more explicitly.Using
errors="replace"silently converts invalid UTF-8 sequences to replacement characters (U+FFFD). This could mask data corruption issues. If bytes parameters are expected to be valid UTF-8 text, consider usingerrors="strict"(the default) and letting the exception propagate to surface encoding issues early.♻️ Suggested change
elif isinstance(param, (bytes, bytearray, memoryview)): - text = bytes(param).decode("utf-8", errors="replace") + text = bytes(param).decode("utf-8") # Let UnicodeDecodeError propagate replacement = f"'{escape_string(text)}'"src/pyseekdb/client/client_base.py (1)
444-448: Consider validating LIMIT/OFFSET parameters.While
limitandoffsetare typed asOptional[int], adding runtime validation would provide defense-in-depth against unexpected values (e.g., negative numbers).♻️ Suggested validation
if limit is not None: + if not isinstance(limit, int) or limit < 0: + raise ValueError("limit must be a non-negative integer") if offset is not None: + if not isinstance(offset, int) or offset < 0: + raise ValueError("offset must be a non-negative integer") sql += f" LIMIT {offset}, {limit}" else: sql += f" LIMIT {limit}"
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/pyseekdb/client/client_base.pysrc/pyseekdb/client/client_seekdb_embedded.pysrc/pyseekdb/client/sql_utils.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-22T12:44:57.424Z
Learnt from: longbingljw
Repo: oceanbase/pyseekdb PR: 91
File: tests/integration_tests/test_collection_dml.py:36-36
Timestamp: 2025-12-22T12:44:57.424Z
Learning: In tests/integration_tests/test_collection_dml.py and similar test files, using `db_client._server._execute()` to run SQL commands directly (e.g., CREATE TABLE statements) is acceptable for integration tests that need custom table setup with specific vector index configurations.
Applied to files:
src/pyseekdb/client/client_seekdb_embedded.py
📚 Learning: 2025-12-22T12:33:52.528Z
Learnt from: longbingljw
Repo: oceanbase/pyseekdb PR: 91
File: tests/integration_tests/test_admin_database_management.py:26-39
Timestamp: 2025-12-22T12:33:52.528Z
Learning: In tests/integration_tests/test_admin_database_management.py, using `isinstance(admin_client._server, pyseekdb.SeekdbEmbeddedClient)` to detect client type is acceptable. AdminClient should not expose a public `mode` property.
Applied to files:
src/pyseekdb/client/client_seekdb_embedded.py
🧬 Code graph analysis (2)
src/pyseekdb/client/client_seekdb_embedded.py (2)
src/pyseekdb/client/sql_utils.py (1)
render_sql_with_params(23-47)src/pyseekdb/client/client_base.py (5)
_should_fetch_results(1819-1823)create_database(365-377)get_database(379-410)delete_database(412-424)list_databases(426-465)
src/pyseekdb/client/client_base.py (6)
src/pyseekdb/client/base_connection.py (3)
BaseConnection(7-65)_execute(31-33)_ensure_connection(16-18)src/pyseekdb/client/admin_client.py (9)
AdminAPI(34-94)create_database(41-49)create_database(116-118)get_database(52-63)get_database(120-122)delete_database(66-74)delete_database(124-126)list_databases(77-94)list_databases(128-135)src/pyseekdb/client/database.py (1)
Database(7-54)src/pyseekdb/client/sql_utils.py (1)
is_query_sql(11-20)src/pyseekdb/client/client_seekdb_server.py (6)
_database_tenant(198-203)create_database(134-145)get_database(147-161)_ensure_connection(69-85)delete_database(163-174)list_databases(176-196)src/pyseekdb/client/client_seekdb_embedded.py (7)
create_database(236-244)get_database(246-254)_ensure_connection(72-95)_use_context_manager_for_cursor(116-121)_execute_query_with_cursor(123-206)delete_database(256-264)list_databases(266-280)
🪛 Ruff (0.14.10)
src/pyseekdb/client/client_base.py
341-341: Unused method argument: tenant
(ARG002)
399-399: Avoid specifying long messages outside the exception class
(TRY003)
403-403: Avoid specifying long messages outside the exception class
(TRY003)
src/pyseekdb/client/sql_utils.py
29-31: Avoid specifying long messages outside the exception class
(TRY003)
33-33: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: integration-test (server)
- GitHub Check: integration-test (oceanbase)
- GitHub Check: integration-test (embedded)
🔇 Additional comments (11)
src/pyseekdb/client/sql_utils.py (2)
11-20: LGTM!The
is_query_sqlfunction correctly identifies query statements by checking for common SQL prefixes. The logic handles empty strings and uses case-insensitive matching via.upper().
23-47: The split-based placeholder approach is acceptable for this use case, as SQL strings are internally controlled.The
sql.split("%s")implementation is not a practical concern here becauserender_sql_with_paramsreceives only internally-constructed SQL—never user-supplied input. All SQL construction in the codebase uses programmatic building (FilterBuilder, f-strings) and does not include literal%sstrings. No changes needed for the placeholder detection logic.The suggestion to add
strict=Trueto thezip()call on line 33 remains valid but minor, given that placeholder counts are already validated beforehand.src/pyseekdb/client/client_seekdb_embedded.py (3)
143-157: LGTM!The refactored
_execute_query_with_cursorcleanly integrates with the new shared utilities:
- Uses
render_sql_with_paramsfor parameter embedding- Uses
_should_fetch_resultsto determine fetch behavior- Correctly returns an empty list for non-query statements
This maintains behavior consistency while reducing code duplication.
236-280: LGTM!Clean delegation to base class implementations. The docstrings appropriately document that the
tenantparameter is ignored for embedded mode, maintaining API consistency while deferring implementation to the shared base class.
8-8: Verify ifescape_stringimport is still needed.With parameter rendering now delegated to
render_sql_with_paramsfromsql_utils, theescape_stringimport on line 8 may be unused in this file. If confirmed, removing it would reduce unnecessary imports.#!/bin/bash # Check if escape_string is used elsewhere in client_seekdb_embedded.py rg -n "escape_string" src/pyseekdb/client/client_seekdb_embedded.pyAlso applies to: 22-22
src/pyseekdb/client/client_base.py (6)
9-9: LGTM!Import additions are appropriate for the new functionality:
Sequencefor type hintsDatabaseclass for return typesis_query_sqlfor fetch decision logicAlso applies to: 16-16, 34-35
341-363: LGTM!The helper methods are well-designed:
_database_tenantprovides a base implementation for subclasses to override (the unusedtenantparameter is intentional for API consistency)_database_contextgenerates context strings for logging_parse_schema_rowhandles both dict and tuple row formats robustlyThe static analysis hint ARG002 is a false positive since the parameter enables subclass override patterns.
379-410: LGTM!The
get_databasemethod correctly uses parameterized queries (line 392-396) to prevent SQL injection. The error handling appropriately raisesValueErrorwhen the database is not found.
1819-1823: LGTM!The
_should_fetch_resultsmethod uses a robust two-tier approach:
- First checks
cursor.description(the most reliable indicator)- Falls back to
is_query_sql()when description is unavailableThis correctly handles both standard DB-API cursors and embedded cursors that may not populate description.
1825-1843: LGTM!The new
_executemethod provides a clean, centralized implementation:
- Consolidates connection and cursor management
- Uses
_should_fetch_resultsfor consistent fetch behavior- Supports both context manager and manual cursor patterns
- Returns
Nonefor non-query statements, avoiding unnecessary fetch attempts
1571-1594: LGTM!The
_execute_query_with_cursormethod now consistently uses_should_fetch_resultsin both the context manager and try/finally paths. Returning an empty list for non-query statements aligns with theList[Dict[str, Any]]return type and prevents errors when callers iterate over results.
Signed-off-by: Chojan Shang <psiace@apache.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/pyseekdb/client/client_seekdb_embedded.py (1)
8-8: Remove unused import.
escape_stringfrompymysql.convertersis not used in this file. Parameter escaping is handled entirely byrender_sql_with_paramsfromsql_utils.py(line 145).
🤖 Fix all issues with AI agents
In @src/pyseekdb/client/sql_utils.py:
- Around line 39-40: The branch handling numeric parameters currently converts
floats with replacement = str(param) which yields 'inf'/'nan' for special float
values; update the numeric handling (the isinstance(param, (int, float)) branch
that sets replacement) to check float finiteness (use math.isfinite(param) or
math.isinf/math.isnan) and raise a clear error (e.g., ValueError) when param is
not a finite number, otherwise continue using str(param) for valid ints/floats
so only finite numeric literals are emitted.
🧹 Nitpick comments (2)
src/pyseekdb/client/sql_utils.py (1)
33-46: Addstrict=Trueto zip for defensive consistency.While parameter count is validated above, adding
strict=Trueprovides an additional safety net and satisfies linter B905.Suggested fix
- for param, part in zip(params, parts[1:]): + for param, part in zip(params, parts[1:], strict=True):src/pyseekdb/client/client_seekdb_embedded.py (1)
164-199: Regex-based column extraction has known edge cases.The parentheses-depth tracking doesn't account for:
- String literals containing commas (e.g.,
SELECT 'a,b' AS col)- Nested subqueries in SELECT clause
- CASE expressions with commas
This is acceptable as a fallback when
cursor.descriptionis unavailable, but consider adding a comment documenting these limitations for future maintainers.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/pyseekdb/client/client_seekdb_embedded.pysrc/pyseekdb/client/sql_utils.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-22T12:44:57.424Z
Learnt from: longbingljw
Repo: oceanbase/pyseekdb PR: 91
File: tests/integration_tests/test_collection_dml.py:36-36
Timestamp: 2025-12-22T12:44:57.424Z
Learning: In tests/integration_tests/test_collection_dml.py and similar test files, using `db_client._server._execute()` to run SQL commands directly (e.g., CREATE TABLE statements) is acceptable for integration tests that need custom table setup with specific vector index configurations.
Applied to files:
src/pyseekdb/client/client_seekdb_embedded.py
📚 Learning: 2025-12-22T12:33:52.528Z
Learnt from: longbingljw
Repo: oceanbase/pyseekdb PR: 91
File: tests/integration_tests/test_admin_database_management.py:26-39
Timestamp: 2025-12-22T12:33:52.528Z
Learning: In tests/integration_tests/test_admin_database_management.py, using `isinstance(admin_client._server, pyseekdb.SeekdbEmbeddedClient)` to detect client type is acceptable. AdminClient should not expose a public `mode` property.
Applied to files:
src/pyseekdb/client/client_seekdb_embedded.py
🧬 Code graph analysis (1)
src/pyseekdb/client/client_seekdb_embedded.py (2)
src/pyseekdb/client/sql_utils.py (1)
render_sql_with_params(23-47)src/pyseekdb/client/client_base.py (1)
_should_fetch_results(1817-1821)
🪛 Ruff (0.14.10)
src/pyseekdb/client/sql_utils.py
29-31: Avoid specifying long messages outside the exception class
(TRY003)
33-33: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: integration-test (embedded)
- GitHub Check: integration-test (oceanbase)
- GitHub Check: integration-test (server)
🔇 Additional comments (3)
src/pyseekdb/client/sql_utils.py (1)
11-20: LGTM!The query detection logic correctly identifies SELECT, SHOW, DESCRIBE, and DESC statements. The
.strip().upper()handling ensures case-insensitivity and whitespace tolerance.src/pyseekdb/client/client_seekdb_embedded.py (2)
145-153: LGTM!Clean delegation to the shared utilities
render_sql_with_paramsand_should_fetch_results. This aligns well with the PR objective to centralize SQL handling and reduce duplication.
236-280: LGTM!Clean delegation to the base class for database management operations. The docstrings appropriately document that the
tenantparameter is not meaningful for embedded mode while still forwarding it to maintain a consistent interface.
Summary
Summary by CodeRabbit
New Features
Utilities
Refactor
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.