Skip to content

Add MinIO to SingleStore CSV loader#252

Open
jeonghoonkang wants to merge 2 commits intomasterfrom
codex/create-code-for-csv-import-and-duration-calculation-x2pk2y
Open

Add MinIO to SingleStore CSV loader#252
jeonghoonkang wants to merge 2 commits intomasterfrom
codex/create-code-for-csv-import-and-duration-calculation-x2pk2y

Conversation

@jeonghoonkang
Copy link
Copy Markdown
Owner

Summary

  • default to dev_id and coll_dt fields when computing storage periods
  • treat empty CSV cells as SQL NULL and document the defaults

Testing

  • python -m py_compile apps/singlestore/minio_to_singlestore.py

https://chatgpt.com/codex/tasks/task_e_68bfbc1b573c8331be6ddad3233951cf

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex Review: Here are some suggestions.

Reply with @codex fix comments to fix any unresolved comments.

About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".

Comment on lines +86 to +95
def insert_rows(conn, table: str, rows: Iterable[dict]) -> None:
"""Insert iterable of dictionaries into the given table."""
rows = list(rows)
if not rows:
return
columns = list(rows[0].keys())
placeholders = ",".join(["%s"] * len(columns))
column_list = ",".join(columns)
sql = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
values = [tuple(row[col] for col in columns) for row in rows]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Escape identifiers when building INSERT statement

The insert statement interpolates the table name and column list directly from environment variables and CSV headers (e.g. INSERT INTO {table} ({column_list})). Because neither the table name nor column names are quoted or validated, a crafted header or configuration can break the query or inject arbitrary SQL. Wrapping identifiers in backticks (and validating them) avoids both syntax errors when reserved words appear and injection when importing untrusted CSV files.

Useful? React with 👍 / 👎.

Comment on lines +101 to +107
def calculate_periods(conn, cfg: DBConfig) -> List[Tuple]:
"""Return list of (id, start_time, end_time, seconds) tuples."""
query = (
f"SELECT {cfg.id_field}, MIN({cfg.time_field}) AS start_time, "
f"MAX({cfg.time_field}) AS end_time, "
f"TIMESTAMPDIFF(SECOND, MIN({cfg.time_field}), MAX({cfg.time_field})) "
f"AS duration_seconds FROM {cfg.table} GROUP BY {cfg.id_field}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Dynamic SELECT query also exposes unescaped identifiers

The SELECT used to compute storage periods inserts cfg.table, cfg.id_field and cfg.time_field directly into the SQL text. If any of these values contain unexpected characters or originate from untrusted input, the query can fail or allow SQL injection. Identifier values should be validated and wrapped in backticks before interpolation so only known table/column names are used.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant