Add chdb skills: DataStore (pandas API) and SQL #14
auxten wants to merge 2 commits into ClickHouse:main
Add two new skills for chdb, the in-process ClickHouse engine for Python:
- chdb-datastore: Pandas-compatible DataStore API. Drop-in pandas replacement with ClickHouse performance, supporting 16+ data sources and cross-source joins.
- chdb-sql: Raw ClickHouse SQL API. Covers chdb.query(), Session, DB-API 2.0, parametrized queries, UDFs, streaming, and all ClickHouse table functions.

Each skill includes SKILL.md, API references, runnable examples, metadata.json, README.md, and a verify_install.py script.
evellasques
left a comment
Really nice addition!
I just added two minor comments (feel free to ignore).
- Fix .where() description: it follows pandas semantics (masks non-matching values with NaN) rather than being an alias for .filter()
- Add .where() usage example showing correct behavior
- Add sorted values assertion to verify_install.py check_sort()
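The distinction in the first item can be illustrated without any library. The sketch below is dependency-free and does not call chdb or pandas; `filter_rows` and `where_mask` are hypothetical helper names used only to contrast the two semantics (row filtering drops non-matching values, while a pandas-style where keeps the shape and masks them with NaN).

```python
import math


def filter_rows(values, predicate):
    """Row filtering: keep only matching values, so the result may be shorter."""
    return [v for v in values if predicate(v)]


def where_mask(values, predicate):
    """pandas-style where: keep the length, replace non-matching values with NaN."""
    return [v if predicate(v) else math.nan for v in values]


def over_30(age):
    return age > 30


ages = [25, 41, 33]

filtered = filter_rows(ages, over_30)  # two values remain
masked = where_mask(ages, over_30)     # three values remain, the first is NaN
```

The practical consequence is that code written against one meaning silently changes row counts if the other is applied, which is why the description fix matters.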
Great suggestions, fixed.
Pull request overview
Adds two new agent skills—chdb-datastore (pandas-compatible DataStore API) and chdb-sql (in-process ClickHouse SQL for Python)—and updates repository docs to advertise and describe these skills alongside existing ClickHouse best-practices guidance.
Changes:
- Introduces `skills/chdb-datastore/` with SKILL definition, API/connectors references, runnable examples, and an install verification script.
- Introduces `skills/chdb-sql/` with SKILL definition, SQL/table-function/API references, runnable examples, and an install verification script.
- Updates root documentation (`README.md`, `AGENTS.md`) to include both new skills in the repo overview/structure.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| skills/chdb-sql/scripts/verify_install.py | Adds an environment verification script for chdb SQL usage. |
| skills/chdb-sql/references/table-functions.md | Documents ClickHouse table functions relevant to chdb SQL usage. |
| skills/chdb-sql/references/sql-functions.md | Provides a quick reference for common ClickHouse SQL functions. |
| skills/chdb-sql/references/api-reference.md | Documents chdb SQL API surface (query/session/dbapi/etc.). |
| skills/chdb-sql/metadata.json | Adds skill metadata (version/org/date/abstract/references). |
| skills/chdb-sql/examples/examples.md | Adds runnable usage examples and expected outputs for SQL workflows. |
| skills/chdb-sql/SKILL.md | Defines activation guidance + quick-start for the SQL skill. |
| skills/chdb-sql/README.md | Maintainer/trigger-phrase overview for the SQL skill. |
| skills/chdb-datastore/scripts/verify_install.py | Adds an environment verification script for DataStore usage. |
| skills/chdb-datastore/references/connectors.md | Documents DataStore connectors across files/cloud/databases/lakes. |
| skills/chdb-datastore/references/api-reference.md | Documents DataStore API surface and pandas-like operations. |
| skills/chdb-datastore/metadata.json | Adds skill metadata (version/org/date/abstract/references). |
| skills/chdb-datastore/examples/examples.md | Adds runnable usage examples and expected outputs for DataStore workflows. |
| skills/chdb-datastore/SKILL.md | Defines activation guidance + quick-start for the DataStore skill. |
| skills/chdb-datastore/README.md | Maintainer/trigger-phrase overview for the DataStore skill. |
| README.md | Updates repo overview and skill list to include chdb skills. |
| AGENTS.md | Updates repository structure documentation to include both new skills. |
    def check_python_version():
        assert sys.version_info >= (3, 9), f"Python 3.9+ required, got {sys.version}"
    sess = chs.Session()
    sess.query("CREATE TABLE _verify_test (id UInt64) ENGINE = Memory")
    sess.query("INSERT INTO _verify_test VALUES (1), (2), (3)")
    result = sess.query("SELECT count() AS cnt FROM _verify_test")
    data = result.data()
    assert "3" in data, f"Expected '3' in output, got: {data!r}"
    sess.close()
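The substring check in the excerpt above would also pass for outputs such as "13" or "30". A stricter variant could parse the value before comparing. The sketch below is not part of the PR; it assumes `result.data()` returns CSV text such as `"3\n"` (CSV being the default output format), and `parse_single_count` is a hypothetical helper name.

```python
import csv
import io


def parse_single_count(csv_text):
    """Parse a one-row, one-column CSV result into an int."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) != 1 or len(rows[0]) != 1:
        raise ValueError(f"expected exactly one value, got: {csv_text!r}")
    return int(rows[0][0])


# Assumed shape of the CSV output for SELECT count() over three rows.
count = parse_single_count("3\n")

# A wrong count that the substring check would still accept.
substring_accepts_wrong_value = "3" in "13\n"
parsed_wrong_value = parse_single_count("13\n")
```

With this helper the assertion becomes an exact comparison (`count == 3`) rather than a containment test.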
```
1. One-off query on files or databases → chdb.query()
2. Multi-step analysis with tables → Session
3. DB-API 2.0 connection → chdb.connect()
```
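For readers unfamiliar with the third option, DB-API 2.0 (PEP 249) is the standard connect/cursor/execute/fetch shape shared by Python database drivers. The sketch below shows that shape using the standard library's `sqlite3` purely as a stand-in; it is not chdb code, and chdb's SQL dialect and placeholder style may differ from what is shown here. The `events` table is made up for illustration.

```python
import sqlite3  # stdlib stand-in for illustration; NOT chdb

# connect -> cursor -> execute -> fetch -> close
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE events (id INTEGER, kind TEXT)")
cur.executemany(
    "INSERT INTO events VALUES (?, ?)",  # '?' is sqlite3's placeholder style
    [(1, "click"), (2, "view"), (3, "click")],
)

cur.execute("SELECT kind, count(*) FROM events GROUP BY kind ORDER BY kind")
rows = cur.fetchall()  # list of tuples, one per result row

cur.close()
conn.close()
```

Code written against this shape tends to port between drivers with small changes, which is the usual reason to pick a DB-API connection over a library-specific call.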
```python
chdb.query(sql, output_format="CSV", path="", udf_path="", params=None)
```

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `sql` | str | _(required)_ | ClickHouse SQL query |
| `output_format` | str | `"CSV"` | Output format (see [Output Formats](#output-formats)) |
| `path` | str | `""` | Database path (empty = in-memory, no state) |
| `udf_path` | str | `""` | Path for UDF scripts |
| `params` | dict | `None` | Named parameters (see [Parametrized Queries](#parametrized-queries)) |
    # Carol,177
    # Bob,178
```python
from chdb import session as chs
sess = chs.Session("./analytics_db")  # persistent; Session() for in-memory
```
pjhampton
left a comment
Generally, this is a good addition. You should know that we can't support it in our code product yet, though we should be able to soon. The CI of this project also executes the code samples so that we catch skills going stale; I recommend looking into that in a follow-up PR.
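One way to approach executing code samples is to pull the Python fences out of each markdown file and run them, reporting any that raise. The sketch below is an assumption about how such a check could look, not this project's actual CI; `extract_python_blocks` and `run_blocks` are hypothetical names, and a real runner would likely isolate each sample in a subprocess.

```python
import re

# Matches fenced blocks whose info string is exactly "python".
PY_BLOCK = re.compile(
    r"^`{3}python[ \t]*\n(.*?)^`{3}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_python_blocks(markdown_text):
    """Return the source of every python fence, in document order."""
    return [match.group(1) for match in PY_BLOCK.finditer(markdown_text)]


def run_blocks(blocks):
    """Execute each block in a fresh namespace; return (index, error) per failure."""
    failures = []
    for index, source in enumerate(blocks):
        try:
            exec(compile(source, f"<block {index}>", "exec"), {})
        except Exception as exc:
            failures.append((index, repr(exc)))
    return failures


fence = "`" * 3  # built at runtime so this example contains no literal fence
sample = (
    "Intro text\n"
    f"{fence}python\nx = 1 + 1\nassert x == 2\n{fence}\n"
    "More text\n"
    f"{fence}python\nraise ValueError('stale sample')\n{fence}\n"
)

blocks = extract_python_blocks(sample)
failures = run_blocks(blocks)
```

Running each block in its own namespace keeps samples independent, at the cost of requiring every sample to carry its own imports.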
Summary
- Add chdb-datastore skill: Pandas-compatible DataStore API for chdb. Drop-in pandas replacement backed by ClickHouse (`import chdb.datastore as pd`), supporting 16+ data sources (MySQL, PostgreSQL, S3, MongoDB, ClickHouse, Iceberg, Delta Lake, etc.) and 10+ file formats with cross-source joins.
- Add chdb-sql skill: In-process ClickHouse SQL API for Python. Covers `chdb.query()`, Session, DB-API 2.0, parametrized queries, UDFs, streaming, and all ClickHouse table functions.
- Update root `README.md` and `AGENTS.md` to include the new skills.

Skill Structure
Each skill follows the agent-skills format with:
- `SKILL.md`
- `metadata.json`
- `README.md`
- `references/*.md`
- `examples/examples.md`
- `scripts/verify_install.py`

Why Two Skills?
The skills are split by usage pattern so agents load only what's relevant.
Both cross-reference each other in their SKILL.md so the agent knows when to switch.

Test Plan
- `SKILL.md` frontmatter parses correctly (name, description, license, metadata fields)
- `python scripts/verify_install.py` in both skill directories (requires `pip install chdb`)
- `npx skills add` and verify agent activation on trigger phrases
- `npx skills add auxten/agent-skills --list` correctly detected all 3 skills (chdb-datastore, chdb-sql, clickhouse-best-practices)
- `npx skills add --all` installed 3 skills to 42 agent directories via symlinks
- `.claude/skills/`, `.windsurf/skills/`, etc. contain correct symlinks
- `skills-lock.json` generated with correct source and hashes