
Add chdb skills: DataStore (pandas API) and SQL #14

Open
auxten wants to merge 2 commits into ClickHouse:main from auxten:add-chdb-skills


@auxten (Member) commented Mar 17, 2026

Summary

  • Add chdb-datastore skill: Pandas-compatible DataStore API for chdb. Drop-in pandas replacement backed by ClickHouse (import chdb.datastore as pd), supporting 16+ data sources (MySQL, PostgreSQL, S3, MongoDB, ClickHouse, Iceberg, Delta Lake, etc.) and 10+ file formats with cross-source joins.

  • Add chdb-sql skill: In-process ClickHouse SQL API for Python. Covers chdb.query(), Session, DB-API 2.0, parametrized queries, UDFs, streaming, and all ClickHouse table functions.

  • Update root README.md and AGENTS.md to include the new skills.

Skill Structure

Each skill follows the agent-skills format with:

| File | Purpose |
|------|---------|
| `SKILL.md` | Skill definition with YAML frontmatter and quick-start |
| `metadata.json` | Version, organization, abstract |
| `README.md` | Maintainer guide with trigger phrases |
| `references/*.md` | API references and function docs |
| `examples/examples.md` | Runnable examples with expected output |
| `scripts/verify_install.py` | Environment verification script |
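
The layout above can be checked mechanically. The following is a minimal sketch using only the standard library; `REQUIRED_FILES` and `missing_skill_files` are hypothetical names introduced for illustration and are not part of this PR.

```python
from pathlib import Path

# Files each skill directory is expected to contain, per the table above.
REQUIRED_FILES = [
    "SKILL.md",
    "metadata.json",
    "README.md",
    "examples/examples.md",
    "scripts/verify_install.py",
]


def missing_skill_files(skill_dir: Path) -> list[str]:
    """Return the expected entries that are absent from skill_dir."""
    missing = [rel for rel in REQUIRED_FILES if not (skill_dir / rel).is_file()]
    # references/*.md is a glob pattern, so require at least one match.
    refs = skill_dir / "references"
    if not (refs.is_dir() and any(refs.glob("*.md"))):
        missing.append("references/*.md")
    return missing
```

Calling `missing_skill_files(Path("skills/chdb-sql"))` from the repository root should return an empty list when the skill directory is complete.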

Why Two Skills?

The skills are split by usage pattern so agents load only what's relevant:

  • chdb-datastore activates for pandas-style data analysis (filter, groupby, join, sort — same API as pandas)
  • chdb-sql activates for raw SQL queries (ClickHouse table functions, window functions, sessions, UDFs)

Both cross-reference each other in their SKILL.md so the agent knows when to switch.

Test Plan

  • Verify SKILL.md frontmatter parses correctly (name, description, license, metadata fields)
    • chdb-datastore: name=chdb-datastore, license=Apache-2.0, author=chdb-io, version=4.1
    • chdb-sql: name=chdb-sql, license=Apache-2.0, author=chdb-io, version=4.1
  • Verify all internal links in SKILL.md resolve to their reference files
    • chdb-datastore: 7 links (references/connectors.md, references/api-reference.md, examples/examples.md, scripts/verify_install.py) — all resolved
    • chdb-sql: 7 links (references/table-functions.md, references/sql-functions.md, references/api-reference.md, examples/examples.md) — all resolved
  • Run python scripts/verify_install.py in both skill directories (requires pip install chdb)
    • chdb-datastore: 8/8 checks passed (import chdb, from datastore import DataStore, from chdb.datastore import DataStore, import chdb.datastore as pd, filter, sort, groupby)
    • chdb-sql: 6/6 checks passed (import chdb, basic query, DataFrame output, Session, parametrized query)
  • Install skills via npx skills add and verify agent activation on trigger phrases
    • npx skills add auxten/agent-skills --list correctly detected all 3 skills (chdb-datastore, chdb-sql, clickhouse-best-practices)
    • npx skills add --all installed 3 skills to 42 agent directories via symlinks
    • Verified .claude/skills/, .windsurf/skills/ etc. contain correct symlinks
    • skills-lock.json generated with correct source and hashes
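
The first two items of the test plan (frontmatter parsing and link resolution) could be automated roughly as follows. This is a sketch under stated assumptions: it handles only a small subset of YAML (flat `key: value` pairs plus one level of indented children), it assumes `author` and `version` are nested under a `metadata:` key, and `parse_frontmatter` and `broken_links` are hypothetical helper names rather than code from this PR.

```python
import re
from pathlib import Path


def parse_frontmatter(text: str) -> dict:
    """Parse `key: value` frontmatter, with one level of indented
    children under a bare `key:` line. Not a full YAML parser."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---'")
    result: dict = {}
    parent = None
    for line in lines[1:]:
        if line.strip() == "---":
            return result
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"unsupported frontmatter line: {line!r}")
        key, value = key.strip(), value.strip().strip("\"'")
        if line[0].isspace() and parent is not None:
            result[parent][key] = value  # child of the last bare key
        elif value == "":
            parent = key
            result[key] = {}
        else:
            parent = None
            result[key] = value
    raise ValueError("missing closing '---'")


# Captures the target of [text](target), stopping at ')', '#', or whitespace.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")


def broken_links(skill_md: Path) -> list[str]:
    """Return relative link targets in skill_md that do not exist on disk."""
    targets = LINK_RE.findall(skill_md.read_text(encoding="utf-8"))
    local = [t for t in targets if "://" not in t and not t.startswith("mailto:")]
    return [t for t in local if not (skill_md.parent / t).exists()]
```

Pure in-page anchors such as `(#output-formats)` are skipped by the pattern, and external URLs are filtered out, so only file-relative targets are checked.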

@auxten auxten requested review from a team and doneyli as code owners March 17, 2026 06:37
Add two new skills for chdb, the in-process ClickHouse engine for Python:

- chdb-datastore: Pandas-compatible DataStore API. Drop-in pandas
  replacement with ClickHouse performance, supporting 16+ data sources
  and cross-source joins.

- chdb-sql: Raw ClickHouse SQL API. Covers chdb.query(), Session,
  DB-API 2.0, parametrized queries, UDFs, streaming, and all
  ClickHouse table functions.

Each skill includes SKILL.md, API references, runnable examples,
metadata.json, README.md, and a verify_install.py script.

@evellasques evellasques left a comment

Really nice addition!

I just added two minor comments (feel free to ignore).

- Fix .where() description: it follows pandas semantics (masks
  non-matching values with NaN) rather than being an alias for .filter()
- Add .where() usage example showing correct behavior
- Add sorted values assertion to verify_install.py check_sort()
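
The distinction this commit corrects can be illustrated without DataStore at all. The sketch below uses plain Python lists to contrast the two semantics; `filter_rows` and `where_rows` are hypothetical stand-ins for illustration, not the DataStore API.

```python
import math


def filter_rows(values, predicate):
    """Filter semantics: rows that fail the predicate are dropped."""
    return [v for v in values if predicate(v)]


def where_rows(values, predicate, other=math.nan):
    """pandas-style where semantics: the shape is preserved, and rows
    that fail the predicate are replaced by `other` (NaN by default)."""
    return [v if predicate(v) else other for v in values]


ages = [25, 41, 33]


def over_30(age):
    return age > 30


print(filter_rows(ages, over_30))  # [41, 33]
print(where_rows(ages, over_30))   # [nan, 41, 33]
```

The filtered result is shorter than the input, while the masked result keeps all three positions, which is why describing `.where()` as an alias for `.filter()` would be misleading.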
@auxten (Member, Author) commented Mar 17, 2026

> Really nice addition!
>
> I just added two minor comments (feel free to ignore).

Great suggestions, fixed

Copilot AI left a comment

Pull request overview

Adds two new agent skills—chdb-datastore (pandas-compatible DataStore API) and chdb-sql (in-process ClickHouse SQL for Python)—and updates repository docs to advertise and describe these skills alongside existing ClickHouse best-practices guidance.

Changes:

  • Introduces skills/chdb-datastore/ with SKILL definition, API/connectors references, runnable examples, and an install verification script.
  • Introduces skills/chdb-sql/ with SKILL definition, SQL/table-function/API references, runnable examples, and an install verification script.
  • Updates root documentation (README.md, AGENTS.md) to include both new skills in the repo overview/structure.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.

| File | Description |
|------|-------------|
| skills/chdb-sql/scripts/verify_install.py | Adds an environment verification script for chdb SQL usage. |
| skills/chdb-sql/references/table-functions.md | Documents ClickHouse table functions relevant to chdb SQL usage. |
| skills/chdb-sql/references/sql-functions.md | Provides a quick reference for common ClickHouse SQL functions. |
| skills/chdb-sql/references/api-reference.md | Documents chdb SQL API surface (query/session/dbapi/etc.). |
| skills/chdb-sql/metadata.json | Adds skill metadata (version/org/date/abstract/references). |
| skills/chdb-sql/examples/examples.md | Adds runnable usage examples and expected outputs for SQL workflows. |
| skills/chdb-sql/SKILL.md | Defines activation guidance + quick-start for the SQL skill. |
| skills/chdb-sql/README.md | Maintainer/trigger-phrase overview for the SQL skill. |
| skills/chdb-datastore/scripts/verify_install.py | Adds an environment verification script for DataStore usage. |
| skills/chdb-datastore/references/connectors.md | Documents DataStore connectors across files/cloud/databases/lakes. |
| skills/chdb-datastore/references/api-reference.md | Documents DataStore API surface and pandas-like operations. |
| skills/chdb-datastore/metadata.json | Adds skill metadata (version/org/date/abstract/references). |
| skills/chdb-datastore/examples/examples.md | Adds runnable usage examples and expected outputs for DataStore workflows. |
| skills/chdb-datastore/SKILL.md | Defines activation guidance + quick-start for the DataStore skill. |
| skills/chdb-datastore/README.md | Maintainer/trigger-phrase overview for the DataStore skill. |
| README.md | Updates repo overview and skill list to include chdb skills. |
| AGENTS.md | Updates repository structure documentation to include both new skills. |

Comment on lines +21 to +23
def check_python_version():
    assert sys.version_info >= (3, 9), f"Python 3.9+ required, got {sys.version}"

Comment on lines +47 to +53
sess = chs.Session()
sess.query("CREATE TABLE _verify_test (id UInt64) ENGINE = Memory")
sess.query("INSERT INTO _verify_test VALUES (1), (2), (3)")
result = sess.query("SELECT count() AS cnt FROM _verify_test")
data = result.data()
assert "3" in data, f"Expected '3' in output, got: {data!r}"
sess.close()
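
One property of the snippet above is worth spelling out: `"3" in data` is a substring test, so it would also pass for outputs such as `13` or `30`. The sketch below shows a stricter comparison, assuming `result.data()` returns the CSV text of a single cell (for example `"3\n"`); `count_matches` is a hypothetical helper, not part of the script under review.

```python
def count_matches(csv_output: str, expected: int) -> bool:
    """Compare the single CSV cell exactly instead of by substring."""
    return csv_output.strip() == str(expected)


# The substring check accepts a wrong count...
assert "3" in "13\n"
# ...while the exact comparison rejects it and accepts the right one.
assert not count_matches("13\n", 3)
assert count_matches("3\n", 3)
```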
Comment on lines +21 to +23
def check_python_version():
    assert sys.version_info >= (3, 9), f"Python 3.9+ required, got {sys.version}"

```
1. One-off query on files or databases → chdb.query()
2. Multi-step analysis with tables → Session
3. DB-API 2.0 connection → chdb.connect()
```
Comment on lines +21 to +31
```python
chdb.query(sql, output_format="CSV", path="", udf_path="", params=None)
```

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `sql` | str | _(required)_ | ClickHouse SQL query |
| `output_format` | str | `"CSV"` | Output format (see [Output Formats](#output-formats)) |
| `path` | str | `""` | Database path (empty = in-memory, no state) |
| `udf_path` | str | `""` | Path for UDF scripts |
| `params` | dict | `None` | Named parameters (see [Parametrized Queries](#parametrized-queries)) |
Comment on lines +161 to +162
# Carol,177
# Bob,178

```python
from chdb import session as chs
sess = chs.Session("./analytics_db")  # persistent; Session() for in-memory
```
@pjhampton (Member) left a comment

Generally, this is a good addition. You should know that we can't currently support it in our code product just yet - soon. Another thing we have in the CI of this project is to execute the code samples so we catch if the skills go stale. I recommend looking into that as a follow up PR.
