Hi @pjhampton ,
Following the feedback on PR #9, I wanted to transition the discussion to the architectural "why" behind those skills. My goal is to bridge the gap where general-purpose LLMs fail to utilize ClickHouse’s specific strengths, particularly in Time-Series Analytics.
The Problem: The "Generic DDL" Trap
Standard LLMs default to standard SQL patterns. For high-volume time-series data, this leads to:
Query-Time Latency: Agents suggest simple SELECT queries over raw data rather than shifting compute to insert time.
Storage Inefficiency: Missing opportunities for column-level codecs that can reduce disk footprint by over 80%.
Proposed Architectural Skills (Non-Executable Rules):
These can be implemented as purely descriptive rules or prompts, requiring no code execution in your AI cloud:
- data-modeling-materialized-views
Guides the agent to recognize heavy aggregation workloads (e.g., IoT sensor heartbeats) and enforce the Target Table + MV pattern.
Example Heuristic: If the workload requires COUNT/SUM over time intervals, the agent must recommend SummingMergeTree or AggregatingMergeTree.
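To make the pattern concrete, here is a minimal sketch of the Target Table + MV shape the rule would steer the agent toward. The table and column names (`sensor_readings`, `sensor_id`, etc.) are hypothetical placeholders, not part of the proposed skill itself:

```sql
-- Hypothetical raw table of IoT sensor heartbeats
CREATE TABLE sensor_readings
(
    sensor_id UInt32,
    ts        DateTime,
    value     Float64
)
ENGINE = MergeTree
ORDER BY (sensor_id, ts);

-- Pre-aggregated target table: compute shifts to insert time
CREATE TABLE sensor_readings_1m
(
    sensor_id UInt32,
    minute    DateTime,
    cnt       UInt64,
    sum_value Float64
)
ENGINE = SummingMergeTree
ORDER BY (sensor_id, minute);

-- Materialized view feeding the target table on every insert
CREATE MATERIALIZED VIEW sensor_readings_1m_mv
TO sensor_readings_1m
AS SELECT
    sensor_id,
    toStartOfMinute(ts) AS minute,
    count()             AS cnt,
    sum(value)          AS sum_value
FROM sensor_readings
GROUP BY sensor_id, minute;
```

Queries for per-minute counts and sums then hit the small pre-aggregated table instead of scanning raw rows.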
- schema-compression-codecs
Codifies specific heuristics for high-volume columns that go beyond default LZ4.
  - Timestamp Columns: Enforce DoubleDelta for monotonic sequences.
  - Metric/Sensor Data: Enforce Gorilla for floating-point values.
  - Low-Cardinality Strings: Enforce LowCardinality(String) with ZSTD.
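As an illustration of what the rule would emit, a sketch of the column-level codec syntax (table and column names are hypothetical examples, not part of the skill):

```sql
CREATE TABLE sensor_metrics
(
    -- Monotonic timestamps: delta-of-delta encoding
    ts        DateTime CODEC(DoubleDelta, LZ4),
    -- Floating-point sensor values: Gorilla XOR encoding
    reading   Float64 CODEC(Gorilla),
    -- Few distinct values: dictionary encoding plus ZSTD
    region    LowCardinality(String) CODEC(ZSTD(1)),
    sensor_id UInt32
)
ENGINE = MergeTree
ORDER BY (sensor_id, ts);
```

Note that codecs can be chained (e.g. DoubleDelta followed by LZ4), and LowCardinality is a column type rather than a codec, so it composes with ZSTD rather than replacing it.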
Why this matters:
In our testing with IoT-scale time-series data, applying these specific codecs via the agent reduced storage requirements by ~75% compared to generic AI-generated DDL.
I’d love to discuss how we can codify these as "High-Performance Rules" within the new contributing guide you are developing.