Skip to content

[Proposal] Specializing Agent Skills for ClickHouse-Native Data Modeling (Time-Series analytics & Materialized Views) #13

@shuvajyotikar13

Description

@shuvajyotikar13

Hi @pjhampton ,

Following the feedback on PR #9, I wanted to transition the discussion to the architectural "why" behind those skills. My goal is to bridge the gap where general-purpose LLMs fail to utilize ClickHouse’s specific strengths, particularly in Time-Series Analytics.

The Problem: The "Generic DDL" Trap

Standard LLMs default to standard SQL patterns. For high-volume time-series data, this leads to:

Query-Time Latency: Agents suggest simple SELECT queries on raw data rather than shifting compute to insert-time.

Storage Inefficiency: Missing opportunities for column-level codecs that can reduce disk footprint by over 80%.

Proposed Architectural Skills (Non-Executable Rules):

These can be implemented as purely descriptive rules or prompts, requiring no code execution in your AI cloud:

  1. data-modeling-materialized-views
    Guides the agent to recognize heavy aggregation workloads (e.g., IoT sensor heartbeats) and enforce the Target Table + MV pattern.

Example Heuristic: If the workload requires COUNT/SUM on time-intervals, the agent must recommend SummingMergeTree or AggregatingMergeTree.

  1. schema-compression-codecs
    Codifies specific heuristics for high-volume columns that go beyond default LZ4.

Timestamp Columns: Enforce DoubleDelta for monotonic sequences.

Metric/Sensor Data: Enforce Gorilla for floating-point values.

Low-Cardinality Strings: Enforce LowCardinality(String) with ZSTD.

Why this matters:

In our testing with IoT-scale time-series data, applying these specific codecs via the agent reduced storage requirements by ~75% compared to generic AI-generated DDL.

I’d love to discuss how we can codify these as "High-Performance Rules" within the new contributing guide you are developing.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions