nunoatgithub/postgres-is-not-enough


PostgreSQL vs ClickHouse for Reporting

This repository runs a repeatable workload against PostgreSQL and ClickHouse to compare how they behave under a mixed insert/update/query pattern that resembles a streaming analytics pipeline feeding reporting dashboards.

The focus is on:

  • Applying realistic ingestion and update techniques (batching)
  • Running a small set of analytical queries continuously
  • Emitting detailed Prometheus metrics so one can inspect behavior in Grafana

Workload overview

The test client (test-client/) does the following:

  • Generates synthetic reference data: users, products, and categories
  • Periodically generates and inserts events in batches (EVENTS_PER_BATCH events per batch); the same events are written to both PostgreSQL and ClickHouse
  • Periodically runs realistic UPDATE operations on both databases:

    • User lifetime value updates
    • User segment changes
    • Product price updates
    • Event revenue corrections
  • Periodically runs 7 analytical queries on both databases at a fixed interval

  • Exposes Prometheus metrics (insert/update/query timings, row counts, etc.)

  • Separates query operations from write operations and, within each, separates them by database: four processes in total, one per database/operation combination
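The batching step can be sketched as follows. This is a minimal, self-contained illustration with hypothetical names (the actual client code lives in test-client/); each batch would be handed to both database clients in one round trip:

```python
import itertools
import random
import time
from typing import Iterator

EVENTS_PER_BATCH = 50_000  # matches the batch size used in the benchmark

def generate_events(n: int) -> Iterator[dict]:
    """Yield synthetic events resembling the workload's fact rows (assumed shape)."""
    now = time.time()
    for i in range(n):
        yield {
            "event_id": i,
            "user_id": random.randint(1, 100_000),
            "product_id": random.randint(1, 5_000),
            "event_time": now - random.uniform(0, 3600),
            "revenue": round(random.uniform(0, 50), 2),
        }

def batches(events, size: int = EVENTS_PER_BATCH):
    """Group an event stream into insert batches; each batch is written
    to both PostgreSQL and ClickHouse as a single bulk insert."""
    it = iter(events)
    while chunk := list(itertools.islice(it, size)):
        yield chunk

sizes = [len(b) for b in batches(generate_events(120_000), size=50_000)]
print(sizes)  # → [50000, 50000, 20000]
```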

ClickHouse uses replicated CollapsingMergeTree tables and exposes *_current views for the latest state. PostgreSQL uses a normalized schema with a reduced index set on the events fact table, aggressive autovacuum tuning for streaming workloads, and a "streaming-friendly" durability configuration (WAL fsync on, but synchronous_commit=off).

Query characterization

This test suite runs seven analytical queries repeatedly; each targets a different workload pattern:

  • Query 1 (Simple Aggregation): tests scan performance and lightweight group-by, low-cardinality grouping, and index usage on timestamp filters.
  • Query 2 (Join Aggregation): tests multi-table joins plus aggregation (join scaling and join order effects).
  • Query 3 (Complex Join): exercises multi-dimensional group-by and joins with mixed-cardinality dimensions.
  • Query 4 (Window Functions): stresses windowing, ordering, re-sorting and memory/workspace usage.
  • Query 5 (Cohort Analysis): tests joining derived cohort tables, time bucketing and moderate group-by cardinality.
  • Query 6 (Funnel Analysis): tests sessionization and conditional aggregation across a user-level derived table.
  • Query 7 (Multi-Window Cohort): a worst-case style query for PostgreSQL combining multiple CTEs that scan the events table, many window functions, percentiles, self-joins and correlated subqueries. For real-time reporting workloads, ClickHouse uses incremental materialized views (realistic production approach) while PostgreSQL must run the direct query (only viable option). This demonstrates the fundamental architectural difference for streaming OLTP → real-time reporting.
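To make Query 6's pattern concrete, here is a toy, pure-Python version of funnel analysis via conditional aggregation over a user-level derived table. The real query runs in SQL; the event types and funnel stages here are illustrative, not the repository's actual schema:

```python
from collections import defaultdict

# Toy event log: (user_id, event_type), already time-ordered per user.
events = [
    (1, "view"), (1, "add_to_cart"), (1, "purchase"),
    (2, "view"), (2, "add_to_cart"),
    (3, "view"),
]

FUNNEL = ["view", "add_to_cart", "purchase"]

# Step 1: collapse to a user-level derived table - the furthest funnel
# stage each user reached, only counting stages completed in order.
furthest = defaultdict(lambda: -1)
for user, etype in events:
    stage = FUNNEL.index(etype)
    if stage == furthest[user] + 1:  # must have completed all earlier stages
        furthest[user] = stage

# Step 2: conditional aggregation - how many users reached each stage.
counts = [sum(1 for s in furthest.values() if s >= i) for i in range(len(FUNNEL))]
print(dict(zip(FUNNEL, counts)))  # → {'view': 3, 'add_to_cart': 2, 'purchase': 1}
```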

How to run – durable replicated ClickHouse setup

The experiment is intended to be run with the durable, replicated ClickHouse setup defined in docker-compose.yml.

This stack includes:

  • PostgreSQL (single node) configured for streaming ingestion
  • ZooKeeper
  • Two ClickHouse replicas using ReplicatedCollapsingMergeTree
  • Prometheus
  • Grafana
  • The Python test client
```bash
# From the repo root
cd postgres-is-not-enough

# Start the stack
docker compose up -d

# Check containers
docker compose ps
```

Services and their default ports are defined in docker-compose.yml.

The test client connects to PostgreSQL and clickhouse-replica1 and runs the workload continuously:

  • Generate reference data (if the DBs are empty)
  • Periodically run inserts and updates (in batches) and queries, and push metrics to Prometheus

Workload parameters for this setup are controlled via environment variables in docker-compose.yml.

Observing the experiment

All observability goes through Prometheus and Grafana.

  • Prometheus scrape config is in prometheus/prometheus.yml.
  • Grafana dashboards are provisioned from grafana/dashboards/.

Useful panels include:

  • Insert latency histograms for PostgreSQL vs ClickHouse
  • Update latency histograms per operation
  • Query latency per query name and database
  • Table row counts for both databases over time
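The test client exports these timings as labeled Prometheus histograms. A stripped-down, stdlib-only analogue of that instrumentation (hypothetical names; the real client uses a Prometheus client library) looks like this:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Per-(database, operation) latency samples, analogous to the labeled
# Prometheus histograms the test client exposes.
latencies = defaultdict(list)

@contextmanager
def timed(db: str, op: str):
    """Record the wall-clock duration of the wrapped block under (db, op)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        latencies[(db, op)].append(time.perf_counter() - start)

# Usage: wrap each insert/update/query call site.
with timed("postgres", "query_simple_aggregation"):
    time.sleep(0.01)  # stand-in for actually running the query

samples = latencies[("postgres", "query_simple_aggregation")]
print(len(samples), samples[0] >= 0.01)
```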

The underlying SQL for queries and updates lives in:

  • test-client/queries.py – 7 analytical queries (Postgres + ClickHouse variants)
  • test-client/postgres_client.py – Postgres inserts/updates
  • test-client/clickhouse_client.py – ClickHouse inserts/updates (CollapsingMergeTree)

Results

Based on metrics collected from processing 14.3M-15.2M events over a sustained benchmark run:

Note: PostgreSQL is configured with aggressive autovacuum (vacuum at 1% table change, analyze at 0.5% change) to prevent table bloat and statistics decay. Both databases use configurations that should be appropriate for streaming analytics workloads.
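In postgresql.conf terms, the stated thresholds correspond to scale-factor settings along these lines (a sketch; the repository may set these per table, and the base threshold values below are assumptions):

```ini
# Vacuum when ~1% of a table has changed, analyze at ~0.5%
autovacuum_vacuum_scale_factor = 0.01
autovacuum_analyze_scale_factor = 0.005
# Base thresholds (assumed values) so small tables are also covered
autovacuum_vacuum_threshold = 1000
autovacuum_analyze_threshold = 500
```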

Query Performance

| Query | PostgreSQL | ClickHouse | Speedup |
|---|---|---|---|
| Simple Aggregation | 6.00s | 0.11s | 53x |
| Join Aggregation | 2.45s | 0.16s | 15x |
| Complex Join | 6.64s | 0.49s | 14x |
| Window Function | 1.38s | 0.17s | 8x |
| Cohort Analysis | 3.96s | 1.14s | 3.5x |
| Funnel Analysis | 1.00s | 0.14s | 7x |
| Multi-Window Cohort (MV) | 46.37s | 0.09s | 532x |

Query 7 demonstrates the difference between PostgreSQL's direct query approach (which degrades with data volume) and ClickHouse's incremental materialized views (which maintain constant query time regardless of total data volume). At 14M+ events, PostgreSQL's 46-second query time makes real-time dashboards infeasible, while ClickHouse's 87ms response time remains dashboard-ready.

Insert Performance

Average batch insert latency (50K events per batch): 6.97s for PostgreSQL versus 1.04s for ClickHouse (6.7x faster). ClickHouse's columnar storage and optimized bulk insert path significantly outperform PostgreSQL's row-based approach at scale.

Update Performance

| Operation | PostgreSQL Avg | ClickHouse Avg | Notes |
|---|---|---|---|
| User Lifetime Value (1.43M updates) | 1.24s | 0.15s | CH 8.4x faster |
| User Segments (713K updates) | 0.45s | 0.09s | CH 5.0x faster |
| Product Prices (28.5K updates) | 0.02s | 0.03s | Similar performance |
| Event Revenue (28.2K updates) | 0.02s | 2.70s | PG 135x faster |

Event revenue updates are slower in ClickHouse because the update flow reads current state to emit compensating sign=-1/+1 rows, which forces on-the-fly collapsing. PostgreSQL uses MVCC point updates, so updating historical fact rows is cheaper in latency (at the cost of WAL/index work and eventual VACUUM).
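The compensating-row mechanics can be sketched in plain Python. The row shape below is hypothetical; in the real flow the current row comes from a *_current view before the sign=-1/+1 pair is inserted:

```python
def revenue_correction_rows(current_row: dict, new_revenue: float) -> list[dict]:
    """For CollapsingMergeTree, an 'update' is two inserts: a sign=-1 copy
    of the current row that cancels it on merge, and a sign=+1 row
    carrying the corrected value."""
    cancel = {**current_row, "sign": -1}
    replacement = {**current_row, "revenue": new_revenue, "sign": 1}
    return [cancel, replacement]

current = {"event_id": 42, "user_id": 7, "revenue": 19.99, "sign": 1}
rows = revenue_correction_rows(current, 24.99)
print([r["sign"] for r in rows], rows[1]["revenue"])  # → [-1, 1] 24.99
```

Reading the current row back before every correction is what makes this path expensive: it forces on-the-fly collapsing of any not-yet-merged parts.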

Overall Time Distribution

PostgreSQL spent approximately 2,522 CPU-seconds across all operations over the benchmark duration; ClickHouse spent approximately 2,878 CPU-seconds, reflecting its more aggressive parallelism and vectorization (more total CPU burned in exchange for far lower wall-clock latency).

The key difference is in wall-clock time for queries: PostgreSQL's Query 7 alone takes 46 seconds per execution, making it unsuitable for real-time dashboards, while ClickHouse maintains sub-100ms query times across the entire workload through incremental materialized views.

ClickHouse's architectural advantage becomes more pronounced as data volume grows: query performance remains nearly constant while PostgreSQL degrades linearly with data size.

About

How far can PostgreSQL go for reporting?
