TPC-C benchmark CPU and memory optimizations to lower hardware requirements and better logic for initial data upload #145

@eivanov89

Description

Hi,

According to your documentation, "for 10k warehouses, you would need ten clients of type c5.4xlarge to drive the benchmark. For multiple clients, you need to perform three steps". In total, that is 160 CPU cores and 320 GiB of RAM, i.e.

  • 1 CPU core per 62.5 warehouses
  • 32.768 MiB RAM per warehouse

At YDB we followed the same path and also forked and adapted TPC-C from Benchbase, so we ended up with similarly high hardware requirements for the TPC-C clients. Fortunately, we found very simple yet effective optimizations (and since we share the same codebase, you can easily employ them too). In this post we discuss our TPC-C implementation and later describe some pitfalls, which again can be easily fixed. Here is a summary of the changes:

  1. Switch to Java 21 and use virtual threads instead of platform threads. You still spawn one thread per terminal, so for 10K warehouses you run 100K threads (TPC-C uses 10 terminals per warehouse), but virtual threads make that cheap. Here is the commit. See the first sketch after this list.
  2. Don't aggregate full information about each transaction (the LatencyRecord class). Just count OKs/fails and use a histogram for execution times. Here is the commit. See the second sketch after this list.
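
To illustrate item 1, here is a minimal sketch of running one virtual thread per terminal on Java 21. The class and method names (TerminalRunner, runTerminal) are illustrative, not the actual Benchbase classes, and the real commit may structure this differently:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: one virtual thread per TPC-C terminal instead of a platform thread.
public class TerminalRunner {
    public static void main(String[] args) {
        int warehouses = 10_000;
        int terminalsPerWarehouse = 10;              // TPC-C: 10 terminals per warehouse
        int terminals = warehouses * terminalsPerWarehouse;

        // newVirtualThreadPerTaskExecutor() (Java 21) starts a cheap virtual thread
        // per submitted task, so 100K terminals no longer require 100K OS threads
        // with large stacks.
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < terminals; i++) {
                final int terminalId = i;
                pool.submit(() -> runTerminal(terminalId));
            }
        } // close() waits for all submitted tasks to finish
    }

    private static void runTerminal(int terminalId) {
        // placeholder for the terminal's transaction loop
    }
}
```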
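And a sketch of item 2: keep only success/failure counters plus a fixed-size latency histogram instead of a LatencyRecord per transaction, so memory stays constant no matter how many transactions run. HdrHistogram is used here purely for illustration; the actual commit may use a different histogram implementation:

```java
import org.HdrHistogram.Histogram;
import java.util.concurrent.atomic.LongAdder;

// Sketch: constant-memory transaction statistics.
class TxnStats {
    private final LongAdder ok = new LongAdder();
    private final LongAdder failed = new LongAdder();
    // track latencies from 1 us up to 60 s with 3 significant digits
    private final Histogram latencyUs = new Histogram(60_000_000L, 3);

    void recordSuccess(long latencyMicros) {
        ok.increment();
        synchronized (latencyUs) {            // plain Histogram is not thread-safe
            latencyUs.recordValue(latencyMicros);
        }
    }

    void recordFailure() {
        failed.increment();
    }

    void report() {
        synchronized (latencyUs) {
            System.out.printf("ok=%d failed=%d p50=%dus p99=%dus%n",
                    ok.sum(), failed.sum(),
                    latencyUs.getValueAtPercentile(50.0),
                    latencyUs.getValueAtPercentile(99.0));
        }
    }
}
```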

After these changes, you will have the following requirements:

  • 1 CPU core per 1000 warehouses
  • 6 MiB RAM per warehouse

Now, to run 10K warehouses you need 10 cores and ~60 GiB of RAM, i.e. 2 c5.4xlarge instances instead of 10 (a significant cost reduction), or even a single memory-optimized instance.

Another issue that affects the price of measurements is loading time. You specify ~5.5 hours for 10K warehouses on a cluster of 30 c5d.4xlarge nodes. If you kept the original code there, you probably use YSQL to upload the data; if you try YCQL instead, you can probably cut that time in half. Initially we needed 2.7 hours to load 15K warehouses, but after changing the TPC-C code we do it in 1.6 hours: we simply switched the loader from the default inserts to bulk upserts, which are blind writes. A loader sketch follows below.
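
For reference, here is a rough sketch of the loader-side idea for a JDBC/YSQL setup: batch many rows per round trip instead of issuing one single-row INSERT at a time. On YDB we used BULK UPSERT, which is a blind write; the connection URL, batch size, and column values below are placeholders, and whether/how to make the writes blind on YugabyteDB is left to you:

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Sketch: batched loading of the TPC-C ITEM table via JDBC.
public class ItemLoader {
    public static void main(String[] args) throws Exception {
        final int BATCH = 1000;
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://host:5433/yugabyte", "yugabyte", "");
             PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO item (i_id, i_im_id, i_name, i_price, i_data) VALUES (?, ?, ?, ?, ?)")) {
            conn.setAutoCommit(false);
            for (int i = 1; i <= 100_000; i++) {
                ps.setInt(1, i);
                ps.setInt(2, i % 10_000);
                ps.setString(3, "item-" + i);
                ps.setBigDecimal(4, BigDecimal.valueOf(1 + (i % 100)));
                ps.setString(5, "original");
                ps.addBatch();
                if (i % BATCH == 0) {
                    ps.executeBatch();   // one round trip per 1000 rows
                    conn.commit();
                }
            }
            ps.executeBatch();           // flush the remaining rows
            conn.commit();
        }
    }
}
```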

We're very interested in trying YugabyteDB with a high number of warehouses (e.g. 40K and 80K), and these optimizations would help a lot to cut the spending.
