Hi,
According to your documentation, "for 10k warehouses, you would need ten clients of type c5.4xlarge to drive the benchmark. For multiple clients, you need to perform three steps". In total, that is 160 CPU cores and 320 GiB of RAM, i.e.:
- 1 CPU core per 62.5 warehouses
- 32.768 MiB RAM per warehouse
At YDB we followed the same path and also forked and adapted TPC-C from Benchbase. As a result, we ended up with similarly high hardware requirements for the TPC-C clients. Fortunately, we found some simple yet effective optimizations (and because you share the same codebase, you can easily employ them too). In this post we discuss our TPC-C implementation and later describe some pitfalls, which again can be easily fixed. Here is a summary of the changes:
- Switch to Java 21 and use virtual threads instead of platform threads. You can then afford to spawn one thread per terminal: for example, with 10K warehouses you will run 100K threads. Here is the commit.
- Don't aggregate full information about each transaction (the LatencyRecord class). Just count OKs and failures, and use a histogram for execution times. Here is the commit.
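For reference, a minimal sketch of the virtual-thread approach; the class and method names here are illustrative, not taken from the Benchbase code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualTerminals {
    // Run one task per terminal, each on its own virtual thread.
    static int runTerminals(int terminals) {
        AtomicInteger completed = new AtomicInteger();
        // Each submitted task gets a fresh virtual thread; blocking calls
        // (JDBC, Thread.sleep) park the virtual thread instead of an OS thread,
        // so 100K terminals no longer need 100K OS threads.
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < terminals; i++) {
                pool.submit(() -> {
                    // a real terminal would loop over TPC-C transactions here
                    completed.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish (Java 19+)
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed=" + runTerminals(10_000));
    }
}
```

Since virtual threads are cheap to create and park, the per-terminal thread model stays simple while the OS-thread count stays small.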
After these changes, you will have the following requirements:
- 1 CPU core per 1000 warehouses
- 6 MiB RAM per warehouse
Now, to run 10K warehouses you need 10 cores and ~60 GiB of RAM. That is two c5.4xlarge instances instead of ten (a significant cost reduction), or even a single memory-optimized instance.
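A constant-memory aggregation like the one described above can be sketched as follows; the bucket layout and class name are our illustration, not the actual commit:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Instead of one LatencyRecord per transaction, keep OK/fail counters
// plus a logarithmic latency histogram: memory is constant regardless
// of how many transactions the run executes.
public class TxnStats {
    private final AtomicLong ok = new AtomicLong();
    private final AtomicLong failed = new AtomicLong();
    // bucket i covers latencies in [2^i, 2^(i+1)) microseconds
    private final AtomicLongArray buckets = new AtomicLongArray(40);

    public void record(boolean success, long latencyMicros) {
        if (!success) { failed.incrementAndGet(); return; }
        ok.incrementAndGet();
        int b = 63 - Long.numberOfLeadingZeros(Math.max(latencyMicros, 1));
        buckets.incrementAndGet(Math.min(b, buckets.length() - 1));
    }

    // Approximate percentile: returns the upper bound of the bucket
    // that contains the p-th sample.
    public long percentileMicros(double p) {
        long target = (long) Math.ceil(ok.get() * p);
        long seen = 0;
        for (int i = 0; i < buckets.length(); i++) {
            seen += buckets.get(i);
            if (seen >= target) return 1L << (i + 1);
        }
        return 0;
    }

    public long okCount() { return ok.get(); }
    public long failedCount() { return failed.get(); }
}
```

A power-of-two bucketed histogram trades a bounded relative error in reported latency for O(1) memory per terminal; a library such as HdrHistogram offers the same idea with configurable precision.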
Another issue that affects the cost of measurements is loading time. You specify ~5.5 hours for 10K warehouses on 30 cluster nodes of type c5d.4xlarge. If the loader code is unchanged there, then you probably use YSQL to upload the data. If you try YCQL instead, you can probably cut that time in half. Initially, we needed 2.7 hours to load 15K warehouses, but after changing the TPC-C code we did it in 1.6 hours. We simply use bulk upserts, which are blind writes, instead of the inserts that were used by default.
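A batched blind-write loader can be sketched roughly as below. The table, column list, and batch size are placeholders, not the TPC-C schema; note that in YCQL an INSERT already has upsert (blind-write) semantics, so no read-before-write is needed:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;

public class BulkLoader {
    static final int BATCH_SIZE = 1024;

    // Split rowCount rows into batches; the last batch may be smaller.
    static int[] batchSizes(int rowCount, int batchSize) {
        int full = rowCount / batchSize;
        int rem = rowCount % batchSize;
        int[] sizes = new int[full + (rem > 0 ? 1 : 0)];
        Arrays.fill(sizes, 0, full, batchSize);
        if (rem > 0) sizes[sizes.length - 1] = rem;
        return sizes;
    }

    // Load rows with JDBC batching: one round-trip per batch instead of
    // one per row. The statement below is a placeholder.
    static void load(Connection conn, List<long[]> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO item (i_id, i_price) VALUES (?, ?)")) {
            int pending = 0;
            for (long[] row : rows) {
                ps.setLong(1, row[0]);
                ps.setLong(2, row[1]);
                ps.addBatch();
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) ps.executeBatch(); // flush the tail
        }
    }
}
```

Batching amortizes network round-trips, and blind writes skip the uniqueness check a plain INSERT implies, which is where the loading-time savings come from.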
We're very interested in trying YugabyteDB with a high number of warehouses (e.g. 40K and 80K), and these optimizations will help a lot to cut the costs.