Warning
This software is not intended for production use and is intended for our own private research purposes. Although others may use the code, please be aware that it is prone to change without notice.
This repository is part of our Master’s thesis, focused on benchmarking modifications made to MySQL. Our fork of MySQL, which implements the Optimistic Order-Preserving Hash Join (OOHJ) feature, is available at: https://github.com/johansolbakken/mysql-server/tree/oohj/oohj-iterator.
Link to Master Thesis will be added later.
brew install ruby gem install pq_query gem install terminal-table brew install hyperfine
If MySQL has lost JOB data then do the following.
First, make sure MySQL is running. (Tips use mtr --start-dirty or make run-dirty to avoid resetting MySQL).
make job-dataset # Download job dataset make job-order-queries # Convert job queries to ORDER BY # Ensure MySQL is running make job-setup # Create databases make inline-local # Allow inserting data from CSV make job-feed # Feed data to database make prepare # Enable hypergraph optimizer and disable stats_auto_recalc and set size-values for join-buffer and innodb-pool make job-analyze # Analyze job tables
- Start the MySQL server
- Fill out config.yaml
make testThis will compare .sql files in homemade_dataset folder against .expected files in homemade_dataset folder. It will also do other types tests defined in tests.yaml file.
These tests are passing under our current assumptions. As we improve optimistic order-preserving hash join these tests will fail and we need to take new snapshots of the new expected state.
make job-dataset- Download and build the MySQL source code.
- Start MySQL using MySQL Test Run (MTR), MySQL’s testing script:
First-time setup:
cd build # Navigate to the MySQL build folder
./mysql-test/mtr --startFor subsequent runs, use:
./mysql-test/mtr --start-dirty- Configure the path to the MySQL binary in `config.yaml` (e.g., `build/bin/mysql`).
Copy the `config.yaml.example` file to `config.yaml` and update it with the path to your MySQL binary.
It is important that path points to a mysql client executable.
Then, initialize the database by creating tables and indexes:
make job-setupFeed the downloaded dataset into the database:
make job-feedThis command sets the environment to ensure the environment is the same for every test.
ruby bin/benchmark.rb --prepare-mysqlTo run SQL queries, use the following commands:
- Execute a query:
ruby bin/benchmark.rb --run ./job/1a.sql- Execute a query with EXPLAIN ANALYZE to analyze execution:
ruby bin/benchmark.rb --run ./job/1a.sql --analyze- Execute a query with EXPLAIN FORMAT=TREE to analyze plan:
ruby bin/benchmark.rb --run ./job/1a.sql --treeThe YAML configuration is structured under a top-level tests key that divides tests into two categories: diff_test and contain_test. Each category may include a global setup section to prepare the environment before running tests, followed by a list of test cases under the tests key. In diff_test, each test is defined with a name, an SQL file specified by the sql key, and an expected file for output comparison; tests can also have individual setup commands. In contain_test, tests may include individual setup commands and verify outputs by checking for specific substrings using a contains list. To add a new test, choose the appropriate category based on whether you want a full output comparison or substring validation. Then, include any necessary setup commands and define the test with a unique name, the path to the SQL file, and either an expected file (for diff_test) or a contains list (for contain_test). Note that tests run sequentially, so the environment setup for one test may affect subsequent tests.
tests:
diff_test:
setup:
- "ruby ./bin/generate_sort_hashjoin_dataset.rb 10000 10000"
- "ruby ./bin/benchmark.rb --prepare-mysql"
tests:
- name: "Basic test"
sql: "./homemade_dataset/homemade.sql"
expected: "./homemade_dataset/homemade.expected"
- name: "Disable optimistic hash join"
sql: "./homemade_dataset/homemade_disabled.sql"
expected: "./homemade_dataset/homemade_disabled.expected"
contain_test:
# Global setup is optional here.
tests:
- name: "went_on_disk=false, n=100 m=100"
setup:
- "ruby ./bin/generate_sort_hashjoin_dataset.rb 100 100"
- "ruby ./bin/benchmark.rb --prepare-mysql"
sql: "./homemade_dataset/went_on_disk.sql"
contains:
- "(optimistic hash join!)"
- "(went_on_disk=false)"The debug/logger.h is a class that can be used to fast log to a file.
Usage:
#include "/absolute_path_to_benchmark/debug/logger.h"
static Logger* s_logger = nullptr;
ClassToTest::ClassToTest() {
s_logger = new Logger("~/path_to_output/log.txt");
}
void ClassToTest::functionToTest() {
// Lets write CSV information to the logger.
auto& logger = *s_logger;
while (someCondition) {
logger << logger.timestamp() << "," this->getSomeValue() << ",";
logger << this->getState() << "\n";
}
}This class will delete the log-file on construction.
There is a timestamp() function for getting timestamps easily.
Currently using streams.
podman run --rm -it \
-v $(pwd):/src \
-w /src \
ubuntu:22.04 \
bash
# now in podman ubuntu
sudo apt update && sudo apt install -y gcc make ruby bison flex
ruby bin/build-tpc-h.rbThis will generate folders:
tpc-h-queriestpc-h-ddltpc-h-dataset
# copy the Makefile.suite and add -fcommon to CFLAGS
CFLAGS = $(BASE_CFLAGS) -D$(OS) $($(OS)_CFLAGS) -fcommon
# Start podman
podman run --rm -it \
-v $(pwd):/src \
-w /src \
ubuntu:22.04 \
bash
# now in podman ubuntu
sudo apt update && sudo apt install -y gcc make ruby bison flex
ruby bin/build-tpc-ds.rbRequires MySQL to be running.
make job-setupTo wipe database and recreate:
ruby bin/job-setup.rb --forceCreates job-dataset folder.
make job-datasetThe job-dataset folder contains all the data as csv files.
Do this before feeding.
Feed data in job-dataset to MySQL database imdbload.
make job-feedThe job files we were provided is altered such that each column is in a MIN aggregate.
We therefore have created scripts for removing MIN and additionally adding ORDER BY clauses.
To generate the queries without MIN or ORDER BY:
make job-queriesTo make ordered queries:
make job-order-queriesmake run-file DATABASE=imdbload FILE=./job-queries/10a.sqlmake job-cleanEssentially means loading all data into memory.
make job-warmupTo analyze run the script:
ruby bin/analyze.rb --jobruby ./bin/test-sql-files.rb --folder ./job-queries --database imdbloadmake run-file DATABASE=imdbload FILE=./job-queries/10b.sqlRequires MySQL to be running.
make homemade-setupTo wipe database and recreate:
ruby bin/homemade-setup.rb --forceFor instance with table a size and table b size set to 10000. Default for both is 10000.
make homemade-dataset TABLE_A_SIZE=10000 TABLE_B_SIZE=10000make homemade-feedmake run-file DATABASE=homemade FILE=./sql/analyze_homemade.sql`bin/count-ohj.rb`, counts occurrences of “optimistic hash join” in SQL execution plans. It works by:
ruby bin/count-ohj.rb [--join-buffer-size SIZE]Example (setting join buffer size to 16MB):
ruby bin/count-ohj.rb --join-buffer-size 16777216Generates a graphical representation of a query plan from an input SQL file.
Run with an SQL file:
ruby bin/benchmark.rb ./job/1a.sqlDisplay the JSON output:
ruby bin/benchmark.rb --show-json ./job/1a.sqlSpecify a custom output PNG file:
ruby bin/benchmark.rb -o custom_plan.png ./job/1a.sqlKeep the DOT file:
ruby bin/benchmark.rb -c ./job/1a.sqlWith hints
ruby bin/print-plan-as-graphwiz.rb ./job-order-queries/1a.sql -o./1a.png --hint "/*+ SET_OPTIMISM_FUNC(SIGMOID) */"make job-index-enable
make job-index-disable- There must exist an folder in root of this repo called
tpc-h-toolwithdbgenand others.
# Ensure you have data
ls tpc-h-tool/dbgen/*
# Generate tpc-h data
cargo run -p tpc-h-datagen
make experimental-setup # activate hypergraph optimizer
make tpc-h-setup # will setup both s1 and s10
make tpc-h-feed # will feed both
make tpc-h-analyze # this will analyze both s1 and s10
ruby bin/tpc-h-warmup.rb s1
rbuy bin/tpc-h-warmup.rb s10