A repository for benchmarking Optimistic Order-Preserving Hash Join (OOHJ) in MySQL using the Join Order Benchmark (JOB). Part of a Master’s thesis project, it includes scripts for setting up the database, running SQL queries, and evaluating performance.


Benchmarking for Optimistic Order-Preserving Hash Join (OOHJ)

Warning

This software is not intended for production use; it exists for our own research purposes. Others are welcome to use the code, but be aware that it may change without notice.

This repository is part of our Master’s thesis, focused on benchmarking modifications made to MySQL. Our fork of MySQL, which implements the Optimistic Order-Preserving Hash Join (OOHJ) feature, is available at: https://github.com/johansolbakken/mysql-server/tree/oohj/oohj-iterator.

A link to the Master's thesis will be added here later.

Requirements

brew install ruby
gem install pg_query
gem install terminal-table
brew install hyperfine

JOB from a fresh start

If MySQL has lost the JOB data, do the following.

First, make sure MySQL is running. (Tip: use mtr --start-dirty or make run-dirty to avoid resetting MySQL.)

make job-dataset            # Download job dataset
make job-order-queries      # Convert job queries to ORDER BY

# Ensure MySQL is running
make job-setup              # Create databases
make inline-local           # Allow inserting data from CSV
make job-feed               # Feed data to database
make prepare                # Enable the hypergraph optimizer, disable stats_auto_recalc, and set join-buffer and innodb-pool sizes
make job-analyze            # Analyze job tables

Testing the Optimistic Hash Join

  1. Start the MySQL server.
  2. Fill out config.yaml.
  3. Run:

make test

This compares the .sql files in the homemade_dataset folder against the corresponding .expected files in the same folder. It also runs the other test types defined in the tests.yaml file.

These tests pass under our current assumptions. As we improve the optimistic order-preserving hash join, these tests will start to fail, and we will need to take new snapshots of the expected state.

Commands

Download JOB dataset

make job-dataset

Start MySQL

  1. Download and build the MySQL source code.
  2. Start MySQL using MySQL Test Run (MTR), MySQL’s testing script:

First-time setup:

cd build # Navigate to the MySQL build folder
./mysql-test/mtr --start

For subsequent runs, use:

./mysql-test/mtr --start-dirty
  3. Configure the path to the MySQL binary in `config.yaml` (e.g., `build/bin/mysql`).

config.yaml

Copy the `config.yaml.example` file to `config.yaml` and update it with the path to your MySQL binary.

It is important that the path points to a mysql client executable.

Setup the Database

Initialize the database by creating tables and indexes:

make job-setup

Load the Data

Feed the downloaded dataset into the database:

make job-feed

Prepare MySQL environment

This command configures MySQL so that the environment is identical for every test.

ruby bin/benchmark.rb --prepare-mysql

Run SQL Queries

To run SQL queries, use the following commands:

  • Execute a query:
ruby bin/benchmark.rb --run ./job/1a.sql
  • Execute a query with EXPLAIN ANALYZE to analyze execution:
ruby bin/benchmark.rb --run ./job/1a.sql --analyze
  • Execute a query with EXPLAIN FORMAT=TREE to analyze plan:
ruby bin/benchmark.rb --run ./job/1a.sql --tree

tests.yaml

The YAML configuration is structured under a top-level tests key that divides tests into two categories: diff_test and contain_test. Each category may include a global setup section to prepare the environment before running tests, followed by a list of test cases under the tests key.

In diff_test, each test is defined with a name, an SQL file specified by the sql key, and an expected file for output comparison; tests can also have individual setup commands. In contain_test, tests may include individual setup commands and verify outputs by checking for specific substrings using a contains list.

To add a new test, choose the appropriate category based on whether you want a full output comparison or substring validation. Then include any necessary setup commands and define the test with a unique name, the path to the SQL file, and either an expected file (for diff_test) or a contains list (for contain_test). Note that tests run sequentially, so the environment setup for one test may affect subsequent tests.

tests:
  diff_test:
    setup:
      - "ruby ./bin/generate_sort_hashjoin_dataset.rb 10000 10000"
      - "ruby ./bin/benchmark.rb --prepare-mysql"
    tests:
      - name: "Basic test"
        sql: "./homemade_dataset/homemade.sql"
        expected: "./homemade_dataset/homemade.expected"
      - name: "Disable optimistic hash join"
        sql: "./homemade_dataset/homemade_disabled.sql"
        expected: "./homemade_dataset/homemade_disabled.expected"

  contain_test:
    # Global setup is optional here.
    tests:
      - name: "went_on_disk=false, n=100 m=100"
        setup:
          - "ruby ./bin/generate_sort_hashjoin_dataset.rb 100 100"
          - "ruby ./bin/benchmark.rb --prepare-mysql"
        sql: "./homemade_dataset/went_on_disk.sql"
        contains:
          - "(optimistic hash join!)"
          - "(went_on_disk=false)"
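
The core of the two test styles can be sketched as follows. This is only an illustrative Ruby sketch, not the repository's actual test runner; the method names and the idea of passing pre-captured query output are assumptions for illustration.

```ruby
# Sketch of the two test styles in tests.yaml (hypothetical helpers,
# not the real runner). `actual_output` stands in for the text captured
# from running a .sql file through the configured mysql client.

# diff_test: the full output must match the .expected snapshot exactly.
def diff_test(actual_output, expected_file)
  actual_output == File.read(expected_file)
end

# contain_test: every listed substring must appear somewhere in the output.
def contain_test(actual_output, substrings)
  substrings.all? { |s| actual_output.include?(s) }
end
```

This mirrors the snapshot workflow described above: when the join implementation changes, diff_test fails until a new .expected snapshot is taken, while contain_test only checks for markers such as "(optimistic hash join!)".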

C++ Debugging Tools

Header-only Logging File

debug/logger.h provides a header-only Logger class for fast logging to a file.

Usage:

#include "/absolute_path_to_benchmark/debug/logger.h"

static Logger* s_logger = nullptr;

ClassToTest::ClassToTest() {
    s_logger = new Logger("~/path_to_output/log.txt");
}

void ClassToTest::functionToTest() {
    // Let's write CSV data to the logger.
    auto& logger = *s_logger;

    while (someCondition) {
        logger << logger.timestamp() << "," << this->getSomeValue() << ",";
        logger << this->getState() << "\n";
    }
}

Note that the class deletes the log file on construction, so each run starts with a fresh file.

A timestamp() function is provided for getting timestamps easily.

The logger currently uses C++ streams.

Generate TPC-H for macOS

podman run --rm -it \
  -v $(pwd):/src \
  -w /src \
  ubuntu:22.04 \
  bash
# now in podman ubuntu
apt update && apt install -y gcc make ruby bison flex   # the container runs as root, so no sudo is needed
ruby bin/build-tpc-h.rb

This will generate folders:

  • tpc-h-queries
  • tpc-h-ddl
  • tpc-h-dataset

Generate TPC-DS for macOS

# copy the Makefile.suite and add -fcommon to CFLAGS
CFLAGS = $(BASE_CFLAGS) -D$(OS) $($(OS)_CFLAGS) -fcommon

# Start podman
podman run --rm -it \
  -v $(pwd):/src \
  -w /src \
  ubuntu:22.04 \
  bash

# now in podman ubuntu
apt update && apt install -y gcc make ruby bison flex   # the container runs as root, so no sudo is needed
ruby bin/build-tpc-ds.rb

Join Order Benchmark Commands

Setup database and indexes in MySQL

Requires MySQL to be running.

make job-setup

To wipe database and recreate:

ruby bin/job-setup.rb --force

Download job dataset

Creates job-dataset folder.

make job-dataset

The job-dataset folder contains all the data as CSV files.

Download the dataset before feeding it.

Feed job data

Feed data in job-dataset to MySQL database imdbload.

make job-feed

Convert queries: remove MIN(…)

The JOB query files we were provided wrap each selected column in a MIN aggregate.

We have therefore created scripts that remove MIN and, additionally, add ORDER BY clauses.

To generate the queries without MIN or ORDER BY:

make job-queries

To make ordered queries:

make job-order-queries
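
The rewrite behind these targets can be sketched roughly as follows. This is an illustrative regex-based Ruby sketch, not the repository's actual conversion script; the method names and regexes are assumptions for illustration.

```ruby
# Illustrative sketch of the query rewrite (not the repo's real script):
# strip MIN(...) AS alias wrappers from the select list, then append an
# ORDER BY over the chosen columns so result order is deterministic.
def strip_min(sql)
  sql.gsub(/MIN\(([^)]+)\)\s+AS\s+\w+/i, '\1')
end

def add_order_by(sql, columns)
  # Drop the trailing semicolon, append the ORDER BY clause, re-terminate.
  sql.sub(/;\s*\z/, "") + " ORDER BY #{columns.join(', ')};"
end
```

A deterministic ORDER BY matters here because an order-preserving join is being benchmarked: without it, result order would be an accident of the plan rather than something the queries demand.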

Run JOB queries

make run-file DATABASE=imdbload FILE=./job-queries/10a.sql

Delete JOB artifacts

make job-clean

Warmup MySQL

This essentially loads all data into memory.

make job-warmup

Analyze

To analyze, run the script:

ruby bin/analyze.rb --job

Check if any query fails for a database

ruby ./bin/test-sql-files.rb --folder ./job-queries --database imdbload

Run any file

make run-file DATABASE=imdbload FILE=./job-queries/10b.sql

Homemade Dataset

Setup database and indexes in MySQL

Requires MySQL to be running.

make homemade-setup

To wipe database and recreate:

ruby bin/homemade-setup.rb --force

Generate dataset

For instance, with table a size and table b size set to 10000 (the default for both):

make homemade-dataset TABLE_A_SIZE=10000 TABLE_B_SIZE=10000

Feed homemade data

make homemade-feed

Analyze tables

make run-file DATABASE=homemade FILE=./sql/analyze_homemade.sql

Count Optimistic Hash Join

The script `bin/count-ohj.rb` counts occurrences of “optimistic hash join” in SQL execution plans:

ruby bin/count-ohj.rb [--join-buffer-size SIZE]

Example (setting join buffer size to 16MB):

ruby bin/count-ohj.rb --join-buffer-size 16777216
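
The counting itself can be sketched like this. This is a simplified stand-in for bin/count-ohj.rb, assuming the plan text has already been captured (e.g., from EXPLAIN FORMAT=TREE); the function name is hypothetical.

```ruby
# Simplified stand-in for bin/count-ohj.rb: given the text of an
# EXPLAIN FORMAT=TREE plan, count how many join nodes were marked
# as using the optimistic hash join.
def count_ohj(plan_text)
  plan_text.scan("optimistic hash join").size
end
```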

Export query plan as DOT

Generates a graphical representation of a query plan from an input SQL file.

Run with an SQL file:

ruby bin/benchmark.rb ./job/1a.sql

Display the JSON output:

ruby bin/benchmark.rb --show-json ./job/1a.sql

Specify a custom output PNG file:

ruby bin/benchmark.rb -o custom_plan.png ./job/1a.sql

Keep the DOT file:

ruby bin/benchmark.rb -c ./job/1a.sql

With hints

ruby bin/print-plan-as-graphwiz.rb ./job-order-queries/1a.sql -o ./1a.png --hint "/*+ SET_OPTIMISM_FUNC(SIGMOID) */"

Enable and disable indexes for JOB

make job-index-enable
make job-index-disable

TPC-H setup

  • A folder named tpc-h-tool, containing dbgen and related tools, must exist in the root of this repo.
# Ensure you have data
ls tpc-h-tool/dbgen/*

# Generate tpc-h data
cargo run -p tpc-h-datagen

make experimental-setup      # activate hypergraph optimizer
make tpc-h-setup             # will setup both s1 and s10
make tpc-h-feed              # will feed both
make tpc-h-analyze           # this will analyze both s1 and s10
ruby bin/tpc-h-warmup.rb s1
ruby bin/tpc-h-warmup.rb s10
