A repository for benchmarking Optimistic Order-Preserving Hash Join (OOHJ) in MySQL using the Join Order Benchmark (JOB). Part of a Master’s thesis project, it includes scripts for setting up the database, running SQL queries, and evaluating performance.


Benchmarking for Optimistic Order-Preserving Hash Join (OOHJ)

Warning

This software is not intended for production use; it exists for our own research purposes. Others are welcome to use the code, but be aware that it may change without notice.

This repository is part of our Master’s thesis, focused on benchmarking modifications made to MySQL. Our fork of MySQL, which implements the Optimistic Order-Preserving Hash Join (OOHJ) feature, is available at: https://github.com/johansolbakken/mysql-server/tree/oohj/oohj-iterator.

A link to the Master's thesis will be added here later.

Requirements

brew install ruby
gem install pg_query
gem install terminal-table
brew install hyperfine

JOB from a fresh start

If MySQL has lost the JOB data, do the following.

First, make sure MySQL is running. (Tip: use mtr --start-dirty or make run-dirty to avoid resetting MySQL.)

make job-dataset            # Download job dataset
make job-order-queries      # Convert job queries to ORDER BY

# Ensure MySQL is running
make job-setup              # Create databases
make inline-local           # Allow inserting data from CSV
make job-feed               # Feed data to database
make prepare                # Enable the hypergraph optimizer, disable stats_auto_recalc, and set join-buffer and innodb-pool sizes
make job-analyze            # Analyze job tables

Testing the Optimistic Hash Join

  1. Start the MySQL server.
  2. Fill out config.yaml.
  3. Run:

make test

This compares the .sql files in the homemade_dataset folder against the corresponding .expected files in the same folder. It also runs the other test types defined in the tests.yaml file.

These tests pass under our current assumptions. As we improve the optimistic order-preserving hash join, these tests will start to fail, and we will need to take new snapshots of the expected state.

Commands

Download JOB dataset

make job-dataset

Start MySQL

  1. Download and build the MySQL source code.
  2. Start MySQL using MySQL Test Run (MTR), MySQL’s testing script:

First-time setup:

cd build # Navigate to the MySQL build folder
./mysql-test/mtr --start

For subsequent runs, use:

./mysql-test/mtr --start-dirty
  3. Configure the path to the MySQL binary in `config.yaml` (e.g., `build/bin/mysql`).

config.yaml

Copy the `config.yaml.example` file to `config.yaml` and update it with the path to your MySQL binary.

It is important that the path points to a mysql client executable.

Setup the Database

Initialize the database by creating tables and indexes:

make job-setup

Load the Data

Feed the downloaded dataset into the database:

make job-feed

Prepare MySQL environment

This command configures MySQL so that the environment is identical for every test.

ruby bin/benchmark.rb --prepare-mysql

Run SQL Queries

To run SQL queries, use the following commands:

  • Execute a query:
ruby bin/benchmark.rb --run ./job/1a.sql
  • Execute a query with EXPLAIN ANALYZE to analyze execution:
ruby bin/benchmark.rb --run ./job/1a.sql --analyze
  • Execute a query with EXPLAIN FORMAT=TREE to analyze plan:
ruby bin/benchmark.rb --run ./job/1a.sql --tree

tests.yaml

The YAML configuration is structured under a top-level tests key that divides tests into two categories: diff_test and contain_test. Each category may include a global setup section to prepare the environment before running tests, followed by a list of test cases under the tests key.

In diff_test, each test is defined with a name, an SQL file specified by the sql key, and an expected file for output comparison; tests can also have individual setup commands. In contain_test, tests may include individual setup commands and verify outputs by checking for specific substrings using a contains list.

To add a new test, choose the appropriate category based on whether you want a full output comparison or substring validation. Then include any necessary setup commands and define the test with a unique name, the path to the SQL file, and either an expected file (for diff_test) or a contains list (for contain_test). Note that tests run sequentially, so the environment setup for one test may affect subsequent tests.

tests:
  diff_test:
    setup:
      - "ruby ./bin/generate_sort_hashjoin_dataset.rb 10000 10000"
      - "ruby ./bin/benchmark.rb --prepare-mysql"
    tests:
      - name: "Basic test"
        sql: "./homemade_dataset/homemade.sql"
        expected: "./homemade_dataset/homemade.expected"
      - name: "Disable optimistic hash join"
        sql: "./homemade_dataset/homemade_disabled.sql"
        expected: "./homemade_dataset/homemade_disabled.expected"

  contain_test:
    # Global setup is optional here.
    tests:
      - name: "went_on_disk=false, n=100 m=100"
        setup:
          - "ruby ./bin/generate_sort_hashjoin_dataset.rb 100 100"
          - "ruby ./bin/benchmark.rb --prepare-mysql"
        sql: "./homemade_dataset/went_on_disk.sql"
        contains:
          - "(optimistic hash join!)"
          - "(went_on_disk=false)"
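
The core of the two test styles can be sketched as follows. This is only an illustrative Ruby sketch, not the repository's actual test runner; the method names and the idea of passing pre-captured query output are assumptions for illustration.

```ruby
# Sketch of the two test styles in tests.yaml (hypothetical helpers,
# not the real runner). `actual_output` stands in for the text captured
# from running a .sql file through the configured mysql client.

# diff_test: the full output must match the .expected snapshot exactly.
def diff_test(actual_output, expected_file)
  actual_output == File.read(expected_file)
end

# contain_test: every listed substring must appear somewhere in the output.
def contain_test(actual_output, substrings)
  substrings.all? { |s| actual_output.include?(s) }
end
```

This mirrors the snapshot workflow described above: when the join implementation changes, diff_test fails until a new .expected snapshot is taken, while contain_test only checks for markers such as "(optimistic hash join!)".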

C++ Debugging Tools

Header-only Logging File

debug/logger.h provides a header-only Logger class for fast logging to a file.

Usage:

#include "/absolute_path_to_benchmark/debug/logger.h"

static Logger* s_logger = nullptr;

ClassToTest::ClassToTest() {
    s_logger = new Logger("~/path_to_output/log.txt");
}

void ClassToTest::functionToTest() {
    // Let's write CSV data to the logger.
    auto& logger = *s_logger;

    while (someCondition) {
        logger << logger.timestamp() << "," << this->getSomeValue() << ",";
        logger << this->getState() << "\n";
    }
}

Note that the class deletes the log file on construction, so each run starts with a fresh file.

A timestamp() function is provided for getting timestamps easily.

The logger currently uses C++ streams.

Generate TPC-H for macOS

podman run --rm -it \
  -v $(pwd):/src \
  -w /src \
  ubuntu:22.04 \
  bash
# now in podman ubuntu
apt update && apt install -y gcc make ruby bison flex   # the container runs as root, so no sudo is needed
ruby bin/build-tpc-h.rb

This will generate folders:

  • tpc-h-queries
  • tpc-h-ddl
  • tpc-h-dataset

Generate TPC-DS for macOS

# copy the Makefile.suite and add -fcommon to CFLAGS
CFLAGS = $(BASE_CFLAGS) -D$(OS) $($(OS)_CFLAGS) -fcommon

# Start podman
podman run --rm -it \
  -v $(pwd):/src \
  -w /src \
  ubuntu:22.04 \
  bash

# now in podman ubuntu
apt update && apt install -y gcc make ruby bison flex   # the container runs as root, so no sudo is needed
ruby bin/build-tpc-ds.rb

Join Order Benchmark Commands

Setup database and indexes in MySQL

Requires MySQL to be running.

make job-setup

To wipe database and recreate:

ruby bin/job-setup.rb --force

Download job dataset

Creates job-dataset folder.

make job-dataset

The job-dataset folder contains all the data as CSV files.

Download the dataset before feeding it.

Feed job data

Feed data in job-dataset to MySQL database imdbload.

make job-feed

Convert queries: remove MIN(…)

The JOB query files we were provided wrap each selected column in a MIN aggregate.

We have therefore created scripts that remove MIN and, additionally, add ORDER BY clauses.

To generate the queries without MIN or ORDER BY:

make job-queries

To make ordered queries:

make job-order-queries
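
The rewrite behind these targets can be sketched roughly as follows. This is an illustrative regex-based Ruby sketch, not the repository's actual conversion script; the method names and regexes are assumptions for illustration.

```ruby
# Illustrative sketch of the query rewrite (not the repo's real script):
# strip MIN(...) AS alias wrappers from the select list, then append an
# ORDER BY over the chosen columns so result order is deterministic.
def strip_min(sql)
  sql.gsub(/MIN\(([^)]+)\)\s+AS\s+\w+/i, '\1')
end

def add_order_by(sql, columns)
  # Drop the trailing semicolon, append the ORDER BY clause, re-terminate.
  sql.sub(/;\s*\z/, "") + " ORDER BY #{columns.join(', ')};"
end
```

A deterministic ORDER BY matters here because an order-preserving join is being benchmarked: without it, result order would be an accident of the plan rather than something the queries demand.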

Run JOB queries

make run-file DATABASE=imdbload FILE=./job-queries/10a.sql

Delete JOB artifacts

make job-clean

Warmup MySQL

This essentially loads all data into memory.

make job-warmup

Analyze

To analyze, run the script:

ruby bin/analyze.rb --job

Check if any query fails for a database

ruby ./bin/test-sql-files.rb --folder ./job-queries --database imdbload

Run any file

make run-file DATABASE=imdbload FILE=./job-queries/10b.sql

Homemade Dataset

Setup database and indexes in MySQL

Requires MySQL to be running.

make homemade-setup

To wipe database and recreate:

ruby bin/homemade-setup.rb --force

Generate dataset

For instance, with table a size and table b size set to 10000 (the default for both):

make homemade-dataset TABLE_A_SIZE=10000 TABLE_B_SIZE=10000

Feed homemade data

make homemade-feed

Analyze tables

make run-file DATABASE=homemade FILE=./sql/analyze_homemade.sql

Count Optimistic Hash Join

The script `bin/count-ohj.rb` counts occurrences of “optimistic hash join” in SQL execution plans:

ruby bin/count-ohj.rb [--join-buffer-size SIZE]

Example (setting join buffer size to 16MB):

ruby bin/count-ohj.rb --join-buffer-size 16777216
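
The counting itself can be sketched like this. This is a simplified stand-in for bin/count-ohj.rb, assuming the plan text has already been captured (e.g., from EXPLAIN FORMAT=TREE); the function name is hypothetical.

```ruby
# Simplified stand-in for bin/count-ohj.rb: given the text of an
# EXPLAIN FORMAT=TREE plan, count how many join nodes were marked
# as using the optimistic hash join.
def count_ohj(plan_text)
  plan_text.scan("optimistic hash join").size
end
```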

Export query plan as DOT

Generates a graphical representation of a query plan from an input SQL file.

Run with an SQL file:

ruby bin/benchmark.rb ./job/1a.sql

Display the JSON output:

ruby bin/benchmark.rb --show-json ./job/1a.sql

Specify a custom output PNG file:

ruby bin/benchmark.rb -o custom_plan.png ./job/1a.sql

Keep the DOT file:

ruby bin/benchmark.rb -c ./job/1a.sql

With hints

ruby bin/print-plan-as-graphwiz.rb ./job-order-queries/1a.sql -o ./1a.png --hint "/*+ SET_OPTIMISM_FUNC(SIGMOID) */"

Enable and disable indexes for JOB

make job-index-enable
make job-index-disable

TPC-H setup

  • A folder named tpc-h-tool, containing dbgen and related tools, must exist in the root of this repo.
# Ensure you have data
ls tpc-h-tool/dbgen/*

# Generate tpc-h data
cargo run -p tpc-h-datagen

make experimental-setup      # activate hypergraph optimizer
make tpc-h-setup             # will setup both s1 and s10
make tpc-h-feed              # will feed both
make tpc-h-analyze           # this will analyze both s1 and s10
ruby bin/tpc-h-warmup.rb s1
ruby bin/tpc-h-warmup.rb s10
