A set of tools for optimizing and working with standard database dump files (SQL dumps). The main goal is to significantly speed up the process of importing data into a database, especially for large dump files.
Standard tools like `mysqldump` or `pg_dump` often generate files where each INSERT statement adds only one or a few rows. With millions of records, importing such a file is very slow, because each INSERT is executed as a separate query (often in its own transaction), which adds significant overhead on the database server side.
This script (optimize_sql_dump.py) processes a dump file and optimizes it in several ways to make the import much faster.
- **Merging INSERT Statements**: The script combines many small `INSERT INTO ... VALUES (...), (...), ...` statements into a single large statement, which drastically reduces the number of queries sent to the database.
- **Fast Load Mode (`--load-data`)**: Generates `.tsv` files (tab-separated values) and a `.sql` file with `LOAD DATA INFILE` (for MySQL) or `COPY` (for PostgreSQL) statements. This is the fastest method for data import.
- **Split Mode (`--split`)**: Splits a single large dump file into smaller `.sql` files, one for each table. This makes it easier to manage and import only selected tables.
- **Automatic Compression Detection**: The script can automatically read compressed files (`.gz`, `.bz2`, `.xz`, `.zip`), so you don't need to decompress them manually.
- **Support for MySQL and PostgreSQL**: Automatically detects the SQL dialect, or allows you to specify it manually.
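The core INSERT-merging idea can be sketched in a few lines of Python. This is a simplified illustration, not the script's actual code; the regex and the `merge_inserts` helper are assumptions, and real dumps need more robust SQL parsing:

```python
import re

# Matches simple single-row statements such as: INSERT INTO users VALUES (1, 'a');
INSERT_RE = re.compile(r"^INSERT INTO (\S+) VALUES (\(.*\));$")

def merge_inserts(lines, batch_size=1000):
    """Combine consecutive single-row INSERTs into multi-row statements."""
    table, values = None, []
    for line in lines:
        m = INSERT_RE.match(line.strip())
        # Keep accumulating rows while the table matches and the batch has room.
        if m and m.group(1) == table and len(values) < batch_size:
            values.append(m.group(2))
            continue
        if values:
            yield f"INSERT INTO {table} VALUES {', '.join(values)};"
        if m:
            table, values = m.group(1), [m.group(2)]
        else:
            # Non-INSERT lines (DDL, comments) pass through unchanged.
            table, values = None, []
            yield line
    if values:
        yield f"INSERT INTO {table} VALUES {', '.join(values)};"
```

With `batch_size=1000` (the script's default), a million single-row INSERTs collapse into about a thousand statements.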
- Python 3.x
- Optionally, the `tqdm` library for displaying a progress bar:

```
pip install tqdm
```

The script is operated from the command line:

```
python optimize_sql_dump.py [options] <input_file> [output_file]
```

Processes `dump.sql.gz` and saves the optimized version to `dump_optimized.sql`:

```
python optimize_sql_dump.py --input dump.sql.gz --output dump_optimized.sql
```

Creates the `split_dump/` directory and places a separate `.sql` file for each table from the `big_dump.sql` dump:

```
python optimize_sql_dump.py --input big_dump.sql --split ./split_dump/
```

Creates the `fast_load/` directory, containing:

- `.tsv` files with the data for each table,
- `.sql` files with `LOAD DATA` (MySQL) or `COPY` (PostgreSQL) statements to load the data from the `.tsv` files.
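The pair of files produced for each table in this mode can be sketched roughly as follows. This is a hypothetical simplification (the `fast_load_files` name, quoting, and file layout are assumptions; real data also needs escaping of embedded tabs, newlines, and NULLs):

```python
import csv
import io

def fast_load_files(table, columns, rows, db_type="mysql"):
    """Return (tsv_text, sql_text) for one table -- a simplified sketch."""
    buf = io.StringIO()
    # Tab-separated rows, one line per record.
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    cols = ", ".join(columns)
    if db_type == "mysql":
        sql = (f"LOAD DATA INFILE '{table}.tsv' INTO TABLE {table}\n"
               f"  FIELDS TERMINATED BY '\\t' ({cols});")
    else:  # postgres
        sql = f"COPY {table} ({cols}) FROM '{table}.tsv';"
    return buf.getvalue(), sql
```

Both `LOAD DATA INFILE` and `COPY` bypass per-statement parsing overhead, which is why this mode is the fastest import path.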
```
python optimize_sql_dump.py --input big_dump.sql --load-data ./fast_load/
```

If automatic detection fails, you can explicitly specify the database type:

```
python optimize_sql_dump.py --db-type postgres --input pg_dump.sql --output pg_dump_optimized.sql
```

Processes only the CREATE and INSERT statements for the `users` table:

```
python optimize_sql_dump.py --table users --input dump.sql --output users_only.sql
```

The per-table files produced by `--split` can be imported in parallel, for example with GNU parallel:

```
parallel -j <n> "mysql -h <host> -u <user> < {}" ::: *.sql
```

| Option | Short | Description |
|---|---|---|
| --input | -i | Input dump file (can be compressed). |
| --output | -o | Output file for the optimized dump. |
| --db-type | | Database type: mysql, postgres, or auto (default). |
| --table | -t | Optimize only the specified table. |
| --batch-size | | Number of rows in a single merged INSERT statement (default: 1000). |
| --split [dir] | | Splits the dump into separate files per table in the specified directory (defaults to the current directory). |
| --load-data [dir] | | Generates .tsv and .sql files for fast import (fastest option). |
| --verbose | -v | Displays additional diagnostic information and a progress bar. |
| --dry-run | | Runs the script without writing any output files. |
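For reference, the automatic compression detection described earlier can be implemented with extension sniffing using only the standard library. A minimal sketch (the `open_dump` name and single-member ZIP assumption are mine, not necessarily the script's):

```python
import bz2
import gzip
import io
import lzma
import zipfile

def open_dump(path):
    """Open a dump file for text reading, decompressing based on extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    if path.endswith(".xz"):
        return lzma.open(path, "rt", encoding="utf-8")
    if path.endswith(".zip"):
        zf = zipfile.ZipFile(path)
        # Assume the archive contains a single SQL file.
        member = zf.namelist()[0]
        return io.TextIOWrapper(zf.open(member), encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

Each branch returns a file-like object yielding decoded text, so the rest of the pipeline can iterate over lines without caring how the input was compressed.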