Commit 03cc617

Setup poetry
1 parent b9703c6 commit 03cc617

16 files changed

Lines changed: 261 additions & 47 deletions

README.md

Lines changed: 18 additions & 14 deletions
@@ -14,17 +14,25 @@ Our open-source tool can subset databases up to 10GB, but it will struggle with

 # Installation

-Five steps to install, assuming Python 3.5+:
+Five steps to install, assuming Python 3.6+:

-1. Download the required Python modules. You can use [`pip`](https://pypi.org/project/pip/) for easy installation. The required modules are `toposort`, `psycopg2-binary`, and `mysql-connector-python`.
+1. [Install Poetry](https://python-poetry.org/docs/#installation)
+
+2. Install Postgres and/or MySQL database tools. For Postgres we need `pg_dump` and `psql` tools; they need to be on your `$PATH` or point to them with `$POSTGRES_PATH`. For MySQL we need `mysqldump` and `mysql`, they can be on your `$PATH` or point to them with `$MYSQL_PATH`.
+
+3. Clone project locally:
 ```
-$ pip install toposort
-$ pip install psycopg2-binary
-$ pip install mysql-connector-python
+$ git clone https://github.com/TonicAI/condenser.git
+$ cd condenser
 ```
-2. Install Postgres and/or MySQL database tools. For Postgres we need `pg_dump` and `psql` tools; they need to be on your `$PATH` or point to them with `$POSTGRES_PATH`. For MySQL we need `mysqldump` and `mysql`, they can be on your `$PATH` or point to them with `$MYSQL_PATH`.
-3. Download this repo. You can clone the repo or Download it as a zip. Scroll up, it's the green button that says "Clone or download".
-4. Setup your configuration and save it in `config.json`. The provided `config.json.example` has the skeleton of what you need to provide: source and destination database connection details, as well as subsetting goals in `initial_targets`. Here's an example that will collect 10% of a table named `public.target_table`.
+
+4. Install project:
+```
+$ poetry shell
+$ poetry install -E postgres # Or use -E mysql
+```
+
+5. Setup your configuration and save it in `config.json`. The provided `config.json.example` has the skeleton of what you need to provide: source and destination database connection details, as well as subsetting goals in `initial_targets`. Here's an example that will collect 10% of a table named `public.target_table`.
 ```
 "initial_targets": [
 {
@@ -35,7 +43,7 @@ $ pip install mysql-connector-python
 ```
 There may be more required configuration depending on your database, but simple databases should be easy. See the Config section for more details, and `config.json.example_all` for all of the options in a single config file.

-5. Run! `$ python direct_subset.py`
+5. Run! `$ poetry run subset`

 # Config

@@ -80,15 +88,11 @@ Below we describe the use of all configuration parameters, but the best place to
 Almost all the configuration is in the `config.json` file, so running is as simple as

 ```
-$ python direct_subset.py
+$ poetry run subset
 ```

 Two command-line arguments are supported:

 `-v`: Verbose output. Useful for performance debugging. Lists almost every query made, and its speed.

 `--no-constraints`: For Postgres this will not add constraints found in the source database to the destination database. This option has no effect for MySQL.
-
-# Requirements
-
-Reference the requirements.txt file for a list of required python packages. Also, please note that Python 3.5+ is required.
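
The README asks for `pg_dump`, `psql`, `mysqldump` and `mysql` to be reachable either on `$PATH` or via `$POSTGRES_PATH` / `$MYSQL_PATH`. The lookup code is not part of this diff, so the following is only a minimal sketch of how such a resolution could work. It assumes the override variable names a directory; `find_tool` is a hypothetical helper, not a function from the project.

```python
import os
import shutil
from typing import Mapping, Optional


def find_tool(name: str, env_var: str,
              environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the full path of `name`, or None if it cannot be found.

    Hypothetical helper: the directory named by `env_var` (assumed to be a
    directory, e.g. $POSTGRES_PATH) is searched first, then $PATH.
    """
    environ = os.environ if environ is None else environ
    override_dir = environ.get(env_var)
    if override_dir:
        found = shutil.which(name, path=override_dir)
        if found:
            return found
    # Fall back to the regular search path.
    return shutil.which(name, path=environ.get("PATH"))
```

A pre-flight check could call `find_tool("pg_dump", "POSTGRES_PATH")` before starting a subset run and stop with a clear message when the result is `None`.
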

condenser/__init__.py

Whitespace-only changes.
Lines changed: 3 additions & 3 deletions
@@ -1,9 +1,9 @@
-import config_reader
+from condenser import config_reader

 def get_specific_helper():
     if config_reader.get_db_type() == 'postgres':
-        import psql_database_helper
+        from condenser import psql_database_helper
         return psql_database_helper
     else:
-        import mysql_database_helper
+        from condenser import mysql_database_helper
         return mysql_database_helper
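
The hunk above keeps the helper imports inside the branches, so only the module for the configured database type is loaded. A minimal sketch of the same idea with `importlib` is shown here. `load_helper` and the `_HELPERS` table are hypothetical names; the module paths mirror the ones in the diff. Unlike the original `else` branch, this sketch rejects unknown types instead of defaulting to MySQL.

```python
import importlib

# Hypothetical lookup table; the module paths mirror those in the diff.
_HELPERS = {
    "postgres": "condenser.psql_database_helper",
    "mysql": "condenser.mysql_database_helper",
}


def load_helper(db_type, modules=_HELPERS):
    """Import and return the helper module for `db_type` on first use."""
    try:
        module_name = modules[db_type]
    except KeyError:
        raise ValueError("unknown db_type " + db_type) from None
    # The import happens here, so an unused driver is never loaded.
    return importlib.import_module(module_name)
```

Because nothing is imported until `load_helper` is called, a Postgres-only install never touches the MySQL helper, and vice versa.
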

db_connect.py renamed to condenser/db_connect.py

Lines changed: 3 additions & 2 deletions
@@ -1,5 +1,4 @@
-import config_reader
-import psycopg2, mysql.connector
+from condenser import config_reader
 import os, pathlib, re, urllib, subprocess, os.path, json, getpass, time, sys, datetime

 class DbConnect:
@@ -74,6 +73,7 @@ def __enter__(self):
 # method across MySQL and Postgres. This one is for Postgres
 class PsqlConnection(DbConnection):
     def __init__(self, connect, read_repeatable):
+        import psycopg2
         connection_string = 'dbname=\'{0}\' user=\'{1}\' password=\'{2}\' host={3} port={4}'.format(connect.db_name, connect.user, connect.password, connect.host, connect.port)

         if connect.ssl_mode :
@@ -91,6 +91,7 @@ def cursor(self, name=None, withhold=False):
 # method across MySQL and Postgres. This one is for MySQL
 class MySqlConnection(DbConnection):
     def __init__(self, connect, read_repeatable):
+        import mysql.connector
         DbConnection.__init__(self, mysql.connector.connect(host=connect.host, port=connect.port, user=connect.user, password=connect.password, database=connect.db_name))

         self.db_name = connect.db_name
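
Moving `import psycopg2` and `import mysql.connector` into the constructors means a driver is only required when its connection class is actually instantiated, which fits the `-E postgres` / `-E mysql` extras named in the README. A possible refinement, sketched below, turns a missing driver into an actionable message. `require_driver` is a hypothetical helper and is not part of this commit.

```python
import importlib


def require_driver(module_name, extra):
    """Import an optional driver, pointing at the matching Poetry extra on failure.

    Hypothetical helper; `extra` is the name passed to `poetry install -E`.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            "The '{0}' driver is not installed; try: poetry install -E {1}".format(
                module_name, extra
            )
        ) from exc
```

Under that assumption, a constructor could call `psycopg2 = require_driver("psycopg2", "postgres")` in place of the bare import.
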
Lines changed: 11 additions & 10 deletions
@@ -1,23 +1,23 @@
+from condenser import config_reader, result_tabulator
+from condenser.subset import Subset
+from condenser.db_connect import DbConnect
+from condenser.subset_utils import print_progress
+from condenser import database_helper
 import uuid, sys
-import config_reader, result_tabulator
 import time
-from subset import Subset
-from psql_database_creator import PsqlDatabaseCreator
-from mysql_database_creator import MySqlDatabaseCreator
-from db_connect import DbConnect
-from subset_utils import print_progress
-import database_helper

 def db_creator(db_type, source, dest):
     if db_type == 'postgres':
+        from condenser.psql_database_creator import PsqlDatabaseCreator
         return PsqlDatabaseCreator(source, dest, False)
     elif db_type == 'mysql':
+        from condenser.mysql_database_creator import MySqlDatabaseCreator
         return MySqlDatabaseCreator(source, dest)
     else:
         raise ValueError('unknown db_type ' + db_type)


-if __name__ == '__main__':
+def run():
     if "--stdin" in sys.argv:
         config_reader.initialize(sys.stdin)
     else:
@@ -48,7 +48,7 @@ def db_creator(db_type, source, dest):
             print_progress(sql, idx+1, len(config_reader.get_pre_constraint_sql()))
             db_helper.run_query(sql, destination_dbc.get_db_connection())
         print("Completed pre constraint SQL calls in {}s".format(time.time()-start_time))
-
+

         print("Adding database constraints")
         if "--no-constraints" not in sys.argv:
@@ -65,4 +65,5 @@ def db_creator(db_type, source, dest):
     finally:
         subsetter.unprep_temp_dbs()

-
+if __name__ == '__main__':
+    run()
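
Wrapping the script body in `run()` gives the package a callable entry point, which is presumably what `poetry run subset` invokes. The `pyproject.toml` that would declare that script is not shown in this excerpt, so that link is an assumption. The sketch below illustrates the pattern together with the membership-style flag checks used in the hunk. `parse_flags` is a hypothetical helper, and how `-v` is really parsed is not visible in this diff.

```python
import sys


def parse_flags(argv):
    """Hypothetical helper mirroring the `"--flag" in sys.argv` checks in the diff."""
    return {
        "stdin": "--stdin" in argv,
        "verbose": "-v" in argv,  # assumption: -v is detected the same way
        "add_constraints": "--no-constraints" not in argv,
    }


def run(argv=None):
    """Entry point that works both as a console script and via `python file.py`."""
    argv = sys.argv[1:] if argv is None else argv
    flags = parse_flags(argv)
    # The subsetting work would start here; this sketch only returns the flags.
    return flags


if __name__ == "__main__":
    run()
```

Accepting `argv` as a parameter keeps the entry point testable without patching `sys.argv`.
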
Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ def connection_args(connect):

 # This is just for unit testing the creation and tear down processes
 if __name__ == '__main__':
-    import config_reader, db_connect
+    from condenser import config_reader, db_connect
     config_reader.initialize()
     src_connect = db_connect.DbConnect(config_reader.get_source_db_connection_info(), 'mysql')
     dest_connect = db_connect.DbConnect(config_reader.get_destination_db_connection_info(), 'mysql')
Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 import os, uuid, csv
-import config_reader
+from condenser import config_reader
 from pathlib import Path
-from subset_utils import columns_joined, columns_tupled, quoter, schema_name, table_name, fully_qualified_table, redact_relationships
+from condenser.subset_utils import columns_joined, columns_tupled, quoter, schema_name, table_name, fully_qualified_table, redact_relationships

 system_schemas_str = ','.join(['\'' + schema + '\'' for schema in ['information_schema', 'performance_schema', 'sys', 'mysql', 'innodb','tmp']])
 temp_db = 'tonic_subset_temp_db_398dhjr23'
Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 import os, urllib, subprocess
-from db_connect import DbConnect
-import database_helper
+from condenser.db_connect import DbConnect
+from condenser import database_helper

 class PsqlDatabaseCreator:
     def __init__(self, source_dbc, destination_dbc, use_existing_dump = False):
Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 import os, uuid, csv
-import config_reader
+from condenser import config_reader
 from pathlib import Path
 from psycopg2.extras import execute_values, register_default_json, register_default_jsonb
-from subset_utils import columns_joined, columns_tupled, schema_name, table_name, fully_qualified_table, redact_relationships, quoter
+from condenser.subset_utils import columns_joined, columns_tupled, schema_name, table_name, fully_qualified_table, redact_relationships, quoter

 register_default_json(loads=lambda x: str(x))
 register_default_jsonb(loads=lambda x: str(x))
