README.md: 18 additions & 14 deletions
@@ -14,17 +14,25 @@ Our open-source tool can subset databases up to 10GB, but it will struggle with
# Installation
-Five steps to install, assuming Python 3.5+:
+Five steps to install, assuming Python 3.6+:
-1. Download the required Python modules. You can use [`pip`](https://pypi.org/project/pip/) for easy installation. The required modules are `toposort`, `psycopg2-binary`, and `mysql-connector-python`.
2. Install Postgres and/or MySQL database tools. For Postgres we need the `pg_dump` and `psql` tools; they need to be on your `$PATH`, or you can point to them with `$POSTGRES_PATH`. For MySQL we need `mysqldump` and `mysql`; they can be on your `$PATH`, or you can point to them with `$MYSQL_PATH`.
-3. Download this repo. You can clone the repo or Download it as a zip. Scroll up, it's the green button that says "Clone or download".
-4. Setup your configuration and save it in `config.json`. The provided `config.json.example` has the skeleton of what you need to provide: source and destination database connection details, as well as subsetting goals in `initial_targets`. Here's an example that will collect 10% of a table named `public.target_table`.
+
+4. Install project:
+```
+$ poetry shell
+$ poetry install -E postgres # Or use -E mysql
+```
+
+5. Set up your configuration and save it in `config.json`. The provided `config.json.example` has the skeleton of what you need to provide: source and destination database connection details, as well as subsetting goals in `initial_targets`. Here's an example that will collect 10% of a table named `public.target_table`.
There may be more required configuration depending on your database, but simple databases should be easy. See the Config section for more details, and `config.json.example_all` for all of the options in a single config file.
-5. Run! `$ python direct_subset.py`
+5. Run! `$ poetry run subset`
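Step 2 asks for the database client tools to be reachable either on `$PATH` or through `$POSTGRES_PATH` / `$MYSQL_PATH`. A minimal pre-flight sketch for checking that before the first run might look like the following. It is not part of the project: the helpers `find_tool` and `missing_tools` are hypothetical, and the sketch assumes each environment variable names a directory that contains the executables, which the README does not spell out.

```python
import os
import shutil

# Tool names and variable names come from step 2. Treating each variable as a
# directory that contains the executables is an assumption.
REQUIRED_TOOLS = {
    "POSTGRES_PATH": ("pg_dump", "psql"),
    "MYSQL_PATH": ("mysqldump", "mysql"),
}


def find_tool(name, env_var, environ=None):
    """Return the full path to `name`, or None if it cannot be found.

    Hypothetical helper: looks in the directory named by `env_var` first,
    then falls back to the regular PATH lookup.
    """
    environ = os.environ if environ is None else environ
    override_dir = environ.get(env_var)
    if override_dir:
        found = shutil.which(name, path=override_dir)
        if found:
            return found
    return shutil.which(name, path=environ.get("PATH", os.defpath))


def missing_tools(env_var, environ=None):
    """List the tools for one database family that could not be located."""
    return [
        tool
        for tool in REQUIRED_TOOLS[env_var]
        if find_tool(tool, env_var, environ) is None
    ]


if __name__ == "__main__":
    for var in REQUIRED_TOOLS:
        missing = missing_tools(var)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{var}: {status}")
```

Only the database family you actually use needs to report `ok`; a missing MySQL toolchain is harmless when subsetting Postgres, and vice versa.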
# Config
@@ -80,15 +88,11 @@ Below we describe the use of all configuration parameters, but the best place to
Almost all the configuration is in the `config.json` file, so running is as simple as
```
-$ python direct_subset.py
+$ poetry run subset
```
Two command-line arguments are supported:

`-v`: Verbose output. Useful for performance debugging. Lists almost every query made and its speed.

`--no-constraints`: For Postgres, this skips adding constraints found in the source database to the destination database. This option has no effect for MySQL.
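To make the two flags concrete, here is a minimal `argparse` sketch of a parser that accepts them. It only illustrates the interface described above: the `build_parser` function, the `prog` name, and the `verbose` / `no_constraints` attribute names are assumptions for the example, not the project's actual code.

```python
import argparse


def build_parser():
    """Build a parser for the two documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="subset")
    parser.add_argument(
        "-v",
        dest="verbose",
        action="store_true",
        help="verbose output: list almost every query made and its speed",
    )
    parser.add_argument(
        "--no-constraints",
        action="store_true",
        help="Postgres only: do not copy source constraints to the destination",
    )
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["-v", "--no-constraints"])
print(args.verbose, args.no_constraints)
```

Both flags are plain switches, so `store_true` is enough; neither takes a value.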
-
-# Requirements
-
-Reference the requirements.txt file for a list of required python packages. Also, please note that Python 3.5+ is required.