Python utility that migrates tables, table definitions and structures (DDL), and merges data from Postgres database (on-prem or cloud) to Oracle 23ai database on OCI, which can then be used with AutoML.
- A base .env.dev file has been included in this repository. Rename this file to .env and enter your applicable credentials, or configure separately.
- For Postgres, either enter the full Database URL string in PG_DATABASE_URL_EP, or enter the host, port, name, and credentials into the individual fields. Both setups aren't required.
- To use SSL when connecting to Postgres (recommended), set the PG_DB_SSL_MODE = Yes, otherwise set it to No, or leave it blank.
- For Oracle, the utility will use the DSN provided in the OCI dashboard, as well as the schema's username/password. Enter those accordingly. Ensure the DSN is single line. Recommended to grab the lowest latency connection DSN if you need to migrate quickly.
- Oracle allows Mutual TLS (mTLS), or standard TLS connections. mTLS will require a Wallet which is not configured here. This tool is build with standard TLS. To use this, configure an Access Control List (ACL) in OCI and whitelist the IP Address(es) you plan to run this utility from, then disable mTLS in OCI in the ADB settings.
This information is also covered in the .env file instructions.
- Column type mapping is configured in migration/mapping.py. Example: The tool maps Postgres's TEXT data type to Oracle's CLOB data type since it doesn't have TEXT.
- For unmatched data types, the default for Oracle will either be CLOP or VARCHAR2(4000) with the goal to eliminate any room for failures in the data migrations.
- Simply changing this in mapping.py will change the behavior of the application accordingly.
- If you map TEXT fields or similar to VARCHAR2(4000) instead so you don't have to type cast in Oracle, ensure you substring the Postgres data when querying it, otherwise you risk merge failures if the Postgres column data exceeds 4k characters.
- Note: Oracle's 23ai is configured to automatically scale VARCHAR2 fields now. As a result, simply defining a column as VARCHAR2(4000) does not take up 4k bytes of space automatically, and instead will scale with actual usage.
This utility initiallty used cx_Oracle, but that has been replaced with the oracledb library. However, some individuals prefer to use cx_Oracle. This tool will work with cx_Oracle, but please note the following:
For cx_Oracle on Windows, specifically, the dependencies cannot be installed via pip or similar Python package managers. Microsoft Visual C++ 14.0 or greater is required, and if not installed, will generate an error with installing cx_Oracle, and Python package managers cannot install this for you.
Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/. When installing, ensure the C++ build tools are included. This includes but is not limited to:
- C++ x64/x86 build tools (most important)
- Windows 10 SDK OR Windows 11 SDK (most important)
- C++ Modules for v143 build tools (x64/x86)
- MSVC v143 - VS 2022 C++ x64/x86 build tools (Latest) OR MSVC v142 - VS 2019 C++ x64/x86 build tools
- C++/CLI support
- Windows Universal CRT SDK (optional: include if you want to create an interface for this utility)
- C++ ATL for v142 build tools (x86 & x64) (optional, but was installed when developing initially, so recommended)
SQL Alchemy can also be used in replace of oracledb. Technically, SQL Alchemy could be used for both Postgres and Oracle in this utility, and it is more friendly with Pandas, especially when encountering the LONG datatype in Oracle. However, SQL Alchemy was not chosen for development as it's not as native to the respective Postgres/Oracle environments as their corresponding libraries.