PXF JDBC drivers #74

@ostinru

I am going to test that cloudberry-pxf works as well as greenplum-pxf with JDBC data sources (and update the docs).

Currently, Automation covers only PostgreSQL JDBC (because the PostgreSQL JDBC driver is included in the PXF build and we can use Cloudberry as a PostgreSQL instance). However, most issues we observe in production happen with other databases (usually Oracle and ClickHouse). Oracle uses unusual data types that don't match the PostgreSQL ones, and ClickHouse used to have issues with the huge volumes of data that users export.

Here I see a couple of issues:

  1. Oracle licensing is not clear to me. I am not sure that we can run containerised Oracle in CI or on dev machines. I have seen that TestContainers provides [1] Oracle as one of the options, and it is stated that this container image is used across different open-source projects [2]. Here I need guidance and best practices from @tuhaihe / Apache.
  2. MS SQL Server also requires accepting the EULA before running a container with the database [3]. Is that OK?

And is it possible to download these Docker images from "special network environments" (#63)?

Test Design ideas

Dependencies:
I am not sure that we want to ship "batteries included" drivers with cloudberry-pxf, or even provide them separately as cloudberry-pxf-drivers [4]. That would oblige us to support different databases for ages (as we are doing with HBase). However, we can keep these drivers as test dependencies for pxf-jdbc.

TestContainers:
I think that we can start the Cloudberry + PXF container (ci/docker/pxf-cbdb-dev) directly from Java code in TestContainers, on a network shared with the 3rd-party databases. I know that this will run SLOW (slow Cloudberry compilation, PXF build, Hadoop start), but it seems to be a step in the right direction.

+----------------------------------------------------------------+
|                              Host                              |
|  +-------------------------------+  +-----------------------+  |
|  |           Docker              |  |        Docker         |  |
|  |   [Cloudberry] --> [PXF] ------------> [Database]        |  |
|  |                               |  |                       |  |
|  +-------------------------------+  +-----------------------+  |
|                                                                |
|                         [Automation]                           |
+----------------------------------------------------------------+
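
On a shared Docker network, PXF would reach each database by its network alias rather than through a host port, so the test harness has to hand PXF a server configuration that points at that alias. Below is a minimal sketch of rendering such a `jdbc-site.xml` from Java, using only the JDK. The property names (`jdbc.driver`, `jdbc.url`, `jdbc.user`, `jdbc.password`) are the ones used in PXF's JDBC server configuration. The class name `JdbcSiteSketch`, the `oracle` network alias, the `FREEPDB1` service name and the `test`/`test` credentials are assumptions for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: renders a PXF jdbc-site.xml for a database that is
// reachable by its alias on the shared Docker network.
class JdbcSiteSketch {

    // Escape the characters that are not allowed verbatim in XML text nodes.
    static String escape(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
    }

    // Build the XML document; insertion order of the properties is preserved.
    static String renderJdbcSite(String driver, String url, String user, String password) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("jdbc.driver", driver);
        props.put("jdbc.url", url);
        props.put("jdbc.user", user);
        props.put("jdbc.password", password);

        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            xml.append("  <property>\n");
            xml.append("    <name>").append(e.getKey()).append("</name>\n");
            xml.append("    <value>").append(escape(e.getValue())).append("</value>\n");
            xml.append("  </property>\n");
        }
        xml.append("</configuration>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Assumed values: "oracle" is the alias of the database container on
        // the shared network; service name and credentials are placeholders.
        System.out.print(renderJdbcSite(
                "oracle.jdbc.OracleDriver",
                "jdbc:oracle:thin:@//oracle:1521/FREEPDB1",
                "test",
                "test"));
    }
}
```

The harness could write the returned string into a server directory inside the PXF container before Automation runs its queries. Only the driver class and the URL would change per database.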

@MisterRaindrop, any thoughts on this?

[1] https://java.testcontainers.org/modules/databases/oraclefree/
[2] https://github.com/gvenzl/oci-oracle-free?tab=readme-ov-file#users-of-these-images
[3] https://java.testcontainers.org/modules/databases/mssqlserver/
[4] open-gpdb#6
