18 changes: 14 additions & 4 deletions Makefile
Original file line number Diff line number Diff line change
@@ -13,7 +13,8 @@ SOURCE_EXTENSION_DIR = external-table
TARGET_EXTENSION_DIR = gpextable

LICENSE ?= ASL 2.0
VENDOR ?= Open Source
VENDOR ?= Apache Cloudberry (Incubating)
RELEASE ?= 1

default: all

@@ -122,8 +123,17 @@ rpm:
make -C cli stage
make -C server stage
set -e ;\
PXF_MAIN_VERSION=$${PXF_VERSION//-SNAPSHOT/} ;\
if [[ $${PXF_VERSION} == *"-SNAPSHOT" ]]; then PXF_RELEASE=SNAPSHOT; else PXF_RELEASE=1; fi ;\
GP_MAJOR_VERSION=$$(cat $(SOURCE_EXTENSION_DIR)/build/metadata/gp_major_version) ;\
PXF_FULL_VERSION=$${PXF_VERSION} ;\
PXF_MAIN_VERSION=$$(echo $${PXF_FULL_VERSION} | sed -E 's/(-SNAPSHOT|-rc[0-9]+)$$//') ;\
if [[ $${PXF_FULL_VERSION} == *"-SNAPSHOT" ]]; then \
PXF_RELEASE=SNAPSHOT; \
elif [[ $${PXF_FULL_VERSION} =~ -rc([0-9]+)$$ ]]; then \
PXF_RELEASE="rc$${BASH_REMATCH[1]}"; \
else \
PXF_RELEASE=1; \
fi ;\
rm -rf build/rpmbuild ;\
mkdir -p build/rpmbuild/{BUILD,RPMS,SOURCES,SPECS} ;\
cp -a build/stage/$${PXF_PACKAGE_NAME}/pxf/* build/rpmbuild/SOURCES ;\
cp package/*.spec build/rpmbuild/SPECS/ ;\
@@ -133,7 +143,7 @@ rpm:
--define "pxf_release $${PXF_RELEASE}" \
--define "license ${LICENSE}" \
--define "vendor ${VENDOR}" \
-bb $${PWD}/build/rpmbuild/SPECS/pxf-cbdb$${GP_MAJOR_VERSION}.spec
-bb $${PWD}/build/rpmbuild/SPECS/cloudberry-pxf.spec

rpm-tar: rpm
rm -rf build/{stagerpm,distrpm}
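The version parsing in the `rpm` target above can be exercised on its own. The following is a minimal bash sketch of that logic (the `parse_pxf_release` helper name and the sample version strings are illustrative, not part of the build):

```shell
#!/usr/bin/env bash
# Sketch of the Makefile's rpm-target version parsing:
# strip a trailing -SNAPSHOT or -rcN suffix to get the main version,
# and derive the RPM release field from the suffix.
parse_pxf_release() {
  local full="$1"
  local main release
  # Remove a trailing -SNAPSHOT or -rc<digits> suffix, if present.
  main=$(echo "${full}" | sed -E 's/(-SNAPSHOT|-rc[0-9]+)$//')
  if [[ "${full}" == *"-SNAPSHOT" ]]; then
    release=SNAPSHOT
  elif [[ "${full}" =~ -rc([0-9]+)$ ]]; then
    release="rc${BASH_REMATCH[1]}"
  else
    release=1
  fi
  echo "${main} ${release}"
}

parse_pxf_release "7.0.0-SNAPSHOT"   # → 7.0.0 SNAPSHOT
parse_pxf_release "7.0.0-rc2"        # → 7.0.0 rc2
parse_pxf_release "7.0.0"            # → 7.0.0 1
```

The same suffix handling feeds `--define "pxf_release ..."` in the `rpmbuild` invocation, so a release-candidate build produces an RPM whose Release field records the `rcN` suffix.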
52 changes: 20 additions & 32 deletions README.md
@@ -51,7 +51,8 @@ To build PXF, you must have:

Assuming you have installed Cloudberry into `/usr/local/cloudberrydb` directory, run its environment script:
```
source /usr/local/cloudberrydb/greenplum_path.sh
source /usr/local/cloudberrydb/greenplum_path.sh # For Cloudberry 2.0
source /usr/local/cloudberrydb/cloudberry-env.sh # For Cloudberry 2.1+
```

3. JDK 1.8 or JDK 11 to compile/run
@@ -171,44 +172,33 @@ cp ${PXF_HOME}/templates/*-site.xml ${PXF_BASE}/servers/default
> [!Note]
> Since the docker container will house a single-cluster Hadoop, Cloudberry, and PXF, we recommend allocating at least 4 CPUs and 6 GB of memory to Docker. These settings are available under Docker preferences.

The quick and easy is to download the Cloudberry RPM from GitHub and move it into the `/downloads` folder. Then run `./dev/start.bash` to get a docker image with a running Cloudberry, Hadoop cluster and an installed PXF.
We provide a Docker-based development environment that includes Cloudberry, Hadoop, and PXF. See [automation/README.Docker.md](automation/README.Docker.md) for detailed instructions.

#### Setup Cloudberry in the Docker image

Configure, build and install Cloudberry. This will be needed only when you use the container for the first time with Cloudberry source.
**Quick Start:**

```bash
~/workspace/pxf/dev/build_gpdb.bash
sudo mkdir /usr/local/cloudberry-db-devel
sudo chown gpadmin:gpadmin /usr/local/cloudberry-db-devel
~/workspace/pxf/dev/install_gpdb.bash
```
# Build and start the development container
docker compose -f ci/docker/pxf-cbdb-dev/ubuntu/docker-compose.yml build
docker compose -f ci/docker/pxf-cbdb-dev/ubuntu/docker-compose.yml up -d

For subsequent minor changes to Cloudberry source you can simply do the following:
```bash
~/workspace/pxf/dev/install_gpdb.bash
```
# Enter the container and run setup
docker exec -it pxf-cbdb-dev bash -c \
"cd /home/gpadmin/workspace/cloudberry-pxf/ci/docker/pxf-cbdb-dev/ubuntu && ./script/entrypoint.sh"

Run all the instructions below and run GROUP=smoke (in one script):
```bash
~/workspace/pxf/dev/smoke_shortcut.sh
```
# Run tests
docker exec -it pxf-cbdb-dev bash -c \
"cd /home/gpadmin/workspace/cloudberry-pxf/ci/docker/pxf-cbdb-dev/ubuntu && ./script/run_tests.sh"

Create Cloudberry Cluster
```bash
source /usr/local/cloudberrydb-db-devel/greenplum_path.sh
make -C ~/workspace/cbdb create-demo-cluster
source ~/workspace/cbdb/gpAux/gpdemo/gpdemo-env.sh
# Stop and clean up
docker compose -f ci/docker/pxf-cbdb-dev/ubuntu/docker-compose.yml down -v
```

#### Setup Hadoop
HDFS is needed to demonstrate functionality. You can optionally start additional Hadoop components (Hive/HBase) if you need them.

Set up [User Impersonation](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html) prior to starting the Hadoop components (this allows the `gpadmin` user to access Hadoop data).

```bash
~/workspace/pxf/dev/configure_singlecluster.bash
```
The Docker development environment automatically configures Hadoop. For manual setup, see [automation/README.Docker.md](automation/README.Docker.md).

Setup and start HDFS
```bash
@@ -233,13 +223,11 @@ popd
```

#### Setup Minio (optional)
Minio is an S3-API compatible local storage solution. The development docker image comes with Minio software pre-installed. To start the Minio server, run the following script:
```bash
source ~/workspace/pxf/dev/start_minio.bash
```
MinIO is an S3-API-compatible local storage solution. The development Docker image comes with MinIO pre-installed, and the Docker development environment starts it automatically.

After the server starts, you can access the MinIO UI at `http://localhost:9000` from the host OS. Use `admin` for the access key and `password` for the secret key when connecting to your local MinIO instance.

The script also sets `PROTOCOL=minio` so that the automation framework will use the local Minio server when running S3 automation tests. If later you would like to run Hadoop HDFS tests, unset this variable with `unset PROTOCOL` command.
To run S3 automation tests, set `PROTOCOL=minio`. If you later want to run Hadoop HDFS tests, unset this variable with the `unset PROTOCOL` command.
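The assumed workflow for toggling between S3 and HDFS test runs can be sketched as:

```shell
#!/usr/bin/env bash
# Point the automation framework at the local MinIO server for S3 tests,
# then switch back to HDFS tests by unsetting the variable.
# (The echo lines stand in for actual automation test invocations.)
export PROTOCOL=minio
echo "S3 tests would run against MinIO now (PROTOCOL=${PROTOCOL})"

unset PROTOCOL
echo "HDFS tests would run now (PROTOCOL is ${PROTOCOL:-unset})"
```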

#### Setup PXF

@@ -330,7 +318,7 @@ no JDK set for Gradle. Just cancel and retry. It goes away the second time.
- Download bin_gpdb (from any of the pipelines)
- Download pxf_tarball (from any of the pipelines)

These instructions allow you to run a Kerberized cluster
These instructions allow you to run a Kerberized cluster. See [automation/README.Docker.md](automation/README.Docker.md) for detailed Kerberos setup instructions.

```bash
docker run --rm -it \
Original file line number Diff line number Diff line change
@@ -157,30 +157,12 @@ public void orcReadMultiDimensionalLists() throws Exception {
runSqlTest("features/orc/read/multidim_list_types");
}

/*
 * FDW fails for data that contains a NUL byte (i.e. '\u0000'). This behaviour is different from external-table but the same as GPDB Heap
* FDW Failure: invalid byte sequence for encoding "UTF8": 0x00
*
* GPDB also throws the same error when copying the data containing a NUL-byte
*
* postgres=# copy test from '/Users/pandeyhi/Documents/bad_data.txt' ;
* ERROR: invalid byte sequence for encoding "UTF8": 0x00
* TODO Do we need to do some changes to make sure the external-table behaves the same way as GPDB/FDW?
*
*/
@FailsWithFDW
@Test(groups = {"features", "gpdb", "security", "hcfs"})
public void orcReadStringsContainingNullByte() throws Exception {
prepareReadableExternalTable("pxf_orc_null_in_string", ORC_NULL_IN_STRING_COLUMNS, hdfsPath + ORC_NULL_IN_STRING);
runSqlTest("features/orc/read/null_in_string");
}

// @Test(groups = {"features", "gpdb", "security", "hcfs"})
// public void orcReadStringsContainingNullByte() throws Exception {
// prepareReadableExternalTable("pxf_orc_null_in_string", ORC_NULL_IN_STRING_COLUMNS, hdfsPath + ORC_NULL_IN_STRING);
// runTincTest("pxf.features.orc.read.null_in_string.runTest");
// }

private void prepareReadableExternalTable(String name, String[] fields, String path) throws Exception {
prepareReadableExternalTable(name, fields, path, false);
}
Original file line number Diff line number Diff line change
@@ -195,6 +195,21 @@ public void parquetWritePrimitivesGZipClassName() throws Exception {
runWritePrimitivesScenario("pxf_parquet_write_primitives_gzip_classname", "pxf_parquet_read_primitives_gzip_classname", "parquet_write_primitives_gzip_classname", new String[]{"COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec"});
}

@Test(groups = {"features", "gpdb", "security", "hcfs"})
public void parquetWritePrimitivesSnappy() throws Exception {
runWritePrimitivesScenario("pxf_parquet_write_primitives_snappy", "pxf_parquet_read_primitives_snappy", "parquet_write_primitives_snappy", new String[]{"COMPRESSION_CODEC=snappy"});
}

@Test(groups = {"features", "gpdb", "security", "hcfs"})
public void parquetWritePrimitivesUncompressed() throws Exception {
runWritePrimitivesScenario("pxf_parquet_write_primitives_uncompressed", "pxf_parquet_read_primitives_uncompressed", "parquet_write_primitives_uncompressed", new String[]{"COMPRESSION_CODEC=uncompressed"});
}

@Test(groups = {"features", "gpdb", "security", "hcfs"})
public void parquetWritePrimitivesZStd() throws Exception {
runWritePrimitivesScenario("pxf_parquet_write_primitives_zstd", "pxf_parquet_read_primitives_zstd", "parquet_write_primitives_zstd", new String[]{"COMPRESSION_CODEC=zstd"});
}

// Numeric precision not defined, test writing data precision in [1, 38]. All the data should be written correctly.
@Test(groups = {"features", "gpdb", "security", "hcfs"})
public void parquetWriteUndefinedPrecisionNumeric() throws Exception {
2 changes: 1 addition & 1 deletion ci/docker/pxf-cbdb-dev/ubuntu/script/entrypoint.sh
@@ -468,7 +468,7 @@ start_hive_services() {

deploy_minio() {
log "deploying MinIO"
bash "${REPO_DIR}/dev/start_minio.bash"
bash "${PXF_SCRIPTS}/start_minio.bash"
}

main() {
Original file line number Diff line number Diff line change
@@ -293,7 +293,7 @@ setup_ssl_material() {

deploy_minio() {
log "deploying MinIO (for S3 tests)"
bash "${REPO_ROOT}/dev/start_minio.bash"
bash "${PXF_SCRIPTS}/start_minio.bash"
}

configure_pxf_s3() {
File renamed without changes.
2 changes: 1 addition & 1 deletion cli/README.md
@@ -34,7 +34,7 @@ go install github.com/go-delve/delve/cmd/dlv@latest

```
config max-string-len 1000
break vendor/github.com/greenplum-db/gp-common-go-libs/cluster/cluster.go:351
break vendor/github.com/apache/cloudberry-go-libs/cluster/cluster.go:351
continue
print commandList
```
13 changes: 5 additions & 8 deletions cli/cmd/cluster.go
@@ -6,9 +6,9 @@ import (
"os"
"strings"

"github.com/greenplum-db/gp-common-go-libs/cluster"
"github.com/greenplum-db/gp-common-go-libs/dbconn"
"github.com/greenplum-db/gp-common-go-libs/gplog"
"github.com/apache/cloudberry-go-libs/cluster"
"github.com/apache/cloudberry-go-libs/dbconn"
"github.com/apache/cloudberry-go-libs/gplog"
"github.com/spf13/cobra"
"github.com/blang/semver"
)
@@ -111,9 +111,6 @@ func GenerateOutput(cmd *command, clusterData *ClusterData) error {
}
response := ""
for _, failedCommand := range clusterData.Output.FailedCommands {
if failedCommand == nil {
continue
}
host := failedCommand.Host
errorMessage := failedCommand.Stderr
if len(errorMessage) == 0 {
@@ -138,8 +135,8 @@ func doSetup() (*ClusterData, error) {
connection := dbconn.NewDBConnFromEnvironment("postgres")
err := connection.Connect(1)
if err != nil {
gplog.Error(fmt.Sprintf("ERROR: Could not connect to GPDB.\n%s\n"+
"Please make sure that your Greenplum database is running and you are on the coordinator node.", err.Error()))
gplog.Error(fmt.Sprintf("ERROR: Could not connect to Cloudberry.\n%s\n"+
"Please make sure that your Apache Cloudberry is running and you are on the coordinator node.", err.Error()))
return nil, err
}

28 changes: 11 additions & 17 deletions cli/cmd/cluster_test.go
@@ -3,7 +3,7 @@ package cmd_test
import (
"pxf-cli/cmd"

"github.com/greenplum-db/gp-common-go-libs/cluster"
"github.com/apache/cloudberry-go-libs/cluster"

. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
@@ -145,11 +145,7 @@ var _ = Describe("GenerateOutput()", func() {
BeforeEach(func() {
clusterData.Output = &cluster.RemoteOutput{
NumErrors: 0,
FailedCommands: []*cluster.ShellCommand{
nil,
nil,
nil,
},
FailedCommands: []cluster.ShellCommand{},
Commands: []cluster.ShellCommand{
{
Host: "mdw",
@@ -234,8 +230,8 @@
}
clusterData.Output = &cluster.RemoteOutput{
NumErrors: 1,
FailedCommands: []*cluster.ShellCommand{
&failedCommand,
FailedCommands: []cluster.ShellCommand{
failedCommand,
},
Commands: []cluster.ShellCommand{
{
@@ -358,8 +354,8 @@ stderr line three`
}
clusterData.Output = &cluster.RemoteOutput{
NumErrors: 1,
FailedCommands: []*cluster.ShellCommand{
&failedCommand,
FailedCommands: []cluster.ShellCommand{
failedCommand,
},
Commands: []cluster.ShellCommand{
{
@@ -393,8 +389,8 @@ stderr line three`
}
clusterData.Output = &cluster.RemoteOutput{
NumErrors: 1,
FailedCommands: []*cluster.ShellCommand{
&failedCommand,
FailedCommands: []cluster.ShellCommand{
failedCommand,
},
Commands: []cluster.ShellCommand{
{
@@ -422,9 +418,7 @@
BeforeEach(func() {
clusterDataWithOneHost.Output = &cluster.RemoteOutput{
NumErrors: 0,
FailedCommands: []*cluster.ShellCommand{
nil,
},
FailedCommands: []cluster.ShellCommand{},
Commands: []cluster.ShellCommand{
{
Host: "mdw",
@@ -496,8 +490,8 @@
}
clusterDataWithOneHost.Output = &cluster.RemoteOutput{
NumErrors: 1,
FailedCommands: []*cluster.ShellCommand{
&failedCommand,
FailedCommands: []cluster.ShellCommand{
failedCommand,
},
Commands: []cluster.ShellCommand{
failedCommand,
4 changes: 2 additions & 2 deletions cli/cmd/cmd_suite_test.go
@@ -4,8 +4,8 @@ import (
"os/user"
"testing"

"github.com/greenplum-db/gp-common-go-libs/operating"
"github.com/greenplum-db/gp-common-go-libs/testhelper"
"github.com/apache/cloudberry-go-libs/operating"
"github.com/apache/cloudberry-go-libs/testhelper"
"github.com/onsi/gomega/gbytes"

. "github.com/onsi/ginkgo/v2"
2 changes: 1 addition & 1 deletion cli/cmd/pxf.go
@@ -8,7 +8,7 @@ import (
"os"
"strings"

"github.com/greenplum-db/gp-common-go-libs/cluster"
"github.com/apache/cloudberry-go-libs/cluster"
)

type envVar string
2 changes: 1 addition & 1 deletion cli/cmd/root.go
@@ -3,7 +3,7 @@ package cmd
import (
"os"

"github.com/greenplum-db/gp-common-go-libs/gplog"
"github.com/apache/cloudberry-go-libs/gplog"

"github.com/spf13/cobra"
)
10 changes: 5 additions & 5 deletions cli/go.mod
@@ -3,7 +3,8 @@ module pxf-cli
go 1.21.3

require (
github.com/greenplum-db/gp-common-go-libs v1.0.16
github.com/apache/cloudberry-go-libs v1.0.12-0.20250910014224-fc376e8a1056
github.com/blang/semver v3.5.1+incompatible
github.com/onsi/ginkgo/v2 v2.13.0
github.com/onsi/gomega v1.28.0
github.com/pkg/errors v0.9.1
@@ -12,7 +13,6 @@ require (

require (
github.com/DATA-DOG/go-sqlmock v1.5.0 // indirect
github.com/blang/semver v3.5.1+incompatible // indirect
github.com/go-logr/logr v1.2.4 // indirect
github.com/go-task/slim-sprig v0.0.0-20230315185526-52ccab3ef572 // indirect
github.com/google/go-cmp v0.6.0 // indirect
@@ -28,9 +28,9 @@
github.com/jackc/pgx/v4 v4.18.2 // indirect
github.com/jmoiron/sqlx v1.3.5 // indirect
github.com/spf13/pflag v1.0.3 // indirect
golang.org/x/crypto v0.20.0 // indirect
golang.org/x/net v0.21.0 // indirect
golang.org/x/sys v0.17.0 // indirect
golang.org/x/crypto v0.21.0 // indirect
golang.org/x/net v0.23.0 // indirect
golang.org/x/sys v0.18.0 // indirect
golang.org/x/text v0.14.0 // indirect
golang.org/x/tools v0.12.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect