Commit 8a3de83

Merge branch 'apache:main' into copy_to_preserve_ordering
2 parents ad8f9a3 + d1e6eb4 commit 8a3de83

39 files changed: +1189 -419 lines changed

.github/workflows/rust.yml

Lines changed: 2 additions & 15 deletions
@@ -218,6 +218,8 @@ jobs:
  run: cargo check --profile ci --no-default-features -p datafusion --features=string_expressions
  - name: Check datafusion (unicode_expressions)
  run: cargo check --profile ci --no-default-features -p datafusion --features=unicode_expressions
+ - name: Check parquet encryption (parquet_encryption)
+ run: cargo check --profile ci --no-default-features -p datafusion --features=parquet_encryption

  # Check datafusion-functions crate features
  #
@@ -312,18 +314,6 @@ jobs:
  fetch-depth: 1
  - name: Setup Rust toolchain
  run: rustup toolchain install stable
- - name: Setup Minio - S3-compatible storage
- run: |
- docker run -d --name minio-container \
- -p 9000:9000 \
- -e MINIO_ROOT_USER=TEST-DataFusionLogin -e MINIO_ROOT_PASSWORD=TEST-DataFusionPassword \
- -v $(pwd)/datafusion/core/tests/data:/source quay.io/minio/minio \
- server /data
- docker exec minio-container /bin/sh -c "\
- mc ready local
- mc alias set localminio http://localhost:9000 TEST-DataFusionLogin TEST-DataFusionPassword && \
- mc mb localminio/data && \
- mc cp -r /source/* localminio/data"
  - name: Run tests (excluding doctests)
  env:
  RUST_BACKTRACE: 1
@@ -335,9 +325,6 @@ jobs:
  run: cargo test --profile ci -p datafusion-cli --lib --tests --bins
  - name: Verify Working Directory Clean
  run: git diff --exit-code
- - name: Minio Output
- if: ${{ !cancelled() }}
- run: docker logs minio-container


  linux-test-example:

Cargo.lock

Lines changed: 18 additions & 36 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 3 additions & 0 deletions
@@ -150,6 +150,7 @@ env_logger = "0.11"
  futures = "0.3"
  half = { version = "2.6.0", default-features = false }
  hashbrown = { version = "0.14.5", features = ["raw"] }
+ hex = { version = "0.4.3" }
  indexmap = "2.10.0"
  itertools = "0.14"
  log = "^0.4"
@@ -173,6 +174,8 @@ rstest = "0.25.0"
  serde_json = "1"
  sqlparser = { version = "0.55.0", default-features = false, features = ["std", "visitor"] }
  tempfile = "3"
+ testcontainers = { version = "0.24", features = ["default"] }
+ testcontainers-modules = { version = "0.12" }
  tokio = { version = "1.46", features = ["macros", "rt", "sync"] }
  url = "2.5.4"

README.md

Lines changed: 2 additions & 0 deletions
@@ -120,6 +120,7 @@ Default features:
  - `datetime_expressions`: date and time functions such as `to_timestamp`
  - `encoding_expressions`: `encode` and `decode` functions
  - `parquet`: support for reading the [Apache Parquet] format
+ - `parquet_encryption`: support for using [Parquet Modular Encryption]
  - `regex_expressions`: regular expression functions, such as `regexp_match`
  - `unicode_expressions`: Include unicode aware functions such as `character_length`
  - `unparser`: enables support to reverse LogicalPlans back into SQL
@@ -134,6 +135,7 @@ Optional features:

  [apache avro]: https://avro.apache.org/
  [apache parquet]: https://parquet.apache.org/
+ [parquet modular encryption]: https://parquet.apache.org/docs/file-format/data-pages/encryption/

  ## DataFusion API Evolution and Deprecation Guidelines

datafusion-cli/CONTRIBUTING.md

Lines changed: 14 additions & 35 deletions
@@ -29,47 +29,26 @@ cargo test

  ## Running Storage Integration Tests

- By default, storage integration tests are not run. To run them you will need to set `TEST_STORAGE_INTEGRATION=1` and
- then provide the necessary configuration for that object store.
+ By default, storage integration tests are not run. These tests use the `testcontainers` crate to start up a local MinIO server using Docker on port 9000.

- For some of the tests, [snapshots](https://datafusion.apache.org/contributor-guide/testing.html#snapshot-testing) are used.
-
- ### AWS
-
- To test the S3 integration against [Minio](https://github.com/minio/minio)
-
- First start up a container with Minio and load test files.
+ To run them you will need to set `TEST_STORAGE_INTEGRATION`:

  ```shell
- docker run -d \
- --name datafusion-test-minio \
- -p 9000:9000 \
- -e MINIO_ROOT_USER=TEST-DataFusionLogin \
- -e MINIO_ROOT_PASSWORD=TEST-DataFusionPassword \
- -v $(pwd)/../datafusion/core/tests/data:/source \
- quay.io/minio/minio server /data
-
- docker exec datafusion-test-minio /bin/sh -c "\
- mc ready local
- mc alias set localminio http://localhost:9000 TEST-DataFusionLogin TEST-DataFusionPassword && \
- mc mb localminio/data && \
- mc cp -r /source/* localminio/data"
+ TEST_STORAGE_INTEGRATION=1 cargo test
  ```

- Setup environment
+ For some of the tests, [snapshots](https://datafusion.apache.org/contributor-guide/testing.html#snapshot-testing) are used.

- ```shell
- export TEST_STORAGE_INTEGRATION=1
- export AWS_ACCESS_KEY_ID=TEST-DataFusionLogin
- export AWS_SECRET_ACCESS_KEY=TEST-DataFusionPassword
- export AWS_ENDPOINT=http://127.0.0.1:9000
- export AWS_ALLOW_HTTP=true
- ```
+ ### AWS

- Note that `AWS_ENDPOINT` is set without slash at the end.
+ S3 integration is tested against [Minio](https://github.com/minio/minio) with [TestContainers](https://github.com/testcontainers/testcontainers-rs).
+ This requires Docker to be running on your machine and port 9000 to be free.

- Run tests
+ If you see an error mentioning "failed to load IMDS session token" such as

- ```shell
- cargo test
- ```
+ > ---- object_storage::tests::s3_object_store_builder_resolves_region_when_none_provided stdout ----
+ > Error: ObjectStore(Generic { store: "S3", source: "Error getting credentials from provider: an error occurred while loading credentials: failed to load IMDS session token" })
+
+ You may need to disable fetching S3 credentials from the EC2 instance metadata service (IMDS) by setting the `AWS_EC2_METADATA_DISABLED` environment variable, for example:
+
+ > $ AWS_EC2_METADATA_DISABLED=true TEST_STORAGE_INTEGRATION=1 cargo test
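
Note on the CONTRIBUTING.md change above: the new text has two preconditions for the storage integration tests, namely that `TEST_STORAGE_INTEGRATION` is set and that port 9000 is free. The following is a minimal sketch of how such a pre-flight check might look in Rust using only the standard library. It is not taken from this commit. The helper names `integration_enabled` and `port_is_free` are hypothetical, and treating any non-empty value as "enabled" is an assumption, since the diff does not show how the test suite reads the variable.

```rust
use std::env;
use std::net::TcpListener;

/// Hypothetical helper: treats any non-empty value as "enabled".
/// The real test suite may interpret the variable differently.
fn integration_enabled(value: Option<&str>) -> bool {
    matches!(value, Some(v) if !v.is_empty())
}

/// Hypothetical helper: returns true if `port` can be bound on localhost,
/// which suggests that no other process is listening on it right now.
fn port_is_free(port: u16) -> bool {
    TcpListener::bind(("127.0.0.1", port)).is_ok()
}

fn main() {
    // Read the opt-in flag described in CONTRIBUTING.md.
    let flag = env::var("TEST_STORAGE_INTEGRATION").ok();
    if !integration_enabled(flag.as_deref()) {
        println!("TEST_STORAGE_INTEGRATION is not set; skipping storage integration tests");
        return;
    }

    // The documentation states that the MinIO container uses port 9000.
    if port_is_free(9000) {
        println!("port 9000 is free; the MinIO container can bind it");
    } else {
        println!("port 9000 is in use; stop the other service before running the tests");
    }
}
```

The bind check is only a best-effort probe: another process could take the port between the check and the container start, so a failure when the container starts is still possible.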

datafusion-cli/Cargo.toml

Lines changed: 2 additions & 0 deletions
@@ -72,3 +72,5 @@ insta = { workspace = true }
  insta-cmd = "0.6.0"
  predicates = "3.0"
  rstest = { workspace = true }
+ testcontainers = { workspace = true }
+ testcontainers-modules = { workspace = true, features = ["minio"] }
