
[SNOW-TBD] AWS SDK v2 migration Step 1: Add v2 deps + migrate IcebergS3Client#1149

Open
sfc-gh-ggeng wants to merge 10 commits into master from aws-sdk-v2-step1-deps-and-iceberg-s3

@sfc-gh-ggeng commented Apr 8, 2026

Summary

Begin migration from AWS SDK v1 (com.amazonaws:1.12.655) to AWS SDK v2 (software.amazon.awssdk:2.37.5). Start with the lowest-risk path: IcebergS3Client (no client-side encryption).

Step 1: Add v2 dependencies alongside v1

  • Add software.amazon.awssdk:bom:2.37.5 to dependencyManagement
  • Add s3, s3-transfer-manager, netty-nio-client, auth, plus aws-core, http-client-spi, regions, sdk-core as direct deps (http-auth-aws was later dropped as unused)
  • Exclude aws-crt (cannot be shaded), using the correct groupId software.amazon.awssdk.crt
  • Add shade rules for software.amazon.awssdk, software.amazon.eventstream, org.reactivestreams
  • Keep v1 deps (still used by SnowflakeS3Client, GCS clients)

Step 2: Migrate IcebergS3Client to v2

IcebergS3Client.java:

  • AmazonS3 → S3AsyncClient with NettyNioAsyncHttpClient
  • TransferManager → S3TransferManager
  • BasicAWSCredentials/BasicSessionCredentials → AwsBasicCredentials/AwsSessionCredentials
  • v1 ClientConfiguration → IcebergS3Client.ClientConfiguration inner class (matches JDBC's SnowflakeS3Client.ClientConfiguration)
  • ObjectMetadata SSE → ServerSideEncryption.fromValue() + ssekmsKeyId() (uses v2 constants, no hardcoded strings)
  • AmazonS3Exception → S3Exception with null-safe awsErrorDetails() check
  • RegionUtils.getRegion() → Region.of()
  • Multipart threshold set to 16MB (Risk 2 fix)
  • Upload streams wrapped in BufferedInputStream (Risk 1 fix)
  • SSLConnectionSocketFactory removed (v2 Netty handles TLS natively)
  • Netty HTTP client configured with connectionTimeout, readTimeout, writeTimeout, connectionAcquisitionTimeout (matching JDBC)
  • Exception handler checks ex.getCause() for SdkException (matching JDBC), because CompletableFuture.join() wraps exceptions in CompletionException
  • CompletionException handling added for non-SDK async failures
  • Upload builds PutObjectRequest via IcebergS3ObjectMetadata.getS3PutObjectRequest() (matching JDBC's S3ObjectMetadata pattern)
  • CRC32 checksum added to PutObjectRequest (matching JDBC)
  • String comparison fix for endpoint checks: != "" → .isEmpty(), != "null" → .equals()
  • Proxy: .scheme() for protocol (HTTP/HTTPS), .useEnvironmentVariableValues(false), .useSystemPropertyValues(false) (matching JDBC's CloudStorageProxyFactory)
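Because uploads run on S3AsyncClient, failures surface from CompletableFuture.join() wrapped in a CompletionException, which is why the handler has to inspect ex.getCause(). The JDK-only sketch below shows that unwrap-then-classify order together with the null-safe error-code read (Risk 3). FakeSdkException, FakeS3Exception, and FakeErrorDetails are hypothetical stand-ins for the SDK's SdkException, S3Exception, and AwsErrorDetails, and AsyncUploadErrors is illustrative, not the PR's actual handler.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical stand-in for software.amazon.awssdk.core.exception.SdkException.
class FakeSdkException extends RuntimeException {
  FakeSdkException(String message) {
    super(message);
  }
}

// Hypothetical stand-in for AwsErrorDetails; only the error code is modeled.
final class FakeErrorDetails {
  private final String errorCode;

  FakeErrorDetails(String errorCode) {
    this.errorCode = errorCode;
  }

  String errorCode() {
    return errorCode;
  }
}

// Hypothetical stand-in for S3Exception: awsErrorDetails() may return null.
final class FakeS3Exception extends FakeSdkException {
  private final FakeErrorDetails details;

  FakeS3Exception(String message, FakeErrorDetails details) {
    super(message);
    this.details = details;
  }

  FakeErrorDetails awsErrorDetails() {
    return details;
  }
}

final class AsyncUploadErrors {
  private AsyncUploadErrors() {}

  // join() wraps failures in CompletionException; look through it to the real cause.
  static Throwable unwrap(Throwable ex) {
    if (ex instanceof CompletionException && ex.getCause() != null) {
      return ex.getCause();
    }
    return ex;
  }

  // Classify a failure; the S3 error code is read only when error details are present.
  static String classify(Throwable ex) {
    Throwable cause = unwrap(ex);
    if (cause instanceof FakeS3Exception) {
      FakeErrorDetails details = ((FakeS3Exception) cause).awsErrorDetails();
      String code = (details == null) ? null : details.errorCode(); // Risk 3: details may be null
      return "s3:" + (code == null ? "unknown" : code);
    }
    if (cause instanceof FakeSdkException) {
      return "sdk:" + cause.getMessage();
    }
    return "other:" + cause.getClass().getSimpleName();
  }

  // Wait for the upload future; returns "ok" on success, otherwise the classified failure.
  static String joinAndClassify(CompletableFuture<?> upload) {
    try {
      upload.join();
      return "ok";
    } catch (CompletionException ex) {
      return classify(ex);
    }
  }
}
```

Checking the SdkException type on the unwrapped cause, rather than on the CompletionException itself, is what keeps SDK errors from being misreported as generic async failures.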
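The endpoint fix addresses a classic Java pitfall: != compares object references, so an empty or "null" string that was not interned (for example, one read from configuration) slips past the check. A minimal sketch, assuming hypothetical helper names (the PR's actual method and surrounding logic are not shown here):

```java
final class EndpointCheck {
  private EndpointCheck() {}

  // Buggy: != compares object identity, so a non-interned "" or "null" is not caught.
  static boolean hasCustomEndpointBuggy(String endpoint) {
    return endpoint != null && endpoint != "" && endpoint != "null";
  }

  // Fixed: compare string contents with isEmpty() and equals().
  static boolean hasCustomEndpoint(String endpoint) {
    return endpoint != null && !endpoint.isEmpty() && !"null".equals(endpoint);
  }
}
```

For example, hasCustomEndpointBuggy(new String("")) returns true, while hasCustomEndpoint(new String("")) returns false.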

IcebergS3ObjectMetadata.java:

  • Removed v1 ObjectMetadata wrapper
  • Now uses plain Map<String, String> + fields (like IcebergCommonObjectMetadata)
  • Added getS3PutObjectRequest() method with ChecksumAlgorithm.CRC32 (matches JDBC's S3ObjectMetadata)
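When ChecksumAlgorithm.CRC32 is set on the PutObjectRequest, the SDK computes the checksum itself. The sketch below only illustrates what that value is, using the JDK's java.util.zip.CRC32. It assumes S3's x-amz-checksum-crc32 convention of Base64-encoding the four big-endian CRC bytes. Crc32Checksum is a hypothetical helper, not part of the PR.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.zip.CRC32;

final class Crc32Checksum {
  private Crc32Checksum() {}

  // Raw CRC-32 of the payload, as an unsigned 32-bit value held in a long.
  static long crc32(byte[] payload) {
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    return crc.getValue();
  }

  // CRC-32 as Base64 of its 4 big-endian bytes (ByteBuffer defaults to big-endian).
  static String base64Crc32(byte[] payload) {
    byte[] bigEndian = ByteBuffer.allocate(4).putInt((int) crc32(payload)).array();
    return Base64.getEncoder().encodeToString(bigEndian);
  }
}
```

With this encoding, the standard CRC-32 check input "123456789" gives 0xCBF43926, which encodes as y/Q5Jg==.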

IcebergStorageClientFactory.java:

  • Creates IcebergS3Client.ClientConfiguration with maxConnections, maxErrorRetry, connectionTimeout, socketTimeout (matches JDBC's StorageClientFactory.createS3Client())
  • Timeout values from JdbcHttpUtil.getConnectionTimeout() / getSocketTimeout()

JDBC v2 reference code

Each change follows patterns from the JDBC driver's completed v2 migration at snowflakedb/snowflake-jdbc:

| Pattern | JDBC reference |
|---|---|
| S3AsyncClient builder + region + endpoint | SnowflakeS3Client.java#L165-L204 |
| ClientConfiguration inner class | SnowflakeS3Client.java#L772-L796 |
| ClientConfiguration creation with timeouts | StorageClientFactory.java#L117-L121 |
| NettyNioAsyncHttpClient + timeouts + ProxyConfiguration | SnowflakeS3Client.java#L196-L202 |
| Proxy scheme + env isolation | CloudStorageProxyFactory.java#L105-L112 |
| Multipart threshold 16MB | SnowflakeS3Client.java#L203 |
| S3TransferManager builder | SnowflakeS3Client.java#L545 |
| BufferedInputStream wrap for upload | SnowflakeS3Client.java#L645-L650 |
| S3ObjectMetadata.getS3PutObjectRequest() with CRC32 | S3ObjectMetadata.java#L72-L78 |
| awsErrorDetails() null-safe check | S3ErrorHandler.java#L67-L68 |
| Exception handler: ex.getCause() for CompletionException unwrap | SnowflakeS3Client.java#L727-L730 |
| forcePathStyle(false) | SnowflakeS3Client.java#L185 |

JDBC bug fix PRs applied

| Risk | JDBC PR | Description |
|---|---|---|
| CipherInputStream silent corruption | #2502 | CipherInputStream doesn't support mark/reset, so an SDK retry reads from the wrong position → wrap in BufferedInputStream |
| Multipart threshold regression | #2526 | v1 TransferManager defaulted to 16MB; v2 defaults to 8MB → restore 16MB explicitly |
| S3Exception NPE | #2550 | awsErrorDetails() can return null → null-check before .errorCode() |
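Risk 1 hinges on mark/reset: a request body can only be replayed on retry if the stream can rewind, and CipherInputStream.markSupported() returns false. The JDK-only sketch below simulates a failed first attempt followed by a retry. ReplayableUpload, its method names, and the mark limit are hypothetical, not the SDK's retry internals or the PR's code.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class ReplayableUpload {
  private ReplayableUpload() {}

  // Wrap only when needed: BufferedInputStream buffers bytes so mark()/reset() work.
  static InputStream makeReplayable(InputStream in) {
    return in.markSupported() ? in : new BufferedInputStream(in);
  }

  // Read up to n bytes, stopping early at end of stream.
  static byte[] readUpTo(InputStream in, int n) throws IOException {
    byte[] buf = new byte[n];
    int total = 0;
    while (total < n) {
      int r = in.read(buf, total, n - total);
      if (r < 0) {
        break;
      }
      total += r;
    }
    return Arrays.copyOf(buf, total);
  }

  // Simulates a first attempt that consumed some bytes before failing, then a retry
  // that rewinds to the mark and re-reads the body from the start.
  static byte[][] attemptTwice(InputStream body, int markLimit, int bytesBeforeFailure)
      throws IOException {
    InputStream replayable = makeReplayable(body);
    replayable.mark(markLimit);
    byte[] firstAttempt = readUpTo(replayable, bytesBeforeFailure);
    replayable.reset(); // rewind; a raw CipherInputStream does not support this
    byte[] retry = readUpTo(replayable, markLimit);
    return new byte[][] {firstAttempt, retry};
  }
}
```

The wrapper trades memory for replayability: BufferedInputStream may hold up to markLimit bytes to keep the mark valid.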

Expected differences from JDBC (Iceberg path does not need these)

| JDBC feature | Why not needed |
|---|---|
| SFSession-based proxy (CloudStorageProxyFactory) | Iceberg path is sessionless; uses proxyProperties directly |
| ClientOverrideConfiguration + ExecutionInterceptor | Only needed for JDBC's HttpHeadersCustomizer session feature |
| download() / downloadToStream() | Iceberg path is upload-only |
| renew() / shutdown() | Iceberg path creates fresh clients per upload |
| listObjects() / getObjectMetadata() | Iceberg path doesn't read S3 objects |
| Client-side encryption (EncryptionProvider) | Iceberg uses server-side encryption (SSE-S3/SSE-KMS) |
| S3ObjectMetadata(HeadObjectResponse) constructor | Not needed without download/head operations |
| execution.interceptors shade patching | Deferred to shade plugin cleanup PR |

Stacked on #1147.

Test plan

  • mvn compiler:compile passes
  • aws-crt properly excluded from dependency tree
  • CI passes
  • Integration tests on S3 (Iceberg path)

🤖 Generated with Claude Code

sfc-gh-ggeng requested review from a team as code owners April 8, 2026 20:35
Base automatically changed from jdbc-removal-port-missing-tests to master April 14, 2026 00:30
sfc-gh-ggeng and others added 6 commits April 14, 2026 00:31
Add software.amazon.awssdk:2.37.5 (matching JDBC driver version):
- BOM import in dependencyManagement
- s3, s3-transfer-manager, netty-nio-client, auth, http-auth-aws
- aws-crt excluded (cannot be shaded)

Add shade relocation rules:
- software.amazon.awssdk
- software.amazon.eventstream
- org.reactivestreams (new transitive dep from v2 async)

Keep v1 (com.amazonaws) deps for now — will be removed after all
files are migrated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace all com.amazonaws imports with software.amazon.awssdk equivalents:

IcebergS3Client:
- AmazonS3 → S3AsyncClient with NettyNioAsyncHttpClient
- TransferManager → S3TransferManager
- BasicAWSCredentials/BasicSessionCredentials → AwsBasicCredentials/AwsSessionCredentials
- ClientConfiguration → ProxyConfiguration + builder params
- ObjectMetadata SSE → PutObjectRequest.builder().serverSideEncryption()
- AmazonS3Exception → S3Exception with null-safe awsErrorDetails() check
- RegionUtils.getRegion() → Region.of()
- Multipart threshold set to 16MB (match JDBC)
- Streams wrapped in BufferedInputStream (JDBC CipherInputStream fix)
- SSLConnectionSocketFactory removed (v2 Netty handles SSL)

IcebergS3ObjectMetadata:
- Removed v1 ObjectMetadata wrapper
- Now uses plain Map + fields (like IcebergCommonObjectMetadata)

IcebergStorageClientFactory:
- Removed ClientConfiguration creation
- Passes maxConnections directly to IcebergS3Client

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
software.amazon.eventstream:eventstream (Apache 2.0) is a transitive
dependency of AWS SDK v2 that doesn't bundle a license file in its JAR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add ClientConfiguration inner class (maxConnections, maxErrorRetry,
  connectionTimeout, socketTimeout) matching JDBC's SnowflakeS3Client
- Configure Netty HTTP timeouts: connectionAcquisitionTimeout(60s),
  connectionTimeout, readTimeout, writeTimeout
- Fix exception handler to check ex.getCause() for SdkException
  (CompletableFuture.join() wraps in CompletionException)
- Add CompletionException handling for non-SDK async failures
- Fix string comparison bug: != "" → .isEmpty(), != "null" → .equals()
- Add getS3PutObjectRequest() to IcebergS3ObjectMetadata with CRC32
  checksum (matching JDBC's S3ObjectMetadata)
- Restore ClientConfiguration creation in IcebergStorageClientFactory
  with timeouts from JdbcHttpUtil
- Fix aws-crt exclusion groupId: software.amazon.awssdk →
  software.amazon.awssdk.crt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Read PROXY_PROTOCOL property and set .scheme() on ProxyConfiguration
  (HTTP vs HTTPS, matching JDBC's CloudStorageProxyFactory)
- Add .useEnvironmentVariableValues(false) and
  .useSystemPropertyValues(false) to prevent Netty from reading
  HTTP_PROXY/HTTPS_PROXY env vars (v2 equivalent of the old v1 pattern
  of setting empty proxy host/port/user/password)
- Add protocol to proxy log message

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use ServerSideEncryption.fromValue() instead of hardcoded SSE check
- Use ServerSideEncryption.AES256/AWS_KMS.toString() for algorithm strings
- Update migration plan to reflect actual PR structure and lessons learned:
  Steps 1+2 merged, ClientConfiguration pattern, CRC32 checksum,
  CompletionException unwrapping, proxy protocol, execution.interceptors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sfc-gh-ggeng force-pushed the aws-sdk-v2-step1-deps-and-iceberg-s3 branch from f762d6c to d41993e April 14, 2026 00:31
sfc-gh-ggeng and others added 4 commits April 14, 2026 21:52
The reactive-streams library (transitive dep of AWS SDK v2) uses MIT-0
which wasn't in the license whitelist, causing the license-maven-plugin
check to fail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New transitive dependency from AWS SDK v2's netty-nio-client. The JAR
doesn't ship a license file, causing process_licenses.py to fail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add aws-core, http-client-spi, regions, sdk-core as direct deps
  (were used undeclared transitives)
- Remove http-auth-aws (unused — no direct imports)
- Fixes maven-dependency-plugin:analyze-only CI check

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AWS SDK v2 sdk-core includes a VersionInfo.java source template at the
jar root. The shade plugin doesn't relocate .java files, so it leaked
into the shaded jar as a bare class outside the snowflake namespace,
failing the check-shaded-content CI check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>