[SNOW-TBD] AWS SDK v2 migration Step 1: Add v2 deps + migrate IcebergS3Client #1149
Open
sfc-gh-ggeng wants to merge 10 commits into master
sfc-gh-alhuang
approved these changes
Apr 13, 2026
Add software.amazon.awssdk:2.37.5 (matching JDBC driver version):
- BOM import in dependencyManagement
- s3, s3-transfer-manager, netty-nio-client, auth, http-auth-aws
- aws-crt excluded (cannot be shaded)

Add shade relocation rules:
- software.amazon.awssdk
- software.amazon.eventstream
- org.reactivestreams (new transitive dep from v2 async)

Keep v1 (com.amazonaws) deps for now; they will be removed after all files are migrated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace all com.amazonaws imports with software.amazon.awssdk equivalents:

IcebergS3Client:
- AmazonS3 → S3AsyncClient with NettyNioAsyncHttpClient
- TransferManager → S3TransferManager
- BasicAWSCredentials/BasicSessionCredentials → AwsBasicCredentials/AwsSessionCredentials
- ClientConfiguration → ProxyConfiguration + builder params
- ObjectMetadata SSE → PutObjectRequest.builder().serverSideEncryption()
- AmazonS3Exception → S3Exception with null-safe awsErrorDetails() check
- RegionUtils.getRegion() → Region.of()
- Multipart threshold set to 16MB (match JDBC)
- Streams wrapped in BufferedInputStream (JDBC CipherInputStream fix)
- SSLConnectionSocketFactory removed (v2 Netty handles SSL)

IcebergS3ObjectMetadata:
- Removed v1 ObjectMetadata wrapper
- Now uses plain Map + fields (like IcebergCommonObjectMetadata)

IcebergStorageClientFactory:
- Removed ClientConfiguration creation
- Passes maxConnections directly to IcebergS3Client

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
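The BufferedInputStream wrapping listed in this commit can be illustrated with plain JDK classes. The sketch below is not the code from this PR: NoRewindStream is a hypothetical stand-in for a stream that, like CipherInputStream, does not support mark/reset, and readTwice is a hypothetical helper that simulates a retry by rewinding to a mark. It only shows why a buffering wrapper makes the rewind possible.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class MarkResetSketch {

    /** Hypothetical stand-in for a stream that cannot rewind (as with CipherInputStream). */
    static final class NoRewindStream extends FilterInputStream {
        NoRewindStream(InputStream in) {
            super(in);
        }

        @Override
        public boolean markSupported() {
            return false;
        }

        @Override
        public void mark(int readLimit) {
            // Intentionally ignored: this stream keeps no rewind position.
        }

        @Override
        public void reset() throws IOException {
            throw new IOException("mark/reset not supported");
        }
    }

    /** Builds a non-rewindable stream over the given ASCII text. */
    static InputStream source(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new NoRewindStream(new ByteArrayInputStream(bytes));
    }

    /** Reads n bytes, rewinds to the mark, and reads them again (a simulated retry). */
    static String readTwice(InputStream in, int n) {
        try {
            in.mark(n);
            byte[] first = new byte[n];
            in.readNBytes(first, 0, n);
            in.reset(); // throws if the stream cannot rewind
            byte[] second = new byte[n];
            in.readNBytes(second, 0, n);
            return new String(first, StandardCharsets.US_ASCII)
                    + "|" + new String(second, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Wrapped: the buffer retains the marked bytes, so the retry sees the same data.
        System.out.println(readTwice(new BufferedInputStream(source("hello world")), 5));
        // Unwrapped: the rewind fails.
        try {
            readTwice(source("hello world"), 5);
        } catch (UncheckedIOException e) {
            System.out.println("unwrapped stream failed: " + e.getCause().getMessage());
        }
    }
}
```

The wrapper only helps while the amount read after mark() stays within the mark limit and buffer capacity, so the buffer sizing used in the real client is a separate design decision that this sketch does not cover.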
software.amazon.eventstream:eventstream (Apache 2.0) is a transitive dependency of AWS SDK v2 that doesn't bundle a license file in its JAR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add ClientConfiguration inner class (maxConnections, maxErrorRetry, connectionTimeout, socketTimeout) matching JDBC's SnowflakeS3Client
- Configure Netty HTTP timeouts: connectionAcquisitionTimeout(60s), connectionTimeout, readTimeout, writeTimeout
- Fix exception handler to check ex.getCause() for SdkException (CompletableFuture.join() wraps in CompletionException)
- Add CompletionException handling for non-SDK async failures
- Fix string comparison bug: != "" → .isEmpty(), != "null" → .equals()
- Add getS3PutObjectRequest() to IcebergS3ObjectMetadata with CRC32 checksum (matching JDBC's S3ObjectMetadata)
- Restore ClientConfiguration creation in IcebergStorageClientFactory with timeouts from JdbcHttpUtil
- Fix aws-crt exclusion groupId: software.amazon.awssdk → software.amazon.awssdk.crt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
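Two of the fixes in this commit rest on plain JDK behavior and can be sketched without the SDK. The first part shows that CompletableFuture.join() surfaces a failure as a CompletionException whose cause is the original exception, so a handler has to look at getCause(). The second part shows why comparing strings with != is unreliable. This is a minimal sketch, not the PR's code: FakeSdkException stands in for SdkException, and unwrap, classify, joinFailure, and hasEndpoint are hypothetical helper names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

final class AsyncFailureSketch {

    /** Hypothetical stand-in for an SDK exception type such as SdkException. */
    static final class FakeSdkException extends RuntimeException {
        FakeSdkException(String message) {
            super(message);
        }
    }

    /** Returns the wrapped cause of a CompletionException, or the throwable itself. */
    static Throwable unwrap(Throwable t) {
        if (t instanceof CompletionException && t.getCause() != null) {
            return t.getCause();
        }
        return t;
    }

    /** Classifies a failure after unwrapping, the way an error handler might branch. */
    static String classify(Throwable t) {
        Throwable root = unwrap(t);
        if (root instanceof FakeSdkException) {
            return "sdk: " + root.getMessage();
        }
        return "other: " + root.getClass().getSimpleName();
    }

    /** Runs a failing async task and returns whatever join() throws. */
    static Throwable joinFailure(RuntimeException failure) {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
            throw failure;
        });
        try {
            future.join();
            return null; // not reached: the task always fails
        } catch (CompletionException e) {
            return e;
        }
    }

    /**
     * Value-based endpoint check. Using != "" or != "null" would compare
     * references, so an empty string built at runtime could pass as non-empty.
     */
    static boolean hasEndpoint(String endpoint) {
        return endpoint != null && !endpoint.isEmpty() && !"null".equals(endpoint);
    }

    public static void main(String[] args) {
        Throwable wrapped = joinFailure(new FakeSdkException("access denied"));
        System.out.println(wrapped.getClass().getSimpleName()); // the wrapper type
        System.out.println(classify(wrapped));                  // the unwrapped failure
        System.out.println(hasEndpoint(new String("")));        // value check on a runtime-built string
    }
}
```

Unwrapping before the instanceof check keeps one handler usable for both synchronous calls and joined futures, which appears to be the intent of the getCause() change described above.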
- Read PROXY_PROTOCOL property and set .scheme() on ProxyConfiguration (HTTP vs HTTPS, matching JDBC's CloudStorageProxyFactory)
- Add .useEnvironmentVariableValues(false) and .useSystemPropertyValues(false) to prevent Netty from reading HTTP_PROXY/HTTPS_PROXY env vars (v2 equivalent of the old v1 pattern of setting empty proxy host/port/user/password)
- Add protocol to proxy log message

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use ServerSideEncryption.fromValue() instead of hardcoded SSE check
- Use ServerSideEncryption.AES256/AWS_KMS.toString() for algorithm strings
- Update migration plan to reflect actual PR structure and lessons learned: Steps 1+2 merged, ClientConfiguration pattern, CRC32 checksum, CompletionException unwrapping, proxy protocol, execution.interceptors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
f762d6c to
d41993e
The reactive-streams library (transitive dep of AWS SDK v2) uses MIT-0, which wasn't in the license whitelist, causing the license-maven-plugin check to fail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New transitive dependency from AWS SDK v2's netty-nio-client. The JAR doesn't ship a license file, causing process_licenses.py to fail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add aws-core, http-client-spi, regions, sdk-core as direct deps (they were previously used as undeclared transitives)
- Remove http-auth-aws (unused: no direct imports)
- Fixes maven-dependency-plugin:analyze-only CI check

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AWS SDK v2 sdk-core includes a VersionInfo.java source template at the jar root. The shade plugin doesn't relocate .java files, so it leaked into the shaded jar as a bare class outside the snowflake namespace, failing the check-shaded-content CI check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Begin migration from AWS SDK v1 (com.amazonaws:1.12.655) to AWS SDK v2 (software.amazon.awssdk:2.37.5). Start with the lowest-risk path: IcebergS3Client (no client-side encryption).
Step 1: Add v2 dependencies alongside v1
- BOM import: software.amazon.awssdk:bom:2.37.5 added to dependencyManagement
- Dependencies: s3, s3-transfer-manager, netty-nio-client, auth, http-auth-aws
- aws-crt excluded (cannot be shaded), using the correct groupId software.amazon.awssdk.crt
- Shade relocation rules: software.amazon.awssdk, software.amazon.eventstream, org.reactivestreams
Step 2: Migrate IcebergS3Client to v2
IcebergS3Client.java:
- AmazonS3 → S3AsyncClient with NettyNioAsyncHttpClient
- TransferManager → S3TransferManager
- BasicAWSCredentials/BasicSessionCredentials → AwsBasicCredentials/AwsSessionCredentials
- ClientConfiguration → v2 ClientConfiguration inner class (matches JDBC's SnowflakeS3Client.ClientConfiguration)
- ObjectMetadata SSE → ServerSideEncryption.fromValue() + ssekmsKeyId() (uses v2 constants, no hardcoded strings)
- AmazonS3Exception → S3Exception with null-safe awsErrorDetails() check
- RegionUtils.getRegion() → Region.of()
- Streams wrapped in BufferedInputStream (Risk 1 fix)
- SSLConnectionSocketFactory removed (v2 Netty handles TLS natively)
- Netty HTTP timeouts configured: connectionTimeout, readTimeout, writeTimeout, connectionAcquisitionTimeout (matching JDBC)
- Exception handler checks ex.getCause() for SdkException (matching JDBC), because CompletableFuture.join() wraps exceptions in CompletionException
- CompletionException handling added for non-SDK async failures
- PutObjectRequest built via IcebergS3ObjectMetadata.getS3PutObjectRequest() (matching JDBC's S3ObjectMetadata pattern)
- CRC32 checksum set on PutObjectRequest (matching JDBC)
- String comparison fix: != → .equals()/.isEmpty() for endpoint checks
- Proxy configuration: .scheme() for protocol (HTTP/HTTPS), .useEnvironmentVariableValues(false), .useSystemPropertyValues(false) (matching JDBC's CloudStorageProxyFactory)
IcebergS3ObjectMetadata.java:
- Removed v1 ObjectMetadata wrapper
- Now uses plain Map<String, String> + fields (like IcebergCommonObjectMetadata)
- Added getS3PutObjectRequest() method with ChecksumAlgorithm.CRC32 (matches JDBC's S3ObjectMetadata)
IcebergStorageClientFactory.java:
- Creates IcebergS3Client.ClientConfiguration with maxConnections, maxErrorRetry, connectionTimeout, socketTimeout (matches JDBC's StorageClientFactory.createS3Client())
- Timeouts taken from JdbcHttpUtil.getConnectionTimeout()/getSocketTimeout()
JDBC v2 reference code
Each change follows patterns from the JDBC driver's completed v2 migration at snowflakedb/snowflake-jdbc:
- SnowflakeS3Client.java#L165-L204
- SnowflakeS3Client.java#L772-L796
- StorageClientFactory.java#L117-L121
- SnowflakeS3Client.java#L196-L202
- CloudStorageProxyFactory.java#L105-L112
- SnowflakeS3Client.java#L203
- SnowflakeS3Client.java#L545
- SnowflakeS3Client.java#L645-L650
- S3ObjectMetadata.java#L72-L78
- S3ErrorHandler.java#L67-L68
- SnowflakeS3Client.java#L727-L730
- SnowflakeS3Client.java#L185
JDBC bug fix PRs applied
- CipherInputStream doesn't support mark/reset; SDK retry reads from wrong position → wrap in BufferedInputStream
- awsErrorDetails() can return null → null-check before .errorCode()
Expected differences from JDBC (Iceberg path does not need these)
- SFSession-based proxy (CloudStorageProxyFactory); the Iceberg path reads proxyProperties directly
- ClientOverrideConfiguration + ExecutionInterceptor (HttpHeadersCustomizer session feature)
- download()/downloadToStream()
- renew()/shutdown()
- listObjects()/getObjectMetadata()
- EncryptionProvider (client-side encryption is not used on the Iceberg path)
- S3ObjectMetadata(HeadObjectResponse) constructor
- execution.interceptors shade patching
Stacked on #1147.
Test plan
- mvn compiler:compile passes
- aws-crt properly excluded from dependency tree
🤖 Generated with Claude Code