Add S3 timeout parameters to ArrowCursor for role assumption support #610

Merged
laughingman7743 merged 2 commits into master from
fix-issue-609-arrow-timeout-parameters
Oct 18, 2025

laughingman7743 commented Oct 18, 2025

Summary

Adds connect_timeout and request_timeout parameters to ArrowCursor and AsyncArrowCursor to resolve timeout issues when using role assumption with STS or experiencing high latency to S3.

Problem

When using PyAthena's ArrowCursor with STS role assumption, S3 HeadObject operations can take longer than the default 3-second timeout, causing NETWORK_CONNECTION errors (issue #609). This is particularly problematic when:

  • Using cross-account role assumption
  • Connecting from regions far from the S3 bucket
  • Experiencing network latency

Solution

This PR exposes PyArrow's S3FileSystem connect_timeout and request_timeout parameters through PyAthena's ArrowCursor API, allowing users to configure appropriate timeout values for their environment.
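
The pass-through described above can be pictured roughly as follows. This is a minimal sketch, not PyAthena's actual implementation: `S3Timeouts`, `to_fs_kwargs`, and `make_filesystem` are hypothetical names introduced for illustration. Only `connect_timeout`, `request_timeout`, and `region` are assumed to be real keyword arguments of `pyarrow.fs.S3FileSystem`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union

Seconds = Union[int, float]


@dataclass(frozen=True)
class S3Timeouts:
    """Hypothetical container for the two timeout settings (in seconds)."""

    connect_timeout: Optional[Seconds] = None
    request_timeout: Optional[Seconds] = None

    def to_fs_kwargs(self) -> Dict[str, Any]:
        # Leave unset values out so the filesystem falls back to its own defaults.
        kwargs: Dict[str, Any] = {}
        if self.connect_timeout is not None:
            kwargs["connect_timeout"] = self.connect_timeout
        if self.request_timeout is not None:
            kwargs["request_timeout"] = self.request_timeout
        return kwargs


def make_filesystem(timeouts: S3Timeouts, region: str):
    """Build a PyArrow S3 filesystem with the configured timeouts."""
    # Imported lazily so the rest of the sketch works without PyArrow installed.
    from pyarrow import fs

    return fs.S3FileSystem(region=region, **timeouts.to_fs_kwargs())
```

Omitting unset values keeps the behavior identical to earlier releases for callers who never pass a timeout.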

Changes

  • ArrowCursor: Added connect_timeout and request_timeout parameters
  • AsyncArrowCursor: Added connect_timeout and request_timeout parameters
  • AthenaArrowResultSet: Updated to pass timeout parameters to PyArrow S3FileSystem
  • PyArrow Version: Updated minimum version from 7.0.0 to 10.0.0 (timeout parameters require PyArrow >= 10.0.0)
  • Documentation: Added comprehensive docstrings and new "S3 Timeout Configuration" section to Arrow docs
  • Tests: Added unit tests for both int and float timeout values

Usage Example

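from pyathena import connect
from pyathena.arrow.cursor import ArrowCursor

# Placeholder staging location and region; replace with your own values.
connection = connect(
    s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
    region_name="us-west-2",
)

# For environments with role assumption or high latency
cursor = connection.cursor(
    ArrowCursor,
    connect_timeout=10.0,  # Socket connection timeout (seconds)
    request_timeout=30.0,  # Request timeout (seconds)
)

cursor.execute("SELECT * FROM my_table")
table = cursor.as_arrow()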
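
The same two parameters are added to AsyncArrowCursor. A minimal sketch of the asynchronous variant follows, assuming the usual PyAthena pattern in which `execute()` returns a query id and a future. The wrapper function `fetch_table_async` and its arguments are hypothetical, and the timeout values are illustrative rather than recommended.

```python
def fetch_table_async(s3_staging_dir: str, region_name: str, sql: str):
    """Run `sql` through AsyncArrowCursor with relaxed S3 timeouts."""
    # Imported lazily: requires PyAthena with the arrow extra installed.
    from pyathena import connect
    from pyathena.arrow.async_cursor import AsyncArrowCursor

    cursor = connect(
        s3_staging_dir=s3_staging_dir,
        region_name=region_name,
        cursor_class=AsyncArrowCursor,
    ).cursor(connect_timeout=10.0, request_timeout=30.0)

    # execute() returns immediately with a query id and a Future.
    query_id, future = cursor.execute(sql)
    result_set = future.result()  # blocks until the query has finished
    return result_set.as_arrow()
```

Calling it needs valid AWS credentials and a real staging bucket, for example `fetch_table_async("s3://YOUR_S3_BUCKET/path/to/", "us-west-2", "SELECT * FROM my_table")`.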

Default Behavior

When timeout parameters are not specified (None), PyArrow uses AWS SDK defaults:

  • connect_timeout: ~1 second
  • request_timeout: ~3 seconds
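
The fallback rule can be written out as a small helper. This is only an illustration of the semantics described above: `effective_timeouts` is a hypothetical function, and the two constants are the approximate figures quoted in this PR, not values read from the AWS SDK.

```python
from typing import Optional, Tuple, Union

Seconds = Union[int, float]

# Approximate AWS SDK defaults as quoted in this PR; indicative only.
APPROX_SDK_CONNECT_TIMEOUT = 1.0
APPROX_SDK_REQUEST_TIMEOUT = 3.0


def effective_timeouts(
    connect_timeout: Optional[Seconds] = None,
    request_timeout: Optional[Seconds] = None,
) -> Tuple[float, float]:
    """Return the (connect, request) timeouts that would roughly apply."""
    # None means "not configured", so the SDK default is used for that setting.
    connect = APPROX_SDK_CONNECT_TIMEOUT if connect_timeout is None else float(connect_timeout)
    request = APPROX_SDK_REQUEST_TIMEOUT if request_timeout is None else float(request_timeout)
    return connect, request
```

Each setting falls back independently, so raising only `request_timeout` leaves the connect timeout at its default.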

Version Requirements

Important: This feature requires PyArrow >= 10.0.0, which added support for S3FileSystem timeout configuration (ARROW-16521). The minimum PyArrow version has been updated from 7.0.0 to 10.0.0 in:

  • pyproject.toml (both arrow extra and dev dependencies)
  • README.rst
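
For code that must run against an unknown environment, the version requirement can be checked up front. The sketch below uses only the standard library. `parse_release`, `supports_s3_timeouts`, and `installed_pyarrow_supports_timeouts` are hypothetical helpers, and the parsing is deliberately simple: it ignores pre-release and local version suffixes.

```python
import re
from typing import Tuple

MIN_PYARROW = (10, 0, 0)  # minimum version stated in this PR


def parse_release(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release, e.g. '10.0.1.dev5' -> (10, 0, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    # Pad short versions such as '10.0' so tuple comparison behaves as expected.
    return parts + (0,) * (len(MIN_PYARROW) - len(parts))


def supports_s3_timeouts(version: str) -> bool:
    """True if the given PyArrow version meets the minimum from this PR."""
    return parse_release(version) >= MIN_PYARROW


def installed_pyarrow_supports_timeouts() -> bool:
    """Check the installed PyArrow; raises PackageNotFoundError if it is absent."""
    from importlib.metadata import version

    return supports_s3_timeouts(version("pyarrow"))
```

Since the dependency floor is raised in pyproject.toml, a check like this is only useful in environments where the installed PyArrow is not guaranteed to match the declared dependencies.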

Testing

  • ✅ All code quality checks pass (ruff, mypy)
  • ✅ New tests for timeout parameters (int and float values)
  • ✅ All 38 existing ArrowCursor tests pass (backward compatibility confirmed)

Related Issues

Fixes #609

🤖 Generated with Claude Code

laughingman7743 and others added 2 commits October 18, 2025 13:01
This commit adds connect_timeout and request_timeout parameters to
ArrowCursor and AsyncArrowCursor, addressing timeout issues when using
role assumption with STS or experiencing high latency to S3.

Changes:
- Add connect_timeout and request_timeout parameters to ArrowCursor.__init__
- Add connect_timeout and request_timeout parameters to AsyncArrowCursor.__init__
- Pass timeout parameters to AthenaArrowResultSet
- Update AthenaArrowResultSet to configure PyArrow S3FileSystem with timeout values
- Add comprehensive docstrings explaining timeout parameters and use cases
- Add tests for both int and float timeout values

The default timeout values are None, which uses AWS SDK defaults
(typically 1s for connect, 3s for request). Users experiencing timeout
errors can now increase these values as needed.

Fixes #609

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

- Update PyArrow minimum version from 7.0.0 to 10.0.0
  (timeout parameters were added in Arrow 10.0.0 via ARROW-16521)
- Add "S3 Timeout Configuration" section to Arrow documentation
- Update README.rst with new PyArrow version requirement
- Document timeout parameters usage examples for both ArrowCursor and AsyncArrowCursor

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

laughingman7743 marked this pull request as ready for review October 18, 2025 04:11
laughingman7743 merged commit 79f7e5d into master Oct 18, 2025
5 checks passed
laughingman7743 deleted the fix-issue-609-arrow-timeout-parameters branch October 18, 2025 04:29
laughingman7743 added a commit that referenced this pull request Oct 18, 2025
Update uv.lock to reflect the PyArrow version requirement change
from >=7.0.0 to >=10.0.0 introduced in PR #610.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Slow authentication with arrow cursor results in NETWORK_CONNECTION during HeadObject operation
