forked from cloudera/impyla
Sync with Cloudera Impyla #2
Draft
thyarles
wants to merge
52
commits into
smartlab-br:master
from
cloudera:master
…515) that are applicable to username/password auth and JWT auth are not mixed together on the same call to the connect method. These additional checks prevent confusion about which authentication method is actually used for the connection. New tests were added to cover the new checks.
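The kind of check described above can be sketched roughly as follows. This is only an illustration: the function name `check_auth_arguments` and the parameter names `user`, `password`, and `jwt` are assumptions made for the example, not the code added by the commit.

```python
def check_auth_arguments(user=None, password=None, jwt=None):
    """Hypothetical validation: refuse to mix username/password and JWT auth."""
    uses_password_auth = user is not None or password is not None
    uses_jwt_auth = jwt is not None
    if uses_password_auth and uses_jwt_auth:
        # Failing early avoids ambiguity about which method is actually used.
        raise ValueError(
            "'user'/'password' cannot be combined with 'jwt'; "
            "choose one authentication method"
        )
    if uses_jwt_auth:
        return "jwt"
    if uses_password_auth:
        return "password"
    return "none"
```

Raising at call time, rather than silently preferring one method, is what keeps the caller from guessing which credentials were sent.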
Impyla gets cookies from an HTTPMessage object formed from a
response to an HTTP message. The format of cookies in the message
differs across Python versions. In Python 2 the HTTPMessage is a
mimetools.Message object, and the Set-Cookie values all appear in a
single header, separated by newlines. In Python 3 the HTTPMessage is an
email.message.Message, and the Set-Cookie values appear as duplicate
headers.
Add platform dependent code to get_all_matching_cookies() that loads
cookies from all the Set-Cookie headers.
TESTING:
Changed test_get_all_matching_cookies() to build the HTTPMessage
using a new utility method that creates Set-Cookie headers in
the appropriate format for the platform.
I hand tested with a proxy that inserted 3 cookies into http
responses. I added the 3 cookie names to the list of default
cookies. I ran TestHttpConnect.test_simple_connect() connecting
to Impala through the proxy and verified with the debugger that
the cookies were returned correctly from
get_all_matching_cookies() in both python2 and python3.
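A minimal Python 3 sketch of the idea, assuming the headers arrive as an `email.message.Message` with duplicate Set-Cookie headers. The helper name `matching_cookies` is hypothetical and this is not the body of get_all_matching_cookies(); the Python 2 branch (one newline-separated header) is omitted.

```python
from email.message import Message
from http.cookies import SimpleCookie


def matching_cookies(headers, wanted_names):
    """Collect wanted cookies from every Set-Cookie header (Python 3 layout)."""
    found = {}
    # get_all() returns one entry per duplicate header, or None if absent.
    for raw_value in headers.get_all("Set-Cookie") or []:
        jar = SimpleCookie()
        jar.load(raw_value)
        for name, morsel in jar.items():
            if name in wanted_names:
                found[name] = morsel.value
    return found


# Build a message with three Set-Cookie headers, as a proxy might produce.
message = Message()
message["Set-Cookie"] = "first=1; Path=/"
message["Set-Cookie"] = "second=2; Path=/"
message["Set-Cookie"] = "third=3"
cookies = matching_cookies(message, {"first", "third"})
```

Reading only the first Set-Cookie header would miss `third` here, which is the failure mode the commit addresses.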
Co-authored-by: cravani <cravani@cloudera.com>
The current Usage section works well for Impala users but will fail for Hive users because of the `auth_mechanism` default value. This adds a comment targeted at Hive users so they can get started quickly too.
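As a rough illustration of what a Hive user would change, the sketch below builds the keyword arguments for `connect`. The helper name, the host name, the HiveServer2 port 10000, and the choice of `'PLAIN'` for an unsecured Hive connection are assumptions for the example; check the `connect` docstring for your deployment.

```python
def hive_connect_kwargs(host, port=10000, auth_mechanism="PLAIN"):
    """Hypothetical helper: connection arguments for an unsecured HiveServer2."""
    # The default auth_mechanism suits Impala; Hive typically needs it set.
    return {"host": host, "port": port, "auth_mechanism": auth_mechanism}


kwargs = hive_connect_kwargs("my.hive.host")

# With impyla installed and a reachable HiveServer2, this would be used as:
#   from impala.dbapi import connect
#   conn = connect(**kwargs)
```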
ImpalaService.thrift is updated to contain CloseImpalaOperation, which can be used to get the number of modified rows in DMLs. This is not just a copy: some parts of ImpalaService.thrift are not included, to avoid pulling in more Thrift files as dependencies. Also updated process_thrift.sh to work with current Impala env vars.
sqlalchemy 2 (now default on pip in Python 3) removed some functions used in tests. Updated these to work both with sqlalchemy 2.* and 1.* (>=1.2).
* Support Cursor.rowcount and close finished queries

With the current Impala server, rowcount support needs DMLs to be closed with CloseImpalaOperation(), as there is no simpler way to get the number of modified rows. See https://issues.apache.org/jira/browse/IMPALA-12647 for alternatives.

This change adds the option close_finished_queries for cursors, with default True. Setting it to False brings back the old behavior.

If queries are closed after finishing, calling the get_log RPC is no longer possible. If close_finished_queries is true, the logs are fetched and stored before closing the query, so that get_log can return the saved results. Generally get_log shouldn't be too expensive an RPC.

Another potential side effect is that get_profile may fail, as Impala can discard the runtime profile after the query is closed (see Impala flag query_log_size).

Despite the above side effects, closing the queries seems a better default behavior, as it helps avoid queries hanging in the "waiting to be closed" state and provides a reliable rowcount. This is also consistent with the way impala-shell works.

Testing:
- rowcount already had good coverage in DBAPI2 compliance tests (e.g. test_mixedfetch)
- new tests were added for some missing rowcount cases and for getting the warning/error log for closed queries

* Fix review comments
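The ordering described in this commit (fetch the log, then close, then serve the saved log) can be modelled with a toy example. `FakeServer` and `SketchCursor` are hypothetical stand-ins written for this illustration and do not reflect impyla's actual classes; only the option name close_finished_queries comes from the commit message.

```python
class FakeServer:
    """Hypothetical server: the log is unavailable once the query is closed."""

    def __init__(self, modified_rows, log_text):
        self._modified_rows = modified_rows
        self._log_text = log_text
        self.closed = False

    def get_log(self):
        if self.closed:
            raise RuntimeError("query already closed; log is gone")
        return self._log_text

    def close_operation(self):
        # Closing is what reveals the number of modified rows.
        self.closed = True
        return self._modified_rows


class SketchCursor:
    def __init__(self, server, close_finished_queries=True):
        self._server = server
        self._close_finished_queries = close_finished_queries
        self._saved_log = None
        self.rowcount = -1  # DB-API value for "not determined"

    def finish_dml(self):
        if self._close_finished_queries:
            # Save the log first: after closing, the RPC would fail.
            self._saved_log = self._server.get_log()
            self.rowcount = self._server.close_operation()

    def get_log(self):
        if self._saved_log is not None:
            return self._saved_log
        return self._server.get_log()
```

With the flag off, the query stays open and `rowcount` stays at -1, which mirrors the old behavior described above.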
The old version used deprecated functions that were removed in Python 3.12. The change only contains code generated by: versioneer install
Co-authored-by: David Hulsman <david.hulsman@tennet.eu>
thyarles
commented
Mar 25, 2024
Member
Author
thyarles
left a comment
LGTM.
This function is called for every query during normal execution, making this info level too verbose.
* Add text() wrapper for metadata queries. Remove tablename from retrieve columnname results.
* Update sqlalchemy.py: remove tablename from get_columns result.
* replace 'r' in re.sub argument
* Avoid retrying non-idempotent RPCs in binary connections (#549)

See #549 for the detailed analysis of the issue. The fix works similarly to the existing solution for http connections:
- each RPC knows whether it is idempotent
- if the error comes from establishing the connection, then retry
- if the error comes from executing the RPC, only retry if the RPC is idempotent

A test is added that relies on slow metadata handling in the Impala cluster to trigger timeouts. It would be nice to add wider and more reliable tests in the future, similar to the http tests in test_hs2_fault_injection.py.

* Fix review comments
* Fix review comment
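The retry rule listed in this commit can be sketched as below. The exception classes, the function name `call_with_retries`, and the `max_tries` parameter are hypothetical names chosen for the example, not impyla's internals.

```python
class ConnectError(Exception):
    """Hypothetical: failure while establishing the connection."""


class RpcError(Exception):
    """Hypothetical: failure while executing the RPC itself."""


def call_with_retries(connect, rpc, idempotent, max_tries=3):
    last_error = None
    for _ in range(max_tries):
        try:
            transport = connect()
        except ConnectError as error:
            # Nothing reached the server yet, so retrying is always safe.
            last_error = error
            continue
        try:
            return rpc(transport)
        except RpcError as error:
            last_error = error
            if not idempotent:
                # The server may have executed the call; do not repeat it.
                raise
    raise last_error
```

The distinction matters for calls such as submitting a DML statement, where a blind retry after a timeout could run the statement twice.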
The goal is to support "long poll" (IMPALA-13294). When the query option long_polling_time_ms is set, the Impala server will wait in GetOperationStatus for this time (or until the query status changes). This allows detecting earlier that a query has finished without making GetOperationStatus RPCs more frequent. If long_polling_time_ms is not used then the effect should be minor: GetOperationStatus is a quick RPC, so the time it takes should mainly come from network delay. _get_sleep_interval() is not changed (min 0.01s, max 1s) to avoid regressions in existing use cases. It could be useful to override this in a later patch based on the value of long_polling_time_ms.
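A simplified sketch of a status-polling loop with a sleep interval clamped to the bounds mentioned above (0.01s to 1s). The growth formula, the function names, and the `"FINISHED"` status string are assumptions for illustration; this is not the code of _get_sleep_interval().

```python
import time

MIN_SLEEP_S = 0.01  # lower bound quoted in the commit message
MAX_SLEEP_S = 1.0   # upper bound quoted in the commit message


def sleep_interval(elapsed_s):
    """Hypothetical backoff: grow with elapsed time, clamped to the bounds."""
    return max(MIN_SLEEP_S, min(MAX_SLEEP_S, elapsed_s / 10.0))


def wait_until_finished(get_status, sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it reports FINISHED; return the poll count."""
    start = clock()
    polls = 0
    while True:
        polls += 1
        # With long polling enabled, this call itself may block server-side.
        if get_status() == "FINISHED":
            return polls
        sleep(sleep_interval(clock() - start))
```

If the server already waits inside the status call, the client-side sleep adds little, which is why the commit leaves the interval unchanged for now.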
Supported Python versions are also updated in setup.py.
The issue was introduced in #542. Caught by Impala's LdapImpylaHttpTest.
bitarray 3.0 removed Python 2.7 support, which caused the sdist build for Python 2.7 to fail. Besides Python 2.7 support, several functions were removed in 3.0, so it seems safer to use 2.* at the moment. Impala pins the dependency to 2.3; I preferred to be more flexible in Impyla.
In a modern Impala deployment the hs2-http protocol is used in a system where http messages pass through one or more http proxies. Some of these proxies add their own http message headers to messages as they are forwarded. It would be useful to test Impala with some of the message headers that are added by http proxies. In particular, the case where there are multiple http headers with the same name is hard to simulate with clients such as Impyla or Impala Shell. This is partly because these clients store http headers in a Python dict, which does not allow duplicate keys.

Extend the Impyla connect() method to add a 'get_user_custom_headers_func' parameter. This specifies a function that is called as http message headers are being written. The function should return a list of tuples, each tuple containing a key-value pair. This allows duplicate headers to be set on outgoing messages.

TESTING
Add test code which implements a reverse http proxy, which allows test code to access the outgoing http message headers generated by Impyla. Add a test using this proxy which validates the new feature. The new test code requires a new Python package, 'requests'. I think there is no way to add this requirement automatically, so I added a note to README.md.
All tests pass on Python2 and Python3.
Fix TestHS2FaultInjection to use setup_method() and teardown_method() so as to work in Python3.
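A small example of the kind of function this parameter expects, and of why a dict cannot express the same thing. The header names and values are made up for illustration; only the parameter name 'get_user_custom_headers_func' and the list-of-tuples return shape come from the commit message.

```python
def get_user_custom_headers():
    """Return outgoing headers as (name, value) tuples; duplicates allowed."""
    return [
        ("X-Forwarded-For", "10.0.0.1"),
        ("X-Forwarded-For", "10.0.0.2"),  # same name twice, as a proxy might add
        ("X-Request-Id", "abc123"),
    ]


headers = get_user_custom_headers()

# Converting to a dict silently drops one of the duplicate headers.
as_dict = dict(headers)

# Per the commit message, the function would be handed to connect(), e.g.:
#   connect(..., get_user_custom_headers_func=get_user_custom_headers)
```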
Also has improvements to test_http_connect.py#test_duplicate_headers:
- added requests as dependency to tox.ini
- fixed error message during http_proxy_server cleanup
Pass through retries. Adapted tests to check whether downstream rpc operations use configured retry amounts.

Fixes #563

---------

Co-authored-by: Paul Mayer <dipaulmayer@gmail.com>
Co-authored-by: Mayer, Paul (mayerpa) <paul.mayer@ecb.europa.eu>
…tial encodings (#562) Co-authored-by: Paul Mayer <dipaulmayer@gmail.com>
…ected argument 'info_cache' (#568)

* Fixed the incompatibility of df.to_sql() with impala.

---------

Co-authored-by: Jatin Rathour <152-Jatin.rathour@users.noreply.gitlab.example.com>
Co-authored-by: Csaba Ringhofer <csringhofer@cloudera.com>
- add brackets to ipv6 addresses in urls (e.g. https://[::1]:28000)
- fix tests to work with IMPYLA_TEST_HOST=::1

Tested with WIP patch for IPv6 support in Impala: https://gerrit.cloudera.org/#/c/22527/
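The bracketing rule can be sketched with the standard `ipaddress` module. The helper names `host_for_url` and `build_url` are hypothetical and not taken from the patch.

```python
import ipaddress


def host_for_url(host):
    """Wrap IPv6 literals in brackets; leave hostnames and IPv4 untouched."""
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not an IP literal at all (e.g. a DNS name).
        is_ipv6 = False
    return "[{}]".format(host) if is_ipv6 else host


def build_url(scheme, host, port):
    return "{}://{}:{}".format(scheme, host_for_url(host), port)
```

Without the brackets, the colons inside an IPv6 literal are indistinguishable from the host/port separator.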
- remove versioneer and set version manually
- add python 3.12/3.13 to tox.ini

versioneer has no version which supports both Python 2.7 and >=3.12. Though python2.7 is EOL, Apache Impala tests still use it, making its removal problematic (see #532, #533). Once Python 2.7 support is removed, versioneer or some other similar tool can be added again.

Testing:
- ran tests with supported Python versions (including 3.12 and 3.13)
This patch updates impala/thrift/ to follow the latest definitions from Impala 4.5.0. ImpalaService.thrift was hand-edited to exclude thrift files that are unrelated to the query profile, such as Frontend.thrift, BackendGflags.thrift, and Query.thrift. Updated DEVELOP.md and process_thrift.sh to mention this issue. Use IMPALA_THRIFT_PY_VERSION instead of IMPALA_THRIFT_CPP_VERSION. Both point to thrift-0.16.0.

Testing:
- Update test_get_log to validate that the attributes exist.
- Run the test with the following command: tox -- -ktest_get_log
This fixes the following warning: ADeprecationWarning: The dbapi() classmethod on dialect classes has been renamed to import_dbapi(). The issue was mentioned in #214
* Update build_summary_table function

This patch updates build_summary_table to match the same function in impala-shell.
https://github.com/apache/impala/blob/a07bf84/shell/impala_client.py#L113

Testing: ran the following command, which passes:
```
tox -- -ktest_build_summary_table
```

* Copy exec_summary.py from apache/impala@e73e2d4
* Remove one more cur.close_operation()
Winkerberos is an alternative to the kerberos package on Windows and has the same API, so it can be used as a drop-in replacement. Based on another PR: #504. The difference is that the current patch prefers kerberos if both are available, to avoid breaking existing workflows. setup.py is also not modified. To use winkerberos, it has to be installed independently of impyla, and impyla should be installed without the [kerberos] extra.
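The preference order described here (kerberos first, winkerberos as a fallback) could look roughly like this. The function name `load_kerberos_module` and the injectable `import_module` argument are assumptions added so the sketch can be exercised without either package installed.

```python
import importlib


def load_kerberos_module(import_module=importlib.import_module):
    """Prefer 'kerberos'; fall back to 'winkerberos' when it is missing."""
    for module_name in ("kerberos", "winkerberos"):
        try:
            return import_module(module_name)
        except ImportError:
            continue
    raise ImportError("neither 'kerberos' nor 'winkerberos' is installed")
```

Trying kerberos first means environments that already have it installed keep behaving exactly as before.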
test_build_summary_table failed depending on which other tests ran in the same session, because it used the same session-level cursor, whose query options could have been altered by other tests. The patch switches to a function-level cursor to isolate tests and creates a new fixture, session_cur, for session/module-level fixtures that need a cursor. Also fixes test_pandas_dataframe_to_sql on Python 3.7 by changing tox to use sqlalchemy 1.*.
Also removed Python 3.6 (EOL since 2021) from the supported versions. It is still likely to work but testing it is a nuisance.
The validation tried to install impyla in an environment with very old setuptools (18.0.1) that didn't support the environment markers (e.g. 'bitarray<3; python_version < "3"') added in #588. Installing fresh setuptools didn't work with easy_install, so moved to using pip in the py2.7 env too.
No description provided.