@thyarles
Member

No description provided.

jasonmfehr and others added 16 commits June 7, 2023 21:31
…515)

that are applicable to username/password auth and JWT auth are not
mixed together on the same call to the connect method.

These additional checks prevent confusion about which authentication
method is actually used for the connection.

New tests were added to cover the new checks.
Impyla gets cookies from an HTTPMessage object formed from a
response to an HTTP message. The format of cookies in the message
differs across Python versions. In Python 2 the HTTPMessage is a
mimetools.Message object, and the Set-Cookie values all appear in a
single header, separated by newlines. In Python 3 the HTTPMessage is an
email.message.Message, and the Set-Cookie values appear as duplicate
headers.

Add platform dependent code to get_all_matching_cookies() that loads
cookies from all the Set-Cookie headers.

TESTING:
    Changed test_get_all_matching_cookies() to build the HTTPMessage
    using a new utility method that creates Set-Cookie headers in
    the appropriate format for the platform.

    I hand tested with a proxy that inserted 3 cookies into http
    responses. I added the 3 cookie names to the list of default
    cookies. I ran TestHttpConnect.test_simple_connect() connecting
    to Impala through the proxy and verified with the debugger that
    the cookies were returned correctly from
    get_all_matching_cookies() in both python2 and python3.
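The Python 3 side of the description above can be sketched as follows. This is a minimal illustration using only the standard library, not Impyla's actual get_all_matching_cookies(); the helper name load_all_set_cookies and the cookie values are made up for the example.

```python
from email.message import Message
from http.cookies import SimpleCookie


def load_all_set_cookies(headers):
    """Merge every Set-Cookie header of a Python 3 response into one jar.

    Hypothetical helper for illustration; not part of Impyla.
    """
    jar = SimpleCookie()
    # get_all() returns every duplicate header (or None if there are none);
    # get() would only return the first one.
    for value in headers.get_all("Set-Cookie") or []:
        jar.load(value)
    return jar


# Build a message with duplicate Set-Cookie headers, which is how
# email.message.Message presents them in Python 3.
msg = Message()
msg["Set-Cookie"] = "c1=v1; Path=/"
msg["Set-Cookie"] = "c2=v2; Path=/"

cookies = load_all_set_cookies(msg)
names = sorted(cookies.keys())
```

In Python 2 the same values would arrive in a single newline-separated header, which is why the real code needs a platform-dependent branch.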
Co-authored-by: cravani <cravani@cloudera.com>
The current Usage section works well for Impala users but fails for Hive users because of the `auth_mechanism` default value.
This adds a comment aimed at Hive users so they can get started quickly too.
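For illustration, the Hive case might look like the sketch below. The host name is a placeholder, port 10000 is the usual HiveServer2 default, and the helper hive_connect_kwargs is hypothetical; the connect() call is left commented out because it needs impyla installed and a reachable server.

```python
def hive_connect_kwargs(host, port=10000):
    """Build connect() arguments for an unsecured Hive connection.

    Hypothetical helper; the point is the explicit auth_mechanism,
    since the default value targets Impala.
    """
    return {"host": host, "port": port, "auth_mechanism": "PLAIN"}


kwargs = hive_connect_kwargs("my.host.com")

# With impyla installed and a HiveServer2 instance reachable:
# from impala.dbapi import connect
# conn = connect(**kwargs)
```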
ImpalaService.thrift is updated to contain CloseImpalaOperation,
which can be used to get the number of modified rows in DMLs.
This is not just a copy: some parts of ImpalaService.thrift are
not included to avoid pulling in more Thrift files as dependencies.

Also updated process_thrift.sh to work with current Impala env vars.
sqlalchemy 2 (now default on pip in Python 3) removed some
functions used in tests. Updated these to work both with
sqlalchemy 2.* and 1.* (>=1.2).
* Support Cursor.rowcount and close finished queries

With the current Impala server, rowcount support requires DMLs to be
closed with CloseImpalaOperation(), as there is no simpler way
to get the number of modified rows.
See https://issues.apache.org/jira/browse/IMPALA-12647 for
alternatives.

This change adds the option close_finished_queries for cursors,
with default True. Setting it to False brings back the old
behavior.

If queries are closed as soon as they finish, calling the get_log
RPC afterwards is no longer possible. If close_finished_queries is
true, the logs are fetched and stored before closing the query so
that get_log can return the saved results. Generally
get_log shouldn't be too expensive an RPC.

Another potential side-effect is that get_profile may fail as
Impala can discard the runtime profile after the query is
closed (see Impala flag query_log_size).

Despite the above side effects, closing the queries seems a better
default behavior, as it helps avoid queries hanging in the
"waiting to be closed" state and provides a reliable rowcount. This
is also consistent with the way impala-shell works.

Testing:
- rowcount already had good coverage in DBAPI2 compliance tests
  (e.g. test_mixedfetch)
- new tests were added for some missing rowcount cases and for
  getting warning/error log for closed queries
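The ordering described above (save the log, then close, then read the row count from the close) can be modelled with a toy example. Everything here is a stand-in: FakeOperation and SketchCursor are hypothetical classes, not Impyla's cursor or the HS2 client.

```python
class FakeOperation(object):
    """Stand-in for a server-side query handle (hypothetical)."""

    def __init__(self, log, modified_rows):
        self._log = log
        self._modified_rows = modified_rows
        self.closed = False

    def get_log(self):
        # Models the server refusing get_log once the query is closed.
        if self.closed:
            raise RuntimeError("operation already closed")
        return self._log

    def close(self):
        # Models CloseImpalaOperation(): closing reports the row count.
        self.closed = True
        return self._modified_rows


class SketchCursor(object):
    """Toy cursor showing the effect of close_finished_queries."""

    def __init__(self, close_finished_queries=True):
        self.close_finished_queries = close_finished_queries
        self.rowcount = -1
        self._op = None
        self._saved_log = None

    def finish(self, op):
        """Called once the query has finished executing."""
        self._op = op
        if self.close_finished_queries:
            # Fetch the log first: it is unavailable after closing.
            self._saved_log = op.get_log()
            self.rowcount = op.close()

    def get_log(self):
        if self._saved_log is not None:
            return self._saved_log
        return self._op.get_log()


op = FakeOperation("WARNING: sample log", 3)
cur = SketchCursor()
cur.finish(op)
```

With close_finished_queries=False the toy cursor leaves the operation open and rowcount stays at -1, mirroring the old behavior.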

* Fix review comments
The old version used deprecated functions that were
removed in Python 3.12.

The change only contains code generated by:
versioneer install
Python 3.12 removed the deprecated cert_file and key_file
arguments from http_client.HTTPSConnection. These should
always be empty in Impyla as the server is never verified
in https connections (see #362).
* Add tox.ini to help testing with multiple python versions

* Revert "Update versioneer to 0.29 (needed for Python 3.12) (#532)"

This reverts commit b98ffef.
Co-authored-by: David Hulsman <david.hulsman@tennet.eu>
@thyarles thyarles requested a review from danielamguerra March 25, 2024 21:51
Member Author

@thyarles thyarles left a comment

LGTM.

csringhofer and others added 3 commits April 15, 2024 17:41
This function is called for every query during normal execution, making
this info level too verbose.
* Add text() wrapper for metadata queries.

Remove tablename from retrieve columnname results.

* Update sqlalchemy.py

remove tablename from get_columns result.

* replace 'r' in re.sub argument
@thyarles thyarles marked this pull request as draft April 16, 2024 16:10
@thyarles thyarles added the wontfix This will not be worked on label Apr 16, 2024
WWakker and others added 7 commits August 7, 2024 19:50
Adds wildcard ('*') support to the `http_cookie_names` connect property
to preserve all cookies returned by the server. Preserves prior behavior
for any other value of `http_cookie_names`.
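A minimal sketch of the wildcard rule, assuming cookies are held in a plain dict and that non-wildcard values are either a list of names or a comma-separated string. The function select_cookies is hypothetical and is not Impyla's implementation.

```python
def select_cookies(http_cookie_names, server_cookies):
    """Return the cookies to preserve from a server response.

    '*' keeps every cookie; any other value keeps only the named ones.
    Hypothetical helper for illustration.
    """
    if http_cookie_names == "*":
        return dict(server_cookies)
    if isinstance(http_cookie_names, str):
        # Assumption: a string is a comma-separated list of names.
        wanted = [n.strip() for n in http_cookie_names.split(",")]
    else:
        wanted = list(http_cookie_names or [])
    return {k: v for k, v in server_cookies.items() if k in wanted}


received = {"JSESSIONID": "abc", "lb": "node1", "other": "x"}
kept_all = select_cookies("*", received)
kept_some = select_cookies(["JSESSIONID", "lb"], received)
```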
* Avoid retrying non-idempotent RPCs in binary connections (#549)

See #549 for the detailed analysis of the issue.

The fix works similarly to the existing solution for http connections:
- each RPC knows whether it is idempotent
- if the error comes from establishing the connection, then retry
- if the error comes from executing the RPC, only retry if the RPC
  is idempotent
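The three rules above can be sketched as a small retry loop. The exception classes and the call_with_retries helper are hypothetical names chosen for the example; Impyla's real code is structured differently.

```python
class ConnectError(Exception):
    """Raised when establishing the connection fails (hypothetical)."""


class RpcError(Exception):
    """Raised when an RPC fails after the connection is open (hypothetical)."""


def call_with_retries(open_connection, rpc, idempotent, retries=3):
    """Run rpc(conn), retrying only when doing so cannot repeat side effects."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for _ in range(retries):
        try:
            conn = open_connection()
        except ConnectError as e:
            # Nothing was sent to the server yet, so retrying is always safe.
            last_error = e
            continue
        try:
            return rpc(conn)
        except RpcError as e:
            # The server may already have executed the request:
            # only repeat it if the RPC is idempotent.
            if not idempotent:
                raise
            last_error = e
    raise last_error
```

A non-idempotent RPC therefore runs at most once per successful connection, while connection failures are retried regardless of the RPC type.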

A test is added that relies on slow metadata handling in the
Impala cluster to trigger timeouts. It would be nice to add wider
and more reliable tests in the future, similar to the http tests
in test_hs2_fault_injection.py

* Fix review comments

* Fix review comment
The goal is to support "long poll" (IMPALA-13294). When query option
long_polling_time_ms is set, the impala server will wait in
GetOperationStatus for this time (or until the query status changes).
This allows detecting earlier that a query has finished without making
GetOperationStatus RPCs more frequent.

If long_polling_time_ms is not used then the effect should be minor -
GetOperationStatus is a quick RPC so the time it takes should mainly
come from network delay.

_get_sleep_interval() is not changed (min 0.01s, max 1s) to avoid
regression in existing use cases. It could be useful to override
this in a later patch based on the value of long_polling_time_ms.
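As a rough illustration of the polling loop discussed above: only the 0.01s and 1s bounds come from the text, while the schedule between them (10% of elapsed time) and the function names are assumptions made for this sketch.

```python
import time

MIN_SLEEP_S = 0.01  # lower bound mentioned in the text
MAX_SLEEP_S = 1.0   # upper bound mentioned in the text


def get_sleep_interval(elapsed_s):
    """Sleep longer the longer the query has run, clamped to the bounds.

    The 10% schedule is an assumption, not Impyla's actual step function.
    """
    return max(MIN_SLEEP_S, min(MAX_SLEEP_S, elapsed_s * 0.1))


def wait_until_finished(get_status, clock=time.time, sleep=time.sleep):
    """Poll get_status() until it reports FINISHED; return the poll count.

    With long polling enabled, get_status() itself would block on the
    server for up to long_polling_time_ms, so fewer polls are needed.
    """
    start = clock()
    polls = 0
    while True:
        polls += 1
        if get_status() == "FINISHED":
            return polls
        sleep(get_sleep_interval(clock() - start))
```

The clock and sleep parameters exist only so the loop can be exercised without real waiting.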
Supported Python versions are also updated in setup.py.
The issue was introduced in #542.
Caught by Impala's LdapImpylaHttpTest.
csringhofer and others added 26 commits September 7, 2024 10:21
bitarray 3.0 removed Python 2.7 support, which led to failing to
build sdist for Python 2.7. Besides Python 2.7 support, several
functions were removed in 3.0, so it seems safer to use 2.* at
the moment. Impala pins the dependency to 2.3 - I preferred to
be more flexible in Impyla.
In a modern Impala deployment hs2-http protocol is used in a system
where http messages pass through one or more http proxies. Some of
these proxies add their own http message headers to messages as they
are forwarded. It would be useful to test Impala with some of the
message headers that are added by http proxies. In particular the case
where there are multiple http headers with the same name is hard to
simulate with clients such as Impyla or Impala Shell. This is partly
because these clients store http headers in a Python dict which does
not allow duplicate keys.

Extend the Impyla connect() method to add
a 'get_user_custom_headers_func' parameter. This specifies a function
that is called as http message headers are being written. The function
should return a list of tuples, each tuple containing a key-value pair.
This allows duplicate headers to be set on outgoing messages.
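A small sketch of such a callback, showing why a list of tuples is needed. The header names and values are made up for the example, and the connect() call is commented out because it needs impyla and a running server.

```python
def get_user_custom_headers():
    """Return extra HTTP headers as (name, value) tuples.

    Example values only; duplicate names are intentional.
    """
    return [
        ("X-Forwarded-For", "10.0.0.1"),
        ("X-Forwarded-For", "10.0.0.2"),
        ("X-Custom", "demo"),
    ]


headers = get_user_custom_headers()

# A dict cannot represent the same data: the second X-Forwarded-For
# value silently replaces the first.
as_dict = dict(headers)

# With impyla installed and an hs2-http endpoint available:
# from impala.dbapi import connect
# conn = connect(host="my.host.com", use_http_transport=True,
#                get_user_custom_headers_func=get_user_custom_headers)
```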

TESTING
Add test code which implements a reverse http proxy, which allows test
code to access the outgoing http message headers generated by Impyla.
Add a test using this proxy which validates the new feature.

The new test code requires a new python package 'requests'. I think
there is no way to add this requirement automatically so I added a
note to README.md

All tests pass on Python2 and Python3.

Fix TestHS2FaultInjection to use setup_method() and teardown_method()
so that it works in Python3
Also has improvements to test_http_connect.py#test_duplicate_headers:
- added requests as dependency to tox.ini
- fixed error message during http_proxy_server cleanup
Pass through retries. Adapted tests to check whether downstream
rpc operations use configured retry amounts.

Fixes #563

---------

Co-authored-by: Paul Mayer <dipaulmayer@gmail.com>
Co-authored-by: Mayer, Paul (mayerpa) <paul.mayer@ecb.europa.eu>
…tial encodings (#562)


Co-authored-by: Paul Mayer <dipaulmayer@gmail.com>
…ected argument 'info_cache' (#568)

* Fixed the incompatibility of df.to_sql() with impala.

---------

Co-authored-by: Jatin Rathour <152-Jatin.rathour@users.noreply.gitlab.example.com>
Co-authored-by: Csaba Ringhofer <csringhofer@cloudera.com>
- add brackets to ipv6 addresses in urls (e.g. https://[::1]:28000)
- fix tests to work with IMPYLA_TEST_HOST=::1

Tested with WIP patch for IPv6 support in Impala:
https://gerrit.cloudera.org/#/c/22527/
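The bracket rule can be sketched as below. Both helpers are hypothetical and only illustrate the URL formatting; they are not the functions changed in the patch.

```python
def host_for_url(host):
    """Wrap IPv6 literals in brackets so the port separator stays unambiguous."""
    # Assumption: any host containing ':' is an IPv6 literal.
    if ":" in host and not host.startswith("["):
        return "[" + host + "]"
    return host


def build_url(scheme, host, port, path=""):
    """Assemble scheme://host:port[/path], bracketing IPv6 hosts."""
    base = "%s://%s:%d" % (scheme, host_for_url(host), port)
    return base + "/" + path.lstrip("/") if path else base


ipv6_url = build_url("https", "::1", 28000)
ipv4_url = build_url("http", "localhost", 28000, "cliservice")
```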
- remove versioneer and set version manually
- add python 3.12/3.13 to tox.ini

versioneer has no version which supports both Python 2.7 and >=3.12.
Though Python 2.7 is EOL, Apache Impala tests still use it, making its
removal problematic (see #532, #533). Once Python 2.7 support is
removed versioneer or some other similar tool can be added again.

Testing:
- ran tests with supported Python versions (including 3.12 and 3.13)
This patch updates impala/thrift/ to follow the latest definition from
Impala 4.5.0.

Hand-edited ImpalaService.thrift to exclude thrift files that
are unrelated to the query profile, such as Frontend.thrift,
BackendGflags.thrift, and Query.thrift. Updated DEVELOP.md and
process_thrift.sh to mention this issue. Use IMPALA_THRIFT_PY_VERSION
instead of IMPALA_THRIFT_CPP_VERSION. Both point to thrift-0.16.0.

Testing:
- Update test_get_log to validate attributes existence.
- Run test with following command:
  tox -- -ktest_get_log
This avoids a warning in sqlalchemy. It is not clear to me whether
it could be enabled; that would need a deeper dive into sqlalchemy.
Other dialects I checked also set it to False.
Python 2.7 needs bitarray < 3, but on Python 3 bitarray 3.* can be used.
The bitarray functionality used in impyla is minimal and didn't change
with the major version bump.
This fixes the following warning:
SADeprecationWarning: The dbapi() classmethod on dialect classes has been renamed to import_dbapi().

The issue was mentioned in #214
* Update build_summary_table function

This patch updates build_summary_table to match the same function in
impala-shell.
https://github.com/apache/impala/blob/a07bf84/shell/impala_client.py#L113

Testing:
Ran the following command, which passes:
```
tox -- -ktest_build_summary_table
```

* Copy exec_summary.py from apache/impala@e73e2d4

* Remove one more cur.close_operation()
Winkerberos is an alternative to the kerberos package on Windows and
has the same API, so it can be used as a drop-in replacement.

Based on another PR:
#504

The difference is that the current patch prefers kerberos if both are
available, to avoid breaking existing workflows. setup.py is
also not modified. To use winkerberos, it has to be installed
independently from impyla, and impyla should be installed without
the [kerberos] extra.
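The "prefer kerberos, fall back to winkerberos" order can be sketched with a generic import helper. The helper import_first_available is hypothetical and is not how Impyla itself is written.

```python
import importlib


def import_first_available(module_names):
    """Return the first module that imports cleanly, or None if none do."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# kerberos is listed first so existing setups keep using it;
# winkerberos is only picked when kerberos is not installed.
kerberos = import_first_available(["kerberos", "winkerberos"])
```

If neither package is installed, kerberos is None and the caller would need to report that Kerberos authentication is unavailable.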
test_build_summary_table failed depending on which other tests ran
in the same session, because it used the shared session-level cursor,
whose query options could have been altered by other tests.
The patch switches to a function-level cursor to isolate tests and
creates a new fixture session_cur to use in session/module-level
fixtures that need a cursor.

Also fixes test_pandas_dataframe_to_sql on Python 3.7 by changing
tox to use 1.* sqlalchemy
Also removed Python 3.6 (EOL since 2021) from the supported versions.
It is still likely to work but testing it is a nuisance.
The validation tried to install impyla in an environment with
very old setuptools (18.0.1) that didn't support the environment
markers (e.g. 'bitarray<3; python_version < "3"') added in #588.
Installing fresh setuptools didn't work with easy_install,
so moved to using pip in the py2.7 env too.