Redshift Profiler PR1: Redshift connection and credential support in database_manager#2305
Redshift Profiler PR1: Redshift connection and credential support in database_manager#2305ysmx-github wants to merge 4 commits intomainfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2305 +/- ##
==========================================
- Coverage 66.41% 66.23% -0.19%
==========================================
Files 99 99
Lines 9094 9130 +36
Branches 974 981 +7
==========================================
+ Hits 6040 6047 +7
- Misses 2878 2907 +29
Partials 176 176 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
✅ 145/145 passed, 8 flaky, 4 skipped, 41m57s total Flaky tests:
Running from acceptance #4001 |
84c0bda to
7b0e873
Compare
39cc6de to
84c0bda
Compare
030e2f1 to
1abc3cd
Compare
… .gitignore) Co-authored-by: Cursor <cursoragent@cursor.com>
1abc3cd to
b03149d
Compare
- Remove fetch_script (multi-statement); use single-query fetch only - Return empty FetchResult for DDL (cursor.description is None) so prepare steps succeed Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
small nits: but overall good to go.
main thing is https://github.com/databrickslabs/lakebridge/blob/main/docs/lakebridge/docs/dev/contributing.md#gpg-signing all commits need to signed can you fix that by following the steps and rewriting the commit history in git for this PR.
@asnare can you take a look,
What about https://docs.localstack.cloud/aws/services/redshift/ for testing?
| @@ -1,10 +1,10 @@ | |||
| # Databricks notebook source | |||
There was a problem hiding this comment.
this is not notebook.
|
|
||
| [tool.mypy] | ||
| exclude = ["tests/resources/.*"] | ||
| exclude = ["tests/resources/.*", "src/databricks/labs/lakebridge/resources/assessments/.*"] |
There was a problem hiding this comment.
| exclude = ["tests/resources/.*", "src/databricks/labs/lakebridge/resources/assessments/.*"] | |
| exclude = ["tests/resources/.*"] |
| # Azure SDK dependencies for linting resources/assessments folder | ||
| "azure-identity~=1.19.0", | ||
| "azure-monitor-query~=1.4.0", | ||
| "azure-synapse-artifacts~=0.20.0", |
There was a problem hiding this comment.
This is needed fo synapse assessment.
| "cryptography>=44.0.2,<46.1.0", | ||
| "pyodbc>=5.2,<5.4", | ||
| "SQLAlchemy~=2.0.40", | ||
| "psycopg2-binary~=2.9.10", |
There was a problem hiding this comment.
do we still need this?
| ignore-patterns = ["^\\.#"] | ||
|
|
||
| # Ignore files under tests/resources and resources/assessments | ||
| # Ignore files under tests/resources |
There was a problem hiding this comment.
even this should match main.
asnare
left a comment
There was a problem hiding this comment.
Aside from a few nits, some things I'm missing:
- For new dependencies, we need a note on the licensing terms. Often this is straightforward but for anything AWS-related I'd like to see it.
- It's unclear to me why this new source doesn't use
SQLAlchemylike the others? That's made the implementation more complex than otherwise, with accompanying duplicate code because it doesn't extend_BaseConnector. What makes Redshift special and different to the other sources we support? - We absolutely need some test coverage: we don't have the resources to manually test this, and without automated coverage of some sort the chances of this being inadvertently broken are high. I understand that for this integration testing is going to be difficult to set up, but at the very least there needs to be some unit coverage.
| class DatabaseConnector(contextlib.AbstractContextManager): | ||
| @abstractmethod | ||
| def _connect(self) -> Engine: | ||
| def _connect(self) -> Engine | RedshiftConnection: |
There was a problem hiding this comment.
This looks suspicious to me: why does the interface have to cater specifically for RedshiftConnection? Shouldn't RedshiftConnection conform to the existing contract?
There was a problem hiding this comment.
this is because there used to be bridge lib
sqlalchemy/sqlalchemy#11950 which is abandon.
This is compromise given the connector returns alchmey compliant api for fetch. which is self contained within the database_manager.py, but happy to hear alternatives.
| import redshift_connector # type: ignore[import-untyped] | ||
| from redshift_connector import Connection as RedshiftConnection # type: ignore[import-untyped] |
There was a problem hiding this comment.
For a given module it's normally best to either use one or the other…
| import redshift_connector # type: ignore[import-untyped] | |
| from redshift_connector import Connection as RedshiftConnection # type: ignore[import-untyped] | |
| import redshift_connector # type: ignore[import-untyped] |
To clarify a bit here:
Getting back to your original question about LocalStack:
|
Changes
This PR adds Redshift as a supported profiler assessment platform.
What does this PR do?
Relevant implementation details
Caveats/things to watch out for when reviewing:
Linked issues
Resolves #..
Functionality
databricks labs lakebridge ...Tests
tests/resources/assessments/pipeline_config_main_redshift.yml: pipeline config that runs that script.