Redshift Profiler PR1: Redshift connection and credential support in database_manager#2305

Open
ysmx-github wants to merge 4 commits into main from feature/redshift-pr1-database-manager

Conversation

@ysmx-github

@ysmx-github ysmx-github commented Feb 19, 2026

Changes

This PR adds Redshift as a supported profiler assessment platform.

What does this PR do?

  • Adds Redshift as a supported platform for the profiler assessment (alongside Synapse).
  • Implements Redshift credential flows: database password, federated user, Secrets Manager (ARN), temporary credentials db user, temporary credentials IAM. SSL is allowed for all flows. Configurator prompts and connection handling are implemented in database_manager.py.
  • Files changed
    • .gitignore
    • pyproject.toml
    • src/databricks/labs/lakebridge/connections/database_manager.py

Relevant implementation details

  • Credentials: database password, federated user, secrets manager (ARN), temporary credentials db user, temporary credentials IAM. Federated and Secrets Manager use an optional lazy boto3 import.
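
As a rough illustration of the optional lazy boto3 import mentioned above, the pattern could look like the sketch below. The helper names `lazy_import` and `resolve_secret` are hypothetical and not taken from this PR; only the `boto3.client("secretsmanager")` and `get_secret_value` calls are real boto3 APIs.

```python
import importlib
from types import ModuleType


def lazy_import(module_name: str, purpose: str) -> ModuleType:
    """Import an optional dependency only when a code path needs it (hypothetical helper)."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # Re-raise with a message that says which feature needs the package.
        raise ImportError(
            f"'{module_name}' is required for {purpose}; install it to use this credential flow."
        ) from e


def resolve_secret(secret_arn: str, region: str) -> str:
    """Fetch a secret string by ARN (hypothetical helper, assumes AWS credentials are configured)."""
    # boto3 is imported here rather than at module level, so the
    # password-based flows keep working when boto3 is not installed.
    boto3 = lazy_import("boto3", "Secrets Manager credentials")
    client = boto3.client("secretsmanager", region_name=region)
    return client.get_secret_value(SecretId=secret_arn)["SecretString"]
```

The design choice here is that a missing optional package only fails the flows that need it, and fails with a message naming the feature rather than a bare import error at startup.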

Caveats/things to watch out for when reviewing:

Linked issues

Resolves #..

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs lakebridge ...
  • ... +add your own

Tests

  1. Manually tested all credential flows on all clusters in the AWS Sandbox account aws-sandbox-field-eng (332745928618)
    tests/resources/assessments/pipeline_config_main_redshift.yml: pipeline config that runs that script.
  • manually tested
  • added unit tests
  • added integration tests

@codecov

codecov bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 36.00000% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.23%. Comparing base (41c3f9a) to head (ce0e10e).

Files with missing lines Patch % Lines
...ks/labs/lakebridge/connections/database_manager.py 36.00% 32 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2305      +/-   ##
==========================================
- Coverage   66.41%   66.23%   -0.19%     
==========================================
  Files          99       99              
  Lines        9094     9130      +36     
  Branches      974      981       +7     
==========================================
+ Hits         6040     6047       +7     
- Misses       2878     2907      +29     
  Partials      176      176              


@github-actions

github-actions bot commented Feb 19, 2026

✅ 145/145 passed, 8 flaky, 4 skipped, 41m57s total

Flaky tests:

  • 🤪 test_installs_and_runs_local_bladebridge (20.765s)
  • 🤪 test_installs_and_runs_pypi_bladebridge (28.313s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (18.114s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (18.531s)
  • 🤪 test_transpiles_informatica_to_sparksql (20.282s)
  • 🤪 test_transpile_teradata_sql (19.69s)
  • 🤪 test_transpile_teradata_sql_non_interactive[True] (5.974s)
  • 🤪 test_transpile_teradata_sql_non_interactive[False] (5.986s)

Running from acceptance #4001

@ysmx-github ysmx-github changed the title Redshift connection and credential support in database_manager Redshift Profiler PR1: Redshift connection and credential support in database_manager Feb 19, 2026
@sundarshankar89 sundarshankar89 added feat/profiler Issues related to profilers do-not-merge labels Feb 20, 2026
@ysmx-github ysmx-github force-pushed the feature/redshift-pr1-database-manager branch from 84c0bda to 7b0e873 Compare February 24, 2026 08:51
@ysmx-github ysmx-github force-pushed the feature/redshift-pr1-database-manager branch from 39cc6de to 84c0bda Compare February 24, 2026 09:31
@ysmx-github ysmx-github force-pushed the feature/redshift-pr1-database-manager branch from 030e2f1 to 1abc3cd Compare February 24, 2026 20:59
… .gitignore)

Co-authored-by: Cursor <cursoragent@cursor.com>
@ysmx-github ysmx-github force-pushed the feature/redshift-pr1-database-manager branch from 1abc3cd to b03149d Compare February 24, 2026 21:28
- Remove fetch_script (multi-statement); use single-query fetch only
- Return empty FetchResult for DDL (cursor.description is None) so prepare steps succeed

Co-authored-by: Cursor <cursoragent@cursor.com>
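
A minimal sketch of the behaviour described in the commit message above: a single-query fetch that returns an empty result when the statement produces no result set. It uses the standard-library `sqlite3` driver as a stand-in for a DB-API connection, and the `FetchResult` fields shown are an assumption; the real class in database_manager.py may look different.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FetchResult:
    # Hypothetical shape; the real FetchResult may have different fields.
    columns: list[str] = field(default_factory=list)
    rows: list[tuple[Any, ...]] = field(default_factory=list)


def fetch(connection, query: str) -> FetchResult:
    """Run one statement and return its rows, or an empty result for DDL/DML."""
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        # DB-API cursors set description to None when the statement
        # returns no result set (for example CREATE TABLE).
        if cursor.description is None:
            return FetchResult()
        columns = [col[0] for col in cursor.description]
        return FetchResult(columns=columns, rows=[tuple(r) for r in cursor.fetchall()])
    finally:
        cursor.close()


conn = sqlite3.connect(":memory:")
ddl_result = fetch(conn, "CREATE TABLE t (id INTEGER, name TEXT)")
fetch(conn, "INSERT INTO t VALUES (1, 'a')")
select_result = fetch(conn, "SELECT id, name FROM t")
```

With this shape, a prepare step that issues DDL gets back an empty `FetchResult` instead of failing on `fetchall()`.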
Collaborator

@sundarshankar89 sundarshankar89 left a comment

Small nits, but overall good to go.
The main thing is https://github.com/databrickslabs/lakebridge/blob/main/docs/lakebridge/docs/dev/contributing.md#gpg-signing: all commits need to be signed. Can you fix that by following the steps there and rewriting the commit history for this PR?

@asnare, can you take a look?
What about https://docs.localstack.cloud/aws/services/redshift/ for testing?

@@ -1,10 +1,10 @@
# Databricks notebook source
Collaborator

This is not a notebook.


[tool.mypy]
exclude = ["tests/resources/.*"]
exclude = ["tests/resources/.*", "src/databricks/labs/lakebridge/resources/assessments/.*"]
Collaborator

Suggested change
exclude = ["tests/resources/.*", "src/databricks/labs/lakebridge/resources/assessments/.*"]
exclude = ["tests/resources/.*"]

Comment on lines -92 to -95
# Azure SDK dependencies for linting resources/assessments folder
"azure-identity~=1.19.0",
"azure-monitor-query~=1.4.0",
"azure-synapse-artifacts~=0.20.0",
Collaborator

This is needed for the Synapse assessment.

"cryptography>=44.0.2,<46.1.0",
"pyodbc>=5.2,<5.4",
"SQLAlchemy~=2.0.40",
"psycopg2-binary~=2.9.10",
Collaborator

Do we still need this?

ignore-patterns = ["^\\.#"]

# Ignore files under tests/resources and resources/assessments
# Ignore files under tests/resources
Collaborator

This should also match main.

Contributor

@asnare asnare left a comment

Aside from a few nits, some things I'm missing:

  • For new dependencies, we need a note on the licensing terms. Often this is straightforward but for anything AWS-related I'd like to see it.
  • It's unclear to me why this new source doesn't use SQLAlchemy like the others? That's made the implementation more complex than otherwise, with accompanying duplicate code because it doesn't extend _BaseConnector. What makes Redshift special and different to the other sources we support?
  • We absolutely need some test coverage: we don't have the resources to manually test this, and without automated coverage of some sort the chances of this being inadvertently broken are high. I understand that for this integration testing is going to be difficult to set up, but at the very least there needs to be some unit coverage.
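
As one possible starting point for the unit coverage requested above, a test can inject a fake `connect` callable so that no real cluster is needed. This is only a sketch: `run_query` and its keyword arguments are hypothetical stand-ins, not functions from this PR.

```python
from unittest.mock import MagicMock


def run_query(connect, query: str, **connect_kwargs):
    """Hypothetical helper: open a connection via `connect`, run one query, return rows."""
    connection = connect(**connect_kwargs)
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        # Always release the connection, even if the query fails.
        connection.close()


def test_run_query_passes_credentials_and_closes():
    # The fake stands in for a driver-level connect function.
    fake_connect = MagicMock()
    fake_cursor = fake_connect.return_value.cursor.return_value
    fake_cursor.fetchall.return_value = [(1,)]

    rows = run_query(fake_connect, "SELECT 1", host="example-host", user="u", password="p", ssl=True)

    assert rows == [(1,)]
    fake_connect.assert_called_once_with(host="example-host", user="u", password="p", ssl=True)
    fake_cursor.execute.assert_called_once_with("SELECT 1")
    fake_connect.return_value.close.assert_called_once()
```

Tests of this kind only check that arguments are wired through and resources are released; they do not replace integration tests against a real Redshift endpoint.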

class DatabaseConnector(contextlib.AbstractContextManager):
@abstractmethod
def _connect(self) -> Engine:
def _connect(self) -> Engine | RedshiftConnection:
Contributor

This looks suspicious to me: why does the interface have to cater specifically for RedshiftConnection? Shouldn't RedshiftConnection conform to the existing contract?

Collaborator

This is because there used to be a bridge library, sqlalchemy/sqlalchemy#11950, which has been abandoned.
This is a compromise, given that the connector returns a SQLAlchemy-compliant API for fetch and is self-contained within database_manager.py, but happy to hear alternatives.
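
One alternative that might address the interface concern is to hide the driver-level connection behind a small adapter, so the abstract contract never has to mention a driver-specific type. The sketch below is an assumption-heavy illustration: `Connector` and `DbApiConnector` are hypothetical names, and `sqlite3` stands in for any DB-API driver.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Any


class Connector(ABC):
    """Hypothetical contract: callers only see fetch(), never the driver handle."""

    @abstractmethod
    def fetch(self, query: str) -> list[tuple[Any, ...]]:
        ...


class DbApiConnector(Connector):
    """Adapter that wraps any DB-API 2.0 connection behind the same contract."""

    def __init__(self, connection) -> None:
        self._connection = connection

    def fetch(self, query: str) -> list[tuple[Any, ...]]:
        cursor = self._connection.cursor()
        try:
            cursor.execute(query)
            # Statements without a result set yield an empty list.
            if cursor.description is None:
                return []
            return [tuple(row) for row in cursor.fetchall()]
        finally:
            cursor.close()


connector = DbApiConnector(sqlite3.connect(":memory:"))
rows = connector.fetch("SELECT 1 AS one")
```

Whether this fits the existing `_BaseConnector` hierarchy depends on details not visible in this conversation.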

Comment on lines +17 to +18
import redshift_connector # type: ignore[import-untyped]
from redshift_connector import Connection as RedshiftConnection # type: ignore[import-untyped]
Contributor

For a given module it's normally best to either use one or the other…

Suggested change
import redshift_connector # type: ignore[import-untyped]
from redshift_connector import Connection as RedshiftConnection # type: ignore[import-untyped]
import redshift_connector # type: ignore[import-untyped]

@asnare
Contributor

asnare commented Mar 6, 2026

  • We absolutely need some test coverage: we don't have the resources to manually test this, and without automated coverage of some sort the chances of this being inadvertently broken are high. I understand that for this integration testing is going to be difficult to set up, but at the very least there needs to be some unit coverage.

To clarify a bit here:

  • We absolutely need integration tests against the real thing where possible: that's the source of truth, after all. At a technical level local emulations often don't behave the same way, and the difference leads to surprises.
  • There's nothing wrong with a local version of the service, especially for use during development. In fact this is preferable: you want the dev-loop to be local and fast, avoiding round-tripping through CI/CD is a massive win.
  • The local version here is a nice-to-have, and doesn't supersede the need for proper integration testing against the real thing. We're also going to need the real thing for support and troubleshooting.

Getting back to your original question about LocalStack:

  • This is a 3rd-party emulation, so I expect there will be significant drift relative to the real thing. (This is consistent with what I wrote above: we can't rely on it for testing as a source of truth.)
  • It's a commercial offering, with tiers. I can't make sense of it all, but it looks to me like: a) per-seat licensing; b) CI/CD uses a credit-based system. (It's unclear to me if the 'Pro' image is licensed separately, but I would guess we need the 'Pro' image because profiling would use some of the APIs that it provides.)

Labels

do-not-merge feat/profiler Issues related to profilers

4 participants