Conversation
5 issues found across 13 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="lea/databases.py">
<violation number="1" location="lea/databases.py:315">
P2: default_client indexes clients with compute_project_id even when it is optional/None, which will raise KeyError for configs without LEA_BQ_COMPUTE_PROJECT_ID.</violation>
<violation number="2" location="lea/databases.py:351">
P2: determine_client_for_script can raise KeyError when the Pick API falls back to write_project_id or compute_project_id is unset, because those IDs are not guaranteed to be present in clients.</violation>
</file>
<file name="lea/scripts.py">
<violation number="1" location="lea/scripts.py:98">
P2: load_yaml resolves and opens arbitrary paths without verifying they stay under scripts_dir, so a template can read files outside the scripts directory (e.g., ../../.env). Add a guard to prevent path traversal.</violation>
</file>
<file name="lea/conductor.py">
<violation number="1" location="lea/conductor.py:283">
P2: Script-specific compute project mappings are always built with the username-suffixed dataset, so in production mode (where `write_dataset` is the base dataset) the lookup never matches and the per-script compute projects are ignored.</violation>
</file>
<file name="pyproject.toml">
<violation number="1" location="pyproject.toml:47">
P2: Ruff is targeting Python 3.13 while the project declares support for Python 3.11. This can mask usage of 3.12/3.13-only syntax or APIs that will break on the minimum supported version. Align ruff’s target-version with the minimum runtime.</violation>
</file>
Architecture diagram
```mermaid
sequenceDiagram
    participant Conductor
    participant Loader as Script Loader
    participant BQClient as BigQuery Client
    participant GCP as Google Cloud (BQ)

    Note over Conductor,GCP: 1. Initialization (NEW: Multi-Client Setup)
    Conductor->>Conductor: NEW: Parse "LEA_BQ_SCRIPT_SPECIFIC_COMPUTE_PROJECT_IDS"
    Conductor->>BQClient: __init__(..., script_specific_compute_project_ids)
    loop For each unique project ID (Default + Custom + Pick API)
        BQClient->>BQClient: CHANGED: Instantiate google.cloud.bigquery.Client
        Note right of BQClient: Creates a pool of clients,<br/>one for each compute project
    end

    Note over Conductor,GCP: 2. Script Rendering (NEW: YAML Helper)
    Conductor->>Loader: from_path(script_path)
    Loader->>Loader: Setup Jinja environment
    opt Template calls load_yaml('path/to/file.yaml')
        Loader->>Loader: NEW: Read & parse YAML file
        Loader-->>Loader: Inject data into SQL context
    end
    Loader-->>Conductor: Return rendered SQLScript

    Note over Conductor,GCP: 3. Execution (NEW: Compute Project Routing)
    Conductor->>BQClient: materialize_script(script)
    BQClient->>BQClient: NEW: determine_client_for_script(script)
    alt Big Blue Pick API active
        BQClient->>BQClient: Call Pick API -> project_id
    else NEW: Script ID in config map
        BQClient->>BQClient: Map table_ref -> custom project_id
    else Default
        BQClient->>BQClient: Use default compute_project_id
    end
    BQClient->>BQClient: Select specific client from pool
    Note right of BQClient: Executes query using the<br/>SELECTED compute project
    BQClient->>GCP: client.query(..., project=selected_project)
    GCP-->>BQClient: Job Result
    BQClient-->>Conductor: DatabaseJob
```
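The routing in step 3 can be sketched roughly as follows. All names here (`ComputeRouter`, `pick_api`, `determine_project_for_script`) are illustrative guesses based on the diagram, not lea's actual API:

```python
# Hypothetical sketch of the compute-project routing in step 3 above.
# Names and structure are assumptions; the real lea code may differ.

class ComputeRouter:
    def __init__(self, default_project_id, script_specific_project_ids, pick_api=None):
        self.default_project_id = default_project_id
        # Maps a script's table reference to a custom compute project.
        self.script_specific_project_ids = script_specific_project_ids
        # Optional callable: table_ref -> project_id (or None when inactive).
        self.pick_api = pick_api

    def determine_project_for_script(self, table_ref: str) -> str:
        # 1. The Big Blue Pick API takes precedence when active.
        if self.pick_api is not None:
            picked = self.pick_api(table_ref)
            if picked:
                return picked
        # 2. Script-specific override from the parsed configuration map.
        if table_ref in self.script_specific_project_ids:
            return self.script_specific_project_ids[table_ref]
        # 3. Fall back to the default compute project.
        return self.default_project_id
```

The resolved project ID would then select the matching `bigquery.Client` from the pool built at initialization.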
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Context (lea/databases.py:351):

```python
                sql_script.table_ref, self.compute_project_id
            )
        )
        return self.clients[project_id]
```
P2: determine_client_for_script can raise KeyError when the Pick API falls back to write_project_id or compute_project_id is unset, because those IDs are not guaranteed to be present in clients.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lea/databases.py, line 351:
<comment>determine_client_for_script can raise KeyError when the Pick API falls back to write_project_id or compute_project_id is unset, because those IDs are not guaranteed to be present in clients.</comment>
<file context>

```diff
@@ -348,6 +340,16 @@ def materialize_script(self, script: scripts.Script) -> BigQueryJob:
+                sql_script.table_ref, self.compute_project_id
+            )
+        )
+        return self.clients[project_id]
+
     def materialize_sql_script(self, sql_script: scripts.SQLScript) -> BigQueryJob:
```

</file context>
Suggested change:

```diff
-        return self.clients[project_id]
+        project_id = project_id or self.write_project_id
+        if project_id not in self.clients:
+            self.clients[project_id] = bigquery.Client(
+                project=project_id,
+                credentials=self.credentials,
+                location=self.location,
+                client_options={
+                    "scopes": [
+                        "https://www.googleapis.com/auth/cloud-platform",
+                        "https://www.googleapis.com/auth/drive",
+                        "https://www.googleapis.com/auth/spreadsheets.readonly",
+                        "https://www.googleapis.com/auth/userinfo.email",
+                    ]
+                },
+            )
+        return self.clients[project_id]
```
Context (lea/databases.py:315):

```python
    @property
    def default_client(self) -> bigquery.Client:
        return self.clients[self.compute_project_id]
```
P2: default_client indexes clients with compute_project_id even when it is optional/None, which will raise KeyError for configs without LEA_BQ_COMPUTE_PROJECT_ID.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lea/databases.py, line 315:
<comment>default_client indexes clients with compute_project_id even when it is optional/None, which will raise KeyError for configs without LEA_BQ_COMPUTE_PROJECT_ID.</comment>
<file context>

```diff
@@ -322,17 +310,21 @@ def __init__(
+    @property
+    def default_client(self) -> bigquery.Client:
+        return self.clients[self.compute_project_id]
+
     def create_dataset(self, dataset_name: str):
```

</file context>
Suggested change:

```diff
-        return self.clients[self.compute_project_id]
+        project_id = self.compute_project_id or self.write_project_id
+        if project_id not in self.clients:
+            self.clients[project_id] = bigquery.Client(
+                project=project_id,
+                credentials=self.credentials,
+                location=self.location,
+                client_options={
+                    "scopes": [
+                        "https://www.googleapis.com/auth/cloud-platform",
+                        "https://www.googleapis.com/auth/drive",
+                        "https://www.googleapis.com/auth/spreadsheets.readonly",
+                        "https://www.googleapis.com/auth/userinfo.email",
+                    ]
+                },
+            )
+        return self.clients[project_id]
```
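The suggested fix boils down to dict-based memoization with a fallback project ID. A minimal standalone illustration, with a stub class in place of `google.cloud.bigquery.Client`:

```python
# Standalone sketch of the lazy client pool suggested above.
# FakeClient stands in for google.cloud.bigquery.Client.

class FakeClient:
    def __init__(self, project):
        self.project = project

class ClientPool:
    def __init__(self, compute_project_id=None, write_project_id="write-proj"):
        self.compute_project_id = compute_project_id
        self.write_project_id = write_project_id
        self.clients = {}

    @property
    def default_client(self):
        # Fall back to the write project when no compute project is configured,
        # and create the client on first access instead of assuming it exists.
        project_id = self.compute_project_id or self.write_project_id
        if project_id not in self.clients:
            self.clients[project_id] = FakeClient(project=project_id)
        return self.clients[project_id]
```

With this shape, configs without `LEA_BQ_COMPUTE_PROJECT_ID` transparently reuse the write project instead of raising `KeyError`.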
Context (lea/scripts.py:98):

```python
environment = jinja2.Environment(loader=loader)

def load_yaml(path: str) -> dict:
    full_path = (scripts_dir / path).resolve()
```
P2: load_yaml resolves and opens arbitrary paths without verifying they stay under scripts_dir, so a template can read files outside the scripts directory (e.g., ../../.env). Add a guard to prevent path traversal.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lea/scripts.py, line 98:
<comment>load_yaml resolves and opens arbitrary paths without verifying they stay under scripts_dir, so a template can read files outside the scripts directory (e.g., ../../.env). Add a guard to prevent path traversal.</comment>
<file context>

```diff
@@ -92,8 +93,14 @@ def from_path(
     environment = jinja2.Environment(loader=loader)
+
+    def load_yaml(path: str) -> dict:
+        full_path = (scripts_dir / path).resolve()
+        with open(full_path) as f:
+            return yaml.safe_load(f)
```

</file context>
Suggested change:

```diff
     full_path = (scripts_dir / path).resolve()
+    if not full_path.is_relative_to(scripts_dir.resolve()):
+        raise ValueError(f"load_yaml path escapes scripts_dir: {path}")
```
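For reference, the guard relies on `Path.is_relative_to` (available since Python 3.9), checked after both sides are resolved so symlinks and `..` segments can't sneak past it. A self-contained demo of the pattern:

```python
from pathlib import Path

def safe_resolve(scripts_dir: Path, user_path: str) -> Path:
    """Resolve user_path under scripts_dir, rejecting traversal outside it."""
    full_path = (scripts_dir / user_path).resolve()
    # Compare resolved paths so `..` and symlinks are normalized first.
    if not full_path.is_relative_to(scripts_dir.resolve()):
        raise ValueError(f"path escapes scripts_dir: {user_path}")
    return full_path
```

A template asking for `../../.env` would then fail with `ValueError` instead of reading a file outside the scripts directory.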
Context (lea/conductor.py:283):

```python
    ),
    script_specific_compute_project_ids=parse_bigquery_script_specific_compute_project_ids(
        env_var=os.environ.get("LEA_BQ_SCRIPT_SPECIFIC_COMPUTE_PROJECT_IDS"),
        dataset_name=self.dataset_name_with_username,
```
P2: Script-specific compute project mappings are always built with the username-suffixed dataset, so in production mode (where write_dataset is the base dataset) the lookup never matches and the per-script compute projects are ignored.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lea/conductor.py, line 283:
<comment>Script-specific compute project mappings are always built with the username-suffixed dataset, so in production mode (where `write_dataset` is the base dataset) the lookup never matches and the per-script compute projects are ignored.</comment>
<file context>

```diff
@@ -278,6 +278,11 @@ def make_client(self, dry_run: bool = False, print_mode: bool = False) -> Databa
     ),
+    script_specific_compute_project_ids=parse_bigquery_script_specific_compute_project_ids(
+        env_var=os.environ.get("LEA_BQ_SCRIPT_SPECIFIC_COMPUTE_PROJECT_IDS"),
+        dataset_name=self.dataset_name_with_username,
+        write_project_id=os.environ["LEA_BQ_PROJECT_ID"],
+    ),
```

</file context>
pyproject.toml
Context (pyproject.toml:47):

```diff
 line-length = 100
 lint.select = ["E", "F", "I", "UP"]  # https://beta.ruff.rs/docs/rules/
-target-version = 'py310'
+target-version = 'py313'
```
P2: Ruff is targeting Python 3.13 while the project declares support for Python 3.11. This can mask usage of 3.12/3.13-only syntax or APIs that will break on the minimum supported version. Align ruff’s target-version with the minimum runtime.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At pyproject.toml, line 47:
<comment>Ruff is targeting Python 3.13 while the project declares support for Python 3.11. This can mask usage of 3.12/3.13-only syntax or APIs that will break on the minimum supported version. Align ruff’s target-version with the minimum runtime.</comment>
<file context>

```diff
@@ -1,47 +1,50 @@
 line-length = 100
 lint.select = ["E", "F", "I", "UP"] # https://beta.ruff.rs/docs/rules/
-target-version = 'py310'
+target-version = 'py313'

 [tool.ruff.lint.isort]
```

</file context>
Suggested change:

```diff
-target-version = 'py313'
+target-version = 'py311'
```
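For illustration, the aligned settings might look like this (a sketch, assuming the project's floor is `requires-python = ">=3.11"`; recent Ruff versions can also infer `target-version` from `requires-python` when it is left unset):

```toml
[project]
requires-python = ">=3.11"

[tool.ruff]
line-length = 100
lint.select = ["E", "F", "I", "UP"]  # https://beta.ruff.rs/docs/rules/
target-version = "py311"  # keep in sync with requires-python
```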
Also moving to `uv`.

Summary by cubic
Add script-specific compute projects for BigQuery and switch the project to uv for dependency management and CI. This lets expensive scripts run on a different project/reservation while simplifying local setup and CI.
New Features
Migration
Written for commit 0456dfd. Summary will update on new commits.