
Script specific compute projects #162

Merged
MaxHalford merged 10 commits into main from script-specific-compute-project on Jan 23, 2026
Conversation

@MaxHalford (Member) commented Jan 23, 2026

Also moving to uv


Summary by cubic

Add script-specific compute projects for BigQuery and switch the project to uv for dependency management and CI. This lets expensive scripts run on a different project/reservation while simplifying local setup and CI.

  • New Features

    • Route specific scripts to custom BigQuery compute projects via LEA_BQ_SCRIPT_SPECIFIC_COMPUTE_PROJECT_IDS, a JSON map of script refs to project IDs (see the sketch after this list). Falls back to LEA_BQ_COMPUTE_PROJECT_ID and works with Big Blue’s Pick API.
    • Jinja templates can load YAML files with a new load_yaml helper.
  • Migration

    • Move from Poetry to uv:
      • Install uv and run: uv sync
      • Use uv run pre-commit run --all-files and uv run pytest
      • poetry.lock removed; uv.lock added; packaging migrated to Hatchling.
    • Python 3.11+ is now required.
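
A minimal sketch of what the new environment variable might contain. The script refs, key format, and project IDs below are hypothetical; check parse_bigquery_script_specific_compute_project_ids for the exact expected shape:

```python
import json
import os

# Hypothetical mapping from script refs to compute project IDs. The key
# format shown here (dataset.script_name) is an assumption, not the
# confirmed format.
os.environ["LEA_BQ_SCRIPT_SPECIFIC_COMPUTE_PROJECT_IDS"] = json.dumps(
    {
        "analytics.heavy_backfill": "expensive-compute-project",
        "analytics.daily_rollup": "batch-compute-project",
    }
)

mapping = json.loads(os.environ["LEA_BQ_SCRIPT_SPECIFIC_COMPUTE_PROJECT_IDS"])
print(mapping.get("analytics.heavy_backfill"))  # expensive-compute-project
```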

Written for commit 0456dfd. Summary will update on new commits.

@cubic-dev-ai (bot) left a comment

5 issues found across 13 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="lea/databases.py">

<violation number="1" location="lea/databases.py:315">
P2: default_client indexes clients with compute_project_id even when it is optional/None, which will raise KeyError for configs without LEA_BQ_COMPUTE_PROJECT_ID.</violation>

<violation number="2" location="lea/databases.py:351">
P2: determine_client_for_script can raise KeyError when the Pick API falls back to write_project_id or compute_project_id is unset, because those IDs are not guaranteed to be present in clients.</violation>
</file>

<file name="lea/scripts.py">

<violation number="1" location="lea/scripts.py:98">
P2: load_yaml resolves and opens arbitrary paths without verifying they stay under scripts_dir, so a template can read files outside the scripts directory (e.g., ../../.env). Add a guard to prevent path traversal.</violation>
</file>

<file name="lea/conductor.py">

<violation number="1" location="lea/conductor.py:283">
P2: Script-specific compute project mappings are always built with the username-suffixed dataset, so in production mode (where `write_dataset` is the base dataset) the lookup never matches and the per-script compute projects are ignored.</violation>
</file>

<file name="pyproject.toml">

<violation number="1" location="pyproject.toml:47">
P2: Ruff is targeting Python 3.13 while the project declares support for Python 3.11. This can mask usage of 3.12/3.13-only syntax or APIs that will break on the minimum supported version. Align ruff’s target-version with the minimum runtime.</violation>
</file>
Architecture diagram
sequenceDiagram
    participant Conductor
    participant Loader as Script Loader
    participant BQClient as BigQuery Client
    participant GCP as Google Cloud (BQ)

    Note over Conductor,GCP: 1. Initialization (NEW: Multi-Client Setup)

    Conductor->>Conductor: NEW: Parse "LEA_BQ_SCRIPT_SPECIFIC_COMPUTE_PROJECT_IDS"
    Conductor->>BQClient: __init__(..., script_specific_compute_project_ids)
    
    loop For each unique project ID (Default + Custom + Pick API)
        BQClient->>BQClient: CHANGED: Instantiate google.cloud.bigquery.Client
        Note right of BQClient: Creates a pool of clients,<br/>one for each compute project
    end

    Note over Conductor,GCP: 2. Script Rendering (NEW: YAML Helper)

    Conductor->>Loader: from_path(script_path)
    Loader->>Loader: Setup Jinja Environment
    opt Template calls load_yaml('path/to/file.yaml')
        Loader->>Loader: NEW: Read & Parse YAML file
        Loader-->>Loader: Inject data into SQL context
    end
    Loader-->>Conductor: Return Rendered SQLScript

    Note over Conductor,GCP: 3. Execution (NEW: Compute Project Routing)

    Conductor->>BQClient: materialize_script(script)
    BQClient->>BQClient: NEW: determine_client_for_script(script)

    alt Big Blue Pick API Active
        BQClient->>BQClient: Call Pick API -> project_id
    else NEW: Script ID in Config Map
        BQClient->>BQClient: Map table_ref -> custom project_id
    else Default
        BQClient->>BQClient: Use default compute_project_id
    end

    BQClient->>BQClient: Select specific Client from pool

    Note right of BQClient: Executes query using the<br/>SELECTED compute project
    
    BQClient->>GCP: client.query(..., project=selected_project)
    GCP-->>BQClient: Job Result
    BQClient-->>Conductor: DatabaseJob
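
To make the selection order in step 3 concrete, here is a minimal sketch under assumed names; the real logic lives in determine_client_for_script in lea/databases.py:

```python
# Sketch of the routing order shown in the diagram: Pick API first,
# then the per-script config map, then the default compute project.
def pick_compute_project(
    script_ref: str,
    script_specific: dict[str, str],
    default_project_id: str,
    pick_api=None,  # hypothetical callable wrapping Big Blue's Pick API
) -> str:
    if pick_api is not None:
        return pick_api(script_ref)
    return script_specific.get(script_ref, default_project_id)
```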

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

In lea/databases.py:

                sql_script.table_ref, self.compute_project_id
            )
        )
        return self.clients[project_id]
@cubic-dev-ai (bot) commented Jan 23, 2026
P2: determine_client_for_script can raise KeyError when the Pick API falls back to write_project_id or compute_project_id is unset, because those IDs are not guaranteed to be present in clients.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lea/databases.py, line 351:

<comment>determine_client_for_script can raise KeyError when the Pick API falls back to write_project_id or compute_project_id is unset, because those IDs are not guaranteed to be present in clients.</comment>

<file context>
@@ -348,6 +340,16 @@ def materialize_script(self, script: scripts.Script) -> BigQueryJob:
+                sql_script.table_ref, self.compute_project_id
+            )
+        )
+        return self.clients[project_id]
+
     def materialize_sql_script(self, sql_script: scripts.SQLScript) -> BigQueryJob:
</file context>
Suggested change:

-        return self.clients[project_id]
+        project_id = project_id or self.write_project_id
+        if project_id not in self.clients:
+            self.clients[project_id] = bigquery.Client(
+                project=project_id,
+                credentials=self.credentials,
+                location=self.location,
+                client_options={
+                    "scopes": [
+                        "https://www.googleapis.com/auth/cloud-platform",
+                        "https://www.googleapis.com/auth/drive",
+                        "https://www.googleapis.com/auth/spreadsheets.readonly",
+                        "https://www.googleapis.com/auth/userinfo.email",
+                    ]
+                },
+            )
+        return self.clients[project_id]


In lea/databases.py:

    @property
    def default_client(self) -> bigquery.Client:
        return self.clients[self.compute_project_id]
@cubic-dev-ai (bot) commented Jan 23, 2026

P2: default_client indexes clients with compute_project_id even when it is optional/None, which will raise KeyError for configs without LEA_BQ_COMPUTE_PROJECT_ID.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lea/databases.py, line 315:

<comment>default_client indexes clients with compute_project_id even when it is optional/None, which will raise KeyError for configs without LEA_BQ_COMPUTE_PROJECT_ID.</comment>

<file context>
@@ -322,17 +310,21 @@ def __init__(
 
+    @property
+    def default_client(self) -> bigquery.Client:
+        return self.clients[self.compute_project_id]
+
     def create_dataset(self, dataset_name: str):
</file context>
Suggested change:

-        return self.clients[self.compute_project_id]
+        project_id = self.compute_project_id or self.write_project_id
+        if project_id not in self.clients:
+            self.clients[project_id] = bigquery.Client(
+                project=project_id,
+                credentials=self.credentials,
+                location=self.location,
+                client_options={
+                    "scopes": [
+                        "https://www.googleapis.com/auth/cloud-platform",
+                        "https://www.googleapis.com/auth/drive",
+                        "https://www.googleapis.com/auth/spreadsheets.readonly",
+                        "https://www.googleapis.com/auth/userinfo.email",
+                    ]
+                },
+            )
+        return self.clients[project_id]

In lea/scripts.py:

            environment = jinja2.Environment(loader=loader)

            def load_yaml(path: str) -> dict:
                full_path = (scripts_dir / path).resolve()
@cubic-dev-ai (bot) commented Jan 23, 2026

P2: load_yaml resolves and opens arbitrary paths without verifying they stay under scripts_dir, so a template can read files outside the scripts directory (e.g., ../../.env). Add a guard to prevent path traversal.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lea/scripts.py, line 98:

<comment>load_yaml resolves and opens arbitrary paths without verifying they stay under scripts_dir, so a template can read files outside the scripts directory (e.g., ../../.env). Add a guard to prevent path traversal.</comment>

<file context>
@@ -92,8 +93,14 @@ def from_path(
             environment = jinja2.Environment(loader=loader)
+
+            def load_yaml(path: str) -> dict:
+                full_path = (scripts_dir / path).resolve()
+                with open(full_path) as f:
+                    return yaml.safe_load(f)
</file context>
Suggested change:

-                full_path = (scripts_dir / path).resolve()
+                full_path = (scripts_dir / path).resolve()
+                if not full_path.is_relative_to(scripts_dir.resolve()):
+                    raise ValueError(f"load_yaml path escapes scripts_dir: {path}")
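
For context, a minimal sketch of how the new helper can be used from a template, with the suggested guard applied. The file names, the params.yaml contents, and the way the helper is registered are assumptions, not the exact lea wiring:

```python
from pathlib import Path

import jinja2
import yaml

scripts_dir = Path("scripts")  # hypothetical scripts directory

def load_yaml(path: str) -> dict:
    # Resolve relative to scripts_dir and refuse paths that escape it.
    full_path = (scripts_dir / path).resolve()
    if not full_path.is_relative_to(scripts_dir.resolve()):
        raise ValueError(f"load_yaml path escapes scripts_dir: {path}")
    with open(full_path) as f:
        return yaml.safe_load(f)

environment = jinja2.Environment(loader=jinja2.FileSystemLoader(scripts_dir))
environment.globals["load_yaml"] = load_yaml  # assumption: exposed as a global

# Assumes scripts/params.yaml contains e.g. {"countries": ["FR", "DE"]}.
template = environment.from_string(
    "SELECT * FROM orders WHERE country IN ("
    "{% for c in load_yaml('params.yaml')['countries'] %}"
    "'{{ c }}'{{ ', ' if not loop.last }}"
    "{% endfor %})"
)
print(template.render())  # SELECT * FROM orders WHERE country IN ('FR', 'DE')
```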

In lea/conductor.py:

                ),
                script_specific_compute_project_ids=parse_bigquery_script_specific_compute_project_ids(
                    env_var=os.environ.get("LEA_BQ_SCRIPT_SPECIFIC_COMPUTE_PROJECT_IDS"),
                    dataset_name=self.dataset_name_with_username,
@cubic-dev-ai (bot) commented Jan 23, 2026

P2: Script-specific compute project mappings are always built with the username-suffixed dataset, so in production mode (where write_dataset is the base dataset) the lookup never matches and the per-script compute projects are ignored.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lea/conductor.py, line 283:

<comment>Script-specific compute project mappings are always built with the username-suffixed dataset, so in production mode (where `write_dataset` is the base dataset) the lookup never matches and the per-script compute projects are ignored.</comment>

<file context>
@@ -278,6 +278,11 @@ def make_client(self, dry_run: bool = False, print_mode: bool = False) -> Databa
                 ),
+                script_specific_compute_project_ids=parse_bigquery_script_specific_compute_project_ids(
+                    env_var=os.environ.get("LEA_BQ_SCRIPT_SPECIFIC_COMPUTE_PROJECT_IDS"),
+                    dataset_name=self.dataset_name_with_username,
+                    write_project_id=os.environ["LEA_BQ_PROJECT_ID"],
+                ),
</file context>

pyproject.toml (outdated):

 line-length = 100
 lint.select = ["E", "F", "I", "UP"] # https://beta.ruff.rs/docs/rules/
-target-version = 'py310'
+target-version = 'py313'
@cubic-dev-ai (bot) commented Jan 23, 2026

P2: Ruff is targeting Python 3.13 while the project declares support for Python 3.11. This can mask usage of 3.12/3.13-only syntax or APIs that will break on the minimum supported version. Align ruff’s target-version with the minimum runtime.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At pyproject.toml, line 47:

<comment>Ruff is targeting Python 3.13 while the project declares support for Python 3.11. This can mask usage of 3.12/3.13-only syntax or APIs that will break on the minimum supported version. Align ruff’s target-version with the minimum runtime.</comment>

<file context>
@@ -1,47 +1,50 @@
 line-length = 100
 lint.select = ["E", "F", "I", "UP"] # https://beta.ruff.rs/docs/rules/
-target-version = 'py310'
+target-version = 'py313'
 
 [tool.ruff.lint.isort]
</file context>
Suggested change:

-target-version = 'py313'
+target-version = 'py311'

@MaxHalford merged commit 0c37033 into main on Jan 23, 2026 (3 checks passed)
@MaxHalford deleted the script-specific-compute-project branch on January 23, 2026 at 23:45