
Script specific compute projects #162

Merged
MaxHalford merged 10 commits into main from script-specific-compute-project on Jan 23, 2026
Conversation

@MaxHalford (Member) commented Jan 23, 2026

Also moving to uv


Summary by cubic

Add script-specific compute projects for BigQuery and switch the project to uv for dependency management and CI. This lets expensive scripts run on a different project/reservation while simplifying local setup and CI.

  • New Features

    • Route specific scripts to custom BigQuery compute projects via LEA_BQ_SCRIPT_SPECIFIC_COMPUTE_PROJECT_IDS, a JSON map of script refs to project IDs (see the sketch after this list). Falls back to LEA_BQ_COMPUTE_PROJECT_ID and works with Big Blue’s Pick API.
    • Jinja templates can load YAML files with a new load_yaml helper.
  • Migration

    • Move from Poetry to uv:
      • Install uv and run: uv sync
      • Use uv run pre-commit run --all-files and uv run pytest
      • poetry.lock removed; uv.lock added; packaging migrated to Hatchling.
    • Python 3.11+ is now required.
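
A minimal sketch of what the new environment variable might contain. The script refs, key format, and project IDs below are hypothetical; check parse_bigquery_script_specific_compute_project_ids for the exact expected shape:

```python
import json
import os

# Hypothetical mapping from script refs to compute project IDs. The key
# format shown here (dataset.script_name) is an assumption, not the
# confirmed format.
os.environ["LEA_BQ_SCRIPT_SPECIFIC_COMPUTE_PROJECT_IDS"] = json.dumps(
    {
        "analytics.heavy_backfill": "expensive-compute-project",
        "analytics.daily_rollup": "batch-compute-project",
    }
)

mapping = json.loads(os.environ["LEA_BQ_SCRIPT_SPECIFIC_COMPUTE_PROJECT_IDS"])
print(mapping.get("analytics.heavy_backfill"))  # expensive-compute-project
```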

Written for commit 0456dfd. Summary will update on new commits.

@cubic-dev-ai (bot) left a comment

5 issues found across 13 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="lea/databases.py">

<violation number="1" location="lea/databases.py:315">
P2: default_client indexes clients with compute_project_id even when it is optional/None, which will raise KeyError for configs without LEA_BQ_COMPUTE_PROJECT_ID.</violation>

<violation number="2" location="lea/databases.py:351">
P2: determine_client_for_script can raise KeyError when the Pick API falls back to write_project_id or compute_project_id is unset, because those IDs are not guaranteed to be present in clients.</violation>
</file>

<file name="lea/scripts.py">

<violation number="1" location="lea/scripts.py:98">
P2: load_yaml resolves and opens arbitrary paths without verifying they stay under scripts_dir, so a template can read files outside the scripts directory (e.g., ../../.env). Add a guard to prevent path traversal.</violation>
</file>

<file name="lea/conductor.py">

<violation number="1" location="lea/conductor.py:283">
P2: Script-specific compute project mappings are always built with the username-suffixed dataset, so in production mode (where `write_dataset` is the base dataset) the lookup never matches and the per-script compute projects are ignored.</violation>
</file>

<file name="pyproject.toml">

<violation number="1" location="pyproject.toml:47">
P2: Ruff is targeting Python 3.13 while the project declares support for Python 3.11. This can mask usage of 3.12/3.13-only syntax or APIs that will break on the minimum supported version. Align ruff’s target-version with the minimum runtime.</violation>
</file>
Architecture diagram
sequenceDiagram
    participant Conductor
    participant Loader as Script Loader
    participant BQClient as BigQuery Client
    participant GCP as Google Cloud (BQ)

    Note over Conductor,GCP: 1. Initialization (NEW: Multi-Client Setup)

    Conductor->>Conductor: NEW: Parse "LEA_BQ_SCRIPT_SPECIFIC_COMPUTE_PROJECT_IDS"
    Conductor->>BQClient: __init__(..., script_specific_compute_project_ids)
    
    loop For each unique project ID (Default + Custom + Pick API)
        BQClient->>BQClient: CHANGED: Instantiate google.cloud.bigquery.Client
        Note right of BQClient: Creates a pool of clients,<br/>one for each compute project
    end

    Note over Conductor,GCP: 2. Script Rendering (NEW: YAML Helper)

    Conductor->>Loader: from_path(script_path)
    Loader->>Loader: Setup Jinja Environment
    opt Template calls load_yaml('path/to/file.yaml')
        Loader->>Loader: NEW: Read & Parse YAML file
        Loader-->>Loader: Inject data into SQL context
    end
    Loader-->>Conductor: Return Rendered SQLScript

    Note over Conductor,GCP: 3. Execution (NEW: Compute Project Routing)

    Conductor->>BQClient: materialize_script(script)
    BQClient->>BQClient: NEW: determine_client_for_script(script)

    alt Big Blue Pick API Active
        BQClient->>BQClient: Call Pick API -> project_id
    else NEW: Script ID in Config Map
        BQClient->>BQClient: Map table_ref -> custom project_id
    else Default
        BQClient->>BQClient: Use default compute_project_id
    end

    BQClient->>BQClient: Select specific Client from pool

    Note right of BQClient: Executes query using the<br/>SELECTED compute project
    
    BQClient->>GCP: client.query(..., project=selected_project)
    GCP-->>BQClient: Job Result
    BQClient-->>Conductor: DatabaseJob
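
To make the selection order in step 3 concrete, here is a minimal sketch under assumed names; the real logic lives in determine_client_for_script in lea/databases.py:

```python
# Sketch of the routing order shown in the diagram: Pick API first,
# then the per-script config map, then the default compute project.
def pick_compute_project(
    script_ref: str,
    script_specific: dict[str, str],
    default_project_id: str,
    pick_api=None,  # hypothetical callable wrapping Big Blue's Pick API
) -> str:
    if pick_api is not None:
        return pick_api(script_ref)
    return script_specific.get(script_ref, default_project_id)
```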

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

In lea/databases.py:

                sql_script.table_ref, self.compute_project_id
            )
        )
        return self.clients[project_id]
@cubic-dev-ai (bot) commented Jan 23, 2026
P2: determine_client_for_script can raise KeyError when the Pick API falls back to write_project_id or compute_project_id is unset, because those IDs are not guaranteed to be present in clients.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lea/databases.py, line 351:

<comment>determine_client_for_script can raise KeyError when the Pick API falls back to write_project_id or compute_project_id is unset, because those IDs are not guaranteed to be present in clients.</comment>

<file context>
@@ -348,6 +340,16 @@ def materialize_script(self, script: scripts.Script) -> BigQueryJob:
+                sql_script.table_ref, self.compute_project_id
+            )
+        )
+        return self.clients[project_id]
+
     def materialize_sql_script(self, sql_script: scripts.SQLScript) -> BigQueryJob:
</file context>
Suggested change:

-        return self.clients[project_id]
+        project_id = project_id or self.write_project_id
+        if project_id not in self.clients:
+            self.clients[project_id] = bigquery.Client(
+                project=project_id,
+                credentials=self.credentials,
+                location=self.location,
+                client_options={
+                    "scopes": [
+                        "https://www.googleapis.com/auth/cloud-platform",
+                        "https://www.googleapis.com/auth/drive",
+                        "https://www.googleapis.com/auth/spreadsheets.readonly",
+                        "https://www.googleapis.com/auth/userinfo.email",
+                    ]
+                },
+            )
+        return self.clients[project_id]


In lea/databases.py:

    @property
    def default_client(self) -> bigquery.Client:
        return self.clients[self.compute_project_id]
@cubic-dev-ai (bot) commented Jan 23, 2026

P2: default_client indexes clients with compute_project_id even when it is optional/None, which will raise KeyError for configs without LEA_BQ_COMPUTE_PROJECT_ID.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lea/databases.py, line 315:

<comment>default_client indexes clients with compute_project_id even when it is optional/None, which will raise KeyError for configs without LEA_BQ_COMPUTE_PROJECT_ID.</comment>

<file context>
@@ -322,17 +310,21 @@ def __init__(
 
+    @property
+    def default_client(self) -> bigquery.Client:
+        return self.clients[self.compute_project_id]
+
     def create_dataset(self, dataset_name: str):
</file context>
Suggested change:

-        return self.clients[self.compute_project_id]
+        project_id = self.compute_project_id or self.write_project_id
+        if project_id not in self.clients:
+            self.clients[project_id] = bigquery.Client(
+                project=project_id,
+                credentials=self.credentials,
+                location=self.location,
+                client_options={
+                    "scopes": [
+                        "https://www.googleapis.com/auth/cloud-platform",
+                        "https://www.googleapis.com/auth/drive",
+                        "https://www.googleapis.com/auth/spreadsheets.readonly",
+                        "https://www.googleapis.com/auth/userinfo.email",
+                    ]
+                },
+            )
+        return self.clients[project_id]

In lea/scripts.py:

            environment = jinja2.Environment(loader=loader)

            def load_yaml(path: str) -> dict:
                full_path = (scripts_dir / path).resolve()
@cubic-dev-ai (bot) commented Jan 23, 2026

P2: load_yaml resolves and opens arbitrary paths without verifying they stay under scripts_dir, so a template can read files outside the scripts directory (e.g., ../../.env). Add a guard to prevent path traversal.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lea/scripts.py, line 98:

<comment>load_yaml resolves and opens arbitrary paths without verifying they stay under scripts_dir, so a template can read files outside the scripts directory (e.g., ../../.env). Add a guard to prevent path traversal.</comment>

<file context>
@@ -92,8 +93,14 @@ def from_path(
             environment = jinja2.Environment(loader=loader)
+
+            def load_yaml(path: str) -> dict:
+                full_path = (scripts_dir / path).resolve()
+                with open(full_path) as f:
+                    return yaml.safe_load(f)
</file context>
Suggested change:

-                full_path = (scripts_dir / path).resolve()
+                full_path = (scripts_dir / path).resolve()
+                if not full_path.is_relative_to(scripts_dir.resolve()):
+                    raise ValueError(f"load_yaml path escapes scripts_dir: {path}")
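
For context, a minimal sketch of how the new helper can be used from a template, with the suggested guard applied. The file names, the params.yaml contents, and the way the helper is registered are assumptions, not the exact lea wiring:

```python
from pathlib import Path

import jinja2
import yaml

scripts_dir = Path("scripts")  # hypothetical scripts directory

def load_yaml(path: str) -> dict:
    # Resolve relative to scripts_dir and refuse paths that escape it.
    full_path = (scripts_dir / path).resolve()
    if not full_path.is_relative_to(scripts_dir.resolve()):
        raise ValueError(f"load_yaml path escapes scripts_dir: {path}")
    with open(full_path) as f:
        return yaml.safe_load(f)

environment = jinja2.Environment(loader=jinja2.FileSystemLoader(scripts_dir))
environment.globals["load_yaml"] = load_yaml  # assumption: exposed as a global

# Assumes scripts/params.yaml contains e.g. {"countries": ["FR", "DE"]}.
template = environment.from_string(
    "SELECT * FROM orders WHERE country IN ("
    "{% for c in load_yaml('params.yaml')['countries'] %}"
    "'{{ c }}'{{ ', ' if not loop.last }}"
    "{% endfor %})"
)
print(template.render())  # SELECT * FROM orders WHERE country IN ('FR', 'DE')
```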

In lea/conductor.py:

                ),
                script_specific_compute_project_ids=parse_bigquery_script_specific_compute_project_ids(
                    env_var=os.environ.get("LEA_BQ_SCRIPT_SPECIFIC_COMPUTE_PROJECT_IDS"),
                    dataset_name=self.dataset_name_with_username,
@cubic-dev-ai (bot) commented Jan 23, 2026

P2: Script-specific compute project mappings are always built with the username-suffixed dataset, so in production mode (where write_dataset is the base dataset) the lookup never matches and the per-script compute projects are ignored.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lea/conductor.py, line 283:

<comment>Script-specific compute project mappings are always built with the username-suffixed dataset, so in production mode (where `write_dataset` is the base dataset) the lookup never matches and the per-script compute projects are ignored.</comment>

<file context>
@@ -278,6 +278,11 @@ def make_client(self, dry_run: bool = False, print_mode: bool = False) -> Databa
                 ),
+                script_specific_compute_project_ids=parse_bigquery_script_specific_compute_project_ids(
+                    env_var=os.environ.get("LEA_BQ_SCRIPT_SPECIFIC_COMPUTE_PROJECT_IDS"),
+                    dataset_name=self.dataset_name_with_username,
+                    write_project_id=os.environ["LEA_BQ_PROJECT_ID"],
+                ),
</file context>

pyproject.toml (outdated):

 line-length = 100
 lint.select = ["E", "F", "I", "UP"] # https://beta.ruff.rs/docs/rules/
-target-version = 'py310'
+target-version = 'py313'
@cubic-dev-ai (bot) commented Jan 23, 2026

P2: Ruff is targeting Python 3.13 while the project declares support for Python 3.11. This can mask usage of 3.12/3.13-only syntax or APIs that will break on the minimum supported version. Align ruff’s target-version with the minimum runtime.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At pyproject.toml, line 47:

<comment>Ruff is targeting Python 3.13 while the project declares support for Python 3.11. This can mask usage of 3.12/3.13-only syntax or APIs that will break on the minimum supported version. Align ruff’s target-version with the minimum runtime.</comment>

<file context>
@@ -1,47 +1,50 @@
 line-length = 100
 lint.select = ["E", "F", "I", "UP"] # https://beta.ruff.rs/docs/rules/
-target-version = 'py310'
+target-version = 'py313'
 
 [tool.ruff.lint.isort]
</file context>
Suggested change:

-target-version = 'py313'
+target-version = 'py311'

@MaxHalford merged commit 0c37033 into main on Jan 23, 2026 (3 checks passed)
@MaxHalford deleted the script-specific-compute-project branch on January 23, 2026 at 23:45