@scholtzan scholtzan commented Dec 23, 2025

Description

Depends on mozilla/bigquery-etl#8648

Currently, bqetl query schema update '*' runs for all queries before table deploys, which takes 13-15 minutes to complete. We might not need to update schemas for queries that already have a schema.yaml file. If those schemas are out of date, the dryrun task will flag them instead. Skipping them saves about 7 minutes (when tested locally).

I ran the schema update with the skip option against the generated-sql branch and then ran a dryrun with schema validation; no errors showed up.
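
The selection rule described above can be sketched as follows. This is a minimal illustration, not the bqetl implementation: `queries_needing_schema_update` is a hypothetical helper, and the directory layout (one directory per query holding `query.sql` and, optionally, `schema.yaml`) is an assumption made for the example.

```python
from pathlib import Path
import tempfile


def queries_needing_schema_update(sql_root: Path, skip_existing: bool) -> list[Path]:
    """Return the query directories whose schema should be (re)generated.

    Hypothetical helper: with skip_existing=True, any directory that already
    contains a schema.yaml file is left alone.
    """
    query_dirs = sorted(p.parent for p in sql_root.rglob("query.sql"))
    if not skip_existing:
        return query_dirs
    return [d for d in query_dirs if not (d / "schema.yaml").exists()]


# Tiny demo tree: one query with a schema.yaml, one without.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, has_schema in [("table_a_v1", True), ("table_b_v1", False)]:
        query_dir = root / "dataset" / name
        query_dir.mkdir(parents=True)
        (query_dir / "query.sql").write_text("SELECT 1")
        if has_schema:
            (query_dir / "schema.yaml").write_text("fields: []")

    all_names = [d.name for d in queries_needing_schema_update(root, skip_existing=False)]
    skipped_names = [d.name for d in queries_needing_schema_update(root, skip_existing=True)]
```

With the skip option on, only `table_b_v1` is selected, which is where the time saving would come from when most queries already have a schema file.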

Related Tickets & Documents

@scholtzan scholtzan force-pushed the skip-updating-existing-schemas branch from 0fa6e23 to 7925222 Compare January 6, 2026 18:23
@scholtzan scholtzan marked this pull request as ready for review January 6, 2026 18:24
generate_sql_cmd_template
+ "script/bqetl query initialize '*' --skip-existing --project-id=moz-fx-data-shared-prod --project-id=moz-fx-data-experiments --project-id=moz-fx-data-marketing-prod --project-id=moz-fx-data-bq-people && "
"script/bqetl query schema update '*' --use-cloud-function=false --ignore-dryrun-skip --project-id=moz-fx-data-shared-prod --project-id=moz-fx-data-experiments --project-id=moz-fx-data-marketing-prod --project-id=moz-fx-glam-prod --project-id=moz-fx-data-bq-people && "
"script/bqetl query schema update '*' --skip-existing --use-cloud-function=false --ignore-dryrun-skip --project-id=moz-fx-data-shared-prod --project-id=moz-fx-data-experiments --project-id=moz-fx-data-marketing-prod --project-id=moz-fx-glam-prod --project-id=moz-fx-data-bq-people && "
issue (blocking): I believe this will cause the subsequent bqetl query schema deploy command to fail in cases where the schema.yaml file only contains a subset of the fields currently in the table, which I know is currently the case for some ETLs which are using the ALLOW_FIELD_ADDITION schema update option.

This isn't an absolute dealbreaker, but does mean a BigQuery ETL PR will be needed first to update such ETLs' schema.yaml files. And then whenever new fields get added to such ETLs (e.g. due to new fields being added in upstream data sources and getting passed through) that will cause table deployment failures, so PRs to re-update their schema.yaml files will need to be submitted and merged promptly.
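
To make the failure mode concrete, here is a small sketch of the subset problem. It is an illustration only: `missing_from_schema_file` is a hypothetical helper, the field names are made up, and the actual check performed by bqetl query schema deploy may differ.

```python
def missing_from_schema_file(
    schema_file_fields: list[str], table_fields: list[str]
) -> list[str]:
    """Return fields present in the deployed table but absent from schema.yaml."""
    declared = set(schema_file_fields)
    return [field for field in table_fields if field not in declared]


# Made-up example: the table gained `new_upstream_field` through
# ALLOW_FIELD_ADDITION, but schema.yaml was never regenerated.
schema_yaml_fields = ["submission_date", "client_id"]
deployed_table_fields = ["submission_date", "client_id", "new_upstream_field"]

missing = missing_from_schema_file(schema_yaml_fields, deployed_table_fields)
# A non-empty result is the situation described in the comment above:
# schema.yaml only covers a subset of the fields in the table.
```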

Most of these cases should be detected by bqetl_dryrun, which compares the schema.yaml files in the generated-sql branch against the query and table schema. However, some queries are skipped in dryruns, so those don't get caught and might cause the issues you described.

I created mozilla/bigquery-etl#8670 so that schemas are still updated for queries that have ALLOW_FIELD_ADDITION configured in their metadata, even when --skip-existing is used.
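
The intended rule can be summarized in a short sketch. This is not the code from mozilla/bigquery-etl#8670: `should_update_schema` is a hypothetical function, and how the schema update options are read from a query's metadata is deliberately left out and passed in as a plain set.

```python
def should_update_schema(
    has_schema_file: bool,
    skip_existing: bool,
    schema_update_options: set[str],
) -> bool:
    """Decide whether a query's schema should be regenerated (hypothetical rule)."""
    if not has_schema_file:
        # Nothing to skip: a schema has to be generated in any case.
        return True
    if not skip_existing:
        # Without the skip option, every query is updated.
        return True
    # With the skip option, only queries whose tables may gain fields
    # on their own still get their schema refreshed.
    return "ALLOW_FIELD_ADDITION" in schema_update_options


plain_query = should_update_schema(True, True, set())
field_addition_query = should_update_schema(True, True, {"ALLOW_FIELD_ADDITION"})
```

Here `plain_query` is `False` (the existing schema.yaml is trusted) and `field_addition_query` is `True` (the schema is refreshed despite the skip option).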

Looks good. 👍

@scholtzan scholtzan merged commit 96bb249 into main Jan 6, 2026
6 checks passed
@scholtzan scholtzan deleted the skip-updating-existing-schemas branch January 6, 2026 21:50