BigQuery
BigQuery is Google Cloud's fully managed, serverless, petabyte-scale, SQL data warehouse that lets you run analytics over vast amounts of data.
https://cloud.google.com/bigquery/docs/quickstarts
https://cloud.google.com/bigquery/docs/quickstarts/quickstart-command-line
https://cloud.google.com/bigquery/docs/sandbox
https://cloud.google.com/bigquery/docs/omni-aws-introduction
https://cloud.google.com/bigquery/docs/omni-azure-introduction
https://cloud.google.com/bigquery/docs/external-data-cloud-storage
https://codelabs.developers.google.com/codelabs/bigquery-cli
https://github.com/googleapis/python-bigquery
BigQuery has a distributed architecture running on thousands of nodes across Google's data centers. Your datasets are not stored on a single server; they are chunked and replicated across many machines.
The storage and compute layers are fully decoupled in BigQuery. This means that the query engine runs on different servers from the servers where the data is stored. This feature enables BigQuery to provide great scalability both in terms of data volume and query execution. This decoupled paradigm is only possible thanks to Google's Petabit network, which moves data very quickly from one server to another, leveraging Google's proprietary fiber cables across the globe.
Unlike traditional data warehouses, BigQuery stores data in columnar format in Colossus, Google's distributed file system (the successor to Google File System).
Fully decoupled from storage, the compute layer is responsible for receiving query statements from BigQuery users and executing them in the fastest way. The query engine is based on Dremel.
BigQuery is an append-only database: when rows are updated, new rows are appended to the table rather than being modified in place.
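A common consequence of this append-only layout is that a table can hold several versions of the same logical row. One well-known pattern reads only the latest version per key using a window function with `QUALIFY` (real GoogleSQL syntax); the table and column names below are hypothetical:

```python
# Hypothetical table `my_project.my_dataset.events` keyed by `id`, with an
# `updated_at` timestamp column. QUALIFY filters on the window function result.
LATEST_ROWS_SQL = """
SELECT *
FROM `my_project.my_dataset.events`
QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) = 1
"""
```

The query keeps exactly one row per `id`: the one with the most recent `updated_at`.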
BigQuery supports Standard SQL.
Each node provides a number of processing units called BigQuery slots to execute the business logic of the query. A BigQuery slot can be considered a virtual CPU on a Dremel node. The calculation of the slots needed to perform a specific query is automatically managed depending on the complexity of the query and impacted data volumes.
https://medium.com/google-cloud/a-seniors-guide-to-kickstart-your-bigquery-journey-75566e131983
Each time BigQuery executes a query, it performs a full scan of the referenced columns. BigQuery doesn't use or support indexes. Because BigQuery performance and query costs are based on the amount of data scanned during a query, design your queries so that they reference only the columns that are relevant to the query.
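A toy model makes the billing implication concrete: in a columnar store, bytes scanned depend only on which columns a query references, not on `WHERE` filters or `LIMIT`. The column widths below are made up for illustration:

```python
# Toy model of BigQuery's columnar scan billing. Column widths are
# hypothetical; real tables report exact sizes via a dry run.
ROWS = 1_000_000
column_width_bytes = {"user_id": 8, "event_ts": 8, "payload": 500}

def bytes_scanned(columns):
    """Estimate bytes scanned for the referenced columns."""
    return sum(column_width_bytes[c] * ROWS for c in columns)

full = bytes_scanned(column_width_bytes)         # SELECT * reads every column
narrow = bytes_scanned(["user_id", "event_ts"])  # only the needed columns
```

Dropping the wide `payload` column from the select list cuts the scan from 516 MB to 16 MB in this sketch, which is why `SELECT *` is discouraged.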
https://cloud.google.com/files/BigQueryTechnicalWP.pdf
https://medium.com/google-cloud/bigquery-basics-internals-part-2-7769f59d01e4
https://cloud.google.com/bigquery/docs/querying-partitioned-tables
https://cloud.google.com/architecture/bigquery-data-warehouse#query_optimization
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types
https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax
The Query Plan explanation breaks down the stages that the query went through, the number of input/output rows handled at each stage, and the timing profile within each stage. Using the results from the explanation can help you understand and optimize your queries.
There's no infrastructure to manage in BigQuery. Developers focus on finding insights using standard SQL, with on-demand or flat-rate pricing options.
BigQuery is designed to ingest and store large amounts of data and make that data accessible for large-scale analytics. BigQuery stores each column in separate compressed files, a layout that works well with distributed file systems like Colossus.
https://cloud.google.com/blog/topics/developers-practitioners/bigquery-admin-reference-guide-storage
BigQuery uses dynamic query planning and repartitions data across shards via in-memory shuffle for optimal distributed performance.
Clustered tables complement partitioning and can further improve query performance.
Learn how to load CSV data in batch and analyze in BigQuery.
BigQuery is fully managed and lets you search through terabytes of data in seconds.
You can upload data files from local sources, Google Drive, or Cloud Storage buckets, take advantage of BigQuery Data Transfer Service (DTS), Cloud Data Fusion plug-ins, or leverage Google's industry-leading data integration partnerships.
https://cloud.google.com/blog/topics/data-warehousing/announcing-bigquery-migration-service
Google BigQuery provides benchmark data in the Wiki100B table. This table contains 100 billion rows and is about 7 TB in size.
https://cloud.google.com/blog/products/gcp/anatomy-of-a-bigquery-query
BigQuery is a fast, petabyte-scale analytics database. To achieve that level of performance, BigQuery executes queries completely in memory by leveraging Google's petabit-scale networking technologies, such as Andromeda and Jupiter.
Shuffle is required for the execution of large and complex joins, aggregations, and analytic operations.
In-memory BigQuery shuffle stores intermediate data produced from various stages of query processing in a set of nodes that are dedicated to hosting remote memory.
When BigQuery executes a query job, it converts the declarative SQL statement into a graph of execution, broken up into a series of query stages, which themselves are composed of more granular sets of execution steps.
When evaluating your input data, consider the required I/O. How many bytes does your query read? Are you properly limiting the amount of input data? Is your data in native BigQuery storage or an external data source? The amount of data read by a query and the source of the data impact query performance and cost.
BigQuery Spanner federation enables BigQuery to query data residing in Spanner in real-time, without copying or moving data.
https://cloud.google.com/bigquery/docs/cloud-spanner-federated-queries
When evaluating your communication between slots consider the amount of shuffling that is required by your query. How many bytes are passed between stages? How many bytes are passed to each slot? The amount of data that is shuffled directly impacts communication throughput and query performance.
https://boonepeter.github.io/posts/unnecessary_bigquery_optimization
BigQuery lets you use time travel to access data stored in BigQuery that has been changed or deleted. You can access the data from any point within the last seven days. You can use time travel to query data that was updated or deleted, restore a table that was deleted, or restore a table that expired.
https://cloud.google.com/bigquery/docs/time-travel
https://medium.com/codex/more-options-to-restore-your-data-in-google-bigquery-181a32f7fa76
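Time travel is expressed with the `FOR SYSTEM_TIME AS OF` clause (real GoogleSQL syntax); the table name below is hypothetical, and the timestamp must fall within the time-travel window (up to seven days):

```python
# Query the table as it existed one hour ago. The table name is a
# placeholder; FOR SYSTEM_TIME AS OF is BigQuery's time-travel clause.
TIME_TRAVEL_SQL = """
SELECT *
FROM `my_project.my_dataset.orders`
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
"""
```

The same clause accepts an absolute `TIMESTAMP` literal, which is how you would pin a restore point before a bad write.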
When evaluating the computation that is required by a query, consider the amount of work that is required. How much CPU time is required? Are you using functions like JavaScript user-defined functions that require additional CPU resources?
When evaluating your output data, consider the number of bytes written by your query. How many bytes are written for your result set? Are you properly limiting the amount of data written? Are you repeatedly writing the same data? The amount of data written by a query impacts query performance (I/O). If you are writing results to a permanent (destination) table, the amount of data written also has a cost.
BI Engine is an in-memory analysis service that helps customers get low latency performance for their queries across all BI tools that connect to BigQuery.
https://cloud.google.com/bigquery/docs/bi-engine-intro
https://dzone.com/articles/cloud-data-warehouse-comparison-redshift-vs-bigque
https://medium.com/99dotco/a-migration-misstep-from-redshift-to-bigquery-13e9000c3f50
Avoid query anti-patterns that impact performance in BigQuery.
https://towardsdatascience.com/how-to-use-partitions-and-clusters-in-bigquery-using-sql-ccf84c89dd65
A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance, and you can control costs by reducing the number of bytes read by a query.
You can partition BigQuery tables by:
- Time-unit column: Tables are partitioned based on a TIMESTAMP, DATE, or DATETIME column in the table.
- Ingestion time: Tables are partitioned based on the timestamp when BigQuery ingests the data.
- Integer range: Tables are partitioned based on an integer column.
If a query filters on the value of the partitioning column, BigQuery can scan the partitions that match the filter and skip the remaining partitions. This process is called pruning.
https://cloud.google.com/bigquery/docs/managing-partitioned-tables
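A minimal sketch, carried as SQL strings: `PARTITION BY DATE(...)` is real GoogleSQL DDL for a time-unit partitioned table, while the project, dataset, and column names are hypothetical. Filtering on the partitioning column is what enables pruning:

```python
# DDL for a table partitioned on the DATE of a TIMESTAMP column.
CREATE_PARTITIONED = """
CREATE TABLE `my_project.my_dataset.events`
(
  user_id INT64,
  event_ts TIMESTAMP,
  payload STRING
)
PARTITION BY DATE(event_ts)
"""

# Because the WHERE clause constrains the partitioning column, BigQuery
# can scan one partition and skip the rest.
PRUNED_QUERY = """
SELECT user_id
FROM `my_project.my_dataset.events`
WHERE DATE(event_ts) = '2024-01-15'
"""
```

Wrapping the partition column in other functions (or comparing it to a non-constant expression) can defeat pruning, so keep the filter directly on the partitioning column.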
A dataset is contained within a specific project. Datasets are top-level containers that are used to organize and control access to your tables and views. A table or view must belong to a dataset, so you need to create at least one dataset before loading data into BigQuery.
BigQuery datasets are subject to the following limitations:
- You can set the geographic location at creation time only. After a dataset has been created, the location becomes immutable and can't be changed by using the Cloud Console, using the bq command-line tool, or calling the patch or update API methods. All tables that are referenced in a query must be stored in datasets in the same location.
- When you copy a table, the datasets that contain the source table and destination table must reside in the same location.
- Dataset names must be unique for each project.
https://cloud.google.com/bigquery/docs/datasets-intro
After you create a dataset, you can update the following dataset properties:
- Description
- Default expiration time for new tables
- Default partition expiration for new partitioned tables
- Access controls
- Labels
https://cloud.google.com/bigquery/docs/updating-datasets
https://medium.com/cstech/google-bigquery-data-update-optimization-9d788bfe811b
https://cloud.google.com/bigquery/docs/updating-datasets#table-expiration
A BigQuery table contains individual records organized in rows. Each record is composed of columns (also called fields).
Every table is defined by a schema that describes the column names, data types, and other information. You can specify the schema of a table when it is created, or you can create a table without a schema and declare the schema in the query job or load job that first populates it with data.
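A schema can be written down in the JSON format that the bq CLI accepts (for example via `bq load --schema=schema.json ...`). The field names below are hypothetical; `name`, `type`, and `mode` are the documented keys:

```python
import json

# A table schema in bq's JSON schema format. REPEATED mode declares an
# array column; REQUIRED declares NOT NULL.
schema = [
    {"name": "user_id", "type": "INT64", "mode": "REQUIRED"},
    {"name": "event_ts", "type": "TIMESTAMP", "mode": "NULLABLE"},
    {"name": "tags", "type": "STRING", "mode": "REPEATED"},
]
schema_json = json.dumps(schema, indent=2)
```

Writing `schema_json` to a file gives you a schema definition you can version-control alongside your load jobs.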
BigQuery supports the following table types:
- Native tables: tables backed by native BigQuery storage.
- External tables: tables backed by storage external to BigQuery. For more information, see Querying External Data Sources.
- Views: Virtual tables defined by a SQL query. For more information, see Creating views.
https://cloud.google.com/bigquery/docs/tables-intro
https://cloud.google.com/bigquery/docs/schemas
When you create a clustered table in BigQuery, the table data is automatically organized based on the contents of one or more columns in the table’s schema. The columns you specify are used to colocate related data. When you cluster a table using multiple columns, the order of columns you specify is important. The order of the specified columns determines the sort order of the data.
Clustering can improve the performance of certain types of queries such as queries that use filter clauses and queries that aggregate data. When data is written to a clustered table by a query job or a load job, BigQuery sorts the data using the values in the clustering columns. These values are used to organize the data into multiple blocks in BigQuery storage. When you submit a query that contains a clause that filters data based on the clustering columns, BigQuery uses the sorted blocks to eliminate scans of unnecessary data. You might not see a significant difference in query performance between a clustered and unclustered table if the table or partition is under 1 GB.
https://cloud.google.com/bigquery/docs/clustered-tables
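`CLUSTER BY` is real GoogleSQL DDL (the names below are hypothetical), and a toy model shows the block-elimination idea: data sorted on the clustering key is stored in blocks, each tracked by a [min, max] range, and a filter only scans blocks whose range can contain the value:

```python
# DDL sketch: partition by day, cluster within each partition.
CREATE_CLUSTERED = """
CREATE TABLE `my_project.my_dataset.events`
PARTITION BY DATE(event_ts)
CLUSTER BY customer_id, country
AS SELECT * FROM `my_project.my_dataset.staging_events`
"""

# Toy block elimination: each block stores its (min, max) key range.
blocks = [("a", "f"), ("g", "m"), ("n", "s"), ("t", "z")]

def blocks_to_scan(value):
    """Return only the blocks whose key range can contain `value`."""
    return [b for b in blocks if b[0] <= value <= b[1]]
```

A point filter on `"h"` touches one of four blocks in this sketch; that skipped I/O is where clustering's savings come from, which is also why the savings shrink for tables under ~1 GB.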
Both partitioning and clustering can improve performance and reduce query costs. Use clustering when you don't need strict cost guarantees before running the query. Use partitioning when you want to know query costs before a query runs. You may prefer clustering over partitioning when partitioning would result in a small amount of data per partition (approximately less than 1 GB).
https://cloud.google.com/bigquery/docs/partitioned-tables#partitioning_versus_clustering
A view is a virtual table defined by a SQL query. When you create a view, you query it in the same way you query a table. When a user queries the view, the query results contain data only from the tables and fields specified in the query that defines the view.
https://cloud.google.com/bigquery/docs/views-intro
In BigQuery, materialized views are precomputed views that periodically cache the results of a query for increased performance and efficiency. BigQuery leverages precomputed results from materialized views and whenever possible reads only delta changes from the base table to compute up-to-date results. Materialized views can be queried directly or can be used by the BigQuery optimizer to process queries to the base tables.
Queries that use materialized views are generally faster and consume fewer resources than queries that retrieve the same data only from the base table. Materialized views can significantly improve the performance of workloads that have the characteristic of common and repeated queries.
https://cloud.google.com/bigquery/docs/materialized-views-intro
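`CREATE MATERIALIZED VIEW` is real GoogleSQL DDL; the names below are hypothetical, and note that materialized views support a restricted query shape (such as aggregations over a single base table):

```python
# A materialized view that pre-aggregates daily event counts. BigQuery
# maintains it incrementally from deltas on the base table.
CREATE_MV = """
CREATE MATERIALIZED VIEW `my_project.my_dataset.daily_totals` AS
SELECT DATE(event_ts) AS day, COUNT(*) AS events
FROM `my_project.my_dataset.events`
GROUP BY day
"""
```

Queries against the base table that match this aggregation can be rewritten by the optimizer to read the materialized view instead, without the query author changing anything.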
https://cloud.google.com/blog/products/data-analytics/extending-bigquery-functions
https://cloud.google.com/bigquery/docs/reference/standard-sql/remote-functions
https://towardsdatascience.com/remote-functions-in-bigquery-af9921498438
BigQuery Reservations enables you to switch between on-demand pricing and flat-rate pricing. With flat-rate pricing, you purchase dedicated query processing capacity. You can allocate this capacity across your organization, by reserving pools of capacity for different projects or different parts of your organization. You can also combine the two billing models, taking advantage of both on-demand and flat-rate pricing.
https://cloud.google.com/bigquery/docs/reservations-intro
https://cloud.google.com/bigquery/docs/scheduling-queries
https://jimbeepbeep.medium.com/getting-started-with-bigquery-scripting-45bdd968010c
https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting
https://cloud.google.com/blog/products/data-analytics/bigquery-audit-logs-pipelines-analysis
https://cloud.google.com/bigquery/docs/controlling-costs
https://fares-daoud.medium.com/how-i-have-optimized-bigquery-costs-for-my-company-948df95b9f0d
Resource Hierarchy.
https://cloud.google.com/bigquery/docs/resource-hierarchy
Dataform is a platform to manage data in BigQuery, Snowflake, Redshift, and other data warehouses.
https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions
https://towardsdatascience.com/bigquery-udfs-complete-guide-181cbdaea55b
https://medium.com/codex/using-collation-in-google-bigquery-e63d34ee4799
https://tufin.medium.com/testable-bigquery-sql-61a911e35ab5
https://cloud.google.com/bigquery/docs/reference/standard-sql/transactions
https://cloud.google.com/bigquery/docs/sessions-write-queries
https://dev.to/stack-labs/bigquery-transactions-over-multiple-queries-with-sessions-2ll5
Data Fusion is built on the open source project CDAP. It is a GUI-based data integration service for building and managing data pipelines.
https://cloud.google.com/data-fusion/
https://codelabs.developers.google.com/codelabs/batch-csv-cdf-bq
Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. BigQuery is based on Dremel.
https://www.wired.com/2012/08/googles-dremel-makes-big-data-look-small/
Dremel has been in production at Google since 2006. A selection of use cases for Dremel at Google includes analysis of:
- Crawled web documents
- Spam
- Build system results
- Crash reports
Further, there are two ways to use Dremel outside of Google. The first is Google's BigQuery service, which Google provides as part of its cloud offering. The second is Apache Drill, effectively an open source re-implementation of Dremel. Cloudera Impala is also influenced by Dremel, as are Presto and Dremio. All of these address SQL-over-Hadoop issues.
http://www.goldsborough.me/distributed-systems/2019/05/18/21-09-00-a_look_at_dremel/
INFORMATION_SCHEMA is a series of views that provide access to metadata about datasets, routines, tables, views, jobs, reservations, and streaming data.
https://cloud.google.com/bigquery/docs/information-schema-intro
You can query the INFORMATION_SCHEMA.JOBS_TIMELINE_BY_* views to retrieve real-time BigQuery metadata by timeslice. These views contain currently running and completed jobs. Data is retained for 180 days.
https://cloud.google.com/bigquery/docs/information-schema-jobs-timeline
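A sketch of a slot-usage query over the jobs timeline, carried as a SQL string. The view name follows the documented pattern `` `region-<REGION>`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT ``; the `region-us` qualifier is an assumption for illustration:

```python
# Per-second slot consumption for jobs in the last day. period_start,
# period_slot_ms, job_id, and state are documented columns of the view.
JOBS_TIMELINE_SQL = """
SELECT period_start, job_id, state, SUM(period_slot_ms) AS slot_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
WHERE period_start >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY period_start, job_id, state
ORDER BY period_start DESC
"""
```

Summing `period_slot_ms` per timeslice is a quick way to see whether queries are contending for slots during peak hours.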
https://www.youtube.com/watch?v=1gYUGv_omJA
https://www.youtube.com/watch?v=STo98QUKDS8
https://levelup.gitconnected.com/enhancing-bigquery-search-features-with-search-index-771c1eec186e
Compare and review various data warehousing solutions: BigQuery, Snowflake, and Redshift.
A star schema is a data warehouse model in which a central fact table is joined to a number of dimension tables; it is named for its star-like structure. The star schema is the simplest type of data warehouse schema. It is also known as a star join schema and is optimized for querying large data sets.
A snowflake schema is a logical arrangement of tables in a multidimensional database such that the ER diagram resembles a snowflake. It is an extension of a star schema in which the dimension tables are normalized, splitting data into additional tables.
A galaxy schema contains two or more fact tables that share dimension tables between them. It is also called a fact constellation schema; it is viewed as a collection of stars, hence the name.
https://www.guru99.com/star-snowflake-data-warehousing.html
Denormalization is a strategy used on a previously normalized database to increase performance. It improves the read performance of a database, at the expense of some write performance, by adding redundant copies of data or by grouping data.
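A toy illustration of the trade-off: joining a normalized customer table onto its orders once, at write time, produces a denormalized table whose reads need no join, at the cost of storing the customer name redundantly on every order row. The data below is invented:

```python
# Normalized source: customers keyed by id, orders referencing customer_id.
customers = {1: "Acme", 2: "Globex"}
orders = [(101, 1, 50.0), (102, 1, 75.0), (103, 2, 20.0)]

# Denormalized target: the customer name is copied onto every order row,
# so read queries never need to join back to `customers`.
denormalized = [
    {"order_id": oid, "customer_id": cid,
     "customer_name": customers[cid], "amount": amt}
    for oid, cid, amt in orders
]
```

In BigQuery this pattern is common because storage is cheap relative to compute, and nested/repeated columns (STRUCT and ARRAY) let you denormalize without fully flattening.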
https://cloud.google.com/bigquery/docs/reference/standard-sql/transactions
https://medium.com/@wojcikpawel/exactly-once-delivery-in-bigquerys-storage-write-api-67885c5c5e16
https://cloud.google.com/bigquery/public-data
https://medium.com/codex/bigquery-now-supporting-query-queues-378a65fdc9c1
BigQuery supports two different SQL dialects: standard SQL and legacy SQL. Legacy SQL may be useful if you want to test queries coming from legacy applications.
https://cloud.google.com/blog/topics/developers-practitioners/bigquery-explained-querying-your-data
BigQuery was developed as an internal product at Google and was initially built to process log records. The Dremel query engine supported a limited set of SQL operations that are now defined as legacy SQL.
https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql
https://cloud.google.com/blog/topics/developers-practitioners/shine-user-friendly-sql-bigquery
https://cloud.google.com/bigquery/docs/query-overview
https://cloud.google.com/bigquery/docs/analytics-hub-introduction
BigQuery BI Engine is a fast, in-memory analysis service that allows you to analyze data stored in BigQuery.
https://cloud.google.com/bigquery/docs/bi-engine-intro
https://cloud.google.com/blog/products/data-analytics/bigquery-bi-engine-generally-available
https://cloud.google.com/bigquery/docs/bi-engine-data-studio
https://cloud.google.com/bigquery/docs/analyze-data-looker
https://cloud.google.com/bigquery/docs/analyze-data-tableau
https://cloud.google.com/bigquery/docs/connected-sheets
https://cloud.google.com/bigquery/docs/bigquery-connector-for-excel
Data governance is a principled approach to manage data during its lifecycle — from acquisition, to use, to disposal. Your data governance program clearly outlines policies, procedures, responsibilities, and controls surrounding data activities. This program helps to ensure that information is collected, maintained, used, and disseminated in such a way that both meets your organization's data integrity and security needs, and also helps empower your employees to discover and use the data to its fullest potential.
https://cloud.google.com/bigquery/docs/data-governance
https://medium.com/@VishalBulbule/access-control-in-bigquery-d5d800654f47
https://cloud.google.com/bigquery/docs/access-control-examples
https://cloud.google.com/bigquery/docs/encryption-at-rest
https://cloud.google.com/bigquery/docs/column-level-security
https://medium.com/plumbersofdatascience/restrict-access-to-columns-on-bigquery-1550895b3356
https://medium.com/codex/google-improves-data-security-in-bigquery-195a90cc5b85
https://medium.com/plumbersofdatascience/dynamic-data-masking-on-bigquery-ae3d004b496c
With Connected Sheets, you can access, analyze, visualize, and share billions of rows of BigQuery data from your Sheets spreadsheet.
You can also do the following:
- Collaborate with partners, analysts, or other stakeholders in a familiar spreadsheet interface.
- Ensure a single source of truth for data analysis without additional spreadsheet exports.
- Streamline your reporting and dashboard workflows.
https://cloud.google.com/bigquery/docs/connected-sheets
BigQuery is a serverless data analytics platform. You don't need to provision individual instances or virtual machines to use BigQuery. Instead, BigQuery automatically allocates computing resources as you need them. You can also reserve compute capacity ahead of time in the form of slots, which represent virtual CPUs. The pricing structure of BigQuery reflects this design.
BigQuery pricing has two main components:
- Analysis pricing is the cost to process queries, including SQL queries, user-defined functions, scripts, and certain data manipulation language (DML) and data definition language (DDL) statements that scan tables.
- Storage pricing is the cost to store data that you load into BigQuery.
https://cloud.google.com/bigquery/pricing
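A back-of-envelope on-demand estimate can be sketched in a few lines. The $5-per-TiB rate below is an assumption for illustration only; actual on-demand rates vary by region and change over time, so check the pricing page:

```python
# Hypothetical rate: on-demand analysis at $5 per TiB scanned.
PRICE_PER_TIB = 5.00
TIB = 1024 ** 4  # bytes per tebibyte

def estimated_cost(bytes_scanned):
    """Rough on-demand cost for a query, under the assumed rate."""
    return bytes_scanned / TIB * PRICE_PER_TIB

# A query scanning 500 GiB:
cost = estimated_cost(500 * 1024 ** 3)
```

Pairing this arithmetic with a dry run (which reports the bytes a query would scan without executing it) gives a cost estimate before any money is spent.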
https://cloud.google.com/blog/products/data-analytics/introducing-bigquery-flex-slots
https://cloud.google.com/bigquery/pricing#flat_rate_pricing
https://cloud.google.com/blog/topics/developers-practitioners/controlling-your-bigquery-costs
https://medium.com/google-cloud/bigquery-tell-me-your-region-i-will-tell-you-your-speed-41dcf42b8cc
https://cloud.google.com/bigquery/docs/sandbox
Looker is an enterprise platform for business intelligence, data applications, and embedded analytics. Looker helps you explore, share, and visualize your company's data so that you can make better business decisions.
https://cloud.google.com/bigquery/docs/looker
BigQuery GIS uniquely combines the serverless architecture of BigQuery with native support for geospatial analysis, so you can augment your analytics workflows with location intelligence.
https://mentin.medium.com/bigquery-geospatial-query-tricks-8ebb4453ab5e
BigQuery Omni is a flexible, fully managed, multi-cloud analytics solution that allows you to analyze data across clouds such as AWS and Azure.
https://medium.com/google-cloud/bigquery-omni-is-everywhere-afa2b5f64688
BigQuery ML enables users to create and execute machine learning models in BigQuery by using SQL queries.
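Training in BigQuery ML is expressed with `CREATE MODEL` (real syntax); the dataset, columns, and model choice below are hypothetical:

```python
# Train a linear regression predicting `fare` from trip features.
# model_type and input_label_cols are documented CREATE MODEL options.
CREATE_MODEL_SQL = """
CREATE OR REPLACE MODEL `my_project.my_dataset.fare_model`
OPTIONS (model_type = 'linear_reg', input_label_cols = ['fare']) AS
SELECT trip_miles, trip_minutes, fare
FROM `my_project.my_dataset.taxi_trips`
"""
```

Once trained, the model is queried with `ML.PREDICT` in ordinary SQL, so the whole workflow stays inside BigQuery.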
https://medium.com/paypal-tech/comparing-bigquery-processing-and-spark-dataproc-4c90c10e31ac
https://betterprogramming.pub/4-ways-big-query-metadata-can-help-you-2cdf3b899fbc
https://medium.com/@erkan.ekser/how-to-keep-metadata-of-all-tables-in-bigquery-125516742bad
https://cloud.google.com/bigquery/docs/pandas-gbq-migration
https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas
https://medium.com/codex/improved-storage-read-api-quotas-in-google-bigquery-f415a4c27bf1
https://cloud.google.com/bigquery/docs/visualize-jupyter
https://medium.com/codeshake/bigquery-101-how-to-tame-the-beast-part-3-212356720b18
https://cloud.google.com/bigquery/docs/tutorials
https://jimbeepbeep.medium.com/google-cloud-storage-gcs-to-bigquery-the-simple-way-4bb74216b8c8
https://blog.coupler.io/bigquery-tutorial/
https://cloud.google.com/dataproc/docs/tutorials/bigquery-connector-mapreduce-example
https://cloud.google.com/dataproc/docs/tutorials/bigquery-connector-spark-example
https://cloud.google.com/bigquery/docs/share-access-views
https://cloud.google.com/bigquery/docs/gis-tutorial-hurricane
https://medium.com/google-cloud/how-to-integrate-external-data-sources-with-bigquery-9e126d5751ea
https://blog.fourninecloud.com/how-to-sync-data-from-mysql-to-bigquery-9ef980ef602c
https://towardsdatascience.com/3-ways-to-query-bigquery-in-python-66838f45cb43
https://cloud.google.com/blog/products/data-analytics/genomics-data-analytics-with-cloud-pt2
https://soumendra-mishra.medium.com/bigquery-dynamic-sql-using-jinja-template-5c1332317960
https://medium.com/@suzane.gregatti/project-overview-in-bigquery-with-dynamic-sql-846350f0c343
https://medium.com/@urruchua.xabier/bike-share-chicago-case-study-72444a268dd1
https://towardsdatascience.com/know-more-about-your-app-users-through-bigquery-4c0b6d67abfa
https://github.com/ploomber/ploomber
https://medium.com/google-cloud/simplifying-data-quality-analysis-808e9fb8667f
https://medium.com/google-cloud/streaming-data-into-bigquery-using-google-cloud-run-469365a731b9
https://towardsdatascience.com/migrating-from-aws-glue-to-bigquery-for-etl-ac12980f2036
https://medium.com/@michalwesleymnach/an-introduction-to-dynamic-sql-in-bigquery-4c8bb8d6dde7
https://towardsdatascience.com/the-fastest-way-to-fetch-bigquery-tables-352e2e26c9e1
https://towardsdatascience.com/bigquery-fetching-multiprocessing-dcb79de50108
https://medium.com/teads-engineering/managing-a-bigquery-data-warehouse-at-scale-e6ec9a8406b2
https://medium.com/geekculture/mathematical-functions-you-should-know-in-bigquery-ee674109be6d
https://towardsdatascience.com/slack-alerts-from-a-sql-query-on-bigquery-f626b767304c
https://medium.com/sardineai/open-sourcing-protobuf-to-bigquery-converter-c9168046b36b
https://towardsdev.com/retrieve-your-bigquery-query-history-with-nodejs-sdk-6671dc5be503
https://medium.com/codex/collaborate-better-with-data-versioning-566c2299c435
https://conalldalydev.medium.com/why-i-built-the-python-bigquery-validator-package-3f2b32e9bc5b
https://blog.devgenius.io/cool-bigquery-features-using-standard-sql-syntax-e7a47ef9b72e
https://chriskyfung.github.io/blog/qwiklabs/Insights-from-Data-with-BigQuery-Challenge-Lab
https://briansuk.medium.com/connecting-steampipe-with-google-bigquery-ae37f258090f
https://blog.devgenius.io/evaluate-arithmetic-expressions-without-values-using-bigquery-a4abd99f0932
https://medium.com/google-cloud/google-analytics-data-transfer-to-bigquery-fad388ae646a
https://medium.com/google-cloud/connect-oracle-to-bigquery-using-db-link-6f2040336a47
https://datadice.medium.com/raw-google-analytics-4-ga4-data-in-bigquery-bq-9cb776ce1f3d
https://towardsdatascience.com/measuring-string-similarity-in-bigquery-using-sql-33c490638c89
https://medium.com/google-cloud/streaming-json-messages-into-bigquery-json-type-column-7b9702a49a36
https://www.qwiklabs.com/quests/69
Build and Optimize Data Warehouses with BigQuery
Insights from Data with BigQuery
Create ML Models with BigQuery ML
NCAA® March Madness®: Bracketology with Google Cloud