diff --git a/quickstarts/images/dagster/dagster-teradata-azure1.png b/quickstarts/images/dagster/dagster-teradata-azure1.png new file mode 100644 index 0000000000..65018d3fec Binary files /dev/null and b/quickstarts/images/dagster/dagster-teradata-azure1.png differ diff --git a/quickstarts/images/dagster/dagster-teradata-azure2.png b/quickstarts/images/dagster/dagster-teradata-azure2.png new file mode 100644 index 0000000000..7b3047158b Binary files /dev/null and b/quickstarts/images/dagster/dagster-teradata-azure2.png differ diff --git a/quickstarts/images/dagster/dagster-teradata-azure3.png b/quickstarts/images/dagster/dagster-teradata-azure3.png new file mode 100644 index 0000000000..1c1f4ef347 Binary files /dev/null and b/quickstarts/images/dagster/dagster-teradata-azure3.png differ diff --git a/quickstarts/images/dagster/dagster-teradata-s31.png b/quickstarts/images/dagster/dagster-teradata-s31.png new file mode 100644 index 0000000000..8fec571f05 Binary files /dev/null and b/quickstarts/images/dagster/dagster-teradata-s31.png differ diff --git a/quickstarts/images/dagster/dagster-teradata-s32.png b/quickstarts/images/dagster/dagster-teradata-s32.png new file mode 100644 index 0000000000..5366189e32 Binary files /dev/null and b/quickstarts/images/dagster/dagster-teradata-s32.png differ diff --git a/quickstarts/images/dagster/dagster-teradata-s33.png b/quickstarts/images/dagster/dagster-teradata-s33.png new file mode 100644 index 0000000000..b7a8dcdabf Binary files /dev/null and b/quickstarts/images/dagster/dagster-teradata-s33.png differ diff --git a/quickstarts/images/dagster/dagster-teradata1.png b/quickstarts/images/dagster/dagster-teradata1.png new file mode 100644 index 0000000000..67c6d27e19 Binary files /dev/null and b/quickstarts/images/dagster/dagster-teradata1.png differ diff --git a/quickstarts/images/dagster/dagster-teradata2.png b/quickstarts/images/dagster/dagster-teradata2.png new file mode 100644 index 0000000000..2160a0b633 Binary files 
/dev/null and b/quickstarts/images/dagster/dagster-teradata2.png differ diff --git a/quickstarts/manage-data/dagster-teradata-azure-to-teradata-transfer.md b/quickstarts/manage-data/dagster-teradata-azure-to-teradata-transfer.md new file mode 100644 index 0000000000..48201bbafd --- /dev/null +++ b/quickstarts/manage-data/dagster-teradata-azure-to-teradata-transfer.md @@ -0,0 +1,241 @@ +--- +sidebar_position: 4.7 +author: Mohan Talla +email: mohan.talla@teradata.com +page_last_update: February 5th, 2025 +description: Transferring CSV, JSON, and Parquet data from Azure Blob Storage to Teradata Vantage with dagster-teradata +keywords: [data warehouses, teradata, vantage, transfer, cloud data platform, object storage, business intelligence, enterprise analytics, dagster, dagster-teradata, microsoft azure blob storage] +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import ClearscapeDocsNote from '../_partials/vantage_clearscape_analytics.mdx' +import InstallTabs from '../_partials/tabsDBT.mdx' + +# Data Transfer from Azure Blob to Teradata Vantage Using dagster-teradata + +## Overview + +This document explains how to transfer data in CSV, JSON, and Parquet formats from Microsoft Azure Blob Storage to Teradata Vantage using **dagster-teradata**. It outlines the setup, configuration, and execution steps required to establish a seamless data transfer pipeline between these platforms. + +## Prerequisites + +* Access to a Teradata Vantage instance. + + + +* Python **3.9** or higher; Python **3.12** is recommended. +* pip + +## Setting Up a Virtual Environment + +A virtual environment is recommended to isolate project dependencies and avoid conflicts with system-wide Python packages. Here’s how to set it up: + + + +## Install dagster and dagster-teradata + +With your virtual environment active, the next step is to install dagster and the Teradata provider package (dagster-teradata) to interact with Teradata Vantage.
+ +1. Install the Required Packages: + + ```bash + pip install dagster dagster-webserver dagster-teradata[azure] + ``` + +2. Verify the Installation: + + To confirm that Dagster is correctly installed, run: + ```bash + dagster --version + ``` + If installed correctly, it should show the version of Dagster. + + +## Initialize a Dagster Project + +Now that you have the necessary packages installed, the next step is to create a new Dagster project. + +### Scaffold a New Dagster Project + +Run the following command: + +```bash +dagster project scaffold --name dagster-teradata-azure + ``` +This command will create a new project named dagster-teradata-azure. It will automatically generate the following directory structure: + +```bash +dagster-teradata-azure +│ pyproject.toml +│ README.md +│ setup.cfg +│ setup.py +│ +├───dagster_teradata_azure +│ assets.py +│ definitions.py +│ __init__.py +│ +└───dagster_teradata_azure_tests + test_assets.py + __init__.py + ``` + +Refer [here](https://docs.dagster.io/guides/build/projects/dagster-project-file-reference) to learn more about this directory structure. + +You need to modify the `definitions.py` file inside the `dagster-teradata-azure/dagster_teradata_azure` directory. + +### Step 1: Open `definitions.py` in the `dagster-teradata-azure/dagster_teradata_azure` Directory +Locate and open the file where Dagster job definitions are configured. +This file manages resources, jobs, and assets needed for the Dagster project.
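The transfer code in the next step reads Teradata connection details from environment variables. A minimal sketch of setting them in your shell before starting Dagster (every value below is a placeholder, not a real credential):

```shell
# Placeholder values -- replace with your own Vantage hostname and credentials.
export TERADATA_HOST="mysystem.clearscape.teradata.com"
export TERADATA_USER="demo_user"
export TERADATA_PASSWORD="demo_password"
export TERADATA_DATABASE="demo_user"
```

On Windows, use `set` (Command Prompt) or `$Env:` (PowerShell) instead of `export`.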
+ +### Step 2: Implement Azure to Teradata Transfer in Dagster + +``` python +import os + +from dagster import job, op, Definitions, EnvVar, DagsterError +from dagster_azure.adls2 import ADLS2Resource, ADLS2SASToken +from dagster_teradata import TeradataResource, teradata_resource + +azure_resource = ADLS2Resource( + storage_account="", + credential=ADLS2SASToken(token=""), +) + +td_resource = TeradataResource( + host=os.getenv("TERADATA_HOST"), + user=os.getenv("TERADATA_USER"), + password=os.getenv("TERADATA_PASSWORD"), + database=os.getenv("TERADATA_DATABASE"), +) + +@op(required_resource_keys={"teradata"}) +def drop_existing_table(context): + context.resources.teradata.drop_table("people") + return "Tables Dropped" + +@op(required_resource_keys={"teradata", "azure"}) +def ingest_azure_to_teradata(context, status): + if status == "Tables Dropped": + context.resources.teradata.azure_blob_to_teradata(azure_resource, "/az/akiaxox5jikeotfww4ul.blob.core.windows.net/td-usgs/CSVDATA/09380000/2018/06/", "people", True) + else: + raise DagsterError("Tables not dropped") + +@job(resource_defs={"teradata": td_resource, "azure": azure_resource}) +def example_job(): + ingest_azure_to_teradata(drop_existing_table()) + +defs = Definitions( + jobs=[example_job] +) +``` + +### Explanation of the Code + +1. **Resource Setup**: + - The code sets up two resources: one for **Azure Data Lake Storage** (ADLS2) and one for **Teradata**. + - **Azure Blob Storage**: + - For a **public bucket**, the `storage_account` and `credential` (SAS token) are left empty. + - For a **private bucket**, the `storage_account` (Azure Storage account name) and a valid SAS `credential` are required for access. + - **Teradata resource**: The `teradata_resource` is configured using credentials pulled from environment variables (`TERADATA_HOST`, `TERADATA_USER`, `TERADATA_PASSWORD`, `TERADATA_DATABASE`). + +2. 
**Operations**: + - **`drop_existing_table`**: This operation drops the "people" table in Teradata using the `teradata_resource`. + - **`ingest_azure_to_teradata`**: This operation checks whether the "people" table was successfully dropped. If so, it loads data from Azure Blob Storage into Teradata using the `azure_blob_to_teradata` method, which fetches data from the specified Azure Blob Storage path. + +3. **Job Execution**: + - The **`example_job`** runs the operations in sequence. First, it drops the table, and if successful, it transfers data from the Azure Blob Storage (either public or private) to Teradata. + +This setup allows for dynamic handling of both **public** and **private Azure Blob Storage** configurations while transferring data into Teradata. + +## Running the Pipeline + +After setting up the project, you can now run your Dagster pipeline: + +1. **Start the Dagster Dev Server:** In your terminal, navigate to the root directory of your project and run: + ```bash + dagster dev + ``` + After executing the command `dagster dev`, the Dagster logs will be displayed directly in the terminal. Any errors encountered during startup will also be logged here. Once you see a message similar to: + ```bash + 2025-02-04 09:15:46 +0530 - dagster-webserver - INFO - Serving dagster-webserver on http://127.0.0.1:3000 in process 32564, + ``` + it indicates that the Dagster web server is running successfully. At this point, you can proceed to the next step. + +2. **Access the Dagster UI:** Open a web browser and navigate to http://127.0.0.1:3000. This will open the Dagster UI where you can manage and monitor your pipelines. + +![dagster-teradata-azure1.png](../images/dagster/dagster-teradata-azure1.png) + +In the Dagster UI, you will see the following: + +- The job **`example_job`** is displayed, along with its ops.
+- In the middle, you can view the **lineage** of each `@op`, showing its dependencies and how each operation is related to others. + +![dagster-teradata-azure2.png](../images/dagster/dagster-teradata-azure2.png) + +Go to the **"Launchpad"** and provide the configuration for the **TeradataResource** as follows: + +```yaml +resources: + teradata: + config: + host: <host> + user: <user> + password: <password> + database: <database> +``` +Replace `<host>`, `<user>`, `<password>`, and `<database>` with the actual hostname and credentials of your Teradata Vantage instance. + +Once the configuration is done, click on **"Launch Run"** to start the process. + +![dagster-teradata-azure3.png](../images/dagster/dagster-teradata-azure3.png) + +The Dagster UI allows you to visualize the pipeline's progress, view logs, and inspect the status of each step. + +## Arguments Supported by `azure_blob_to_teradata` + +- **azure (ADLS2Resource)**: + The `ADLS2Resource` object used to interact with the Azure Blob Storage. + +- **blob_source_key (str)**: + The URI specifying the location of the Azure Blob object. The format is: + `/az/YOUR-STORAGE-ACCOUNT.blob.core.windows.net/YOUR-CONTAINER/YOUR-BLOB-LOCATION` + For more details, refer to the Teradata documentation: + [Teradata Documentation - Native Object Store](https://docs.teradata.com/search/documents?query=native+object+store&sort=last_update&virtual-field=title_only&content-lang=en-US) + +- **teradata_table (str)**: + The name of the Teradata table where the data will be loaded. + +- **public_bucket (bool, optional)**: + Indicates whether the Azure Blob container is public. If `True`, the objects in the container can be accessed without authentication. + Defaults to `False`. + +- **teradata_authorization_name (str, optional)**: + The name of the Teradata Authorization Database Object used to control access to the Azure Blob object store. This is required for secure access to private containers. + Defaults to an empty string.
+ For more details, refer to the documentation: + [Teradata Vantage Native Object Store - Setting Up Access](https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-VantageTM-Native-Object-Store-Getting-Started-Guide-17.20/Setting-Up-Access/Controlling-Foreign-Table-Access-with-an-AUTHORIZATION-Object) + +## Transfer Data from a Private Blob Storage Container to a Teradata Instance +To transfer data from a private Blob Storage container to a Teradata instance, the following prerequisites are necessary. + +* An Azure account. You can start with a [free account](https://azure.microsoft.com/free/). +* Create an [Azure storage account](https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=azure-portal) +* Create a [blob container](https://learn.microsoft.com/en-us/azure/storage/blobs/blob-containers-portal) under the Azure storage account +* [Upload](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-portal) CSV, JSON, or Parquet files to the blob container +* Create a Teradata Authorization object with the Azure Blob Storage Account and the Account Secret Key + + ``` sql + CREATE AUTHORIZATION azure_authorization USER 'azuretestquickstart' PASSWORD 'AZURE_BLOB_ACCOUNT_SECRET_KEY' + ``` + + :::note + Replace `AZURE_BLOB_ACCOUNT_SECRET_KEY` with the Azure storage account `azuretestquickstart` [access key](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?toc=%2Fazure%2Fstorage%2Fblobs%2Ftoc.json&bc=%2Fazure%2Fstorage%2Fblobs%2Fbreadcrumb%2Ftoc.json&tabs=azure-portal) + ::: + +## Summary +This guide showed how to use **dagster-teradata** to transfer CSV, JSON, and Parquet data from Microsoft Azure Blob Storage to Teradata Vantage, enabling streamlined data operations between these platforms.
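As a recap of the private-container flow, the sketch below shows how the pieces fit together. The `make_blob_source_key` helper is hypothetical (it is not part of dagster-teradata) and simply assembles the `/az/` URI format described above; the commented call assumes the `azure_authorization` object created in the SQL step:

```python
# Hypothetical helper (not part of dagster-teradata): builds the /az/ URI
# format that azure_blob_to_teradata expects for blob_source_key.
def make_blob_source_key(storage_account: str, container: str, prefix: str) -> str:
    return f"/az/{storage_account}.blob.core.windows.net/{container}/{prefix}"

blob_source_key = make_blob_source_key("azuretestquickstart", "td-usgs", "CSVDATA/")

# Inside an op with access to the Teradata resource, a private-container
# transfer would pass the AUTHORIZATION object instead of relying on a SAS token:
#
# context.resources.teradata.azure_blob_to_teradata(
#     azure_resource,
#     blob_source_key,
#     "people",
#     teradata_authorization_name="azure_authorization",
# )
```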
+ +## Further reading +* [Teradata Authorization](https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/SQL-Data-Definition-Language-Syntax-and-Examples/Authorization-Statements-for-External-Routines/CREATE-AUTHORIZATION-and-REPLACE-AUTHORIZATION) diff --git a/quickstarts/manage-data/dagster-teradata-s3-to-teradata-transfer.md b/quickstarts/manage-data/dagster-teradata-s3-to-teradata-transfer.md new file mode 100644 index 0000000000..db180fdda2 --- /dev/null +++ b/quickstarts/manage-data/dagster-teradata-s3-to-teradata-transfer.md @@ -0,0 +1,229 @@ +--- +sidebar_position: 4.6 +author: Mohan Talla +email: mohan.talla@teradata.com +page_last_update: February 5th, 2025 +description: Transferring CSV, JSON, and Parquet data from AWS S3 Storage to Teradata Vantage with dagster-teradata +keywords: [data warehouses, teradata, vantage, transfer, cloud data platform, object storage, business intelligence, enterprise analytics, dagster, dagster-teradata, aws s3 storage] +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import ClearscapeDocsNote from '../_partials/vantage_clearscape_analytics.mdx' +import InstallTabs from '../_partials/tabsDBT.mdx' + +# Data Transfer from AWS S3 to Teradata Vantage Using dagster-teradata + +## Overview + +This document explains how to transfer data in CSV, JSON, and Parquet formats from AWS S3 to Teradata Vantage using **dagster-teradata**. It outlines the setup, configuration, and execution steps required to establish a seamless data transfer pipeline between these platforms. + +## Prerequisites + +* Access to a Teradata Vantage instance. + + + +* Python **3.9** or higher; Python **3.12** is recommended. +* pip + +## Setting Up a Virtual Environment + +A virtual environment is recommended to isolate project dependencies and avoid conflicts with system-wide Python packages.
Here’s how to set it up: + + + +## Install dagster and dagster-teradata + +With your virtual environment active, the next step is to install dagster and the Teradata provider package (dagster-teradata) to interact with Teradata Vantage. + +1. Install the Required Packages: + + ```bash + pip install dagster dagster-webserver dagster-teradata[aws] + ``` + +2. Verify the Installation: + + To confirm that Dagster is correctly installed, run: + ```bash + dagster --version + ``` + If installed correctly, it should show the version of Dagster. + + +## Initialize a Dagster Project + +Now that you have the necessary packages installed, the next step is to create a new Dagster project. + +### Scaffold a New Dagster Project + +Run the following command: + +```bash +dagster project scaffold --name dagster-teradata-s3 + ``` +This command will create a new project named dagster-teradata-s3. It will automatically generate the following directory structure: + +```bash +dagster-teradata-s3 +│ pyproject.toml +│ README.md +│ setup.cfg +│ setup.py +│ +├───dagster_teradata_s3 +│ assets.py +│ definitions.py +│ __init__.py +│ +└───dagster_teradata_s3_tests + test_assets.py + __init__.py + ``` + +Refer [here](https://docs.dagster.io/guides/build/projects/dagster-project-file-reference) to learn more about this directory structure. + +You need to modify the `definitions.py` file inside the `dagster-teradata-s3/dagster_teradata_s3` directory. + +### Step 1: Open `definitions.py` in the `dagster-teradata-s3/dagster_teradata_s3` Directory +Locate and open the file where Dagster job definitions are configured. +This file manages resources, jobs, and assets needed for the Dagster project.
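The transfer code in the next step reads AWS and Teradata credentials from environment variables. A minimal sketch of setting them in your shell before starting Dagster (every value below is a placeholder, not a real credential):

```shell
# Placeholder values -- replace with your own AWS and Teradata details.
export AWS_ACCESS_KEY_ID="AKIAEXAMPLEKEY"
export AWS_SECRET_ACCESS_KEY="example-secret-key"
export AWS_SESSION_TOKEN="example-session-token"
export AWS_S3_LOCATION="/s3/your-bucket.s3.amazonaws.com/your-prefix"
export TERADATA_HOST="mysystem.clearscape.teradata.com"
export TERADATA_USER="demo_user"
export TERADATA_PASSWORD="demo_password"
export TERADATA_DATABASE="demo_user"
```

On Windows, use `set` (Command Prompt) or `$Env:` (PowerShell) instead of `export`.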
+ +### Step 2: Implement AWS S3 to Teradata Transfer in Dagster + +``` python +import os + +from dagster import job, op, Definitions, EnvVar, DagsterError +from dagster_aws.s3 import S3Resource, s3_resource +from dagster_teradata import TeradataResource, teradata_resource + +s3_resource = S3Resource( + aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"), + aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"), + aws_session_token=os.getenv("AWS_SESSION_TOKEN"), +) + +td_resource = TeradataResource( + host=os.getenv("TERADATA_HOST"), + user=os.getenv("TERADATA_USER"), + password=os.getenv("TERADATA_PASSWORD"), + database=os.getenv("TERADATA_DATABASE"), +) + +@op(required_resource_keys={"teradata"}) +def drop_existing_table(context): + context.resources.teradata.drop_table("people") + return "Tables Dropped" + +@op(required_resource_keys={"teradata", "s3"}) +def ingest_s3_to_teradata(context, status): + if status == "Tables Dropped": + context.resources.teradata.s3_to_teradata(s3_resource, os.getenv("AWS_S3_LOCATION"), "people") + else: + raise DagsterError("Tables not dropped") + +@job(resource_defs={"teradata": td_resource, "s3": s3_resource}) +def example_job(): + ingest_s3_to_teradata(drop_existing_table()) + +defs = Definitions( + jobs=[example_job] +) +``` + +### Explanation of the Code + +1. **Resource Configuration for S3 and Teradata**: + - The code configures resources for interacting with S3 and Teradata. + - The `S3Resource` is created using AWS credentials (access key, secret key, and session token) from environment variables. + - The `TeradataResource` is set up with connection details (host, user, password, database) for Teradata from environment variables. + +2. **Defining Operations**: + - `drop_existing_table`: This operation uses the Teradata resource to drop the "people" table in Teradata. + - `ingest_s3_to_teradata`: This operation checks if the "Tables Dropped" status was returned from the previous operation. 
If true, it ingests data from an S3 bucket to the Teradata table `people` using the S3 resource. If the table wasn't dropped, it raises an error. + +3. **Job Execution**: + - The `example_job` is defined to execute the two operations sequentially: first, drop the existing table, and then ingest data from S3 to Teradata. + - The job is registered under the `Definitions` object for execution within the Dagster environment. + + +## Running the Pipeline + +After setting up the project, you can now run your Dagster pipeline: + +1. **Start the Dagster Dev Server:** In your terminal, navigate to the root directory of your project and run: + ```bash + dagster dev + ``` + After executing the command `dagster dev`, the Dagster logs will be displayed directly in the terminal. Any errors encountered during startup will also be logged here. Once you see a message similar to: + ```bash + 2025-02-04 09:15:46 +0530 - dagster-webserver - INFO - Serving dagster-webserver on http://127.0.0.1:3000 in process 32564, + ``` + it indicates that the Dagster web server is running successfully. At this point, you can proceed to the next step. + +2. **Access the Dagster UI:** Open a web browser and navigate to http://127.0.0.1:3000. This will open the Dagster UI where you can manage and monitor your pipelines. + +![dagster-teradata-s31.png](../images/dagster/dagster-teradata-s31.png) + +In the Dagster UI, you will see the following: + +- The job **`example_job`** is displayed, along with its ops. +- In the middle, you can view the **lineage** of each `@op`, showing its dependencies and how each operation is related to others.
+ +![dagster-teradata-s32.png](../images/dagster/dagster-teradata-s32.png) + +Go to the **"Launchpad"** and provide the configuration for the **S3Resource** and **TeradataResource** as follows: + +```yaml +resources: + s3: + config: + aws_access_key_id: <aws_access_key_id> + aws_secret_access_key: <aws_secret_access_key> + aws_session_token: <aws_session_token> + max_attempts: 5 + use_ssl: true + use_unsigned_session: false + teradata: + config: + host: <host> + user: <user> + password: <password> + database: <database> +``` +Replace the `<aws_access_key_id>`, `<aws_secret_access_key>`, `<aws_session_token>`, `<host>`, `<user>`, `<password>`, and `<database>` placeholders with the actual values for your S3 and Teradata configuration. +Once the configuration is done, click on **"Launch Run"** to start the process. + +![dagster-teradata-s33.png](../images/dagster/dagster-teradata-s33.png) + +The Dagster UI allows you to visualize the pipeline's progress, view logs, and inspect the status of each step. + +## Arguments Supported by `s3_to_teradata` + +- **s3 (S3Resource)**: + The `S3Resource` object used to interact with the S3 bucket. + +- **s3_source_key (str)**: + The URI specifying the location of the S3 bucket. The URI format is: + `/s3/YOUR-BUCKET.s3.amazonaws.com/YOUR-BUCKET-NAME` + For more details, refer to: + [Teradata Documentation - Native Object Store](https://docs.teradata.com/search/documents?query=native+object+store&sort=last_update&virtual-field=title_only&content-lang=en-US) + +- **teradata_table (str)**: + The name of the Teradata table to which the data will be loaded. + +- **public_bucket (bool)**: + Indicates whether the provided S3 bucket is public. If `True`, the objects within the bucket can be accessed via a URL without authentication. If `False`, the bucket is considered private, and authentication must be provided. + Defaults to `False`. + +- **teradata_authorization_name (str)**: + The name of the Teradata Authorization Database Object, which controls access to the S3 object store.
+ For more details, refer to: + [Teradata Vantage Native Object Store - Setting Up Access](https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-VantageTM-Native-Object-Store-Getting-Started-Guide-17.20/Setting-Up-Access/Controlling-Foreign-Table-Access-with-an-AUTHORIZATION-Object) + +## Summary +This guide showed how to use **dagster-teradata** to transfer CSV, JSON, and Parquet data from AWS S3 to Teradata Vantage, enabling streamlined data operations between these platforms. + +## Further reading +* [Teradata Authorization](https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/SQL-Data-Definition-Language-Syntax-and-Examples/Authorization-Statements-for-External-Routines/CREATE-AUTHORIZATION-and-REPLACE-AUTHORIZATION) diff --git a/quickstarts/manage-data/use-dagster-with-teradata-vantage.md b/quickstarts/manage-data/use-dagster-with-teradata-vantage.md new file mode 100644 index 0000000000..8ecc230dab --- /dev/null +++ b/quickstarts/manage-data/use-dagster-with-teradata-vantage.md @@ -0,0 +1,268 @@ +--- +id: use-dagster-with-teradata-vantage +sidebar_position: 4.5 +author: Mohan Talla +email: mohan.talla@teradata.com +description: Use dagster-teradata with Teradata Vantage. +keywords: [dagster, dagster-teradata, data warehouses, compute storage separation, teradata, vantage, cloud data platform, object storage, business intelligence, enterprise analytics, elt] +--- + +import ClearscapeDocsNote from '../_partials/vantage_clearscape_analytics.mdx' +import InstallTabs from '../_partials/tabsDBT.mdx' + +# dagster-teradata with Teradata Vantage + +This guide walks you through integrating Dagster with Teradata Vantage to create and manage ETL pipelines. It provides step-by-step instructions for installing and configuring the necessary packages, setting up a Dagster project, and implementing a pipeline that interacts with Teradata Vantage.
+ +## Dagster + +* Dagster is a data orchestrator built for data engineers, with integrated lineage, observability, a declarative programming model, and best-in-class testability. +* Data pipelines are automated workflows that ingest raw data, process it through various transformations (such as cleaning and structuring), and produce a final, usable format—much like an assembly line for data. +* Dagster orchestrates this process by defining each stage of the pipeline, ensuring tasks execute in the correct sequence and at scheduled intervals. It provides a structured way to manage dependencies, track execution, and maintain reliable data workflows. +* Dagster orchestrates dbt alongside other technologies. Dagster's asset-oriented approach allows Dagster to understand dbt at the level of individual dbt models. + + +## Prerequisites + +* Access to a Teradata Vantage instance. + + + +* Python **3.9** or higher; Python **3.12** is recommended. +* pip + +## Setting Up a Virtual Environment + +A virtual environment is recommended to isolate project dependencies and avoid conflicts with system-wide Python packages. Here’s how to set it up: + + + +## Install dagster and dagster-teradata + +With your virtual environment active, the next step is to install dagster and the Teradata provider package (dagster-teradata) to interact with Teradata Vantage. + +1. Install the Required Packages: + + ```bash + pip install dagster dagster-webserver dagster-teradata + ``` + +2. Note about Optional Dependencies: + + a) `dagster-teradata` relies on `dagster-aws` for ingesting data from an S3 bucket into Teradata Vantage. Since `dagster-aws` is an optional dependency, users can install it by running: + + ```bash + pip install dagster-teradata[aws] + ``` + b) `dagster-teradata` also relies on `dagster-azure` for ingesting data from an Azure Blob Storage container into Teradata Vantage. To install this dependency, run: + + ```bash + pip install dagster-teradata[azure] + ``` + +3.
Verify the Installation: + + To confirm that Dagster is correctly installed, run: + ```bash + dagster --version + ``` + If installed correctly, it should show the version of Dagster. + + +## Initialize a Dagster Project + +Now that you have the necessary packages installed, the next step is to create a new Dagster project. + +### Scaffold a New Dagster Project + +Run the following command: + +```bash +dagster project scaffold --name dagster-quickstart + ``` +This command will create a new project named dagster-quickstart. It will automatically generate the following directory structure: + +```bash +dagster-quickstart +│ pyproject.toml +│ README.md +│ setup.cfg +│ setup.py +│ +├───dagster_quickstart +│ assets.py +│ definitions.py +│ __init__.py +│ +└───dagster_quickstart_tests + test_assets.py + __init__.py + ``` + +Refer [here](https://docs.dagster.io/guides/build/projects/dagster-project-file-reference) to learn more about this directory structure. + +## Create Sample Data + +To simulate an ETL pipeline, create a CSV file with sample data that your pipeline will process. + +**Create the CSV File:** Inside the dagster_quickstart/data/ directory, create a file named sample_data.csv with the following content: + +```bash +id,name,age,city +1,Alice,28,New York +2,Bob,35,San Francisco +3,Charlie,42,Chicago +4,Diana,31,Los Angeles + ``` +This file represents sample data that will be used as input for your ETL pipeline. + +## Define Assets for the ETL Pipeline + +Now, we’ll define a series of assets for the ETL pipeline inside the assets.py file.
+ +Edit the assets.py File: Open the dagster_quickstart/assets.py file and add the following code to define the pipeline: + +```python +import pandas as pd +from dagster import asset + +@asset(required_resource_keys={"teradata"}) +def read_csv_file(context): + df = pd.read_csv("dagster_quickstart/data/sample_data.csv") + context.log.info(df) + return df + +@asset(required_resource_keys={"teradata"}) +def drop_table(context): + result = context.resources.teradata.drop_table(["tmp_table"]) + context.log.info(result) + +@asset(required_resource_keys={"teradata"}) +def create_table(context, drop_table): + result = context.resources.teradata.execute_query('''CREATE TABLE tmp_table ( + id INTEGER, + name VARCHAR(50), + age INTEGER, + city VARCHAR(50));''') + context.log.info(result) + +@asset(required_resource_keys={"teradata"}, deps=[read_csv_file]) +def insert_rows(context, create_table, read_csv_file): + data_tuples = [tuple(row) for row in read_csv_file.to_numpy()] + for row in data_tuples: + result = context.resources.teradata.execute_query( + f"INSERT INTO tmp_table (id, name, age, city) VALUES ({row[0]}, '{row[1]}', {row[2]}, '{row[3]}');" + ) + context.log.info(result) + +@asset(required_resource_keys={"teradata"}) +def read_table(context, insert_rows): + result = context.resources.teradata.execute_query("select * from tmp_table;", True) + context.log.info(result) + +``` + +This Dagster pipeline defines a series of assets that interact with Teradata. It starts by reading data from a CSV file, then drops and recreates a table in Teradata. After that, it inserts rows from the CSV into the table and finally retrieves the data from the table. + +## Define the Pipeline Definitions + +The next step is to configure the pipeline by defining the necessary resources and jobs. 
+ +**Edit the definitions.py File**: Open dagster_quickstart/definitions.py and define your Dagster pipeline as follows: + +```python +from dagster import EnvVar, Definitions +from dagster_teradata import TeradataResource + +from .assets import read_csv_file, read_table, create_table, drop_table, insert_rows + +# Define the pipeline and resources +defs = Definitions( + assets=[read_csv_file, read_table, create_table, drop_table, insert_rows], + resources={ + "teradata": TeradataResource( + host=EnvVar("TERADATA_HOST"), + user=EnvVar("TERADATA_USER"), + password=EnvVar("TERADATA_PASSWORD"), + database=EnvVar("TERADATA_DATABASE"), + ) + } +) +``` + +This code sets up a Dagster project that interacts with Teradata by defining assets and resources: + +1. It imports the necessary modules from Dagster and dagster-teradata. +2. It imports the asset functions (read_csv_file, read_table, create_table, drop_table, insert_rows) from the assets.py module. +3. It registers these assets with Dagster using Definitions, allowing Dagster to track and execute them. +4. It defines a Teradata resource (TeradataResource) that reads database connection details from environment variables (TERADATA_HOST, TERADATA_USER, TERADATA_PASSWORD, TERADATA_DATABASE). + +## Running the Pipeline + +After setting up the project, you can now run your Dagster pipeline: + +1. **Start the Dagster Dev Server:** In your terminal, navigate to the root directory of your project and run: + ```bash + dagster dev + ``` + After executing the command `dagster dev`, the Dagster logs will be displayed directly in the terminal. Any errors encountered during startup will also be logged here. Once you see a message similar to: + ```bash + 2025-02-04 09:15:46 +0530 - dagster-webserver - INFO - Serving dagster-webserver on http://127.0.0.1:3000 in process 32564, + ``` + it indicates that the Dagster web server is running successfully. At this point, you can proceed to the next step.
+2. **Access the Dagster UI:** Open a web browser and navigate to http://127.0.0.1:3000. This will open the Dagster UI where you can manage and monitor your pipelines. +
+ ![dagster-teradata1.png](../images/dagster/dagster-teradata1.png) +
+3. **Run the Pipeline:** +* In the top navigation of the Dagster UI, click on Assets > View global asset lineage. +* Click Materialize to execute the pipeline. +* In the popup window, click View to see the details of the pipeline run. +
+ ![dagster-teradata2.png](../images/dagster/dagster-teradata2.png) +
+ +4. **Monitor the Run:** The Dagster UI allows you to visualize the pipeline's progress, view logs, and inspect the status of each step. You can switch between different views to see the execution logs and metadata for each asset. + +## Operations Provided by the TeradataResource + +Below are some of the operations provided by the `TeradataResource`: + +### 1. Execute a Query (`execute_query`) + +This operation executes a SQL query within Teradata Vantage. + +**Args:** +- `sql` (str) – The query to be executed. +- `fetch_results` (bool, optional) – If True, fetch the query results. Defaults to False. +- `single_result_row` (bool, optional) – If True, return only the first row of the result set. Effective only if `fetch_results` is True. Defaults to False. + +### 2. Execute Multiple Queries (`execute_queries`) + +This operation executes a series of SQL queries within Teradata Vantage. + +**Args:** +- `sql_queries` (Sequence[str]) – List of queries to be executed in series. +- `fetch_results` (bool, optional) – If True, fetch the query results. Defaults to False. +- `single_result_row` (bool, optional) – If True, return only the first row of the result set. Effective only if `fetch_results` is True. Defaults to False. + +### 3. Drop a Database (`drop_database`) + +This operation drops one or more databases from Teradata Vantage. + +**Args:** +- `databases` (Union[str, Sequence[str]]) – Database name or list of database names to drop. + +### 4. Drop a Table (`drop_table`) + +This operation drops one or more tables from Teradata Vantage. + +**Args:** +- `tables` (Union[str, Sequence[str]]) – Table name or list of table names to drop.
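These operations compose naturally inside ops and assets. The sketch below chains them in a plain function; a stand-in object is used here in place of a live `TeradataResource`, since the real calls require a Vantage connection:

```python
def rebuild_tmp_table(teradata, rows):
    """Drop, recreate, and reload tmp_table using the operations above.

    `teradata` is assumed to expose drop_table, execute_queries, and
    execute_query as described in this section.
    """
    teradata.drop_table("tmp_table")
    teradata.execute_queries([
        "CREATE TABLE tmp_table (id INTEGER, name VARCHAR(50));",
        *[f"INSERT INTO tmp_table VALUES ({i}, '{name}');" for i, name in rows],
    ])
    # fetch_results=True asks the resource to return the result set.
    return teradata.execute_query("SELECT * FROM tmp_table;", fetch_results=True)
```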
## Summary
This guide provides a step-by-step approach to integrating Dagster with Teradata Vantage for building ETL pipelines.

## Further reading
* https://docs.dagster.io/
* https://docs.dagster.io/getting-started/quickstart
* https://docs.dagster.io/getting-started/installation
* https://docs.dagster.io/etl-pipeline-tutorial/

diff --git a/quickstarts/vantagecloud-lake/images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata1.png b/quickstarts/vantagecloud-lake/images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata1.png new file mode 100644 index 0000000000..5ed04773d5 Binary files /dev/null and b/quickstarts/vantagecloud-lake/images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata1.png differ diff --git a/quickstarts/vantagecloud-lake/images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata2.png b/quickstarts/vantagecloud-lake/images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata2.png new file mode 100644 index 0000000000..b49b6946df Binary files /dev/null and b/quickstarts/vantagecloud-lake/images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata2.png differ diff --git a/quickstarts/vantagecloud-lake/images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata3.png b/quickstarts/vantagecloud-lake/images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata3.png new file mode 100644 index 0000000000..3b3501c469 Binary files /dev/null and b/quickstarts/vantagecloud-lake/images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata3.png differ diff --git a/quickstarts/vantagecloud-lake/images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata4.png b/quickstarts/vantagecloud-lake/images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata4.png new file mode 100644 index 0000000000..3293e6bfc4 Binary files /dev/null and b/quickstarts/vantagecloud-lake/images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata4.png
differ diff --git a/quickstarts/vantagecloud-lake/images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata5.png b/quickstarts/vantagecloud-lake/images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata5.png new file mode 100644 index 0000000000..cfbbee5a51 Binary files /dev/null and b/quickstarts/vantagecloud-lake/images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata5.png differ diff --git a/quickstarts/vantagecloud-lake/vantagecloud-lake-compute-cluster-dagster.md b/quickstarts/vantagecloud-lake/vantagecloud-lake-compute-cluster-dagster.md new file mode 100644 index 0000000000..cb2a71525d --- /dev/null +++ b/quickstarts/vantagecloud-lake/vantagecloud-lake-compute-cluster-dagster.md @@ -0,0 +1,412 @@

---
sidebar_position: 7.1
author: Mohan Talla
email: mohan.talla@teradata.com
page_last_update: February 5th, 2025
description: Manage VantageCloud Lake compute clusters with dagster-teradata
keywords: [data warehouses, compute storage separation, teradata, vantage, cloud data platform, business intelligence, enterprise analytics, dagster, dagster-teradata, workflow, teradatasql, ipython-sql, cloud computing, machine learning, vantagecloud, vantagecloud lake, lake]
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import InstallTabs from '../_partials/tabsDBT.mdx'

# Manage VantageCloud Lake Compute Clusters with dagster-teradata

## Overview

This tutorial showcases how to use dagster-teradata to manage VantageCloud Lake compute clusters. The goal is to run dbt transformations from the [jaffle_shop](https://github.com/Teradata/jaffle_shop-dev.git) dbt project on VantageCloud Lake compute clusters.

Additionally, we leverage dagster-dbt and dbt-teradata to import a dbt project and treat it as an asset within Dagster.

## Prerequisites

* Ensure you have the necessary credentials and access rights to use Teradata VantageCloud Lake.
:::tip
To request a VantageCloud Lake environment, refer to the form provided in this [link](https://www.teradata.com/about-us/contact). If you already have a VantageCloud Lake environment and seek guidance on configuration, please consult this [guide](https://quickstarts.teradata.com/getting-started-with-vantagecloud-lake.html).
:::

* Python **3.9** or higher; Python **3.12** is recommended.
* pip

## Setting Up a Virtual Environment

A virtual environment is recommended to isolate project dependencies and avoid conflicts with system-wide Python packages. Here’s how to set it up:

<InstallTabs />

## Install dagster and dagster-teradata

With your virtual environment active, the next step is to install dagster and the Teradata provider package (dagster-teradata) to interact with Teradata Vantage.

1. Install the Required Packages:

   ```bash
   pip install dagster dagster-webserver dagster-dbt dagster-teradata
   ```

2. Verify the Installation:

   To confirm that Dagster is correctly installed, run:
   ```bash
   dagster --version
   ```
   If installed correctly, it will display the installed Dagster version.

## Install dbt

Install the `dbt-teradata` and `dbt-core` modules:

```bash
pip install dbt-teradata dbt-core
```

## Create a database

:::note
A database client connected to VantageCloud Lake is needed to execute SQL statements. [Vantage Editor Desktop](https://downloads.teradata.com/download/tools/vantage-editor-desktop) or [dbeaver](https://quickstarts.teradata.com/other-integrations/configure-a-teradata-vantage-connection-in-dbeaver.html) can be used for this purpose.
:::

Let's create the `jaffle_shop` database in the VantageCloud Lake instance with TD_OFSSTORAGE as the default storage.
```sql
CREATE DATABASE jaffle_shop
AS DEFAULT STORAGE = TD_OFSSTORAGE OVERRIDE ON ERROR,
PERMANENT = 120e6, -- 120MB
SPOOL = 120e6;     -- 120MB
```

## Create a database user

:::note
A database client connected to VantageCloud Lake is needed to execute SQL statements. [Vantage Editor Desktop](https://downloads.teradata.com/download/tools/vantage-editor-desktop) or [dbeaver](https://quickstarts.teradata.com/other-integrations/configure-a-teradata-vantage-connection-in-dbeaver.html) can be used to execute the `CREATE USER` query.
:::

Let's create a `lake_user` user in the VantageCloud Lake instance.

```sql
CREATE USER lake_user
AS PERMANENT = 1000000,
PASSWORD = lake_user,
SPOOL = 1200000,
DEFAULT DATABASE = jaffle_shop;
```

## Grant access to user
:::note
A database client connected to VantageCloud Lake is needed to execute SQL statements. [Vantage Editor Desktop](https://downloads.teradata.com/download/tools/vantage-editor-desktop) or [dbeaver](https://quickstarts.teradata.com/other-integrations/configure-a-teradata-vantage-connection-in-dbeaver.html) can be used to execute the `GRANT` queries.
:::

Let's grant the user `lake_user` the privileges required to manage compute clusters.

```sql
GRANT ALL ON jaffle_shop TO lake_user;
GRANT CREATE COMPUTE GROUP TO lake_user;
GRANT DROP COMPUTE GROUP TO lake_user;
GRANT CREATE COMPUTE PROFILE TO lake_user;
GRANT DROP COMPUTE PROFILE TO lake_user;
GRANT SELECT ON DBC TO lake_user;
```

## Setup dbt project

### Step 1: Download the sample dbt project

Let's get started by downloading a sample dbt project. We'll use the standard dbt [Jaffle Shop](https://github.com/Teradata/jaffle_shop-dev.git) example.

1. First, create a folder that will ultimately contain both your dbt project and Dagster code.

   ```shell
   mkdir dbt-dagster-teradata
   ```

2.
Then, navigate into that folder:

   ```shell
   cd dbt-dagster-teradata
   ```

3. Finally, download the sample dbt project into that folder.

   ```shell
   git clone https://github.com/Teradata/jaffle_shop-dev.git
   ```

### Step 2: Configure your dbt project to run with Teradata Vantage

You'll set up dbt to work with Teradata Vantage by configuring a dbt [profile](https://docs.getdbt.com/docs/core/connect-data-platform/connection-profiles):

1. Navigate into the `jaffle_shop-dev` folder, which was created when you cloned the project, inside your `dbt-dagster-teradata` folder:

   ```shell
   cd jaffle_shop-dev
   ```

2. In this folder, with your text editor of choice, create a file named `profiles.yml` and add the following code to it, replacing `<host>` with the host address of your VantageCloud Lake instance:

   ```yaml
   jaffle_shop:
     outputs:
       dev:
         type: teradata
         host: <host>
         user: lake_user
         password: lake_user
         logmech: TD2
         schema: jaffle_shop
         tmode: ANSI
         threads: 1
         timeout_seconds: 300
         priority: interactive
         retries: 1
     target: dev
   ```

### Step 3: Build your dbt project

With the profile configured above, your dbt project should now be usable. To test it out, run:

```shell
dbt build
```

This will run all the models, seeds, and snapshots in the project and store a set of tables in your Teradata Vantage instance.

## Load dbt models as Dagster assets

### Step 1: Create a Dagster Project for Your dbt Project

To integrate your dbt project with Dagster, use the **dagster-dbt** CLI. Navigate to the directory containing your `dbt_project.yml` and run:

```shell
dagster-dbt project scaffold --project-name jaffle_dagster
```

This command generates a `jaffle_dagster/` directory, which contains the necessary files for a Dagster project.

Typically, it's best to place your Dagster project at the root of your Git repository. Since `dbt_project.yml` is located at the root of the `jaffle_shop-dev` repository, we create the Dagster project there.
**Note**: If you run the command from a different directory than your dbt project, use the `--dbt-project-dir` option to specify the correct path.

### Step 2: View Your Dagster Project in the Dagster UI

Now that you have a Dagster project, you can run Dagster's UI to take a look at it.

1. Change directories to the Dagster project directory:

   ```shell
   cd jaffle_dagster/
   ```

2. To start Dagster's UI, run the following:

   ```shell
   dagster dev
   ```

   This will result in output similar to:

   ```shell
   Serving dagster-webserver on http://127.0.0.1:3000 in process 70635
   ```

3. In your browser, navigate to [http://127.0.0.1:3000](http://127.0.0.1:3000). The page will display the assets:

   ![dbt-dagster-teradata1.png](./images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata1.png)

### Step 3: Run Your dbt Models in Dagster

In Dagster, you can not only view but also run your dbt models by materializing them as assets.

To build your dbt project, click **"Materialize all"** in the top right corner. This starts a run to materialize the assets. Once complete, the **Materialized** and **Latest Run** details will be updated.

   ![dbt-dagster-teradata2.png](./images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata2.png)

After the run completes, you can:

- Click on an **asset** to open a sidebar with details such as its last materialization stats and a link to the **Asset Details** page.
- Click the **Latest Run ID** within an asset to access the **Run Details** page, which provides timing information, errors, and logs.

## VantageCloud Lake Compute Cluster Management

Now, we integrate VantageCloud Lake compute cluster management with this dbt asset.

You need to modify the `definitions.py` file inside the `jaffle_dagster/jaffle_dagster` directory.
### Step 1: Open `definitions.py` in the `jaffle_dagster/jaffle_dagster` Directory
Locate and open the file where Dagster job definitions are configured.
This file manages the resources, jobs, and assets needed for the Dagster project.

### Step 2: Implement Compute Cluster Management in Dagster

```python
from dagster import Definitions, DagsterError, op, materialize, job
from dagster_dbt import DbtCliResource
from dagster_teradata import teradata_resource, TeradataResource

from .assets import jaffle_shop_dbt_assets
from .project import jaffle_shop_project
from .schedules import schedules

@op(required_resource_keys={"teradata"})
def create_compute_cluster(context):
    context.resources.teradata.create_teradata_compute_cluster(
        "ShippingCG01",
        "Shipping",
        "STANDARD",
        "TD_COMPUTE_MEDIUM",
        "MIN_COMPUTE_COUNT(1) MAX_COMPUTE_COUNT(1) INITIALLY_SUSPENDED('FALSE')",
    )
    return "Compute Cluster Created"

@op(required_resource_keys={"teradata", "dbt"})
def run_dbt(context, status):
    if status == "Compute Cluster Created":
        materialize(
            [jaffle_shop_dbt_assets],
            resources={
                "dbt": DbtCliResource(project_dir=jaffle_shop_project)
            }
        )
        return "DBT Run Completed"
    else:
        raise DagsterError("DBT Run Failed")

@op(required_resource_keys={"teradata"})
def drop_compute_cluster(context, status):
    if status == "DBT Run Completed":
        context.resources.teradata.drop_teradata_compute_cluster("ShippingCG01", "Shipping", True)
    else:
        raise DagsterError("DBT Run Failed")

@job(resource_defs={"teradata": teradata_resource, "dbt": DbtCliResource})
def example_job():
    drop_compute_cluster(run_dbt(create_compute_cluster()))

defs = Definitions(
    assets=[jaffle_shop_dbt_assets],
    jobs=[example_job],
    schedules=schedules,
    resources={
        "dbt": DbtCliResource(project_dir=jaffle_shop_project),
        "teradata": TeradataResource(),
    },
)
```

> ##### 1. Create a Compute Cluster
> The `create_compute_cluster` operation provisions a Teradata compute cluster using `teradata_resource`.
>
> ##### 2. Run dbt Transformations
> The `run_dbt` operation triggers dbt materialization **only if** the compute cluster was created successfully.
>
> ##### 3. Drop the Compute Cluster
> The `drop_compute_cluster` operation removes the cluster **only if** the dbt transformation completes successfully.
>
> ##### 4. Define a Dagster Job
> The `example_job` function executes these operations in sequence:
> - Create a compute cluster
> - Run dbt transformations
> - Drop the compute cluster
>
> ##### 5. Register Definitions
> The `Definitions` object registers assets, jobs, schedules, and resources for Dagster.

After making the changes in `definitions.py`, stop the existing Dagster webserver by pressing `Ctrl+C` in the terminal where it’s running. Then, restart the server by running the command `dagster dev` again. In your browser, navigate to http://127.0.0.1:3000.

In the Dagster UI, you will see the following:

- The job **`example_job`** is displayed, along with the associated dbt asset.
- The dbt asset is organized under the **"default"** asset group.
- In the middle, you can view the **lineage** of each `@op`, showing its dependencies and how each operation is related to others.

![dbt-dagster-teradata3.png](./images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata3.png)

Go to the **"Launchpad"** and provide the configuration for the **TeradataResource** as follows:

```yaml
resources:
  teradata:
    config:
      host: <host>
      user: lake_user
      password: lake_user
      database: lake_user
```
Replace `<host>` with the actual host address of your Teradata VantageCloud Lake instance.

Once the configuration is done, click on **"Launch Run"** to start the process.
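The compute cluster operations in `dagster-teradata` share a common call pattern. The sketch below is illustrative only: a hypothetical stub records each call instead of issuing DDL to VantageCloud Lake, but the method names and argument order mirror the documented operations (create, suspend, resume, drop).

```python
# Hypothetical stub: records calls instead of issuing DDL to VantageCloud Lake.
# Method names and argument order mirror the dagster-teradata compute cluster
# operations; the real TeradataResource performs them against a live instance.
class StubLakeResource:
    def __init__(self):
        self.calls = []

    def create_teradata_compute_cluster(self, compute_profile_name, compute_group_name,
                                        query_strategy="STANDARD", compute_map=None,
                                        compute_attribute=None, timeout=1200):
        self.calls.append(("create", compute_profile_name, compute_group_name))

    def suspend_teradata_compute_cluster(self, compute_profile_name, compute_group_name,
                                         timeout=1200):
        self.calls.append(("suspend", compute_profile_name, compute_group_name))

    def resume_teradata_compute_cluster(self, compute_profile_name, compute_group_name,
                                        timeout=1200):
        self.calls.append(("resume", compute_profile_name, compute_group_name))

    def drop_teradata_compute_cluster(self, compute_profile_name, compute_group_name,
                                      delete_compute_group=False):
        self.calls.append(("drop", compute_profile_name, compute_group_name,
                           delete_compute_group))


# Typical lifecycle: create the cluster, suspend it while idle,
# resume it for a workload, and drop it (and its group) when finished.
lake = StubLakeResource()
lake.create_teradata_compute_cluster(
    "ShippingCG01", "Shipping",
    compute_map="TD_COMPUTE_MEDIUM",
    compute_attribute="MIN_COMPUTE_COUNT(1) MAX_COMPUTE_COUNT(1) INITIALLY_SUSPENDED('FALSE')",
)
lake.suspend_teradata_compute_cluster("ShippingCG01", "Shipping")
lake.resume_teradata_compute_cluster("ShippingCG01", "Shipping")
lake.drop_teradata_compute_cluster("ShippingCG01", "Shipping", delete_compute_group=True)
```

In a real Dagster op, each of these calls would be made on `context.resources.teradata`, exactly as `create_compute_cluster` and `drop_compute_cluster` do in `definitions.py` above.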
![dbt-dagster-teradata4.png](./images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata4.png)

The Dagster UI allows you to visualize the pipeline's progress, view logs, and inspect the status of each step.

![dbt-dagster-teradata5.png](./images/vantagecloud-lake-compute-cluster-dagster/dbt-dagster-teradata5.png)

## Teradata VantageCloud Lake Compute Cluster Management with dagster-teradata

Teradata VantageCloud Lake provides robust compute cluster management capabilities, enabling users to dynamically allocate, suspend, resume, and delete compute resources. These operations are fully supported through **`dagster-teradata`**, allowing users to manage compute clusters directly within their Dagster pipelines. This integration ensures optimal performance, scalability, and cost efficiency. The following operations facilitate seamless compute cluster management within Dagster:

### 1. Create a Compute Cluster (`create_teradata_compute_cluster`)

This operation provisions a new compute cluster within Teradata VantageCloud Lake using `dagster-teradata`. It enables users to define the cluster's configuration, including compute profiles, resource allocation, and query execution strategies, directly within a Dagster job.

**Args:**
- `compute_profile_name` (str) – Specifies the name of the compute profile.
- `compute_group_name` (str) – Identifies the compute group to which the profile belongs.
- `query_strategy` (str, optional, default="STANDARD") – Defines the method used by the Teradata Optimizer to execute SQL queries efficiently. Acceptable values:
  - `STANDARD` – The default strategy at the database level, optimized for general query execution.
  - `ANALYTIC` – Optimized for complex analytical workloads.
- `compute_map` (Optional[str], default=None) – Maps compute resources to specific nodes within the cluster.
- `compute_attribute` (Optional[str], default=None) – Specifies additional configuration attributes for the compute profile, for example:
  - `MIN_COMPUTE_COUNT(1) MAX_COMPUTE_COUNT(5) INITIALLY_SUSPENDED('FALSE')`
- `timeout` (int, optional, default=constants.CC_OPR_TIME_OUT) – The maximum duration (in seconds) to wait for the cluster creation process to complete. Default: 20 minutes.

### 2. Suspend a Compute Cluster (`suspend_teradata_compute_cluster`)

This operation temporarily suspends a compute cluster within Teradata VantageCloud Lake using **`dagster-teradata`**, reducing resource consumption while retaining the compute profile for future use.

**Args:**
- `compute_profile_name` (str) – Specifies the name of the compute profile.
- `compute_group_name` (str) – Identifies the compute group associated with the profile.
- `timeout` (int, optional, default=constants.CC_OPR_TIME_OUT) – The maximum wait time for the suspension process to complete. Default: 20 minutes.

### 3. Resume a Compute Cluster (`resume_teradata_compute_cluster`)

This operation restores a previously suspended compute cluster using **`dagster-teradata`**, allowing workloads to resume execution within a Dagster pipeline.

**Args:**
- `compute_profile_name` (str) – Specifies the name of the compute profile.
- `compute_group_name` (str) – Identifies the compute group associated with the profile.
- `timeout` (int, optional, default=constants.CC_OPR_TIME_OUT) – The maximum wait time for the resumption process to complete. Default: 20 minutes.

### 4. Delete a Compute Cluster (`drop_teradata_compute_cluster`)

This operation removes a compute cluster from Teradata VantageCloud Lake using **`dagster-teradata`**, with an option to delete the associated compute group. You can run this operation directly from your Dagster workflow.

**Args:**
- `compute_profile_name` (str) – Specifies the name of the compute profile.
- `compute_group_name` (str) – Identifies the compute group associated with the profile.
- `delete_compute_group` (bool, optional, default=False) – Determines whether the compute group should be deleted:
  - `True` – Deletes the compute group.
  - `False` – Retains the compute group without modifications.

---

These operations are fully integrated into **`dagster-teradata`** for managing compute clusters in Teradata VantageCloud Lake. By using them within Dagster jobs, users can optimize resource allocation, perform complex transformations, and automate compute cluster management to align with workload demands.

## Summary

In this quickstart guide, we explored how to utilize Teradata VantageCloud Lake compute clusters to execute dbt transformations using `dagster-dbt` and the Teradata compute cluster operations from `dagster-teradata`.

## Further reading
* [Using dbt with Dagster](https://docs.dagster.io/integrations/libraries/dbt/using-dbt-with-dagster/)
* [Teradata VantageCloud Lake Compute Clusters](https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Managing-Compute-Resources/Compute-Clusters)