Exercise: Data Workflow

Now that we've created Ingestion and Transformation jobs and associated crawlers, how do we coordinate all of these resources? We'll create an AWS Glue Workflow to automatically trigger everything in the right order.

Overview

Create Trigger for the Ingestion Job
Create Trigger for the Ingestion Crawler
Create Trigger for the Transformation Job
Create Trigger for the Transformation Crawler
Create a Workflow
Add Triggers to Workflow

Create Trigger for the Ingestion Job

Navigate to AWS Console > Glue > ETL and click on Triggers
Click Add Trigger
Name the trigger with a -trigger-ingestion suffix, choose a Job events trigger for the data-ingestion job, and click Next.
Add the jobs to trigger and click Next.
Review and click Finish.

Create Trigger for the Ingestion Crawler

Unfortunately for us, you cannot create a Trigger for a Crawler using the AWS Console (at the time of writing).

Using a Terminal session with valid AWS CLI credentials, run the following:

UPSTREAM_JOB_NAME=awesome-project-awesome-module-data-ingestion
CRAWLER_NAME=awesome-project-awesome-module-data-ingestion
REGION=your-region
aws glue create-trigger --name awesome-project-awesome-module-data-ingestion-crawler-trigger \
 --type CONDITIONAL \
 --predicate "Logical=ANY,Conditions=[{LogicalOperator=EQUALS,JobName=${UPSTREAM_JOB_NAME},State=SUCCEEDED}]" \
 --actions CrawlerName=${CRAWLER_NAME} \
 --no-start-on-creation \
 --region $REGION

Success Response:

{
 "Name": "awesome-project-awesome-module-data-ingestion-crawler-trigger"
}
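
To confirm that the trigger exists and to check its state, something like the following may help. This is only a sketch: `show_trigger` is a hypothetical helper name, not part of the exercise, and the `--query` expression simply trims the `get-trigger` response down to three fields. Because the trigger was created with `--no-start-on-creation`, it will likely show as not yet activated.

```shell
# Hypothetical helper (name is an assumption): print the name, type, and
# state of a single Glue trigger.
# Usage: show_trigger <trigger-name> <region>
show_trigger() {
  aws glue get-trigger \
    --name "$1" \
    --region "$2" \
    --query 'Trigger.{Name:Name,Type:Type,State:State}' \
    --output json
}
```

For example: `show_trigger awesome-project-awesome-module-data-ingestion-crawler-trigger "$REGION"`.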

Create Trigger for the Transformation Job

Unfortunately for us, you cannot create a Trigger that starts a Job when a Crawler finishes using the AWS Console (at the time of writing).

Using a Terminal session with valid AWS CLI credentials, run the following:

UPSTREAM=awesome-project-awesome-module-data-ingestion
DOWNSTREAM=awesome-project-awesome-module-data-transformation
REGION=your-region
aws glue create-trigger --name awesome-project-awesome-module-transformation-trigger \
 --type CONDITIONAL \
 --predicate "Logical=ANY,Conditions=[{LogicalOperator=EQUALS,CrawlerName=${UPSTREAM},CrawlState=SUCCEEDED}]" \
 --actions JobName=${DOWNSTREAM} \
 --no-start-on-creation \
 --region $REGION

Success Response:

{
 "Name": "awesome-project-awesome-module-transformation-trigger"
}

Create Trigger for the Transformation Crawler

Unfortunately for us, you cannot create a Trigger for a Crawler using the AWS Console (at the time of writing).

Using a Terminal session with valid AWS CLI credentials, run the following:

UPSTREAM=awesome-project-awesome-module-data-transformation
DOWNSTREAM=awesome-project-awesome-module-data-transformation
REGION=your-region
aws glue create-trigger --name awesome-project-awesome-module-data-transformation-crawler-trigger \
 --type CONDITIONAL \
 --predicate "Logical=ANY,Conditions=[{LogicalOperator=EQUALS,JobName=${UPSTREAM},State=SUCCEEDED}]" \
 --actions CrawlerName=${DOWNSTREAM} \
 --no-start-on-creation \
 --region $REGION

Success Response:

{
 "Name": "awesome-project-awesome-module-data-transformation-crawler-trigger"
}
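
With three triggers created from the CLI, a quick listing can confirm that none were missed. A minimal sketch, assuming all trigger names share a common piece of text such as the project prefix; `list_project_triggers` is a hypothetical helper name.

```shell
# Hypothetical helper (name is an assumption): list the Glue trigger names
# in a region and keep only those containing the given text.
# Usage: list_project_triggers <text> <region>
list_project_triggers() {
  # With --output text the names come back tab-separated on one line,
  # so split them onto separate lines before filtering.
  aws glue list-triggers \
    --region "$2" \
    --query 'TriggerNames' \
    --output text | tr '\t' '\n' | grep -F -- "$1"
}
```

For example: `list_project_triggers awesome-project-awesome-module "$REGION"`.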

Create a Workflow

Navigate to AWS Console > Glue > ETL and click on Workflows
Click Add workflow
Name your workflow
Click Add workflow
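
The workflow can probably be created from the CLI as well, using the same credentials as the trigger commands. A minimal sketch: `create_workflow` and the workflow name in the usage example are placeholders, not names defined by this exercise.

```shell
# Hypothetical helper (name is an assumption): create an empty Glue workflow.
# Usage: create_workflow <workflow-name> <region>
create_workflow() {
  aws glue create-workflow \
    --name "$1" \
    --region "$2"
}
```

For example: `create_workflow awesome-project-awesome-module-workflow "$REGION"`. On success the response should contain the workflow name, much like the trigger responses above.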

Add Triggers to Workflow

Navigate to AWS Console > Glue > ETL and click on Workflows
Select your workflow and click Add trigger
Add the -ingestion job trigger
In AWS Console > Glue > ETL > Workflows, select your workflow and click Add trigger
Add the data ingestion crawler trigger
In AWS Console > Glue > ETL > Workflows, select your workflow and click Add trigger
Add the -transformation job trigger
In AWS Console > Glue > ETL > Workflows, select your workflow and click Add trigger
Add the data transformation crawler trigger
In AWS Console > Glue > ETL > Workflows, select your workflow and click Actions > Run
Open the History tab, select the most recent workflow run, and click View Run Details to see the progress.
Once the workflow is successful, verify that there are newly ingested files in the relevant directories in your AWS S3 bucket and the data is accessible via AWS Athena.
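
The run-and-watch steps above can also be sketched from the CLI. This is an illustration under assumptions, not the exercise's method: `run_workflow` is a hypothetical helper, the 30-second default polling interval is arbitrary, and it checks only the run-level status, so the run details in the console remain the place to look for the result of each job and crawler.

```shell
# Hypothetical helper (name is an assumption): start a workflow run and
# poll its status until it is no longer running.
# Usage: run_workflow <workflow-name> <region> [poll-seconds]
run_workflow() {
  wf_name=$1
  wf_region=$2
  wf_interval=${3:-30}

  # start-workflow-run returns the ID of the new run.
  wf_run_id=$(aws glue start-workflow-run \
    --name "$wf_name" --region "$wf_region" \
    --query 'RunId' --output text) || return 1
  echo "Started run: $wf_run_id"

  while :; do
    wf_status=$(aws glue get-workflow-run \
      --name "$wf_name" --run-id "$wf_run_id" --region "$wf_region" \
      --query 'Run.Status' --output text) || return 1
    echo "Status: $wf_status"
    case "$wf_status" in
      RUNNING|STOPPING) sleep "$wf_interval" ;;  # still in progress
      *) break ;;
    esac
  done

  # Succeed only if the run reached COMPLETED (run-level status only).
  [ "$wf_status" = "COMPLETED" ]
}
```

For example: `run_workflow your-workflow-name "$REGION"`.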
