Now that we've created Ingestion and Transformation jobs and associated crawlers, how do we coordinate all of these resources? We'll create an AWS Glue Workflow to automatically trigger everything in the right order.
- Create Trigger for the Ingestion Job
- Create Trigger for the Ingestion Crawler
- Create Trigger for the Transformation Job
- Create Trigger for the Transformation Crawler
- Create a Workflow
- Add Triggers to Workflow
- Navigate to AWS Console > Glue > ETL and click on Triggers
- Click Add Trigger
- Set up the Trigger with a `-trigger-ingestion` suffix and a Job events trigger for the `data-ingestion` job. Click Next.
- Add Jobs to the trigger and click Next
- Review and click Finish.
- Unfortunately for us, you cannot create a Trigger for a Crawler using the AWS Console (at the time of writing).
- Using a Terminal session with valid AWS CLI credentials, run the following:
```shell
UPSTREAM_JOB_NAME=awesome-project-awesome-module-data-ingestion
CRAWLER_NAME=awesome-project-awesome-module-data-ingestion
REGION=your-region

aws glue create-trigger --name awesome-project-awesome-module-data-ingestion-crawler-trigger \
  --type CONDITIONAL \
  --predicate "Logical=ANY,Conditions=[{LogicalOperator=EQUALS,JobName=${UPSTREAM_JOB_NAME},State=SUCCEEDED}]" \
  --actions CrawlerName=${CRAWLER_NAME} \
  --no-start-on-creation \
  --region $REGION
```

Success Response:

```
{
    "Name": "awesome-project-awesome-module-data-ingestion-crawler-trigger"
}
```
- Unfortunately for us, you cannot create a Trigger for a Crawler/Job combination using the AWS Console (at the time of writing).
- Using a Terminal session with valid AWS CLI credentials, run the following:
```shell
UPSTREAM=awesome-project-awesome-module-data-ingestion
DOWNSTREAM=awesome-project-awesome-module-data-transformation
REGION=your-region

aws glue create-trigger --name awesome-project-awesome-module-transformation-trigger \
  --type CONDITIONAL \
  --predicate "Logical=ANY,Conditions=[{LogicalOperator=EQUALS,CrawlerName=${UPSTREAM},CrawlState=SUCCEEDED}]" \
  --actions JobName=${DOWNSTREAM} \
  --no-start-on-creation \
  --region $REGION
```

Success Response:

```
{
    "Name": "awesome-project-awesome-module-transformation-trigger"
}
```
- Unfortunately for us, you cannot create a Trigger for a Crawler using the AWS Console (at the time of writing).
- Using a Terminal session with valid AWS CLI credentials, run the following:
```shell
UPSTREAM=awesome-project-awesome-module-data-transformation
DOWNSTREAM=awesome-project-awesome-module-data-transformation
REGION=your-region

aws glue create-trigger --name awesome-project-awesome-module-data-transformation-crawler-trigger \
  --type CONDITIONAL \
  --predicate "Logical=ANY,Conditions=[{LogicalOperator=EQUALS,JobName=${UPSTREAM},State=SUCCEEDED}]" \
  --actions CrawlerName=${DOWNSTREAM} \
  --no-start-on-creation \
  --region $REGION
```

Success Response:

```
{
    "Name": "awesome-project-awesome-module-data-transformation-crawler-trigger"
}
```
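To confirm the triggers were created as expected, you can read them back with `aws glue get-trigger`. This is a sketch; the trigger name and region below just echo the commands above and should be swapped for your own:

```shell
# Inspect one of the newly created triggers; the response includes its
# type, predicate, and actions so you can verify them before wiring up
# the workflow.
aws glue get-trigger \
  --name awesome-project-awesome-module-data-ingestion-crawler-trigger \
  --region your-region
```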
- Navigate to AWS Console > Glue > ETL and click on Workflows
- Click Add workflow

- Name your workflow

- Click Add workflow
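If you would rather stay in the terminal, the workflow itself can also be created with the CLI. A minimal sketch, assuming a placeholder workflow name:

```shell
# Create an empty workflow; triggers are attached to it afterwards.
aws glue create-workflow \
  --name awesome-project-awesome-module-workflow \
  --region your-region
```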
- Navigate to AWS Console > Glue > ETL and click on Workflows
- Select your workflow and click Add trigger

- Add the ingestion job trigger
- In AWS Console > Glue > ETL > Workflows, select your workflow and click Add trigger
- Add the data ingestion crawler trigger

- In AWS Console > Glue > ETL > Workflows, select your workflow and click Add trigger
- Add the transformation job trigger
- In AWS Console > Glue > ETL > Workflows, select your workflow and click Add trigger
- Add the data transformation crawler trigger
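As an alternative to attaching each trigger in the console, the CLI-created triggers could have been associated with the workflow at creation time via the `--workflow-name` flag. A sketch reusing the ingestion crawler trigger from earlier (the workflow name is a placeholder):

```shell
# Same trigger as before, but bound to the workflow up front so no
# console step is needed to attach it.
aws glue create-trigger --name awesome-project-awesome-module-data-ingestion-crawler-trigger \
  --workflow-name awesome-project-awesome-module-workflow \
  --type CONDITIONAL \
  --predicate "Logical=ANY,Conditions=[{LogicalOperator=EQUALS,JobName=awesome-project-awesome-module-data-ingestion,State=SUCCEEDED}]" \
  --actions CrawlerName=awesome-project-awesome-module-data-ingestion \
  --no-start-on-creation \
  --region your-region
```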

- In AWS Console > Glue > ETL > Workflows, select your workflow and click Actions > Run
- Open the History tab, select the most recent workflow run, and click View Run Details to see the progress.
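The run can also be started and monitored from the terminal. A sketch, assuming the placeholder workflow name used earlier:

```shell
# Start the workflow and capture the run ID from the response.
RUN_ID=$(aws glue start-workflow-run \
  --name awesome-project-awesome-module-workflow \
  --region your-region \
  --query RunId --output text)

# Poll the run status until it reports COMPLETED.
aws glue get-workflow-run \
  --name awesome-project-awesome-module-workflow \
  --run-id "$RUN_ID" \
  --region your-region \
  --query "Run.Status"
```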

- Once the workflow is successful, verify that there are newly ingested files in the relevant directories in your AWS S3 bucket and that the data is accessible via AWS Athena.
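Both checks can be done from the terminal as well. The bucket, prefix, database, and table names below are placeholders for your own resources:

```shell
# Confirm the ingested objects landed in S3.
aws s3 ls s3://your-bucket/data/ --recursive

# Kick off a quick row count in Athena; results are written to the
# given S3 output location, and the returned QueryExecutionId can be
# used with `aws athena get-query-results` to fetch them.
aws athena start-query-execution \
  --query-string "SELECT COUNT(*) FROM your_table" \
  --query-execution-context Database=your_database \
  --result-configuration OutputLocation=s3://your-bucket/athena-results/ \
  --region your-region
```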



