This repository contains the DAG code used in the Orchestrate OpenAI operations with Apache Airflow tutorial.
The DAG in this repository uses the following packages:
- OpenAI Airflow provider
- OpenAI Python client
- scikit-learn
- pandas
- numpy
- matplotlib
- seaborn
- AdjustText
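For reference, the packages above map to these PyPI distributions. This is a hypothetical one-off install for experimenting outside of this project; the repository itself pins its dependencies in `requirements.txt`, which the Astro CLI installs inside Docker:

```bash
# Sketch of a standalone install; package names assumed from the list above.
pip install apache-airflow-providers-openai openai \
    scikit-learn pandas numpy matplotlib seaborn adjustText
```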
This section explains how to run this repository with Airflow. You will need to copy the contents of the `.env_example` file to a newly created `.env` file, and you will need a valid OpenAI API key of at least tier 1 to run this repository.
Download the Astro CLI to run Airflow locally in Docker. `astro` is the only package you will need to install locally.
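The exact install steps depend on your operating system; see the Astro CLI documentation for the authoritative instructions. Two common options, as a sketch:

```bash
# macOS, via Homebrew
brew install astro

# Linux, via Astronomer's install script
curl -sSL install.astronomer.io | sudo bash -s
```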
- Run `git clone https://github.com/astronomer/airflow-openai-tutorial.git` on your computer to create a local clone of this repository.
- Install the Astro CLI by following the steps in the Astro CLI documentation. Docker Desktop/Docker Engine is a prerequisite, but you don't need in-depth Docker knowledge to run Airflow with the Astro CLI.
- Create a `.env` file in the root of your cloned repository and copy the contents of the `.env_example` file to it. Provide your own OpenAI API key in the `.env` file.
- Run `astro dev start` in your cloned repository (the full command sequence is consolidated in the sketch after this list).
- After your Astro project has started, view the Airflow UI at `localhost:8080`.
- Run the `captains_dag` DAG manually by clicking the play button. Provide your own question and adjust the parameters in the DAG to your liking.
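Condensed into a single shell session, the setup above looks roughly like this (a sketch that assumes the Astro CLI is already installed; check `.env_example` for the exact variable names it expects):

```bash
# Clone the tutorial repository and enter it.
git clone https://github.com/astronomer/airflow-openai-tutorial.git
cd airflow-openai-tutorial

# Create your local .env from the template, then edit it
# to add your own OpenAI API key.
cp .env_example .env

# Build and start the local Airflow environment in Docker.
astro dev start
```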
In this project, `astro dev start` spins up 4 Docker containers:
- The Airflow webserver, which runs the Airflow UI and can be accessed at `https://localhost:8080/`.
- The Airflow scheduler, which is responsible for monitoring and triggering tasks.
- The Airflow triggerer, which is an Airflow component used to run deferrable operators.
- The Airflow metadata database, which is a Postgres database that runs on port 5432.
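To confirm that all 4 containers came up, you can list them. `astro dev ps` shows the containers belonging to your Astro project; plain `docker ps` works as well:

```bash
# Expect four containers: webserver, scheduler, triggerer, and postgres.
astro dev ps
docker ps
```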