@Author: gggordon
This is a modified and configurable Airflow setup using Docker Compose that includes common providers and configurations for Data Analytics/Engineering activities.
This `docker-compose.yml` file contains several service definitions:

- `airflow-scheduler` - The scheduler monitors all tasks and DAGs, then triggers the task instances once their dependencies are complete.
- `airflow-webserver` - The webserver is available at http://localhost:8080.
- `airflow-worker` - The worker that executes the tasks given by the scheduler.
- `airflow-triggerer` - The triggerer runs an event loop for deferrable tasks.
- `airflow-init` - The initialization service.
- `postgres` - The database.
- `redis` - The broker that forwards messages from the scheduler to the worker.
Optionally, you can enable flower by adding the `--profile flower` option, e.g. `docker compose --profile flower up`, or by explicitly specifying it on the command line, e.g. `docker compose up flower`.
- `flower` - The flower app for monitoring the environment. It is available at http://localhost:5555.
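For example, either of the invocations mentioned above brings flower into the stack:

```bash
# Start the full stack with the optional flower UI (http://localhost:5555)
docker compose --profile flower up

# ...or bring up just the flower service (and its dependencies) explicitly
docker compose up flower
```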
All these services allow you to run Airflow with CeleryExecutor. For more information, see Architecture Overview.
Some directories in the container are mounted, which means that their contents are synchronized between your computer and the container.
- `./dags` - you can put your DAG files here.
- `./logs` - contains logs from task execution and scheduler.
- `./config` - you can add a custom log parser or add `airflow_local_settings.py` to configure cluster policy.
- `./plugins` - you can put your custom plugins here.
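These mounts assume the folders already exist on the host with ownership your user can write to. The repo's prepare script may handle this for you; as a fallback, the approach below is borrowed from the stock Airflow quick start, so treat the `AIRFLOW_UID` variable as an assumption about this compose file rather than a confirmed setting:

```bash
# Create the mounted folders on the host (skip if 1_prepare-env.sh already does this)
mkdir -p ./dags ./logs ./plugins ./config

# Record your host user id so files written from the containers stay owned by you.
# AIRFLOW_UID is the variable the stock Airflow compose file expects; confirm it is
# actually referenced in this repo's docker-compose.yml before relying on it.
echo "AIRFLOW_UID=$(id -u)" >> .env
```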
This file uses the latest Airflow image (`apache/airflow`). If you need to install a new Python library or system library, you can build your own image, as sketched below.
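A minimal sketch of that workflow, assuming your extra Python packages live in a `requirements.txt` you provide (the file name and image tag below are placeholders, not part of this repo):

```bash
# Write a small Dockerfile that extends the stock image (sketch only)
cat > Dockerfile <<'EOF'
FROM apache/airflow:latest
COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt
EOF

# Build the custom image, then point the image reference in docker-compose.yml
# (or the corresponding .env variable) at the new tag
docker build -t my-custom-airflow:latest .
```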
Before starting Airflow for the first time, you need to prepare your environment, i.e. create the necessary files, directories and initialize the database.
- Copy the `sample.env` to `.env` and update the default variables. See `docker-compose.yml` for additional details and usage.
- Run `./scripts/1_prepare-env.sh` from the root directory (see the example after this list).
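In a shell, those two steps might look like the following (which variables you change in `.env` depends on your environment):

```bash
# Step 1: create your .env from the provided template, then edit the defaults
cp sample.env .env

# Step 2: run the repo's preparation script (per the notes above, this is expected
# to create the required files/directories and initialize the database)
./scripts/1_prepare-env.sh
```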
After running step 2, you should have the Airflow DAG and configuration folders in your base directory and the container images created/available.
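A quick, optional sanity check (the folder names mirror the mounted directories listed earlier, and the image name assumes the stock `apache/airflow` base):

```bash
# The mounted folders should now exist in the project root...
ls -d dags logs config plugins

# ...and an Airflow image should be available locally
docker image ls | grep -i airflow
```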
- Run `./scripts/2_run.sh`

This will start the Docker containers and make the Airflow webserver available on port 8080 by default. You may then log in and use Airflow.
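After the script returns, you can verify the stack with standard Docker Compose commands (the service names are the ones listed at the top of this README):

```bash
# Confirm the services are up and (eventually) healthy
docker compose ps

# The webserver should now answer on the default port
curl -I http://localhost:8080
```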
- Run `./scripts/3_stop.sh`

This will stop all related Docker containers.
- Run `./scripts/3_teardown.sh`

This will remove all related containers, volumes, and images.
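If you want to perform the same cleanup without the script, the rough Docker Compose equivalent is below (an assumption about what the teardown script wraps, not a statement of its contents):

```bash
# Stop and remove the containers, delete named volumes, and remove the images
docker compose down --volumes --rmi all
```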