The goal of this project is to create an ecosystem for running data pipelines and monitoring machine learning experiments.
From Airflow documentation:
Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows
From MLflow documentation:
MLflow is an open source platform for managing the end-to-end machine learning lifecycle
From Docker documentation:
Docker Compose is a tool for defining and running multi-container Docker applications.
The first step in structuring this project is connecting Airflow and MLflow together with Docker Compose.
Create docker-compose.yaml, which contains the configuration of the Docker containers responsible for running the Airflow and MLflow services.
Each of these services runs in a separate container:
- airflow-webserver
- airflow-scheduler
- airflow-worker
- airflow-triggerer
- mlflow
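
A trimmed docker-compose.yaml sketch of that layout is shown below. The service names match the list above, but the images, versions, and the MLflow command are assumptions; the official Airflow compose file defines many more settings (database, volumes, environment) that are omitted here:

```yaml
services:
  airflow-webserver:
    image: apache/airflow:2.7.1        # version is an assumption
    command: webserver
    ports:
      - "8080:8080"
  # airflow-scheduler, airflow-worker and airflow-triggerer follow the
  # same pattern, each with its own command (scheduler, celery worker, triggerer)
  mlflow:
    image: ghcr.io/mlflow/mlflow       # image name is an assumption
    command: mlflow server --host 0.0.0.0 --port 600
    ports:
      - "600:600"
```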
To create and start the containers, run the following command from a terminal:
docker compose up -d
To access the Airflow webserver, visit the page: localhost:8080

And take a step into the Airflow world!
To start creating DAGs, initialize an empty folder named dags and populate it with as many scripts as you need.
└── dags
    └── example_dag.py

To monitor MLflow experiments through its server, visit the page: localhost:600

To establish a connection between Airflow and MLflow, define the URI of the MLflow server in your DAG code:
import mlflow

mlflow.set_tracking_uri('http://mlflow:600')
After that, create a new connection in Airflow that points to the MLflow server and port.
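
Instead of clicking through the UI, one way to define that connection is an environment variable on the Airflow containers in docker-compose.yaml, using Airflow's AIRFLOW_CONN_{CONN_ID} URI convention. This is a sketch; the connection id mlflow_default is an assumption:

```yaml
services:
  airflow-webserver:
    environment:
      # URI-style connection: scheme = connection type,
      # host:port = the MLflow service inside the compose network
      AIRFLOW_CONN_MLFLOW_DEFAULT: "http://mlflow:600"
```

The same variable would need to be set on the scheduler and worker containers so tasks can resolve the connection at runtime.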

