A toy MLOps Project
- a `pyproject.toml` file contains the project metadata, including the dependencies. It is common to see a `setup.py` file in Python projects, but we use this more modern approach to define the project metadata.
- The `src` folder contains the code (`dsba`) as well as the code for the CLI, the API, the web app, the notebooks, and the Dockerfiles.
- The `tests` folder contains some unit and integration tests.
- `.gitignore` is a special file name that will be detected by git. This file contains a list of files and folders that should not be committed to the repository. For example (see below for setup), the `.env` file is specific to your own deployment, so it should not be committed to the repository (it may contain file paths that are only meaningful on your machine, and it may contain secrets like API keys; API keys and passwords should never be stored in a git repository).
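For illustration, the entries in a `.gitignore` for a project like this one might look as follows (a hypothetical sketch, not the repository's actual file):

```
.env
.venv/
__pycache__/
```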
Your machine should have the following software installed:
- Python 3.12
- git
- To use the model training notebook (not required), you may need to install OpenMP (`libomp`), which is required by xgboost (e.g. `brew install libomp` on macOS with Homebrew). Alternatively, you can skip the `model_training` module from this example, or adapt it to use scikit-learn rather than xgboost.
- The first thing to do is to copy this repository, so that you have a copy that you own on GitHub. This is because you are not allowed to push directly to the main repository owned by Joachim. Copying a repository on GitHub to have your own copy is called a "fork". You should understand that "forking" and "cloning" are not the same: forking is a GitHub concept for copying a repository into your own GitHub account, while cloning basically means downloading a repo to your computer for the first time. Just click on the fork button above when viewing this document on GitHub.
- Move into the folder you want to work in (I saw many students not choosing a folder and just working in their home directory; you don't want to do that).
- To be certain things are OK, type `git status`. This should fail and tell you there is no repository at this location. I saw many students trying to clone a repository inside a repository; you also don't want to be in this situation.
Now you can clone the repository:

```
git clone <the address of your fork>
```

Then `cd` into the repository folder.
Create a virtual environment with the following command (on Windows, use `python` instead of `python3`). Using the name ".venv" for your virtual environment is recommended: it is quite standard, and tools like vscode will automatically find it.

```
python3 -m venv .venv
```

Activate it before installing anything (e.g. `source .venv/bin/activate` on Linux and macOS, or `.venv\Scripts\activate` on Windows). Then install the dependencies (as specified in `pyproject.toml`):

```
pip install -e .
```

This will install the project in editable mode, meaning that any changes you make to the code will be reflected in your local environment.
To run the tests, you can use the following command:

```
pytest
```

This will run all the tests in the `tests` folder.
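If you have not written pytest tests before, a minimal test module looks like this (a hypothetical illustration, not an actual file from the `tests` folder):

```python
# tests/test_example.py (hypothetical file, for illustration only)

def add(a: int, b: int) -> int:
    return a + b

def test_add():
    # pytest discovers functions named test_* and runs their assertions
    assert add(1, 2) == 3
```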
You must set the environment variable `DSBA_MODELS_ROOT_PATH` to the path where you want to store the models before you can use the platform. For example, as a macOS user, I set it to `/Users/joachim/dev/dsba/models_registry`. There are many ways to set environment variables depending on the context.
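For reference, here is a minimal sketch of how application code can read this variable at runtime (an illustration; the platform's actual loading code may differ):

```python
import os
from pathlib import Path

# Fail early with a clear message if the variable is missing
models_root = os.environ.get("DSBA_MODELS_ROOT_PATH")
if not models_root:
    raise RuntimeError("DSBA_MODELS_ROOT_PATH is not set")
models_dir = Path(models_root)
```

The examples below show how to set the variable in different contexts.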
In a Python notebook, you can use the following code:

```python
import os
os.environ["DSBA_MODELS_ROOT_PATH"] = "/path/to/your/models"
```

In a terminal or shell script (Linux and macOS), you can use:

```
export DSBA_MODELS_ROOT_PATH="/path/to/your/models"
```

For Windows (cmd), something of the sort may work (note that `set` should not include quotes around the value):

```
set DSBA_MODELS_ROOT_PATH=C:\path\to\your\models
```

This project aimed to create an interface for bank employees that allows them to choose a prediction model and a customer dataset, and then provides a result on whether customers will churn or not.
To do this, we based our work on a machine learning project on bank churners that one of us had previously created. This project used the dataset of a Kaggle challenge, available at: https://www.kaggle.com/datasets/thedevastator/predicting-credit-card-customer-attrition-with-m
The `data/` folder contains:
- the original `BankChurners.csv` file of the Kaggle challenge, used for the original ML project
- the `X_test.csv` and `y_test.csv` files created after the preprocessing of the original dataset; these files are the ones used for the rest of the MLOps project
The `models/` folder contains the different models trained and available to predict results. The 4 models are:
- `lgbm_model.pkl`
- `rf_model.pkl`
- `svm_model.pkl`
- `xgb_model.pkl`
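As an illustration of how such `.pkl` files are typically used (a sketch assuming the models were serialized with pickle and expose a scikit-learn-style `predict` method; the project's own loading code may differ):

```python
import pickle
import pandas as pd

# Load the test features produced by the preprocessing step
X_test = pd.read_csv("data/X_test.csv")

# Load one of the serialized models (assuming standard pickle format)
with open("models/xgb_model.pkl", "rb") as f:
    model = pickle.load(f)

# Predict churn for each customer row
predictions = model.predict(X_test)
print(predictions[:10])
```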
The `src/` folder is the main source directory where the code resides.
Not used in the MLOps project; normally it is used to list the models registered on your system:

```
src/cli/dsba_cli list
```

Use a model to predict on a file:

```
src/cli/dsba_cli predict --input /path/to/your/data/file.csv --output /path/to/your/output/file.csv --model-id your_model_id
```

An API is provided that allows you to interact with the models. You can start the API by running:
```
uvicorn api:app --reload
```
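Once the server is up, you can do a quick smoke test from Python (a minimal sketch; the exact routes and response format depend on the code in `src/api`):

```python
import requests

# Hit the API root; with the command above, uvicorn listens on http://127.0.0.1:8000 by default
response = requests.get("http://127.0.0.1:8000/")
print(response.status_code)
print(response.text)
```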
Dockerized API

To run the API in a Docker container, follow these steps:

1. Build the Docker image: `docker build -f Dockerfile.api -t fastapi .`
2. Run the Docker container: `docker run -d -p 8000:80 fastapi` (the API will be available at http://127.0.0.1:8000/)
3. Tag the image: `docker tag fastapi stanchiangtw/fastapi` (ensure Docker Desktop is logged in by using `docker login`)
4. Push the tagged image to Docker Hub: `docker push stanchiangtw/fastapi:latest`

Note: Ensure Docker is installed on your machine.

AWS ECS & EC2

We successfully implemented the deployment and scaling of Docker containers on AWS ECS. The process involved the following steps:
- Creating an ECS cluster
- Defining a Task Definition
- Configuring a Security Group
- Setting up a service
- Accessing the running service
The API was successfully running at http://13.37.241.233:8000/.
However, the service quickly exceeded the free-tier quota, resulting in costs of $12.46. Before stopping the service, we took a screenshot to document the progress made.
The `templates/` directory contains HTML templates used for rendering web views in our application. The main file, `dashboard.html`, serves as the user interface for our ML model evaluation dashboard.

The `static/` directory stores static resources like generated plots (`.png` files). The image is served directly to the client browser and does not change during runtime. The visualization image created during model evaluation is stored here, allowing it to be displayed in the dashboard.
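For context, this is roughly how FastAPI wires up templates and static files (a generic sketch, not necessarily the exact setup in `src/api`):

```python
from fastapi import FastAPI, Request
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates

app = FastAPI()

# Serve files from static/ (e.g. generated .png plots) under /static
app.mount("/static", StaticFiles(directory="static"), name="static")

# Render HTML templates such as dashboard.html from templates/
templates = Jinja2Templates(directory="templates")

@app.get("/dashboard")
async def dashboard(request: Request):
    return templates.TemplateResponse("dashboard.html", {"request": request})
```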
The `Dockerfile.api` file includes all the necessary instructions, organized in a way that ensures Docker executes them correctly.
The `requirements.txt` file contains all the necessary Python packages required to process the data, train the model, and run the API.
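As an illustration, a Dockerfile for an API like this one often looks something like the following (a hypothetical sketch, not the repository's actual `Dockerfile.api`; the container listens on port 80 to match the `docker run -p 8000:80` command above):

```dockerfile
FROM python:3.12-slim

WORKDIR /app

# Install the Python dependencies first to benefit from Docker layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY src/ .

# Start the FastAPI app with uvicorn on port 80
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "80"]
```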
The `dsba` package (under `src/`) contains the core functionality of the MLOps project, including model handling, data preprocessing, and utilities for training and predicting.
The `notebooks/` folder contains:
- `model_training_example.ipynb`: the original example notebook of the MLOps platform project
- `Bank_MLOps.ipynb`: the notebook of the ML project on which we based our MLOps project. The original code from the ML project has not been modified; it may contain some elements generated by an LLM.

To use a notebook, navigate to the `notebooks/` folder and open the file. You can use the provided utilities to train models, preprocess data, and evaluate performance.
No remarks