(Heavily based on the leggedrobotics/plr-exercise repository)
Note: In this laboratory session, we will refine your previous ML project. For reference, consider the previous mini-projects.
Prerequisites:
- A github.com account
- A computer with a GPU, or a Google account for Google Colaboratory
- An existing machine learning project
Before proceeding, you must define the development environment for your Python-based project. There are two main approaches:
- Containerization - Create a Dockerfile to define an image with all dependencies.
- Virtual Environment - Set up a virtual environment to isolate Python packages from the OS and other projects.
If you have access to a local machine with admin privileges, containerization (e.g., with Docker or Podman) is recommended.
If you are a standard user (without admin privileges) on the local machine, use a virtual environment instead:
# Create a folder for virtual environments
mkdir ~/venv
# Create the virtual environment
python3 -m venv ~/venv/mldevops
# Test the virtual environment
source ~/venv/mldevops/bin/activate
which python
# Create an alias for easier activation (edit the ~/.bashrc file)
nano ~/.bashrc
# Add the following line at the end of the file and save it
alias venv_mldevops="source ~/venv/mldevops/bin/activate"
# Reload the configuration in the current shell
source ~/.bashrc
If the local machine is not set up for project development and you cannot install new software, you can still use it as an SSH terminal to access a remote development environment.
Alternatively, you can use a web browser to access a remote IDE such as a JupyterLab instance or Google Colaboratory. Note, however, that working exclusively in Jupyter notebooks is generally not recommended: notebooks are hard to version-control, review, and test, and their hidden execution state makes results difficult to reproduce.
The following exercises can be completed using only Google Colaboratory, though this should be viewed as a temporary solution rather than a best practice.
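Whichever setup you use, you can confirm from Python that a virtual environment (such as ~/venv/mldevops above) is active by comparing sys.prefix with sys.base_prefix, as the Python venv documentation describes. The helper name in_virtualenv is our own. This is a sketch for venv-style environments only and does not detect conda environments.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points to the environment directory,
    # while sys.base_prefix still points to the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    status = "active" if in_virtualenv() else "not active"
    print(f"Virtual environment {status}: {sys.prefix}")
```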
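You will need to interact with git using commands like the following. Each ! command runs in its own shell, so a cd does not persist between commands and must be chained with &&: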
!git config --global user.email "student@student.agh.edu.pl"
!git config --global user.name "student"
!git clone https://github.com/vision-agh/mldevops_exercise.git
!cd mldevops_exercise && git status
!cd mldevops_exercise && git add filename.txt
!cd mldevops_exercise && git commit -m "message"
!cd mldevops_exercise && git push
To begin, use Git for version control on your project:
- Create a fork of this project on GitHub.
- Clone the repository via SSH:
mkdir ~/ws
cd ~/ws
git clone git@github.com:vision-agh/mldevops_exercise.git
(Replace vision-agh with your GitHub username.)
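Important: With an HTTPS clone you can pull the code, but pushing requires a personal access token, because GitHub no longer accepts account passwords for Git operations. For both pulling and pushing, use the SSH protocol. Instructions for setting up SSH keys and adding them to your GitHub account are available here.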
- Copy your previous ML project files (.py and .ipynb) to the root of the cloned repository.
cd ~/ws/mldevops_exercise
# Copy your project files here
- Commit and push your changes to the origin (i.e., GitHub) repository.
# The dot adds all files. Note that this is not typical practice: check git status first to avoid committing unwanted files.
git add .
git commit -m "initial project commit"
git push
- Enable the Issues feature in your GitHub repository:
  - Open https://github.com/GITHUB_USERNAME/mldevops_exercise.
  - Go to Settings on the right.
  - Enable the Issues feature (the other features can be disabled).
- Secure your default branch (main/master) from accidental commits:
  - Open https://github.com/GITHUB_USERNAME/mldevops_exercise.
  - Go to Settings on the right.
  - Select Branches in the left menu.
  - Click the Add branch ruleset button.
  - Name the ruleset "default", set the enforcement status to "enabled", and configure the targets with the Add target dropdown by selecting the "Include default branch" option.
  - For the options, enable the following: Restrict deletions, Require a pull request before merging, and Block force pushes.
  - Finish by clicking the Create button.
After completing the setup, you are ready to proceed with the exercises.
- For each task, create a branch named feature/task_X.
- Commit all changes (and only those changes) related to the specific task to its branch, then push them to GitHub.
- To complete a task, create a pull request (PR) from feature/task_X to main. Set the PR title to the task description (see below).
- Do not delete the branches after merging the PR.
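A regular expression is enough to check the branch naming convention automatically, for example from a local script run before pushing. The helper below is hypothetical and not part of the exercise repository. It assumes task numbers are positive integers without leading zeros.

```python
import re

# Hypothetical pattern for the convention above: "feature/task_" followed by a task number.
TASK_BRANCH = re.compile(r"feature/task_[1-9][0-9]*")


def is_task_branch(name: str) -> bool:
    """Return True if the branch name follows the feature/task_X convention."""
    return TASK_BRANCH.fullmatch(name) is not None


if __name__ == "__main__":
    for branch in ["feature/task_1", "feature/task_6", "main", "feature/task1"]:
        print(f"{branch}: {is_task_branch(branch)}")
```

You can get the current branch name with git rev-parse --abbrev-ref HEAD and pass it to is_task_branch.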
Tasks:
- Task 1: Improve formatting using black.
- Task 2: Set up Pre-commit to automate formatting.
- Task 3: Create a Python package for your project.
- Task 4: Add an online logging framework.
- Task 5: Use Optuna to perform hyperparameter search.
- Task 6: Add docstrings and type annotations to every Python file.
Your first task is to install black and format the code. For more information, visit: https://github.com/psf/black.
pip3 install black
black --line-length 120 ~/ws/mldevops_exercise
Now everything should look well-formatted.
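Black changes only layout, not behavior. By default it verifies that the reformatted code still parses to an equivalent abstract syntax tree (AST), and the --fast flag skips this check. The sketch below illustrates the idea with the standard ast module. same_ast is a hypothetical helper and is a simplification: Black's own check applies some extra normalization.

```python
import ast


def same_ast(before: str, after: str) -> bool:
    """Return True if two pieces of source code parse to the same AST."""
    # ast.dump() omits line and column attributes by default, so changes that
    # affect only layout (spacing, quote style, line breaks) compare as equal.
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


original = "x = {  'a':37,'b':42}\n"
formatted = 'x = {"a": 37, "b": 42}\n'
print(same_ast(original, formatted))  # True
print(same_ast("x = 1\n", "x = 2\n"))  # False
```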
Running black manually works, but relying on your memory before every commit is unreliable. Pre-commit automates this step.
Begin by following the official quick-setup guide.
pip3 install pre-commit
pre-commit --version
The repository already contains a prepared .pre-commit-config.yaml file. Inspect it: this file holds the configuration that Pre-commit uses.
Next, register the pre-commit command as a Git hook, which will automatically run each time you use git commit:
# Register the hook
pre-commit install
# Run the pre-commit on all files
pre-commit run --all-files
You may notice many changes, particularly to whitespace in your code. It should now appear much cleaner (at least from Git's perspective).
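Whitespace hooks are simple programs. pre-commit passes them the names of the files to check. It treats a non-zero exit code, or files modified by the hook, as a failure, which stops the commit so you can review the changes. The following is a simplified, hypothetical sketch in the spirit of the trailing-whitespace hook from the pre-commit-hooks project, not its actual implementation.

```python
import sys
from pathlib import Path


def strip_trailing_whitespace(path: Path) -> bool:
    """Remove trailing spaces and tabs from each line; return True if the file changed."""
    original = path.read_bytes()
    cleaned = []
    # Work on bytes and keep line endings so "\n" and "\r\n" files are preserved.
    for line in original.splitlines(keepends=True):
        body = line.rstrip(b"\r\n")
        ending = line[len(body):]
        cleaned.append(body.rstrip(b" \t") + ending)
    result = b"".join(cleaned)
    if result == original:
        return False
    path.write_bytes(result)
    return True


def main(filenames: list[str]) -> int:
    """Fix the given files and return the exit code expected by pre-commit."""
    changed = [name for name in filenames if strip_trailing_whitespace(Path(name))]
    for name in changed:
        print(f"Fixing {name}")
    # Non-zero tells pre-commit that the hook failed (here: files were modified).
    return 1 if changed else 0


if __name__ == "__main__":
    # pre-commit passes the staged file names as command-line arguments.
    sys.exit(main(sys.argv[1:]))
```

To experiment with it, you could register such a script as a local hook (repo: local) in .pre-commit-config.yaml. The script's file name is up to you.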
To extend the automation further, add hooks for black (formatting), codespell (fixing common misspellings), and pyupgrade (upgrading syntax to Python 3.10+) to the repos list in .pre-commit-config.yaml, indented to match the existing entries:
# Black formatter
- repo: https://github.com/psf/black
  rev: 24.4.0
  hooks:
    - id: black
      args: ["--line-length=120"]
# Codespell - fix common misspellings in text files
- repo: https://github.com/codespell-project/codespell
  rev: v2.2.6
  hooks:
    - id: codespell
      args: [--write-changes]
# Pyupgrade - automatically upgrade syntax for newer versions of the language
- repo: https://github.com/asottile/pyupgrade
  rev: v3.15.2
  hooks:
    - id: pyupgrade
      args: [--py310-plus]
Two widely used tools for dependency management and package building in Python are setuptools and poetry. As a rule of thumb, if your project requires complex builds (e.g., Python bindings for dynamic C/C++ libraries), setuptools is a suitable choice. For many modern Python projects, however, poetry offers simpler configuration, which makes both dependency management (and, in turn, Dockerfiles) and package building easier.
Explore the poetry Introduction and Basic Usage to set up dependency management and enable package building.
python3 -m pip install --user pipx
python3 -m pipx ensurepath
pipx install poetry
Remember to update the .gitignore file to exclude files that should not be tracked by Git (e.g., build outputs such as dist/ and local virtual environments).
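By default, poetry install also installs your project itself into the environment in editable mode, so it can be imported like any other package. The sketch below uses the standard importlib.metadata module to check which version of a distribution is installed. The distribution name mldevops-exercise in the usage line is hypothetical and should match the name in your pyproject.toml.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str) -> str | None:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Hypothetical distribution name; use the name from your pyproject.toml.
    print(installed_version("mldevops-exercise"))
```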
Add the Weights & Biases (wandb) logger to track and visualize your experiments.
pip3 install wandb
- Follow the official wandb guide.
- Log training_loss, validation_loss, and your code as an artifact.
- Capture a screenshot of a run showing the loss curve and the uploaded artifact.
- Commit this screenshot to the repository.
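One way to keep the training code independent of the logging framework is to pass the logging function in as a parameter. The sketch below uses a hypothetical train function with placeholder loss values; in your project, replace them with the real training and validation steps. With wandb, you would pass run.log from wandb.init(), and run.log_code() uploads the source files as a code artifact, as shown in the comments. The project name there is an assumption.

```python
import math
from collections.abc import Callable

# Any callable that accepts a dict of metrics works as a logger,
# e.g. wandb's run.log, or list.append in tests.
LogFn = Callable[[dict[str, float]], None]


def train(epochs: int, log: LogFn) -> None:
    """Hypothetical training loop that reports losses once per epoch."""
    for epoch in range(epochs):
        # Placeholder values; replace with your real training and validation losses.
        training_loss = math.exp(-0.5 * epoch)
        validation_loss = training_loss + 0.1
        log({"epoch": epoch, "training_loss": training_loss, "validation_loss": validation_loss})


# Usage with Weights & Biases (requires pip3 install wandb and a logged-in account):
# import wandb
# run = wandb.init(project="mldevops_exercise")  # project name is an assumption
# run.log_code(".")  # save the source code as a "code" artifact
# train(epochs=10, log=run.log)
# run.finish()

if __name__ == "__main__":
    history: list[dict[str, float]] = []
    train(epochs=3, log=history.append)
    print(history[-1])
```

Because the loop only sees a callable, you can test it without a network connection or a wandb account.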
Use optuna to find the best hyperparameters (e.g., learning rate or epochs).
pip3 install optuna
Refer to the official examples and conduct a hyperparameter search. Here is a small example:
import optuna
def objective(trial):
    # Sample x uniformly from [-10, 10]; Optuna minimizes the returned value by default.
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2
study = optuna.create_study()
study.optimize(objective, n_trials=100)
print(study.best_params)  # E.g. {'x': 2.002108042}
- Add docstrings to all public or non-trivial classes and functions. Refer to PEP 257 (Docstring Conventions) for guidance: https://peps.python.org/pep-0257/
- Add type annotations to every function and method in your project. This article provides a good introduction: https://realpython.com/python-type-checking/#hello-types
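For illustration, here is a small function with type annotations and a docstring. PEP 257 defines the general conventions: a one-line summary, then a blank line and details. The Args/Returns/Raises sections follow the Google docstring style, which is one common choice rather than a requirement. The accuracy function itself is a hypothetical example.

```python
from collections.abc import Sequence


def accuracy(predictions: Sequence[int], targets: Sequence[int]) -> float:
    """Return the fraction of predictions that match the targets.

    Args:
        predictions: Predicted class indices.
        targets: Ground-truth class indices, same length as predictions.

    Returns:
        Accuracy in the range [0.0, 1.0].

    Raises:
        ValueError: If the sequences are empty or differ in length.
    """
    if not targets or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and of equal length")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


if __name__ == "__main__":
    print(accuracy([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
```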
- Linting
- Automatic testing
- Automation (GitHub Actions)
- CI/CD (Continuous Integration/Continuous Delivery)
- github.com/leggedrobotics/plr-exercise by @JonasFrey96