Teleinfrastructure-Research-Lab/smol
smol is a simple, lightweight, and non-intrusive declarative MLOps framework designed to help researchers develop reproducible ML models without disrupting their established workflows. All smol functionality is additive: users can adopt only the features they need without restructuring their codebase. smol enhances reproducibility through text-based declarative experiment definitions that track every experiment's parameters, following the principle that an experiment is defined by its data input, code, and hyperparameters. smol is compatible with Git and DVC; using them alongside smol is recommended (but optional) to ensure reproducibility.

Version 1.0.0 allows for:

Configuration

  • YAML config parsing with custom tags:

    • !include - Includes the contents of another YAML file as a sub-dict under this key.
    • !root_include - Includes the contents of another YAML file as root-level keys.
    • !tuple - Converts a sequence into a Python tuple.
    • !link - A string in the format "key1->key2->key3...", pointing to another key in the parsed config. Links are resolved at the end of YAML parsing, so they can point into included files. A link cannot point to another link; doing so leaves a ConfigLink object as the value under the first link's key in the parsed config dict.
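Taken together, the tags might be used like this (the file name, keys, and exact tag placement below are illustrative assumptions, not smol's documented layout; !root_include is omitted because its key syntax is not shown here):

```yaml
# config/smol_config.yaml (illustrative sketch)
logs_dir: smol_tests/logs
model: !include config/model.yaml        # contents become a sub-dict under "model"
input_shape: !tuple [3, 224, 224]        # parsed as the Python tuple (3, 224, 224)
train_input_shape: !link "input_shape"   # resolved to the value of "input_shape" after parsing
```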
  • The YAML config files are parsed when from smol.core import smol is executed. By default it loads config/smol_config.yaml, into which the other "custom" config files are included with !include. The path to the default config can be changed by:

from smol import variables
variables.CONFIG_PATH = "/new/path/to/config.yaml"
from smol.core import smol
  • YAML config access with smol.get_config(). It takes a list of arguments that are treated as nested keys for the config dictionary. Example:
#config/smol_config.yaml
logs_dir: smol_tests/logs
paths:
  ARCHITECTURES_PATH: smol_tests/architectures
  LOSSES_PATH: smol_tests/losses
  DATA_LOADERS_PATH: smol_tests/data_loaders
from smol.core import smol

arch_path = smol.get_config("paths", "ARCHITECTURES_PATH") # returns smol_tests/architectures
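Independently of smol's actual implementation, the nested lookup that smol.get_config() performs can be sketched in plain Python (the function below is a stand-in, not smol's code):

```python
from functools import reduce

def get_config(config, *keys):
    """Walk a nested dict by successive keys,
    e.g. get_config(cfg, "paths", "ARCHITECTURES_PATH")."""
    return reduce(lambda node, key: node[key], keys, config)

cfg = {
    "logs_dir": "smol_tests/logs",
    "paths": {
        "ARCHITECTURES_PATH": "smol_tests/architectures",
        "LOSSES_PATH": "smol_tests/losses",
    },
}

print(get_config(cfg, "paths", "ARCHITECTURES_PATH"))  # smol_tests/architectures
```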
  • smol expects a minimum structure of smol_config.yaml:
#config/smol_config.yaml
logs_dir: /path/to/log/dir
paths:
  ARCHITECTURES_PATH: /path/to/architectures/dir
  LOSSES_PATH: /path/to/losses/dir
  DATA_LOADERS_PATH: /path/to/dataloaders/dir

Additionally, a log_level entry can be provided to specify the logging level.

  • Additional YAML files can be parsed and appended to the global config at runtime with smol.add_runtime_configs(), which takes a list of paths to YAML files as its argument. These files also support the custom tags described above.
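How the appended files end up in the global config is not specified here; a recursive dict merge is one plausible model (this is a sketch under that assumption, not smol's actual merge logic):

```python
def deep_merge(base, extra):
    """Recursively merge `extra` into `base`; later values win on conflicts."""
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_merge(base[key], value)
        else:
            base[key] = value
    return base

global_config = {"paths": {"LOSSES_PATH": "smol_tests/losses"}}
runtime = {"paths": {"DATA_LOADERS_PATH": "smol_tests/data_loaders"}, "epochs": 10}
deep_merge(global_config, runtime)
```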

Logging

  • A logger is initialized on from smol.core import smol. This logger can be accessed at smol.logger.

  • The logger has both a file handler and a console handler.

  • On creation, the logger creates a log file in the logs_dir specified in smol_config.yaml. The log level defaults to INFO if log_level is not specified in the config.

  • Errors and messages generated by smol's internal functionality, as well as user-generated messages, are written to the log file and printed to the console.
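A logger with that shape (file handler in logs_dir plus console handler, level defaulting to INFO) can be sketched with the standard library; the function and logger name here are illustrative, not smol's internals:

```python
import logging
import os
import sys
import tempfile

def make_logger(logs_dir, log_level="INFO"):
    """Build a logger with a file handler in logs_dir and a console handler."""
    os.makedirs(logs_dir, exist_ok=True)
    logger = logging.getLogger("smol_sketch")
    logger.setLevel(getattr(logging, log_level))  # INFO unless overridden
    logger.addHandler(logging.FileHandler(os.path.join(logs_dir, "run.log")))
    logger.addHandler(logging.StreamHandler(sys.stdout))
    return logger

logger = make_logger(tempfile.mkdtemp())
logger.info("hello from the sketch logger")
```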

Registers

  • smol currently supports 3 types of registers: data loaders, architectures, and losses.

  • The aim of these registers is to let the user label and declare the properties of a block (data loader, architecture, or loss) they have written, making the block discoverable and usable by smol, so that pipelines can be built from these blocks in a textual format like YAML. Since data loaders, architectures, and losses are the building blocks of machine learning and model-training pipelines, combining them lets the user tackle a wide range of tasks in a declarative and reproducible manner.

  • Blocks can be registered with the @smol.register_dl(), @smol.register_architecture(), and @smol.register_loss() decorators. Then, to import all registered blocks and make them accessible to smol, call smol.register() in the main script.
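The decorator-based registration pattern described above can be illustrated with a minimal self-contained registry; the names REGISTRY, register, and mse are illustrative, not smol's internals:

```python
# One dict per register kind, mirroring smol's three registers.
REGISTRY = {"data_loaders": {}, "architectures": {}, "losses": {}}

def register(kind, name=None):
    """Decorator factory: record the decorated object under REGISTRY[kind]."""
    def decorator(obj):
        REGISTRY[kind][name or obj.__name__] = obj
        return obj
    return decorator

@register("losses")
def mse(pred, target):
    """Mean squared error over two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

loss_fn = REGISTRY["losses"]["mse"]
print(loss_fn([1.0, 2.0], [1.0, 0.0]))  # 2.0
```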

Tasks

Trainer - BasicRuntimeConfigTrainer

  • Allows the user to train a model using their registered data loader, architecture and loss.
  • The user creates an experiment definition in the form of a YAML file, which is appended to the global smol config with smol.add_runtime_configs(["experiment_def.yaml"]).
  • The trainer is created by trainer = BasicRuntimeConfigTrainer(exp_name, outp), where exp_name is the name of the experiment-definition YAML file (without the extension) and outp is the path to the output directory where the run's output files are saved.
  • Several callbacks can be attached to the trainer to call on every batch in training, every batch in validation, every epoch, and start/end of the run.
  • The experiment run is started with trainer.run().
  • The trainer saves a checkpoint of the model every epoch in the sckp format, which can be opened as a .zip file.
  • The trainer saves an srun file containing metadata about the run, such as hyperparameters and best metrics (its contents are in JSON format).
  • The trainer saves a copy of the experiment definition it was run with in the sxpd format (its contents are in YAML format).
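The run artifacts described above (a zip-style checkpoint plus JSON metadata) could be produced along these lines; the file names, archive entries, and metadata fields below are illustrative assumptions, not smol's actual formats:

```python
import json
import os
import tempfile
import zipfile

out = tempfile.mkdtemp()

# Checkpoint as a zip archive (sckp files can be opened as .zip).
ckpt_path = os.path.join(out, "epoch_001.sckp")
with zipfile.ZipFile(ckpt_path, "w") as zf:
    zf.writestr("weights.bin", b"\x00\x01\x02")  # placeholder payload

# Run metadata as JSON (the srun file holds hyperparameters and best metrics).
srun_path = os.path.join(out, "run.srun")
with open(srun_path, "w") as f:
    json.dump({"hyperparameters": {"lr": 1e-3}, "best_val_loss": 0.42}, f)
```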

Version 1.5.0 should:

  1. Write unit and E2E tests for all smol functionality.
  2. Add metadata to pyproject.toml and a proper description to the README.
  3. Write documentation.
  4. Add docstrings to the code.

Version 2.0.0 should:

  1. Register trained models (by branching).
  2. CLI:
     - Print data about all experiments in a folder.
     - Clone an experiment.
     - Run tasks from the CLI (like the trainer).
     - ...
  3. Deploy model to executable.
