Teleinfrastructure-Research-Lab/smol
smol is a simple, lightweight, and non-intrusive declarative MLOps framework designed to help researchers develop reproducible ML models without disrupting their established workflows. All smol functionality is additive: users can adopt only the features they need without restructuring their codebase. smol enhances reproducibility through text-based declarative experiment definitions that track every experiment's parameters, following the principle that an experiment is defined by its data input, code, and hyperparameters. smol is compatible with Git and DVC; using them alongside smol is recommended (but optional) to ensure reproducibility.

Version 1.0.0 allows for:

Configuration

  • YAML config parsing with custom tags:

    • !include - Includes the contents of another YAML file as a sub-dict under this key.
    • !root_include - Includes the contents of another YAML file as root-level keys.
    • !tuple - Converts a sequence into a Python tuple.
    • !link - A string in the format "key1->key2->key3...", pointing to another key in the parsed config. Links are resolved at the end of YAML parsing, so they can point into included files. A link cannot point to another link; doing so leaves a ConfigLink object as the value under the first link's key in the parsed config dict.
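Taken together, the tags might be used like this (the file name, keys, and exact tag placement below are illustrative assumptions, not smol's documented layout; !root_include is omitted because its key syntax is not shown here):

```yaml
# config/smol_config.yaml (illustrative sketch)
logs_dir: smol_tests/logs
model: !include config/model.yaml        # contents become a sub-dict under "model"
input_shape: !tuple [3, 224, 224]        # parsed as the Python tuple (3, 224, 224)
train_input_shape: !link "input_shape"   # resolved to the value of "input_shape" after parsing
```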
  • The YAML config files are parsed when from smol.core import smol is executed. By default it loads config/smol_config.yaml, into which the other "custom" config files are included with !include. The path to the default config can be changed by:

from smol import variables
variables.CONFIG_PATH = "/new/path/to/config.yaml"
from smol.core import smol
  • YAML config access with smol.get_config(). It takes a list of arguments that are treated as nested keys for the config dictionary. Example:
#config/smol_config.yaml
logs_dir: smol_tests/logs
paths:
  ARCHITECTURES_PATH: smol_tests/architectures
  LOSSES_PATH: smol_tests/losses
  DATA_LOADERS_PATH: smol_tests/data_loaders
from smol.core import smol

arch_path = smol.get_config("paths", "ARCHITECTURES_PATH") # returns smol_tests/architectures
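Independently of smol's actual implementation, the nested lookup that smol.get_config() performs can be sketched in plain Python (the function below is a stand-in, not smol's code):

```python
from functools import reduce

def get_config(config, *keys):
    """Walk a nested dict by successive keys,
    e.g. get_config(cfg, "paths", "ARCHITECTURES_PATH")."""
    return reduce(lambda node, key: node[key], keys, config)

cfg = {
    "logs_dir": "smol_tests/logs",
    "paths": {
        "ARCHITECTURES_PATH": "smol_tests/architectures",
        "LOSSES_PATH": "smol_tests/losses",
    },
}

print(get_config(cfg, "paths", "ARCHITECTURES_PATH"))  # smol_tests/architectures
```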
  • smol expects a minimum structure of smol_config.yaml:
#config/smol_config.yaml
logs_dir: /path/to/log/dir
paths:
  ARCHITECTURES_PATH: /path/to/architectures/dir
  LOSSES_PATH: /path/to/losses/dir
  DATA_LOADERS_PATH: /path/to/dataloaders/dir

Additionally, a log_level entry can be provided to specify the logging level.

  • Additional YAML files can be parsed and appended to the global config at runtime with smol.add_runtime_configs(), which takes a list of paths to YAML files as its argument. These files also support the custom tags described above.
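How the appended files end up in the global config is not specified here; a recursive dict merge is one plausible model (this is a sketch under that assumption, not smol's actual merge logic):

```python
def deep_merge(base, extra):
    """Recursively merge `extra` into `base`; later values win on conflicts."""
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_merge(base[key], value)
        else:
            base[key] = value
    return base

global_config = {"paths": {"LOSSES_PATH": "smol_tests/losses"}}
runtime = {"paths": {"DATA_LOADERS_PATH": "smol_tests/data_loaders"}, "epochs": 10}
deep_merge(global_config, runtime)
```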

Logging

  • A logger is initialized on from smol.core import smol. This logger can be accessed at smol.logger.

  • The logger has both a file handler and a console handler.

  • On creation, the logger creates a log file in the logs_dir specified in smol_config.yaml. The log level defaults to INFO if log_level is not specified in the config.

  • Errors and messages generated by smol's internal functionality, as well as user-generated messages, are written to the log file and printed to the console.
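A logger with that shape (file handler in logs_dir plus console handler, level defaulting to INFO) can be sketched with the standard library; the function and logger name here are illustrative, not smol's internals:

```python
import logging
import os
import sys
import tempfile

def make_logger(logs_dir, log_level="INFO"):
    """Build a logger with a file handler in logs_dir and a console handler."""
    os.makedirs(logs_dir, exist_ok=True)
    logger = logging.getLogger("smol_sketch")
    logger.setLevel(getattr(logging, log_level))  # INFO unless overridden
    logger.addHandler(logging.FileHandler(os.path.join(logs_dir, "run.log")))
    logger.addHandler(logging.StreamHandler(sys.stdout))
    return logger

logger = make_logger(tempfile.mkdtemp())
logger.info("hello from the sketch logger")
```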

Registers

  • smol currently supports 3 types of registers: data loaders, architectures, and losses.

  • The aim of these registers is to let the user label and declare the properties of a block (data loader, architecture, or loss) they have written, making the block discoverable and usable by smol, so that pipelines can be built from these blocks in a textual format like YAML. Since data loaders, architectures, and losses are the building blocks of machine learning and model-training pipelines, combining them lets the user tackle a wide range of tasks in a declarative and reproducible manner.

  • Blocks can be registered with the @smol.register_dl(), @smol.register_architecture(), and @smol.register_loss() decorators. Then, to import all registered blocks and make them accessible to smol, call smol.register() in the main script.
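The decorator-based registration pattern described above can be illustrated with a minimal self-contained registry; the names REGISTRY, register, and mse are illustrative, not smol's internals:

```python
# One dict per register kind, mirroring smol's three registers.
REGISTRY = {"data_loaders": {}, "architectures": {}, "losses": {}}

def register(kind, name=None):
    """Decorator factory: record the decorated object under REGISTRY[kind]."""
    def decorator(obj):
        REGISTRY[kind][name or obj.__name__] = obj
        return obj
    return decorator

@register("losses")
def mse(pred, target):
    """Mean squared error over two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

loss_fn = REGISTRY["losses"]["mse"]
print(loss_fn([1.0, 2.0], [1.0, 0.0]))  # 2.0
```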

Tasks

Trainer - BasicRuntimeConfigTrainer

  • Allows the user to train a model using their registered data loader, architecture and loss.
  • The user creates an experiment definition in the form of a YAML file, which is appended to the global smol config with smol.add_runtime_configs(["experiment_def.yaml"]).
  • The trainer is created by trainer = BasicRuntimeConfigTrainer(exp_name, outp), where exp_name is the name of the experiment-definition YAML file (without the extension) and outp is the path to the output directory where the run's output files are saved.
  • Several callbacks can be attached to the trainer to call on every batch in training, every batch in validation, every epoch, and start/end of the run.
  • The experiment run is started with trainer.run().
  • The trainer saves a checkpoint of the model every epoch in the sckp format, which can be opened as a .zip file.
  • The trainer saves an srun file containing metadata about the run, such as hyperparameters and best metrics (its contents are in JSON format).
  • The trainer saves a copy of the experiment definition it was run with in the sxpd format (its contents are in YAML format).
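The run artifacts described above (a zip-style checkpoint plus JSON metadata) could be produced along these lines; the file names, archive entries, and metadata fields below are illustrative assumptions, not smol's actual formats:

```python
import json
import os
import tempfile
import zipfile

out = tempfile.mkdtemp()

# Checkpoint as a zip archive (sckp files can be opened as .zip).
ckpt_path = os.path.join(out, "epoch_001.sckp")
with zipfile.ZipFile(ckpt_path, "w") as zf:
    zf.writestr("weights.bin", b"\x00\x01\x02")  # placeholder payload

# Run metadata as JSON (the srun file holds hyperparameters and best metrics).
srun_path = os.path.join(out, "run.srun")
with open(srun_path, "w") as f:
    json.dump({"hyperparameters": {"lr": 1e-3}, "best_val_loss": 0.42}, f)
```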

Version 1.5.0 should:

  1. Write unit and E2E tests for all smol functionality.
  2. Add metadata to pyproject.toml and a proper description to the README.
  3. Write documentation.
  4. Add docstrings to the code.

Version 2.0.0 should:

  1. Register trained models (by branching).
  2. CLI:
     - Print data about all experiments in a folder.
     - Clone an experiment.
     - Run tasks from the CLI (like the trainer).
     - ...
  3. Deploy model to executable.
