smol is a simple, lightweight, and non-intrusive declarative MLOps framework designed to facilitate the development of reproducible ML models without interfering with the established workflows of researchers. All smol functionalities are additive, allowing users to adopt only the features they need without restructuring their codebase. It enhances reproducibility by tracking every experiment's parameters in text-based declarative experiment definitions. smol's experiment tracking is based on the principle that an experiment is defined by its data input, code, and hyperparameters. smol is compatible with Git and DVC, and using them alongside smol is recommended (but optional) to ensure reproducibility.
- YAML config parsing with custom tags:
  - `!include` - Include the contents of another yaml file as a sub-dict under this key.
  - `!root_include` - Include the contents of another yaml file as root-level keys.
  - `!tuple` - Converts a sequence into a Python tuple.
  - `!link` - A string in the format "key1->key2->key3...", pointing to another key in the parsed config. Links are resolved at the end of yaml file parsing, so they can point to included files. A link cannot point to another link; this will cause the parsed config dict to contain a `ConfigLink` object as the value under the first link key.
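  A minimal sketch of how these tags might be combined (the file names and keys are hypothetical, and the key placement for `!root_include` is an assumption):

  ```yaml
  # config/smol_config.yaml
  model: !include config/model.yaml        # contents of model.yaml become a sub-dict under "model"
  extra: !root_include config/paths.yaml   # contents of paths.yaml appear as root-level keys
  input_shape: !tuple [3, 224, 224]        # parsed as the Python tuple (3, 224, 224)
  lr: !link model->optimizer->lr           # resolves to config["model"]["optimizer"]["lr"]
  ```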
- The YAML config files are parsed on `from smol.core import smol`. By default it loads `config/smol_config.yaml`, where the other "custom" config files are included with `!include`. The path to the default config can be changed by:

  ```python
  from smol import variables
  variables.CONFIG_PATH = "/new/path/to/config.yaml"
  from smol.core import smol
  ```

- YAML config access with `smol.get_config()`. It takes a list of arguments that are treated as nested keys for the config dictionary. Example:
  ```yaml
  # config/smol_config.yaml
  logs_dir: smol_tests/logs
  paths:
    ARCHITECTURES_PATH: smol_tests/architectures
    LOSSES_PATH: smol_tests/losses
    DATA_LOADERS_PATH: smol_tests/data_loaders
  ```

  ```python
  from smol.core import smol
  arch_path = smol.get_config("paths", "ARCHITECTURES_PATH")  # returns smol_tests/architectures
  ```

- smol expects a minimum structure of smol_config.yaml:
  ```yaml
  # config/smol_config.yaml
  logs_dir: /path/to/log/dir
  paths:
    ARCHITECTURES_PATH: /path/to/architectures/dir
    LOSSES_PATH: /path/to/losses/dir
    DATA_LOADERS_PATH: /path/to/dataloaders/dir
  ```

  Additionally, a `log_level` entry can be provided to specify the level of logging.
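  For example, assuming `log_level` sits at the root of the config and takes standard Python logging level names:

  ```yaml
  # config/smol_config.yaml
  log_level: DEBUG
  ```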
- Additional yaml files can be parsed and appended to the global config at runtime with `smol.add_runtime_configs()`. It takes as an argument a list of paths to yaml files. These files also support the custom tags mentioned above.
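  A minimal usage sketch (the file names and the `augmentations` key are hypothetical):

  ```python
  from smol.core import smol

  # Append two extra yaml files to the already-parsed global config.
  smol.add_runtime_configs(["config/experiment_a.yaml", "config/augmentations.yaml"])
  aug_cfg = smol.get_config("augmentations")  # keys from runtime configs are now accessible
  ```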
- A logger is initialized on `from smol.core import smol`. This logger can be accessed at `smol.logger`.
- The logger has a file handler as well as a console handler.
- On creation, the logger creates a log file in the `logs_dir` specified in smol_config.yaml. The log level defaults to `INFO` if `log_level` is not specified in the config.
- Both messages generated by smol's internal functionalities (errors included) and user-generated messages are written to the log file and printed to the console.
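  A short sketch of user-side logging (assuming `smol.logger` behaves like a standard Python `logging.Logger`):

  ```python
  from smol.core import smol

  # These messages go to both the console and the log file in logs_dir.
  smol.logger.info("Starting data preprocessing")
  smol.logger.warning("Validation set is small; metrics may be noisy")
  ```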
- smol currently supports 3 types of registers: Data loaders, Architectures, Losses.
- The aim of these 3 registers is to let the user label and declare the properties of a block (data loader, architecture, loss) they wrote the code for. This makes the block discoverable and usable by smol, so the user can build pipelines from their blocks in a textual format like yaml. Since data loaders, architectures, and losses are the building blocks of machine learning and model training pipelines, combining them lets the user tackle a wide range of tasks in a declarative and reproducible manner.
- Blocks can be registered with the `@smol.register_dl()`, `@smol.register_architecture()` and `@smol.register_loss()` decorators. Then, to import all registered blocks and make them accessible to smol, call `smol.register()` in the main script.
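  As an illustration, a hypothetical architecture block could be registered like this (the decorator arguments, module layout, and class are assumptions; the file is assumed to live under `ARCHITECTURES_PATH`):

  ```python
  # smol_tests/architectures/my_net.py
  from smol.core import smol

  @smol.register_architecture()
  class MyNet:
      """A minimal architecture block, now discoverable by smol."""
      def __init__(self, hidden_size: int = 128):
          self.hidden_size = hidden_size
  ```

  In the main script, `smol.register()` then imports this module along with every other registered block, so the architecture can be referenced from yaml experiment definitions.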
- Allows the user to train a model using their registered data loader, architecture and loss.
- The user creates an experiment definition in the form of a yaml file, which is appended to the global smol config with `smol.add_runtime_configs(["experiment_def.yaml"])`.
- The trainer is created by `trainer = BasicRuntimeConfigTrainer(exp_name, outp)`, where `exp_name` is the name of the experiment definition yaml file (without the extension), and `outp` is the path to the output directory where the run output files will be saved.
- Several callbacks can be attached to the trainer; they are called on every batch in training, every batch in validation, every epoch, and at the start/end of the run.
- The experiment run is started with `trainer.run()` (see the sketch after this list).
- The trainer saves checkpoints of the model on every epoch in the sckp format, which can be opened as a .zip file.
- The trainer saves a srun file which contains metadata about the run, such as hyperparameters and best metrics (its contents are in json format).
- The trainer saves a copy of the experiment definition it was run with in the sxpd format (its contents are in yaml format).
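Putting the pieces together, a hypothetical training script could look as follows (the import path of `BasicRuntimeConfigTrainer` and the file names are assumptions):

```python
from smol.core import smol
from smol.trainer import BasicRuntimeConfigTrainer  # import path is an assumption

smol.register()  # import all registered blocks and make them accessible to smol
smol.add_runtime_configs(["experiment_def.yaml"])  # append the experiment definition

# exp_name is the definition file name without extension; outp is the output directory.
trainer = BasicRuntimeConfigTrainer("experiment_def", "runs/experiment_def")
trainer.run()  # saves sckp checkpoints, a srun metadata file, and a sxpd definition copy
```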
- Write unit and E2E tests for all smol functionalities.
- Add metadata to pyproject.toml and write a good description in the README.
- Docs.
- Docstrings in the code.
- Register trained models (by branching).
- CLI:
  - Print data about all experiments in a folder.
  - Clone an experiment.
  - Run tasks from the CLI (like the trainer).
  - ...
- Deploy model to executable.