8 changes: 5 additions & 3 deletions README.md
@@ -6,8 +6,8 @@

This package is intended for running and benchmarking optimization algorithms in Pytorch. It could be used for

-* retrieving training curves for standard methods (SGD, Adam) on standard benchmark problems (e.g. training Resnets for Cifar),
-* testing new methods
+* retrieving training curves for standard methods (SGD, Adam) on standard benchmark problems (e.g. training a ResNet on CIFAR10/100),
+* testing and benchmarking new optimization algorithms


## Getting started
@@ -22,7 +22,9 @@ or in order to install in developer mode via

## Results

-For the experiments we ran, we provide the code that generated the results (i.e. the model, dataset preprocessing and training setup) as well as the actual scores at the end of each epoch. An overview and all links are given in the table below.
+Currently, the repo implements some standard image classification architectures (ResNet, VGG, ViT) and datasets (MNIST, CIFAR, ImageNet). It also contains a toy language-model training setup (character-level Shakespeare dataset).
+
+The table below gives an overview of the experiments for which we provide the code that generated the results (i.e. the model, dataset preprocessing, and training setup) as well as the actual scores at the end of each epoch, together with links to all results.

| ID | Model | Dataset | Results |
|-----|--------|----------|----------|
18 changes: 18 additions & 0 deletions configs/README.md
@@ -1,3 +1,21 @@
## Test examples

The simplest example to run is

```
python run.py --id test --verbose
```

To train a Llama-style transformer on the character-level Shakespeare dataset, run

```
python run.py -i test_shakespeare --device mps --verbose
```

This config also shows how to use stepwise learning-rate schedulers, for example for warmup.

On an Apple M3 Pro, using the ``mps`` device, this should take roughly five minutes and reach a train loss of around 1.3-1.4 after 10 epochs.
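
For readers unfamiliar with per-step scheduling, here is a minimal sketch of what linear warmup followed by a constant learning rate looks like in plain PyTorch. It only mirrors the ``warmup_steps``/``stepwise_schedule`` options from [test_shakespeare.json](test_shakespeare.json), assuming "stepwise" means the schedule is advanced every optimizer step rather than every epoch; the toy model and training loop are placeholders, not this package's own scheduler implementation.

```python
import torch

# Stand-ins for the model and optimizer that the config would normally build.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

warmup_steps = 100  # mirrors "warmup_steps": 100 in test_shakespeare.json

# Linear warmup from ~0 to the base lr over the first 100 optimizer steps,
# then a constant lr (corresponding to "lr_schedule": "constant").
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps)
)

for step in range(300):
    x, y = torch.randn(8, 10), torch.randn(8, 2)
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the schedule every step, not every epoch
```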

## Remarks on config management

1) The simple option: Create a dict-type config (e.g. like [test.json](test.json)). The file name (in this example we use ``my_exp.json``) will serve as an identifier ``exp_id`` in the next steps. You can then run all entries of the config with one job.
12 changes: 12 additions & 0 deletions configs/test_shakespeare.json
@@ -0,0 +1,12 @@
{
"batch_size": 16,
"dataset": "shakespeare",
"dataset_kwargs": {},
"max_epoch": 10,
"model": "llama",
"model_kwargs": {"vocab_size": 92, "dim": 384, "expand": 4, "n_layers": 3, "n_heads": 2, "mlp": "mlp", "seq_len": 512},
"opt": [{"name": "adam", "lr": [1e-3], "lr_schedule": "constant", "warmup_steps": 100, "stepwise_schedule": true}],
"loss_func": "sequence_cross_entropy",
"score_func": "sequence_cross_entropy_accuracy",
"n_runs": 1
}
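
As a rough illustration of how such a config maps onto a training run, the sketch below loads the file and builds an Adam optimizer from the first ``opt`` entry. The placeholder model and the commented-out helper names (``build_model``, ``get_dataset``) are hypothetical and not part of this package; the actual wiring of models, datasets and optimizers is handled by ``run.py``.

```python
import json
import torch

# Load the experiment description (path relative to the repo root).
with open("configs/test_shakespeare.json") as f:
    cfg = json.load(f)

# Hypothetical helpers; in the repo, model and data construction happens in run.py.
# model = build_model(cfg["model"], **cfg["model_kwargs"])
# train_set = get_dataset(cfg["dataset"], **cfg["dataset_kwargs"])
model = torch.nn.Embedding(cfg["model_kwargs"]["vocab_size"],
                           cfg["model_kwargs"]["dim"])  # placeholder model

opt_cfg = cfg["opt"][0]   # first (and here only) optimizer entry
lr = opt_cfg["lr"][0]     # "lr" is stored as a list; take the first value
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

print(f"{cfg['model']} on {cfg['dataset']}: {opt_cfg['name']} with lr={lr}, "
      f"{cfg['max_epoch']} epochs, batch size {cfg['batch_size']}.")
```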