8 changes: 5 additions & 3 deletions README.md
@@ -6,8 +6,8 @@

This package is intended for running and benchmarking optimization algorithms in Pytorch. It could be used for

-* retrieving training curves for standard methods (SGD, Adam) on standard benchmark problems (e.g. training Resnets for Cifar),
-* testing new methods
+* retrieving training curves for standard methods (SGD, Adam) on standard benchmark problems (e.g. training a ResNet on CIFAR10/100),
+* testing and benchmarking new optimization algorithms


## Getting started
@@ -22,7 +22,9 @@ or in order to install in developer mode via

## Results

-For the experiments we ran, we provide the code that generated the results (i.e. the model, dataset preprocessing and training setup) as well as the actual scores at the end of each epoch. An overview and all links are given in the table below.
+Currently, the repo implements some standard image classification architectures (ResNet, VGG, ViT) and datasets (MNIST, CIFAR, ImageNet). It also contains a toy language-model training setup (character-level Shakespeare dataset).
+
+The table below gives an overview of the experiments for which we provide the code that generated the results (i.e. the model, dataset preprocessing, and training setup) as well as the actual scores at the end of each epoch, together with links to all results.

| ID | Model | Dataset | Results |
|-----|--------|----------|----------|
18 changes: 18 additions & 0 deletions configs/README.md
@@ -1,3 +1,21 @@
## Test examples

The simplest example to run is

```
python run.py --id test --verbose
```

To train a Llama-style transformer on the character-level Shakespeare dataset, run

```
python run.py -i test_shakespeare --device mps --verbose
```

This config also shows how to use stepwise learning-rate schedulers, for example for warmup.

On an Apple M3 Pro, using the ``mps`` device, this should take roughly five minutes and reach a train loss of around 1.3-1.4 after 10 epochs.
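
For readers unfamiliar with per-step scheduling, here is a minimal sketch of what linear warmup followed by a constant learning rate looks like in plain PyTorch. It only mirrors the ``warmup_steps``/``stepwise_schedule`` options from [test_shakespeare.json](test_shakespeare.json), assuming "stepwise" means the schedule is advanced every optimizer step rather than every epoch; the toy model and training loop are placeholders, not this package's own scheduler implementation.

```python
import torch

# Stand-ins for the model and optimizer that the config would normally build.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

warmup_steps = 100  # mirrors "warmup_steps": 100 in test_shakespeare.json

# Linear warmup from ~0 to the base lr over the first 100 optimizer steps,
# then a constant lr (corresponding to "lr_schedule": "constant").
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps)
)

for step in range(300):
    x, y = torch.randn(8, 10), torch.randn(8, 2)
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the schedule every step, not every epoch
```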

## Remarks on config management

1) The simple option: Create a dict-type config (e.g. like [test.json](test.json)). The file name (in this example we use ``my_exp.json``) will serve as an identifier ``exp_id`` in the next steps. You can then run all entries of the config with one job.
12 changes: 12 additions & 0 deletions configs/test_shakespeare.json
@@ -0,0 +1,12 @@
{
"batch_size": 16,
"dataset": "shakespeare",
"dataset_kwargs": {},
"max_epoch": 10,
"model": "llama",
"model_kwargs": {"vocab_size": 92, "dim": 384, "expand": 4, "n_layers": 3, "n_heads": 2, "mlp": "mlp", "seq_len": 512},
"opt": [{"name": "adam", "lr": [1e-3], "lr_schedule": "constant", "warmup_steps": 100, "stepwise_schedule": true}],
"loss_func": "sequence_cross_entropy",
"score_func": "sequence_cross_entropy_accuracy",
"n_runs": 1
}
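
As a rough illustration of how such a config maps onto a training run, the sketch below loads the file and builds an Adam optimizer from the first ``opt`` entry. The placeholder model and the commented-out helper names (``build_model``, ``get_dataset``) are hypothetical and not part of this package; the actual wiring of models, datasets and optimizers is handled by ``run.py``.

```python
import json
import torch

# Load the experiment description (path relative to the repo root).
with open("configs/test_shakespeare.json") as f:
    cfg = json.load(f)

# Hypothetical helpers; in the repo, model and data construction happens in run.py.
# model = build_model(cfg["model"], **cfg["model_kwargs"])
# train_set = get_dataset(cfg["dataset"], **cfg["dataset_kwargs"])
model = torch.nn.Embedding(cfg["model_kwargs"]["vocab_size"],
                           cfg["model_kwargs"]["dim"])  # placeholder model

opt_cfg = cfg["opt"][0]   # first (and here only) optimizer entry
lr = opt_cfg["lr"][0]     # "lr" is stored as a list; take the first value
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

print(f"{cfg['model']} on {cfg['dataset']}: {opt_cfg['name']} with lr={lr}, "
      f"{cfg['max_epoch']} epochs, batch size {cfg['batch_size']}.")
```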