July 5th

  • Ran a hyperparameter sweep on a default ResNet34 implementation with the CIFAR-10 dataset.

  • Each configuration was trained for only one epoch, to find good learning rate and batch size values.

  • Low learning rates and small batch sizes performed best by a wide margin. Both also correlate with longer runtime.

  • GPU utilization was very low with the best-performing parameters (small batch size), which suggests an H100 is probably not a good choice. That makes sense, given the dataset is only ~130MB compressed.

  • Going to take the best learning rate (~0.0007) and a small batch size (~64) and try training across many epochs.

  • Also going to try the same settings with a model that's twice as large and see how it performs.
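
The sweep described above can be sketched roughly as follows. This is a minimal illustration, not the code that was actually run: `sweep`, `steps_per_epoch`, and the `train_one_epoch` callback are hypothetical names, and the callback is assumed to train a fresh model for one epoch and return a validation accuracy. The only figure taken from outside these notes is the CIFAR-10 training set size (50,000 images), which also shows why small batches mean many more optimizer steps per epoch and therefore longer runtime.

```python
import itertools
import math

# CIFAR-10 ships 50,000 training images (and 10,000 test images).
CIFAR10_TRAIN_SIZE = 50_000


def steps_per_epoch(batch_size, dataset_size=CIFAR10_TRAIN_SIZE):
    """Optimizer steps in one epoch, keeping the final partial batch."""
    return math.ceil(dataset_size / batch_size)


def sweep(learning_rates, batch_sizes, train_one_epoch):
    """Try every (learning rate, batch size) pair for one epoch each.

    `train_one_epoch` is a hypothetical callback supplied by the caller.
    It is assumed to accept `lr` and `batch_size` keyword arguments and
    return a score where higher is better (e.g. validation accuracy).
    Returns the results sorted best-first.
    """
    results = []
    for lr, batch_size in itertools.product(learning_rates, batch_sizes):
        score = train_one_epoch(lr=lr, batch_size=batch_size)
        results.append(
            {
                "lr": lr,
                "batch_size": batch_size,
                "steps": steps_per_epoch(batch_size),
                "score": score,
            }
        )
    return sorted(results, key=lambda r: r["score"], reverse=True)


# Step counts for a few illustrative batch sizes (only 64 comes from the notes).
for bs in (64, 256, 1024):
    print(bs, steps_per_epoch(bs))
```

With a batch size of 64, one epoch is 782 steps, versus 49 steps at 1024. Each small-batch step gives the GPU little work to do, which is consistent with the low utilization noted above.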