Code for the paper: Generalization error rates in kernel regression: The crossover from the noiseless to noisy regime (link to paper).

- replica_curves.ipynb provides a Jupyter notebook implementing the theoretical characterization of equation (14) for the excess risk $\epsilon_g-\sigma^2$, for Gaussian data (Figs. 2, 3, solid lines). A generic sketch of this type of self-consistent computation is given after this list.
- Real_KRR.py implements kernel ridge regression on a given dataset (to be loaded in a folder Datasets/), with the $\ell_2$ regularization strength being optimized over (see the scikit-learn sketch after this list). For instance, to run kernel ridge regression with additive noise $\sigma=0.5$ and an RBF kernel with parameter $\gamma=0.7$, run

  ```
  python3 Real_KRR.py --p 0.5 --k rbf --r 0.7 --d MNIST --v 6
  ```

  The --v parameter can be looped over; it simply runs through a list of sample sizes $n$.
- Real_KRR_noreg.py implements the same routine for $\lambda=0$.
- Real_KRR_decay.py provides the same routine, but for a regularization generically decaying with the number of samples as $\lambda=n^{-\ell}$. For instance, to run kernel ridge regression with additive noise $\sigma=0.5$, an RBF kernel with parameter $\gamma=0.7$, and regularization decay $\ell=0.1$, run

  ```
  python3 Real_KRR_decay.py --p 0.5 --k rbf --r 0.7 --d MNIST --v 6 --c 0.1
  ```

  (A sketch of both variants, $\lambda=0$ and $\lambda=n^{-\ell}$, follows the scikit-learn sketch below.)
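The exact theoretical formula is equation (14) of the paper, and replica_curves.ipynb is the reference implementation. Purely as an illustration of the kind of computation involved, below is a minimal sketch of a generic self-consistent (effective-ridge) prediction for the excess risk; the power-law spectrum, the teacher coefficients, and the closed-form expression are all placeholder assumptions, not the paper's equation (14).

```python
import numpy as np
from scipy.optimize import brentq

# Placeholder model (NOT the paper's eq. (14)): kernel eigenvalues decaying as
# lambda_k = k^{-alpha}, squared teacher coefficients decaying as k^{-beta}.
alpha, beta = 1.5, 1.5
k = np.arange(1, 10_001)
eigs = k ** -alpha
teacher2 = k ** -beta

def effective_ridge(n, lam):
    # Solve the self-consistent equation
    #   kappa = lam + (kappa / n) * sum_k eigs_k / (eigs_k + kappa)
    # for the effective regularization kappa.
    g = lambda kappa: kappa - lam - kappa / n * np.sum(eigs / (eigs + kappa))
    return brentq(g, 1e-12, 1e9)

def excess_risk(n, lam, sigma):
    # Generic replica-style bias/variance decomposition of eps_g - sigma^2.
    kappa = effective_ridge(n, lam)
    gamma = np.sum(eigs ** 2 / (eigs + kappa) ** 2) / n
    bias = kappa ** 2 * np.sum(teacher2 * eigs / (eigs + kappa) ** 2)
    return (bias + sigma ** 2 * gamma) / (1 - gamma)

# Example: sweep the sample size at fixed noise sigma = 0.5 and ridge lam = 1e-3.
for n in [100, 1000, 10000]:
    print(n, excess_risk(n, lam=1e-3, sigma=0.5))
```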
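The real-data scripts optimize the regularization with scikit-learn's GridSearchCV (see the version note at the end). Here is a minimal, self-contained sketch of that routine, where the synthetic stand-in data, the noise level, and the search grid are all assumptions made for illustration:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Toy stand-in data; the actual scripts load a dataset from Datasets/ instead.
X = rng.standard_normal((500, 10))
y = np.sin(X[:, 0]) + 0.5 * rng.standard_normal(500)  # additive noise, sigma = 0.5

# Optimize the ell_2 regularization strength (sklearn's `alpha`) over a grid,
# for an RBF kernel with gamma = 0.7.
search = GridSearchCV(
    KernelRidge(kernel="rbf", gamma=0.7),
    param_grid={"alpha": np.logspace(-6, 2, 17)},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)
print(search.best_params_["alpha"], -search.best_score_)
```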
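For the two variants, only the regularization changes. A sketch under the same placeholder data assumptions; how the scripts handle $\lambda=0$ internally is not specified here, so the pseudo-inverse solve below is just one reasonable choice:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
y = np.sin(X[:, 0]) + 0.5 * rng.standard_normal(500)
n = X.shape[0]

# lambda = 0 (Real_KRR_noreg.py): ridgeless interpolation; the kernel system is
# solved with a pseudo-inverse since the Gram matrix K may be ill-conditioned.
K = rbf_kernel(X, X, gamma=0.7)
dual_coef = np.linalg.pinv(K) @ y  # predict via rbf_kernel(X_test, X, gamma=0.7) @ dual_coef

# lambda = n^{-ell} (Real_KRR_decay.py): fixed regularization decaying with n.
ell = 0.1
model = KernelRidge(kernel="rbf", gamma=0.7, alpha=n ** -ell).fit(X, y)
```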
Versions: These notebooks employ Python 3.12 and PyTorch 2.5. The numerical experiments use the scikit-learn GridSearchCV routine, which requires scikit-learn 0.22 onwards.
