A CUDA implementation of the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm for large-scale unconstrained optimization. The solver uses mixed-precision arithmetic and custom GPU kernels for vector operations, with emphasis on optimizing dot-product reductions in the two-loop recursion.
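To illustrate where those dot-product reductions occur, here is a minimal host-side sketch of the classic L-BFGS two-loop recursion with the same mixed-precision idea: curvature pairs and vectors stored in `float`, every dot product accumulated in `double`. This is an illustrative C++ sketch, not the repository's CUDA code; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// float32 inputs, float64 accumulator -- the same precision split the
// GPU reduction kernels use.
static double dot_f32_to_f64(const std::vector<float>& a,
                             const std::vector<float>& b) {
    double acc = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc += static_cast<double>(a[i]) * static_cast<double>(b[i]);
    return acc;
}

// Two-loop recursion: returns r ~= H*grad built from the m most recent
// curvature pairs (s_i, y_i); the search direction is then d = -r.
std::vector<float> two_loop(const std::vector<float>& grad,
                            const std::vector<std::vector<float>>& s,
                            const std::vector<std::vector<float>>& y) {
    const std::size_t m = s.size(), n = grad.size();
    std::vector<float> q = grad;
    std::vector<double> alpha(m), rho(m);
    for (std::size_t i = m; i-- > 0;) {            // newest to oldest
        rho[i]   = 1.0 / dot_f32_to_f64(y[i], s[i]);
        alpha[i] = rho[i] * dot_f32_to_f64(s[i], q);
        for (std::size_t j = 0; j < n; ++j)
            q[j] -= static_cast<float>(alpha[i]) * y[i][j];
    }
    // Initial Hessian approximation H0 = gamma * I.
    double gamma = m ? dot_f32_to_f64(s[m - 1], y[m - 1]) /
                       dot_f32_to_f64(y[m - 1], y[m - 1])
                     : 1.0;
    for (float& v : q) v = static_cast<float>(gamma * v);
    for (std::size_t i = 0; i < m; ++i) {          // oldest to newest
        double beta = rho[i] * dot_f32_to_f64(y[i], q);
        for (std::size_t j = 0; j < n; ++j)
            q[j] += static_cast<float>(alpha[i] - beta) * s[i][j];
    }
    return q;
}
```

With history size m and dimension n, each iteration performs 2m + 2 dot products, which is why the reduction kernels dominate the per-iteration cost.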
- Mixed-precision arithmetic (float32 for solver state and vector operations, float64 for dot-product reductions).
- Custom CUDA kernels: `dot_partial_f32_to_f64`, `dot_atomic_f32`, `dotProduct`, `axpy`, `mulVecScal`, `setVectorScalar`.
- GPU-based line search with fallback strategies.
- Benchmarks on Quadratic, Rosenbrock, Rastrigin, and Ackley functions.
- Comparative analysis vs. CPU baselines and cuBLAS (with/without line search).
- Scalability testing across problem dimensions up to 16M variables.
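The repository's actual line search runs on the GPU and its fallback policy is not spelled out here; as a rough illustration of the idea, the following is a hypothetical host-side backtracking (Armijo) line search that falls back to a tiny fixed step when no acceptable step is found.

```cpp
#include <cassert>
#include <functional>

// Backtracking line search with Armijo sufficient-decrease test.
// phi(t) = f(x + t*d); phi0 = f(x); dphi0 = grad(f)·d (must be < 0).
// All parameter defaults here are illustrative assumptions.
double backtracking_step(const std::function<double(double)>& phi,
                         double phi0, double dphi0,
                         double t0 = 1.0, double c1 = 1e-4,
                         double shrink = 0.5, int max_iter = 30) {
    double t = t0;
    for (int i = 0; i < max_iter; ++i) {
        if (phi(t) <= phi0 + c1 * t * dphi0)  // Armijo condition satisfied
            return t;
        t *= shrink;                          // backtrack: halve the step
    }
    return 1e-8;  // fallback: tiny fixed step to avoid stalling
}
```

On a quadratic slice such as phi(t) = (1 - t)^2, the full step t = 1 already satisfies the Armijo test, so no backtracking occurs.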
Rosenbrock
- CPU: 34,786 ms
- CUDA L-BFGS: 157.9 ms (220× speedup, error 2.86e‑12)
- cuBLAS: 31.6 ms (fails to converge, error ~1.50e+32)
- cuBLAS+LS: 1153.6 ms (converges, error 9.59e‑13)
Ackley
- CPU: 1067 ms
- CUDA L-BFGS: 16.6 ms (64× speedup)
- cuBLAS: 45.4 ms (23× speedup)
- cuBLAS+LS: 24.0 ms (44× speedup)
Rastrigin
- CPU: 1471 ms
- CUDA L-BFGS: 17.5 ms (84× speedup)
- cuBLAS+LS: 16.5 ms (89× speedup)
Quadratic
- CUDA L-BFGS: 85.4 ms (error 2.47e‑13)
- cuBLAS: 14.0 ms (error 0)
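For reference, the three non-quadratic benchmarks above are the standard textbook objectives, each with global minimum 0; the repository's exact parameter choices are assumed to be these defaults.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

static const double kPi = std::acos(-1.0);

// Rosenbrock: narrow curved valley; minimum 0 at (1, ..., 1).
double rosenbrock(const std::vector<double>& x) {
    double f = 0.0;
    for (std::size_t i = 0; i + 1 < x.size(); ++i)
        f += 100.0 * std::pow(x[i + 1] - x[i] * x[i], 2)
           + std::pow(1.0 - x[i], 2);
    return f;
}

// Rastrigin: highly multimodal; minimum 0 at the origin.
double rastrigin(const std::vector<double>& x) {
    double f = 10.0 * static_cast<double>(x.size());
    for (double xi : x) f += xi * xi - 10.0 * std::cos(2.0 * kPi * xi);
    return f;
}

// Ackley: nearly flat outer region, deep central well; minimum 0 at origin.
double ackley(const std::vector<double>& x) {
    const double n = static_cast<double>(x.size());
    double sq = 0.0, cs = 0.0;
    for (double xi : x) { sq += xi * xi; cs += std::cos(2.0 * kPi * xi); }
    return -20.0 * std::exp(-0.2 * std::sqrt(sq / n))
           - std::exp(cs / n) + 20.0 + std::exp(1.0);
}
```

Rastrigin and Ackley are multimodal, so "convergence" in the tables above refers to reaching a stationary point to the stated error tolerance rather than a certified global minimum.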
- NVIDIA GPU with CUDA support (tested on Turing architecture).
- CUDA Toolkit 13.0+.
- C++17 compiler.
- Nsight Compute (optional, for profiling).
git clone https://github.com/lilhast1/lbfgs.git
cd lbfgs
nvcc -O3 lbfgs_mixed_precision.cu -o lbfgs
./lbfgs
- Extend testing to more benchmark functions.
- Explore multi-GPU scaling.
- Apply solver to real-world tasks (ML training, inverse problems, scientific simulation).
If you use this code in your research, please cite:
@article{lbfgs_cuda,
  title={Mixed-Precision L-BFGS on CUDA: A Comparative Benchmark},
  author={Tarik Hastor and Ismar Muslić and Merjem Gutošić and Ivona Jozić and Kanita Kadušić},
  year={2026}
}
Faculty of Electrical Engineering, University of Sarajevo
Contact:
{thastor1, imuslic1, mgutosic1, ijozic1, kkadusic2}@etf.unsa.ba