Skip to content

imuslic1/bfgs-parallelization

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Parallel BFGS Optimization in C++ with OpenMP and SIMD

Overview

This project implements a parallelized version of the BFGS (Broyden–Fletcher–Goldfarb–Shanno) optimization algorithm using C++, OpenMP, and SIMD (Single Instruction Multiple Data) vectorization.

The goal of this project is to accelerate the computation of gradient-based optimization in high-dimensional spaces ( e.g., 5,000+ dimensions) through a hybrid approach that combines shared-memory parallelism (OpenMP) and * data-level parallelism* (SIMD intrinsics).

This work was developed as part of the "Parallel computing systems" Master’s course at the Faculty of Electrical Engineering, University of Sarajevo.


Key Features

  • Full BFGS optimization implementation for multivariate functions
  • Parallel gradient and Hessian evaluations using OpenMP
  • SIMD acceleration for vector operations (dot products, matrix-vector multiplications)
  • Flexible dimensionality — tested at 5,000+ dimensions
  • Benchmarking suite comparing serial vs. parallel execution times on multiple benchmark functions
  • Continuous (differentiable) TSP relaxation for testing BFGS

Technical Details

1. Parallelization Strategy

  • OpenMP parallel regions are used to divide gradient and Hessian computations across threads.
  • Reduction clauses are applied for summation-heavy operations (e.g., dot products).
  • The algorithm maintains thread-safe updates to shared memory (Hessian approximation) through synchronized regions.

2. SIMD Optimization

  • Utilizes compiler auto-vectorization and manual SIMD intrinsics (e.g., SSE/AVX) for linear algebra routines.
  • Ensures memory alignment of vectors and matrices for maximum SIMD efficiency (alignas(32) or std::aligned_alloc).

Performance Results

The following table shows relative speedups (normalized to the sequential version = 1.0) for different compilation modes and problem sizes:

Problem Dimension Sequential OpenMP + Autovectorization OpenMP + SIMD OpenMP + SIMD + Pointer Array
1,000 1.00 1.95 1.88 2.02
3,000 1.00 9.18 9.07 9.27
5,000 1.00 6.49 6.49 7.15

(Exact results depend on CPU architecture and compiler optimizations.)


Hardware configuration for testing:

  • Intel Core i5-14600K
    • 14 cores (6 Performance + 8 Efficiency)
    • 20 threads
    • Maximum boost clock up to 5.3 GHz
    • 1.2 MB L1 cache, 20 MB L2 cache, 24 MB L3 cache
    • 32 GB DDR5 RAM, 6000 MT/s
  • Windows 11 Pro

Dependencies

  • C++23 or newer
  • OpenMP (usually included with GCC/Clang)
  • Optional: Intel or GCC SIMD intrinsics

License

Licensed under the Apache License, Version 2.0. See LICENSE for details.
© 2025 Ismar Muslić, Tarik Hastor, Ivona Jozić, Kanita Kadušić, Merjem Gutošić

About

OpenMP parallelization of BFGS optimization algorithm

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •