D-PDLP (Distributed PDLP) is a high-performance, distributed implementation of the Primal-Dual Hybrid Gradient (PDHG) algorithm designed for solving massive-scale Linear Programming (LP) problems on multi-GPU systems.
By leveraging 2D Grid Partitioning, D-PDLP scales the first-order PDHG method across GPU clusters, efficiently harnessing the aggregate computational power and memory of multiple devices. This implementation is built upon cuPDLPx, a GPU-accelerated LP solver described in cuPDLPx: A Further Enhanced GPU-Based First-Order Solver for Linear Programming.
For a detailed explanation of the methodology, please refer to our paper: D-PDLP: Scaling PDLP to Distributed Multi-GPU Systems.
Consistent with cuPDLPx, D-PDLP solves linear programs in the standard form:
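The precise formulation is defined by cuPDLPx and the D-PDLP paper; as a rough sketch, a PDLP-style standard form with two-sided constraint and variable bounds reads:

```math
\begin{aligned}
\min_{x \in \mathbb{R}^n} \quad & c^\top x \\
\text{s.t.} \quad & \ell_c \le A x \le u_c, \\
& \ell_v \le x \le u_v,
\end{aligned}
```

where `A` is the constraint matrix, `c` the objective vector, and the bound entries may be infinite. Consult the cuPDLPx documentation for the exact form the solver expects.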
To use the solver, you must first compile the project with CMake. The requirements are:
- GPU: NVIDIA GPU with CUDA 12.4+.
- Build Tools: CMake (≥ 3.20), GCC, NVCC.
- Distributed Tools: MPI, NCCL.
Clone the repository and compile the project using CMake.
```bash
git clone git@github.com:Lhongpei/D-PDLP.git
cd D-PDLP
cmake -B build
cmake --build build --clean-first
```

This will create the solver binary at `./build/cupdlpx-dist`.
The executable supports both single-GPU and multi-GPU distributed modes; the mode is detected automatically based on whether the binary is launched through an MPI launcher.
Run the solver directly without MPI to use a single GPU.
```bash
./build/cupdlpx-dist <MPS_FILE> <OUTPUT_DIR> [OPTIONS]
```
Use mpirun to launch the solver across multiple GPUs.
```bash
mpirun -n <NUM_GPU> ./build/cupdlpx-dist <MPS_FILE> <OUTPUT_DIR> [OPTIONS]
```
Positional Arguments:
- `<MPS_FILE>`: Path to the input LP file (supports `.mps` and `.mps.gz`).
- `<OUTPUT_DIR>`: Directory where solution files will be saved.
Distributed Options:
| Option | Type | Description | Default |
|---|---|---|---|
| `--grid_size <r>,<c>` | string | 2D grid topology (rows x cols) | Auto-detect |
| `--partition_method` | string | Partitioning strategy: `uniform` or `nnz` | `nnz` |
| `--permute_method` | string | Matrix permutation: `none`, `random`, or `block` | `none` |
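As a sketch, a hypothetical 4-GPU run on a 2x2 grid with nonzero-based partitioning might look like the following (the input file and output directory are placeholders):

```bash
mpirun -n 4 ./build/cupdlpx-dist model.mps.gz ./results \
    --grid_size 2,2 \
    --partition_method nnz \
    --permute_method block
```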
Solver Parameters:
| Option | Type | Description | Default |
|---|---|---|---|
| `-h, --help` | flag | Display the help message. | N/A |
| `-v, --verbose` | flag | Enable verbose logging. | false |
| `--time_limit` | double | Time limit in seconds. | 3600.0 |
| `--iter_limit` | int | Iteration limit. | 2147483647 |
| `--eps_opt` | double | Relative optimality tolerance. | 1e-4 |
| `--eps_feas` | double | Relative feasibility tolerance. | 1e-4 |
| `--eps_infeas_detect` | double | Infeasibility detection tolerance. | 1e-10 |
| `--l_inf_ruiz_iter` | int | Iterations for L-inf Ruiz rescaling. | 10 |
| `--no_pock_chambolle` | flag | Disable Pock-Chambolle rescaling. | enabled |
| `--pock_chambolle_alpha` | float | Value for Pock-Chambolle alpha. | 1.0 |
| `--no_bound_obj_rescaling` | flag | Disable bound objective rescaling. | enabled |
| `--eval_freq` | int | Termination evaluation frequency. | 200 |
| `--sv_max_iter` | int | Max iterations for singular value estimation. | 5000 |
| `--sv_tol` | float | Tolerance for singular value estimation. | 1e-4 |
| `--no_presolve` | flag | Disable presolve. | enabled |
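For instance, a single-GPU run with tightened tolerances and a two-hour time limit could be launched as follows (again, the problem file and output directory are placeholders):

```bash
./build/cupdlpx-dist model.mps ./results \
    --eps_opt 1e-6 --eps_feas 1e-6 \
    --time_limit 7200 --verbose
```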
Upon successful completion, the solver generates three files in the specified output directory:
- `<PROBLEM>_summary.txt`: Scalar metrics (time, iterations, primal/dual values, residuals).
- `<PROBLEM>_primal_solution.txt`: The full primal solution vector (one float per line).
- `<PROBLEM>_dual_solution.txt`: The full dual solution vector (one float per line).
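Assuming the placeholder names used above, a quick way to inspect the results from the shell (illustrative only):

```bash
# Scalar metrics: time, iterations, primal/dual values, residuals
cat results/model_summary.txt

# Length of the primal solution vector (one value per line)
wc -l results/model_primal_solution.txt
```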
If you use this software or method in your research, please cite our paper:
```bibtex
@misc{li2026dpdlpscalingpdlpdistributed,
      title={D-PDLP: Scaling PDLP to Distributed Multi-GPU Systems},
      author={Hongpei Li and Yicheng Huang and Huikang Liu and Dongdong Ge and Yinyu Ye},
      year={2026},
      eprint={2601.07628},
      archivePrefix={arXiv},
      primaryClass={math.OC},
      url={https://arxiv.org/abs/2601.07628},
}
```

Copyright 2025-2026 Hongpei Li, Haihao Lu.
Licensed under the Apache License, Version 2.0. See the LICENSE file for details.