This repository contains the code, data, and solver modifications for the first RL-based Dynamic Algorithm Configuration (DAC) system for MaxSAT local search.
| Comparison | PPO Mean Cost | Baseline Mean Cost | Cost Change | Wilcoxon p |
|---|---|---|---|---|
| PPO vs Random | 372.1 | 459.6 | −19.0% | 2.4 × 10⁻⁵ |
| PPO vs Best Static | 372.1 | 415.1 | −10.4% | 0.0069 |
| PPO vs BO Static | 372.1 | 321.1 | +15.9% | — |
| PPO vs BO (v500) | — | — | 7/10 wins | 0.019 |
All results are computed over 3 seeds × 18 test instances = 54 evaluation points. BO static outperforms PPO on small instances, but PPO overtakes it at the 500-variable scale.
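The paired significance tests above can be reproduced along these lines with a Wilcoxon signed-rank test over the 54 per-(seed, instance) cost pairs. The sketch below uses synthetic data for illustration only, not the paper's results; it assumes SciPy is available.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Synthetic paired costs (NOT the paper's data): one cost per
# (seed, instance) evaluation point, 3 seeds x 18 instances = 54.
baseline = rng.normal(460.0, 40.0, size=54)
ppo = baseline - rng.normal(85.0, 20.0, size=54)  # PPO tends to be cheaper

# Paired, non-parametric test on per-point cost differences.
stat, p = wilcoxon(ppo, baseline, alternative="two-sided")
change = (ppo.mean() - baseline.mean()) / baseline.mean() * 100

print(f"mean cost: PPO {ppo.mean():.1f} vs baseline {baseline.mean():.1f}")
print(f"cost change: {change:+.1f}%, Wilcoxon p = {p:.2e}")
```

The Wilcoxon test is appropriate here because the 54 points are paired (same seed and instance for both configurations) and the cost differences need not be normally distributed.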
- DAC (Dynamic Algorithm Configuration): Adjust solver parameters during solving based on search state, not just before solving
- Control Layer: An external PPO controller observes NuWLS search state every 1,000 flips and adjusts 4 clause-weighting parameters
- Key Finding: The learned policy discovers an explore-then-exploit noise schedule, with `noise_proba` as the dominant control channel
learned-control-layers/
├── src/
│ ├── gym_env.py # Gymnasium DAC environment
│ ├── solver_wrapper.py # NuWLS-DAC subprocess interface
│ ├── configs.py # Shared configuration and path utilities
│ ├── evaluation.py # Shared evaluation and statistical testing
│ ├── train_500k.py # Main training script (3 seeds × 500K steps)
│ ├── generate_instances.py # Phase-transition benchmark generator
│ ├── ablation_study.py # State/action feature ablation
│ ├── scale_experiment.py # Scale transfer experiments (v200, v500)
│ └── ... # Additional experiment scripts
├── data/
│ ├── solvers/
│ │ ├── NuWLS/NuWLS-dac/ # Modified NuWLS with DAC protocol (C++)
│ │ └── USW-LS/USW-LS-dac/ # Modified USW-LS for cross-solver transfer
│ ├── benchmarks/
│ │ ├── generated/ # 100 phase-transition instances (v50/v75/v100)
│ │ ├── generated_hard/ # 100 hard partial MaxSAT instances (pt75)
│ │ └── generated_large/ # Scale transfer instances (v200/v500)
│ └── results/ # All experimental results (JSON)
├── configs/default.json # Project configuration
├── requirements.txt # Python dependencies
└── REPRODUCE.md # Step-by-step reproduction guide
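The repo's `generate_instances.py` is the authoritative benchmark generator; as a rough illustration of what a phase-transition instance looks like, here is a hypothetical random weighted 3-SAT generator in (old-style) DIMACS WCNF, using the classic clause/variable ratio of about 4.26. The ratio, weights, and top value below are illustrative assumptions, not the repo's settings.

```python
import random

def random_wcnf(n_vars: int, ratio: float = 4.26, seed: int = 0) -> str:
    """Random weighted 3-SAT in old-style DIMACS WCNF (hypothetical sketch)."""
    rng = random.Random(seed)
    n_clauses = int(n_vars * ratio)
    top = n_clauses * 10 + 1            # weight reserved for hard clauses
    lines = [f"p wcnf {n_vars} {n_clauses} {top}"]
    for _ in range(n_clauses):
        vars_ = rng.sample(range(1, n_vars + 1), 3)   # 3 distinct variables
        lits = [v if rng.random() < 0.5 else -v for v in vars_]
        weight = rng.randint(1, 10)                   # soft clause weight
        lines.append(f"{weight} {lits[0]} {lits[1]} {lits[2]} 0")
    return "\n".join(lines) + "\n"

print(random_wcnf(50).splitlines()[0])  # header: p wcnf <vars> <clauses> <top>
```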
Pre-trained PPO checkpoints are available in the v1.0.0 release, enabling instant reproduction of all evaluation results without retraining.
# 1. Install dependencies
pip install -r requirements.txt
# 2. Compile NuWLS-DAC solver
cd data/solvers/NuWLS/NuWLS-dac
make # Linux/Mac (g++)
# or on Windows with MSVC:
# cl.exe /O2 /EHsc pms.cpp /Fe:nuwls-dac.exe
# or download the pre-compiled binary from the release
cd ../../../..
# 3. Download pre-trained models
mkdir -p data/results/ppo_csolver_500k
cd data/results/ppo_csolver_500k
# Download model_seed{42,123,999}.zip from the release
cd ../../..
# 4. Reproduce Table 5 (main results)
python src/train_500k.py

# Or, to train everything from scratch:
pip install -r requirements.txt
cd data/solvers/NuWLS/NuWLS-dac
make && cd ../../../..
python src/smoke_test.py   # verify solver works
python src/train_500k.py   # train 3 seeds × 500K steps

See REPRODUCE.md for detailed step-by-step instructions.
The solver modification adds a `--dac N` flag that pauses every N variable flips:
- Solver → Controller: Emits 12-dimensional state vector (cost, improvement rate, plateau length, clause weight statistics, etc.)
- Controller → Solver: Sends 6 parameter values (weight increment, smoothing probability, noise probability, etc.) or CONTINUE/STOP
This creates a standard Gymnasium interface compatible with any RL algorithm from Stable-Baselines3.
- Exploration parameter dominance: `noise_proba` alone accounts for nearly all DAC benefit
- Scale-dependent DAC advantage: PPO overtakes the best static configuration at 5× training scale
- Automatic explore-then-exploit discovery: Policy independently discovers noise annealing
- Scale-dependent feature importance: Clause weight features hurt at 100K but help at 500K
- Decision frequency sensitivity: Finer checkpoints (250 vs 1000 flips) disproportionately benefit learned policies
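The explore-then-exploit behavior the policy discovers resembles classic noise annealing: high `noise_proba` early in the search, decayed as the search converges. A hypothetical illustration of that shape (the exact values are not from the paper):

```python
def annealed_noise(step: int, total: int,
                   start: float = 0.4, end: float = 0.05) -> float:
    """Linearly anneal noise_proba from `start` to `end` (illustrative only)."""
    frac = min(step / total, 1.0)
    return start + (end - start) * frac

# Sample the schedule at a few checkpoints: it decreases monotonically.
schedule = [annealed_noise(s, 100) for s in range(0, 101, 25)]
print([round(x, 3) for x in schedule])
```

The learned policy differs from such a fixed schedule in that it conditions on the observed search state, so it can re-raise noise when the search plateaus.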
- Python 3.10+
- g++ (Linux/Mac) or MSVC (Windows) for solver compilation
- 4+ CPU cores recommended
- ~4 hours for full training (500K steps × 3 seeds)
Learned Control Layers for MaxSAT Local Search. Alex Chengyu Li, 2026.
Please cite as: Li, Alex Chengyu (2026). "Learned Control Layers for MaxSAT Local Search: Dynamic Algorithm Configuration of Clause Weighting Parameters." Preprint.
@misc{li2026maxsatdac,
author = {Li, Alex Chengyu},
title = {Learned Control Layers for {MaxSAT} Local Search:
Dynamic Algorithm Configuration of Clause Weighting Parameters},
year = {2026},
note = {Preprint}
}

This project is licensed under the MIT License. See LICENSE for details.