Datasets are organized under the datasets directory as follows:
datasets/$DATASET_NAME
├── train
│   └── ins  # training instances
└── test
    └── ins  # testing instances
where each dataset is placed in its own $DATASET_NAME directory.
The training and testing instances (.mps or .lp files) of each dataset should be prepared and placed in $DATASET_NAME/train/ins and $DATASET_NAME/test/ins, respectively.
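As a quick sanity check before running the later steps, the layout described above can be verified with a few lines of Python. This is a minimal sketch and not part of the repository: the helper name `count_instances` is hypothetical, and it assumes only the directory names and the .mps/.lp extensions mentioned above.

```python
from pathlib import Path

# Extensions named in this README; compressed variants are not considered here.
INSTANCE_SUFFIXES = {".mps", ".lp"}


def count_instances(dataset_dir):
    """Return {"train": n, "test": m}, the number of instance files per split.

    Hypothetical helper: raises FileNotFoundError if <split>/ins is missing.
    """
    counts = {}
    for split in ("train", "test"):
        ins_dir = Path(dataset_dir) / split / "ins"
        if not ins_dir.is_dir():
            raise FileNotFoundError(f"missing directory: {ins_dir}")
        counts[split] = sum(
            1
            for p in ins_dir.iterdir()
            if p.is_file() and p.suffix.lower() in INSTANCE_SUFFIXES
        )
    return counts
```

For example, `count_instances("./datasets/IP")` would report how many instance files each split contains, which makes an empty or misplaced `ins` directory easy to spot.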
For the IP dataset, instances can be downloaded here. Instances 0-299 can be used for training and instances 9900-9999 for testing.
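The train/test split by instance index could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names are hypothetical, and it assumes each downloaded file carries its index as the last number in its name (e.g. `instance_42.lp`), which may not match the actual naming of the download.

```python
import re
import shutil
from pathlib import Path

TRAIN_IDS = range(0, 300)       # instances 0-299
TEST_IDS = range(9900, 10000)   # instances 9900-9999


def instance_index(path):
    """Return the last integer in the file stem, or None if there is none.

    Assumption: the index is part of the file name, e.g. "instance_42.lp".
    Adapt this to the real naming scheme of the downloaded files.
    """
    numbers = re.findall(r"\d+", Path(path).stem)
    return int(numbers[-1]) if numbers else None


def split_instances(download_dir, dataset_dir):
    """Copy downloaded instances into train/ins and test/ins by index."""
    dataset_dir = Path(dataset_dir)
    targets = {"train": TRAIN_IDS, "test": TEST_IDS}
    copied = {"train": 0, "test": 0}
    for path in sorted(Path(download_dir).iterdir()):
        if not path.is_file():
            continue
        idx = instance_index(path)
        for split, ids in targets.items():
            if idx is not None and idx in ids:
                dest = dataset_dir / split / "ins"
                dest.mkdir(parents=True, exist_ok=True)
                shutil.copy2(path, dest / path.name)
                copied[split] += 1
    return copied
```

Files are copied rather than moved so that the original download stays intact if the split needs to be redone.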
For the SMSP dataset, convert the instances from the steelmillslab set into .mps format by running
python gen_smsp.py
The (high-quality) solutions for each instance are collected with the MILP solver Gurobi by running
# IP
python collect_sols.py --dataDir ./datasets/IP/train --nWorkers 5 --maxTime 3600
# SMSP
python collect_sols.py --dataDir ./datasets/SMSP/train --nWorkers 5 --maxTime 3600
Note: The solution files for the IP and SMSP datasets are already available in their respective directories. Running this step from scratch can take multiple days.
The last step before training is to convert each instance into a bipartite graph so that it can be processed by GNNs. Running the following scripts creates these bipartite graphs and stores them in $DATASET_NAME/train/bg and $DATASET_NAME/test/bg.
cd ../
python dataset.py --dataset IP
python dataset.py --dataset SMSP
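To illustrate what a bipartite-graph encoding of a MILP instance looks like, here is a minimal, dependency-free sketch. It is not the logic of dataset.py, whose node and edge features are not described here. It assumes the common encoding with one node per variable, one node per constraint, and an edge for every nonzero constraint coefficient; the function name `milp_to_bipartite` and the chosen features are hypothetical.

```python
def milp_to_bipartite(A, b, c):
    """Encode min c^T x s.t. A x <= b as a bipartite graph.

    Returns (var_feats, con_feats, edges) where
      var_feats[j] = objective coefficient of variable j,
      con_feats[i] = right-hand side of constraint i,
      edges        = list of (i, j, A[i][j]) for every nonzero coefficient.
    """
    n_vars = len(c)
    if len(A) != len(b) or any(len(row) != n_vars for row in A):
        raise ValueError("shape mismatch between A, b and c")
    edges = [
        (i, j, coef)
        for i, row in enumerate(A)
        for j, coef in enumerate(row)
        if coef != 0
    ]
    return list(c), list(b), edges


# Toy problem with 2 constraints and 3 variables.
A = [[1, 0, 2],
     [0, 3, 0]]
b = [4, 5]
c = [1, 1, 1]
var_feats, con_feats, edges = milp_to_bipartite(A, b, c)
print(edges)  # [(0, 0, 1), (0, 2, 2), (1, 1, 3)]
```

Each edge connects constraint node i to variable node j, so a GNN can pass messages between the two node sets along exactly the nonzero entries of the constraint matrix.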