In /src:
- hdpoint corresponds to PtsHist, while region_tree corresponds to QuadHist.
- driver_*.py are the drivers for the experiments, each with different input parameters.
- *_estimator.py are the estimators, with train() and evaluate() as the interfaces used by their drivers.
- utility.py includes various data loaders, error metrics, and other shared tools for the estimators.
- geometry.py includes some geometric computations, such as rectangle intersection.
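As an illustration of the kind of computation geometry.py is described as providing, here is a minimal sketch of axis-aligned rectangle intersection. The names Rect, intersect, volume, and overlap_fraction are hypothetical and are not taken from the repository; the actual helpers may differ.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: (lower corner, upper corner), one value per dimension.
Rect = Tuple[Sequence[float], Sequence[float]]


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlap of two axis-aligned boxes, or None if it is empty."""
    lo = [max(x, y) for x, y in zip(a[0], b[0])]
    hi = [min(x, y) for x, y in zip(a[1], b[1])]
    # Boxes that only touch along a face are treated as non-overlapping.
    if any(l >= h for l, h in zip(lo, hi)):
        return None
    return (lo, hi)


def volume(r: Rect) -> float:
    """Product of the side lengths of the box."""
    v = 1.0
    for l, h in zip(r[0], r[1]):
        v *= h - l
    return v


def overlap_fraction(bucket: Rect, query: Rect) -> float:
    """Fraction of the bucket's volume covered by the query (bucket must have non-zero volume)."""
    inter = intersect(bucket, query)
    return 0.0 if inter is None else volume(inter) / volume(bucket)
```

For example, overlap_fraction(([0, 0], [2, 2]), ([1, 1], [3, 3])) evaluates to 0.25, since the unit-square overlap covers a quarter of the 2x2 bucket.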
The pseudocode below outlines the framework of our algorithms.
load_data()
estimator = build_estimator()
estimator.train()
estimator.evaluate()
get_results()
class RegionTreeEstimator:
    ...
    def train(self):
        tree = build_region_tree()
        for train_data in train_list:
            recursively_split(tree, train_data)
        build_equation_system()
        solve()
    def evaluate(self):
        for test_data in test_list:
            calc(test_data)
    ...
class HDPointEstimator:
    ...
    def train(self):
        weighted_points = []
        for train_data in train_list:
            weighted_points.append(train_data.sample())
        build_equation_system()
        solve()
    def evaluate(self):
        for test_data in test_list:
            calc(test_data)
    ...
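Both estimators end training with build_equation_system() and solve(). The following is a minimal sketch of what such a step could look like with the nnls solver, not the repository's implementation. It assumes (this is an assumption, not stated in the source) that each row of the matrix describes one training query, each column one bucket or weighted point, and each entry the fraction of that bucket covered by the query. The function names fit_weights and estimate are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def fit_weights(coverage: np.ndarray, selectivity: np.ndarray) -> np.ndarray:
    """Solve min ||coverage @ w - selectivity||_2 subject to w >= 0.

    coverage:    (num_queries, num_buckets) matrix, assumed layout.
    selectivity: (num_queries,) observed selectivities of the training queries.
    """
    weights, _residual_norm = nnls(coverage, selectivity)
    return weights


def estimate(coverage_row: np.ndarray, weights: np.ndarray) -> float:
    """Estimated selectivity of one query: covered fraction times weight, summed over buckets."""
    return float(np.dot(coverage_row, weights))


# Toy example: two buckets, three training queries.
coverage = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])
selectivity = np.array([0.3, 0.7, 1.0])
weights = fit_weights(coverage, selectivity)
half_of_first_bucket = estimate(np.array([0.5, 0.0]), weights)
```

In this toy system the data are consistent, so the fitted weights match the per-bucket selectivities 0.3 and 0.7 up to floating-point error. Other solver options named in the commands (for example gurobi_linf) would minimize a different objective and are not covered by this sketch.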
# driver_region_tree.py
# Replace each XXX with a concrete value, or run with '--help' for hints
python driver_region_tree.py --dataset XXX --query_type XXX --train_size XXX --threshold XXX --buckets_limit XXX --test_size XXX --solver XXX
# driver_hdpoint.py
# Replace each XXX with a concrete value, or run with '--help' for hints
python driver_hdpoint.py --dataset XXX --query_type XXX --train_size XXX --threshold XXX --buckets_limit XXX --alpha XXX --test_size XXX
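For orientation, the flags in the two commands above could be declared with argparse roughly as follows. This is only a sketch: the flag names come from the commands, but the value types, which flags are required, and the build_parser helper are assumptions, and the real drivers may define them differently (check '--help').

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the driver commands."""
    parser = argparse.ArgumentParser(description="Sketch of a driver command line")
    parser.add_argument("--dataset", type=str, required=True)
    parser.add_argument("--query_type", type=str, required=True)
    parser.add_argument("--train_size", type=int, required=True)      # assumed integer
    parser.add_argument("--threshold", type=float, required=True)     # assumed float
    parser.add_argument("--buckets_limit", type=int, required=True)   # assumed integer
    parser.add_argument("--test_size", type=int, required=True)       # assumed integer
    parser.add_argument("--solver", type=str, default=None)           # appears only for driver_region_tree.py
    parser.add_argument("--alpha", type=float, default=None)          # appears only for driver_hdpoint.py
    return parser


args = build_parser().parse_args([
    "--dataset", "Power-2d-data", "--query_type", "rect",
    "--train_size", "50", "--threshold", "0.052",
    "--buckets_limit", "100", "--test_size", "100", "--solver", "nnls",
])
```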
To test other workloads, first add the path and filename of both the workload and its min_max_range to the data loaders in utility.py, place the files in the corresponding locations, and then add the new name to the --dataset choices. We will give more concrete examples in the released version.
scipy >= 1.7.2
cvxopt >= 1.2.7 (if used)
cplex >= 20.1.0.1 (and a license, if used)
gurobipy >= 9.5.0 (and a license, if used)
trainsize_buckets_threshold = {
50 : [
[100, 0.052],
[500, 0.012],
[1000, 0.0061],
[5000, 0.0015],
[10000, 0.0007]
],
200 : [
[100, 0.08],
[500, 0.018],
[1000, 0.0096],
[5000, 0.0021],
[10000, 0.0013]
],
500 : [
[100, 0.08],
[500, 0.0205],
[1000, 0.011],
[5000, 0.00267],
[10000, 0.0014]
],
1000 : [
[100, 0.11],
[500, 0.025],
[1000, 0.014],
[5000, 0.003],
[10000, 0.0017]
],
2000 : [
[100, 0.125],
[500, 0.03],
[1000, 0.016],
[5000, 0.0033],
[10000, 0.0019]
]
}
Use each (train_size, buckets_limit, threshold) triple from the table above in the following command:
python3 driver_region_tree.py --dataset Power-2d-data --query_type rect --train_size XXX --threshold XXX --buckets_limit XXX --test_size 100 --solver nnls
trainsize_buckets_threshold = {
50 : [
[100, 0.052],
[500, 0.0105],
[1000, 0.0063],
[5000, 0.0015],
[10000, 0.0006],
[50000, 0.00015],
[100000, 0.00007]
],
200 : [
[100, 0.08],
[500, 0.018],
[1000, 0.0096],
[5000, 0.0021],
[10000, 0.001],
[50000, 0.0002],
[100000, 0.0001]
],
500 : [
[100, 0.08],
[500, 0.02],
[1000, 0.0105],
[5000, 0.0025],
[10000, 0.0014],
[50000, 0.0003],
[100000, 0.00015]
],
1000 : [
[100, 0.11],
[500, 0.027],
[1000, 0.015],
[5000, 0.0031],
[10000, 0.0016],
[50000, 0.0004],
[100000, 0.00016]
],
2000 : [
[100, 0.125],
[500, 0.031],
[1000, 0.016],
[5000, 0.0036],
[10000, 0.0019],
[50000, 0.0004],
[100000, 0.0002]
]
}
Use each (train_size, buckets_limit, threshold) triple from the table above in the following command:
python3 driver_region_tree.py --dataset Power-2d-data --query_type rect --train_size XXX --buckets_limit XXX --threshold XXX --test_size 1000 --solver gurobi_linf
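Filling in the XXX placeholders by hand for every triple is tedious, so a small helper can generate the command lines. The sketch below assumes the tables are read as {train_size: [[buckets_limit, threshold], ...]}, which is an interpretation of the variable name trainsize_buckets_threshold; the sweep_commands helper itself is hypothetical and not part of the repository.

```python
import shlex
from decimal import Decimal


def sweep_commands(table, driver="driver_region_tree.py", dataset="Power-2d-data",
                   query_type="rect", test_size=1000, solver="gurobi_linf"):
    """Build one command line per (train_size, buckets_limit, threshold) triple."""
    commands = []
    for train_size, settings in table.items():
        for buckets_limit, threshold in settings:
            # Format via Decimal so small thresholds print as 0.00007 rather than 7e-05.
            threshold_text = format(Decimal(str(threshold)), "f")
            commands.append(shlex.join([
                "python3", driver,
                "--dataset", dataset,
                "--query_type", query_type,
                "--train_size", str(train_size),
                "--buckets_limit", str(buckets_limit),
                "--threshold", threshold_text,
                "--test_size", str(test_size),
                "--solver", solver,
            ]))
    return commands


# Two entries copied from the table above, as a small example.
example_table = {50: [[100, 0.052], [100000, 0.00007]]}
commands = sweep_commands(example_table)
```

Each returned string can be run in a shell or passed to subprocess.run after shlex.split. For the first experiment above, pass test_size=100 and solver="nnls" instead of the defaults.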