Now that we have BOINC integration, it is much clearer how unfit the existing workflow is for running jobs with large-scale parallelization. The job workflow in LIReC must be rewritten, partly to make its use in distributed systems such as BOINC easier, and to pave the way to a generic BOINC wrapper.
On the highest level, the BOINC wrapper can be run in multiple ways, depending on its second argument (the first is reserved for the job configuration file):
- (no second argument): Takes a job configuration and runs it as is.
- (any integer n): "Multicore mode": Further splits the search space to make use of n cores. Equivalent to no second argument if n < 2. Mostly intended for running code locally.
- -s or --space-size: Instead of running the job, returns the size of the search space specified by the job configuration.
- -t or --timing: Instead of running the job, returns an estimate for how much time each item in the search space will need. Roughly speaking, this should be the longest time needed from 50 or so items in the search space, or however many items could be tested within 5 seconds (whichever takes less time).
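The dispatch on the second argument could be sketched as follows. This is only an assumption about how the wrapper might be organized; `parse_mode` is a hypothetical helper name, not existing LIReC code:

```python
def parse_mode(argv):
    """Decide what the wrapper should do from its command line.

    argv[0] is the job configuration file; argv[1], if present,
    selects the mode. Returns a (mode, cores) pair, where mode is
    one of 'run', 'space-size', or 'timing'."""
    if len(argv) < 2:
        return ('run', 1)
    arg = argv[1]
    if arg in ('-s', '--space-size'):
        return ('space-size', 0)
    if arg in ('-t', '--timing'):
        return ('timing', 0)
    # any integer n: multicore mode; n < 2 behaves like a plain run
    return ('run', max(int(arg), 1))
```

For example, `parse_mode(['job.json', '4'])` yields `('run', 4)`, while `parse_mode(['job.json', '1'])` collapses to `('run', 1)`, matching the "equivalent to no second argument if n < 2" rule above.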
The wrapper should handle a generic search space, its segmentation, and its result collection. This part of the code should look like:
```python
options = module.get_search_space(config)
results = []
minimal = getattr(module, 'MINIMAL_MODE', False)
for i, item in enumerate(options):
    if i < first:
        continue
    if i >= last:
        logging.info('surpassed end of search space slice, terminating')
        break
    if i == first:
        logging.info('found start of search space slice, beginning search')
    res = module.process_item(item, options, results, config)
    if res:
        results.extend([i] if minimal else res)
write_results_to_file(results, 'output.json')  # same as in job_poly_pslq_v1
```
Here, config is the job configuration (config.py), first and last specify a search space slice, module is the specific python code to run (for instance job_poly_pslq_v1), and logging is initialized like in job_poly_pslq_v1.
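In multicore mode, one plausible way to derive first and last for each core is an even contiguous split of the search space. This is only a sketch of the idea, not the actual splitting logic, and `slice_bounds` is a hypothetical helper name:

```python
def slice_bounds(total, cores):
    """Split a search space of `total` items into `cores` contiguous
    slices, returning a list of (first, last) pairs with last
    exclusive. Earlier slices absorb any remainder, so sizes differ
    by at most one item."""
    base, extra = divmod(total, cores)
    bounds = []
    first = 0
    for i in range(cores):
        last = first + base + (1 if i < extra else 0)
        bounds.append((first, last))
        first = last
    return bounds
```

For instance, `slice_bounds(10, 3)` gives `[(0, 4), (4, 7), (7, 10)]`; each pair would then be handed to one copy of the loop above as its first and last.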
The module used in the above code must provide 2 functions, and may optionally provide 2 more things:
- get_search_space(config): Generates the search space. In terms of job_poly_pslq_v1, this would be the final product(*subsets) inside execute_job.
- process_item(item, options, results, config): Processes a single item, returning a list of results from it. Said list may be empty, in which case the item produced nothing. In terms of job_poly_pslq_v1, this would be the contents of the loop inside execute_job after the search space segmentation logic.
- MINIMAL_MODE: An optional field which, when set to True, treats res as a boolean and instead turns results into a list of "good indices". This is intended to shorten development time in case each run of process_item is independent of every other run (e.g. factorial reduction searches, but not job_poly_pslq_v1).
- get_search_space_size(config): Not shown here; equivalent to len(list(get_search_space(config))). Intended for cases where options can be huge, and module can provide a faster way to compute just its size. Either way, this value is what's returned when running the code with -s.
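For illustration only, a toy module satisfying this interface could look like the following. The divisibility search and the a_max/b_max config keys are made up for the example and are not a real LIReC job:

```python
from itertools import product

MINIMAL_MODE = False  # optional; True would record indices instead of results

def get_search_space(config):
    # toy example: all pairs (a, b) with bounds taken from the config
    return product(range(config['a_max']), range(config['b_max']))

def process_item(item, options, results, config):
    # returns a (possibly empty) list of results for this item
    a, b = item
    return [(a, b)] if (a + b) % 7 == 0 else []

def get_search_space_size(config):
    # faster than len(list(get_search_space(config)))
    return config['a_max'] * config['b_max']
```

The wrapper's loop would call process_item once per pair and accumulate whatever it returns, while get_search_space_size lets the -s mode answer without materializing the product.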
Additionally, the task_avail_workgen on the BOINC machine needs to be rewritten so it can automatically segment a search space given to it by a config. The details for this are left vague intentionally so we can figure out what works best with BOINC.