Job workflow rewrite (generic BOINC wrapper) #21

@itaybthl

Description


Now that we have BOINC integration, it is much clearer how unfit the existing workflow is for running jobs with large-scale parallelization. The job workflow in LIReC must be rewritten, partly to make it easier to use in distributed systems such as BOINC, and to pave the way to a generic BOINC wrapper.

On the highest level, the BOINC wrapper can be run in multiple ways, depending on its second argument (the first is reserved for the job configuration file):

  • (no second argument): Takes a job configuration and runs it as is.
  • (any integer n): "Multicore mode": Further splits the search space to make use of n cores. Equivalent to no second argument if n < 2. Mostly intended for running code locally.
  • -s or --space-size: Instead of running the job, returns the size of the search space specified by the job configuration.
  • -t or --timing: Instead of running the job, returns an estimate of how much time each item in the search space will need. Roughly speaking, this should be the longest time needed among 50 or so items in the search space, or among however many items can be tested within 5 seconds (whichever takes less time).
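The argument handling described above could be sketched roughly as follows (the function and return values here are illustrative only, not the actual LIReC code):

```python
def main(argv):
    config_path = argv[1]  # the first argument is reserved for the job configuration file
    mode = argv[2] if len(argv) > 2 else None
    if mode in ('-s', '--space-size'):
        return 'space-size'  # report the size of the search space instead of running
    if mode in ('-t', '--timing'):
        return 'timing'      # report a per-item time estimate instead of running
    if mode is not None:
        # "multicore mode"; n < 2 is equivalent to no second argument
        cores = max(int(mode), 1)
        return f'run on {cores} cores'
    return 'run'             # no second argument: run the job configuration as is
```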

The wrapper should handle a generic search space, its segmentation, and its result collection. This part of the code should look like:

options = module.get_search_space(config)
results = []
minimal = getattr(module, 'MINIMAL_MODE', False)
for i, item in enumerate(options):
    if i < first:
        continue
    if i >= last:
        logging.info('surpassed end of search space slice, terminating')
        break
    if i == first:
        logging.info('found start of search space slice, beginning search')

    res = module.process_item(item, options, results, config)
    if res:
        results.extend([i] if minimal else res)
write_results_to_file(results, 'output.json')  # same as in job_poly_pslq_v1

Here, config is the job configuration (config.py), first and last specify a search space slice, module is the specific python code to run (for instance job_poly_pslq_v1), and logging is initialized like in job_poly_pslq_v1.

The module used in the above code must define 2 functions, and may optionally define 2 more attributes:

  • get_search_space(config): Generates the search space. In terms of job_poly_pslq_v1, this would be the final product(*subsets) inside execute_job.
  • process_item(item, options, results, config): Processes a single item, returning a list of results from it. Said list may be empty, in which case the item produced nothing. In terms of job_poly_pslq_v1, this would be the contents of the loop inside execute_job after the search space segmentation logic.
  • MINIMAL_MODE: An optional attribute which, when set to True, treats res as a boolean and instead turns results into a list of "good indices". This is intended to shorten development time in case each run of process_item is independent of every other run (e.g. factorial reduction searches, but not job_poly_pslq_v1).
  • get_search_space_size(config): Not shown here; equivalent to len(list(get_search_space(config))). Intended for cases where options can be huge and module can provide a faster way to compute just its size. Either way, this value is what is returned when running the code with -s or --space-size.
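As a concrete illustration, a minimal module satisfying this interface might look like the following (the module name and the particular "search" — Pythagorean pairs under a coefficient limit — are purely hypothetical, not part of LIReC):

```python
# hypothetical_module.py: a minimal module satisfying the wrapper interface.
from itertools import product
from math import isqrt

MINIMAL_MODE = False  # optional; True would collect "good indices" instead of results

def get_search_space(config):
    # e.g. config = {'degree': 2, 'coeff_limit': 10}
    return list(product(range(1, config['coeff_limit']), repeat=config['degree']))

def process_item(item, options, results, config):
    # Return a (possibly empty) list of results for this single item.
    a, b = item
    s = a * a + b * b
    r = isqrt(s)
    return [(a, b, r)] if r * r == s else []

def get_search_space_size(config):
    # Faster than len(list(get_search_space(config))) for huge spaces.
    return (config['coeff_limit'] - 1) ** config['degree']
```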

Additionally, the task_avail_workgen on the BOINC machine needs to be rewritten so it can automatically segment a search space given to it by a config. The details for this are left vague intentionally so we can figure out what works best with BOINC.
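While the workgen details are intentionally left open, one plausible segmentation strategy is to query the wrapper with -s for the total space size and then cut [0, total) into contiguous (first, last) slices. A sketch (the function name is hypothetical):

```python
def segment_search_space(total_size, num_slices):
    # Split [0, total_size) into num_slices contiguous (first, last) slices,
    # distributing the remainder so slice sizes differ by at most one.
    base, extra = divmod(total_size, num_slices)
    slices, first = [], 0
    for k in range(num_slices):
        last = first + base + (1 if k < extra else 0)
        slices.append((first, last))
        first = last
    return slices
```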
