Job workflow rewrite (generic BOINC wrapper) #21

@itaybthl

Description


Now that we have BOINC integration, it is much clearer how unfit the existing workflow is for running jobs with large-scale parallelization. The job workflow in LIReC must be rewritten, partly to make it easier to use in distributed systems such as BOINC, and to pave the way to a generic BOINC wrapper.

On the highest level, the BOINC wrapper can be run in multiple ways, depending on its second argument (the first is reserved for the job configuration file):

  • (no second argument): Takes a job configuration and runs it as is.
  • (any integer n): "Multicore mode": Further splits the search space to make use of n cores. Equivalent to no second argument if n < 2. Mostly intended for running code locally.
  • -s or --space-size: Instead of running the job, returns the size of the search space specified by the job configuration.
  • -t or --timing: Instead of running the job, returns an estimate of how much time each item in the search space will need. Roughly speaking, this should be the longest time needed among 50 or so items in the search space, or among however many items can be tested within 5 seconds (whichever takes less time).
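The argument handling described above could be sketched roughly as follows (the function and return values here are illustrative only, not the actual LIReC code):

```python
def main(argv):
    config_path = argv[1]  # the first argument is reserved for the job configuration file
    mode = argv[2] if len(argv) > 2 else None
    if mode in ('-s', '--space-size'):
        return 'space-size'  # report the size of the search space instead of running
    if mode in ('-t', '--timing'):
        return 'timing'      # report a per-item time estimate instead of running
    if mode is not None:
        # "multicore mode"; n < 2 is equivalent to no second argument
        cores = max(int(mode), 1)
        return f'run on {cores} cores'
    return 'run'             # no second argument: run the job configuration as is
```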

The wrapper should handle a generic search space, its segmentation, and its result collection. This part of the code should look like:

options = module.get_search_space(config)
results = []
minimal = getattr(module, 'MINIMAL_MODE', False)
for i, item in enumerate(options):
    if i < first:
        continue
    if i >= last:
        logging.info('surpassed end of search space slice, terminating')
        break
    if i == first:
        logging.info('found start of search space slice, beginning search')

    res = module.process_item(item, options, results, config)
    if res:
        results.extend([i] if minimal else res)
write_results_to_file(results, 'output.json')  # same as in job_poly_pslq_v1

Here, config is the job configuration (config.py), first and last specify a search space slice, module is the specific python code to run (for instance job_poly_pslq_v1), and logging is initialized like in job_poly_pslq_v1.

The module used in the above code must define 2 functions, and may optionally define 2 more attributes:

  • get_search_space(config): Generates the search space. In terms of job_poly_pslq_v1, this would be the final product(*subsets) inside execute_job.
  • process_item(item, options, results, config): Processes a single item, returning a list of results from it. Said list may be empty, in which case the item produced nothing. In terms of job_poly_pslq_v1, this would be the contents of the loop inside execute_job after the search space segmentation logic.
  • MINIMAL_MODE: An optional attribute which, when set to True, treats res as a boolean and instead turns results into a list of "good indices". This is intended to shorten development time in case each run of process_item is independent of every other run (e.g. factorial reduction searches, but not job_poly_pslq_v1).
  • get_search_space_size(config): Not shown here; equivalent to len(list(get_search_space(config))). Intended for cases where options can be huge and module can provide a faster way to compute just its size. Either way, this value is what is returned when running the code with -s or --space-size.
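As a concrete illustration, a minimal module satisfying this interface might look like the following (the module name and the particular "search" — Pythagorean pairs under a coefficient limit — are purely hypothetical, not part of LIReC):

```python
# hypothetical_module.py: a minimal module satisfying the wrapper interface.
from itertools import product
from math import isqrt

MINIMAL_MODE = False  # optional; True would collect "good indices" instead of results

def get_search_space(config):
    # e.g. config = {'degree': 2, 'coeff_limit': 10}
    return list(product(range(1, config['coeff_limit']), repeat=config['degree']))

def process_item(item, options, results, config):
    # Return a (possibly empty) list of results for this single item.
    a, b = item
    s = a * a + b * b
    r = isqrt(s)
    return [(a, b, r)] if r * r == s else []

def get_search_space_size(config):
    # Faster than len(list(get_search_space(config))) for huge spaces.
    return (config['coeff_limit'] - 1) ** config['degree']
```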

Additionally, the task_avail_workgen on the BOINC machine needs to be rewritten so it can automatically segment a search space given to it by a config. The details for this are left vague intentionally so we can figure out what works best with BOINC.
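While the workgen details are intentionally left open, one plausible segmentation strategy is to query the wrapper with -s for the total space size and then cut [0, total) into contiguous (first, last) slices. A sketch (the function name is hypothetical):

```python
def segment_search_space(total_size, num_slices):
    # Split [0, total_size) into num_slices contiguous (first, last) slices,
    # distributing the remainder so slice sizes differ by at most one.
    base, extra = divmod(total_size, num_slices)
    slices, first = [], 0
    for k in range(num_slices):
        last = first + base + (1 if k < extra else 0)
        slices.append((first, last))
        first = last
    return slices
```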
