
add_compute_plan - which batch size to use #282

@Esadruhn

Description


Summary

When we add a compute plan with N tasks, we can set autobatching to True and choose a batch size. The tasks are then submitted to the backend in batches of size batch_size. The fastest option is to make the batch size as large as possible without triggering backend errors.

The default batch size is 500. The question here is: how do we find the maximal batch size we can use?
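For reference, a minimal sketch of such a submission with the Python SDK. The names `client`, `compute_plan_spec` and `submit_plan` are illustrative, not from this issue, and the keyword names `auto_batching` and `batch_size` follow the summary above; check them against the installed substra SDK version.

```python
import substra


def submit_plan(client: substra.Client, compute_plan_spec, batch_size: int = 500):
    # Assumption: `client` is already configured for the remote backend and
    # `compute_plan_spec` was built elsewhere with N tasks.
    return client.add_compute_plan(
        compute_plan_spec,
        auto_batching=True,     # submit the tasks in batches rather than in one request
        batch_size=batch_size,  # 500 is the default mentioned in the summary above
    )
```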

What happens when the batch size is too big

When the batch size is too big (451 tasks * 400 data samples per task), we get the following error:

```
Requests error status 429: {"message":"grpc: received message larger than max (6228668 vs. 4194304)"}
Traceback (most recent call last):
  File "HIDDEN/substra/sdk/backends/remote/rest_client.py", line 114, in __request
    r.raise_for_status()
  File "HIDDEN/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: HIDDEN/task/bulk_create/
```
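The error message gives the two numbers needed for a back-of-the-envelope estimate: the rejected request serialized 451 tasks into 6,228,668 bytes, and the gRPC limit is 4,194,304 bytes. The sketch below derives a rough maximal batch size from those figures; it is only an estimate for this particular plan (400 data samples per task), so the per-task size should be recomputed for other plans, and the batch size lowered if the 429 error reappears.

```python
# Estimate a safe batch size from the rejected request in the error above.
GRPC_MAX_BYTES = 4_194_304          # gRPC max message size from the error message
observed_tasks = 451                # tasks in the rejected batch
observed_payload_bytes = 6_228_668  # serialized size of that batch

bytes_per_task = observed_payload_bytes / observed_tasks       # ~13.8 kB per task here
max_batch_size = int(GRPC_MAX_BYTES // bytes_per_task)         # ~303 tasks

# Keep a safety margin: task payload sizes vary with keys, metadata and sample counts.
safe_batch_size = int(max_batch_size * 0.9)
print(bytes_per_task, max_batch_size, safe_batch_size)
```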
