@denseishin

GPU scheduling

GPUs now have their own binary-tree-like data structure, which helps in selecting neighbouring GPUs.
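As a rough illustration (not the PR's actual code; the class and helper names are hypothetical), such a structure could be a complete binary tree with GPUs at the leaves: two leaves under one parent are neighbours, and the two halves of the tree map to the PCIe segments discussed below.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GpuNode:
    gpu_id: Optional[int] = None        # set only on leaves (physical GPUs)
    left: Optional["GpuNode"] = None
    right: Optional["GpuNode"] = None

def build_gpu_tree(gpu_ids: list[int]) -> GpuNode:
    """Build a complete binary tree over the GPU ids (assumes a power-of-two count).

    Two leaves under the same parent are neighbours (e.g. an NVLink pair),
    and the two halves of the tree map to the two PCIe segments.
    """
    nodes = [GpuNode(gpu_id=g) for g in gpu_ids]
    while len(nodes) > 1:
        nodes = [GpuNode(left=a, right=b) for a, b in zip(nodes[0::2], nodes[1::2])]
    return nodes[0]
```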
The algorithm preferably selects neighbouring GPUs in pairs if the requested amount is even. If it is odd, it selects the last GPU independently. If multiple pairs are needed, it preferably chooses neighbouring pairs.
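A minimal sketch of that pairing rule, assuming GPUs 2k and 2k+1 are neighbours (the helper and its greedy strategy are illustrative only, not the PR's actual code):

```python
def pick_neighbour_pairs(free: list[int], n_gpus: int) -> list[int]:
    """Greedy sketch of the pairing rule (hypothetical, simplified).

    An even request is served entirely by neighbouring pairs; an odd request
    gets one extra, independently chosen GPU.
    """
    free_set = set(free)
    chosen: list[int] = []
    pairs, single = divmod(n_gpus, 2)
    for g in sorted(free):
        if pairs == 0:
            break
        if g % 2 == 0 and g in free_set and g + 1 in free_set:
            chosen += [g, g + 1]        # take a neighbouring pair
            free_set -= {g, g + 1}
            pairs -= 1
    if single and free_set:
        chosen.append(min(free_set))    # the leftover GPU is picked independently
    return chosen
```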
The algorithm assumes that either each half of the GPUs is on the same PCIe segment (assuming two PCIe segments) or that all GPUs are on one PCIe segment, and that GPUs might be paired up through NVLink. These cases apply quite often, and in them this selection method speeds up multi-GPU tasks by taking advantage of the better inter-GPU communication between the chosen GPUs.
If no GPU is already reserved when a job starts, the scheduler randomly picks a traversal direction (left or right). That determines which PCIe segment (or which half, if there is only one segment) the GPUs will be picked from.
If one or more GPUs on a segment are already in use, the scheduler starts from there (setting the traversal direction to that segment) and tries to fill up that segment first if it has enough GPUs left. The outermost available GPUs (or pairs) in the traversal direction are assigned first. If that segment does not have enough room for the request, the other segment is used instead. If the request is larger than a segment, both segments are automatically included in the search for pairs. If neither segment can provide neighbouring GPUs for the request, the scheduler picks leftover GPUs regardless of their segment and pairing, as long as enough GPUs are left in total.
If any of these assumptions are incorrect, the program still works, but it won't improve inter-GPU communication.
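As a rough illustration, here is a minimal sketch of the segment-selection policy described above; the function name, the two-segment encoding, and the counters are hypothetical simplifications, not the PR's actual code:

```python
import random

def choose_segments(seg_used: list[int], seg_free: list[int], n: int) -> list[int]:
    """Sketch of the segment policy for two PCIe segments, indexed 0 and 1.

    seg_used[i] / seg_free[i] count the occupied / free GPUs on segment i.
    """
    if seg_used[0] == 0 and seg_used[1] == 0:
        start = random.choice([0, 1])        # idle cluster: random traversal direction
    else:
        start = 0 if seg_used[0] > 0 else 1  # fill the partially used segment first
    if seg_free[start] >= n:
        return [start]                       # request fits on the preferred segment
    if seg_free[1 - start] >= n:
        return [1 - start]                   # otherwise use the other segment
    return [start, 1 - start]                # spans both: pairs are searched across
                                             # segments, falling back to any leftover GPUs
```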
There's also an option to reserve GPUs for outside applications. Internally these are marked as permanently occupied and won't be assigned by the queue system.
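A tiny sketch of that reservation idea (names hypothetical): reserved GPUs are simply excluded from the assignable set, as if permanently occupied:

```python
RESERVED_FOR_EXTERNAL = {6, 7}              # e.g. read from a config option

def assignable_gpus(all_gpus: set[int], in_use: set[int]) -> set[int]:
    """GPUs the queue may hand out: reserved ones count as permanently occupied."""
    return all_gpus - in_use - RESERVED_FOR_EXTERNAL
```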

Structuring

A lot of scheduling-related code is moved from qserver to scheduler.py. The job classes are now unified and type annotations have been added. I also added some test job scripts.

mgarbade and others added 7 commits January 14, 2025 14:32
The algorithm is added to improve inter-GPU communication and therefore the speed of multi-GPU jobs compared to the previous version.
The algorithm preferably selects neighbouring GPUs in pairs if the requested amount is even. If it is odd, it selects the last GPU independently.
The algorithm assumes that either each half of the GPUs is on the same NUMA node or that all GPUs are on the same NUMA node, and that GPUs might be paired up through NVLink. These cases apply quite often, and in them the algorithm accelerates computation by taking advantage of better inter-GPU communication when using multiple GPUs in one task.
If any of the assumptions are incorrect, the program still works, but it won't improve inter-GPU communication.

Move scheduling-related code in qserver to scheduler.py
Add test jobs
Further refactoring might follow