
Conversation

@ZhengW22

What type of PR is this?

What this PR does / why we need it:
This PR adds the capability to disable GPUs at the node level by applying annotations to nodes. GPUs matching the specified UUIDs will no longer be allocated to any pods.

The implementation works by setting the used count of the corresponding node GPUs to their maximum capacity when calculating nodeUsage, effectively occupying those resources. This approach maintains compatibility with scheduling logic for different types of GPU cards.
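For illustration, a minimal client-go sketch of how such an annotation could be applied to a node; the annotation key hami.io/nouse-gpuuuid, the node name, and the UUIDs below are placeholders only, since the real keys are the vendor-specific constants used by the scheduler (e.g. nvidia.GPUNoUseUUID):

```go
// Minimal sketch, not part of this PR: the annotation key and values below are
// placeholders, not the actual vendor-specific keys (e.g. nvidia.GPUNoUseUUID).
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Merge-patch the node so the scheduler stops allocating the listed GPUs.
	patch := []byte(`{"metadata":{"annotations":{"hami.io/nouse-gpuuuid":"GPU-aaaa,GPU-bbbb"}}}`)
	node, err := clientset.CoreV1().Nodes().Patch(
		context.TODO(), "node1", types.MergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("annotated node:", node.Name)
}
```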

Which issue(s) this PR fixes:
No.

Special notes for your reviewer:
No.

Does this PR introduce a user-facing change?:
No.

archlitchi and others added 30 commits December 24, 2024 17:02
* update documents for config

Signed-off-by: limengxuan <391013634@qq.com>
* Setting devicePlugin.compatWithCPUManager=true will set PASS_DEVICE_SPECS=true as an environment variable.

Signed-off-by: 张 驰 <919474320@qq.com>

* Change the parameter compatWithCPUManager for setting the PASS_DEVICE_SPECS ENV to passDeviceSpecsEnabled, and set the default value to true.

Signed-off-by: 张 驰 <919474320@qq.com>

---------

Signed-off-by: 张 驰 <919474320@qq.com>
Signed-off-by: learner0810 <zhongjun.li@daocloud.io>
…ler. (Project-HAMi#746)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: bin <bin.pan@daocloud.io>
Signed-off-by: elrondwong <gfengwong@gmail.com>
Signed-off-by: bin <bin.pan@daocloud.io>
* add star history to readme, fix typos and add more contributors and maintainers.

Signed-off-by: yangshiqi <yangshiqi1089@gmail.com>

* add spaces

Signed-off-by: yangshiqi <yangshiqi1089@gmail.com>

---------

Signed-off-by: yangshiqi <yangshiqi1089@gmail.com>
Signed-off-by: Fengyang <495160589@qq.com>
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
Signed-off-by: KubeKyrie <shaolong.qin@daocloud.io>
Signed-off-by: yxxhero <aiopsclub@163.com>
Signed-off-by: bin <bin.pan@daocloud.io>
…t-HAMi#767)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: KubeKyrie <shaolong.qin@daocloud.io>
Signed-off-by: KubeKyrie <shaolong.qin@daocloud.io>
Signed-off-by: learner0810 <zhongjun.li@daocloud.io>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: wen.rui <wen.rui@daocloud.io>
Signed-off-by: Rei1010 <wen.rui@daocloud.io>
Signed-off-by: wen.rui <wen.rui@daocloud.io>
Signed-off-by: jinye <jinye.shi@daocloud.io>
Signed-off-by: jinye <jinye.shi@daocloud.io>
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6.10.0 to 6.11.0.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](docker/build-push-action@v6.10.0...v6.11.0)

---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: jinye <jinye.shi@daocloud.io>
…anagement (Project-HAMi#778)

Signed-off-by: haitwang-cloud <haitao_wht@outlook.com>
Signed-off-by: wen.rui <wen.rui@daocloud.io>
Signed-off-by: lixd <xueduan.li@gmail.com>
Signed-off-by: wen.rui <wen.rui@daocloud.io>
Signed-off-by: phoenixwu0229 <phoenixwu0229@icloud.com>
@hami-robot
Contributor

hami-robot bot commented Jan 29, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Shouren, ZhengW22

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ZhengW22
Author

/gemini review

@ZhengW22
Author

> @ZhengW22 Could you please take a look at the critical comments in gemini-code-assist?

I have already fixed the critical comments and updated the code to the new version.

@ZhengW22
Author

@wawa0210 Hello, could you please help review this PR?

Contributor

gemini-code-assist bot left a comment


Code Review

This pull request introduces a feature to disable GPUs at the node level using annotations, which is a valuable addition. However, my review has identified a few issues.

First, the PR is very large and bundles several unrelated changes. Besides the main feature, it includes a significant amount of new code for the NVIDIA device plugin and OCI runtime handling, as well as constants for scheduler policies. These additions are not mentioned in the PR description, making it difficult to review and understand their purpose and impact. It's highly recommended to split these unrelated changes into separate pull requests to follow best practices for code review.

Second, there's a discrepancy between the PR description and the implementation. The description states that disabled GPUs are handled by 'setting the used count... to their maximum capacity', but the code actually filters them out from the scheduler's list of available devices. Please update the description to match the implementation.

My review comments below focus on the 'disable GPU' feature. I've found a bug in the implementation and a major inconsistency in the added documentation that should be addressed.

Comment on lines 78 to 96
## Device UUID Selection

You can specify which GPU devices to use or exclude using annotations:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: poddemo
  annotations:
    # Use specific GPU devices (comma-separated list)
    enflame.com/use-gpuuuid: "node1-enflame-0,node1-enflame-1"
    # Or exclude specific GPU devices (comma-separated list)
    enflame.com/nouse-gpuuuid: "node1-enflame-2,node1-enflame-3"
spec:
  # ... rest of pod spec
```

> **NOTE:** The device ID format is `{node-name}-enflame-{index}`. You can find the available device IDs in the node status.

high

This "Device UUID Selection" section seems to describe a different feature than what is implemented in this pull request. The documentation describes pod-level annotations (enflame.com/use-gpuuuid, enflame.com/nouse-gpuuuid) for selecting GPUs, while the PR implements a node-level annotation for disabling GPUs from being scheduled. This is confusing. The documentation should be updated to accurately describe the new node-level feature. Additionally, the implementation in pkg/scheduler/nodes.go does not seem to include support for Enflame devices in the vendorNoUseAnnoKeyMap, so this feature wouldn't work for Enflame devices as documented.

	} else {
		m.nodes[nodeID] = nodeInfo
	}
	nodeInfo.Devices = rmDeviceByNodeAnnotation(m.nodes[nodeID])

high

There's a bug here. The nodeInfo.Devices is updated, but nodeInfo is the function argument. If the node already exists in m.nodes, this change will not be persisted in the node manager's state because you are modifying the argument, not the value stored in the map m.nodes. The change should be applied to m.nodes[nodeID].Devices.

Suggested change
nodeInfo.Devices = rmDeviceByNodeAnnotation(m.nodes[nodeID])
m.nodes[nodeID].Devices = rmDeviceByNodeAnnotation(m.nodes[nodeID])

	}
	newDeviceMap := make(map[string][]device.DeviceInfo)
	for deviceName, deviceList := range nodeInfo.Devices {
		newDeviceList := make([]device.DeviceInfo, 0)

medium

For better performance, it's a good practice to pre-allocate the slice with a known capacity to avoid multiple re-allocations when appending elements in the loop. You can initialize newDeviceList with a capacity equal to the length of deviceList.

Suggested change
newDeviceList := make([]device.DeviceInfo, 0)
newDeviceList := make([]device.DeviceInfo, 0, len(deviceList))

hami-robot bot added size/L and removed size/XXL labels Jan 29, 2026
@ZhengW22
Author

/gemini review

Contributor

gemini-code-assist bot left a comment


Code Review

This pull request introduces a feature to disable specific GPUs on a node using annotations. The implementation filters out the disabled GPUs from the scheduler's node cache. While the overall approach is sound, I've found a bug in how disabled GPU UUIDs are processed when there are spaces in the annotation value. I've also included suggestions to improve the test coverage and fix a redundant test case. Lastly, please note that the implementation differs from the PR description, which states that it works by maxing out the GPU's used resources; the code actually removes the device from the scheduler's view.

			if vendorWithDisableGPUUUIDMap[vendor] == nil {
				vendorWithDisableGPUUUIDMap[vendor] = make(map[string]bool)
			}
			vendorWithDisableGPUUUIDMap[vendor][disableGPUUUID] = true

high

There is a bug here. You are using disableGPUUUID as the map key, which is the raw value from splitting the annotation string. However, you should be using id, which is the result of strings.TrimSpace(disableGPUUUID). If the annotation value has spaces around the UUID (e.g., "uuid1, uuid2"), this will cause the lookup to fail.

Suggested change
vendorWithDisableGPUUUIDMap[vendor][disableGPUUUID] = true
vendorWithDisableGPUUUIDMap[vendor][id] = true
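For context, the surrounding parsing loop would then roughly look like the sketch below; the loop structure and variable names outside the quoted lines are assumptions based on the review discussion (split the annotation value on commas, trim each entry, then record it per vendor):

```go
// Sketch of the corrected annotation parsing; everything outside the quoted
// lines above is assumed. Trim each UUID before using it as the map key.
for annoKey, vendors := range vendorNoUseAnnoKeyMap {
	annoValue, ok := nodeInfo.Node.Annotations[annoKey]
	if !ok || annoValue == "" {
		continue
	}
	for _, disableGPUUUID := range strings.Split(annoValue, ",") {
		id := strings.TrimSpace(disableGPUUUID)
		if id == "" {
			continue
		}
		for _, vendor := range vendors {
			if vendorWithDisableGPUUUIDMap[vendor] == nil {
				vendorWithDisableGPUUUIDMap[vendor] = make(map[string]bool)
			}
			vendorWithDisableGPUUUIDMap[vendor][id] = true
		}
	}
}
```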

}
}

func Test_rmDeviceByNodeAnnotation(t *testing.T) {

medium

The current tests don't cover cases where the comma-separated UUID list in the annotation contains spaces (e.g., "uuid1, uuid2"). This could hide a bug where strings.TrimSpace is not being used correctly when populating the map of disabled UUIDs. Please consider adding a test case to cover this scenario to make the tests more robust.
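For example, a case along the following lines could be added, following the structure of the existing cases (id1 and id2 are the identifiers already used in the test file; this assumes the TrimSpace fix above, so both devices are removed despite the space after the comma):

```go
		{
			name: "Test removing devices when the annotation value contains spaces",
			args: args{
				nodeInfo: &device.NodeInfo{
					// Note the space after the comma in the annotation value.
					Node:    &corev1.Node{ObjectMeta: metav1.ObjectMeta{Annotations: map[string]string{nvidia.GPUNoUseUUID: id1 + ", " + id2}}},
					Devices: map[string][]device.DeviceInfo{nvidia.NvidiaGPUDevice: {{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}, {DeviceVendor: nvidia.NvidiaGPUDevice, ID: id2}}},
				},
			},
			want: map[string][]device.DeviceInfo{nvidia.NvidiaGPUDevice: []device.DeviceInfo{}},
		},
```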

Comment on lines 411 to 421
name: "Test removing metax device, case2",
args: args{
nodeInfo: &device.NodeInfo{
Node: &corev1.Node{ObjectMeta: metav1.ObjectMeta{Annotations: map[string]string{metax.MetaxNoUseUUID: id1}}},
// Devices: []device.DeviceInfo{{DeviceVendor: metax.MetaxSGPUDevice, ID: id1}},
Devices: map[string][]device.DeviceInfo{metax.MetaxGPUDevice: []device.DeviceInfo{{DeviceVendor: metax.MetaxGPUDevice, ID: id1}}},
},
},
want: map[string][]device.DeviceInfo{metax.MetaxGPUDevice: []device.DeviceInfo{}},
},
}

medium

This test case, Test removing metax device, case2, is a duplicate of Test removing metax device, case1. Based on the vendorNoUseAnnoKeyMap and the commented-out code, the intention was likely to test the removal of a metax.MetaxSGPUDevice. The current test does not cover this. I've suggested a corrected test case to properly test this scenario.

		{
			name: "Test removing metax sgpu device",
			args: args{
				nodeInfo: &device.NodeInfo{
					Node:    &corev1.Node{ObjectMeta: metav1.ObjectMeta{Annotations: map[string]string{metax.MetaxNoUseUUID: id1}}},
					Devices: map[string][]device.DeviceInfo{metax.MetaxSGPUDevice: {{DeviceVendor: metax.MetaxSGPUDevice, ID: id1}}},
				},
			},
			want: map[string][]device.DeviceInfo{metax.MetaxSGPUDevice: []device.DeviceInfo{}},
		},

@ZhengW22
Author

/gemini review

Contributor

gemini-code-assist bot left a comment


Code Review

This pull request introduces a feature to disable specific GPUs on a node using annotations. The implementation filters out the disabled devices from the scheduler's cache when a node is added or updated. This is achieved by introducing a new function rmDeviceByNodeAnnotation and a map vendorNoUseAnnoKeyMap to handle vendor-specific annotations. The changes are accompanied by a comprehensive set of unit tests.

My review has identified a couple of areas for improvement:

  • A potential nil pointer dereference in rmDeviceByNodeAnnotation if it were ever called with a nil nodeInfo.
  • Some commented-out dead code in the new tests that should be cleaned up.

Also, please note that the implementation in the code (removing devices from the scheduler's cache) differs from the approach described in the PR description ("setting the used count... to their maximum capacity"). The implemented approach seems more effective, but it would be good to align the description with the code for future reference.

}

func rmDeviceByNodeAnnotation(nodeInfo *device.NodeInfo) map[string][]device.DeviceInfo {
vendorWithDisableGPUUUIDMap := make(map[string]map[string]bool)

high

The function rmDeviceByNodeAnnotation does not check if nodeInfo is nil. If a nil nodeInfo is passed, it could cause a panic at nodeInfo.Node. Although the current call site in addNode seems to prevent this, adding a nil check at the beginning of the function is a good practice for robustness and to make the function safer for future use.

	if nodeInfo == nil {
		return nil
	}
	vendorWithDisableGPUUUIDMap := make(map[string]map[string]bool)

Comment on lines 347 to 416
					// Devices: []device.DeviceInfo{{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}, {DeviceVendor: nvidia.NvidiaGPUDevice, ID: id2}},
					Devices: map[string][]device.DeviceInfo{nvidia.NvidiaGPUDevice: []device.DeviceInfo{{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}, {DeviceVendor: nvidia.NvidiaGPUDevice, ID: id2}}},
				},
			},
			want: map[string][]device.DeviceInfo{nvidia.NvidiaGPUDevice: []device.DeviceInfo{}},
		},
		{
			name: "Test remove one device",
			args: args{
				nodeInfo: &device.NodeInfo{
					Node: &corev1.Node{ObjectMeta: metav1.ObjectMeta{Annotations: map[string]string{nvidia.GPUNoUseUUID: id1}}},
					Devices: map[string][]device.DeviceInfo{nvidia.NvidiaGPUDevice: []device.DeviceInfo{{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}}},
				},
			},
			want: map[string][]device.DeviceInfo{nvidia.NvidiaGPUDevice: []device.DeviceInfo{}},
		},
		{
			name: "Test remove two devices",
			args: args{
				nodeInfo: &device.NodeInfo{
					Node: &corev1.Node{ObjectMeta: metav1.ObjectMeta{Annotations: map[string]string{nvidia.GPUNoUseUUID: strings.Join([]string{id1, id2}, ",")}}},
					// Devices: []device.DeviceInfo{{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}, {DeviceVendor: nvidia.NvidiaGPUDevice, ID: id2}},
					Devices: map[string][]device.DeviceInfo{nvidia.NvidiaGPUDevice: []device.DeviceInfo{{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}, {DeviceVendor: nvidia.NvidiaGPUDevice, ID: id2}}},
				},
			},
			want: map[string][]device.DeviceInfo{nvidia.NvidiaGPUDevice: []device.DeviceInfo{}},
		},
		{
			name: "Test remove one device and keep one device",
			args: args{
				nodeInfo: &device.NodeInfo{
					Node: &corev1.Node{ObjectMeta: metav1.ObjectMeta{Annotations: map[string]string{nvidia.GPUNoUseUUID: strings.Join([]string{id2}, ",")}}},
					// Devices: []device.DeviceInfo{{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}, {DeviceVendor: nvidia.NvidiaGPUDevice, ID: id2}},
					Devices: map[string][]device.DeviceInfo{nvidia.NvidiaGPUDevice: []device.DeviceInfo{{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}, {DeviceVendor: nvidia.NvidiaGPUDevice, ID: id2}}},
				},
			},
			// want: []device.DeviceInfo{{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}},
			want: map[string][]device.DeviceInfo{nvidia.NvidiaGPUDevice: []device.DeviceInfo{{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}}},
		},
		{
			name: "Test no removing device, case1",
			args: args{
				nodeInfo: &device.NodeInfo{
					Node: &corev1.Node{ObjectMeta: metav1.ObjectMeta{Annotations: map[string]string{"test-key": ""}}},
					// Devices: []device.DeviceInfo{{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}},
					Devices: map[string][]device.DeviceInfo{nvidia.NvidiaGPUDevice: []device.DeviceInfo{{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}}},
				},
			},
			// want: []device.DeviceInfo{{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}},
			want: map[string][]device.DeviceInfo{nvidia.NvidiaGPUDevice: []device.DeviceInfo{{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}}},
		},
		{
			name: "Test no removing device, case2",
			args: args{
				nodeInfo: &device.NodeInfo{
					Node: &corev1.Node{ObjectMeta: metav1.ObjectMeta{Annotations: map[string]string{nvidia.GPUNoUseUUID: id2}}},
					// Devices: []device.DeviceInfo{{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}},
					Devices: map[string][]device.DeviceInfo{nvidia.NvidiaGPUDevice: []device.DeviceInfo{{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}}},
				},
			},
			// want: []device.DeviceInfo{{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}},
			want: map[string][]device.DeviceInfo{nvidia.NvidiaGPUDevice: []device.DeviceInfo{{DeviceVendor: nvidia.NvidiaGPUDevice, ID: id1}}},
		},
		{
			name: "Test removing metax device, case1",
			args: args{
				nodeInfo: &device.NodeInfo{
					Node: &corev1.Node{ObjectMeta: metav1.ObjectMeta{Annotations: map[string]string{metax.MetaxNoUseUUID: id1}}},
					// Devices: []device.DeviceInfo{{DeviceVendor: metax.MetaxGPUDevice, ID: id1}},
					Devices: map[string][]device.DeviceInfo{metax.MetaxGPUDevice: []device.DeviceInfo{{DeviceVendor: metax.MetaxGPUDevice, ID: id1}},

medium

There are several commented-out lines of code within the new test cases (e.g., lines 347, 368, 379, 383, 415). These lines appear to be remnants from previous refactoring. To improve code clarity and maintainability, please remove this dead code.

Signed-off-by: ZhengW22 <s123zz123@gmail.com>
@ZhengW22
Author

/gemini review

Contributor

gemini-code-assist bot left a comment


Code Review

This pull request introduces a useful feature for disabling specific GPUs on a node via annotations. The implementation is clean and is supported by a good set of unit tests. The core logic in rmDeviceByNodeAnnotation correctly filters out devices based on the nouse-gpuuuid annotations for various vendors.

However, there is a discrepancy between the implementation and the pull request description. The description states that the feature works by setting the used count of disabled GPUs to their maximum capacity. The actual code removes these GPUs from the scheduler's cached list of devices for the node. While the outcome is similar (the GPU is not schedulable), the mechanism is different. Please update the PR description to accurately reflect the implementation, which will help future developers understand the code.

@ZhengW22
Author

/gemini review

Contributor

gemini-code-assist bot left a comment


Code Review

This pull request introduces a feature to disable specific GPUs on a node using annotations. The implementation works by filtering out the disabled devices from the scheduler's cache, preventing them from being allocated. This is a clean and effective approach. The code is well-tested with various scenarios.

I have a couple of suggestions for improvement:

  • The PR description seems to describe a different implementation strategy (marking devices as fully used). It would be beneficial to update it to accurately reflect the current implementation, which removes devices from the cache.
  • I've pointed out a design consideration regarding the new vendorNoUseAnnoKeyMap that could improve modularity and maintainability in the long run.

Overall, this is a solid contribution that adds valuable functionality.

Comment on lines +37 to +44
var vendorNoUseAnnoKeyMap = map[string][]string{
	nvidia.GPUNoUseUUID:        {nvidia.NvidiaGPUDevice},
	cambricon.MLUNoUseUUID:     {cambricon.CambriconMLUDevice},
	hygon.DCUNoUseUUID:         {hygon.HygonDCUDevice},
	mthreads.MthreadsNoUseUUID: {mthreads.MthreadsGPUDevice},
	metax.MetaxNoUseUUID:       {metax.MetaxGPUDevice, metax.MetaxSGPUDevice},
	kunlun.KunlunNoUseUUID:     {kunlun.KunlunGPUDevice},
}

medium

While this map works, it introduces a tight coupling between the scheduler package and all specific device packages (nvidia, cambricon, etc.). Every time a new device type with a 'no-use' annotation is added, this central map will need to be modified.

A more decoupled and maintainable approach would be to have each device implementation provide this information. For example, you could add a method to the device.Devices interface:

// In pkg/device/devices.go
type Devices interface {
    // ...
    GetNoUseAnnotation() (key string, vendors []string)
}

The scheduler could then build this map dynamically at startup by iterating over the registered device implementations. This would make the system more extensible.
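For illustration, the startup side of that idea could look roughly like the sketch below; GetNoUseAnnotation and the GetDevices registry accessor are assumptions for the sake of the example, not existing APIs:

```go
// Hypothetical sketch: assumes a GetNoUseAnnotation() method on device.Devices
// and a device.GetDevices() registry accessor, neither of which exists today.
func buildNoUseAnnoKeyMap() map[string][]string {
	noUseAnnoKeyMap := make(map[string][]string)
	for _, dev := range device.GetDevices() {
		key, vendors := dev.GetNoUseAnnotation()
		if key == "" {
			continue
		}
		noUseAnnoKeyMap[key] = append(noUseAnnoKeyMap[key], vendors...)
	}
	return noUseAnnoKeyMap
}
```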

			args: args{
				nodeInfo: &device.NodeInfo{
					Node: &corev1.Node{ObjectMeta: metav1.ObjectMeta{Annotations: map[string]string{metax.MetaxNoUseUUID: id1}}},
					// Devices: []device.DeviceInfo{{DeviceVendor: metax.MetaxGPUDevice, ID: id1}},

medium

This commented-out line appears to be a remnant from development and can be safely removed to improve code clarity.

@ZhengW22
Author

@wawa0210 Hello, I have already fixed all critical problems.

Signed-off-by: ZhengW22 <s123zz123@gmail.com>
@hami-robot
Contributor

hami-robot bot commented Jan 30, 2026

Thanks for your pull request. Before we can look at it, you'll need to add a 'DCO signoff' to your commits.

📝 Please follow instructions in the contributing guide to update your commits with the DCO

Full details of the Developer Certificate of Origin can be found at developercertificate.org.

The list of commits missing DCO signoff:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ZhengW22
Author

/gemini review

Contributor

gemini-code-assist bot left a comment


Code Review

This pull request introduces a useful feature to disable specific GPUs on a node via annotations. The implementation is clean and integrates well with the existing node management logic. The accompanying tests are thorough and cover important edge cases.

I've noticed a small discrepancy between the PR description and the code. The description states that disabled GPUs are marked as fully utilized, but the implementation filters them out from the scheduler's device list entirely. The implemented approach is cleaner and more efficient, so this is great!

I have a couple of suggestions for minor improvements to enhance performance and maintainability.

Comment on lines +109 to +122
	newDeviceMap := make(map[string][]device.DeviceInfo)
	for deviceName, deviceList := range nodeInfo.Devices {
		newDeviceList := make([]device.DeviceInfo, 0, len(deviceList))
		for _, d := range deviceList {
			if disableGPUUUIDMap, ok := vendorWithDisableGPUUUIDMap[d.DeviceVendor]; ok {
				if disabled := disableGPUUUIDMap[d.ID]; disabled {
					klog.V(5).Infof("Disable gpu uuid is : %s", d.ID)
					continue
				}
			}
			newDeviceList = append(newDeviceList, d)
		}
		newDeviceMap[deviceName] = newDeviceList
	}

medium

The logic for filtering devices is correct. However, for better performance and readability, the check for whether a vendor has devices to disable can be hoisted out of the inner loop. Since d.DeviceVendor will be the same as deviceName for all devices in deviceList, you can check for vendorWithDisableGPUUUIDMap[deviceName] once before iterating through the deviceList.

 newDeviceMap := make(map[string][]device.DeviceInfo, len(nodeInfo.Devices))
 for deviceName, deviceList := range nodeInfo.Devices {
  disableGPUUUIDMap, ok := vendorWithDisableGPUUUIDMap[deviceName]
  if !ok {
   newDeviceMap[deviceName] = deviceList
   continue
  }
  newDeviceList := make([]device.DeviceInfo, 0, len(deviceList))
  for _, d := range deviceList {
   if disabled := disableGPUUUIDMap[d.ID]; disabled {
    klog.V(5).Infof("Disable gpu uuid is : %s", d.ID)
    continue
   }
   newDeviceList = append(newDeviceList, d)
  }
  newDeviceMap[deviceName] = newDeviceList
 }

Comment on lines +426 to +428
			if got := rmDeviceByNodeAnnotation(tt.args.nodeInfo); !reflect.DeepEqual(got, tt.want) {
				t.Errorf("rmDeviceByNodeAnnotation() = %v, want %v", got, tt.want)
			}

medium

For consistency with other tests in this file (e.g., Test_addNode_ListNodes), it's better to use assert.DeepEqual for comparing the expected and actual results. This improves the maintainability and uniformity of the test suite.

   got := rmDeviceByNodeAnnotation(tt.args.nodeInfo)
   assert.DeepEqual(t, tt.want, got)
