Try to join the cgroup of the init process of the parent container when apply_cgroup for a tenant container fails due to a "Device or resource busy" error by logica0419 · Pull Request #3347 · youki-dev/youki

logica0419 · 2026-01-03T14:03:30Z

Description

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactoring (no functional changes)
Performance improvement
Test updates
CI/CD related changes
Other (please describe):

Testing

Added new unit tests
Added new integration tests
Ran existing test suite
Tested manually (please provide steps)
- Followed the Steps to Reproduce of [Bug]: Docker(moby) + youki cannot launch Dev Container with DinD #3342 and got the expected result

Related Issues

Fixes #3342

Additional Context

tommady · 2026-01-03T21:18:14Z

Thanks for opening this PR. I ran into the same issue while working on #3210 and also needed a retry to re-join the cgroup for exec.

This problem is not about the init process, but about exec processes under cgroup v2 when domain controllers are enabled. Once a controller is turned on, the container’s configured cgroup may no longer be joinable (kernel returns EBUSY / EPERM), and exec is expected to fall back to joining the init process’s cgroup.

This behavior is explicitly documented by runc:

Note for cgroup v2: in case the process can’t join the top level cgroup, runc exec fallback is to try joining the cgroup of container’s init.
https://github.com/opencontainers/runc/blob/main/man/runc-exec.8.md
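
As a rough sketch of what that fallback involves on cgroup v2: read the init process's `/proc/<pid>/cgroup`, take the path from its `0::` line, and target the `cgroup.procs` file of that cgroup under the cgroup mount. The function name `init_cgroup_procs` is hypothetical and this is not youki's or runc's actual code; it only illustrates the path resolution.

```rust
use std::path::{Path, PathBuf};

/// Given the contents of /proc/<init_pid>/cgroup and the cgroup v2 mount
/// point, return the cgroup.procs file of the cgroup that init lives in.
/// Hypothetical helper for illustration only.
pub fn init_cgroup_procs(proc_cgroup: &str, cgroup_root: &Path) -> Option<PathBuf> {
    // On cgroup v2 the unified hierarchy appears as a single "0::<path>" line.
    let rel = proc_cgroup.lines().find_map(|line| line.strip_prefix("0::"))?;
    // Drop the leading '/' so that join() appends instead of replacing the root.
    Some(cgroup_root.join(rel.trim_start_matches('/')).join("cgroup.procs"))
}
```

Writing the exec'd process's PID into the returned file is what moves it into init's cgroup; the mount point (commonly `/sys/fs/cgroup`) is an assumption the caller has to supply.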

Importantly, this fallback is exec-only:

- init process cgroup placement must still fail hard
- only exec processes may retry using the init process’s leaf cgroup

Because this is policy, not cgroup mechanism, runc implements it in the container execution path, not inside the cgroup manager itself. This avoids:

- accidentally applying fallback to init
- duplicating logic across systemd vs cgroupfs managers
- diverging behavior depending on the cgroup backend

For youki, the correct place to implement, I think, is here:

// crates/libcontainer/src/process/container_intermediate_process.rs
fn apply_cgroups<
    C: CgroupManager<Error = E> + ?Sized,
    E: std::error::Error + Send + Sync + 'static,
>(
    cmanager: &C,
    resources: Option<&LinuxResources>,
    init: bool,
) -> Result<()> { ... }

where we know:

- whether the process is init or exec
- the init PID
- and can enforce exec-only fallback semantics

Handling this inside libcgroups (or only for systemd) is insufficient and environment-dependent. The expected behavior should be:

- init process: no fallback, fail on cgroup join error
- exec process + cgroup v2 + EBUSY/EPERM: retry by joining init’s cgroup
- all other errors: fail as before
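
The three rules above could be sketched as a small decision function. This is only an illustration: `JoinAction` and `decide_join` are hypothetical names, the error type is assumed to be `std::io::Error`, and the errno values are hard-coded Linux constants that real code would take from the `libc` or `nix` crate.

```rust
// Linux errno values, hard-coded so the sketch has no dependencies (assumption).
const EPERM: i32 = 1;
const EBUSY: i32 = 16;

/// What the caller should do after trying to join the configured cgroup.
#[derive(Debug, PartialEq, Eq)]
pub enum JoinAction {
    /// The join succeeded; nothing else to do.
    Done,
    /// Exec-only fallback: retry by joining the init process's cgroup.
    RetryInInitCgroup,
    /// Propagate the error as before.
    Fail,
}

/// Hypothetical policy function mirroring the list above.
pub fn decide_join(is_init: bool, cgroup_v2: bool, result: Result<(), std::io::Error>) -> JoinAction {
    match result {
        Ok(()) => JoinAction::Done,
        Err(err) => {
            let busy_or_perm = matches!(err.raw_os_error(), Some(EBUSY) | Some(EPERM));
            if !is_init && cgroup_v2 && busy_or_perm {
                JoinAction::RetryInInitCgroup
            } else {
                JoinAction::Fail
            }
        }
    }
}
```

Keeping the decision separate from the cgroup manager means the same policy applies to both the systemd and cgroupfs backends.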

Without implementing this retry at the libcontainer level (as runc does), exec under cgroup v2 with domain controllers enabled will continue to fail for cgroupfs users.

WDYT? Thanks again.

utam0k · 2026-01-03T22:16:10Z

crates/libcgroups/src/systemd/manager.rs

+    /// The init process PID of the parent container if the container is created as a tenant.
+    parent_init_pid: Option<Pid>,


ContainerType should have parent_init_pid.
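
One possible shape for that suggestion, as a sketch only: the variant layout below is an assumption, it omits any other fields the real `ContainerType` carries, and `Pid` is a plain `i32` stand-in for `nix::unistd::Pid` so the example has no dependencies.

```rust
/// Stand-in for nix::unistd::Pid (assumption, to keep the sketch dependency-free).
pub type Pid = i32;

/// Hypothetical, simplified version of ContainerType.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContainerType {
    InitContainer,
    /// A tenant knows the init PID of the container it joins.
    TenantContainer { parent_init_pid: Pid },
}

impl ContainerType {
    /// Returns the parent's init PID for tenants, None for init containers.
    pub fn parent_init_pid(&self) -> Option<Pid> {
        match self {
            ContainerType::InitContainer => None,
            ContainerType::TenantContainer { parent_init_pid } => Some(*parent_init_pid),
        }
    }
}
```

Carrying the PID on the tenant variant makes it impossible to set it for an init container by mistake.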

utam0k · 2026-01-03T22:25:20Z

crates/libcgroups/src/systemd/manager.rs

+                Err(e) => {
+                    // If adding the process to the cgroup fails due to a "Device or resource busy" error,
+                    // manager tries to join the cgroup of the init process of the tenant container.
+                    if e.to_string().contains("Device or resource busy")


How about getting the error (EBUSY) from the D-Bus client instead of parsing the error message?

I really wanted to, but I couldn't achieve that just by putting the following code here.

impl From<nix::Error> for SystemdClientError {
    fn from(err: nix::Error) -> SystemdClientError {
        match err {
            nix::Error::EBUSY => DbusError::DeviceOrResourceBusy(err.to_string()).into(),
            _ => DbusError::ConnectionError(err.to_string()).into(),
        }
    }
}

It seems that socket::sendmsg in dbus_native::DbusConnection::send_message() doesn't return nix::Error::EBUSY. Rather, the Result carries no error and the failure only shows up as an error message.

Could you give me some advice on what I should do here?
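
For comparison, here is a minimal sketch of classifying the failure by errno into a typed variant instead of matching on the message text. It assumes the errno is still available where the error is constructed, which, per the observation above, may not hold on the D-Bus path. `AddProcessError` is a hypothetical type, `std::io::Error` stands in for the real error, and the errno value is a hard-coded Linux constant.

```rust
use std::io;

// Linux errno for "Device or resource busy" (hard-coded assumption).
const EBUSY: i32 = 16;

/// Hypothetical error type that records EBUSY as its own variant.
#[derive(Debug)]
pub enum AddProcessError {
    Busy(io::Error),
    Other(io::Error),
}

impl From<io::Error> for AddProcessError {
    fn from(err: io::Error) -> Self {
        // Classify once, at conversion time, using the raw errno.
        match err.raw_os_error() {
            Some(EBUSY) => AddProcessError::Busy(err),
            _ => AddProcessError::Other(err),
        }
    }
}

impl AddProcessError {
    /// Callers test the variant rather than searching the message string.
    pub fn is_busy(&self) -> bool {
        matches!(self, AddProcessError::Busy(_))
    }
}
```

If the failure only ever arrives as a D-Bus error reply, the same idea would need to key off the reply's error name rather than an errno.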

To clarify, the goal of this PR’s initial implementation is to serve as a conceptual demo, providing a starting point for discussing a more suitable implementation.

Since you mentioned it, I'll stop focusing on the detailed code of this PR for now. I'll review the detailed code once we've clarified and implemented the non-demo aspects.

utam0k · 2026-01-03T22:31:29Z

crates/libcgroups/src/common.rs

+// is empty string ("") and the value is the cgroup path the <pid> is in.
+//
+// ref: https://github.com/opencontainers/cgroups/blob/main/utils.go#L171-L219
+pub fn parse_proc_cgroup_file(path: &str) -> Result<HashMap<String, String>, ParseProcCgroupError> {


Could we use the procfs crate? Be careful: if it reads inside the container, please use ProcfsHandle for safety.
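
For reference, the parsing being discussed can be sketched with only the standard library. This is not the PR's code and does not use the procfs crate suggested above; `parse_proc_cgroup_contents` and `ParseError` are hypothetical names, and the function takes the file contents rather than a path so that it stays independent of how the file is opened.

```rust
use std::collections::HashMap;

/// Hypothetical error carrying the line that could not be parsed.
#[derive(Debug, PartialEq, Eq)]
pub struct ParseError {
    pub line: String,
}

/// Parse the contents of /proc/<pid>/cgroup into controller -> cgroup path.
/// Each line has the form "hierarchy-ID:controller-list:cgroup-path".
/// A cgroup v2 line ("0::/path") has an empty controller list, so its key is "".
pub fn parse_proc_cgroup_contents(contents: &str) -> Result<HashMap<String, String>, ParseError> {
    let mut map = HashMap::new();
    for line in contents.lines().filter(|l| !l.is_empty()) {
        // Split into at most three fields; the path itself may contain ':'.
        let mut parts = line.splitn(3, ':');
        let (controllers, path) = match (parts.next(), parts.next(), parts.next()) {
            (Some(_id), Some(controllers), Some(path)) => (controllers, path),
            _ => return Err(ParseError { line: line.to_string() }),
        };
        // cgroup v1 may list several comma-separated controllers on one line.
        for controller in controllers.split(',') {
            map.insert(controller.to_string(), path.to_string());
        }
    }
    Ok(map)
}
```

Looking up the `""` key then yields the cgroup v2 path of the process.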

logica0419 · 2026-01-06T08:33:47Z

@utam0k @tommady
Thanks for the quick feedback! I didn’t expect comments to come in so fast 😅
I was planning to write the explanation today (I was pretty exhausted last night), so this was a nice surprise.

To clarify, the goal of this PR’s initial implementation is to serve as a conceptual demo, providing a starting point for discussing a more suitable implementation.

I’m still not very familiar with youki, or even Rust itself.
Please feel free to point out any issues, including basic ones or anything related to “Rust-ish” coding style.

Also, this may be out of context, but I want to clarify the wording here. I made a table of the wording I imagine.

| Perspective | Container A | Container B | A's init process | B's init process |
| --- | --- | --- | --- | --- |
| Container A | self (InitContainer) | child | init_process | child_init_process |
| Container B | parent | self (TenantContainer) | parent_init_process | init_process |
| tommady's comment | - | - | init process | exec process |
| runc | initProcess (containerProcess) | setnsProcess (containerProcess) | linuxStandardInit | linuxSetnsInit |

What confused me here is that the word init process used in Container B's context can mean Container A's init process or B's init process. That's why I used the name parent_init_process for Container A's init process in the implementation.

FYI: in runc, Container A's init process is called initProcessPid even in the context of Container B.
https://github.com/opencontainers/runc/blob/main/libcontainer/process_linux.go#L175

WDYT about this? Should I use the name init process as runc does?

logica0419 · 2026-01-06T08:42:23Z

@tommady
Thank you too for finding this PR! I'm happy that I can help you solve the issue.
And, thanks again for the precise explanation of what's happening. I managed to get an abstract understanding, but your explanation helped me strengthen it so much.

For youki, the correct place to implement, I think, is here:

I strongly agree with that. I'll re-implement the logic there.
Thank you so much for the advice.

logica0419 · 2026-01-06T08:48:48Z

Just to clarify, since I've forgotten to put sign-offs on the previous commits and I've pushed a complete re-implementation now, I force-pushed the branch.

utam0k · 2026-01-06T10:35:36Z

Please set it to "ready for review" once the discussion has settled and you are ready for a review of the detailed code.

tommady · 2026-01-06T11:04:24Z

Thanks for the table — that actually helped me realize part of the confusion is on my side too 😅 I think I’ve been a bit sloppy with naming.

Referring to your table, when I said “init process” I meant Container B’s init process (the TenantContainer being exec’d into), not Container A’s init. In your terms, this is the exec case for Container B: if joining the configured cgroup fails under cgroup v2, exec should fall back to B’s init process cgroup, not the parent’s.

Sorry about the naming confusion 🤪 that’s on me. I’d really appreciate hearing others’ opinions on whether to use runc-style naming.

utam0k · 2026-01-06T11:09:34Z

This isn't a separate “Container B”; it's an exec/tenant process joining the existing container's cgroup. So calling it “parent” is confusing. How about landlord_init_pid (landlord = parent init in your context)?

tommady added the kind/bug label Jan 3, 2026

utam0k requested a review from tommady January 3, 2026 21:43

utam0k requested changes Jan 3, 2026

logica0419 mentioned this pull request Jan 4, 2026

[Bug]: Docker(moby) + youki cannot launch Dev Container with DinD #3342

Open

logica0419 added 2 commits January 6, 2026 17:43

add parent_init_pid field in ContainerType

21d48d5

Signed-off-by: Takuto Nagami <logica0419@gmail.com>

try to join the cgroup of the init process of the parent container wh…

e1e62b0

…en add_process_to_unit fails Signed-off-by: Takuto Nagami <logica0419@gmail.com>

logica0419 force-pushed the retyry-systemd-cgroup-EBUSY branch from 2b24f7f to e1e62b0 Compare January 6, 2026 08:44

fix lint

eb63bdb

Signed-off-by: Takuto Nagami <logica0419@gmail.com>

utam0k marked this pull request as draft January 6, 2026 10:34

tommady mentioned this pull request Jan 6, 2026

fix(3207, 3209) Difference between the exec command in runc and youki #3210

Open

13 tasks

logica0419 wants to merge 3 commits into youki-dev:main from logica0419:retyry-systemd-cgroup-EBUSY
