Ensure we can always terminate the parent process on error #4355
lifubang wants to merge 1 commit into opencontainers:main
Force-pushed from 77069ba to df3a94f.
    if process.Init {
    	if err := ignoreTerminateErrors(process.ops.terminate()); err != nil {
    		logrus.WithError(err).Warn("unable to terminate initProcess")
    	}
    	if err := c.cgroupManager.Destroy(); err != nil {
    		logrus.WithError(err).Warn("unable to destroy cgroupManager")
    	}
    	if c.intelRdtManager != nil {
    		if err := c.intelRdtManager.Destroy(); err != nil {
    			logrus.WithError(err).Warn("unable to destroy intelRdtManager")
    		}
    	}
With the info in the PR and this GitHub diff, it's not obvious why this makes sense. Can you please elaborate on why we previously skipped these calls in many places and now make them everywhere? For example, the cgroup manager's Destroy wasn't called in all code paths before.
Was it missing? Or was it not needed, but it's idempotent, so calling it is fine?
I think it is OK whether these destroy methods are called or not.
But we should not have different outcomes within one function without a specific reason, for example in container.Run().
Also, I think we should not have destroyed them before, because runc doesn't delete a failed container automatically; users have to run 'runc delete' to destroy a failed container created by 'runc create' or 'runc run'. How about removing these destroy calls here? I don't think it would cause compatibility problems. WDYT?
Sorry, not sure I followed. Why do you want to remove these methods?
Also, in what state is the container left? The delete operation must only work for stopped containers: https://github.com/opencontainers/runtime-spec/blob/main/runtime.md#delete
> Sorry, not sure I followed. Why do you want to remove these methods?

Yes, these destroy methods can't be removed, because we have to destroy the cgroup and intelRdt managers manually if we haven't saved the container's state yet.
    }

    // terminate is to kill the container's init/exec process when got failure.
    func (c *Container) terminate(process *Process) {
Is it simple to add tests, maybe for this function? We could fake the process interface, make it return an error, and check that everything that needs to happen does indeed happen.
This might not add a lot of value now, but it will if we refactor this code in the future.
Force-pushed from 8dcd42b to 5a06eb6.
I think this really is a bug, although we will hit it only with a very tiny probability. Do we want this PR in the next 1.2.0 release candidate?
Force-pushed from 5a06eb6 to 150c32f.
libcontainer/container_linux.go
Outdated
    -func (c *Container) Start(process *Process) error {
    +func (c *Container) Start(process *Process) (retErr error) {
     	c.m.Lock()
     	defer c.m.Unlock()
    +	defer func() {
    +		if retErr != nil {
    +			c.terminate(process)
    +		}
    +	}()
To me, this looks overcomplicated, given that this function only calls c.start.
An alternative would be:

    func (c *Container) Start(process *Process) error {
    	c.m.Lock()
    	defer c.m.Unlock()
    	if err := c.start(process); err != nil {
    		c.terminate(process)
    		return err
    	}
    	return nil
    }
libcontainer/container_linux.go
Outdated
    			c.terminate(process)
    		}
    	}()
Similar to 150c32f#r1866835966, there's no need for a defer here. Something like this would work:

    if !process.Init {
    	return nil
    }
    err := c.exec()
    if err != nil {
    	c.terminate(process)
    }
    return err

As we all know, we should terminate the parent process if there is an error when starting the container process, but the terminate function is called in many places, for example in `initProcess`, `setnsProcess`, and `Container`. If we forget the terminate call for some error, the container is left in an unknown state, so we should instead call it in a few final places.

Signed-off-by: lifubang <lifubang@acmcoder.com>
Force-pushed from 150c32f to 8ff0b71.
As we all know, we should terminate the parent process if there is an error when starting the container process,
but the terminate function is called in many places, for example in
`initProcess`, `setnsProcess`, and `Container`. If we forget the terminate call for some error, the container is left in an unknown state, so we should instead
call it in a few final places.
One possible place where the terminate action is missing:
https://github.com/opencontainers/runc/blob/v1.2.0-rc.2/libcontainer/container_linux.go#L357-L360