
fix: daemon: state of stopped container visible to other queries when container is stopped #50136


Merged: 1 commit into moby:master on Jun 13, 2025

Conversation

Contributor

@mabrarov mabrarov commented Jun 3, 2025

daemon: ensure the state of a stopped container is visible to other queries when the container is stopped using the /containers/{id}/stop API, before the API response is sent back to the client. Fixes #50133.

- What I did

Made the state of a stopped container visible to other queries when the container is stopped using the /containers/{id}/stop Docker Engine API, before the response is sent back to the API client.

- How I did it

Wait until the state of the stopped container is committed to the daemon database (only when the container is stopped successfully), i.e. until the container-stop handler of the container monitor completes, before replying to the client of the /containers/{id}/stop Docker Engine API.

- How to verify it

Refer to "Reproduce" section of #50133.

- Human readable description for the release notes

Ensure that the state of the container in the daemon database (used by the [/containers/json](https://docs.docker.com/reference/api/engine/version/v1.49/#tag/Container/operation/ContainerList) API) is up to date when the container is stopped using the [/containers/{id}/stop](https://docs.docker.com/reference/api/engine/version/v1.49/#tag/Container/operation/ContainerStop) API (before the API response is returned).
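
For illustration, a minimal client-side sketch of the behaviour this change guarantees, written against the Docker Go SDK (an assumption-laden example, not part of the PR: it assumes a recent docker/docker client with `container.StopOptions`/`container.ListOptions`, and reuses the `my-nginx` container name from the reproduction):

```go
package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/api/types/filters"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// POST /containers/{id}/stop
	if err := cli.ContainerStop(ctx, "my-nginx", container.StopOptions{}); err != nil {
		panic(err)
	}

	// GET /containers/json (backed by the daemon's in-memory replica).
	// With this fix the container is reported as "exited" right away;
	// before the fix it could still be listed as "running".
	list, err := cli.ContainerList(ctx, container.ListOptions{
		All:     true,
		Filters: filters.NewArgs(filters.Arg("name", "^my-nginx$")),
	})
	if err != nil {
		panic(err)
	}
	for _, c := range list {
		fmt.Println(c.Names, c.State, c.Status)
	}
}
```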

daemon/stop.go Outdated
Comment on lines 99 to 101
if err := ctr.CommitInMemory(daemon.containersReplica); err != nil {
	log.G(ctx).WithError(err).Warn("could not make state of container visible to other queries")
}
Member

The CommitInMemory function was added specifically for health-checks, as the healthcheck status is ephemeral and doesn't have to be persisted to disk.

The other function to persist container state, both to disk and in memory, is container.CheckpointTo:

moby/container/container.go

Lines 223 to 236 in ee69d8a

// CheckpointTo makes the Container's current state visible to queries, and persists state.
// Callers must hold a Container lock.
func (container *Container) CheckpointTo(ctx context.Context, store *ViewDB) error {
	ctx, span := otel.Tracer("").Start(ctx, "container.CheckpointTo", trace.WithAttributes(
		attribute.String("container.ID", container.ID),
		attribute.String("container.Name", container.Name)))
	defer span.End()
	deepCopy, err := container.toDisk()
	if err != nil {
		return err
	}
	return store.Save(deepCopy)
}

I see that's called in various places, such as on container start, pause/unpause, update, and some operations such as connecting and disconnecting from networks (for pause and unpause, I'm actually wondering if that's state to be persisted on disk, but that's a separate discussion).

For container stop, it looks like the handling is a bit more involved; container.CheckpointTo is called in daemon.handleContainerExit (which is called in daemon.killWithSignal and daemon.ProcessEvent), and in daemon.ProcessEvent itself:

func (daemon *Daemon) handleContainerExit(c *container.Container, e *libcontainerdtypes.EventInfo) error {

Within daemon.ProcessEvent, that code is handled conditionally, based on whether the event we received from containerd was for the container's main process (not for exec processes inside the container);

moby/daemon/monitor.go

Lines 190 to 193 in ee69d8a

case libcontainerdtypes.EventExit:
	if ei.ProcessID == ei.ContainerID {
		return daemon.handleContainerExit(c, &ei)
	}

I wonder (but would have to look at traces, events, or logs) if there's anything in the events processing that could explain why none of those paths were called by the time the function returns, or if there's possibly some concurrency where multiple calls to container.CheckpointTo are racing (possibly overwriting state with the wrong state?).

Contributor Author

@mabrarov mabrarov Jun 3, 2025

Yes, the current state of this PR is quick & dirty.

I'm curious whether there is a chance that daemon.Daemon.ContainerStop completes (and the respective API endpoint replies, so that the subsequent request is sent by the client) before the call to (before the return from) container.CheckpointTo in daemon.Daemon.handleContainerExit - refer to

c.SetStopped(&ctrExitStatus)

and

checkpointErr := c.CheckpointTo(context.TODO(), daemon.containersReplica)

where the first call includes:

s.notifyAndClear(&s.stopWaiters)

which should trigger:

if status := <-ctr.Wait(subCtx, container.WaitConditionNotRunning); status.Err() == nil {

The lock acquired at daemon/stop.go:97 just helps to enforce ordering between daemon.Daemon.handleContainerExit (which holds a lock on the same mutex from the beginning of the method) and daemon.Daemon.containerStop.
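
To make the suspected ordering concrete, here is a small self-contained sketch of the pattern (illustrative only, not moby code; the mutex, channel, and flag stand in for the container lock, stopWaiters/SetStopped, and the CheckpointTo update of containersReplica):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

type ctrState struct {
	mu        sync.Mutex
	stopped   chan struct{} // closed by "SetStopped" (like notifyAndClear(&s.stopWaiters))
	committed atomic.Bool   // stands in for the containersReplica view
}

// handleExit mimics daemon.handleContainerExit: it holds the container lock,
// wakes waiters first, and only later commits the new state to the replica.
func (s *ctrState) handleExit() {
	s.mu.Lock()
	defer s.mu.Unlock()
	close(s.stopped)                  // waiters on Wait() wake up here...
	time.Sleep(10 * time.Millisecond) // ...while more work happens...
	s.committed.Store(true)           // ...and only now is the replica updated ("CheckpointTo")
}

func main() {
	s := &ctrState{stopped: make(chan struct{})}
	go s.handleExit()

	<-s.stopped // like <-ctr.Wait(subCtx, container.WaitConditionNotRunning)
	fmt.Println("replica updated right after Wait:", s.committed.Load()) // typically false

	// The ordering this PR enforces: take the same lock the exit handler
	// holds; Lock() cannot succeed until the commit has happened.
	s.mu.Lock()
	s.mu.Unlock()
	fmt.Println("replica updated after Lock/Unlock:", s.committed.Load()) // true
}
```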

Contributor Author

@mabrarov mabrarov Jun 3, 2025

... I can remove daemon/stop.go:99-101 and check if it still works, to test my hypothesis (note that a fix for a race condition cannot be 100% proven that way 😄).

Contributor Author

@mabrarov mabrarov Jun 3, 2025

I removed CommitInMemory from this PR because it seems not to be needed (refer to the previous comments).

Here are the results of the test (the "Reproduce" section from the description of #50133) on the master branch.

Debugger console output (details about breakpoints where these messages are logged into debugger console are provided at the end of this comment):

Before Container.SetStop in Daemon.handleContainerExit
After container stopped in Daemon.containerStop
Start of containerRouter.getContainersJSON
After Daemon.containersReplica in Daemon.Containers
After Container.CheckpointTo in Daemon.handleContainerExit
Start of containerRouter.getContainersJSON
After Daemon.containersReplica in Daemon.Containers

Test output (test failed):

*   Trying /var/run/docker.sock:0...
* Connected to localhost (/var/run/docker.sock) port 80 (#0)
> POST /containers/3bd6f9fa938bbac06b546d6a7e8513736a0a1be66ef7a481d71d24f9a5f9075a/stop HTTP/1.1
> Host: localhost
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 204 No Content
< Api-Version: 1.50
< Docker-Experimental: false
< Ostype: linux
< Server: Docker/dev (linux)
< Date: Tue, 03 Jun 2025 16:58:27 GMT
< 
* Connection #0 to host localhost left intact
* Found bundle for host: 0x5636e4b76440 [serially]
* Can not multiplex, even if we wanted to
* Re-using existing connection #0 with host localhost
> GET /containers/json?all=1&filters=%7B%22name%22%3A%7B%22%5Emy-nginx%24%22%3Atrue%7D%7D HTTP/1.1
> Host: localhost
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Api-Version: 1.50
< Content-Type: application/json
< Docker-Experimental: false
< Ostype: linux
< Server: Docker/dev (linux)
< Date: Tue, 03 Jun 2025 16:58:27 GMT
< Content-Length: 954
< 
[{"Id":"3bd6f9fa938bbac06b546d6a7e8513736a0a1be66ef7a481d71d24f9a5f9075a","Names":["/my-nginx"],"Image":"nginx:alpine","ImageID":"sha256:6769dc3a703c719c1d2756bda113659be28ae16cf0da58dd5fd823d6b9a050ea","Command":"/docker-entrypoint.sh nginx -g 'daemon off;'","Created":1748969897,"Ports":[{"PrivatePort":80,"Type":"tcp"}],"Labels":{"maintainer":"NGINX Docker Maintainers <[email protected]>"},"State":"running","Status":"Up 10 seconds","HostConfig":{"NetworkMode":"bridge"},"NetworkSettings":{"Networks":{"bridge":{"IPAMConfig":null,"Links":null,"Aliases":null,"MacAddress":"72:d0:71:04:d1:e9","DriverOpts":null,"GwPriority":0,"NetworkID":"b563c5c9b89c462cf82a852a7075743a80e491b15d5c6fc270cd73cd9ef97245","EndpointID":"391be0e2c6ef0fdaaa60af41cfb13937f66c8700c7d10e19313162b1b9639864","Gateway":"172.18.0.1","IPAddress":"172.18.0.2","IPPrefixLen":16,"IPv6Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"DNSNames":null}}},"Mounts":[]}]
* Connection #0 to host localhost left intact
CONTAINER ID   IMAGE          COMMAND                  CREATED          STATUS                              PORTS     NAMES
3bd6f9fa938b   nginx:alpine   "/docker-entrypoint.…"   10 seconds ago   Exited (0) Less than a second ago             my-nginx
3bd6f9fa938bbac06b546d6a7e8513736a0a1be66ef7a481d71d24f9a5f9075a

Here are the results of the same test (the "Reproduce" section from the description of #50133) on the source branch of this pull request.

Debugger console output (details about breakpoints where these messages are logged into debugger console are provided at the end of this comment):

Before Container.SetStop in Daemon.handleContainerExit
After Container.CheckpointTo in Daemon.handleContainerExit
After container stopped in Daemon.containerStop
Start of containerRouter.getContainersJSON
After Daemon.containersReplica in Daemon.Containers
Start of containerRouter.getContainersJSON
After Daemon.containersReplica in Daemon.Containers

Test output (test succeeded):

*   Trying /var/run/docker.sock:0...
* Connected to localhost (/var/run/docker.sock) port 80 (#0)
> POST /containers/d18b4b0aff9fe6b577a34050492782f41267ad1d36cf78e96ba7b197842d5968/stop HTTP/1.1
> Host: localhost
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 204 No Content
< Api-Version: 1.50
< Docker-Experimental: false
< Ostype: linux
< Server: Docker/dev (linux)
< Date: Tue, 03 Jun 2025 17:01:52 GMT
< 
* Connection #0 to host localhost left intact
* Found bundle for host: 0x5621f5ac4440 [serially]
* Can not multiplex, even if we wanted to
* Re-using existing connection #0 with host localhost
> GET /containers/json?all=1&filters=%7B%22name%22%3A%7B%22%5Emy-nginx%24%22%3Atrue%7D%7D HTTP/1.1
> Host: localhost
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Api-Version: 1.50
< Content-Type: application/json
< Docker-Experimental: false
< Ostype: linux
< Server: Docker/dev (linux)
< Date: Tue, 03 Jun 2025 17:01:52 GMT
< Content-Length: 840
< 
[{"Id":"d18b4b0aff9fe6b577a34050492782f41267ad1d36cf78e96ba7b197842d5968","Names":["/my-nginx"],"Image":"nginx:alpine","ImageID":"sha256:6769dc3a703c719c1d2756bda113659be28ae16cf0da58dd5fd823d6b9a050ea","Command":"/docker-entrypoint.sh nginx -g 'daemon off;'","Created":1748970102,"Ports":[],"Labels":{"maintainer":"NGINX Docker Maintainers <[email protected]>"},"State":"exited","Status":"Exited (0) Less than a second ago","HostConfig":{"NetworkMode":"bridge"},"NetworkSettings":{"Networks":{"bridge":{"IPAMConfig":null,"Links":null,"Aliases":null,"MacAddress":"","DriverOpts":null,"GwPriority":0,"NetworkID":"e6072ab17d3e68d3b47d9c6935a69b0133e47c9fb61f2864646c899de1ba29d8","EndpointID":"","Gateway":"","IPAddress":"","IPPrefixLen":0,"IPv6Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"DNSNames":null}}},"Mounts":[]}]
* Connection #0 to host localhost left intact
CONTAINER ID   IMAGE          COMMAND                  CREATED          STATUS                              PORTS     NAMES
d18b4b0aff9f   nginx:alpine   "/docker-entrypoint.…"   10 seconds ago   Exited (0) Less than a second ago             my-nginx
d18b4b0aff9fe6b577a34050492782f41267ad1d36cf78e96ba7b197842d5968

Breakpoints (non-suspending; they only write a message to the debugger console):

| Tested branch | Location of breakpoint | Message written to debugger console |
|---|---|---|
| master | daemon/monitor.go:120 | Before Container.SetStop in Daemon.handleContainerExit |
| master | daemon/monitor.go:130 | After Container.CheckpointTo in Daemon.handleContainerExit |
| master | daemon/stop.go:97 | After container stopped in Daemon.containerStop |
| master | api/server/router/container/container_routes.go:77 | Start of containerRouter.getContainersJSON |
| master | daemon/list.go:111 | After Daemon.containersReplica in Daemon.Containers |
| 50133-container_stop_state_sync | daemon/monitor.go:120 | Before Container.SetStop in Daemon.handleContainerExit |
| 50133-container_stop_state_sync | daemon/monitor.go:130 | After Container.CheckpointTo in Daemon.handleContainerExit |
| 50133-container_stop_state_sync | daemon/stop.go:100 | After container stopped in Daemon.containerStop |
| 50133-container_stop_state_sync | api/server/router/container/container_routes.go:77 | Start of containerRouter.getContainersJSON |
| 50133-container_stop_state_sync | daemon/list.go:111 | After Daemon.containersReplica in Daemon.Containers |

I understand that this kind of test is far from the best one, but it is the one I could do quickly.

Member

Yes, feel free to use this PR for experimentation and to push commits for testing.

I know there have historically been quite a few changes around these things; unfortunately some handling has become more complex than desirable (not surprising with 10+ years of development). Possibly some bits should be rewritten from scratch, but many corner cases may have found their way into the existing code, so changing it is risky.

Some of the refactoring resulted in the "in memory" store, which was added to prevent some gnarly situations with locking (which could cause, e.g., "docker ps" to get blocked by "docker info" calls), so we need to take care with such cases.

Member

(marked as unresolved to keep the discussion visible)

@mabrarov mabrarov force-pushed the 50133-container_stop_state_sync branch from bdc5cac to b71bac4 on June 3, 2025 17:08
@mabrarov mabrarov requested a review from thaJeztah June 3, 2025 17:25
@mabrarov mabrarov force-pushed the 50133-container_stop_state_sync branch 2 times, most recently from 7c6fcac to e7ba4cb on June 4, 2025 10:41
@mabrarov mabrarov marked this pull request as ready for review June 4, 2025 11:16
@mabrarov mabrarov force-pushed the 50133-container_stop_state_sync branch from e7ba4cb to 3d44846 on June 4, 2025 23:05
Contributor Author

@mabrarov mabrarov commented Jun 7, 2025

Hi @thaJeztah,

Could you please take a look at this PR? Feel free to check #50136 (comment) for details about the root cause of the issue which this PR fixes (#50133).

I'm not sure why some CI tests fail. If possible, I would appreciate it if someone could trigger them one more time. In my local environment (I use the development container), some unit and integration tests fail even for the master branch (and the results differ from run to run), so I cannot rely on them for regression testing.

Thank you.

@mabrarov mabrarov force-pushed the 50133-container_stop_state_sync branch from 3d44846 to ed75954 on June 7, 2025 15:53
Comment on lines 95 to +98
if status := <-ctr.Wait(subCtx, containertypes.WaitConditionNotRunning); status.Err() == nil {
	// Ensure container status changes are committed by handler of container exit before returning control to the caller
	ctr.Lock()
	defer ctr.Unlock()
Member

Just a quick blurb - I started to dig into this PR again to check code paths, but didn't finish, so let me jot down some thoughts I had while looking:

I started to wonder if there's a bug in ctr.Wait; or at least it looks odd that we have a function returning a channel (likely with the purpose of handling synchronisation), but with that contract being "broken" (i.e., it returns before it's fully done).

Perhaps it's "ok" to use the ctr.Lock() -> Unlock to work around that issue as a short-term solution; I'm mostly wondering if we're still papering over the underlying issue, and whether this is something that would ideally be fixed in Wait (or elsewhere).

Contributor Author

@mabrarov mabrarov Jun 11, 2025

Hi @thaJeztah,

I was thinking the same, but I didn't find a better solution for container.State.Wait.

Ideally, all code which wants to access (e.g. read) the state of a container should use ctr.Lock() + ctr.Unlock(). It's a pretty clear requirement which is already met (even before this PR). The daemon.Daemon.containerStop method (in its current implementation) doesn't need to access the state of the container directly - all it needs is to wait for the container to be stopped, and that requirement was in place even before this PR (please note that with this PR we make the /containers/{id}/stop Docker Engine API slower by introducing additional synchronization - that's the reason I'm worried about a regression).

IMHO, the issue is not in container.State.Wait, but in the daemon.Daemon.Containers method and in the daemon.Daemon.containersReplica field (the "read-only view" used by daemon.Daemon.Containers). I understand the need for the daemon.Daemon.containersReplica field, but it looks like this read-only view introduced some cases which lack synchronization with the daemon.Daemon.containers field. #50133 is just one of these cases; others can exist in the /containers/{id}/kill and /containers/{id}/wait APIs (just replace /containers/{id}/stop with these APIs in #50133). Please take a look at testcontainers/testcontainers-go#3197, which uses the /containers/{id}/json API (which reads the state of the container from daemon.Daemon.containers and not from daemon.Daemon.containersReplica) - this workaround demonstrates that there is no issue (with container.State.Wait) when using the daemon.Daemon.containers field with ctr.Lock() + ctr.Unlock().
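
For reference, a minimal client-side sketch of that workaround (illustrative only: it assumes a recent docker/docker Go client, and `my-nginx` is a placeholder container name/ID):

```go
package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

// stopAndConfirm stops a container and then confirms the stop via
// GET /containers/{id}/json (ContainerInspect), which - as described above -
// reads the container state from daemon.Daemon.containers (under the container
// lock) rather than from daemon.Daemon.containersReplica, so it reflects the
// stop immediately.
func stopAndConfirm(ctx context.Context, cli *client.Client, id string) error {
	if err := cli.ContainerStop(ctx, id, container.StopOptions{}); err != nil {
		return err
	}
	inspect, err := cli.ContainerInspect(ctx, id)
	if err != nil {
		return err
	}
	if inspect.State != nil && inspect.State.Running {
		return fmt.Errorf("container %s is still reported as running", id)
	}
	return nil
}

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	if err := stopAndConfirm(context.Background(), cli, "my-nginx"); err != nil {
		panic(err)
	}
	fmt.Println("container confirmed stopped via inspect")
}
```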

Maybe we should accept that every Docker Engine API endpoint which uses the daemon.Daemon.containersReplica field can return outdated (delayed) container state, and highlight that in the API documentation (e.g. split read-only APIs from read-write APIs, then describe the synchronization between API calls). Otherwise, even without concurrent access to the Docker Engine API, we leave users of this API without an understanding of:

  1. Synchronization points (i.e. where slowness / lock can happen).
  2. Visibility of state changes.

An alternative solution (the way this PR goes, but only for the /containers/{id}/stop API endpoint) is to ensure that daemon.Daemon.containersReplica is updated before the completion of any API call which modifies the state of container(s).

Thank you.

…es when container is stopped and before API response is sent (fix for moby#50133).

Signed-off-by: Marat Abrarov <[email protected]>
@mabrarov mabrarov force-pushed the 50133-container_stop_state_sync branch from ed75954 to 8e6cd44 Compare June 11, 2025 15:08
Contributor

@vvoland vvoland left a comment

It's good to have a fix for that but... we should really take a closer look at that because this is horrifying 🙈 😅

Thanks for digging and investigating that!

@thaJeztah thaJeztah added this to the 28.3.0 milestone Jun 13, 2025
Member

@thaJeztah thaJeztah left a comment

LGTM, thanks!

We discussed this one in the maintainers call yesterday; it looks like the PR should be OK to bring in for now, but we should probably create a tracking ticket to dig deeper into the flow(s) and see where improvements / fixes can be applied.

@thaJeztah thaJeztah merged commit 5dc1ccd into moby:master Jun 13, 2025
288 of 306 checks passed
Comment on lines +128 to +130
// Ensure container status changes are committed by handler of container exit before returning control to the caller
ctr.Lock()
defer ctr.Unlock()
Member

Looks like this caused a regression; see docker/compose#12950 (comment)

The ctr.Lock() makes this function wait for the container to die, but because it produces the stop event in a defer, the stop event now gets produced AFTER the die event:

moby/daemon/stop.go

Lines 74 to 78 in 8e6cd44

defer func() {
	if retErr == nil {
		daemon.LogContainerEvent(ctr, events.ActionStop)
	}
}()

Member

Looks like the defer was introduced by me in 952902e - but that change doesn't look critical; mostly cleanup; we can probably inline it.

Development

Successfully merging this pull request may close these issues.

/containers/{id}/stop doesn't commit state of stopped container
3 participants