Description
We're seeing two bad behaviors. First, dockerd fails (crashes?) for an unknown reason right after it is first installed. Second, once dockerd has crashed it cannot restart, because the containerd task "dockerd" is still running.
Expected behavior
$ apt-get install docker-ce
(version 2:18.09.0ce0.4.tp4-0~debian installed)
$ docker ps -aq
(no output)
$ systemctl stop docker.service
(succeeds)
$ systemctl is-active docker.service
inactive
$ docker info
(fails)
$ systemctl start docker.service
$ systemctl is-active docker.service
active
Actual behavior
$ docker ps -aq
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
$ systemctl is-active docker.service
failed
$ docker info
(still works!)
$ systemctl stop docker.service
$ systemctl start docker.service
$ systemctl is-active docker.service
activating (not active: the dockerd process does not exist yet)
$ /usr/bin/dockerd
container dockerd already has a running process
$ ctr -n docker tasks list
TASK PID STATUS
dockerd NNNNN RUNNING
$ ctr -n docker tasks kill dockerd
$ ctr -n docker tasks list
TASK PID STATUS
dockerd NNNNN STOPPED
$ systemctl is-active docker.service
activating
$ ctr -n docker tasks delete dockerd
$ systemctl is-active docker.service
active (the daemon restarted successfully once the stale dockerd task was deleted from containerd)
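The manual recovery sequence above (kill the stale task, delete it, let docker.service start) can be scripted. This is a minimal sketch under stated assumptions: bash, root privileges, ctr on PATH, and a stale task named dockerd in containerd's docker namespace, as in the ctr output above. The function names are hypothetical, and the status parsing assumes ctr keeps the TASK/PID/STATUS column layout shown in this report.

```shell
# Hypothetical helpers automating the manual recovery steps above.
# Assumes bash, root privileges, and ctr on PATH.

# Print the STATUS column of the "dockerd" task in the "docker" namespace,
# or nothing if containerd does not list that task.
dockerd_task_status() {
  ctr -n docker tasks list | awk 'NR > 1 && $1 == "dockerd" { print $3 }'
}

clear_stale_dockerd_task() {
  local status
  status="$(dockerd_task_status)"
  if [ -n "$status" ]; then
    if [ "$status" = "RUNNING" ]; then
      # Signal the leftover process; in the report above this moved the
      # task from RUNNING to STOPPED.
      ctr -n docker tasks kill dockerd
      # Poll for up to ~10 seconds until the task is reported as STOPPED.
      for _ in 1 2 3 4 5 6 7 8 9 10; do
        [ "$(dockerd_task_status)" = "STOPPED" ] && break
        sleep 1
      done
    fi
    # Delete the task so that a fresh dockerd task can be created.
    ctr -n docker tasks delete dockerd
  fi
  # systemd may already be auto-restarting the unit; start it explicitly anyway.
  systemctl start docker.service
}
```

Usage, from a root shell after sourcing the file: clear_stale_dockerd_task. The polling loop is there because, in the sequence above, the task only became deletable after ctr reported it as STOPPED.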
Steps to reproduce the behavior
On a clean VM, install the latest docker-ce version.
Immediately try to use docker (in our case, docker ps).
The socket is unusable, so we attempt to restart the daemon.
We can also reproduce the problem manually by killing the dockerd daemon with SIGKILL:
kill -9 <PID of /usr/bin/dockerd>
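The SIGKILL reproduction can be wrapped in a small helper that also collects the two symptoms reported above. This is a sketch, assuming bash, root privileges, pidof and ctr on PATH, and an arbitrarily chosen five-second wait; the function name is hypothetical. Only run it on a disposable test VM.

```shell
# Hypothetical helper that reproduces the stuck-restart state described above.
# Run as root on a test VM only: it SIGKILLs the Docker daemon.
reproduce_stale_dockerd_task() {
  local pid
  # pidof prints the PID(s) of running dockerd processes and fails if none.
  if ! pid="$(pidof dockerd)"; then
    echo "dockerd is not running" >&2
    return 1
  fi
  # SIGKILL cannot be caught, so dockerd gets no chance to shut down cleanly.
  # $pid is left unquoted on purpose in case pidof returned several PIDs.
  kill -9 $pid
  # Give systemd time to notice the exit and attempt a restart.
  sleep 5
  # With the bug present, this is expected to report "activating" ...
  systemctl is-active docker.service
  # ... while containerd still lists the dockerd task.
  ctr -n docker tasks list
}
```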
Output of docker version:
Client:
 Version: 18.09.0-ce-tp4
 API version: 1.39
 Go version: go1.10.3
 Git commit: 33764aa
 Built: Fri Aug 24 23:19:58 2018
 OS/Arch: linux/amd64
 Experimental: false

Server:
 Engine:
  Version: 18.09.0-ce-tp4
  API version: 1.39 (minimum version 1.12)
  Go version: go1.10.3
  Git commit: 33764aa
  Built:
  OS/Arch: linux/amd64
  Experimental: false
Output of docker info:
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 18.09.0-ce-tp4
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc containerd
Default Runtime: containerd
Init Binary: docker-init
containerd version: 6f13ff3ea48a6bc2fb9b47c0acce24cf274dafd9 (expected: 468a545b9edcd5932818eb9de8e72413e616e86e)
runc version: 459bfaec1fc6c17d8bfb12d0a0f69e7e7271ed2a (expected: 69663f0bd4b60df09991c08812a60108003fa340)
init version: fec3683
Kernel Version: 4.9.0-8-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.617GiB
Name: docker-roundtrip-test-8e22218e2cfd48f1
ID: CTS2:JUUA:WELS:4TIL:HPJ3:4P2B:JVL5:SYCD:PS2I:DOJO:XHBA:MBXV
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support
Additional environment details (AWS, VirtualBox, physical, etc.)
$ systemctl status docker.service
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Tue 2018-08-28 18:10:06 UTC; 78ms ago
     Docs: https://docs.docker.com
  Process: 17679 ExecStart=/usr/bin/dockerd (code=exited, status=1/FAILURE)
  Process: 17673 ExecStartPre=/usr/libexec/containerd-offline-installer /var/lib/containerd-offline-installer/containerd-shim-process.tar docker.io/docker/containerd-shim-process (code=exited, status=0/SUCCESS)
 Main PID: 17679 (code=exited, status=1/FAILURE)
      CPU: 109ms

Aug 28 18:10:06 docker-roundtrip-test-8e22218e2cfd48f1 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Aug 28 18:10:06 docker-roundtrip-test-8e22218e2cfd48f1 systemd[1]: docker.service: Unit entered failed state.
Aug 28 18:10:06 docker-roundtrip-test-8e22218e2cfd48f1 systemd[1]: docker.service: Failed with result 'exit-code'.
$ systemctl status containerd.service
● containerd.service - containerd container runtime
   Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2018-08-28 00:50:47 UTC; 17h ago
     Docs: https://containerd.io
 Main PID: 25324 (containerd)
    Tasks: 20 (limit: 4915)
   Memory: 170.6M
      CPU: 17min 36.789s
   CGroup: /system.slice/containerd.service
           ├─25324 /usr/bin/containerd
           └─26372 /opt/containerd/bin/containerd-shim-process-v1 -namespace docker -address /run/containerd/containerd.sock -publish-binary /usr/bin/containerd

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable
$ ctr -n docker tasks list
TASK PID STATUS
dockerd 26382 RUNNING
I've attached the full output of the commands we ran when we encountered the problem. Much of the file is noise; however, you can see that docker and containerd were not previously installed and that, immediately after the install, docker commands could not find the socket.
If we manually recover the VM, it works fine thereafter (i.e., we can't reproduce the initial crash manually). I suspect there is some kind of race between docker.service and containerd's dockerd task.
Activity
deft-code commented on Aug 28, 2018
Possibly related to the initial crash.
While trying to reproduce the error state, I found that
systemctl restart containerd
can sometimes cause the dockerd daemon to fail. The command sequence
systemctl stop containerd; systemctl start containerd
always caused the daemon to fail. Dockerd spams the logs with:
dockerd[28681]: time="2018-08-28T20:39:22.495913117Z" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
Adding containerd.service to the After= clause avoids the problem. The docs recommend always pairing BindsTo= with After=.
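On builds that still lack this ordering, the workaround can be applied locally as a systemd drop-in rather than by editing the packaged unit file. A minimal sketch, assuming bash and the standard /etc/systemd/system override directory; the helper name and the drop-in file name 10-after-containerd.conf are hypothetical. Later comments in this thread report that tp5 fixed the docker side of the problem, so this is only relevant to older builds.

```shell
# Hypothetical helper: writes a drop-in so docker.service is ordered after
# containerd.service. Pass a different base directory to try it out safely.
write_docker_ordering_dropin() {
  local base="${1:-/etc/systemd/system}"
  local dir="$base/docker.service.d"
  mkdir -p "$dir"
  # BindsTo= stops docker.service whenever containerd.service stops;
  # After= makes docker start after containerd and stop before it.
  cat > "$dir/10-after-containerd.conf" <<'EOF'
[Unit]
BindsTo=containerd.service
After=containerd.service
EOF
  # Make systemd pick up the new drop-in.
  systemctl daemon-reload
}
```

After running it as root, systemctl cat docker.service should list the new drop-in below the packaged unit. Repeating BindsTo= in the drop-in is harmless if the packaged unit already declares it, because systemd merges dependency lists from drop-ins.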
m1x0n commented on Sep 4, 2018
@deft-code Thanks, this solution helped.
Docker version 18.09.0-ce-tp5, build 9eb3d36
deft-code commented on Sep 14, 2018
I can confirm that the fix in tp5 resolves the docker portion of the issue. We're still seeing problems, but it looks like post-stop has found a bug in containerd: containerd/containerd#2646
kaizendeveloper commented on Dec 24, 2018
After I updated docker-ce and the kernel on my Fedora 28 box, docker stopped working.
By following the docker unit's logs with
journalctl -fu docker
I found that the runc executable wasn't reachable. This was one of the messages in the log:
failed to find runc binary
I ran a find command and located the runc executable at
/opt/containerd/bin/runc
so I created a symbolic link to it in one of the directories listed in my PATH environment variable:
sudo ln -s /opt/containerd/bin/runc /usr/local/bin/runc
After doing this, the service could be started with systemctl.
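The symlink fix can be made idempotent so that it never clobbers an existing runc. A sketch, assuming bash; the source path /opt/containerd/bin/runc and the target directory /usr/local/bin come from the comment above, while the helper name is hypothetical.

```shell
# Hypothetical helper mirroring the fix above: expose the runc binary that
# ships under /opt/containerd/bin through a directory on PATH.
link_runc() {
  local src="${1:-/opt/containerd/bin/runc}"
  local dest="${2:-/usr/local/bin}/runc"
  if [ ! -x "$src" ]; then
    echo "no executable runc at $src" >&2
    return 1
  fi
  # Leave any existing runc (file or symlink) at the destination untouched.
  if [ -e "$dest" ] || [ -L "$dest" ]; then
    echo "$dest already exists; not replacing it" >&2
    return 0
  fi
  ln -s "$src" "$dest"
}
```

Run it as root, then restart docker.service; afterwards command -v runc should resolve to the new link, provided /usr/local/bin is on PATH.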
[docker-engine] fix systemd shutdown hang (#2451)
Fix Docker systemd service start/stop order