Description
- [x] This is a bug report
- [ ] This is a feature request
- [x] I searched existing issues before opening this one
Expected behavior
Containers should start and remain running.
Actual behavior
I suspect that the update to docker-ce-18.09.2 and/or containerd.io-1.2.2 crashed my running containers and prevents the creation of new ones. Both restarting the existing containers and creating new ones fail with the following error:
Cannot start service redis: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:293: copying bootstrap data to pipe caused \"write init-p: broken pipe\"": unknown
All running containers (which were managed by `docker-compose`) show the exit status `Exited (128)` (postgres, redis, nginx and django) or `Exited (137)` (celery-worker and celery-beat). Exit code 137 is 128 + 9, i.e. the process was killed by SIGKILL, which from what I've gathered often points to an OOM kill. However, the container logs show that my processes received shutdown requests (`SIGTERM`) and don't mention memory issues.
postgres_container | 2019-02-12T01:29:30.089053475Z 2019-02-12 01:29:30.088 UTC [1] LOG: received smart shutdown request
postgres_container | 2019-02-12T01:29:30.111902444Z 2019-02-12 01:29:30.111 UTC [1] LOG: worker process: logical replication launcher (PID 27) exited with exit code 1
postgres_container | 2019-02-12T01:29:30.111937077Z 2019-02-12 01:29:30.111 UTC [22] LOG: shutting down
redis_container | 2019-02-12T01:29:30.123951274Z 1:signal-handler (1549934970) Received SIGTERM scheduling shutdown...
nginx_container | 2019-02-12T01:29:30.169354823Z 2019/02/12 01:29:30 [alert] 1#1: unlink() "/var/run/nginx.pid" failed (13: Permission denied)
postgres_container | 2019-02-12T01:29:30.173129602Z 2019-02-12 01:29:30.171 UTC [1] LOG: database system is shut down
redis_container | 2019-02-12T01:29:30.181581336Z 1:M 12 Feb 01:29:30.177 # User requested shutdown...
# ... full redis shutdown logs omitted
celery_worker_container | 2019-02-12T01:29:30.257633495Z [2019-02-12 01:29:30,224: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
celery_worker_container | 2019-02-12T01:29:30.257691039Z Traceback (most recent call last):
celery_worker_container | 2019-02-12T01:29:30.257697634Z File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 177, in _read_from_socket
celery_worker_container | 2019-02-12T01:29:30.257701676Z raise socket.error(SERVER_CLOSED_CONNECTION_ERROR)
celery_worker_container | 2019-02-12T01:29:30.257705303Z OSError: Connection closed by server.
# ... full Python traceback omitted
celery_worker_container | 2019-02-12T01:29:32.415761098Z [2019-02-12 01:29:32,415: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Error -2 connecting to redis:6379. Name or service not known..
celery_worker_container | 2019-02-12T01:29:32.415812976Z Trying again in 4.00 seconds...
celery_worker_container | 2019-02-12T01:29:32.415819541Z
celery_worker_container | 2019-02-12T01:29:36.478919701Z [2019-02-12 01:29:36,478: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Error -2 connecting to redis:6379. Name or service not known..
celery_worker_container | 2019-02-12T01:29:36.478969366Z Trying again in 6.00 seconds...
celery_worker_container | 2019-02-12T01:29:36.478974635Z
# final message
`docker inspect` on one of the stopped containers shows:
"State": {
"Status": "exited",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 0,
"ExitCode": 128,
"Error": "OCI runtime create failed: container_linux.go:344: starting container process caused \"process_linux.go:293: copying bootstrap data to pipe caused \\\"write init-p: broken pipe\\\"\": unknown",
"StartedAt": "2019-02-05T02:01:26.287177994Z",
"FinishedAt": "2019-02-12T01:29:31.16757356Z"
},
The same error appears when I try to spin up a new container.
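For reference, a quick way to check the same `State` fields across all containers at once (a sketch; it simply reads the fields shown in the inspect output above):

```
# list name, OOMKilled flag and exit code for every container, running or stopped;
# an OOM-killed container would show OOMKilled=true here
docker ps -aq | xargs docker inspect --format '{{.Name}} OOMKilled={{.State.OOMKilled}} ExitCode={{.State.ExitCode}}'
```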
Update log
/var/cpanel/updatelogs/update.1549934941.log
[2019-02-12 02:29:47 +0100] [/usr/local/cpanel/scripts/rpmup] Updating : 1:docker-ce-cli-18.09.2-3.el7.x86_64 1/6
[2019-02-12 02:29:47 +0100] [/usr/local/cpanel/scripts/rpmup] Updating : containerd.io-1.2.2-3.3.el7.x86_64 2/6
[2019-02-12 02:29:47 +0100] [/usr/local/cpanel/scripts/rpmup] Updating : 3:docker-ce-18.09.2-3.el7.x86_64 3/6
Note that there is a one-hour difference between the timezone configured on my OS and the one in the containers.
/var/log/messages
I added the full log as an attachment (var-log-messages.txt). It shows that an update was initiated right before the containers crashed, and it reports the same error that the inspect command showed:
Feb 12 02:29:45 server1 dockerd: time="2019-02-12T02:29:45.867408717+01:00" level=error msg="Failed to start container da183f015a4163ac9826971ade22b2ecc27a8cf661f4982c7a130d3cc5c3d268: OCI runtime create failed: container_linux.go:344: starting container process caused \"process_linux.go:293: copying bootstrap data to pipe caused \\\"write init-p: broken pipe\\\"\": unknown"
OS and kernel
$ cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)
$ uname -s -r
Linux 3.10.0-229.20.1.el7.centos.plus.x86_64
$ cat /proc/version
Linux version 3.10.0-229.20.1.el7.centos.plus.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #1 SMP Wed Nov 4 01:06:14 UTC 2015
Downgrade attempt
Running `yum downgrade docker-ce` (to 3:18.09.1-3.el7) still results in the same error message when I try to recreate my containers:
$ docker-compose up --force-recreate -d
Removing postgres_container
Removing redis_container
Recreating 5a76e1750c7b_redis_container ...
Recreating bdb2d0d65cd8_postgres_container ... error
Recreating 5a76e1750c7b_redis_container ... error
bootstrap data to pipe caused \"write init-p: broken pipe\"": unknown
ERROR: for 5a76e1750c7b_redis_container Cannot start service redis: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:293: copying bootstrap data to pipe caused \"write init-p: broken pipe\"": unknown
ERROR: for postgres Cannot start service postgres: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:293: copying bootstrap data to pipe caused \"write init-p: broken pipe\"": unknown
ERROR: for redis Cannot start service redis: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:293: copying bootstrap data to pipe caused \"write init-p: broken pipe\"": unknown
ERROR: Encountered errors while bringing up the project.
make: *** [run] Error 1
Downgrading containerd.io (to containerd.io.x86_64 0:1.2.2-3.el7) in addition to docker-ce does allow me to recreate the containers.
EDIT: upgrading the kernel to 3.10.0-957.5.1.el7.centos.plus.x86_64 also fixes the issue.
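For anyone hitting the same thing, a minimal sketch of the two workarounds described above (the package versions are the ones from this report and are assumed to still be available in the configured yum repositories):

```
# Workaround 1: downgrade docker-ce and containerd.io to the previous releases
sudo yum downgrade docker-ce-18.09.1-3.el7 containerd.io-1.2.2-3.el7
sudo systemctl restart docker

# Workaround 2: move to a current CentOS 7 kernel instead (requires a reboot)
sudo yum update kernel
sudo reboot
```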
Activity
thaJeztah commented on Feb 18, 2019
Note that CentOS uses a rolling release model, which means that older versions (including their kernels) reach EOL when a newer version is released.
Kernel 3.10.0-229 is a really old version of the CentOS kernel, so it's definitely not recommended to keep running it.
Also make sure you don't have a custom `MountFlags` option set in your systemd unit file if you're running Docker 18.09 or newer (see #485 (comment)).

pmoris commented on Feb 19, 2019
There's no `MountFlags` option specified in /lib/systemd/system/docker.service. Is that the correct file, or should I be looking somewhere else?

thaJeztah commented on Feb 19, 2019
The easiest way to check is `systemctl cat docker.service` - that will show the contents of all unit files and any override/drop-in files loaded for the service.
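As a side note, a quicker way to check just this one setting (a sketch, assuming a systemd version that exposes the property via `systemctl show`, as CentOS 7 does):

```
# print only the MountFlags property of the docker unit;
# empty output ("MountFlags=") means no custom mount propagation is configured
systemctl show docker.service -p MountFlags
```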
pmoris commented on Feb 19, 2019

Thanks for the swift reply! I see no mention of `MountFlags` in there. Here's the full output, if that helps.

thaJeztah commented on Feb 19, 2019
Ok, thanks! Looks like there's indeed no `MountFlags` set for the service, so that's not the problem.

Looking at the error again (`write init-p: broken pipe "": unknown`), this looks pretty similar to #597.

trapier commented on Feb 20, 2019
There is no new information in this comment, only condensed recreate steps and confirmation of previous observations that the issue goes away upon containerd downgrade or kernel upgrade.

Minimal Recreate

Result:

Confirmed relief via containerd downgrade or kernel upgrade

As observed by @pmoris (thanks!), the issue goes away when downgrading containerd: ... or when upgrading to the latest kernel (3.10.0-957.5.1.el7).

trapier commented on Feb 20, 2019
Pasted the install steps into a vagrant shell provisioner and bisected by vagrant box version:

test: docker run --rm alpine

-327 corresponds to a RHEL 7.2 kernel.

thaJeztah commented on Feb 20, 2019
Thanks @trapier - so the runc fix requires a kernel feature that was added in kernel 3.17 but was backported to RHEL kernels. I wonder if kernel -514 was the first kernel they backported it to.

For Docker Engine Community this is not an issue (it is not supported on RHEL, only on CentOS, so only the latest kernel version is supported), but for Docker Engine Enterprise we need to check whether there are still versions of Docker that are supported on RHEL 7.2 (if so, an alternative fix is needed).
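If the kernel feature in question is memfd_create(2) - that syscall was introduced in Linux 3.17, which matches the timeline, but this is an assumption rather than something confirmed in this thread - a rough way to check whether a given RHEL/CentOS kernel carries the backport:

```
# the symbol shows up in kallsyms on kernels that provide the syscall;
# no output suggests the running kernel lacks the backport
grep memfd_create /proc/kallsyms
```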
andrewhsu commented on Feb 20, 2019
Docker EE no longer has any versions supported on RHEL 7.2: https://success.docker.com/article/compatibility-matrix
leeningli commented on Feb 22, 2019
Maybe my test is helpful. My setup: Docker 18.09.2 on CentOS 7.6.

1. kernel 3.10.215, command: docker run -d -it -e MYSQL_ROOT_PASSWORD="lee" mysql:5.7 -> I got the error.
2. kernel 3.10.215, command: docker run -d -it --net=host -e MYSQL_ROOT_PASSWORD="lee" mysql:5.7 -> I got the error.
3. kernel 3.10.0.927, command: docker run -d -it -e MYSQL_ROOT_PASSWORD="lee" mysql:5.7 -> I got the error.
4. kernel 3.10.0.927, command: docker run -d -it --net=host -e MYSQL_ROOT_PASSWORD="lee" mysql:5.7 -> it is OK.
5. kernel 4.20, command: docker run -d -it -e MYSQL_ROOT_PASSWORD="lee" mysql:5.7 -> I got the error.
6. kernel 4.20, command: docker run -d -it --net=host -e MYSQL_ROOT_PASSWORD="lee" mysql:5.7 -> it is OK.
thaJeztah commented on Feb 22, 2019
@leeningli so in each case you start two MySQL containers: one with its own networking namespace, and one with `--net=host` (i.e. using the host's networking namespace), correct?

Is there anything in the system or daemon logs? (You also might want to check the audit logs to see if SELinux is involved.)
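A couple of commands that can help with that last check (a sketch, assuming auditd and the SELinux userland tools are installed):

```
# recent SELinux denials recorded by auditd
ausearch -m avc -ts recent
# current SELinux enforcement mode
getenforce
```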
YumeMichi commented on Mar 25, 2019
So how can I downgrade docker-ce on CentOS 7? My production server cannot be restarted.