Description
In a container which has a TTY allocated (`docker run -t`), if a process whose stdout/stderr is being collected by `docker-containerd-shim` sends a line several thousand characters long (observed to be anywhere in the range of about 5k-9k), `docker-containerd-shim` stops reading from the PTY master. This causes processes writing to the terminal to hang in the `write(2)` call, which quickly renders the container unserviceable.
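The stall described above can be sketched outside Docker with a plain pty pair. This is only an illustration of the mechanism, not Docker's actual code: if nothing reads the master side (playing the role of a shim that has stopped reading), a writer on the slave side stalls as soon as the kernel's tty buffer fills.

```python
import os
import pty

# Minimal sketch: open a pty pair and never read from the master, then
# write to the slave until the kernel's tty buffer fills. A blocking
# writer would hang in write(2) at that point; we use a non-blocking fd
# so the script reports the stall instead of hanging.
master, slave = pty.openpty()
os.set_blocking(slave, False)

written = 0
chunk = b"a" * 1024
while True:
    try:
        written += os.write(slave, chunk)
    except BlockingIOError:
        # Kernel buffer full: with a blocking fd this write would hang,
        # exactly as the container's processes do.
        break

print(f"pty output buffer filled after {written} bytes")
os.close(master)
os.close(slave)
```

The buffer fills after only a few kilobytes, which is consistent with the writer stalling so quickly once the shim stops draining the master side.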
Steps to reproduce the issue:
Here's how I can get a reliable reproduction:
- Install docker-ce version `17.11.0~ce-0~ubuntu` on an Ubuntu 16.04 x64 system.
- Start a container like this:
  `docker run -t busybox /bin/sh -c 's="a"; while true; do echo $s | wc -c; echo $s; s="a$s"; done'`
- The screen will quickly start scrolling successively longer lines of `a`s; however, within a few seconds, the terminal will stop scrolling, having printed a number in the mid-to-high thousands and a long line of `a` characters. While the number (indicating the length of the string printed immediately afterwards) will vary on each run, I've never seen it reach 10k.
Describe the results you received:
The container's logs stop displaying lines.
Describe the results you expected:
The container's logs continue to display lines until the container is terminated.
Additional information you deem important (e.g. issue happens only occasionally):
The exact line length that the demo program stops at varies between runs by a considerable margin (5k to 9k in my observations), which suggests this isn't a buffer size problem, but is instead a race condition somewhere.
The problem only occurs when a TTY is allocated, suggesting this is not a general problem with log I/O. If you re-run the same `docker run` command but without the `-t` option (which, I've noticed, feeds the container output via a pipe, rather than a pty), you can let this run as long as your patience will last. I've let it go to line lengths of 42k before getting bored and hitting ^C.
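For comparison, the pipe path can be sketched the same way (again just an illustration, not Docker's code): a pipe writer would also block once the kernel buffer fills, but the default pipe capacity on Linux (64 KiB) is much larger than a pty's, and in the non-tty path the daemon keeps draining the pipe, so the writer never stalls in practice.

```python
import os

# Sketch: fill a pipe with nothing reading the other end, using a
# non-blocking fd so we observe the capacity instead of blocking.
r, w = os.pipe()
os.set_blocking(w, False)

written = 0
chunk = b"a" * 1024
while True:
    try:
        written += os.write(w, chunk)
    except BlockingIOError:
        break  # pipe buffer full (64 KiB by default on Linux)

print(f"pipe buffer filled after {written} bytes")
os.close(r)
os.close(w)
```

The contrast in buffer sizes is incidental, though: the real difference reported here is that the pipe keeps being drained while the pty master stops being read.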
Downgrading to `17.10.0~ce-0~ubuntu` (commit ID `f4ffd25`) and re-running the problematic command (with `-t`) also does not exhibit the unhelpful behaviour, suggesting this problem is a recent addition to the pantheon, and (if it is a problem in, say, the kernel) it's being triggered by recent changes in Docker.
Output of `docker version`:
Client:
Version: 17.11.0-ce
API version: 1.34
Go version: go1.8.3
Git commit: 1caf76c
Built: Mon Nov 20 18:36:37 2017
OS/Arch: linux/amd64
Server:
Version: 17.11.0-ce
API version: 1.34 (minimum version 1.12)
Go version: go1.8.3
Git commit: 1caf76c
Built: Mon Nov 20 18:35:10 2017
OS/Arch: linux/amd64
Experimental: false
Output of `docker info`:
Containers: 6
Running: 4
Paused: 0
Stopped: 2
Images: 1
Server Version: 17.11.0-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 11
Dirperm1 Supported: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 992280e8e265f491f7a624ab82f3e238be086e49
runc version: 0351df1c5a66838d0c392b4ac4cf9450de844e2d
init version: 949e6fa
Security Options:
apparmor
Kernel Version: 3.13.0-135-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.955GiB
Name: docker-test
ID: MF2K:XQZL:SO7O:ZU6O:JIAR:AJZC:3EE2:YILN:PKN2:IXST:7KMB:J7K5
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: true
WARNING: No swap limit support
thaJeztah commented on Dec 24, 2017
ping @mlaventure PTAL
mpalmer commented on Dec 29, 2017
I've just tested the above reproduction on
and cannot cause the problem to occur.
Dayanand-Chinchure commented on Jan 5, 2018
Getting the same for Docker version 17.06:
Client:
Version: 17.06.0-ce
API version: 1.30
Go version: go1.8.3
Git commit: 02c1d87
Built: Fri Jun 23 21:15:15 2017
OS/Arch: linux/amd64
Server:
Version: dev
API version: 1.36 (minimum version 1.12)
Go version: go1.9.2
Git commit: 145fced
Built: Fri Jan 5 07:18:50 2018
OS/Arch: linux/amd64
Experimental: false
mpalmer commented on Jan 16, 2018
After further reports of problems from our customers running Docker 18.01, I managed to get a repro with the above steps on
However, it took a lot longer this time (got up to 78619) before bombing out. So the bug is still there, it's just sneakier now.
Cross-referenced: docker service logs stops showing logs from containers on different nodes #33183
erikh commented on May 22, 2018
successfully repro'd on all current and edge 18.0x releases; this was a fun one to chase down. :P
thaJeztah commented on May 22, 2018
Thanks @erikh - I was able to reproduce as well (tried a while back). I think @cpuguy83 was looking into this, and possibly narrowed it down to an issue in containerd (not sure if he got any further).
cpuguy83 commented on May 22, 2018
Definitely an issue with how the pty is set up.
I've found that this can be reproduced just by sending lots of data to stdout/stderr; line length doesn't matter.
I'm not familiar (really at all) with how these things are allocated, so it's been a bit of a slog to track down.
As far as I can tell, everything in containerd/docker is working correctly. The copy routines continue to run correctly (side note: it would be nice if we could use something other than userspace copying here); something is just getting stuck.