
Logging long lines breaks container #35865

Closed
@mpalmer

Description

In a container which has a TTY allocated (docker run -t), if a process whose stdout/stderr is being collected by docker-containerd-shim sends a line several thousand characters long (observed to be anywhere in the range of about 5k-9k), docker-containerd-shim stops reading from the PTY master. This causes processes writing to the terminal to hang in the write(2) call, which quickly renders the container unserviceable.
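
For illustration, the blocking behaviour can be demonstrated outside of docker with a minimal sketch: open a pty pair, never read the master, and watch writes to the slave block once the kernel's pty buffer fills. The sketch below assumes the third-party github.com/creack/pty package and is not docker's or containerd's actual code.

    package main

    import (
        "bytes"
        "log"

        "github.com/creack/pty" // assumed: third-party helper for opening a pty pair
    )

    func main() {
        // Open a pty pair but never read from the master, mimicking a shim
        // that has stopped draining it.
        master, slave, err := pty.Open()
        if err != nil {
            log.Fatal(err)
        }
        defer master.Close()
        defer slave.Close()

        // A 1 KiB line of 'a' characters, like the repro script produces.
        line := append(bytes.Repeat([]byte("a"), 1024), '\n')

        total := 0
        for {
            // Once the kernel's pty buffer fills (a few KiB), this Write blocks
            // indefinitely, which is the same state the container's processes
            // end up in inside write(2).
            n, err := slave.Write(line)
            total += n
            if err != nil {
                log.Fatalf("write failed after %d bytes: %v", total, err)
            }
            log.Printf("wrote %d bytes so far", total)
        }
    }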

Steps to reproduce the issue:

Here's how I can get a reliable reproduction:

  1. Install docker-ce version 17.11.0~ce-0~ubuntu on an Ubuntu 16.04 x64 system.

  2. Start a container like this:

     docker run -t busybox /bin/sh -c 's="a"; while true; do echo $s | wc -c; echo $s; s="a$s"; done'
    
  3. The screen will quickly start scrolling successively longer lines of 'a' characters; within a few seconds, however, the terminal will stop scrolling, having printed a number in the mid-to-high thousands and a long line of 'a' characters. While the number (indicating the length of the string printed immediately afterwards) varies on each run, I've never seen it reach 10k.

Describe the results you received:

The container's logs stop displaying lines.

Describe the results you expected:

The container's logs continue to display lines until the container is terminated.

Additional information you deem important (e.g. issue happens only occasionally):

The exact line length that the demo program stops at varies between runs by a considerable margin (5k to 9k in my observations), which suggests this isn't a buffer size problem, but is instead a race condition somewhere.

The problem only occurs when a TTY is allocated, suggesting this is not a general problem with log I/O. If you re-run the same docker run command but without the -t option (which, I've noticed, feeds the container output via a pipe, rather than a pty), you can let this run as long as your patience will last. I've let it go to line lengths of 42k before getting bored and hitting ^C.
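
For reference, the no-TTY variant is just the same repro command with the -t flag dropped:

     docker run busybox /bin/sh -c 's="a"; while true; do echo $s | wc -c; echo $s; s="a$s"; done'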

Downgrading to 17.10.0~ce-0~ubuntu (commit ID f4ffd25) and re-running the problematic command (with -t) also does not exhibit the unhelpful behaviour, suggesting this problem is a recent addition to the pantheon, and (if it is a problem in, say, the kernel) it's being triggered by recent changes in Docker.

Output of docker version:

Client:
 Version:      17.11.0-ce
 API version:  1.34
 Go version:   go1.8.3
 Git commit:   1caf76c
 Built:        Mon Nov 20 18:36:37 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.11.0-ce
 API version:  1.34 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   1caf76c
 Built:        Mon Nov 20 18:35:10 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 6
 Running: 4
 Paused: 0
 Stopped: 2
Images: 1
Server Version: 17.11.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 11
 Dirperm1 Supported: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 992280e8e265f491f7a624ab82f3e238be086e49
runc version: 0351df1c5a66838d0c392b4ac4cf9450de844e2d
init version: 949e6fa
Security Options:
 apparmor
Kernel Version: 3.13.0-135-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.955GiB
Name: docker-test
ID: MF2K:XQZL:SO7O:ZU6O:JIAR:AJZC:3EE2:YILN:PKN2:IXST:7KMB:J7K5
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: true

WARNING: No swap limit support

Activity

thaJeztah (Member) commented on Dec 24, 2017

ping @mlaventure PTAL

mpalmer (Author) commented on Dec 29, 2017

I've just tested the above reproduction on

Client:
 Version:       17.12.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    c97c6d6
 Built: Wed Dec 27 20:11:19 2017
 OS/Arch:       linux/amd64

Server:
 Engine:
  Version:      17.12.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   c97c6d6
  Built:        Wed Dec 27 20:09:53 2017
  OS/Arch:      linux/amd64
  Experimental: false

and cannot cause the problem to occur.

Dayanand-Chinchure commented on Jan 5, 2018

Getting the same for docker version 17.06

Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:15:15 2017
 OS/Arch:      linux/amd64

Server:
 Version:      dev
 API version:  1.36 (minimum version 1.12)
 Go version:   go1.9.2
 Git commit:   145fced
 Built:        Fri Jan 5 07:18:50 2018
 OS/Arch:      linux/amd64
 Experimental: false

mpalmer (Author) commented on Jan 16, 2018

After further reports of problems from our customers running Docker 18.01, I managed to get a repro with the above steps on

Client:
 Version:       18.01.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    03596f5
 Built: Wed Jan 10 20:11:05 2018
 OS/Arch:       linux/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:      18.01.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   03596f5
  Built:        Wed Jan 10 20:09:37 2018
  OS/Arch:      linux/amd64
  Experimental: false

However, it took a lot longer this time (the line length got up to 78619) before bombing out. So the bug is still there; it's just sneakier now.

erikh (Contributor) commented on May 22, 2018

successfully repro'd on all current and edge 18.0x releases; this was a fun one to chase down. :P

thaJeztah (Member) commented on May 22, 2018

Thanks @erikh - I was able to reproduce as well (tried a while back). I think @cpuguy83 was looking into this, and possibly narrowed it down to an issue in containerd (not sure if he got any further).

cpuguy83 (Member) commented on May 22, 2018

Definitely an issue with how the pty is set up.
I've found that this can be reproduced by just sending lots of data to stdout/stderr; line length doesn't matter.

I'm not familiar (really at all) with how these things are allocated, so it's been a bit of a slog to track down.
As far as I can tell, everything in containerd/docker is working correctly. The copy routines continue to run correctly (side note: it would be nice if we could use something other than userspace copying here); something is just getting stuck.
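
To make the copy routines concrete: the userspace copying being described is, in shape, a loop like the sketch below. This is only an illustration, assuming the third-party github.com/creack/pty package to start a process on a pty; it is not containerd's actual code.

    package main

    import (
        "io"
        "log"
        "os"
        "os/exec"

        "github.com/creack/pty" // assumed: third-party helper for starting a command on a pty
    )

    func main() {
        // Roughly the command from the repro steps: keep echoing longer lines.
        cmd := exec.Command("/bin/sh", "-c",
            `s="a"; while true; do echo $s | wc -c; echo $s; s="a$s"; done`)

        // Attach the command to the slave side of a fresh pty and get the
        // master side back.
        master, err := pty.Start(cmd)
        if err != nil {
            log.Fatal(err)
        }
        defer master.Close()

        // A shim's log plumbing boils down to a loop of this shape: drain the
        // pty master and forward the bytes. If this copy ever stops reading,
        // the kernel pty buffer fills and the container's writers block in
        // write(2).
        if _, err := io.Copy(os.Stdout, master); err != nil {
            log.Printf("copy ended: %v", err)
        }
        _ = cmd.Wait()
    }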
