Description
Docker run command with option --security-opt=no-new-privileges gets stuck
This issue originated in k8s, where kubelet reports a PLEG unhealthy message when CoreDNS starts up.
While checking the spec of CoreDNS, we found that there is a security context set for the coredns container.
securityContext:
  allowPrivilegeEscalation: false
This option sets the no_new_privs flag on the container.
When we remove the security context, the container starts up fine.
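To confirm whether the flag is actually set on a running container process, the NoNewPrivs field in /proc/<pid>/status can be read (available on kernels 4.10 and later, so it should apply to the 4.15 kernel reported here). A minimal sketch; the helper name no_new_privs_of and its optional second argument are assumptions made for illustration:

```shell
# Hypothetical helper: print the NoNewPrivs value (0 or 1) for a PID.
# The second argument (default /proc) only exists so the helper can be
# pointed at a different procfs root when experimenting.
no_new_privs_of() {
  local pid=$1 proc_root=${2:-/proc}
  awk '/^NoNewPrivs:/ { print $2 }' "$proc_root/$pid/status"
}
```

For example, `no_new_privs_of "$(docker inspect -f '{{.State.Pid}}' <container>)"` should print 1 for a container started with --security-opt=no-new-privileges.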
Steps to reproduce the issue:
- Deploy an Ubuntu 18 node with the latest docker and containerd version 1.4.4-1
- Run a loop to create containers
while true; do docker run -itd --security-opt=no-new-privileges leodotcloud/swiss-army-knife ; done
- Wait for the docker run command to get stuck (it gets stuck after creating ~170 containers)
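A bounded variant of the loop above can make the hang easier to catch unattended. This is only a sketch: run_until_stuck, the 300-run cap and the 60-second limit are illustrative assumptions, not part of the original reproduction, and it relies on coreutils timeout returning exit status 124 when the limit is hit.

```shell
# Hypothetical helper: run a command up to MAX_RUNS times and stop at the
# first invocation that does not finish within LIMIT seconds.
# Usage: run_until_stuck MAX_RUNS LIMIT CMD [ARGS...]
run_until_stuck() {
  local max_runs=$1 limit=$2 i=1 rc
  shift 2
  while [ "$i" -le "$max_runs" ]; do
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    # coreutils timeout exits with 124 when it had to kill the command.
    if [ "$rc" -eq 124 ]; then
      echo "stuck on run $i"
      return 1
    fi
    i=$((i + 1))
  done
  echo "completed $max_runs runs"
  return 0
}

# Example (assumed values):
# run_until_stuck 300 60 docker run -itd --security-opt=no-new-privileges leodotcloud/swiss-army-knife
```

Exit codes other than 124 are deliberately ignored, so a failing (but not hanging) docker run does not stop the loop. Killing the docker CLI this way does not clean up the stuck "runc init" process, which stays around for inspection.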
Describe the results you received:
The "runc init" command is the one that gets stuck
Running strace on "runc init" causes the docker run command to exit. The strace output ends as follows:
strace: Process 3423 attached
futex(0x560028b1dbd0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x560028b1dad0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc000040848, FUTEX_WAKE_PRIVATE, 1) = 1
epoll_ctl(8, EPOLL_CTL_DEL, 6, 0xc0001d6ca4) = 0
...
futex(0x560028b1dbd0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x560028b1dad0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc000040848, FUTEX_WAKE_PRIVATE, 1) = 1
...
read(4, "", 8) = 0
epoll_ctl(8, EPOLL_CTL_DEL, 4, 0xc0001d6c7c) = 0
close(4) = 0
futex(0x560028b1e508, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_LOG, NULL) = -1 EFAULT (Bad address)
seccomp(SECCOMP_GET_ACTION_AVAIL, 0, [SECCOMP_RET_LOG]) = 0
prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, {len=526, filter=0xc0002a2000}) = -1 EINVAL (Invalid argument)
write(2, "standard_init_linux.go:207: init"..., 132) = 132
futex(0x560028b1e508, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
exit_group(1) = ?
+++ exited with 1 +++
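Since attaching strace makes the process exit, it can help to list the long-lived "runc init" processes first. A minimal sketch, assuming procps ps with the etimes column; the helper name filter_old_runc_init and the 30-second threshold are illustrative assumptions:

```shell
# Hypothetical helper: read "PID ELAPSED_SECONDS COMMAND..." lines on stdin
# and print those whose command is "runc init" (with or without a path
# prefix) and whose age is at least MIN_AGE seconds.
# Usage: ps -eo pid=,etimes=,args= | filter_old_runc_init MIN_AGE
filter_old_runc_init() {
  awk -v min="$1" '$3 ~ "(^|/)runc$" && $4 == "init" && $2 + 0 >= min { print }'
}

# Example (assumed threshold):
# ps -eo pid=,etimes=,args= | filter_old_runc_init 30
```

A healthy "runc init" normally lives only briefly, so anything this prints is a candidate for the stuck process.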
Describe the results you expected:
No stuck containers during the creation
What version of containerd are you using:
$ containerd --version
containerd containerd.io 1.4.4 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
Any other relevant information (runC version, CRI configuration, OS/Kernel version, etc.):
pprof_goroutines attached here
runc --version
runc version 1.0.0-rc93
commit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
spec: 1.0.2-dev
go: go1.13.15
libseccomp: 2.4.3
crictl info
$ crictl info
WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock]. As the default settings are now deprecated, you should set the endpoint instead.
ERRO[0002] connect endpoint 'unix:///var/run/dockershim.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded
FATA[0002] getting status of runtime: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService
After commenting #disabled_plugins = ["cri"] in config.toml
{
"status": {
"conditions": [
{
"type": "RuntimeReady",
"status": true,
"reason": "",
"message": ""
},
{
"type": "NetworkReady",
"status": false,
"reason": "NetworkPluginNotReady",
"message": "Network plugin returns error: cni plugin not initialized"
}
]
},
"cniconfig": {
"PluginDirs": [
"/opt/cni/bin"
],
"PluginConfDir": "/etc/cni/net.d",
"PluginMaxConfNum": 1,
"Prefix": "eth",
"Networks": [
{
"Config": {
"Name": "cni-loopback",
"CNIVersion": "0.3.1",
"Plugins": [
{
"Network": {
"type": "loopback",
"ipam": {},
"dns": {}
},
"Source": "{\"type\":\"loopback\"}"
}
],
"Source": "{\n\"cniVersion\": \"0.3.1\",\n\"name\": \"cni-loopback\",\n\"plugins\": [{\n  \"type\": \"loopback\"\n}]\n}"
},
"IFName": "lo"
}
]
},
"config": {
"containerd": {
"snapshotter": "overlayfs",
"defaultRuntimeName": "runc",
"defaultRuntime": {
"runtimeType": "",
"runtimeEngine": "",
"PodAnnotations": null,
"ContainerAnnotations": null,
"runtimeRoot": "",
"options": null,
"privileged_without_host_devices": false,
"baseRuntimeSpec": ""
},
"untrustedWorkloadRuntime": {
"runtimeType": "",
"runtimeEngine": "",
"PodAnnotations": null,
"ContainerAnnotations": null,
"runtimeRoot": "",
"options": null,
"privileged_without_host_devices": false,
"baseRuntimeSpec": ""
},
"runtimes": {
"runc": {
"runtimeType": "io.containerd.runc.v2",
"runtimeEngine": "",
"PodAnnotations": null,
"ContainerAnnotations": null,
"runtimeRoot": "",
"options": {},
"privileged_without_host_devices": false,
"baseRuntimeSpec": ""
}
},
"noPivot": false,
"disableSnapshotAnnotations": true,
"discardUnpackedLayers": false
},
"cni": {
"binDir": "/opt/cni/bin",
"confDir": "/etc/cni/net.d",
"maxConfNum": 1,
"confTemplate": ""
},
"registry": {
"mirrors": {
"docker.io": {
"endpoint": [
"https://registry-1.docker.io"
]
}
},
"configs": null,
"auths": null,
"headers": null
},
"imageDecryption": {
"keyModel": ""
},
"disableTCPService": true,
"streamServerAddress": "127.0.0.1",
"streamServerPort": "0",
"streamIdleTimeout": "4h0m0s",
"enableSelinux": false,
"selinuxCategoryRange": 1024,
"sandboxImage": "k8s.gcr.io/pause:3.2",
"statsCollectPeriod": 10,
"systemdCgroup": false,
"enableTLSStreaming": false,
"x509KeyPairStreaming": {
"tlsCertFile": "",
"tlsKeyFile": ""
},
"maxContainerLogSize": 16384,
"disableCgroup": false,
"disableApparmor": false,
"restrictOOMScoreAdj": false,
"maxConcurrentDownloads": 3,
"disableProcMount": false,
"unsetSeccompProfile": "",
"tolerateMissingHugetlbController": true,
"disableHugetlbController": true,
"ignoreImageDefinedVolumes": false,
"containerdRootDir": "/var/lib/containerd",
"containerdEndpoint": "/run/containerd/containerd.sock",
"rootDir": "/var/lib/containerd/io.containerd.grpc.v1.cri",
"stateDir": "/run/containerd/io.containerd.grpc.v1.cri"
},
"golang": "go1.13.15",
"lastCNILoadStatus": "cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
}
uname -a
Linux k8s-u18-worker01 4.15.0-139-generic #143-Ubuntu SMP Tue Mar 16 01:30:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
After downgrading to 1.4.3-1, the issue no longer appears.
Activity
Oats87 commented on Mar 24, 2021
opencontainers/runc#2865 is possibly related to this
ansilh commented on Mar 24, 2021
Thanks @Oats87
I couldn’t repro the issue after replacing runc 1.0.0-rc93 with 1.0.0-rc92
https://github.com/opencontainers/runc/releases/tag/v1.0.0-rc92
AkihiroSuda commented on Mar 25, 2021
Is this issue closable then?
ansilh commented on Mar 25, 2021
The bug is in runc, hence closing this issue.
cpuguy83 commented on Mar 29, 2021
I can repro easily in an AKS cluster with rc93. rc92 works just fine. However I've only ever seen the issue with io.containerd.runtime.linux.v1, not io.containerd.runc.v2.
As soon as I strace the runc init it exits.
We do not use no-new-privileges.
cpuguy83 commented on Mar 29, 2021
@ansilh What version of Docker is that?
ansilh commented on Mar 30, 2021
@cpuguy83
Docker Version: 19.03.15
docker info
cpuguy83 commented on Mar 30, 2021
Ok yeah that's the v1 shim as well.