Description
After we upgraded Kubernetes from 1.6.4 to 1.9.0, within a few days our production environment reported machines hanging and JVMs crashing randomly inside containers. We found that cgroup memory css ids were not being released; once the cgroup css id count grows larger than 65535, the machine hangs and we have to restart it.
We found that runc's libcontainer memory.go vendored in k8s 1.9.0 had removed the if condition shown below, which causes kernel memory accounting to be enabled by default. However, we are running kernel 3.10.0-514.16.1.el7.x86_64, and on this kernel version the kernel memory limit is not stable: it leaks cgroup memory and makes applications crash randomly.
When we run "docker run -d --name test001 --kernel-memory 100M ...", Docker reports:
WARNING: You specified a kernel memory limit on a kernel older than 4.0. Kernel memory limits are experimental on older kernels, it won't work as expected and can cause your system to be unstable.
In k8s.io/kubernetes/vendor/github.com/opencontainers/runc/libcontainer/cgroups/fs/memory.go:

```diff
-	if d.config.KernelMemory != 0 {
+	// Only enable kernel memory accouting when this cgroup
+	// is created by libcontainer, otherwise we might get
+	// error when people use `cgroupsPath` to join an existed
+	// cgroup whose kernel memory is not initialized.
 	if err := EnableKernelMemoryAccounting(path); err != nil {
 		return err
 	}
```
I want to know why kernel memory accounting is enabled by default. Can k8s take the different kernel versions into account?
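To make the second question concrete, here is a minimal, hypothetical sketch (not k8s or runc code; the helper name, the 4.0 threshold, and the limit variable are all assumptions) of what gating kernel memory accounting on both an explicit limit and the kernel version could look like:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"syscall"
)

// kernelAtLeast reports whether the running kernel release (from uname)
// is at least major.minor. For illustration only.
func kernelAtLeast(major, minor int) (bool, error) {
	var uts syscall.Utsname
	if err := syscall.Uname(&uts); err != nil {
		return false, err
	}
	// Convert the fixed-size C string into a Go string.
	b := make([]byte, 0, len(uts.Release))
	for _, c := range uts.Release {
		if c == 0 {
			break
		}
		b = append(b, byte(c))
	}
	release := string(b) // e.g. "3.10.0-514.16.1.el7.x86_64"
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected kernel release %q", release)
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	minr, err := strconv.Atoi(strings.SplitN(parts[1], "-", 2)[0])
	if err != nil {
		return false, err
	}
	return maj > major || (maj == major && minr >= minor), nil
}

func main() {
	kernelMemoryLimit := int64(0) // no --kernel-memory limit requested

	newEnough, err := kernelAtLeast(4, 0)
	if err != nil {
		fmt.Println("cannot determine kernel version:", err)
		return
	}
	// The behaviour we are asking about: only enable kernel memory
	// accounting when a limit was explicitly configured AND the kernel
	// is new enough to support it reliably.
	if kernelMemoryLimit != 0 && newEnough {
		fmt.Println("would enable kernel memory accounting here")
	} else {
		fmt.Println("kernel memory accounting stays disabled")
	}
}
```

On the 3.10.0-514 kernel above, such a check would leave kernel memory accounting off unless a limit was explicitly requested.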
Is this a BUG REPORT or FEATURE REQUEST?: BUG REPORT
/kind bug
What happened:
Applications crash and cgroup memory leaks.
What you expected to happen:
Applications are stable and cgroup memory does not leak.
How to reproduce it (as minimally and precisely as possible):
Install k8s 1.9.x on a machine with kernel 3.10.0-514.16.1.el7.x86_64, then create and delete pods repeatedly. After more than 65535/3 creations, the kubelet reports a "cgroup no space left on device" error; after the cluster has run for a few days, containers crash.
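For anyone who wants to reproduce the exhaustion without driving it through pod churn, below is a rough, hypothetical probe (not k8s or runc code). It creates a memory cgroup, touches memory.kmem.limit_in_bytes roughly the way runc's unconditional EnableKernelMemoryAccounting is understood to (set a kernel memory limit, then clear it), and removes the cgroup again in a loop; on an affected 3.10 kernel the kernel-internal css ids are not freed, so mkdir eventually fails with "no space left on device". The /sys/fs/cgroup/memory path, the probe name, and the exact write sequence are assumptions; it needs root.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// touchKmem enables kernel memory accounting on a fresh memory cgroup by
// setting and then clearing memory.kmem.limit_in_bytes. This loosely mirrors
// what runc's EnableKernelMemoryAccounting is understood to do; the exact
// values written here are an assumption for illustration.
func touchKmem(cgroupDir string) error {
	f := filepath.Join(cgroupDir, "memory.kmem.limit_in_bytes")
	for _, v := range []string{"1", "-1"} {
		if err := os.WriteFile(f, []byte(v), 0644); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	base := "/sys/fs/cgroup/memory/kmem-leak-probe" // assumed cgroup v1 mount; needs root
	if err := os.MkdirAll(base, 0755); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for i := 0; ; i++ {
		dir := filepath.Join(base, fmt.Sprintf("c%d", i))
		if err := os.Mkdir(dir, 0755); err != nil {
			// On an affected kernel this eventually reports ENOSPC even
			// though every previous directory has already been removed.
			if errors.Is(err, syscall.ENOSPC) {
				fmt.Printf("mkdir returned ENOSPC after %d create/delete cycles\n", i)
			} else {
				fmt.Printf("mkdir failed after %d cycles: %v\n", i, err)
			}
			return
		}
		if err := touchKmem(dir); err != nil {
			fmt.Fprintln(os.Stderr, "enabling kmem accounting:", err)
		}
		if err := os.Remove(dir); err != nil {
			fmt.Fprintln(os.Stderr, "rmdir:", err)
			return
		}
	}
}
```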
Anything else we need to know?:
Environment: kernel 3.10.0-514.16.1.el7.x86_64
- Kubernetes version (use kubectl version): k8s 1.9.x
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
- Kernel (e.g. uname -a): 3.10.0-514.16.1.el7.x86_64
- Install tools: rpm
- Others:
Activity
qkboy commented on Mar 30, 2018
The following test case reproduces this error (see the sketch after these steps):
First, fill the memory cgroup hierarchy until it is full, then release 99 memory cgroups so that they are available for the next creations.
Second, create a new pod on this node. Each pod creates 3 memory cgroup directories.
So when we then recreate 100 memory cgroup directories, 4 of them fail.
Third, delete the test pod. Confirm that all of the test pod's containers have been destroyed, then recreate 100 memory cgroup directories again.
The expected, correct result would be that only the 100th memory cgroup directory cannot be created.
But the actual, incorrect result is that all of the memory cgroups created by the pod are leaked: notice that the memory cgroup count has already dropped by 3, yet the space they occupied is not released.
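The shell output from the original comment is not preserved above, so here is a rough sketch of the counting probe those steps describe: it tries to create up to 100 plain memory cgroup directories, reports how many succeed before "no space left on device", and removes them again. Running it before the pod is created and again after the pod is deleted shows the available slots shrinking by the pod's leaked cgroups. The path and directory names are assumptions; it needs root.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// countFreeSlots tries to create up to n memory cgroup directories, returns
// how many were created before the first failure, and removes them again.
func countFreeSlots(base string, n int) int {
	created := []string{}
	for i := 0; i < n; i++ {
		dir := filepath.Join(base, fmt.Sprintf("slot-%03d", i))
		if err := os.Mkdir(dir, 0755); err != nil {
			fmt.Printf("creation #%d failed: %v\n", i+1, err)
			break
		}
		created = append(created, dir)
	}
	for _, dir := range created {
		os.Remove(dir)
	}
	return len(created)
}

func main() {
	base := "/sys/fs/cgroup/memory/leak-count-test" // assumed cgroup v1 mount; needs root
	if err := os.MkdirAll(base, 0755); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Run once before creating the pod and once after deleting it.
	// On a healthy node the two numbers match; on an affected node the
	// second run comes up short by the cgroups the pod leaked.
	fmt.Println("memory cgroup slots available:", countFreeSlots(base, 100))
}
```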
wzhx78 commented on Mar 30, 2018
/sig container
/kind bug
wzhx78 commented on Mar 30, 2018
@kubernetes/sig-cluster-container-bugs
feellifexp commented on Mar 30, 2018
This bug seems to be related: opencontainers/runc#1725
Which docker version are you using?
qkboy commented on Mar 30, 2018
@feellifexp with docker 1.13.1
frol commented on Mar 30, 2018
There is indeed a kernel memory leak up to 4.0 kernel release. You can follow this link for details: moby/moby#6479 (comment)
wzhx78 commented on Mar 31, 2018
@feellifexp the kernel log also shows this message after upgrading to k8s 1.9.x
wzhx78 commented on Mar 31, 2018
I want to know why k8s 1.9 deleted this line:
if d.config.KernelMemory != 0 {
in k8s.io/kubernetes/vendor/github.com/opencontainers/runc/libcontainer/cgroups/fs/memory.go

feellifexp commented on Mar 31, 2018
I am not an expert here, but this seems to be a change coming from runc, and the change was introduced into k8s in v1.8.
After reading the code, it seems to affect the cgroupfs cgroup driver, while the systemd driver is unchanged. But I have not tested that theory yet.
Maybe experts on the kubelet and containers can chime in further.
kevin-wangzefeng commented on Mar 31, 2018
/sig node
152 remaining items
gjkim42 commented on May 20, 2021
cc @ehashman @bobbypage @dims
Is sig-node aware of this issue?
I think every cluster hosted on CentOS 7 has had this issue.
ehashman commented on May 20, 2021
CentOS 7 ships a much older kernel than what we test CI on in SIG Node/upstream Kubernetes (currently the 5.4.x series). People are welcome to experiment with kernel parameters and share workarounds for their own distributions/deployments, but any support will be best effort.
kolyshkin commented on May 20, 2021
I strongly suggest employing the workaround described at #61937 (comment).
Also, since v1.0.0-rc94, runc never sets kernel memory, so upgrading to runc >= v1.0.0-rc94 should solve the problem.
ffromani commented on Jun 24, 2021
Kubernetes does not use issues on this repo for support requests. If you have a question on how to use Kubernetes or to debug a specific issue, please visit our forums.
/remove-kind bug
/kind support
/close
Extra rationale: this issue affects CentOS 7, which is indeed much older than what we test in CI, and a workaround exists (see runc v1.0.0-rc94).
k8s-ci-robot commented on Jun 24, 2021
@fromanirh: Closing this issue.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
pxp531 commented on Mar 27, 2024
Yes, I also want to know whether cgroup.memory=nokmem will cause bad results, and how cgroup.kmem is designed.
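Not an answer to whether nokmem has side effects, but as a quick sanity check, here is a small sketch (nothing k8s-specific is assumed; it only reads /proc/cmdline) to verify whether the cgroup.memory=nokmem boot parameter is actually active on a node:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	// The kernel exposes its boot parameters in /proc/cmdline.
	data, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, arg := range strings.Fields(string(data)) {
		// cgroup.memory= may carry several comma-separated options,
		// e.g. "cgroup.memory=nokmem,nosocket".
		if strings.HasPrefix(arg, "cgroup.memory=") && strings.Contains(arg, "nokmem") {
			fmt.Println("cgroup.memory=nokmem is set: kernel memory accounting is disabled")
			return
		}
	}
	fmt.Println("cgroup.memory=nokmem is not set on this boot")
}
```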
libcontainer: ability to compile without kmem