docker/kubernetes fails to create memory cgroup #313

## Environment

**Version and configuration information**

- Kubernetes version:
```
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:17:28Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:08:34Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
```
- Docker version (output of `docker info`):
```
Containers: 21
 Running: 10
 Paused: 0
 Stopped: 11
Images: 33
Server Version: 18.06.0-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: nvidia runc
Default Runtime: nvidia
Init Binary: docker-init
containerd version: d64c661f1d51c48782c9cec8fda7604785f93587
runc version: 69663f0bd4b60df09991c08812a60108003fa340-dirty (expected: 69663f0bd4b60df09991c08812a60108003fa340)
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 251.6GiB
Name: xxxxx
ID: Q2GS:I7UQ:GDDB:QICE:Z5E5:KXXQ:CWZT:EN6F:6GWR:AZ6P:UEWY:XTSQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Registry Mirrors:
 https://kuamavit.mirror.aliyuncs.com/
 https://registry.docker-cn.com/
 https://docker.mirrors.ustc.edu.cn/
Live Restore Enabled: false
```
- Network plugins:
```
flannel
calico
```
- Storage types:
```
xfs+overlay2
xfs+dm
```
- Operating system and kernel:
```
CentOS Linux release 7.4.1708 (Core)
3.10.0-693.el7.x86_64
```

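## Actions

**Actions or symptoms that lead to the problem**

After the cluster has been running for a while (a few days), it frequently reports that a memory cgroup cannot be created: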
```
cgroup configuration for process caused \"mkdir
/sys/fs/cgroup/memory/kubepods/burstable/podf1bd9e87-1ef2-11e8-afd3-fa163ecf2dce/8710c146b3c8b52f5da62e222273703b1e3d54a6a6270a0ea7ce1b194f1b5053:
no space left on device\""

```
Creating a cgroup directory by hand fails as well:
```
[root@k8s07 ~]# mkdir /sys/fs/cgroup/memory/test
mkdir: cannot create directory ‘/sys/fs/cgroup/memory/test’: No space left on device
```
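
The manual check above can be wrapped in a small helper so it is easy to repeat on several nodes. This is only a minimal sketch and not part of the original report: the function name `probe_cgroup_mkdir` and the `probe.<pid>` directory name are made up for illustration, and the default path assumes the cgroup v1 memory hierarchy is mounted at /sys/fs/cgroup/memory, as on the node shown here.

```shell
# Try to create and then remove a throwaway cgroup directory.
# $1: hierarchy root (assumed default: the cgroup v1 memory mount used above).
# Prints "ok" when mkdir succeeds; otherwise prints mkdir's error and returns 1.
probe_cgroup_mkdir() {
    local root probe err
    root="${1:-/sys/fs/cgroup/memory}"
    probe="$root/probe.$$"   # hypothetical name; $$ keeps it unique per shell
    if err=$(mkdir "$probe" 2>&1); then
        rmdir "$probe"
        echo "ok"
    else
        echo "failed: $err"
        return 1
    fi
}
```

Running `probe_cgroup_mkdir` with no argument on an affected node should report the same "No space left on device" message that the manual `mkdir` produced.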

Looking at the cgroup information, the num_cgroups values are small, nowhere near the 64K limit at which creation would be refused:

```
[root@k8s07 ~]# cat /proc/cgroups 
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  9       23      1
cpu     8       110     1
cpuacct 8       110     1
memory  6       106     1
devices 3       111     1
freezer 7       22      1
net_cls 10      22      1
blkio   4       110     1
perf_event      5       22      1
hugetlb 2       22      1
pids    11      19      1
net_prio        10      22      1
```

To rule out wrong data in /proc/cgroups, I also counted the directories by hand on the same node (k8s07) with `find /sys/fs/cgroup/memory/ -type d |wc -l`, which returned 106. This matches the value in /proc/cgroups.
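
The comparison between /proc/cgroups and the directory count can be scripted so that it is repeatable. The following is a rough sketch under stated assumptions: the function names are hypothetical, both paths are parameters that default to the cgroup v1 locations used in this report, and the directory count includes the hierarchy root, just like the `find` command above.

```shell
# Print the memory num_cgroups column from a /proc/cgroups-style file.
# $1: file to read (defaults to /proc/cgroups).
reported_memory_cgroups() {
    awk '$1 == "memory" { print $3 }' "${1:-/proc/cgroups}"
}

# Count directories under the memory hierarchy, root included.
# $1: mount point (assumed default: /sys/fs/cgroup/memory).
actual_memory_cgroups() {
    find "${1:-/sys/fs/cgroup/memory}" -type d | wc -l | tr -d ' '
}

# Compare the two numbers and print a one-line verdict.
compare_memory_cgroups() {
    local reported actual
    reported=$(reported_memory_cgroups "$1")
    actual=$(actual_memory_cgroups "$2")
    if [ "$reported" = "$actual" ]; then
        echo "match: $reported"
    else
        echo "mismatch: reported=$reported actual=$actual"
    fi
}
```

Calling `compare_memory_cgroups` without arguments checks the live system. A match, as on this node, suggests the visible cgroup count is accurate and the "no space left on device" error is not caused by the number of visible directories.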

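One article attributes this problem to the kernel version ([http://dockone.io/article/4797](http://dockone.io/article/4797)). According to it, memory cgroups created on kernel 3.10 and earlier are not released properly, so the count eventually hits the 64K creation limit. In my case, however, the count is nowhere near that limit.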

To verify this, I upgraded the kernel on one node to 4.4. That node instead shows the behaviour the article describes, with memory cgroups not being released:
```
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  10      56      1
cpu     2       251     1
cpuacct 2       251     1
blkio   3       251     1
memory  11      4913    1
devices 5       251     1
freezer 7       56      1
net_cls 8       56      1
perf_event      6       56      1
net_prio        8       56      1
hugetlb 9       56      1
pids    4       252     1
```
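
To tell whether the memory count keeps growing on a node, as the 4.4 output above suggests, the memory row of /proc/cgroups can be sampled over time. The helper below is a sketch only: `watch_memory_cgroups` and its three parameters are hypothetical, and the defaults (3 samples, 60 seconds apart) are arbitrary choices rather than values from the report.

```shell
# Print timestamped samples of the memory num_cgroups value.
# $1: /proc/cgroups-style file (defaults to /proc/cgroups)
# $2: number of samples (assumed default: 3)
# $3: seconds between samples (assumed default: 60)
watch_memory_cgroups() {
    local file samples interval i count
    file="${1:-/proc/cgroups}"
    samples="${2:-3}"
    interval="${3:-60}"
    i=1
    while [ "$i" -le "$samples" ]; do
        count=$(awk '$1 == "memory" { print $3 }' "$file")
        printf '%s memory num_cgroups=%s\n' "$(date +%FT%T)" "$count"
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

A value that rises steadily while the number of running pods stays roughly constant would be consistent with cgroups not being released.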

## Notes
- The whole environment has nvidia-docker and nvidia-container-runtime installed in versions matching docker-ce, and the default --runtime is nvidia-container-runtime.
- Other related issues for reference:
https://github.com/kubernetes/kubernetes/issues/61937
https://github.com/opencontainers/runc/issues/1725
https://github.com/moby/moby/issues/29638


## Logs

**Logs or error messages**
```
cgroup configuration for process caused \"mkdir
/sys/fs/cgroup/memory/kubepods/burstable/podf1bd9e87-1ef2-11e8-afd3-fa163ecf2dce/8710c146b3c8b52f5da62e222273703b1e3d54a6a6270a0ea7ce1b194f1b5053:
no space left on device\""

```
