add KEP for cgroups v2 support #1370
Conversation
Welcome @giuseppe!
Hi @giuseppe. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Let's unshare cgroupns by default

Do you think we should open a separate KEP for rootless?
yes, let's keep it separate from cgroups v2, which is already touching enough components :-)
we need to add more detail on kubelet changes where it interacts with cgroups directly
keps/sig-node/20191118-cgroups-v2.md (outdated):
> ## User Stories
>
> * The Kubelet can run on a cgroups v2 host
Clarify that kubelet will support v1 or v2
keps/sig-node/20191118-cgroups-v2.md (outdated):
> Kubelet PR: https://github.com/kubernetes/kubernetes/pull/85218
>
> ### Option 2: Use cgroups v2 throughout the stack
For option 1, the kubelet still has to write cgroups v2 for its own interactions with cgroups. Please add some detail around the kubelet / cm package and the eviction manager.
keps/sig-node/20191118-cgroups-v2.md (outdated):
> ## Graduation Criteria
>
> - Alpha: Basic support for running Kubernetes on a cgroups v2 host.
Worth discussing whether we should add test coverage on Fedora as a good candidate upstream.
/assign
(force-pushed from 83faa61 to c0c871d)
keps/sig-node/20191118-cgroups-v2.md (outdated):
> …_net_prio_ are not available with the new version. The alternative to these controllers is to use eBPF.
>
> The lack of the network controllers probably affects some CNI…
I don't currently know of any CNI plugins that use cgroups for network management. (My list is not exhaustive, of course...). Bandwidth limiting, if implemented, is done via a tc filter / qdisc for all of the plugins I'm aware of.
> With this approach cAdvisor would have to read back values from cgroups v2 files (already done).
So, with this option, CRI implementations will need to map the cgroup v1 fields to cgroup v2 values, right?
Could you list what each layer needs to do for each option to make it clear?
- user (pod spec)
- kubelet
- CRI implementation
- OCI runtime
the changes in the CRI implementation and OCI runtime might be implemented in different ways, depending on where we draw the line.
For CRI-O+crun, most of the logic is in the OCI runtime itself, but that is not the only way to achieve it.
For the kubelet, the changes in this patch should be enough (at least until hugetlb is available): kubernetes/kubernetes#85218
In the second phase though, when cgroup v2 is fully supported through the stack, both the pod spec and the CRI will need to change.
can you clarify why you feel the pod spec needs to change? i see no major reason to change the pod spec or resource representation.
I was assuming we'd like to expose all/most of the cgroup v2 features.
e.g. the memory controller on cgroup v1 allows configuring:
- memory.soft_limit_in_bytes
- memory.limit_in_bytes

while on cgroup v2 we have:
- memory.high
- memory.low
- memory.max
- memory.min

but that is probably out of scope, as each new feature (if needed) must go through its own KEP?
@derekwaynecarr are future improvements based on what cgroup v2 offers out of scope for the current KEP?
@giuseppe future improvements for what cgroup v2 offers should be out of scope of this kep. i would keep this kep focused on the kubelet being tolerant of a cgroup v2 host. adding new cgroup v2 specific features to the resource model would be a separate enhancement.
To address @yujuhong's question, I think we are saying the following:
- no change required to pod spec
- kubelet cgroup manager uses v1 or v2 mode based on its introspection of sys/fs on startup
- kubelet qos and pod level cgroup creation on a v2 host uses the mapping table specified
- cri implementers that support cgroup v2 hosts must use a similar mapping to ensure that pod bounding cgroup values are consistent with the container cgroup values.
- oci runtime spec is not required to change as cri implementer provides mapping in the transitional period.
@giuseppe agree with above?
/cc @Random-Liu for containerd
@yujuhong: GitHub didn't allow me to request PR reviews from the following users: for, containerd. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.
(force-pushed from 23ddf34 to a87c972)
The rootless patches have been merged into upstream Docker since early 2019 and no longer need to be incubated in this repo. Another reason to remove Docker from this repo is that dockershim might not support cgroup v2: kubernetes/enhancements#1370 (comment) Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
/retest
per dawn's comment and discussion on sig-node, we can merge this kep and update as we find issues with implementation in 1.19. there is no disagreement on the need to iterate here. see: #1370 (comment)
/approve
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: derekwaynecarr, giuseppe, mrunalp. The full list of commands accepted by this bot can be found here. The pull request process is described here.
awesome work!
LGTM :)
> | period | cpu.max | y = x | period and quota are written together |
> | quota | cpu.max | y = x | period and quota are written together |
**blkio controller** |
Any Kubernetes versions and plans to support the blkio controller?
> Any Kubernetes versions and plans to support the blkio controller?

kubernetes/enhancements#1907 — the KEP is WIP.
Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com