
Kubelet needs to allow configuration of container memory-swap #7294

Closed
@derekwaynecarr

Description

I would like the Kubelet to support a flag like the following:

--container-memory-swap-limit-factor=0.1  (defaults to 0, meaning no memory-swap)

This would control the default amount of MemorySwap given to a container, expressed as a fraction of its requested memory limit.

See the third row of the table at https://docs.docker.com/reference/run/#runtime-constraints-on-cpu-and-memory, which describes the behavior we get today when the user sets a memory limit but the kubelet does not specify memory-swap:

(specify memory without memory-swap) The container is not allowed to use more than L bytes of memory, swap plus memory usage is double of that.

We think using memory-swap at all is a big step to take, and if the administrator wants to allow memory swap, it should not default to 2L (where L is the memory specified).
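
The arithmetic implied by the proposed flag can be sketched as follows. This is an illustration in Python, not Kubelet code (the Kubelet is written in Go). It assumes, in line with the Docker documentation quoted above, that Docker's memory-swap value is the combined total of memory plus swap. The function name and the choice to round down are hypothetical.

```python
def docker_memory_swap(memory_limit_bytes: int, swap_limit_factor: float = 0.0) -> int:
    """Return the combined memory+swap total to pass to Docker.

    Assumptions (not confirmed by the issue): the flag expresses swap as a
    fraction of the container's memory limit, and Docker expects the sum of
    memory and swap rather than the swap amount alone.
    """
    if memory_limit_bytes <= 0:
        raise ValueError("memory limit must be positive")
    if swap_limit_factor < 0:
        raise ValueError("swap limit factor must be non-negative")
    # Swap allowance derived from the memory limit, rounded down to whole bytes.
    swap_bytes = int(memory_limit_bytes * swap_limit_factor)
    return memory_limit_bytes + swap_bytes
```

Under these assumptions the default factor of 0 yields a total equal to the memory limit, leaving no room for swap, while a factor of 1.0 reproduces the 2L total that the description wants to avoid as a default.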

For affected area of Kubelet, see:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/kubelet/dockertools/manager.go#L442

For value we should explicitly set, see:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/Godeps/_workspace/src/github.com/fsouza/go-dockerclient/container.go#L177

Activity

derekwaynecarr (Member, Author) commented on Apr 24, 2015

/cc @thockin this aligns with our IRC discussion.

derekwaynecarr (Member, Author) commented on Apr 24, 2015

/cc @danmcp per our chat as well.

Labels sig/node and priority/awaiting-more-evidence added on Apr 24, 2015

nkwangleiGIT (Contributor) commented on Jun 25, 2015

Hi all,
For this issue, do we want to support memory swap at both the kubelet and the API level?

  1. Add the option --container-memory-swap-limit-factor=0 to the kubelet as a global default setting for memory swap.
  2. In the container's resource limits, also allow something like container.resources.limits.memoryswap=1, which would use 1L as memory swap (L is the memory specified). That maps to a Docker memory-swap value of 2L, since the Docker value is the sum of memory and memory swap.

We're also customizing k8s to support memory swap in our project so that it can leverage Docker memory swap. I'd like to contribute code for this issue. Is my understanding above correct?
Let me know if the design should change, thanks!

erictune (Member) commented on Jun 25, 2015

I'd rather not put factors into a ResourceList, as is implied in item 2 of the previous comment. All the items in resources.limits should be additive. If you want to allow users to specify this from the API, they can just specify an absolute amount of swap. If they leave it empty, some admission controller could default it, and they could set it to an explicit value of 0, or maybe 1, if they don't want any swap.
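
The defaulting behavior described here can be sketched as follows. This is a Python illustration only, not an actual admission controller. The resource key "swap" and the function name are hypothetical. The point is that an unset value is defaulted, while an explicit value (including 0) is left alone.

```python
from typing import Dict


def default_swap(limits: Dict[str, int], default_swap_bytes: int = 0) -> Dict[str, int]:
    """Return a copy of limits with "swap" filled in only when it is unset.

    Hypothetical sketch: "swap" is an absolute byte amount, never a factor.
    """
    defaulted = dict(limits)  # do not mutate the caller's resource list
    defaulted.setdefault("swap", default_swap_bytes)
    return defaulted
```

Keeping the value absolute means the defaulted list can still be summed with other resource lists without special cases.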

nkwangleiGIT (Contributor) commented on Jun 26, 2015

Hi erictune,
I agree with you on option 2: in the API approach we had better let the user specify an absolute value for memory swap instead of a factor. That keeps it consistent with Docker, and of course we should have a default value if the user doesn't set it. What about the question below: should the memory swap value here be swap only, or the sum of memory and swap?

And it'll map to 2L of docker memory swap, as this value in docker is the sum of memory and memory swap.

What about kubelet-level support: do we still need it?
Thanks.

erictune (Member) commented on Jun 26, 2015

The default value should be to not use swap at all.

I think docker chose their format to be just like the cgroups interface. But, I don't think we need to copy that pattern. I'd rather that the resources list does not double-count resources or mix resources. It should be easy to add up resource lists like you add up vectors, to get aggregate amounts of resources.

So, I think swap should just be the amount of swap, not swap + memory. And it should be called "swap" not "memoryswap", to reflect that.
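
A small sketch of why keeping "swap" separate from memory makes aggregation easy. It is written in Python for illustration only. The resource names and both function names are hypothetical. The conversion to a combined memory-plus-swap total, which the earlier comments describe as Docker's format, happens only at the container runtime boundary.

```python
from typing import Dict, Iterable


def sum_resources(resource_lists: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Add resource lists component-wise, like vectors."""
    total: Dict[str, int] = {}
    for resources in resource_lists:
        for name, amount in resources.items():
            total[name] = total.get(name, 0) + amount
    return total


def to_docker_memory_swap(limits: Dict[str, int]) -> int:
    """Combine memory and swap into a single total for the runtime.

    Assumes the runtime wants memory + swap; a missing "swap" means none.
    """
    return limits["memory"] + limits.get("swap", 0)
```

Because no entry double-counts memory, the aggregate of several containers is a plain sum of each field.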

One case to think about: swap files can be on rotating disk or on SSD, which gives very different performance characteristics. Also, the scheduler needs to subtract any swap space from the storage space of the node when a pod is bound to it, and it needs to know whether to subtract that from rotating disk or SSD. So, should we have separate "diskswap" and "ssdswap" resources?

nkwangleiGIT (Contributor) commented on Jun 29, 2015

Hi erictune,
that's a good point, but I'm not sure why we need separate resources for "diskswap" and "ssdswap". The user can put swap on a normal disk or an SSD as needed for performance, but we only need to subtract the size of the swap when calculating the available space on a node.
Do you mean the swap size may differ depending on whether it's on a normal disk or an SSD?
Thanks for the help!

erictune (Member) commented on Jun 29, 2015

I was thinking of the case where swap files are added dynamically by the kubelet in response to pods being started. In this model, the scheduler needs to subtract the swap from the resources of the machine, and it needs to know which resources to reduce.

However, if nodes are statically provisioned with swap, and they exclude the provisioned swap resources from the reported capacity, then I agree we can do it like you say.
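
The dynamic case can be sketched as follows, in Python for illustration only. The resource names "diskswap" and "ssdswap" come from the earlier comment, while the capacity keys "disk" and "ssd", the mapping, and the function name are hypothetical. It shows why the scheduler would need to know which storage backs the swap.

```python
from typing import Dict

# Hypothetical mapping from swap resource to the storage it consumes.
SWAP_BACKING = {"diskswap": "disk", "ssdswap": "ssd"}


def remaining_after_binding(node_free: Dict[str, int],
                            pod_requests: Dict[str, int]) -> Dict[str, int]:
    """Subtract a pod's swap from the matching storage pool on the node."""
    remaining = dict(node_free)  # leave the caller's view of the node intact
    for swap_resource, storage in SWAP_BACKING.items():
        needed = pod_requests.get(swap_resource, 0)
        if needed == 0:
            continue
        if needed > remaining.get(storage, 0):
            raise ValueError(f"not enough {storage} for {swap_resource}")
        remaining[storage] -= needed
    return remaining
```

With statically provisioned swap that is already excluded from reported capacity, none of this bookkeeping would be needed.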


nkwangleiGIT (Contributor) commented on Jun 29, 2015

Hi erictune,
OK, I see. Then we may also want to use memory swap as a factor when the scheduler looks for a node that fits the pod, and we also need some way to know the total swap size on each node.
Do the current scheduler and kubelet implementations already track swap size or capacity? Adding all of this will probably need more changes. Thanks!

48 remaining items
