Description
I would like the Kubelet to support a flag like the following:
--container-memory-swap-limit-factor=0.1 (defaults to 0, meaning no memory-swap)
This would control the default amount of MemorySwap given to a container, as a fraction of its requested memory limit.
See https://docs.docker.com/reference/run/#runtime-constraints-on-cpu-and-memory, in particular the third row of the table there, which describes the behavior we get today when the user sets a memory limit but the kubelet does not specify memory-swap:
(specify memory without memory-swap) The container is not allowed to use more than L bytes of memory, swap plus memory usage is double of that.
We think using memory-swap at all is a big step to take, and if the administrator wants to allow memory swap, it should not default to 2L (where L is the specified memory limit).
For the affected area of the kubelet, see:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/kubelet/dockertools/manager.go#L442
For the value we should explicitly set, see:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/Godeps/_workspace/src/github.com/fsouza/go-dockerclient/container.go#L177
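A minimal sketch of how the kubelet could translate the proposed flag into Docker's memory-swap value (which means memory + swap in bytes). The function and variable names here are hypothetical, not the actual code in manager.go:

```go
package main

import "fmt"

// memorySwapLimit returns the memory-swap value (memory + swap, in bytes)
// to set on the container, given its memory limit and the configured
// --container-memory-swap-limit-factor. With factor 0, memory-swap equals
// the memory limit, i.e. no swap at all, instead of Docker's 2L default.
func memorySwapLimit(memoryLimit int64, swapLimitFactor float64) int64 {
	if memoryLimit <= 0 {
		// No memory limit set; leave memory-swap unset (zero) so Docker
		// applies its own default.
		return 0
	}
	swap := int64(float64(memoryLimit) * swapLimitFactor)
	return memoryLimit + swap
}

func main() {
	const limit = 512 * 1024 * 1024 // 512 MiB memory limit
	fmt.Println(memorySwapLimit(limit, 0))   // 536870912: no swap allowed
	fmt.Println(memorySwapLimit(limit, 0.1)) // 590558003: 10% of limit as swap
}
```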
Activity
derekwaynecarr commented on Apr 24, 2015
/cc @thockin this aligns with our IRC discussion.
derekwaynecarr commented on Apr 24, 2015
/cc @danmcp per our chat as well.
nkwangleiGIT commented on Jun 25, 2015
Hi all,
For this issue, do we want to support memory swap at both the kubelet and the API level?
We're also customizing k8s in our project to support memory swap, so it can leverage Docker's memory swap. I'd like to contribute code for this issue; is my understanding above correct?
Let me know if there are any design changes, thanks!
erictune commented on Jun 25, 2015
I'd rather not put factors into a ResourceList, as is implied in item 2 of the previous comment. All the items in resources.limits should be additive. If you want to allow users to specify this from the API, they can just specify an absolute amount of swap; if they leave it empty, some admission controller could default it, and they can set it to an explicit value of 0 (or maybe 1) if they don't want any swap.
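A sketch of the defaulting behavior described here, assuming a hypothetical admission-controller helper (this is not an actual Kubernetes API; a nil pointer stands in for "unset"):

```go
package main

import "fmt"

// defaultSwap fills in a cluster-wide default when the user leaves swap
// unset; an explicit value, including 0, is left untouched. The nil
// pointer distinguishes "unset" from an explicit 0 ("no swap").
func defaultSwap(swapLimit *int64, clusterDefault int64) int64 {
	if swapLimit == nil {
		return clusterDefault
	}
	return *swapLimit
}

func main() {
	var unset *int64
	zero := int64(0)
	fmt.Println(defaultSwap(unset, 64<<20)) // 67108864: defaulted
	fmt.Println(defaultSwap(&zero, 64<<20)) // 0: user explicitly disabled swap
}
```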
nkwangleiGIT commented on Jun 26, 2015
Hi erictune,
I agree with you on option 2: it's better to let the user specify an absolute value for memory swap instead of a factor in the API, which keeps it consistent with Docker, and we should have a default value if the user doesn't set one. One question remains: should the value here be only the swap, or the sum of swap and memory?
And what about kubelet-level support, do we still need it?
Thanks.
erictune commented on Jun 26, 2015
The default value should be to not use swap at all.
I think docker chose their format to be just like the cgroups interface. But, I don't think we need to copy that pattern. I'd rather that the resources list does not double-count resources or mix resources. It should be easy to add up resource lists like you add up vectors, to get aggregate amounts of resources.
So, I think swap should just be the amount of swap, not swap + memory. And it should be called "swap" not "memoryswap", to reflect that.
One case to think about: swapfiles can be on rotating disk or on SSD, which gives very different performance characteristics. Also, the scheduler needs to subtract any swap space from the storage space of the node when a pod is bound to it, and it needs to know whether to subtract that from rotating disk or SSD. So, should we have separate "diskswap" and "ssdswap" resources?
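To illustrate the additivity point above, here is a simplified sketch; the real ResourceList lives in pkg/api, and these names are illustrative only:

```go
package main

import "fmt"

// ResourceList maps a resource name to a quantity in bytes; a simplified
// stand-in for the real type in pkg/api.
type ResourceList map[string]int64

// add sums two resource lists component-wise, like adding vectors. This
// only yields meaningful aggregates when no entry double-counts another,
// which is why "swap" should mean swap alone rather than swap + memory.
func add(a, b ResourceList) ResourceList {
	out := ResourceList{}
	for k, v := range a {
		out[k] += v
	}
	for k, v := range b {
		out[k] += v
	}
	return out
}

func main() {
	pod1 := ResourceList{"memory": 512 << 20, "swap": 64 << 20}
	pod2 := ResourceList{"memory": 256 << 20, "swap": 0}
	fmt.Println(add(pod1, pod2)) // aggregate requirements of both pods
}
```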
nkwangleiGIT commented on Jun 29, 2015
Hi erictune,
That's a good point, but I'm not sure why we need separate resources for "diskswap" and "ssdswap". The user can put swap on regular disk or SSD as they need from a performance perspective, but we only need to subtract the size of the swap when calculating the available space of a node.
Do you mean the swap size may be different depending on whether it's regular disk or SSD?
Thanks for the help!
erictune commented on Jun 29, 2015
I was thinking of the case where swap files are added dynamically by kubelet in response to pods being started. In this model, the scheduler needs to subtract the swap from the resources of the machine, and it needs to know which resources to reduce.
However, if nodes are statically provisioned with swap, and they exclude the provisioned swap resources from the reported capacity, then I agree we can do it like you say.
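A sketch of the accounting for the dynamic-provisioning case described above; the types and names here are hypothetical, not an existing scheduler structure:

```go
package main

import "fmt"

// nodeStorage tracks schedulable storage capacity per medium; a
// hypothetical stand-in for whatever the scheduler would track.
type nodeStorage struct {
	diskBytes int64 // rotating disk
	ssdBytes  int64 // SSD
}

// deductSwap subtracts a dynamically provisioned swapfile from the node's
// remaining capacity. The scheduler has to know which medium backs the
// swapfile to deduct from the right pool.
func deductSwap(n *nodeStorage, swapBytes int64, onSSD bool) {
	if onSSD {
		n.ssdBytes -= swapBytes
	} else {
		n.diskBytes -= swapBytes
	}
}

func main() {
	node := nodeStorage{diskBytes: 100 << 30, ssdBytes: 20 << 30}
	deductSwap(&node, 2<<30, true) // pod starts; kubelet adds a 2 GiB swapfile on SSD
	fmt.Printf("disk=%d ssd=%d\n", node.diskBytes, node.ssdBytes)
}
```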
nkwangleiGIT commented on Jun 29, 2015
Hi erictune,
OK, I see. Then we may also want to use memory swap as a factor when the scheduler calculates which nodes fit a pod, and we need some way to know the total swap size on each node.
Do we already have swap size or capacity in the current scheduler and kubelet implementation? It'll probably need more changes if we want to add all of this, thanks!