Description
Is this a BUG REPORT or FEATURE REQUEST?:
Uncomment only one, leave it on its own line:
/kind bug
/kind feature
What happened:
Kubelet/Kubernetes 1.8 does not work with Swap enabled on Linux Machines.
I have found the original issue (#31676), the PR (#31996), and the last change, which enabled this behavior by default (71e8c8e).
If Kubernetes does not know how to handle memory eviction when swap is enabled, it should find a way to do that rather than asking users to get rid of swap.
Please see kernel.org, Chapter 11, Swap Management, for example:
The casual reader may think that with a sufficient amount of memory, swap is unnecessary but this brings us to the second reason. A significant number of the pages referenced by a process early in its life may only be used for initialisation and then never used again. It is better to swap out those pages and create more disk buffers than leave them resident and unused.
When running many Node/Java applications, I have always seen a lot of pages swapped out, simply because they are no longer used.
What you expected to happen:
Kubelet/Kubernetes should work with swap enabled. I believe that instead of disabling swap and giving users no choice, Kubernetes should support more use cases and varied workloads, some of which may be applications that rely on caches.
I am not sure how Kubernetes decides what to kill during memory eviction, but considering that Linux has this capability, maybe it should align with how Linux does it? https://www.kernel.org/doc/gorman/html/understand/understand016.html
I would suggest rolling back the change that fails when swap is enabled and revisiting how memory eviction currently works in Kubernetes. Swap can be important for some workloads.
How to reproduce it (as minimally and precisely as possible):
Run Kubernetes/kubelet with default settings on a Linux box.
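Before reproducing, it can help to confirm that the node actually has swap active. The following is a minimal sketch, assuming that the contents of /proc/swaps are a fair proxy for what the kubelet checks; the kubelet's real check may differ in detail. The function names are hypothetical and chosen for illustration only.

```python
from typing import List


def active_swap_devices(proc_swaps_text: str) -> List[str]:
    """Return the swap device/file names found in /proc/swaps-style text."""
    lines = [ln for ln in proc_swaps_text.splitlines() if ln.strip()]
    # The first non-empty line is the column header
    # (Filename, Type, Size, Used, Priority); every later line is one swap area.
    return [ln.split()[0] for ln in lines[1:]]


def swap_enabled(path: str = "/proc/swaps") -> bool:
    """Report whether the host lists at least one active swap area."""
    try:
        with open(path) as f:
            return bool(active_swap_devices(f.read()))
    except OSError:
        # Assumption: an unreadable or missing file means no usable swap.
        return False


if __name__ == "__main__":
    print("swap enabled:", swap_enabled())
```

If this reports that swap is enabled, a kubelet started with default settings on 1.8 is expected to hit the failure described in this issue.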
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
/sig node
cc @mtaufen @vishh @derekwaynecarr @dims
Activity
derekwaynecarr commented on Oct 7, 2017
Support for swap is non-trivial. Guaranteed pods should never require swap. Burstable pods should have their requests met without requiring swap. BestEffort pods have no guarantee. The kubelet right now lacks the smarts to provide the right amount of predictable behavior here across pods.
We discussed this topic at the resource mgmt face to face earlier this year. We are not super interested in tackling this in the near term relative to the gains it could realize. We would prefer to improve reliability around pressure detection, and optimize issues around latency before trying to optimize for swap, but if this is a higher priority for you, we would love your help.
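For readers unfamiliar with the three classes named above, here is a rough sketch of how a pod's QoS class follows from its containers' CPU and memory requests and limits. It is a simplification, not the kubelet's code: quantities are compared as plain strings (so "1Gi" and "1024Mi" would not match), and the dictionary layout and the name qos_class are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical shape: {"requests": {"cpu": "1"}, "limits": {"memory": "1Gi"}}
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort."""
    has_any = False      # does any container set any request or limit?
    guaranteed = True    # do all containers have request == limit for cpu and memory?
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        if requests or limits:
            has_any = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not has_any:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    pod = [{"requests": {"memory": "1Gi"}, "limits": {"memory": "2Gi"}}]
    print(qos_class(pod))
```

Under this classification, the point above is that only BestEffort pods, and the portion of a Burstable pod above its requests, could plausibly be candidates for swap.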
derekwaynecarr commented on Oct 7, 2017
/kind feature
outcoldman commented on Oct 9, 2017
@derekwaynecarr thank you for the explanation! It was hard to find any information or documentation on why swap should be disabled for Kubernetes. This was the main reason why I opened this topic. At this point this issue is not a high priority for me; I just wanted to be sure that we have a place where it can be discussed.
matthiasr commented on Oct 9, 2017
There is more context in the discussion here: #7294 – having swap available has very strange and bad interactions with memory limits. For example, a container that hits its memory limit would then start spilling over into swap (this appears to be fixed since f4edaf2 – they won't be allowed to use any swap whether it's there or not).
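To make the interaction concrete: under cgroup v1, memory.limit_in_bytes caps a container's RAM while memory.memsw.limit_in_bytes caps RAM plus swap together, so the difference between the two is how much swap the container may spill into. The sketch below only does that arithmetic. Whether f4edaf2 works by setting the two limits equal is an assumption and is not confirmed in this thread; the function name is hypothetical.

```python
def swap_headroom_bytes(memory_limit: int, memsw_limit: int) -> int:
    """Bytes of swap a cgroup v1 container may use given its two limits.

    memory_limit: value of memory.limit_in_bytes (RAM only)
    memsw_limit:  value of memory.memsw.limit_in_bytes (RAM + swap)
    """
    if memsw_limit < memory_limit:
        # The combined limit is expected to be at least the RAM limit.
        raise ValueError("memsw limit must be >= memory limit")
    return memsw_limit - memory_limit


if __name__ == "__main__":
    gib = 1024 ** 3
    # Equal limits leave no room for swap.
    print(swap_headroom_bytes(1 * gib, 1 * gib))
    # A larger combined limit lets the container spill into swap.
    print(swap_headroom_bytes(1 * gib, 3 * gib) // gib, "GiB")
```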
fieryorc commented on Jan 2, 2018
This is a critical use case for us too. We have a cron job that occasionally runs into high memory usage (>30GB), and we don't want to permanently allocate 40+GB nodes. Also, given that we run in three zones (GKE), this will allocate 3 such machines (1 in each zone). This configuration has to be repeated in 3+ production instances and 10+ test instances, making K8s super expensive to use. We are forced to have 25+ 48GB nodes, which incurs a huge cost!
Please enable swap!
hjwp commented on Jan 5, 2018
A workaround for those who really want swap: start the kubelet with --fail-swap-on=false.
That's what we're doing. Or at least, I'm pretty sure it is, I didn't actually implement it personally, but that's what I gather.
This might only really be a viable strategy if none of your containers ever specify an explicit memory requirement...
fieryorc commented on Jan 6, 2018
We run in GKE, and I don't know of a way to set those options.