Make OOM not be a SIGKILL #40157
Comments
@kubernetes/sig-node-feature-requests
It is not possible to change OOM behavior currently. Kubernetes (or the runtime) could send your container a signal whenever it gets close to its memory limit, but this would be best effort, since memory spikes might not be handled in time.
FYI, I'm using this crutch at the moment: https://github.com/grosser/preoomkiller. Any idea what would need to change to make OOM behavior configurable?
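For anyone reaching for the same crutch, here is a minimal sketch in Go of what a preoomkiller-style watcher does. It is only illustrative: the cgroup v1 file paths, the 90% threshold, the 10s poll interval, and signalling the parent process are assumptions for the sketch, not the gem's actual behavior.

```go
// Minimal sketch of a preoomkiller-style watcher: poll cgroup memory usage
// and send SIGTERM to a target process once usage crosses a fraction of the
// limit. Paths assume cgroup v1 as seen from inside the container; on
// cgroup v2 the files are /sys/fs/cgroup/memory.current and memory.max.
package main

import (
	"log"
	"os"
	"strconv"
	"strings"
	"syscall"
	"time"
)

// readBytes parses a single integer value from a cgroup file.
func readBytes(path string) (uint64, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(b)), 10, 64)
}

func main() {
	const threshold = 0.90    // assumed: ask for a graceful stop at 90% of the limit
	targetPID := os.Getppid() // assumed: the watched process is our parent

	for range time.Tick(10 * time.Second) {
		usage, errU := readBytes("/sys/fs/cgroup/memory/memory.usage_in_bytes")
		limit, errL := readBytes("/sys/fs/cgroup/memory/memory.limit_in_bytes")
		if errU != nil || errL != nil || limit == 0 {
			continue // cgroup files unavailable; try again next tick
		}
		if float64(usage)/float64(limit) >= threshold {
			log.Printf("memory %d/%d past threshold, sending SIGTERM to pid %d", usage, limit, targetPID)
			syscall.Kill(targetPID, syscall.SIGTERM)
			return
		}
	}
}
```

The point is that everything here happens outside of Kubernetes: the kubelet and runtime never learn that a "soft" limit was crossed, which is why first-class support keeps being requested.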
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with an /lifecycle frozen comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Was this meant to be closed? It seems like @yujuhong meant to say /remove-lifecycle rotten?
/remove-lifecycle rotten
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
When the node is reaching OOM levels I guess I understand some SIGKILLs happening, but when a pod is reaching its manually set resource limit it also gets a SIGKILL. As the initial post mentions, this can cause a lot of harm. As a workaround we're going to try to make the pod unhealthy before it reaches the memory limit, to get a graceful shutdown. If I want this feature created, how should I go about it? Should I provide a PR with code changes or ping someone to make a proposal?
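That workaround can be sketched roughly as follows, assuming a Go service and cgroup v1 memory files; the /healthz path, the port, and the 90% soft limit are illustrative choices, not anything Kubernetes prescribes.

```go
// Hedged sketch of the "go unhealthy before the limit" workaround: a liveness
// endpoint that starts failing once cgroup memory usage crosses a self-imposed
// threshold, so the kubelet restarts the container through the normal
// SIGTERM + grace-period path instead of the kernel OOM killer.
package main

import (
	"fmt"
	"net/http"
	"os"
	"strconv"
	"strings"
)

const softLimitFraction = 0.90 // assumed policy: fail the probe at 90% of the hard limit

// cgroupBytes reads an integer from a cgroup file, returning 0 on any error.
func cgroupBytes(path string) uint64 {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0
	}
	v, _ := strconv.ParseUint(strings.TrimSpace(string(b)), 10, 64)
	return v
}

func healthz(w http.ResponseWriter, r *http.Request) {
	usage := cgroupBytes("/sys/fs/cgroup/memory/memory.usage_in_bytes")
	limit := cgroupBytes("/sys/fs/cgroup/memory/memory.limit_in_bytes")
	if limit > 0 && float64(usage) >= softLimitFraction*float64(limit) {
		http.Error(w, "memory above soft limit", http.StatusServiceUnavailable)
		return
	}
	fmt.Fprintln(w, "ok")
}

func main() {
	http.HandleFunc("/healthz", healthz)
	http.ListenAndServe(":8080", nil) // point the pod's livenessProbe here
}
```

With a livenessProbe pointed at this endpoint the container is restarted before the kernel gets involved, so the app sees SIGTERM and the configured terminationGracePeriodSeconds; the trade-off is restarting earlier than strictly necessary while still racing sudden memory spikes.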
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
This remains an active issue. There appears to be no way to gracefully handle OOMKilled at the moment. @dashpole Can this be re-opened?
/reopen
/remove-lifecycle stale
for reasons outlined in #40157 (comment) we can't just change the signal delivered. OTOH we can integrate with some OOM daemons, but this would require a separate discussion and KEP.
Kubernetes does not use issues on this repo for support requests. If you have a question on how to use Kubernetes or to debug a specific issue, please visit our forums. /remove-kind feature
@fromanirh: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@fromanirh this is not a support request, it's a legit feature request.
@fromanirh Can you reopen this, please? It's clearly a feature request, not a support case.
it is a feature request for the kernel, not for Kubernetes; the kernel generates the SIGKILL
@aojea there are non-kernel solutions here, such as triggering graceful shutdowns at a threshold memory usage (e.g. 95%) before the hard limit is reached.
oh, that is not clear from the title and from the comments, sorry
so it should be retitled, or a new issue opened with the clear request ... and for sure that will need a KEP
Sure, how about titling it: "Add graceful memory usage-based SIGTERM before hard OOM kill happens"
Is there another issue that's been recreated for this? I can't find one in the Issues list. If not, I can create a new feature request issue.
So an issue has been created already? If so, it would be nice to reference the link here.
@ffromani this is not a support request, this is a feature request that has been bumped for 6 years now, and I just found a use case for it, just as countless other users can. Please reopen the issue.
Adding my own voice -- a service being unable to shut down gracefully can cause all kinds of harm, and it causes harm for my organization as well. I thought kube already sent a SIGTERM first and was trying to substantiate that. Is this feature request still unimplemented? When the memory limit is reached, are pods SIGKILLed immediately with no chance to shut down gracefully?
/reopen there's obvious interest in this feature, but it would surely need a KEP and someone shepherding the feature
@ffromani: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/remove-kind support
FWIW my user story is: I have a Ruby application (developed not by us) that has a slight memory leak over many days; it will balloon to a certain amount of memory and then gradually keep adding more. For this, I started using …

Sometimes, these containers use imagemagick or ffmpeg to process some media, and these can balloon memory usage. The container can shut itself down and kick the job back to the queue when this happens, in a way that does not leave resident keepalive queues (sidekiq) and other things that need to be cleaned out every so often.

Implementation ideas: the container that has been sent a SIGTERM for exceeding the memory limit gets put first as a candidate for the OOM killer. This could mean that, even if the application ignores the SIGTERM (or requires more resources for shutting down), it cannot hog the remaining system resources for long before being killed itself. This could be a toggle per container, from a "soft" OOM kill to a "hard" OOM kill (default).
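To make the "first in line for the OOM killer" idea concrete, here is a hedged, process-level sketch using /proc/self/oom_score_adj. In practice Kubernetes already manages oom_score_adj per container based on QoS class, so a real version of this would belong in the kubelet or runtime rather than in application code; this only illustrates the mechanism.

```go
// Sketch: once the process has been asked to shut down (SIGTERM), volunteer
// itself as the preferred victim for the kernel OOM killer by raising its
// oom_score_adj to the maximum badness value, then run cleanup.
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	term := make(chan os.Signal, 1)
	signal.Notify(term, syscall.SIGTERM)

	<-term // in this sketch, SIGTERM plays the role of the "soft" OOM notice

	// 1000 means "kill me first" if memory runs out before cleanup finishes.
	if err := os.WriteFile("/proc/self/oom_score_adj", []byte("1000"), 0644); err != nil {
		log.Printf("could not raise oom_score_adj: %v", err)
	}

	// ... drain queues, flush state, deregister, then exit.
}
```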
Atm apps that go over the memory limit are hard killed ('OOMKilled'), which is bad (losing state, not running cleanup code, etc.).
Is there a way to get a SIGTERM instead (with a grace period, or 100m before reaching the limit)?