Kubelet does not delete evicted pods #55051
/sig node
Evicted pods in 1.7.5 are a real problem; their deletion is delayed by days! For example, I have a pod that was evicted 17 days ago and still appears in the pod list:

mynamespace   nginxrepo-2549817493-0p91t   0/1   Evicted   0   17d

In the case of nginxrepo, the deployment does not exist anymore, but the pod is still present in the pod list as Evicted! The same happens with pods that do not match the node selector criteria:

nfs-3970785943-5rnn7   0/2   MatchNodeSelector   0   17d

After 17 days the pod still appears in the list. This behaviour also affects Grafana, for example, because these pods show up in the list of pods available for monitoring even though they are evicted. By the way @rfranzke, this is not a feature request, this is a bug! Please, could you re-tag the case? Regards
/kind bug
/remove-kind feature
Thank you @rfranzke
Is this a duplicate of #54525? From #54525 (comment) it sounds like this is intentional, though I'm not sure what is expected to clean up pods in this case.
The PodGCController in the controller manager?
A quick workaround we use is to delete all evicted pods manually after an incident:
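Something along these lines should work (an illustrative sketch using plain kubectl, grep and awk; not necessarily the exact command from the comment above):

```sh
# List all pods in every namespace, keep only the Evicted ones
# (namespace is column 1, pod name is column 2), and delete them.
kubectl get pods --all-namespaces | grep Evicted | awk '{print $1, $2}' \
  | while read ns pod; do kubectl delete pod -n "$ns" "$pod"; done
```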
Thank you @krallistic. I have been applying your workaround as a cronjob for a long time, but it is not the right way!
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
I suppose this issue can be closed, because the deletion of evicted pods can be controlled through settings in kube-controller-manager. For those k8s users who hit kube-apiserver or etcd performance issues due to too many evicted pods, I would recommend tuning the kube-controller-manager configuration (see the sketch below).

Also ask yourself why there are so many evicted pods in the first place. Maybe your kube-scheduler keeps scheduling pods onto a node which already reports DiskPressure or MemoryPressure? This could be the case if the kube-scheduler is configured with a custom policy.
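Presumably the relevant knob here is the PodGCController threshold in kube-controller-manager, i.e. the `--terminated-pod-gc-threshold` flag (my reading of the comment above, not a quote from it). A minimal sketch:

```sh
# Keep at most 100 terminated pods cluster-wide before the PodGCController
# starts garbage-collecting them (the usual controller-manager flags are
# omitted here for brevity).
kube-controller-manager --terminated-pod-gc-threshold=100
```

Note that this threshold applies to all terminated pods, not only evicted ones, which is exactly the concern raised in the next comment.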
Thanks @kabakaev for pointing that out. Didn't know this could be configured. Let's close the ticket then.
@kabakaev, wouldn't pod GC cover all pods (including pods terminated for other reasons)? What if we just want evicted pods to be cleaned up periodically?
The issue is still open; there is no reply to what @so0k commented on Feb 27, 2018.
@so0k You can do that with a CronJob (apiVersion: batch/v1beta1): create the task, check its status, and delete it again when it is no longer needed (a sketch follows below).
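A sketch of what such a CronJob could look like (illustrative only: the image, the ServiceAccount name, the schedule and the cleanup command are assumptions, and the ServiceAccount needs RBAC rights to list and delete pods):

```sh
# Create the task:
cat <<'EOF' | kubectl apply -f -
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: evicted-pod-cleaner
spec:
  schedule: "0 * * * *"                    # run the cleanup once per hour
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-cleaner  # assumed to exist with list/delete rights on pods
          restartPolicy: OnFailure
          containers:
          - name: cleaner
            image: bitnami/kubectl:latest  # any image that ships kubectl will do
            command:
            - /bin/sh
            - -c
            - kubectl get pods --all-namespaces | grep Evicted | awk '{print $1, $2}' | while read ns pod; do kubectl delete pod -n "$ns" "$pod"; done
EOF

# Check the status of the task:
kubectl get cronjob evicted-pod-cleaner

# Delete the task when it is no longer needed:
kubectl delete cronjob evicted-pod-cleaner
```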
StatefulSet will auto-delete Failed pods; see kubernetes/pkg/controller/statefulset/stateful_set_control.go, lines 386 to 393 at 52eea97.
Why does Kubernetes keep evicted pods, and what is the purpose of this design?
One guess for the reason: so that you can look at the failed Pods and see what's happening in the cluster more easily (both in the API and in the metrics). If the Pods immediately disappeared, you'd probably need to use logs to discover what's happening, which is arguably more difficult.
When I have too many Evicted pods, I use the following command:
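For what it's worth, a jq-based variant (illustrative, not necessarily the commenter's exact command; assumes jq is installed and targets pods whose status reason is Evicted):

```sh
# Select all pods whose status.reason is "Evicted" and delete them
# in their respective namespaces.
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[] | select(.status.reason == "Evicted") | "\(.metadata.namespace) \(.metadata.name)"' \
  | while read ns pod; do kubectl delete pod -n "$ns" "$pod"; done
```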
/kind feature
What happened:
Kubelet has evicted pods due to disk pressure. Eventually, the disk pressure went away and the pods were scheduled and started again, but the evicted pods remained in the list of pods (`kubectl get pod --show-all`).

What you expected to happen:
Wouldn't it be better if the kubelet had deleted those evicted pods? The expected behaviour would therefore be not to see the evicted pods anymore, i.e. that they get deleted.
How to reproduce it (as minimally and precisely as possible):
Start kubelet with `--eviction-hard` and `--eviction-soft` set to high thresholds, or fill up the disk of a worker node (an example flag sketch follows the environment details below).

Environment:
- Kubernetes version (`kubectl version`): 1.8.2
- Kernel (`uname -a`): 4.12.10-coreos
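For the reproduction step above, a hedged sketch of kubelet eviction flags with deliberately aggressive thresholds (the values are invented for illustration and the kubelet's other required flags are omitted):

```sh
# With 90%/95% free-space thresholds the node reports DiskPressure almost
# immediately, so pods get evicted without having to actually fill the disk.
kubelet \
  --eviction-hard="nodefs.available<90%" \
  --eviction-soft="nodefs.available<95%" \
  --eviction-soft-grace-period="nodefs.available=30s"
```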