deleting namespace stuck at "Terminating" state #60807
Comments
/sig api-machinery |
@shean-guangchang Do you have some way to reproduce this? And out of curiosity, are you using any CRDs? We faced this problem with TPRs previously. |
/kind bug |
I seem to be experiencing this issue with a rook deployment:
I think it does have something to do with their CRD, I see this in the API server logs:
I've deployed rook to a different namespace now, but I'm not able to create the cluster CRD:
Seems like the CRD was never cleaned up:
|
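(For illustration, a hedged way to check for leftover CRDs from an uninstalled operator like rook; the grep pattern and CRD name are placeholders, adjust them to your install:)
# list CustomResourceDefinitions and look for leftovers from the uninstalled operator
kubectl get crd
# assumption: rook's CRDs carry "rook" in their group name
kubectl get crd -o name | grep rook
# inspect a leftover CRD to see whether it still has instances or finalizers
kubectl get crd <leftover-crd-name> -o yaml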
I have a fission namespace in a similar state:
Fission also uses CRDs, however, they appear to be cleaned up. |
@shean-guangchang - I had the same issue. I deleted everything under the namespaces manually, deleted and purged everything from "helm", and restarted the master nodes one by one, and that fixed the issue. I imagine what I've encountered has something to do with "ark", "tiller" and Kubernetes all working together (I bootstrapped using helm and backed up using ark), so this may not be a Kubernetes issue per se. On the other hand, it was pretty much impossible to troubleshoot because there are no relevant logs. |
if it is the rook one, take a look at this: rook/rook#1488 (comment) |
I guess that makes sense, but it seems buggy that it's possible to get a namespace into an undeletable state. |
I have a similar environment (Ark & Helm) with @barakAtSoluto and have the same issue. Purging and restarting the masters didn't fix it for me though. Still stuck at terminating. |
I had that too when trying to recreate the problem. I eventually had to create a new cluster.... |
I'm also seeing this, on a cluster upgraded from 1.8.4 to 1.9.6. I don't even know which logs to look at. |
The same issue on 1.10.1 :( |
Same issue on 1.9.6. Edit: the namespace couldn't be deleted because some pods were hanging. I deleted them all with --grace-period=0 --force, and after a couple of minutes the namespace was deleted as well. |
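(For reference, a minimal sketch of that force delete; the namespace name is a placeholder, and skipping graceful shutdown should be used with care:)
# find pods stuck terminating in the namespace
kubectl get pods -n <stuck-namespace>
# force-delete them all; this bypasses graceful shutdown
kubectl delete pods --all -n <stuck-namespace> --grace-period=0 --force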
Hey, I've run into this over and over again, and most of the time it's some trouble with finalizers. If a namespace is stuck, try to |
@xetys is it safe? In my case there is only one finalizer, named "kubernetes". |
That's strange, I've never seen such a finalizer. I can only speak from my own experience: I did that several times in a production cluster and it's still alive. |
Same issue on 1.10.5. I tried all advice in this issue without result. I was able to get rid of the pods, but the namespace is still hanging. |
Actually, the ns too got deleted after a while. |
It would be good to understand what causes this behavior; the only finalizer I had was kubernetes. I also have dynamic webhooks; can these be related? |
@xetys Well, finally I used your trick on the replicas inside that namespace. They had some custom finalizer that probably no longer existed, so I couldn't delete them. When I removed the references to that finalizer, they disappeared and so did the namespace. Thanks! :) |
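(A hedged sketch of removing such a stale finalizer from a stuck resource; the resource kind and names are placeholders, and this is only safe if the controller that owned the finalizer is really gone:)
# see which finalizers are still set on the stuck object
kubectl get replicaset <stuck-rs> -n <namespace> -o jsonpath='{.metadata.finalizers}'
# clear them with a JSON merge patch
kubectl patch replicaset <stuck-rs> -n <namespace> --type=merge -p '{"metadata":{"finalizers":[]}}'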
Same issue on an EKS 1.10.3 cluster:
|
Having the same problem on a bare metal cluster:
My namespace looks like so:
It's actually the second namespace I've had with this problem. |
Try this to get the actual list of all things in your namespace: kubernetes/kubectl#151 (comment) Then for each object do |
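(The command in that linked comment is roughly the following; the namespace name is a placeholder:)
# list every namespaced resource type the API server knows about,
# then query each of them in the stuck namespace
kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -n <stuck-namespace>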
removing the initializer did the trick for me... |
When I do Also, when trying what you suggested @adampl I get no output (removing |
@ManifoldFR, I had the same issue as yours, and I managed to make it work by making an API call with a JSON file:
and it should delete your namespace. |
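(The exact file is not shown above, but the usual shape of that API call is something like this; the namespace name is a placeholder, and kubectl proxy must be running:)
# dump the namespace, drop the "kubernetes" entry from spec.finalizers,
# then PUT the result to the namespace's finalize subresource
kubectl get namespace <stuck-namespace> -o json > ns.json
# edit ns.json so that "spec": {"finalizers": []}
kubectl proxy &
curl -H "Content-Type: application/json" -X PUT --data-binary @ns.json \
  http://127.0.0.1:8001/api/v1/namespaces/<stuck-namespace>/finalize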
In my case, I had to manually delete my ingress load balancer from the GCP Network Service console. I had manually created the load balancer frontend directly in the console. Once I deleted the load balancer, the namespace was automatically deleted. I suspect Kubernetes didn't want to delete it because the state of the load balancer was different from the state in the manifest. I will try to automate the ingress frontend creation using annotations next to see if I can resolve this issue. |
You are a star, it worked! |
Tried a lot of solutions but this is the one that worked for me. Thank you! |
This should really be the "accepted" answer - it completely resolved the root of this issue! Taken from the link above:
With that being said, I wrote a little microservice to run as a CronJob every hour that automatically deletes Terminating namespaces. You can find it here: https://github.com/oze4/service.remove-terminating-namespaces |
Yet another one-liner:
But deleting stuck namespaces is not a good solution. The right way is to find out why the namespace is stuck. A very common reason is an unavailable API service (or several), which prevents the cluster from finalizing namespaces.
Deleting the broken API service solved the problem.
|
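(A hedged sketch of that check: look for API services the aggregator reports as unavailable, then fix or remove them. Deleting one is only appropriate if its backing service is intentionally gone; the APIService name is a placeholder:)
# APIService objects whose AVAILABLE column is not True block namespace finalization
kubectl get apiservice
kubectl get apiservice | grep False
# if the backing service was removed on purpose, deleting the stale APIService unblocks deletion
kubectl delete apiservice <broken-apiservice-name>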
good job. |
I had a similar issue on 1.18 in a lab k8s cluster, and am adding a note that may help others. I had been working with the metrics API, and with custom metrics in particular. After deleting those k8s objects in order to recreate them, deleting the namespace stalled with an error that the metrics API endpoint could not be found. Putting that back in on another namespace, everything cleared up immediately. This was in the namespace under status.conditions.message:
|
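(To see the same thing on your own cluster, something along these lines should work; the namespace name is a placeholder, and v1beta1.metrics.k8s.io assumes a standard metrics-server install:)
# the namespace's status conditions explain what deletion is waiting on
kubectl get namespace <stuck-namespace> -o jsonpath='{range .status.conditions[*]}{.type}{": "}{.message}{"\n"}{end}'
# check whether the metrics API the error refers to is actually registered and available
kubectl get apiservice v1beta1.metrics.k8s.io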
Definitely the cleanest one-liner! It's important to note that none of these "solutions" actually solve the root issue. See here for the correct solution. That is the message we should be spreading 😄, not "yet another one-liner". |
This solution covers only one of all the possibilities. To look for all possible root causes and fix them, I use this script: https://github.com/thyarles/knsk |
@thyarles very nice! |
I encountered the same problem:
# sudo kubectl get ns
NAME STATUS AGE
cattle-global-data Terminating 8d
cattle-global-nt Terminating 8d
cattle-system Terminating 8d
cert-manager Active 8d
default Active 10d
ingress-nginx Terminating 9d
kube-node-lease Active 10d
kube-public Active 10d
kube-system Active 10d
kubernetes-dashboard Terminating 4d6h
local Active 8d
p-2sfgk Active 8d
p-5kdx9 Active 8d
# sudo kubectl get all -n kubernetes-dashboard
No resources found in kubernetes-dashboard namespace.
# sudo kubectl get namespace kubernetes-dashboard -o json
{
"apiVersion": "v1",
"kind": "Namespace",
"metadata": {
"annotations": {
"cattle.io/status": "{\"Conditions\":[{\"Type\":\"ResourceQuotaInit\",\"Status\":\"True\",\"Message\":\"\",\"LastUpdateTime\":\"2020-09-29T01:15:46Z\"},{\"Type\":\"InitialRolesPopulated\",\"Status\":\"True\",\"Message\":\"\",\"LastUpdateTime\":\"2020-09-29T01:15:46Z\"}]}",
"kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Namespace\",\"metadata\":{\"annotations\":{},\"name\":\"kubernetes-dashboard\"}}\n",
"lifecycle.cattle.io/create.namespace-auth": "true"
},
"creationTimestamp": "2020-09-29T01:15:45Z",
"deletionGracePeriodSeconds": 0,
"deletionTimestamp": "2020-10-02T07:59:52Z",
"finalizers": [
"controller.cattle.io/namespace-auth"
],
"managedFields": [
{
"apiVersion": "v1",
"fieldsType": "FieldsV1",
"fieldsV1": {
"f:metadata": {
"f:annotations": {
"f:cattle.io/status": {},
"f:lifecycle.cattle.io/create.namespace-auth": {}
},
"f:finalizers": {
".": {},
"v:\"controller.cattle.io/namespace-auth\"": {}
}
}
},
"manager": "Go-http-client",
"operation": "Update",
"time": "2020-09-29T01:15:45Z"
},
{
"apiVersion": "v1",
"fieldsType": "FieldsV1",
"fieldsV1": {
"f:metadata": {
"f:annotations": {
".": {},
"f:kubectl.kubernetes.io/last-applied-configuration": {}
}
}
},
"manager": "kubectl-client-side-apply",
"operation": "Update",
"time": "2020-09-29T01:15:45Z"
},
{
"apiVersion": "v1",
"fieldsType": "FieldsV1",
"fieldsV1": {
"f:status": {
"f:phase": {}
}
},
"manager": "kube-controller-manager",
"operation": "Update",
"time": "2020-10-02T08:13:49Z"
}
],
"name": "kubernetes-dashboard",
"resourceVersion": "3662184",
"selfLink": "/api/v1/namespaces/kubernetes-dashboard",
"uid": "f1944b81-038b-48c2-869d-5cae30864eaa"
},
"spec": {},
"status": {
"conditions": [
{
"lastTransitionTime": "2020-10-02T08:13:49Z",
"message": "All resources successfully discovered",
"reason": "ResourcesDiscovered",
"status": "False",
"type": "NamespaceDeletionDiscoveryFailure"
},
{
"lastTransitionTime": "2020-10-02T08:11:49Z",
"message": "All legacy kube types successfully parsed",
"reason": "ParsedGroupVersions",
"status": "False",
"type": "NamespaceDeletionGroupVersionParsingFailure"
},
{
"lastTransitionTime": "2020-10-02T08:11:49Z",
"message": "All content successfully deleted, may be waiting on finalization",
"reason": "ContentDeleted",
"status": "False",
"type": "NamespaceDeletionContentFailure"
},
{
"lastTransitionTime": "2020-10-02T08:11:49Z",
"message": "All content successfully removed",
"reason": "ContentRemoved",
"status": "False",
"type": "NamespaceContentRemaining"
},
{
"lastTransitionTime": "2020-10-02T08:11:49Z",
"message": "All content-preserving finalizers finished",
"reason": "ContentHasNoFinalizers",
"status": "False",
"type": "NamespaceFinalizersRemaining"
}
],
"phase": "Terminating"
}
}
# sudo kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T13:41:02Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T13:32:58Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"} |
You can use
|
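(The suggested command was not captured above. Given the controller.cattle.io/namespace-auth finalizer in the output above, one hedged possibility, and only if the Rancher controller that owned it is really gone, is:)
# clear the orphaned Rancher finalizer so the namespace controller can finish deleting the namespace
kubectl patch namespace kubernetes-dashboard --type=merge -p '{"metadata":{"finalizers":[]}}'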
Just copy and paste in your terminal:
# for every namespace stuck in Terminating, strip the "kubernetes" entry
# from its finalizers and push the result to the finalize subresource
# (note: sed -i '' is the BSD/macOS form; on GNU sed drop the '')
for NS in $(kubectl get ns 2>/dev/null | grep Terminating | cut -f1 -d ' '); do
  kubectl get ns $NS -o json > /tmp/$NS.json
  sed -i '' "s/\"kubernetes\"//g" /tmp/$NS.json
  kubectl replace --raw "/api/v1/namespaces/$NS/finalize" -f /tmp/$NS.json
done |
This worked for me, and I ran it after verifying there were no dangling k8s objects in the namespace. Thanks! |
I used this to remove a namespace stuck at Terminating. Example:
|
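(The example itself is not shown above; a common form of this one-liner, with the namespace name as a placeholder and jq installed, is:)
# empty spec.finalizers and push the result to the finalize subresource
kubectl get namespace <stuck-namespace> -o json \
  | jq '.spec.finalizers = []' \
  | kubectl replace --raw "/api/v1/namespaces/<stuck-namespace>/finalize" -f -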
For all the googlers who bumped into namespaces stuck at Terminating on Rancher-specific namespaces (e.g. cattle-system), the following command (modified from grebois's original) worked for me:
|
Folks, just FYI, when the video for this kubecon talk is out I plan to link to it and some of the helpful comments above, and lock this issue. |
I recorded a 10 minute explanation of what is going on and presented it at this SIG Deep Dive session. Here's a correct comment with 65 upvotes. Mentioned several times above, this Medium post is an example of doing things the right way: find and fix the broken API service. All the one-liners that just remove the finalizers on the namespace do not address the root cause and leave your cluster subtly broken, which will bite you later. So please don't do that. The root-cause fix is usually easier anyway. It seems that people like to post variations on this theme even though there are numerous correct answers in the thread already, so I'm going to lock the issue now, to ensure that this comment stays at the bottom. |
Original issue: I am using v1.8.4 and I am having the problem that a deleted namespace stays in the "Terminating" state forever. I already ran "kubectl delete namespace XXXX".