deleting namespace stuck at "Terminating" state #60807

Closed

Description

@shean-guangchang

I am using v1.8.4, and a deleted namespace stays in the "Terminating" state forever. I have already run "kubectl delete namespace XXXX".
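
(Note for anyone debugging the same symptom: the namespace object itself usually shows what is still pending. "XXXX" below is just the placeholder name from above.)

kubectl get namespace XXXX -o yaml   # spec.finalizers and status.phase show why deletion is pending
kubectl get all -n XXXX              # built-in resources left in the namespace (custom resources must be listed separately)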

Activity

needs-sig label added on Mar 5, 2018
dims (Member) commented on Mar 7, 2018

/sig api-machinery

sig/api-machinery label added and needs-sig label removed on Mar 7, 2018
nikhita (Member) commented on Mar 10, 2018

@shean-guangchang Do you have some way to reproduce this?

And out of curiosity, are you using any CRDs? We faced this problem with TPRs previously.

nikhita (Member) commented on Mar 10, 2018

/kind bug

oliviabarrick commented on Mar 14, 2018

I seem to be experiencing this issue with a rook deployment:

➜  tmp git:(master) ✗ kubectl delete namespace rook
Error from server (Conflict): Operation cannot be fulfilled on namespaces "rook": The system is ensuring all content is removed from this namespace.  Upon completion, this namespace will automatically be purged by the system.
➜  tmp git:(master) ✗ 

I think it does have something to do with their CRD; I see this in the API server logs:

E0314 07:28:18.284942       1 crd_finalizer.go:275] clusters.rook.io failed with: timed out waiting for the condition
E0314 07:28:18.287629       1 crd_finalizer.go:275] clusters.rook.io failed with: Operation cannot be fulfilled on customresourcedefinitions.apiextensions.k8s.io "clusters.rook.io": the object has been modified; please apply your changes to the latest version and try again

I've deployed rook to a different namespace now, but I'm not able to create the cluster CRD:

➜  tmp git:(master) ✗ cat rook/cluster.yaml 
apiVersion: rook.io/v1alpha1
kind: Cluster
metadata:
  name: rook
  namespace: rook-cluster
spec:
  dataDirHostPath: /var/lib/rook-cluster-store
➜  tmp git:(master) ✗ kubectl create -f rook/
Error from server (MethodNotAllowed): error when creating "rook/cluster.yaml": the server does not allow this method on the requested resource (post clusters.rook.io)

Seems like the CRD was never cleaned up:

➜  tmp git:(master) ✗ kubectl get customresourcedefinitions clusters.rook.io -o yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  creationTimestamp: 2018-02-28T06:27:45Z
  deletionGracePeriodSeconds: 0
  deletionTimestamp: 2018-03-14T07:36:10Z
  finalizers:
  - customresourcecleanup.apiextensions.k8s.io
  generation: 1
  name: clusters.rook.io
  resourceVersion: "9581429"
  selfLink: /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/clusters.rook.io
  uid: 7cd16376-1c50-11e8-b33e-aeba0276a0ce
spec:
  group: rook.io
  names:
    kind: Cluster
    listKind: ClusterList
    plural: clusters
    singular: cluster
  scope: Namespaced
  version: v1alpha1
status:
  acceptedNames:
    kind: Cluster
    listKind: ClusterList
    plural: clusters
    singular: cluster
  conditions:
  - lastTransitionTime: 2018-02-28T06:27:45Z
    message: no conflicts found
    reason: NoConflicts
    status: "True"
    type: NamesAccepted
  - lastTransitionTime: 2018-02-28T06:27:45Z
    message: the initial names have been accepted
    reason: InitialNamesAccepted
    status: "True"
    type: Established
  - lastTransitionTime: 2018-03-14T07:18:18Z
    message: CustomResource deletion is in progress
    reason: InstanceDeletionInProgress
    status: "True"
    type: Terminating
➜  tmp git:(master) ✗ 
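
(Note: in this state the CRD is blocked on its customresourcecleanup.apiextensions.k8s.io finalizer. A commonly used workaround, if the controller that should clean up the custom resources is gone, is to remove the finalizer by hand; this skips the cleanup step, so use with care. The resource name below is the one from the output above.)

kubectl patch customresourcedefinition clusters.rook.io \
  --type merge -p '{"metadata":{"finalizers":[]}}'
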
oliviabarrick commented on Mar 14, 2018

I have a fission namespace in a similar state:

➜  tmp git:(master) ✗ kubectl delete namespace fission
Error from server (Conflict): Operation cannot be fulfilled on namespaces "fission": The system is ensuring all content is removed from this namespace.  Upon completion, this namespace will automatically be purged by the system.
➜  tmp git:(master) ✗ kubectl get pods -n fission     
NAME                          READY     STATUS        RESTARTS   AGE
storagesvc-7c5f67d6bd-72jcf   0/1       Terminating   0          8d
➜  tmp git:(master) ✗ kubectl delete pod/storagesvc-7c5f67d6bd-72jcf --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
Error from server (NotFound): pods "storagesvc-7c5f67d6bd-72jcf" not found
➜  tmp git:(master) ✗ kubectl describe pod -n fission storagesvc-7c5f67d6bd-72jcf
Name:                      storagesvc-7c5f67d6bd-72jcf
Namespace:                 fission
Node:                      10.13.37.5/10.13.37.5
Start Time:                Tue, 06 Mar 2018 07:03:06 +0000
Labels:                    pod-template-hash=3719238268
                           svc=storagesvc
Annotations:               <none>
Status:                    Terminating (expires Wed, 14 Mar 2018 06:41:32 +0000)
Termination Grace Period:  30s
IP:                        10.244.2.240
Controlled By:             ReplicaSet/storagesvc-7c5f67d6bd
Containers:
  storagesvc:
    Container ID:  docker://3a1350f6e4871b1ced5c0e890e37087fc72ed2bc8410d60f9e9c26d06a40c457
    Image:         fission/fission-bundle:0.4.1
    Image ID:      docker-pullable://fission/fission-bundle@sha256:235cbcf2a98627cac9b0d0aae6e4ea4aac7b6e6a59d3d77aaaf812eacf9ef253
    Port:          <none>
    Command:
      /fission-bundle
    Args:
      --storageServicePort
      8000
      --filePath
      /fission
    State:          Terminated
      Exit Code:    0
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Mon, 01 Jan 0001 00:00:00 +0000
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /fission from fission-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from fission-svc-token-zmsxx (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  fission-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  fission-storage-pvc
    ReadOnly:   false
  fission-svc-token-zmsxx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  fission-svc-token-zmsxx
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
➜  tmp git:(master) ✗ 

Fission also uses CRDs; however, they appear to be cleaned up.
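
(Note: when the CRDs look clean but the namespace still won't go away, it helps to enumerate everything left in it. The one-liner below needs a kubectl recent enough to have "kubectl api-resources"; the namespace name is the one from this comment.)

kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -n fission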

barakAtSoluto commented on Mar 22, 2018

@shean-guangchang - I had the same issue. I deleted everything under the namespaces manually, deleted and purged everything from "helm", and restarted the master nodes one by one, and that fixed the issue.

I imagine what I've encountered has something to do with "ark", "tiller" and Kubernetes all working together (I bootstrapped using helm and backed up using ark), so this may not be a Kubernetes issue per se. On the other hand, it was pretty much impossible to troubleshoot because there are no relevant logs.

xetys commented on Mar 23, 2018

if it is the rook one, take a look at this: rook/rook#1488 (comment)

oliviabarrick commented on Mar 23, 2018

I guess that makes sense, but it seems buggy that it's possible to get a namespace into an undeletable state.
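
(Note: a frequently cited last-resort workaround, separate from fixing whatever finalizer is actually stuck, is to clear the namespace's own spec.finalizers through the finalize subresource. This can orphan resources that were never cleaned up, so it should only be done after dealing with the leftovers; the namespace name below is just the rook example from earlier.)

kubectl proxy &
kubectl get namespace rook -o json | jq '.spec.finalizers = []' > ns-rook.json
curl -H "Content-Type: application/json" -X PUT --data-binary @ns-rook.json \
  http://127.0.0.1:8001/api/v1/namespaces/rook/finalize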

OguzPastirmaci commented on Apr 26, 2018

I have a similar environment (Ark & Helm) to @barakAtSoluto's and have the same issue. Purging and restarting the masters didn't fix it for me, though. Still stuck at terminating.

barakAtSoluto commented on Apr 29, 2018

I had that too when trying to recreate the problem; I eventually had to create a new cluster.
Exclude default, kube-system, kube-public and all ark-related namespaces from backup and restore to prevent this from happening.
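
(Note: with ark this exclusion can be set directly on the backup. The backup name and namespace list below are only an example, and the exact flag name may differ between ark versions.)

ark backup create cluster-backup \
  --exclude-namespaces default,kube-system,kube-public,heptio-ark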

jaxxstorm commented on May 3, 2018

I'm also seeing this on a cluster upgraded from 1.8.4 to 1.9.6. I don't even know which logs to look at.
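
(Note: the namespace cleanup itself is done by the namespace controller inside kube-controller-manager, so its logs, plus the kube-apiserver logs shown earlier in this thread, are the usual places to start. The pod name below is cluster-specific and assumes the control plane runs as static pods.)

kubectl -n kube-system get pods | grep kube-controller-manager
kubectl -n kube-system logs <kube-controller-manager-pod> | grep -i namespace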

(225 remaining items not shown)


Metadata

Assignees

No one assigned

Labels

kind/bug: Categorizes issue or PR as related to a bug.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
sig/api-machinery: Categorizes an issue or PR as relevant to SIG API Machinery.

Participants

@FooBarWidget, @squarelover, @dims, @fabiand, @timothysc
