StatefulSet - can't rollback from a broken state

/kind bug

**What happened**:

I updated a StatefulSet with a non-existent Docker image. As expected, a pod of the statefulset is destroyed and can't be recreated (ErrImagePull). However, when I change back the StatefulSet with an existing image, the StatefulSet doesn't try to remove the broken pod to replace it by a good one. It keeps trying to pull the non-existing image.
You have to delete the broken pod manually to unblock the situation.

[Related Stackoverflow question](https://stackoverflow.com/questions/48894414/kubernetes-statefulset-pod-startup-error-recovery)

**What you expected to happen**:

When rolling back the bad config, I expected the StatefulSet to remove the broken pod and replace it by a good one.

**How to reproduce it (as minimally and precisely as possible)**:

1. Deploy the following StatefulSet:
```
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 3 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "standard"
      resources:
        requests:
          storage: 10Gi
```

2. Once the 3 pods are running, update the StatefulSet spec and change the image to `k8s.gcr.io/nginx-slim:foobar`
3. Observe the new pod failing to pull the image.
4. Roll back the change.
5. Observe the broken pod not being deleted.

**Anything else we need to know?**:

* I observed this behaviour both on 1.8 and 1.10.
* This seems related to the discussion in #18568

**Environment**:
- Kubernetes version (use `kubectl version`):
```
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.7", GitCommit:"dd5e1a2978fd0b97d9b78e1564398aeea7e7fe92", GitTreeState:"clean", BuildDate:"2018-04-19T00:05:56Z", GoVersion:"go1.9
.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.5-gke.3", GitCommit:"6265b9797fc8680c8395abeab12c1e3bad14069a", GitTreeState:"clean", BuildDate:"2018-07-19T23:02:51Z", GoVersi
on:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
```
- Cloud provider or hardware configuration: Google Kubernetes Engine
- OS (e.g. from /etc/os-release): COS

cc @joe-boyce

	if !isRunningAndReady(replicas[i]) && monotonic {
	glog.V(4).Infof(
	"StatefulSet %s/%s is waiting for Pod %s to be Running and Ready",
	set.Namespace,
	set.Name,
	replicas[i].Name)
	return &status, nil
	}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

StatefulSet - can't rollback from a broken state #67250

138 remaining items

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

StatefulSet - can't rollback from a broken state #67250

Description

Activity

MrTrustor commented on Aug 10, 2018

joe-boyce commented on Aug 20, 2018

enisoc commented on Aug 21, 2018

mattmb commented on Sep 7, 2018

fejta-bot commented on Dec 6, 2018

138 remaining items

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions