
Rollback "xx" failed: no Endpoints with the name "database-pg" found #7967

Closed
@qingguee

Description


Output of helm version:

version.BuildInfo{Version:"v3.2.0-rc.1", GitCommit:"7bffac813db894e06d17bac91d14ea819b5c2310", GitTreeState:"clean", GoVersion:"go1.13.10"}

Output of kubectl version:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.3", GitCommit:"435f92c719f279a3a67808c80521ea17d5715c66", GitTreeState:"clean", BuildDate:"2018-11-26T12:57:14Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-16T08:00:38Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

Cloud Provider/Platform (AKS, GKE, Minikube etc.):
kubeadm

Problem description

The rollback failed because Helm cannot find the Endpoints.

We have two versions of a chart, 3.4.0-72 and 3.4.0-73:

  • v3.4.0-72 : Contains two endpoints.
$ kubectl -n demo2 get ep
NAME                                     ENDPOINTS                AGE
database-pg           192.168.214.210:9187,192.168.214.210:5432   9m25s
database-pg-replica   192.168.200.200:9187,192.168.200.200:5432   10m
  • v3.4.0-73: We removed those two Endpoints and let the Service create them through its label selector.

  • The upgrade was successful

  • Rollback failed

$ ./helm3.2 history demo2 -n demo2 
REVISION        UPDATED                         STATUS          CHART                                                   APP VERSION     DESCRIPTION
1               Wed Apr 22 12:49:30 2020        superseded      database-pg-3.4.0-72                 0.0.0-0         Install complete
2               Wed Apr 22 12:51:52 2020        superseded      database-pg-3.4.0-73-b7e958c23e      0.0.0-0         Upgrade complete
3               Wed Apr 22 12:54:26 2020        failed          database-pg-3.4.0-72                 0.0.0-0         Rollback "demo2" failed: no Endpoints with the name "database-pg" found
demo2@node-10:~
  • More log
$ ./helm3.2 rollback demo2 -n demo2 --debug
rollback.go:60: [debug] preparing rollback of demo2
rollback.go:108: [debug] rolling back demo2 (current: v2, target: v1)
rollback.go:67: [debug] creating rolled back release for demo2
rollback.go:73: [debug] performing rollback of demo2
client.go:258: [debug] Starting delete for "database-pg" Role
client.go:108: [debug] creating 1 resource(s)
client.go:258: [debug] Starting delete for "database-pg" RoleBinding
client.go:108: [debug] creating 1 resource(s)
client.go:258: [debug] Starting delete for "database-pg-hook" ServiceAccount
client.go:108: [debug] creating 1 resource(s)
client.go:258: [debug] Starting delete for "database-pg-hook" Role
client.go:108: [debug] creating 1 resource(s)
client.go:258: [debug] Starting delete for "database-pg-hook" RoleBinding
client.go:108: [debug] creating 1 resource(s)
client.go:258: [debug] Starting delete for "database-pg-hook-cleanup" Job
client.go:287: [debug] jobs.batch "database-pg-hook-cleanup" not found
client.go:108: [debug] creating 1 resource(s)
client.go:467: [debug] Watching for changes to Job database-pg-hook-cleanup with timeout of 5m0s
client.go:495: [debug] Add/Modify event for database-pg-hook-cleanup: ADDED
client.go:534: [debug] database-pg-hook-cleanup: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:495: [debug] Add/Modify event for database-pg-hook-cleanup: MODIFIED
client.go:534: [debug] database-pg-hook-cleanup: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:495: [debug] Add/Modify event for database-pg-hook-cleanup: MODIFIED
client.go:258: [debug] Starting delete for "database-pg-hook-cleanup" Job
client.go:173: [debug] checking 8 resources for changes
rollback.go:166: [debug] warning: Rollback "demo2" failed: no Endpoints with the name "database-pg" found
Error: no Endpoints with the name "database-pg" found
helm.go:84: [debug] no Endpoints with the name "database-pg" found
helm.sh/helm/v3/pkg/kube.(*Client).Update.func1
       /home/circleci/helm.sh/helm/pkg/kube/client.go:201
helm.sh/helm/v3/pkg/kube.ResourceList.Visit
       /home/circleci/helm.sh/helm/pkg/kube/resource.go:32
helm.sh/helm/v3/pkg/kube.(*Client).Update
       /home/circleci/helm.sh/helm/pkg/kube/client.go:174
helm.sh/helm/v3/pkg/action.(*Rollback).performRollback
       /home/circleci/helm.sh/helm/pkg/action/rollback.go:162
helm.sh/helm/v3/pkg/action.(*Rollback).Run
       /home/circleci/helm.sh/helm/pkg/action/rollback.go:74
main.newRollbackCmd.func1
       /home/circleci/helm.sh/helm/cmd/helm/rollback.go:59
github.com/spf13/cobra.(*Command).execute
       /go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842
github.com/spf13/cobra.(*Command).ExecuteC
       /go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950
github.com/spf13/cobra.(*Command).Execute
       /go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
main.main
       /home/circleci/helm.sh/helm/cmd/helm/helm.go:83
runtime.main
       /usr/local/go/src/runtime/proc.go:203
runtime.goexit
       /usr/local/go/src/runtime/asm_amd64.s:1357
  • I can see that the rollback still went ahead and the Pod was rolling-updated, but the history records an error, which blocks the next upgrade.
$ ./helm3.2 upgrade demo2 --namespace demo2 database-pg-3.4.0-73-b7e958c23e.tgz
Error: UPGRADE FAILED: "demo2" has no deployed releases
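
The history above explains this message: revisions 1 and 2 are superseded and revision 3 is failed, so no revision is left in the deployed state. The following Go sketch illustrates that lookup under simplifying assumptions. `Revision` and `lastDeployed` are hypothetical names made up for this illustration, not Helm's actual types or functions.

```go
package main

import "fmt"

// Revision is a hypothetical, simplified stand-in for one row of `helm history`.
type Revision struct {
	Number int
	Status string
}

// lastDeployed returns the newest revision whose status is "deployed".
// If there is none, it returns an error worded like the one in the report.
func lastDeployed(name string, history []Revision) (Revision, error) {
	for i := len(history) - 1; i >= 0; i-- {
		if history[i].Status == "deployed" {
			return history[i], nil
		}
	}
	return Revision{}, fmt.Errorf("%q has no deployed releases", name)
}

func main() {
	// Statuses copied from the `helm history demo2` output above.
	history := []Revision{
		{1, "superseded"},
		{2, "superseded"},
		{3, "failed"},
	}
	if _, err := lastDeployed("demo2", history); err != nil {
		fmt.Println("Error: UPGRADE FAILED:", err)
	}
}
```

In this model, the failed rollback is what leaves the release without a deployed revision, which is why the next upgrade is rejected.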

Additional info

  • I ran a further test, upgrading from v3.4.0-73 to v3.4.0-72:
$ ./helm3.2 upgrade demo2 --namespace demo2 database-pg-3.4.0-72.tgz
Error: UPGRADE FAILED: rendered manifests contain a resource that already exists. Unable to continue with update: Endpoints "database-pg" in namespace "demo2" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "demo2"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "demo2"
demo2@node-10:~
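
The upgrade error above says that the existing Endpoints object lacks the two ownership annotations Helm expects. As a rough illustration only, the Go sketch below checks those two annotation keys, which are taken from the error message. `checkOwnership` is a hypothetical helper, not Helm's implementation, and it models nothing beyond what the message shows.

```go
package main

import "fmt"

// checkOwnership is a hypothetical helper. It reports which of the two
// ownership annotations named in the error message are missing or wrong.
func checkOwnership(annotations map[string]string, releaseName, releaseNamespace string) []string {
	required := []struct{ key, want string }{
		{"meta.helm.sh/release-name", releaseName},
		{"meta.helm.sh/release-namespace", releaseNamespace},
	}
	var problems []string
	for _, r := range required {
		got, ok := annotations[r.key]
		switch {
		case !ok:
			problems = append(problems, fmt.Sprintf("missing key %q: must be set to %q", r.key, r.want))
		case got != r.want:
			problems = append(problems, fmt.Sprintf("key %q must be %q, but is %q", r.key, r.want, got))
		}
	}
	return problems
}

func main() {
	// An Endpoints object created by Kubernetes itself carries no Helm annotations.
	for _, p := range checkOwnership(map[string]string{}, "demo2", "demo2") {
		fmt.Println(p)
	}
}
```

An object that Kubernetes created for the Service has neither annotation, so both checks fail, matching the two validation errors in the output.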

Expectation

Helm 3 should be able to roll back to v3.4.0-72, which has two Endpoints, without any error.

Activity

qingguee (Author) commented on Apr 23, 2020

I took a look at pkg/kube/client.go:

198		originalInfo := original.Get(info)
199		if originalInfo == nil {
200			kind := info.Mapping.GroupVersionKind.Kind
201			return errors.Errorf("no %s with the name %q found", kind, info.Name)
202		}

I think the code above has a problem: it does not cover my case.
During the rollback:

  • original: the Endpoints object was created by Kubernetes for v3.4.0-73, not by Helm 3
  • target: the Endpoints object SHOULD be re-created by Helm 3 for v3.4.0-72

So the code looks up the Endpoints in the original release, finds that it does not exist there, and returns the error.

Then WHY was the new resource not detected at lines 179 and 180? That branch is supposed to handle the creation of new resources and then return.

179		helper := resource.NewHelper(info.Client, info.Mapping)
180		if _, err := helper.Get(info.Namespace, info.Name, info.Export); err != nil {
.......
188			// Since the resource does not exist, create it.
189			if err := createResource(info); err != nil {
190				return errors.Wrap(err, "failed to create resource")
191			}

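Because the Endpoints object already exists in the cluster: it was created by Kubernetes, not by Helm 3, so the Get call succeeds and the creation branch is skipped.

I am not sure whether my analysis is correct; someone from the Helm team needs to confirm it.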
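
To make the analysis above easier to follow, here is a small Go model of the control flow in the quoted snippets. It is a sketch under assumptions: `resourceKey` and `update` are hypothetical names, the cluster and the original release are reduced to plain maps, and only the two branches quoted from client.go are modelled.

```go
package main

import "fmt"

// resourceKey is a hypothetical identifier for a resource in this model.
type resourceKey struct{ Kind, Name string }

// update mimics the two branches quoted above.
// cluster:  objects that currently exist in the cluster
// original: objects in the manifest of the current release
// target:   objects in the manifest being rolled back to
func update(cluster, original map[resourceKey]bool, target []resourceKey) error {
	for _, r := range target {
		if !cluster[r] {
			// Like lines 180-191: the Get failed, so the resource is created.
			cluster[r] = true
			continue
		}
		if !original[r] {
			// Like lines 198-202: the object exists in the cluster
			// but is not part of the original release.
			return fmt.Errorf("no %s with the name %q found", r.Kind, r.Name)
		}
		// Otherwise the existing object would be patched (not modelled).
	}
	return nil
}

func main() {
	svc := resourceKey{"Service", "database-pg"}
	ep := resourceKey{"Endpoints", "database-pg"}

	cluster := map[resourceKey]bool{svc: true, ep: true} // Endpoints created by Kubernetes
	original := map[resourceKey]bool{svc: true}          // v3.4.0-73 has no Endpoints
	target := []resourceKey{svc, ep}                     // v3.4.0-72 declares Endpoints

	if err := update(cluster, original, target); err != nil {
		fmt.Println("Error:", err)
	}
}
```

In this model the Endpoints object falls through the creation branch because it already exists, and then fails the lookup in the original release, which reproduces the reported message.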

BRs,
qingguee

Label added on Apr 23, 2020: bug (Categorizes issue or PR as related to a bug.)

bacongobbler (Member) commented on Apr 23, 2020

Hi @qingguee, without a chart to test this behaviour, we cannot help you. Please provide a sample along with a set of instructions that we can use to verify the behaviour you are describing. Thanks.

qingguee (Author) commented on Apr 24, 2020

Sure.
I will prepare two example charts and detailed reproduction steps to help identify the root cause.
I expect to have them ready this weekend.

BRs,
qingguee

qingguee (Author) commented on Apr 26, 2020

Hi @bacongobbler

I have created two example charts to reproduce this issue, and I also describe our use case.

Get example charts for rollback issue

They can be found at:

git clone https://github.com/qingguee/resource.git

Reproduce the issue

git clone https://github.com/qingguee/resource.git

./helm3.2 -n nsexample install example-rollback resource/example-0.1.9/example/

kubectl -n nsexample get svc,po,ep -o wide
NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE   SELECTOR
service/example-rollback   ClusterIP   10.98.113.161   <none>        80/TCP    84s   <none>

NAME                     READY   STATUS    RESTARTS   AGE   IP                NODE                 NOMINATED NODE   READINESS GATES
pod/example-rollback-0   1/1     Running   0          84s   192.168.214.217   node-10   <none>           <none>

NAME                         ENDPOINTS   AGE
endpoints/example-rollback   <none>      84s

./helm3.2 -n nsexample upgrade example-rollback resource/example-0.2.1/example/


kubectl -n nsexample get svc,po,ep -o wide
NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE     SELECTOR
service/example-rollback   ClusterIP   10.98.113.161   <none>        80/TCP    2m32s   app.kubernetes.io/instance=example-rollback,app.kubernetes.io/name=example

NAME                     READY   STATUS    RESTARTS   AGE   IP                NODE                 NOMINATED NODE   READINESS GATES
pod/example-rollback-0   1/1     Running   0          26s   192.168.214.228   node-10   <none>           <none>

NAME                         ENDPOINTS            AGE
endpoints/example-rollback   192.168.214.228:80   34s


$ ./helm3.2 -n nsexample history example-rollback
REVISION        UPDATED                         STATUS          CHART           APP VERSION     DESCRIPTION
1               Sun Apr 26 02:49:04 2020        superseded      example-0.1.9   1.16.0          Install complete
2               Sun Apr 26 02:51:01 2020        deployed        example-0.2.1   1.16.0          Upgrade complete

$ ./helm3.2 -n nsexample rollback example-rollback 1
Error: no Endpoints with the name "example-rollback" found

Differences between the two charts

  • example-0.1.9:

    1. service.yaml does not define a selector
    2. ep-for-service.yaml defines the Endpoints for the Service (in this example, subsets is empty)
  • example-0.2.1:

    1. service.yaml defines a selector
    2. no Endpoints object is defined

Use case

  1. We were using a Service without a selector, together with an Endpoints object, in our chart. Our program manages the Endpoints subsets at runtime.
  2. In the new version, we want to replace that Service with a normal Service that has a selector and let Kubernetes manage the Endpoints.

Then we ran into this rollback issue.

qingguee (Author) commented on Apr 27, 2020

Root cause
An Endpoints object can be created by Kubernetes rather than by Helm.
During a rollback, the code below gets a misleading result for such Endpoints: the Get call returns nil for err because the object already exists, so the creation of the Endpoints is skipped.

179		helper := resource.NewHelper(info.Client, info.Mapping)
180		if _, err := helper.Get(info.Namespace, info.Name, info.Export); err != nil {

Proposal

  1. During a rollback, when the original release does not have an Endpoints object but the target release does, the Endpoints object should be recreated.
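
One possible reading of this proposal, written as a Go decision function. This is only a sketch of the suggested behaviour, not what Helm does. `Action` and `planAction` are hypothetical names, and what "recreate" means in practice (delete and create, or adopt and patch) is left open here.

```go
package main

import "fmt"

// Action is a hypothetical label for what to do with one target resource.
type Action string

const (
	Create   Action = "create"
	Patch    Action = "patch"
	Recreate Action = "recreate"
)

// planAction sketches the proposed rule for a resource in the target release.
// existsInCluster: the object is currently present in the cluster
// inOriginal:      the object is part of the current release's manifest
func planAction(existsInCluster, inOriginal bool) Action {
	switch {
	case !existsInCluster:
		return Create
	case inOriginal:
		return Patch
	default:
		// Present in the cluster but unknown to the original release, for
		// example Endpoints created by Kubernetes: recreate instead of failing.
		return Recreate
	}
}

func main() {
	// The Endpoints case from this issue: present in the cluster, absent from the original release.
	fmt.Println(planAction(true, false))
}
```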

github-actions commented on Aug 21, 2020

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

bridgetkromhout (Member) commented on Sep 4, 2020

Hi, @qingguee! Thank you for your ideas to improve Helm. If you'd like, you can submit a Helm Improvement Proposal so as to work in the community to make your ideas a reality: https://github.com/helm/community/blob/master/hips/hip-0001.md - meanwhile, I will close this issue. Thanks!

sig-abyreddy commented on May 27, 2022

With regard to the above comment from @qingguee, i.e. #7967 (comment):

I ran into a similar issue with this piece of code while trying to install the ingress-nginx Helm chart. Here is the debug log:

history.go:52: [debug] getting history for release infra
upgrade.go:84: [debug] preparing upgrade for infra
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
upgrade.go:92: [debug] performing update for infra
upgrade.go:234: [debug] creating upgraded release for infra
client.go:163: [debug] checking 55 resources for changes
client.go:184: [debug] Created a new PodDisruptionBudget called "infra-nginx-ingress-controller" in infra

client.go:403: [debug] Looks like there are no changes for PodDisruptionBudget "infra-nginx-internal-ingress-controller"
client.go:403: [debug] Looks like there are no changes for ServiceAccount "infra-nginx-internal-ingress"
client.go:403: [debug] Looks like there are no changes for ServiceAccount "infra-nginx-internal-ingress-backend"
client.go:403: [debug] Looks like there are no changes for ServiceAccount "infra-velero-server"
client.go:403: [debug] Looks like there are no changes for Secret "reject-image-webhook"
client.go:403: [debug] Looks like there are no changes for ClusterRole "infra-nginx-internal-ingress"
client.go:184: [debug] Created a new ClusterRole called "infra-reloader-role" in 

client.go:403: [debug] Looks like there are no changes for ClusterRoleBinding "infra-nginx-internal-ingress"
client.go:184: [debug] Created a new ClusterRoleBinding called "infra-reloader-role-binding" in 

client.go:403: [debug] Looks like there are no changes for ClusterRoleBinding "infra-velero-server"
client.go:403: [debug] Looks like there are no changes for Role "infra-nginx-internal-ingress"
client.go:403: [debug] Looks like there are no changes for Role "infra-velero-server"
client.go:403: [debug] Looks like there are no changes for RoleBinding "infra-nginx-internal-ingress"
client.go:403: [debug] Looks like there are no changes for RoleBinding "infra-velero-server"
client.go:184: [debug] Created a new Service called "infra-nginx-ingress-defaultbackend" in infra

client.go:403: [debug] Looks like there are no changes for Service "infra-nginx-internal-ingress-controller-metrics"
client.go:403: [debug] Looks like there are no changes for Service "infra-nginx-internal-ingress-controller"
client.go:403: [debug] Looks like there are no changes for Service "infra-nginx-internal-ingress-default-backend"
client.go:403: [debug] Looks like there are no changes for Service "reject-image-webhook"
client.go:403: [debug] Looks like there are no changes for Service "infra-velero"
client.go:184: [debug] Created a new Deployment called "infra-nginx-ingress-defaultbackend" in infra

client.go:403: [debug] Looks like there are no changes for Deployment "reject-image-webhook"
client.go:403: [debug] Looks like there are no changes for BackupStorageLocation "default"
client.go:184: [debug] Created a new IngressClass called "nginx" in

upgrade.go:293: [debug] warning: Upgrade "infra" failed: no IngressClass with the name "nginx" found
upgrade.go:311: [debug] Upgrade failed and atomic is set, rolling back to last successful release
history.go:52: [debug] getting history for release infra
rollback.go:60: [debug] preparing rollback of infra
rollback.go:108: [debug] rolling back infra (current: v2, target: v1)
rollback.go:67: [debug] creating rolled back release for infra
rollback.go:73: [debug] performing rollback of infra
client.go:163: [debug] checking 53 resources for changes

Helm is trying to locate a newly created resource in the earlier release's state, which it should not do. As a result, the upgrade fails and Helm rolls the release back to the last stable state.

client.go:184: [debug] Created a new IngressClass called "nginx" in

upgrade.go:293: [debug] warning: Upgrade "infra" failed: no IngressClass with the name "nginx" found

Helm version: v3.1.3
