Orphaned pod found - but volume paths are still present on disk #60987

Closed
@patrickstjohn

Description

Is this a BUG REPORT or FEATURE REQUEST?:
BUG

What happened:
Kubelet periodically goes into an error state and causes errors with our storage layer (Ceph shared filesystem). Upon cleaning out the orphaned pod directory, things eventually right themselves.

  • Workaround: rmdir /var/lib/kubelet/pods/*/volumes/*rook/*
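
A quick way to spot which pod directories are actually orphaned before removing anything (a sketch; assumes kubectl access to the cluster and that no pods are being rescheduled while you compare):

# Collect the UIDs of every pod the API server still knows about
kubectl get pods --all-namespaces -o jsonpath='{.items[*].metadata.uid}' | tr ' ' '\n' | sort > /tmp/live-uids
# Any pod directory on disk whose UID is not in that list is an orphan candidate
ls /var/lib/kubelet/pods | sort | comm -23 - /tmp/live-uids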

What you expected to happen:
Kubelet should intelligently deal with orphaned pods. Cleaning a stale directory manually should not be required.

How to reproduce it (as minimally and precisely as possible):
Using rook-0.7.0 (this isn't a Rook problem as far as I can tell, but this is how we're reproducing):
kubectl create -f rook-operator.yaml
kubectl create -f rook-cluster.yaml
kubectl create -f rook-filesystem.yaml

Mount/write to the shared filesystem and monitor /var/log/messages for the following:
kubelet: E0309 16:46:30.429770 3112 kubelet_volumes.go:128] Orphaned pod "2815f27a-219b-11e8-8a2a-ec0d9a3a445a" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
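
The aggregated message hides the individual paths; raising kubelet's log verbosity makes it print each offending volume path. A sketch for a systemd-managed kubelet (the drop-in file name is arbitrary and KUBELET_EXTRA_ARGS assumes a kubeadm-style unit; adjust to however your kubelet is launched):

# Add --v=4 so kubelet logs each orphaned path it finds, then restart
mkdir -p /etc/systemd/system/kubelet.service.d
cat <<'EOF' > /etc/systemd/system/kubelet.service.d/20-verbose.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--v=4"
EOF
systemctl daemon-reload && systemctl restart kubelet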

Anything else we need to know?:
This looks identical to #45464, but for a different plugin.

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T12:22:21Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration:
    Bare-metal private cloud

  • OS (e.g. from /etc/os-release):
    Red Hat Enterprise Linux Server release 7.4 (Maipo)

  • Kernel (e.g. uname -a):
    Linux 4.4.115-1.el7.elrepo.x86_64 #1 SMP Sat Feb 3 20:11:41 EST 2018 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:
    kubeadm

Activity

Label needs-sig added on Mar 9, 2018

mlmhl (Contributor) commented on Mar 10, 2018

/sig storage

Label sig/storage added and needs-sig removed on Mar 10, 2018

benceszikora commented on Apr 10, 2018

I have the same issue, except that I can't even remove that directory:
rmdir: failed to remove ‘/var/lib/kubelet/pods/7b383940-3cc7-11e8-a78b-b8ca3a70880c/volumes/rook.io~rook/backups’: Device or resource busy
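
When rmdir reports "Device or resource busy" the directory is usually still a live mount point rather than a plain directory. Something like this shows what is still mounted under the pod directory and which processes are holding it (a sketch; substitute your own pod UID, and fuser comes from psmisc):

# List mount entries still active under the pod directory
grep 7b383940-3cc7-11e8-a78b-b8ca3a70880c /proc/mounts
# Show processes with open files on the filesystem mounted at the volume path
fuser -vm /var/lib/kubelet/pods/7b383940-3cc7-11e8-a78b-b8ca3a70880c/volumes/rook.io~rook/backups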

patrickstjohn (Author) commented on Apr 11, 2018

@iliketosneeze When we've run into that issue, the only recourse, unfortunately, is to reboot the host. Once it comes back up, things seem to be in a clean state.

lknhd commented on Apr 22, 2018

I also experience this issue with Kubernetes v1.10.1. Manually deleting the directory solves the problem. But yes, kubelet should deal with orphaned pods intelligently.

@iliketosneeze maybe you should try unmounting the directory from tmpfs
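
Along those lines, unmounting whatever is still mounted under the orphaned pod's volumes usually lets the cleanup proceed (a sketch; <pod-uid> is a placeholder, sort -r unmounts nested mounts deepest-first, and a wedged network mount may need umount -l):

# Unmount everything still mounted under the orphaned pod's volumes, then remove the dirs
for m in $(grep /var/lib/kubelet/pods/<pod-uid>/volumes /proc/mounts | awk '{print $2}' | sort -r); do
  umount "$m"
done
rmdir /var/lib/kubelet/pods/<pod-uid>/volumes/*/*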

bersace commented on Apr 23, 2018

Same here with minikube:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.1", GitCommit:"d4ab47518836c750f9949b9e0d387f20fb92260b", GitTreeState:"clean", BuildDate:"2018-04-12T14:26:04Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-26T16:44:10Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
$ journalctl -f
Apr 23 12:37:26 minikube kubelet[2886]: E0423 12:37:26.781919    2886 kubelet_volumes.go:140] Orphaned pod "a08c2261-3eec-11e8-83b3-a0ea30334065" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Apr 23 12:37:28 minikube kubelet[2886]: E0423 12:37:28.789802    2886 kubelet_volumes.go:140] Orphaned pod "a08c2261-3eec-11e8-83b3-a0ea30334065" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
# find /var/lib/kubelet/pods/a08c2261-3eec-11e8-83b3-a0ea30334065/containers/kube-proxy/4ac98a82
/var/lib/kubelet/pods/a08c2261-3eec-11e8-83b3-a0ea30334065/containers/kube-proxy/4ac98a82
# ls /var/lib/kubelet/pods/a08c2261-3eec-11e8-83b3-a0ea30334065/volumes/
kubernetes.io~configmap  kubernetes.io~secret
$ 

That looks like an internal pod. I guess there is nothing to remove.

Overv commented on May 22, 2018

We are seeing a similar issue with our own custom flexvolume in a Kubernetes 1.8.9 cluster. Is there any way to resolve this without restarting the host until there is an actual solution?

benceszikora commented on May 23, 2018

@lukmanulhakimd That did help with removing them, but then new volume mounts failed as the host was stuck in uninterruptible I/O. I had to cold-cycle the nodes in the end.
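
Before cold-cycling a node it can be worth confirming the hang really is uninterruptible I/O: processes blocked on a dead network mount sit in state D and cannot be killed (a sketch):

# List processes in uninterruptible sleep (state D), typically stuck on the dead mount
ps -eo state,pid,wchan:32,cmd | awk '$1 == "D"'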

michaelkoro commented on Jul 12, 2018

I'm having the same issue with Kubernetes 1.8.5, Rancher 1.6.13, and docker-ce 17.03.02.
I think kubelet should be able to recognize this problem, which obviously doesn't happen.

pvlltvk commented on Jul 27, 2018

We're also having this issue with Kubernetes 1.9.6, Docker 17.03.1-ce and vSphere Cloud Provider for persistent storage.

minhdanh commented on Aug 2, 2018

Having the same issue with Kubernetes 1.10.2 and Docker 18.06.0-ce.

175 remaining items not shown


Metadata

Assignees: no one assigned

Labels: needs-triage, sig/node, sig/storage


Participants: @masterkain, @redbaron, @lisa, @calder, @owend
