Description
Is this a BUG REPORT or FEATURE REQUEST?:
BUG
What happened:
Kubelet periodically goes into an error state and causes errors with our storage layer (Ceph shared filesystem). After cleaning out the orphaned pod directory, things eventually right themselves.
- Workaround:
rmdir /var/lib/kubelet/pods/*/volumes/*rook/*
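A rough sketch of how that cleanup could be scripted (assuming kubectl is usable from the node and the rook volume path matches the one above; only directories whose pod UID the API server no longer knows about are touched):
# sketch: remove leftover rook volume dirs for pod UIDs no longer known to the API server
known_uids=$(kubectl get pods --all-namespaces -o jsonpath='{.items[*].metadata.uid}')
for dir in /var/lib/kubelet/pods/*/; do
  uid=$(basename "$dir")
  echo "$known_uids" | grep -q "$uid" || rmdir "$dir"volumes/*rook*/* 2>/dev/null
done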
What you expected to happen:
Kubelet should intelligently deal with orphaned pods. Cleaning a stale directory manually should not be required.
How to reproduce it (as minimally and precisely as possible):
Using rook-0.7.0 (this isn't a rook problem as far as I can tell, but this is how we're reproducing it):
kubectl create -f rook-operator.yaml
kubectl create -f rook-cluster.yaml
kubectl create -f rook-filesystem.yaml
Mount/write to the shared filesystem and monitor /var/log/messages for the following:
kubelet: E0309 16:46:30.429770 3112 kubelet_volumes.go:128] Orphaned pod "2815f27a-219b-11e8-8a2a-ec0d9a3a445a" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
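As an alternative to tailing /var/log/messages, something like this should surface the same message on systemd hosts (just a convenience, not part of the repro):
journalctl -u kubelet -f | grep -i "orphaned pod"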
Anything else we need to know?:
This looks identical to the following: #45464 but for a different plugin.
Environment:
- Kubernetes version (use kubectl version):
  Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T12:22:21Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
  Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: Bare-metal private cloud
- OS (e.g. from /etc/os-release): Red Hat Enterprise Linux Server release 7.4 (Maipo)
- Kernel (e.g. uname -a): Linux 4.4.115-1.el7.elrepo.x86_64 #1 SMP Sat Feb 3 20:11:41 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: kubeadm
Activity
mlmhl commented on Mar 10, 2018
/sig storage
benceszikora commented on Apr 10, 2018
I have the same issue, except that I can't even remove that directory:
rmdir: failed to remove ‘/var/lib/kubelet/pods/7b383940-3cc7-11e8-a78b-b8ca3a70880c/volumes/rook.io~rook/backups’: Device or resource busy
patrickstjohn commented on Apr 11, 2018
@iliketosneeze When we've run into that issue, the only recourse, unfortunately, is to reboot the host. Once it comes back up, things seem to be in a clean state.
lknhd commented on Apr 22, 2018
I also experience this issue with Kubernetes v1.10.1. Manually deleting the directory solves the problem, but yes, kubelet should deal with orphaned pods intelligently.
@iliketosneeze maybe you should try unmounting the tmpfs from that directory first
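For example (paths per the error above; findmnt shows whether the directory is in fact still mounted, and a leftover mount is my guess at why rmdir reports "Device or resource busy"):
findmnt -rn -o TARGET | grep /var/lib/kubelet/pods
umount /var/lib/kubelet/pods/<pod-uid>/volumes/rook.io~rook/<volume-name>
rmdir /var/lib/kubelet/pods/<pod-uid>/volumes/rook.io~rook/<volume-name>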
bersace commented on Apr 23, 2018
Same here with minikube:
That looks like an internal pod. I guess there is nothing to remove.
Overv commented on May 22, 2018
We are seeing a similar issue with our own custom flexvolume in a Kubernetes 1.8.9 cluster. Is there any way to resolve this without restarting the host until there is an actual solution?
benceszikora commented on May 23, 2018
@lukmanulhakimd That did help with removing them, but then new volume mounts failed because the host was stuck in uninterruptible I/O. I had to cold cycle the nodes in the end.
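In case it helps anyone hitting the same wall, this is how I'd check whether processes are actually stuck in uninterruptible sleep (D state) before resorting to a cold cycle; just a diagnostic, not a fix:
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'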
michaelkoro commented on Jul 12, 2018
I'm having the same issue with Kubernetes 1.8.5, Rancher 1.6.13, docker-ce 17.03.02.
I think the kubelet should be able to recognize this problem, which obviously doesn't happen.
pvlltvk commented on Jul 27, 2018
We're also having this issue with Kubernetes 1.9.6, Docker 17.03.1-ce and vSphere Cloud Provider for persistent storage.
minhdanh commented on Aug 2, 2018
Having the same issue with Kubernetes 1.10.2, Docker 18.06.0-ce