Description
Looks like kube-router is affected by this upstream issue. It is related to pod IPs not being cleaned up when a container fails to start (specifically when using host-local IPAM).
Here is an example of a Datadog Agent Pod failing to start due to this bug.
Apr 11 00:52:45 srv-8d-01-a06 kubelet: E0411 00:52:45.083738 19237 pod_workers.go:186] Error syncing pod b58c515c-3d18-11e8-830a-f8db888f5640 ("datadog-agent-4dw8f_default(b58c515c-3d18-11e8-830a-f8db888f5640)"), skipping: failed to "CreatePodSandbox" for "datadog-agent-4dw8f_default(b58c515c-3d18-11e8-830a-f8db888f5640)" with CreatePodSandboxError: "CreatePodSandbox for pod \"datadog-agent-4dw8f_default(b58c515c-3d18-11e8-830a-f8db888f5640)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"datadog-agent-4dw8f_default\" network: failed to allocate for range 0: no IP addresses available in range set: 172.22.5.1-172.22.5.254"
If you look in /var/lib/cni/networks/kubernetes, you will see a file for every IP in your CIDR (in this case a /24), effectively stopping any new containers from being launched because the node is out of IP space.
/var/lib/cni/networks/kubernetes
(00:58 root@srv-8d-01-a06 kubernetes) > ls -al|wc -l
258
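To confirm those reservations are actually stale, one rough check is to compare the files in that directory against the pod IPs the API server reports for the node. This is just a sketch: it assumes kubectl is usable from the node, that $(hostname) matches the registered node name, and the IP column index in the wide output can vary by kubectl version.

# per-IP reservation files kept by host-local IPAM
ls /var/lib/cni/networks/kubernetes

# pod IPs the API server knows about for this node
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName="$(hostname)" | awk 'NR>1 {print $7}'

Any IP that shows up as a file but not in the kubectl output is a leaked reservation.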
After stopping kubelet and docker, removing all the files in /var/lib/cni/networks/kubernetes, and then starting docker and kubelet, the pods successfully started.
This is a major problem as it can cause a host to become unusable for any new container launches.
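For reference, a minimal sketch of the workaround above, assuming a systemd host where docker and kubelet run as the usual systemd units (unit names and the "kubernetes" network name may differ on your setup):

systemctl stop kubelet docker
# remove the stale host-local reservations for the "kubernetes" network
rm -f /var/lib/cni/networks/kubernetes/*
systemctl start docker kubelet

Note that this wipes the reservations for healthy pods as well, which is why the runtime is stopped first; host-local writes fresh reservation files as the pod sandboxes are recreated.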
Activity
iMartyn commented on Apr 18, 2018
I faced exactly this issue when switching from canal to kube-router. As Killcity mentioned, removing the contents of the /var/lib/cni/networks/kubernetes/ folder worked a charm, so thanks for the hint!
t3hmrman commented on Sep 16, 2018
Just ran into this as well. In my case I use containerd, but a similar set of commands got the issue fixed; this got my pods up and running again, and they all have IPs.
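The commands themselves were not captured above; on a containerd node the analogous cleanup would presumably look something like this (a sketch, assuming systemd units and the default host-local state directory):

systemctl stop kubelet containerd
rm -f /var/lib/cni/networks/kubernetes/*
systemctl start containerd kubelet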
trevex commented on Oct 2, 2018
Encountered the same issue with kube-router on k8s 1.11.2. We use a pxebooted CoreOS, and /var/run/docker is therefore not persisted between reboots, so I expect to run into this issue frequently. Is there a potential fix in the works, or is https://github.com/jsenon/api-cni-cleanup basically the only long-term solution?
roffe commented on Oct 2, 2018
This is a CNI bug and not specific to kube-router imho; I've had it with calico & weave-net as well. The fix has always been deleting the files in /var/lib/cni/networks.
roffe commented on Oct 2, 2018
Usually it's some other problem leading up to the leases being filled, so no new ones can be allocated.
roffe commented on Oct 2, 2018
One way to trigger it with kube-router is to disable IPv6 on the nodes and use CNI < 0.7. Every IP allocation fails, /var/lib/cni/networks fills up, and no pods can be scheduled.
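If you suspect this trigger, two quick checks (a sketch; /opt/cni/bin is an assumed install path and the VERSION invocation depends on the plugin build):

# 1 means IPv6 is disabled on the node
sysctl net.ipv6.conf.all.disable_ipv6

# plugins built from containernetworking/plugins answer the CNI VERSION op
# and print the spec versions they support
CNI_COMMAND=VERSION /opt/cni/bin/host-local </dev/null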
roffe commented on Oct 2, 2018
Afaik Kops < 1.10 all use an old version of the CNI plugins, and I think kubeadm and kubespray do as well.
mazzystr commented on Mar 12, 2019
I have the same bug using podman and cni. Thanks @iMartyn for the good workaround.
sahilsharma-bb commented on Feb 12, 2020
Folks, I tried to implement this but ran into some issues. I ran it as a DaemonSet and commented out the CronJob part (as suggested by @jsenon in his README), but I assume the DaemonSet should run in the kube-system namespace, which was missing from the deployment.yaml file.
I did manage to run it as a DaemonSet with a ClusterRole, ClusterRoleBinding and ServiceAccount, and it ran fine but was not deleting the stale IP files.
The pod logs showed it running, but when I hit http://:/cleanup it didn't delete the CNI files, and I don't know why.
Can someone share their experience?
K8s version: 1.11
Set-up by Kops on AWS EC2 nodes
OS: Ubuntu:16.04
aauren commented on Apr 24, 2020
Closing this as this isn't really an issue with kube-router.