
failed to allocate for range 0: no IP addresses available in range set #383

@killcity

Description

Looks like kube-router is affected by this upstream issue. It is caused by pod IPs not being cleaned up when a container fails to start (specifically when using the host-local IPAM plugin).

Here is an example of a Datadog Agent pod failing to start due to this bug:

Apr 11 00:52:45 srv-8d-01-a06 kubelet: E0411 00:52:45.083738 19237 pod_workers.go:186] Error syncing pod b58c515c-3d18-11e8-830a-f8db888f5640 ("datadog-agent-4dw8f_default(b58c515c-3d18-11e8-830a-f8db888f5640)"), skipping: failed to "CreatePodSandbox" for "datadog-agent-4dw8f_default(b58c515c-3d18-11e8-830a-f8db888f5640)" with CreatePodSandboxError: "CreatePodSandbox for pod \"datadog-agent-4dw8f_default(b58c515c-3d18-11e8-830a-f8db888f5640)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"datadog-agent-4dw8f_default\" network: failed to allocate for range 0: no IP addresses available in range set: 172.22.5.1-172.22.5.254"

If you look in /var/lib/cni/networks/kubernetes you will see a file for every IP in your CIDR range (in this case a /24), effectively stopping any new containers from being launched because the node is out of IP space.

/var/lib/cni/networks/kubernetes
(00:58 root@srv-8d-01-a06 kubernetes) > ls -al|wc -l
258
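
For context, a minimal illustration of what host-local leaves on disk (the IP and container ID below are made up): each file in that directory is named after an allocated address, and its first line is the ID of the container (sandbox) holding the lease.

# Each file is named after an allocated IP; its first line is the ID of the
# container that "owns" the address. IP and ID here are illustrative.
$ cat /var/lib/cni/networks/kubernetes/172.22.5.10
3f8a62c9e1d4...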

After stopping kubelet and Docker, removing all the files in /var/lib/cni/networks/kubernetes, and then starting Docker and kubelet, the pods started successfully.
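
As a minimal sketch, the workaround above looks like this (assuming Docker as the runtime and kube-router's default network name "kubernetes"):

# Stop the components that write to the host-local IPAM state.
systemctl stop kubelet
systemctl stop docker
# Clear the stale leases (assumes no pods should currently be running).
rm -f /var/lib/cni/networks/kubernetes/*
systemctl start docker
systemctl start kubelet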

This is a major problem, as it can render a host unable to launch any new containers.

Activity

iMartyn commented on Apr 18, 2018

I faced exactly this issue when switching from Canal to kube-router. As @killcity mentioned, removing the contents of the /var/lib/cni/networks/kubernetes/ folder worked a charm, so thanks for the hint!

t3hmrman commented on Sep 16, 2018

Just ran into this as well. In my case I use containerd, but the following commands fixed the issue:

# systemctl stop kubelet
# systemctl stop containerd
# ls /var/lib/cni/networks/
k8s-pod-network  mynet
# mv /var/lib/cni/networks /var/lib/cni/networks.bak
# mkdir /var/lib/cni/networks
# systemctl start containerd
# systemctl start kubelet

This got my pods up and running again, and they all have IPs.

trevex commented on Oct 2, 2018

Encountered the same issue with kube-router on k8s 1.11.2. We run PXE-booted CoreOS, so /var/run/docker is not persisted between reboots, and I therefore expect to hit this issue frequently. Is there a potential fix in the works, or is https://github.com/jsenon/api-cni-cleanup basically the only long-term solution?
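
One possible stop-gap until a proper fix lands is a periodic garbage-collection script along these lines. This is only a sketch, not something from this thread: it assumes Docker as the runtime and the network name "kubernetes"; a containerd setup would check via crictl instead.

#!/usr/bin/env bash
# Sketch: remove host-local leases whose owning container no longer exists.
set -euo pipefail
NET_DIR=/var/lib/cni/networks/kubernetes   # assumed network name

for lease in "$NET_DIR"/*; do
  ip=$(basename "$lease")
  # Skip host-local's bookkeeping files (lock, last_reserved_ip.*).
  [[ "$ip" =~ ^[0-9]+(\.[0-9]+){3}$ ]] || continue
  cid=$(head -n1 "$lease")                 # first line is the container ID
  if ! docker inspect "$cid" >/dev/null 2>&1; then
    echo "removing stale lease $ip (container $cid is gone)"
    rm -f -- "$lease"
  fi
done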

roffe commented on Oct 2, 2018

This is a CNI bug, not specific to kube-router, imho; I've had it with Calico and weave-net as well. The fix has always been deleting the files in /var/lib/cni/networks.

roffe commented on Oct 2, 2018

Usually it's some other problem that leads to the leases being exhausted, so that no new IPs can be allocated.

roffe commented on Oct 2, 2018

One way to trigger it with kube-router is to disable IPv6 on the nodes while using CNI plugins < 0.7: every IP allocation fails, /var/lib/cni/networks fills up, and no pods can be scheduled.
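
If you want to verify that trigger on a test node, disabling IPv6 is typically done via sysctl (illustrative only; don't do this on a production node):

# Disable IPv6 on all interfaces; with CNI plugins older than 0.7 this
# reportedly made every allocation fail while still leaking a host-local lease.
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1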

roffe commented on Oct 2, 2018

Afaik, kops < 1.10 uses an old version of the CNI plugins, and I think kubeadm and kubespray do as well.
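
To check what a node actually ships, CNI plugins report their supported spec versions when invoked with CNI_COMMAND=VERSION (this assumes the default binary path /opt/cni/bin):

# Ask the host-local plugin which CNI spec versions it supports.
CNI_COMMAND=VERSION /opt/cni/bin/host-local
# Expected output is a small JSON blob listing supportedVersions.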

mazzystr commented on Mar 12, 2019

I have the same bug using podman and CNI. Thanks @iMartyn for the good workaround.

sahilsharma-bb commented on Feb 12, 2020

Folks, I tried to implement this but ran into some issues. I ran it as a DaemonSet and commented out the CronJob part (as suggested by @jsenon in his README), but I assume the DaemonSet should run in the kube-system namespace, which was missing from the deployment.yaml file.
I managed to run it as a DaemonSet with a ClusterRole, ClusterRoleBinding, and ServiceAccount, and it was running fine but was not deleting the stale IP files.
The pod logs showed it was running, but when I hit the http://:/cleanup endpoint it didn't delete the CNI files. I don't know why.
Can anyone share their experiences?
K8s version: 1.11
Set up by kops on AWS EC2 nodes
OS: Ubuntu 16.04

aauren commented on Apr 24, 2020

Closing this as this isn't really an issue with kube-router.

