Kubernetes-cni issue with 1.9.0 - no ip address available in range #57280

@ieugen

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
I upgraded my cluster from 1.8.5 to 1.9.0: I upgraded the packages on my system and then followed the kubeadm upgrade instructions. The upgrade itself went well. I then changed the docker logging config and restarted docker. After that, kube-dns did not start and I ended up with two kube-dns instances. My own deployments did not start either.

What you expected to happen:
I expected the cluster to keep working normally after the upgrade.

How to reproduce it (as minimally and precisely as possible):

  1. Create new cluster with 1.8.5.
  2. Update packages to 1.9.0
  3. kubeadm upgrade plan
  4. kubeadm upgrade apply 1.9.0
  5. weep.

Anything else we need to know?:

The main issue is that pods cannot get an IP address:

 E1216 23:50:16.116098   28152 pod_workers.go:186] Error syncing pod 6f5b9673-e2b5-11e7-a0f5-001e67d35991 ("kube-dns-6f4fd4bdf-xrj4w_kube-system(6f5b9673-e2b5-11e7-a0f5-001e67d35991)"), skipping: failed to "CreatePodSandbox" for "kube-dns-6f4fd4bdf-xrj4w_kube-system(6f5b9673-e2b5-11e7-a0f5-001e67d35991)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-6f4fd4bdf-xrj4w_kube-system(6f5b9673-e2b5-11e7-a0f5-001e67d35991)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"kube-dns-6f4fd4bdf-xrj4w_kube-system\" network: failed to allocate for range 0: no IP addresses available in range set: 10.244.0.1-10.244.0.254"
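For reference, the size of the range named in that error can be worked out with a few lines of shell. This is only a sketch; `ip_to_int` and `range_size` are hypothetical helper names, not part of any CNI tooling.

```shell
#!/bin/sh
# Convert a dotted IPv4 address to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    # Unquoted on purpose: split the address on dots into $1..$4.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Number of addresses in an inclusive "first-last" range string.
range_size() {
    first=${1%-*}
    last=${1#*-}
    echo $(( $(ip_to_int "$last") - $(ip_to_int "$first") + 1 ))
}
```

Calling `range_size 10.244.0.1-10.244.0.254` prints 254, so the node can hand out at most 254 addresses before the allocator reports the range as exhausted.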

I checked /var/lib/cni/networks and found that all the addresses in the range are taken:

/var/lib/cni/networks# ls cbr0/
10.244.0.10   10.244.0.123  10.244.0.147  10.244.0.170	10.244.0.194  10.244.0.217  10.244.0.240  10.244.0.35  10.244.0.59  10.244.0.82
10.244.0.100  10.244.0.124  10.244.0.148  10.244.0.171	10.244.0.195  10.244.0.218  10.244.0.241  10.244.0.36  10.244.0.6   10.244.0.83
10.244.0.101  10.244.0.125  10.244.0.149  10.244.0.172	10.244.0.196  10.244.0.219  10.244.0.242  10.244.0.37  10.244.0.60  10.244.0.84
10.244.0.102  10.244.0.126  10.244.0.15   10.244.0.173	10.244.0.197  10.244.0.22   10.244.0.243  10.244.0.38  10.244.0.61  10.244.0.85
10.244.0.103  10.244.0.127  10.244.0.150  10.244.0.174	10.244.0.198  10.244.0.220  10.244.0.244  10.244.0.39  10.244.0.62  10.244.0.86
10.244.0.104  10.244.0.128  10.244.0.151  10.244.0.175	10.244.0.199  10.244.0.221  10.244.0.245  10.244.0.4   10.244.0.63  10.244.0.87
10.244.0.105  10.244.0.129  10.244.0.152  10.244.0.176	10.244.0.2    10.244.0.222  10.244.0.246  10.244.0.40  10.244.0.64  10.244.0.88
10.244.0.106  10.244.0.13   10.244.0.153  10.244.0.177	10.244.0.20   10.244.0.223  10.244.0.247  10.244.0.41  10.244.0.65  10.244.0.89
10.244.0.107  10.244.0.130  10.244.0.154  10.244.0.178	10.244.0.200  10.244.0.224  10.244.0.248  10.244.0.42  10.244.0.66  10.244.0.9
10.244.0.108  10.244.0.131  10.244.0.155  10.244.0.179	10.244.0.201  10.244.0.225  10.244.0.249  10.244.0.43  10.244.0.67  10.244.0.90
10.244.0.109  10.244.0.132  10.244.0.156  10.244.0.18	10.244.0.202  10.244.0.226  10.244.0.25   10.244.0.44  10.244.0.68  10.244.0.91
10.244.0.11   10.244.0.133  10.244.0.157  10.244.0.180	10.244.0.203  10.244.0.227  10.244.0.250  10.244.0.45  10.244.0.69  10.244.0.92
10.244.0.110  10.244.0.134  10.244.0.158  10.244.0.181	10.244.0.204  10.244.0.228  10.244.0.251  10.244.0.46  10.244.0.7   10.244.0.93
10.244.0.111  10.244.0.135  10.244.0.159  10.244.0.182	10.244.0.205  10.244.0.229  10.244.0.252  10.244.0.47  10.244.0.70  10.244.0.94
10.244.0.112  10.244.0.136  10.244.0.16   10.244.0.183	10.244.0.206  10.244.0.23   10.244.0.253  10.244.0.48  10.244.0.71  10.244.0.95
10.244.0.113  10.244.0.137  10.244.0.160  10.244.0.184	10.244.0.207  10.244.0.230  10.244.0.254  10.244.0.49  10.244.0.72  10.244.0.96
10.244.0.114  10.244.0.138  10.244.0.161  10.244.0.185	10.244.0.208  10.244.0.231  10.244.0.26   10.244.0.5   10.244.0.73  10.244.0.97
10.244.0.115  10.244.0.139  10.244.0.162  10.244.0.186	10.244.0.209  10.244.0.232  10.244.0.27   10.244.0.50  10.244.0.74  10.244.0.98
10.244.0.116  10.244.0.14   10.244.0.163  10.244.0.187	10.244.0.21   10.244.0.233  10.244.0.28   10.244.0.51  10.244.0.75  10.244.0.99
10.244.0.117  10.244.0.140  10.244.0.164  10.244.0.188	10.244.0.210  10.244.0.234  10.244.0.29   10.244.0.52  10.244.0.76  last_reserved_ip.0
10.244.0.118  10.244.0.141  10.244.0.165  10.244.0.189	10.244.0.211  10.244.0.235  10.244.0.3	  10.244.0.53  10.244.0.77
10.244.0.119  10.244.0.142  10.244.0.166  10.244.0.19	10.244.0.212  10.244.0.236  10.244.0.30   10.244.0.54  10.244.0.78
10.244.0.12   10.244.0.143  10.244.0.167  10.244.0.190	10.244.0.213  10.244.0.237  10.244.0.31   10.244.0.55  10.244.0.79
10.244.0.120  10.244.0.144  10.244.0.168  10.244.0.191	10.244.0.214  10.244.0.238  10.244.0.32   10.244.0.56  10.244.0.8
10.244.0.121  10.244.0.145  10.244.0.169  10.244.0.192	10.244.0.215  10.244.0.239  10.244.0.33   10.244.0.57  10.244.0.80
10.244.0.122  10.244.0.146  10.244.0.17   10.244.0.193	10.244.0.216  10.244.0.24   10.244.0.34   10.244.0.58  10.244.0.81
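A rough way to check whether these reservations still belong to live containers is sketched below. It assumes the directory holds one file per reserved address and that the first line of each file is the ID of the container that owns it; both are assumptions about the IPAM store that should be verified on the node. The function names and the live-ID list file are hypothetical.

```shell
#!/bin/sh
# Count files whose names consist only of digits and dots and contain
# at least three dots (so last_reserved_ip.0 is not counted).
count_reserved_ips() {
    dir=$1
    n=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        case "$(basename "$f")" in
            *[!0-9.]*) ;;                 # has other characters: skip
            *.*.*.*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# Print reserved IPs whose recorded owner ID is not in the live-ID list.
# $1: reservation directory, $2: file with one live container ID per line.
stale_reservations() {
    dir=$1
    live_ids=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        case "$name" in
            *[!0-9.]*) continue ;;
        esac
        # Assumption: the first line of the file is the owning container ID.
        id=$(head -n 1 "$f" | tr -d '\r\n')
        if ! grep -qxF -- "$id" "$live_ids"; then
            echo "$name"
        fi
    done
}
```

On a docker-based node the live-ID list could be produced with `docker ps -q --no-trunc > /tmp/live_ids`, followed by `stale_reservations /var/lib/cni/networks/cbr0 /tmp/live_ids`. The sketch only reports candidates; it deletes nothing.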

Flannel also keeps creating files in /var/lib/cni/flannel without stopping:

/var/lib/cni/flannel#  ls | wc ; date 
   1207    1207   78455
Sat Dec 16 23:53:25 UTC 2017
root@staging:/var/lib/cni/flannel#  ls | wc ; date 
   1212    1212   78780
Sat Dec 16 23:53:27 UTC 2017
root@staging:/var/lib/cni/flannel#  ls | wc ; date 
   1214    1214   78910
Sat Dec 16 23:53:28 UTC 2017
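The samples above go from 1207 to 1214 entries in about three seconds. A small sketch for repeating that measurement over a chosen interval follows; `count_entries` and `sample_growth` are hypothetical names, and `find -mindepth/-maxdepth` is assumed to be available (GNU or BSD find).

```shell
#!/bin/sh
# Count the direct children of a directory.
count_entries() {
    # Arithmetic expansion strips the padding some wc implementations add.
    echo $(( $(find "$1" -mindepth 1 -maxdepth 1 | wc -l) ))
}

# Print how many entries were added (or removed, if negative)
# in a directory over the given number of seconds.
sample_growth() {
    dir=$1
    interval=$2
    before=$(count_entries "$dir")
    sleep "$interval"
    after=$(count_entries "$dir")
    echo $((after - before))
}
```

For example, `sample_growth /var/lib/cni/flannel 10` prints the number of files created in ten seconds; a value that stays positive on an idle node suggests sandbox creation is being retried in a loop.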

Environment:

  • Kubernetes version (use kubectl version):
kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T20:55:30Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: bare metal, single node
  • OS (e.g. from /etc/os-release):
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian

For some reason the system ends up with two kube-dns pods under deploy/kube-dns (one per ReplicaSet), and the new one stays in ContainerCreating:

 kubectl -n kube-system get all 
NAME                 DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
ds/kube-flannel-ds   1         1         1         1            1           beta.kubernetes.io/arch=amd64   33m
ds/kube-proxy        1         1         1         1            1           <none>                          38m

NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/kube-dns               1         2         1            1           38m
deploy/kubernetes-dashboard   1         1         1            1           30m

NAME                                DESIRED   CURRENT   READY     AGE
rs/kube-dns-545bc4bfd4              1         1         1         38m
rs/kube-dns-6f4fd4bdf               1         1         0         4m
rs/kubernetes-dashboard-79ddfdc44   1         1         1         30m

NAME                 DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
ds/kube-flannel-ds   1         1         1         1            1           beta.kubernetes.io/arch=amd64   33m
ds/kube-proxy        1         1         1         1            1           <none>                          38m

NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/kube-dns               1         2         1            1           38m
deploy/kubernetes-dashboard   1         1         1            1           30m

NAME                                DESIRED   CURRENT   READY     AGE
rs/kube-dns-545bc4bfd4              1         1         1         38m
rs/kube-dns-6f4fd4bdf               1         1         0         4m
rs/kubernetes-dashboard-79ddfdc44   1         1         1         30m

NAME                                           READY     STATUS              RESTARTS   AGE
po/etcd-staging.gr8pi.net                      1/1       Running             0          3m
po/kube-apiserver-staging.gr8pi.net            1/1       Running             0          4m
po/kube-controller-manager-staging.gr8pi.net   1/1       Running             0          4m
po/kube-dns-545bc4bfd4-xs7zw                   3/3       Running             6          38m
po/kube-dns-6f4fd4bdf-wvmgp                    0/3       ContainerCreating   0          2m
po/kube-flannel-ds-8nb76                       1/1       Running             3          33m
po/kube-proxy-fvtzr                            1/1       Running             0          4m
po/kube-scheduler-staging.gr8pi.net            1/1       Running             0          4m
po/kubernetes-dashboard-79ddfdc44-p6sfv        1/1       Running             4          30m

NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
svc/kube-dns               ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP   38m
svc/kubernetes-dashboard   NodePort    10.103.193.125   <none>        80:30940/TCP    30m
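To pick the stuck pods out of output like this, a small filter can compare the two halves of the READY column. This is a sketch only; `not_ready` is a hypothetical helper that reads the table on standard input and relies on the column layout shown above.

```shell
#!/bin/sh
# Print name and status of rows whose READY column is "x/y" with x != y.
# Rows from other tables (ds, deploy, rs, svc) and headers have no "/"
# in their second column and are ignored.
not_ready() {
    awk 'split($2, r, "/") == 2 && r[1] != r[2] { print $1, $3 }'
}
```

Piping the output in, for example `kubectl -n kube-system get all | not_ready`, would list only `po/kube-dns-6f4fd4bdf-wvmgp ContainerCreating` for the state captured here.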

Activity

ieugen (Author) commented on Dec 16, 2017

/sig network

ieugen (Author) commented on Dec 17, 2017

This also happens after resetting and re-installing the cluster:

kubeadm reset, then rm -rf /var/lib/cni/flannel/*, rm -rf /var/lib/cni/networks/cbr0/* and ip link delete cni0 flannel.1.

It seems I can't create a cluster with 1.9.0.
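The cleanup in this comment can be written out as a small script. The sketch below defaults to a dry run that only prints each command. It assumes each interface needs its own `ip link delete` call and uses `find ... -delete` in place of the `rm -rf dir/*` globs; `run` and `reset_cni_state` are hypothetical names.

```shell
#!/bin/sh
# Dry run by default: set DRY_RUN=0 to actually execute the commands.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Reset the node and clear the CNI state named in the comment above.
reset_cni_state() {
    run kubeadm reset
    # Remove the contents of each directory but keep the directory itself.
    run find /var/lib/cni/flannel -mindepth 1 -delete
    run find /var/lib/cni/networks/cbr0 -mindepth 1 -delete
    # One interface per call.
    run ip link delete cni0
    run ip link delete flannel.1
}
```

Running `reset_cni_state` with the default settings prints the five commands prefixed with `+`. With `DRY_RUN=0` it executes them as root and does not stop if one of them fails.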

ieugen (Author) commented on Dec 17, 2017

I can confirm that downgrading to 1.8.5 makes the cluster work again. I ran kubeadm reset, downgraded the packages, and ran kubeadm init.

xiangpengzhao (Contributor) commented on Dec 18, 2017

> For some reason the system ends up with 2 deploy/kube-dns

Some discussion here: #55720

squeed (Contributor) commented on Dec 18, 2017

Interesting; taking a look. I wonder if it is the old kubenet GC code no longer working.

ieugen (Author) commented on Dec 18, 2017

Thanks @xiangpengzhao. #55720 explains the duplication. I'm not sure it explains the bug where containers do not start and the IPs get exhausted.

ghost commented on Dec 20, 2017

Same issue on Ubuntu 16.04.1 LTS

pytimer (Contributor) commented on Dec 22, 2017

Same issue on CentOS 7.2

xiangpengzhao (Contributor) commented on Dec 22, 2017

/cc @kubernetes/sig-network-bugs

mehrdadpfg commented on Dec 26, 2017

Same issue here on Debian 9 with kube-router.

Labels

kind/bug: Categorizes issue or PR as related to a bug.
sig/network: Categorizes an issue or PR as relevant to SIG Network.

Kubernetes-cni issue with 1.9.0 - no ip address available in range · Issue #57280 · kubernetes/kubernetes