Description
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
I've upgraded my cluster from 1.8.5 -> 1.9.0: I upgraded the packages on my system and then followed the kubeadm upgrade instructions. The upgrade itself went well. I then changed the Docker logging config and restarted Docker. After that, kube-dns did not start, and I had two of them. My other deployments did not start either.
What you expected to happen:
I expected my cluster to keep working normally after the upgrade.
How to reproduce it (as minimally and precisely as possible):
- Create new cluster with 1.8.5.
- Update packages to 1.9.0 (see the command sketch after this list)
- kubeadm upgrade plan
- kubeadm upgrade apply 1.9.0
- weep.
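For reference, roughly the commands behind those steps on Debian (the apt package pins are from memory, so treat them as an assumption rather than exact versions):
apt-get update
apt-get install -y kubeadm=1.9.0-00 kubelet=1.9.0-00 kubectl=1.9.0-00
kubeadm upgrade plan
kubeadm upgrade apply 1.9.0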
Anything else we need to know?:
The main issue is that new pods cannot get an IP address:
E1216 23:50:16.116098 28152 pod_workers.go:186] Error syncing pod 6f5b9673-e2b5-11e7-a0f5-001e67d35991 ("kube-dns-6f4fd4bdf-xrj4w_kube-system(6f5b9673-e2b5-11e7-a0f5-001e67d35991)"), skipping: failed to "CreatePodSandbox" for "kube-dns-6f4fd4bdf-xrj4w_kube-system(6f5b9673-e2b5-11e7-a0f5-001e67d35991)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-6f4fd4bdf-xrj4w_kube-system(6f5b9673-e2b5-11e7-a0f5-001e67d35991)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"kube-dns-6f4fd4bdf-xrj4w_kube-system\" network: failed to allocate for range 0: no IP addresses available in range set: 10.244.0.1-10.244.0.254"
I then checked /var/lib/cni/networks and found that every address in the range is already reserved:
/var/lib/cni/networks# ls cbr0/
10.244.0.10 10.244.0.123 10.244.0.147 10.244.0.170 10.244.0.194 10.244.0.217 10.244.0.240 10.244.0.35 10.244.0.59 10.244.0.82
10.244.0.100 10.244.0.124 10.244.0.148 10.244.0.171 10.244.0.195 10.244.0.218 10.244.0.241 10.244.0.36 10.244.0.6 10.244.0.83
10.244.0.101 10.244.0.125 10.244.0.149 10.244.0.172 10.244.0.196 10.244.0.219 10.244.0.242 10.244.0.37 10.244.0.60 10.244.0.84
10.244.0.102 10.244.0.126 10.244.0.15 10.244.0.173 10.244.0.197 10.244.0.22 10.244.0.243 10.244.0.38 10.244.0.61 10.244.0.85
10.244.0.103 10.244.0.127 10.244.0.150 10.244.0.174 10.244.0.198 10.244.0.220 10.244.0.244 10.244.0.39 10.244.0.62 10.244.0.86
10.244.0.104 10.244.0.128 10.244.0.151 10.244.0.175 10.244.0.199 10.244.0.221 10.244.0.245 10.244.0.4 10.244.0.63 10.244.0.87
10.244.0.105 10.244.0.129 10.244.0.152 10.244.0.176 10.244.0.2 10.244.0.222 10.244.0.246 10.244.0.40 10.244.0.64 10.244.0.88
10.244.0.106 10.244.0.13 10.244.0.153 10.244.0.177 10.244.0.20 10.244.0.223 10.244.0.247 10.244.0.41 10.244.0.65 10.244.0.89
10.244.0.107 10.244.0.130 10.244.0.154 10.244.0.178 10.244.0.200 10.244.0.224 10.244.0.248 10.244.0.42 10.244.0.66 10.244.0.9
10.244.0.108 10.244.0.131 10.244.0.155 10.244.0.179 10.244.0.201 10.244.0.225 10.244.0.249 10.244.0.43 10.244.0.67 10.244.0.90
10.244.0.109 10.244.0.132 10.244.0.156 10.244.0.18 10.244.0.202 10.244.0.226 10.244.0.25 10.244.0.44 10.244.0.68 10.244.0.91
10.244.0.11 10.244.0.133 10.244.0.157 10.244.0.180 10.244.0.203 10.244.0.227 10.244.0.250 10.244.0.45 10.244.0.69 10.244.0.92
10.244.0.110 10.244.0.134 10.244.0.158 10.244.0.181 10.244.0.204 10.244.0.228 10.244.0.251 10.244.0.46 10.244.0.7 10.244.0.93
10.244.0.111 10.244.0.135 10.244.0.159 10.244.0.182 10.244.0.205 10.244.0.229 10.244.0.252 10.244.0.47 10.244.0.70 10.244.0.94
10.244.0.112 10.244.0.136 10.244.0.16 10.244.0.183 10.244.0.206 10.244.0.23 10.244.0.253 10.244.0.48 10.244.0.71 10.244.0.95
10.244.0.113 10.244.0.137 10.244.0.160 10.244.0.184 10.244.0.207 10.244.0.230 10.244.0.254 10.244.0.49 10.244.0.72 10.244.0.96
10.244.0.114 10.244.0.138 10.244.0.161 10.244.0.185 10.244.0.208 10.244.0.231 10.244.0.26 10.244.0.5 10.244.0.73 10.244.0.97
10.244.0.115 10.244.0.139 10.244.0.162 10.244.0.186 10.244.0.209 10.244.0.232 10.244.0.27 10.244.0.50 10.244.0.74 10.244.0.98
10.244.0.116 10.244.0.14 10.244.0.163 10.244.0.187 10.244.0.21 10.244.0.233 10.244.0.28 10.244.0.51 10.244.0.75 10.244.0.99
10.244.0.117 10.244.0.140 10.244.0.164 10.244.0.188 10.244.0.210 10.244.0.234 10.244.0.29 10.244.0.52 10.244.0.76 last_reserved_ip.0
10.244.0.118 10.244.0.141 10.244.0.165 10.244.0.189 10.244.0.211 10.244.0.235 10.244.0.3 10.244.0.53 10.244.0.77
10.244.0.119 10.244.0.142 10.244.0.166 10.244.0.19 10.244.0.212 10.244.0.236 10.244.0.30 10.244.0.54 10.244.0.78
10.244.0.12 10.244.0.143 10.244.0.167 10.244.0.190 10.244.0.213 10.244.0.237 10.244.0.31 10.244.0.55 10.244.0.79
10.244.0.120 10.244.0.144 10.244.0.168 10.244.0.191 10.244.0.214 10.244.0.238 10.244.0.32 10.244.0.56 10.244.0.8
10.244.0.121 10.244.0.145 10.244.0.169 10.244.0.192 10.244.0.215 10.244.0.239 10.244.0.33 10.244.0.57 10.244.0.80
10.244.0.122 10.244.0.146 10.244.0.17 10.244.0.193 10.244.0.216 10.244.0.24 10.244.0.34 10.244.0.58 10.244.0.81
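Each of those files should contain the ID of the container holding the reservation (at least as I understand the host-local IPAM plugin), so a rough way to check how many reservations still belong to a live container, run from /var/lib/cni/networks and assuming Docker as the runtime:
for f in cbr0/10.244.0.*; do
  id=$(head -n1 "$f")   # container ID recorded by host-local IPAM
  docker inspect --format '{{.State.Running}}' "$id" >/dev/null 2>&1 \
    || echo "$f -> stale (container $id gone)"
done
Anything reported as stale is a reservation that was never released when its pod went away.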
flannel also keeps creating files non-stop:
/var/lib/cni/flannel# ls | wc ; date
1207 1207 78455
Sat Dec 16 23:53:25 UTC 2017
root@staging:/var/lib/cni/flannel# ls | wc ; date
1212 1212 78780
Sat Dec 16 23:53:27 UTC 2017
root@staging:/var/lib/cni/flannel# ls | wc ; date
1214 1214 78910
Sat Dec 16 23:53:28 UTC 2017
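As far as I understand, those files are written one per container by the flannel CNI plugin, so comparing the file count with the number of running containers gives a rough measure of the leak (again assuming Docker as the runtime):
root@staging:/var/lib/cni/flannel# ls | wc -l ; docker ps -q | wc -l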
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T20:55:30Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: bare metal, single node
- OS (e.g. from /etc/os-release):
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
- Kernel (e.g. uname -a): 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3 (2017-12-03) x86_64 GNU/Linux
- Install tools: kubeadm
- Others: flannel 0.9.1
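In case it helps, the CNI config the kubelet is using and the subnet flannel handed out can be dumped like this (assuming the default CNI config dir of /etc/cni/net.d and a standard flannel setup):
root@staging:~# ls /etc/cni/net.d/ ; cat /etc/cni/net.d/*
root@staging:~# cat /run/flannel/subnet.env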
For some reason the system ends up with two replica sets for deploy/kube-dns. Here is the output of kubectl -n kube-system get all:
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ds/kube-flannel-ds 1 1 1 1 1 beta.kubernetes.io/arch=amd64 33m
ds/kube-proxy 1 1 1 1 1 <none> 38m
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/kube-dns 1 2 1 1 38m
deploy/kubernetes-dashboard 1 1 1 1 30m
NAME DESIRED CURRENT READY AGE
rs/kube-dns-545bc4bfd4 1 1 1 38m
rs/kube-dns-6f4fd4bdf 1 1 0 4m
rs/kubernetes-dashboard-79ddfdc44 1 1 1 30m
NAME READY STATUS RESTARTS AGE
po/etcd-staging.gr8pi.net 1/1 Running 0 3m
po/kube-apiserver-staging.gr8pi.net 1/1 Running 0 4m
po/kube-controller-manager-staging.gr8pi.net 1/1 Running 0 4m
po/kube-dns-545bc4bfd4-xs7zw 3/3 Running 6 38m
po/kube-dns-6f4fd4bdf-wvmgp 0/3 ContainerCreating 0 2m
po/kube-flannel-ds-8nb76 1/1 Running 3 33m
po/kube-proxy-fvtzr 1/1 Running 0 4m
po/kube-scheduler-staging.gr8pi.net 1/1 Running 0 4m
po/kubernetes-dashboard-79ddfdc44-p6sfv 1/1 Running 4 30m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 38m
svc/kubernetes-dashboard NodePort 10.103.193.125 <none> 80:30940/TCP 30m
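The pending kube-dns pod presumably shows the same CreatePodSandbox/CNI error as the kubelet log above; for the record, this is how to pull it from the API side (pod name taken from the listing):
kubectl -n kube-system describe pod kube-dns-6f4fd4bdf-wvmgp
kubectl -n kube-system get events --sort-by=.metadata.creationTimestamp | grep -i sandbox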
Activity
ieugen commented on Dec 16, 2017
/sig network
ieugen commented on Dec 17, 2017
This also happens after resetting and re-installing the cluster with:
kubeadm reset
rm -rf /var/lib/cni/flannel/*
rm -rf /var/lib/cni/networks/cbr0/*
ip link delete cni0 flannel.1
It seems I can't create a working cluster with 1.9.0.
ieugen commented on Dec 17, 2017
I can confirm that downgrading to 1.8.5 makes the cluster work. I did kubeadm reset, downgraded the packages, and ran kubeadm init.
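Roughly what that looked like (the apt pins and the init flag are from memory; flannel needs --pod-network-cidr=10.244.0.0/16 on my setup):
kubeadm reset
apt-get install -y --allow-downgrades kubeadm=1.8.5-00 kubelet=1.8.5-00 kubectl=1.8.5-00
kubeadm init --pod-network-cidr=10.244.0.0/16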
xiangpengzhao commented on Dec 18, 2017
Some discussion here: #55720
squeed commented on Dec 18, 2017
Interesting; taking a look. I wonder if it is the old kubenet GC code no longer working.
ieugen commented on Dec 18, 2017
Thanks @xiangpengzhao. #55720 explains the duplication. I'm not sure it explains the containers not starting and the IPs getting exhausted, though.
ghost commented on Dec 20, 2017
Same issue on Ubuntu 16.04.1 LTS
pytimer commented on Dec 22, 2017
Same issue on CentOS 7.2
xiangpengzhao commented on Dec 22, 2017
/cc @kubernetes/sig-network-bugs
mehrdadpfg commented on Dec 26, 2017
Same issue here on Debian 9 with kube-router