
CoreDNS not started with k8s 1.11 and weave (CentOS 7) #998

Closed
carlosrmendes opened this issue Jul 17, 2018 · 33 comments · Fixed by kubernetes/website#9872

@carlosrmendes

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version 1.11

Environment:

  • Kubernetes version (use kubectl version): 1.11
  • Cloud provider or hardware configuration: aws ec2 with (16vcpus 64gb RAM)
  • OS (e.g. from /etc/os-release): centos 7
  • Kernel (e.g. uname -a): 3.10.0-693.17.1.el7.x86_64
  • Others: weave as cni add-on

What happened?

After kubeadm init, the coredns pods stay in Error:

NAME                                   READY     STATUS    RESTARTS   AGE
coredns-78fcdf6894-ljdjp               0/1       Error     6          9m
coredns-78fcdf6894-p6flm               0/1       Error     6          9m
etcd-master                            1/1       Running   0          8m
heapster-5bbdfbff9f-h5h2n              1/1       Running   0          9m
kube-apiserver-master                  1/1       Running   0          8m
kube-controller-manager-master         1/1       Running   0          8m
kube-proxy-5642r                       1/1       Running   0          9m
kube-scheduler-master                  1/1       Running   0          8m
kubernetes-dashboard-6948bdb78-bwkvx   1/1       Running   0          9m
weave-net-r5jkg                        2/2       Running   0          9m

The logs of both pods show the following:
standard_init_linux.go:178: exec user process caused "operation not permitted"

@neolit123
Member

@kubernetes/sig-network-bugs

@carlosmkb, what is your docker version?

@timothysc
Member

timothysc commented Jul 17, 2018

I find this hard to believe; we test CentOS 7 pretty extensively on our side.

Do you have the system and pod logs?

@carlosrmendes
Author

@dims, that could make sense, I will try.

@neolit123 and @timothysc

docker version: docker-1.13.1-63.git94f4240.el7.centos.x86_64

coredns pods log: standard_init_linux.go:178: exec user process caused "operation not permitted"
system log journalctl -xeu kubelet:

Jul 17 23:45:17 server.raid.local kubelet[20442]: E0717 23:45:17.679867   20442 pod_workers.go:186] Error syncing pod dd030886-89f4-11e8-9786-0a92797fa29e ("cas-7d6d97c7bd-mzw5j_raidcloud(dd030886-89f4-11e8-9786-0a92797fa29e)"), skipping: failed to "StartContainer" for "cas" with ImagePullBackOff: "Back-off pulling image \"registry.raidcloud.io/raidcloud/cas:180328.pvt.01\""
Jul 17 23:45:18 server.raid.local kubelet[20442]: I0717 23:45:18.679059   20442 kuberuntime_manager.go:513] Container {Name:json2ldap Image:registry.raidcloud.io/raidcloud/json2ldap:180328.pvt.01 Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:default-token-f2cmq ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/,Port:8080,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:30,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jul 17 23:45:18 server.raid.local kubelet[20442]: E0717 23:45:18.680001   20442 pod_workers.go:186] Error syncing pod dcc39ce2-89f4-11e8-9786-0a92797fa29e ("json2ldap-666fc85686-tmxrr_raidcloud(dcc39ce2-89f4-11e8-9786-0a92797fa29e)"), skipping: failed to "StartContainer" for "json2ldap" with ImagePullBackOff: "Back-off pulling image \"registry.raidcloud.io/raidcloud/json2ldap:180328.pvt.01\""
Jul 17 23:45:21 server.raid.local kubelet[20442]: I0717 23:45:21.678232   20442 kuberuntime_manager.go:513] Container {Name:coredns Image:k8s.gcr.io/coredns:1.1.3 Command:[] Args:[-conf /etc/coredns/Corefile] WorkingDir: Ports:[{Name:dns HostPort:0 ContainerPort:53 Protocol:UDP HostIP:} {Name:dns-tcp HostPort:0 ContainerPort:53 Protocol:TCP HostIP:} {Name:metrics HostPort:0 ContainerPort:9153 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[memory:{i:{value:178257920 scale:0} d:{Dec:<nil>} s:170Mi Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:73400320 scale:0} d:{Dec:<nil>} s:70Mi Format:BinarySI}]} VolumeMounts:[{Name:config-volume ReadOnly:true MountPath:/etc/coredns SubPath: MountPropagation:<nil>} {Name:coredns-token-6nhgg ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/health,Port:8080,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:60,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:5,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:&Capabilities{Add:[NET_BIND_SERVICE],Drop:[all],},Privileged:nil,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:*true,AllowPrivilegeEscalation:*false,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jul 17 23:45:21 server.raid.local kubelet[20442]: I0717 23:45:21.678311   20442 kuberuntime_manager.go:757] checking backoff for container "coredns" in pod "coredns-78fcdf6894-znfvw_kube-system(9b44aa92-89f7-11e8-9786-0a92797fa29e)"
Jul 17 23:45:21 server.raid.local kubelet[20442]: I0717 23:45:21.678404   20442 kuberuntime_manager.go:767] Back-off 5m0s restarting failed container=coredns pod=coredns-78fcdf6894-znfvw_kube-system(9b44aa92-89f7-11e8-9786-0a92797fa29e)
Jul 17 23:45:21 server.raid.local kubelet[20442]: E0717 23:45:21.678425   20442 pod_workers.go:186] Error syncing pod 9b44aa92-89f7-11e8-9786-0a92797fa29e ("coredns-78fcdf6894-znfvw_kube-system(9b44aa92-89f7-11e8-9786-0a92797fa29e)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=coredns pod=coredns-78fcdf6894-znfvw_kube-system(9b44aa92-89f7-11e8-9786-0a92797fa29e)"
Jul 17 23:45:22 server.raid.local kubelet[20442]: I0717 23:45:22.679145   20442 kuberuntime_manager.go:513] Container {Name:login Image:registry.raidcloud.io/raidcloud/admin:180329.pvt.05 Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:login-config ReadOnly:true MountPath:/usr/share/nginx/conf/ SubPath: MountPropagation:<nil>} {Name:default-token-f2cmq ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/health,Port:8080,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:5,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jul 17 23:45:22 server.raid.local kubelet[20442]: E0717 23:45:22.679941   20442 pod_workers.go:186] Error syncing pod dc8392a9-89f4-11e8-9786-0a92797fa29e ("login-85ffb66bb8-5l9fq_raidcloud(dc8392a9-89f4-11e8-9786-0a92797fa29e)"), skipping: failed to "StartContainer" for "login" with ImagePullBackOff: "Back-off pulling image \"registry.raidcloud.io/raidcloud/admin:180329.pvt.05\""
Jul 17 23:45:23 server.raid.local kubelet[20442]: I0717 23:45:23.678172   20442 kuberuntime_manager.go:513] Container {Name:coredns Image:k8s.gcr.io/coredns:1.1.3 Command:[] Args:[-conf /etc/coredns/Corefile] WorkingDir: Ports:[{Name:dns HostPort:0 ContainerPort:53 Protocol:UDP HostIP:} {Name:dns-tcp HostPort:0 ContainerPort:53 Protocol:TCP HostIP:} {Name:metrics HostPort:0 ContainerPort:9153 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[memory:{i:{value:178257920 scale:0} d:{Dec:<nil>} s:170Mi Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:73400320 scale:0} d:{Dec:<nil>} s:70Mi Format:BinarySI}]} VolumeMounts:[{Name:config-volume ReadOnly:true MountPath:/etc/coredns SubPath: MountPropagation:<nil>} {Name:coredns-token-6nhgg ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/health,Port:8080,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:60,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:5,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:&Capabilities{Add:[NET_BIND_SERVICE],Drop:[all],},Privileged:nil,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:*true,AllowPrivilegeEscalation:*false,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jul 17 23:45:23 server.raid.local kubelet[20442]: I0717 23:45:23.678412   20442 kuberuntime_manager.go:757] checking backoff for container "coredns" in pod "coredns-78fcdf6894-lcqt5_kube-system(9b45a068-89f7-11e8-9786-0a92797fa29e)"
Jul 17 23:45:23 server.raid.local kubelet[20442]: I0717 23:45:23.678532   20442 kuberuntime_manager.go:767] Back-off 5m0s restarting failed container=coredns pod=coredns-78fcdf6894-lcqt5_kube-system(9b45a068-89f7-11e8-9786-0a92797fa29e)
Jul 17 23:45:23 server.raid.local kubelet[20442]: E0717 23:45:23.678554   20442 pod_workers.go:186] Error syncing pod 9b45a068-89f7-11e8-9786-0a92797fa29e ("coredns-78fcdf6894-lcqt5_kube-system(9b45a068-89f7-11e8-9786-0a92797fa29e)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=coredns pod=coredns-78fcdf6894-lcqt5_kube-system(9b45a068-89f7-11e8-9786-0a92797fa29e)"

@chrisohaver

Found a couple instances of the same errors reported in other scenarios in the past.
Might try removing "allowPrivilegeEscalation: false" from the CoreDNS deployment to see if that helps.
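
For reference, one non-interactive way to drop that field is a JSON patch against the deployment. This is only a sketch; it assumes coredns is the first (index 0) container in the deployment:

kubectl -n kube-system patch deployment coredns --type=json \
  -p='[{"op":"remove","path":"/spec/template/spec/containers/0/securityContext/allowPrivilegeEscalation"}]'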

@sectorsize512

Same issue for me. Similar setup: CentOS 7.4.1708, Docker version 1.13.1, build 94f4240/1.13.1 (comes with CentOS):

[root@faas-A01 ~]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                                 READY     STATUS             RESTARTS   AGE
kube-system   calico-node-2vssv                                    2/2       Running            0          9m
kube-system   calico-node-4vr7t                                    2/2       Running            0          7m
kube-system   calico-node-nlfnd                                    2/2       Running            0          17m
kube-system   calico-node-rgw5w                                    2/2       Running            0          23m
kube-system   coredns-78fcdf6894-p4wbl                             0/1       CrashLoopBackOff   9          30m
kube-system   coredns-78fcdf6894-r4pwf                             0/1       CrashLoopBackOff   9          30m
kube-system   etcd-faas-a01.sl.cloud9.ibm.com                      1/1       Running            0          29m
kube-system   kube-apiserver-faas-a01.sl.cloud9.ibm.com            1/1       Running            0          29m
kube-system   kube-controller-manager-faas-a01.sl.cloud9.ibm.com   1/1       Running            0          29m
kube-system   kube-proxy-55csj                                     1/1       Running            0          17m
kube-system   kube-proxy-56r8c                                     1/1       Running            0          30m
kube-system   kube-proxy-kncql                                     1/1       Running            0          9m
kube-system   kube-proxy-mf2bp                                     1/1       Running            0          7m
kube-system   kube-scheduler-faas-a01.sl.cloud9.ibm.com            1/1       Running            0          29m
[root@faas-A01 ~]# kubectl logs --namespace=all coredns-78fcdf6894-p4wbl
Error from server (NotFound): namespaces "all" not found
[root@faas-A01 ~]# kubectl logs --namespace=kube-system coredns-78fcdf6894-p4wbl
standard_init_linux.go:178: exec user process caused "operation not permitted"

@sectorsize512

Just in case: SELinux is in permissive mode on all nodes.

@sectorsize512

And I'm using Calico (not Weave like @carlosmkb).

@chrisohaver

[root@faas-A01 ~]# kubectl logs --namespace=kube-system coredns-78fcdf6894-p4wbl
standard_init_linux.go:178: exec user process caused "operation not permitted"

Ah - This is an error from kubectl when trying to get the logs, not the contents of the logs...

@carlosrmendes
Author

@chrisohaver kubectl logs works with the other kube-system pods.

@chrisohaver

OK - have you tried removing "allowPrivilegeEscalation: false" from the CoreDNS deployment to see if that helps?

@chrisohaver

... does a kubectl describe of the coredns pod show anything interesting?

@Leozki

Leozki commented Jul 26, 2018

Same issue for me.
CentOS Linux release 7.5.1804 (Core)
Docker version 1.13.1, build dded712/1.13.1
flannel as cni add-on

[root@k8s ~]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                 READY     STATUS             RESTARTS   AGE
kube-system   coredns-78fcdf6894-cfmm7             0/1       CrashLoopBackOff   12         15m
kube-system   coredns-78fcdf6894-k65js             0/1       CrashLoopBackOff   11         15m
kube-system   etcd-k8s.master                      1/1       Running            0          14m
kube-system   kube-apiserver-k8s.master            1/1       Running            0          13m
kube-system   kube-controller-manager-k8s.master   1/1       Running            0          14m
kube-system   kube-flannel-ds-fts6v                1/1       Running            0          14m
kube-system   kube-proxy-4tdb5                     1/1       Running            0          15m
kube-system   kube-scheduler-k8s.master            1/1       Running            0          14m
[root@k8s ~]# kubectl logs coredns-78fcdf6894-cfmm7 -n kube-system
standard_init_linux.go:178: exec user process caused "operation not permitted"
[root@k8s ~]# kubectl describe pods coredns-78fcdf6894-cfmm7 -n kube-system
Name:           coredns-78fcdf6894-cfmm7
Namespace:      kube-system
Node:           k8s.master/192.168.150.40
Start Time:     Fri, 27 Jul 2018 00:32:09 +0800
Labels:         k8s-app=kube-dns
                pod-template-hash=3497892450
Annotations:    <none>
Status:         Running
IP:             10.244.0.12
Controlled By:  ReplicaSet/coredns-78fcdf6894
Containers:
  coredns:
    Container ID:  docker://3b7670fbc07084410984d7e3f8c0fa1b6d493a41d2a4e32f5885b7db9d602417
    Image:         k8s.gcr.io/coredns:1.1.3
    Image ID:      docker-pullable://k8s.gcr.io/coredns@sha256:db2bf53126ed1c761d5a41f24a1b82a461c85f736ff6e90542e9522be4757848
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 27 Jul 2018 00:46:30 +0800
      Finished:     Fri, 27 Jul 2018 00:46:30 +0800
    Ready:          False
    Restart Count:  12
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-vqslm (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-vqslm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-vqslm
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From                 Message
  ----     ------            ----                ----                 -------
  Warning  FailedScheduling  16m (x6 over 16m)   default-scheduler    0/1 nodes are available: 1 node(s) were not ready.
  Normal   Scheduled         16m                 default-scheduler    Successfully assigned kube-system/coredns-78fcdf6894-cfmm7 to k8s.master
  Warning  BackOff           14m (x10 over 16m)  kubelet, k8s.master  Back-off restarting failed container
  Normal   Pulled            14m (x5 over 16m)   kubelet, k8s.master  Container image "k8s.gcr.io/coredns:1.1.3" already present on machine
  Normal   Created           14m (x5 over 16m)   kubelet, k8s.master  Created container
  Normal   Started           14m (x5 over 16m)   kubelet, k8s.master  Started container
  Normal   Pulled            11m (x4 over 12m)   kubelet, k8s.master  Container image "k8s.gcr.io/coredns:1.1.3" already present on machine
  Normal   Created           11m (x4 over 12m)   kubelet, k8s.master  Created container
  Normal   Started           11m (x4 over 12m)   kubelet, k8s.master  Started container
  Warning  BackOff           2m (x56 over 12m)   kubelet, k8s.master  Back-off restarting failed container
[root@k8s ~]# uname
Linux
[root@k8s ~]# uname -a
Linux k8s.master 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@k8s ~]# cat /etc/redhat-release 
CentOS Linux release 7.5.1804 (Core) 
[root@k8s ~]# docker --version
Docker version 1.13.1, build dded712/1.13.1

@zimnyjakub

zimnyjakub commented Jul 27, 2018

I have the same issue when SELinux is in permissive mode. When I disable it in /etc/selinux/config (SELINUX=disabled) and reboot the machine, the pod starts up.

Redhat 7.4, kernel 3.10.0-693.11.6.el7.x86_64
docker-1.13.1-68.gitdded712.el7.x86_64
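
For anyone automating this, a minimal sketch of the "disable and reboot" step described above (mind the security implications of turning SELinux off entirely):

# sketch: permanently disable SELinux; only takes effect after a reboot
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
reboot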

@chrisohaver

FYI, it also works for me with SELinux disabled (not permissive, but disabled).
Docker version 1.13.1, build dded712/1.13.1
CentOS 7

[root@centosk8s ~]# kubectl logs coredns-78fcdf6894-rhx9p -n kube-system
.:53
CoreDNS-1.1.3
linux/amd64, go1.10.1, b0fd575c
2018/07/27 16:37:31 [INFO] CoreDNS-1.1.3
2018/07/27 16:37:31 [INFO] linux/amd64, go1.10.1, b0fd575c
2018/07/27 16:37:31 [INFO] plugin/reload: Running configuration MD5 = 2a066f12ec80aeb2b92740dd74c17138

@lareeth

lareeth commented Jul 30, 2018

We are also experiencing this issue. We provision infrastructure through automation, so requiring a restart to completely disable SELinux is not acceptable. Are there any other workarounds while we wait for this to be fixed?

@chrisohaver

chrisohaver commented Jul 30, 2018

Try removing "allowPrivilegeEscalation: false" from the CoreDNS deployment to see if that helps.
Updating to a more recent version of docker (than 1.13) may also help.

@thdrnsdk

thdrnsdk commented Aug 1, 2018

Same issue here
Docker version 1.2.6
CentOS 7
Like @lareeth we also provision Kubernetes automatically with kubeadm, so requiring a restart to completely disable SELinux is not acceptable for us either.
@chrisohaver your suggestion is helpful, thank you!
But as far as I know the CoreDNS options cannot be set in the kubeadm configuration.
Is there no other way?

@chrisohaver

chrisohaver commented Aug 1, 2018

Try removing "allowPrivilegeEscalation: false" from the CoreDNS deployment to see if that helps.
Updating to a more recent version of docker (e.g. to the version recommended by k8s) may also help.

@chrisohaver

chrisohaver commented Aug 1, 2018

I verified that removing "allowPrivilegeEscalation: false" from the coredns deployment resolves the issue (with SELinux enabled in permissive mode).

@chrisohaver

chrisohaver commented Aug 1, 2018

I also verified that upgrading to a version of docker recommended by Kubernetes (docker 17.03) resolves the issue, with "allowPrivilegeEscalation: false" left in place in the coredns deployment, and SELinux enabled in permissive mode.

@chrisohaver

chrisohaver commented Aug 1, 2018

So, it appears there is an incompatibility between old versions of Docker and SELinux with the allowPrivilegeEscalation directive, which has apparently been resolved in later versions of Docker.

There appear to be three different workarounds:

  • Upgrade to newer version of docker, e.g. 17.03, the version currently recommended by k8s
  • Or remove allowPrivilegeEscalation=false from the deployment's pod spec
  • Or disable SELinux
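
For the first workaround, a rough sketch of the Docker upgrade on CentOS 7. Package names and version strings are illustrative; check yum list docker-ce --showduplicates for what is actually available:

yum install -y yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum install -y --setopt=obsoletes=0 docker-ce-17.03.2.ce-1.el7.centos docker-ce-selinux-17.03.2.ce-1.el7.centos
systemctl enable --now docker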

@Leozki

Leozki commented Aug 1, 2018

@chrisohaver I have resolved the issue by upgrading to the newer Docker version 17.03. Thanks.

@neolit123
Member

thanks for the investigation @chrisohaver 💯

@kuznero

kuznero commented Aug 10, 2018

Thanks, @chrisohaver !

This worked:

kubectl -n kube-system get deployment coredns -o yaml | \
  sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' | \
  kubectl apply -f -

@neolit123
Member

neolit123 commented Aug 10, 2018

@chrisohaver
do you think we should document this step in the kubeadm troubleshooting guide for SELinux nodes in the lines of:


coredns pods have CrashLoopBackOff or Error state

If you have nodes that are running SELinux with an older version of Docker you might experience a scenario where the coredns pods are not starting. To solve that you can try one of the following options:

  • Upgrade to a newer version of Docker - 17.03 is confirmed to work.
  • Disable SELinux.
  • Modify the coredns deployment to set allowPrivilegeEscalation to true:
kubectl -n kube-system get deployment coredns -o yaml | \
  sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' | \
  kubectl apply -f -

WDYT? please suggest amends to the text if you think something can be improved.

@chrisohaver

That's fine. We should perhaps mention that there are negative security implications when disabling SELinux or changing the allowPrivilegeEscalation setting.

The most secure solution is to upgrade Docker to the version that Kubernetes recommends (17.03)

@neolit123
Member

@chrisohaver
understood, will amend the copy and submit a PR for this.

@neolit123 neolit123 self-assigned this Aug 10, 2018
@neolit123 neolit123 added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. kind/documentation Categorizes issue or PR as related to documentation. and removed priority/needs-more-evidence labels Aug 10, 2018
@neolit123 neolit123 added this to the v1.12 milestone Aug 10, 2018
@chuckha chuckha added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Aug 15, 2018
@mydockergit

mydockergit commented Jan 23, 2019

There is also an answer for this on Stack Overflow:
https://stackoverflow.com/questions/53075796/coredns-pods-have-crashloopbackoff-or-error-state

This error

[FATAL] plugin/loop: Seen "HINFO IN 6900627972087569316.7905576541070882081." more than twice, loop detected

is caused when CoreDNS detects a loop in the DNS resolution configuration, and it is the intended behavior. You are hitting this issue:

#1162

coredns/coredns#2087

Hacky solution: Disable the CoreDNS loop detection

Edit the CoreDNS configmap:

kubectl -n kube-system edit configmap coredns

Remove or comment out the line with loop, save and exit.

Then remove the CoreDNS pods, so new ones can be created with new config:

kubectl -n kube-system delete pod -l k8s-app=kube-dns

All should be fine after that.
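
If you prefer not to edit the ConfigMap interactively, a non-interactive sketch of the same two steps (it assumes the default kubeadm-generated ConfigMap, where loop sits on its own line in the Corefile):

kubectl -n kube-system get configmap coredns -o yaml \
  | sed '/^\s*loop\s*$/d' \
  | kubectl apply -f -
kubectl -n kube-system delete pod -l k8s-app=kube-dns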

Preferred Solution: Remove the loop in the DNS configuration

First, check if you are using systemd-resolved. If you are running Ubuntu 18.04, it is probably the case.

systemctl list-unit-files | grep enabled | grep systemd-resolved

If it is, check which resolv.conf file your cluster is using as reference:

ps auxww | grep kubelet

You might see a line like:

/usr/bin/kubelet ... --resolv-conf=/run/systemd/resolve/resolv.conf

The important part is --resolv-conf: it tells us whether the systemd-managed resolv.conf is being used or not.

If it is the resolv.conf of systemd, do the following:

Check the content of /run/systemd/resolve/resolv.conf to see if there is a record like:

nameserver 127.0.0.1

If there is 127.0.0.1, it is the one causing the loop.

To get rid of it, do not edit that file directly; instead, check the other places from which it is generated.

Check all files under /etc/systemd/network and if you find a record like

DNS=127.0.0.1

delete that record. Also check /etc/systemd/resolved.conf and do the same if needed. Make sure you have at least one or two DNS servers configured, such as

DNS=1.1.1.1 1.0.0.1
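
A quick way to find any such records (a sketch; the paths are the defaults mentioned above):

grep -rn '^DNS=' /etc/systemd/network/ /etc/systemd/resolved.conf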

After doing all that, restart the systemd services to put your changes into effect:
systemctl restart systemd-networkd systemd-resolved

After that, verify that DNS=127.0.0.1 is no more in the resolv.conf file:

cat /run/systemd/resolve/resolv.conf

Finally, trigger re-creation of the DNS pods:

kubectl -n kube-system delete pod -l k8s-app=kube-dns

Summary: The solution involves getting rid of what looks like a DNS lookup loop from the host DNS configuration. Steps vary between different resolv.conf managers/implementations.

@chrisohaver

Thanks. It's also covered in the CoreDNS loop plugin readme ...

@mengxifl

mengxifl commented Mar 20, 2019

I have the same problem, plus another one.

1. It seems DNS cannot be resolved. The errors are:

[ERROR] plugin/errors: 2 2115717704248378980.1120568170924441806. HINFO: unreachable backend: read udp 10.224.0.3:57088->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 2115717704248378980.1120568170924441806. HINFO: unreachable backend: read udp 10.224.0.3:38819->172.16.254.1:53: i/o timeout
........

My /etc/resolv.conf only has:

nameserver 172.16.254.1 #this is my dns
nameserver 8.8.8.8 #another dns on the internet

I ran:

kubectl -n kube-system get deployment coredns -o yaml | \
  sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' | \
  kubectl apply -f -

After the pods were rebuilt there is only one error left:

[ERROR] plugin/errors: 2 10594135170717325.8545646296733374240. HINFO: unreachable backend: no upstream host

I don't know if that's normal. Maybe.

2. CoreDNS cannot reach my API service. The error is:

kube-dns Failed to list *v1.Endpoints getsockopt: 10.96.0.1:6443 api connection refused

CoreDNS restarts again and again and eventually ends up in CrashLoopBackOff.

So I had to pin CoreDNS to the master node, which I did with:

kubectl edit deployment/coredns --namespace=kube-system
# under spec.template.spec:
  nodeSelector:
    node-role.kubernetes.io/master: ""

I don't know if that's normal.

Finally, here is my environment:

Linux 4.20.10-1.el7.elrepo.x86_64 /// centos 7

docker Version: 18.09.3

[root@k8smaster00 ~]# docker image ls -a
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-controller-manager v1.13.3 0482f6400933 6 weeks ago 146MB
k8s.gcr.io/kube-proxy v1.13.3 98db19758ad4 6 weeks ago 80.3MB
k8s.gcr.io/kube-apiserver v1.13.3 fe242e556a99 6 weeks ago 181MB
k8s.gcr.io/kube-scheduler v1.13.3 3a6f709e97a0 6 weeks ago 79.6MB
quay.io/coreos/flannel v0.11.0-amd64 ff281650a721 7 weeks ago 52.6MB
k8s.gcr.io/coredns 1.2.6 f59dcacceff4 4 months ago 40MB
k8s.gcr.io/etcd 3.2.24 3cab8e1b9802 6 months ago 220MB
k8s.gcr.io/pause 3.1 da86e6ba6ca1 15 months ago 742kB

Kubernetes is 1.13.3

I think this is a bug. I expect an official update or a solution.

@chrisohaver

I have same problem ...

@mengxifl, those errors are significantly different from the ones reported and discussed in this issue.

[ERROR] plugin/errors: 2 2115717704248378980.1120568170924441806. HINFO: unreachable backend: read udp 10.224.0.3:57088->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 2115717704248378980.1120568170924441806. HINFO: unreachable backend: read udp 10.224.0.3:38819->172.16.254.1:53: i/o timeout

Those errors mean that the CoreDNS pod (and probably all other pods) cannot reach your nameservers. This suggests a networking problem in your cluster to the outside world. Possibly flannel misconfiguration or firewalls.

the coredns cannot found my api service ...
so i have to run coredns on master node

This is also not normal. If I understand you correctly, you are saying that CoreDNS can contact the API from the master node but not other nodes. This would suggest pod to service networking problems between nodes within your cluster - perhaps an issue with flannel configuration or firewalls.
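
A quick way to confirm that from inside the cluster is to run lookups from a throwaway pod (a sketch; busybox:1.28 is used because its nslookup is known to behave):

kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup bing.com 8.8.8.8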

@mengxifl

mengxifl commented Mar 21, 2019


Thank you for your reply

Maybe I should post my YAML files.

I use:
kubeadm init --config=config.yaml

My config.yaml content is:

apiVersion: kubeadm.k8s.io/v1alpha3
kind: InitConfiguration
apiEndpoint:
  advertiseAddress: "172.16.254.74"
  bindPort: 6443
---
apiVersion: kubeadm.k8s.io/v1alpha3
kind: ClusterConfiguration
kubernetesVersion: "v1.13.3"
etcd:
  external:
    endpoints:
    - "https://172.16.254.86:2379" 
    - "https://172.16.254.87:2379"
    - "https://172.16.254.88:2379"
    caFile: /etc/kubernetes/pki/etcd/ca.pem
    certFile: /etc/kubernetes/pki/etcd/client.pem
    keyFile: /etc/kubernetes/pki/etcd/client-key.pem
networking:
  podSubnet: "10.224.0.0/16"
  serviceSubnet: "10.96.0.0/12"
apiServerCertSANs:
- k8smaster00
- k8smaster01
- k8snode00
- k8snode01
- 172.16.254.74
- 172.16.254.79
- 172.16.254.80
- 172.16.254.81
- 172.16.254.85 #Vip
- 127.0.0.1
clusterName: "cluster"
controlPlaneEndpoint: "172.16.254.85:6443"

apiServerExtraArgs:
  service-node-port-range: 20-65535

My flannel YAML is the default:

https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

systemctl status firewalld
All nodes say:
Unit firewalld.service could not be found.

cat /etc/sysconfig/iptables
All nodes say:
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -p tcp -m tcp --dport 1:65535 -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 1:65535 -j ACCEPT
-A FORWARD -p tcp -m tcp --dport 1:65535 -j ACCEPT
-A FORWARD -p tcp -m tcp --sport 1:65535 -j ACCEPT
COMMIT

cat /etc/resolv.conf & ping bing.com
All nodes say:
[1] 6330
nameserver 172.16.254.1
nameserver 8.8.8.8
PING bing.com (13.107.21.200) 56(84) bytes of data.
64 bytes from 13.107.21.200 (13.107.21.200): icmp_seq=2 ttl=111 time=149 ms

uname -rs
The master node says:
Linux 4.20.10-1.el7.elrepo.x86_64

uname -rs
The slave nodes say:
Linux 4.4.176-1.el7.elrepo.x86_64

So I don't think the firewall is the issue; maybe flannel? But I use the default config. Or maybe the Linux kernel version? I don't know.

OK, I ran
/sbin/iptables -t nat -I POSTROUTING -s 10.224.0.0/16 -j MASQUERADE

on all my nodes and that worked for me. Thanks.
