Closed
Description
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
kubeadm version 1.11
Environment:
- Kubernetes version (use kubectl version): 1.11
- Cloud provider or hardware configuration: AWS EC2 (16 vCPUs, 64 GB RAM)
- OS (e.g. from /etc/os-release): CentOS 7
- Kernel (e.g. uname -a): 3.10.0-693.17.1.el7.x86_64
- Others: Weave as CNI add-on
What happened?
After kubeadm init, the coredns pods stay in Error:
NAME READY STATUS RESTARTS AGE
coredns-78fcdf6894-ljdjp 0/1 Error 6 9m
coredns-78fcdf6894-p6flm 0/1 Error 6 9m
etcd-master 1/1 Running 0 8m
heapster-5bbdfbff9f-h5h2n 1/1 Running 0 9m
kube-apiserver-master 1/1 Running 0 8m
kube-controller-manager-master 1/1 Running 0 8m
kube-proxy-5642r 1/1 Running 0 9m
kube-scheduler-master 1/1 Running 0 8m
kubernetes-dashboard-6948bdb78-bwkvx 1/1 Running 0 9m
weave-net-r5jkg 2/2 Running 0 9m
The logs of both pods show the following:
standard_init_linux.go:178: exec user process caused "operation not permitted"
Activity
neolit123 commented on Jul 17, 2018
@kubernetes/sig-network-bugs
@carlosmkb, what is your docker version?
timothysc commented on Jul 17, 2018
I find this hard to believe; we test CentOS 7 pretty extensively on our side.
Do you have the system and pod logs?
dims commented on Jul 17, 2018
this one? https://stackoverflow.com/questions/44127247/does-anyone-know-a-workaround-for-no-new-privileges-blocking-selinux-transitions
carlosrmendes commented on Jul 17, 2018
@dims, that could make sense, I will try.
@neolit123 and @timothysc
docker version: docker-1.13.1-63.git94f4240.el7.centos.x86_64
coredns pod logs:
standard_init_linux.go:178: exec user process caused "operation not permitted"
system log:
journalctl -xeu kubelet
chrisohaver commented on Jul 19, 2018
Found a couple instances of the same errors reported in other scenarios in the past.
Might try removing "allowPrivilegeEscalation: false" from the CoreDNS deployment to see if that helps.
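One way to try that (a sketch, assuming the default kubeadm deployment name coredns in the kube-system namespace):
kubectl -n kube-system edit deployment coredns
# in the editor, remove the line "allowPrivilegeEscalation: false" from the container securityContext, then save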
sectorsize512 commented on Jul 19, 2018
Same issue for me. Similar setup: CentOS 7.4.1708, Docker version 1.13.1, build 94f4240/1.13.1 (comes with CentOS).
sectorsize512 commented on Jul 19, 2018
Just in case, SELinux is in permissive mode on all nodes.
sectorsize512 commented on Jul 19, 2018
And I'm using Calico (not Weave as @carlosmkb is).
chrisohaver commented on Jul 23, 2018
Ah - This is an error from kubectl when trying to get the logs, not the contents of the logs...
carlosrmendes commented on Jul 23, 2018
@chrisohaver, kubectl logs works with the other kube-system pods.
chrisohaver commented on Jul 23, 2018
OK - have you tried removing "allowPrivilegeEscalation: false" from the CoreDNS deployment to see if that helps?
chrisohaver commented on Jul 23, 2018
... does a kubectl describe of the coredns pod show anything interesting?
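For example (a sketch; the label selector assumes the default kubeadm labeling of the CoreDNS pods):
kubectl -n kube-system describe pod -l k8s-app=kube-dns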
Leozki commented on Jul 26, 2018
Same issue for me.
CentOS Linux release 7.5.1804 (Core)
Docker version 1.13.1, build dded712/1.13.1
flannel as cni add-on
chrisohaver commented on Aug 10, 2018
That's fine. We should perhaps mention that there are negative security implications when disabling SELinux, or when changing the allowPrivilegeEscalation setting.
The most secure solution is to upgrade Docker to the version that Kubernetes recommends (17.03).
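A rough sketch of such an upgrade on CentOS 7, assuming the upstream Docker CE repository is used (the exact 17.03.x build string is an assumption; check the repo listing for what is actually available):
yum install -y yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum list docker-ce --showduplicates | sort -r   # pick a 17.03.x build from the output
yum install -y docker-ce-17.03.2.ce-1.el7.centos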
neolit123 commented on Aug 10, 2018
@chrisohaver
understood, will amend the copy and submit a PR for this.
mydockergit commented on Jan 23, 2019
There is also an answer for this on Stack Overflow:
https://stackoverflow.com/questions/53075796/coredns-pods-have-crashloopbackoff-or-error-state
This error is caused when CoreDNS detects a loop in the resolve configuration, and it is the intended behavior. You are hitting this issue:
#1162
coredns/coredns#2087
Hacky solution: Disable the CoreDNS loop detection
Edit the CoreDNS configmap:
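(The command itself was not captured above; a minimal sketch, assuming the default kubeadm configmap name:)
kubectl -n kube-system edit configmap coredns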
Remove or comment out the line with loop, save and exit.
Then remove the CoreDNS pods, so new ones can be created with the new config:
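(Likewise, a sketch of the pod deletion, assuming the default kubeadm label on the CoreDNS pods:)
kubectl -n kube-system delete pod -l k8s-app=kube-dns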
All should be fine after that.
Preferred Solution: Remove the loop in the DNS configuration
First, check if you are using systemd-resolved. If you are running Ubuntu 18.04, it is probably the case.
If it is, check which resolv.conf file your cluster is using as reference. You might see a line like:
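(The example line was not captured; a sketch of how to check, by inspecting the kubelet command line — the flag value shown is only an assumed example:)
ps auxww | grep kubelet
# look for a flag such as:
#   --resolv-conf=/run/systemd/resolve/resolv.conf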
The important part is --resolv-conf - from it we figure out whether the systemd resolv.conf is used or not.
If it is the resolv.conf of systemd, do the following:
Check the content of /run/systemd/resolve/resolv.conf to see if there is a record like:
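(The record itself was not captured; from the surrounding text it is presumably of the form:)
nameserver 127.0.0.1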
If there is 127.0.0.1, it is the one causing the loop.
To get rid of it, you should not edit that file directly, but check the other places that generate it, so that it is regenerated properly.
Check all files under /etc/systemd/network, and if you find a record like DNS=127.0.0.1, delete that record. Also check /etc/systemd/resolved.conf and do the same if needed. Make sure you have at least one or two DNS servers configured, for example 8.8.8.8.
After doing all that, restart the systemd services to put your changes into effect:
systemctl restart systemd-networkd systemd-resolved
After that, verify that DNS=127.0.0.1 is no longer in the resolv.conf file.
Finally, trigger re-creation of the DNS pods.
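(The closing commands were not captured; a minimal sketch, assuming the systemd resolv.conf path above and the default kubeadm label on the CoreDNS pods:)
cat /run/systemd/resolve/resolv.conf   # 127.0.0.1 should no longer be listed as a nameserver
kubectl -n kube-system delete pod -l k8s-app=kube-dns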
Summary: The solution involves getting rid of what looks like a DNS lookup loop from the host DNS configuration. Steps vary between different resolv.conf managers/implementations.
chrisohaver commented on Jan 23, 2019
Thanks. It's also covered in the CoreDNS loop plugin readme ...
mengxifl commented on Mar 20, 2019
I have the same problem, and another problem.
1. First, DNS cannot be reached. The errors are:
[ERROR] plugin/errors: 2 2115717704248378980.1120568170924441806. HINFO: unreachable backend: read udp 10.224.0.3:57088->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 2115717704248378980.1120568170924441806. HINFO: unreachable backend: read udp 10.224.0.3:38819->172.16.254.1:53: i/o timeout
........
My /etc/resolv.conf only has:
nameserver 172.16.254.1 #this is my dns
nameserver 8.8.8.8 #another dns in net
I run:
kubectl -n kube-system get deployment coredns -o yaml |
sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' |
kubectl apply -f -
Then the pods are rebuilt, and there is only one error:
[ERROR] plugin/errors: 2 10594135170717325.8545646296733374240. HINFO: unreachable backend: no upstream host
I don't know if that's normal. Maybe.
2. CoreDNS cannot reach my API service. The error is:
kube-dns Failed to list *v1.Endpoints getsockopt: 10.96.0.1:6443 api connection refused
CoreDNS restarts again and again, and eventually goes into CrashLoopBackOff.
So I have to run CoreDNS on the master node. I do that with:
kubectl edit deployment/coredns --namespace=kube-system
and add under spec.template.spec:
nodeSelector:
  node-role.kubernetes.io/master: ""
I don't know if that's normal.
Finally, my environment:
Linux 4.20.10-1.el7.elrepo.x86_64 /// CentOS 7
Docker version: 18.09.3
[root@k8smaster00 ~]# docker image ls -a
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-controller-manager v1.13.3 0482f6400933 6 weeks ago 146MB
k8s.gcr.io/kube-proxy v1.13.3 98db19758ad4 6 weeks ago 80.3MB
k8s.gcr.io/kube-apiserver v1.13.3 fe242e556a99 6 weeks ago 181MB
k8s.gcr.io/kube-scheduler v1.13.3 3a6f709e97a0 6 weeks ago 79.6MB
quay.io/coreos/flannel v0.11.0-amd64 ff281650a721 7 weeks ago 52.6MB
k8s.gcr.io/coredns 1.2.6 f59dcacceff4 4 months ago 40MB
k8s.gcr.io/etcd 3.2.24 3cab8e1b9802 6 months ago 220MB
k8s.gcr.io/pause 3.1 da86e6ba6ca1 15 months ago 742kB
Kubernetes is 1.13.3.
I think this is a bug. I expect an official update or a solution.
chrisohaver commented on Mar 20, 2019
@mengxifl, those errors are significantly different from the ones reported and discussed in this issue.
Those errors mean that the CoreDNS pod (and probably all other pods) cannot reach your nameservers. This suggests a networking problem in your cluster to the outside world. Possibly flannel misconfiguration or firewalls.
This is also not normal. If I understand you correctly, you are saying that CoreDNS can contact the API from the master node but not other nodes. This would suggest pod to service networking problems between nodes within your cluster - perhaps an issue with flannel configuration or firewalls.
mengxifl commented on Mar 21, 2019
Thank you for your reply
Maybe I should put up my yaml file.
I use
kubeadm init --config=config.yaml
My config.yaml content is:
My flannel yaml is the default:
https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
systemctl status firewalld
All nodes say:
Unit firewalld.service could not be found.
cat /etc/sysconfig/iptables
All nodes say:
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -p tcp -m tcp --dport 1:65535 -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 1:65535 -j ACCEPT
-A FORWARD -p tcp -m tcp --dport 1:65535 -j ACCEPT
-A FORWARD -p tcp -m tcp --sport 1:65535 -j ACCEPT
COMMIT
cat /etc/resolv.conf & ping bing.com
All nodes say:
[1] 6330
nameserver 172.16.254.1
nameserver 8.8.8.8
PING bing.com (13.107.21.200) 56(84) bytes of data.
64 bytes from 13.107.21.200 (13.107.21.200): icmp_seq=2 ttl=111 time=149 ms
uname -rs
Master node says:
Linux 4.20.10-1.el7.elrepo.x86_64
uname -rs
Slave node says:
Linux 4.4.176-1.el7.elrepo.x86_64
So I don't think the firewall is the issue. Maybe flannel? But I use the default config. Or maybe the Linux version, I don't know.
OK, I ran
/sbin/iptables -t nat -I POSTROUTING -s 10.224.0.0/16 -j MASQUERADE
on all my nodes and that works for me. Thanks.