
Node flapping between Ready/NotReady with PLEG issues #45419

Closed
deitch opened this issue May 5, 2017 · 249 comments
Labels
area/reliability, kind/bug, sig/node

Comments

@deitch
Contributor

deitch commented May 5, 2017

Is this a request for help? No

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): PLEG NotReady kubelet


Is this a BUG REPORT or FEATURE REQUEST? Bug

Kubernetes version (use kubectl version): 1.6.2

Environment:

  • Cloud provider or hardware configuration: CoreOS on AWS
  • OS (e.g. from /etc/os-release):CoreOS 1353.7.0
  • Kernel (e.g. uname -a): 4.9.24-coreos
  • Install tools:
  • Others:

What happened:

I have a 3-worker cluster. Two, and sometimes all three, nodes keep dropping into NotReady with the following messages in journalctl -u kubelet:

May 05 13:59:56 ip-10-50-20-208.ec2.internal kubelet[2858]: I0505 13:59:56.872880    2858 kubelet_node_status.go:379] Recording NodeNotReady event message for node ip-10-50-20-208.ec2.internal
May 05 13:59:56 ip-10-50-20-208.ec2.internal kubelet[2858]: I0505 13:59:56.872908    2858 kubelet_node_status.go:682] Node became not ready: {Type:Ready Status:False LastHeartbeatTime:2017-05-05 13:59:56.872865742 +0000 UTC LastTransitionTime:2017-05-05 13:59:56.872865742 +0000 UTC Reason:KubeletNotReady Message:PLEG is not healthy: pleg was last seen active 3m7.629592089s ago; threshold is 3m0s}
May 05 14:07:57 ip-10-50-20-208.ec2.internal kubelet[2858]: I0505 14:07:57.598132    2858 kubelet_node_status.go:379] Recording NodeNotReady event message for node ip-10-50-20-208.ec2.internal
May 05 14:07:57 ip-10-50-20-208.ec2.internal kubelet[2858]: I0505 14:07:57.598162    2858 kubelet_node_status.go:682] Node became not ready: {Type:Ready Status:False LastHeartbeatTime:2017-05-05 14:07:57.598117026 +0000 UTC LastTransitionTime:2017-05-05 14:07:57.598117026 +0000 UTC Reason:KubeletNotReady Message:PLEG is not healthy: pleg was last seen active 3m7.346983738s ago; threshold is 3m0s}
May 05 14:17:58 ip-10-50-20-208.ec2.internal kubelet[2858]: I0505 14:17:58.536101    2858 kubelet_node_status.go:379] Recording NodeNotReady event message for node ip-10-50-20-208.ec2.internal
May 05 14:17:58 ip-10-50-20-208.ec2.internal kubelet[2858]: I0505 14:17:58.536134    2858 kubelet_node_status.go:682] Node became not ready: {Type:Ready Status:False LastHeartbeatTime:2017-05-05 14:17:58.536086605 +0000 UTC LastTransitionTime:2017-05-05 14:17:58.536086605 +0000 UTC Reason:KubeletNotReady Message:PLEG is not healthy: pleg was last seen active 3m7.275467289s ago; threshold is 3m0s}
May 05 14:29:59 ip-10-50-20-208.ec2.internal kubelet[2858]: I0505 14:29:59.648922    2858 kubelet_node_status.go:379] Recording NodeNotReady event message for node ip-10-50-20-208.ec2.internal
May 05 14:29:59 ip-10-50-20-208.ec2.internal kubelet[2858]: I0505 14:29:59.648952    2858 kubelet_node_status.go:682] Node became not ready: {Type:Ready Status:False LastHeartbeatTime:2017-05-05 14:29:59.648910669 +0000 UTC LastTransitionTime:2017-05-05 14:29:59.648910669 +0000 UTC Reason:KubeletNotReady Message:PLEG is not healthy: pleg was last seen active 3m7.377520804s ago; threshold is 3m0s}
May 05 14:44:00 ip-10-50-20-208.ec2.internal kubelet[2858]: I0505 14:44:00.938266    2858 kubelet_node_status.go:379] Recording NodeNotReady event message for node ip-10-50-20-208.ec2.internal
May 05 14:44:00 ip-10-50-20-208.ec2.internal kubelet[2858]: I0505 14:44:00.938297    2858 kubelet_node_status.go:682] Node became not ready: {Type:Ready Status:False LastHeartbeatTime:2017-05-05 14:44:00.938251338 +0000 UTC LastTransitionTime:2017-05-05 14:44:00.938251338 +0000 UTC Reason:KubeletNotReady Message:PLEG is not healthy: pleg was last seen active 3m7.654775919s ago; threshold is 3m0s}

The docker daemon is fine (docker ps, docker images, etc. all work locally and respond immediately).

Networking is Weave, installed via kubectl apply -f https://git.io/weave-kube-1.6

What you expected to happen:

Nodes to be ready.

How to reproduce it (as minimally and precisely as possible):

Wish I knew how!

Anything else we need to know:

All of the nodes (workers and masters) are on the same private subnet with a NAT gateway to the Internet. Workers are in a security group that allows unlimited access (all ports) from the masters' security group; masters allow all ports from the same subnet. kube-proxy is running on the workers; apiserver, controller-manager and scheduler on the masters.

kubectl logs and kubectl exec always hang, even when run from the master itself (or from outside).
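
For what it's worth, the PLEG message the kubelet logs is also surfaced on the node object's Ready condition, so the flapping can be watched from the API side as well; a minimal sketch (the node name is the one from the logs above, substitute your own):

# watch the Ready condition flap
kubectl get nodes -w
# print the Ready condition's reason and message for one node
kubectl get node ip-10-50-20-208.ec2.internal \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")].reason}: {.status.conditions[?(@.type=="Ready")].message}{"\n"}'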

@yujuhong added the sig/node label May 5, 2017
@yujuhong
Contributor

yujuhong commented May 5, 2017

@deitch, how many containers were running on the node? What's the overall cpu utilization of your nodes?

@deitch
Contributor Author

deitch commented May 5, 2017

Basically none: kube-dns, weave-net, weave-npc, and 3 template sample services. Actually only one of those, because the other two had no image and were going to be cleaned up. The nodes are AWS m4.2xlarge, so it is not a resource issue.

I ended up having to destroy the nodes and recreate them. No PLEG messages since the destroy/recreate, and they seem 50% OK: they stay Ready, although they still refuse to allow kubectl exec or kubectl logs.

I really struggled to find any documentation on what PLEG actually is and, more importantly, on how to check its logs and state and debug it.

@deitch
Contributor Author

deitch commented May 5, 2017

Hmm... to add to the mystery, no container can resolve any hostnames, and kube-dns gives:

E0505 17:30:49.412272       1 reflector.go:199] pkg/dns/config/sync.go:114: Failed to list *api.ConfigMap: Get https://10.200.0.1:443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dkube-dns&resourceVersion=0: dial tcp 10.200.0.1:443: getsockopt: no route to host
E0505 17:30:49.412285       1 reflector.go:199] pkg/dns/dns.go:148: Failed to list *api.Service: Get https://10.200.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.200.0.1:443: getsockopt: no route to host
E0505 17:30:49.412272       1 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://10.200.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.200.0.1:443: getsockopt: no route to host
I0505 17:30:51.855370       1 logs.go:41] skydns: failure to forward request "read udp 10.100.0.3:60364->10.50.0.2:53: i/o timeout"

FWIW, 10.200.0.1 is the kube api service internally, 10.200.0.5 is DNS, 10.50.20.0/24 and 10.50.21.0/24 are the subnets (2 separate AZs) on which masters and workers run.

Is something just really fubar in the networking?

@deitch
Contributor Author

deitch commented May 5, 2017

Is something just really fubar in the networking?

@bboreham could this be related to Weave and not kube (or at least to misconfigured Weave)? Standard Weave with IPALLOC_RANGE=10.100.0.0/16 added, as discussed at weaveworks/weave#2736.

@qiujian16
Contributor

@deitch PLEG is what the kubelet uses to periodically list the pods on the node, check their health, and update its cache. If you see the PLEG timeout log, it may not be related to DNS; more likely the kubelet's calls to docker are timing out.

@deitch
Contributor Author

deitch commented May 11, 2017

Thanks @qiujian16. The issue appears to have gone away, but I have no idea how to check it. Docker itself appeared healthy. I was wondering if it could be the networking plugin, but that should not affect the kubelet itself.

Can you give me some pointers here on checking pleg healthiness and status? Then we can close this out until I see the issue recur.

@qiujian16
Contributor

@deitch PLEG is short for "pod lifecycle event generator". It is an internal component of the kubelet and I do not think you can directly check its status; see https://github.com/kubernetes/community/blob/master/contributors/design-proposals/pod-lifecycle-event-generator.md
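
One indirect way to watch it from the outside is the kubelet's metrics endpoint, which exposes PLEG relist timings. A hedged sketch, assuming the 1.6-era read-only kubelet port is enabled; the port, auth requirements and exact metric names vary by version (newer kubelets expose a histogram named kubelet_pleg_relist_duration_seconds on the authenticated 10250 port):

# run on the node itself; adjust the port and auth for your setup
curl -s http://localhost:10255/metrics | grep -i pleg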

@deitch
Contributor Author

deitch commented May 11, 2017

Is it an internal module in the kubelet binary? Is it another standalone component (docker, runc, containerd)? Or is it just a standalone binary?

Basically, if the kubelet reports PLEG errors, it would be very helpful to be able to find out what those errors are, check its status, and try to replicate them.

@qiujian16
Contributor

it is an internal module

@yujuhong
Contributor

@deitch most likely docker was not as responsive at times, causing PLEG to miss its threshold.

@bjhaid
Contributor

bjhaid commented May 11, 2017

I am having a similar issue on all nodes but one of a cluster I just created; logs:

May 11 19:00:59 kube-worker03.foo.bar.com kubelet[3213]: E0511 19:00:59.139374    3213 remote_runtime.go:109] StopPodSandbox "12c6a5c6833a190f531797ee26abe06297678820385b402371e196c69b67a136" from runtime service failed: rpc error: code = 4 desc = context deadline exceeded
May 11 19:00:59 kube-worker03.foo.bar.com kubelet[3213]: E0511 19:00:59.139401    3213 kuberuntime_gc.go:138] Failed to stop sandbox "12c6a5c6833a190f531797ee26abe06297678820385b402371e196c69b67a136" before removing: rpc error: code = 4 desc = context deadline exceeded
May 11 19:01:04 kube-worker03.foo.bar.com kubelet[3213]: E0511 19:01:04.627954    3213 pod_workers.go:182] Error syncing pod 1c43d9b6-3672-11e7-a6da-00163e041106 ("kube-dns-4240821577-1wswn_kube-system(1c43d9b6-3672-11e7-a6da-00163e041106)"), skipping: rpc error: code = 4 desc = context deadline exceeded
May 11 19:01:18 kube-worker03.foo.bar.com kubelet[3213]: E0511 19:01:18.627819    3213 pod_workers.go:182] Error syncing pod 1c43d9b6-3672-11e7-a6da-00163e041106 ("kube-dns-4240821577-1wswn_kube-system(1c43d9b6-3672-11e7-a6da-00163e041106)"), skipping: rpc error: code = 4 desc = context deadline exceeded
May 11 19:01:21 kube-worker03.foo.bar.com kubelet[3213]: I0511 19:01:21.627670    3213 kubelet.go:1752] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m0.339074625s ago; threshold is 3m0s]

I have downgraded docker and restarted virtually everything, to no avail. The nodes are all managed via Puppet, so I expect them to be completely identical; I have no clue what is wrong. Docker logs in debug mode show it is receiving these requests.
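
For anyone who wants to run the same check, a minimal sketch of turning on daemon debug logging; note that this overwrites any existing /etc/docker/daemon.json (merge by hand if you already have one) and that restarting docker will bounce the containers on the node:

echo '{ "debug": true }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
journalctl -u docker -f     # the debug-level log shows each API request the kubelet sends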

@deitch
Contributor Author

deitch commented May 11, 2017

@bjhaid what are you using for networking? I was seeing some interesting networking issues at the time.

@bjhaid
Contributor

bjhaid commented May 11, 2017

@deitch Weave, but I don't think this is a networking-related problem, since it seems to be a communication problem between the kubelet and docker. I can confirm via docker's debug logging that docker is receiving these requests from the kubelet.

@deitch
Contributor Author

deitch commented May 11, 2017

My PLEG issues appear to be gone, although I won't feel confident until the next time I set up these clusters afresh (all via Terraform modules I built).

Weave issues appear to exist, or possibly k8s/docker.

@bjhaid
Contributor

bjhaid commented May 11, 2017

@deitch did you do anything to make the PLEG issues go away, or did they just magically disappear?

@bjhaid
Contributor

bjhaid commented May 11, 2017

Actually it's hostname resolution: the controllers could not resolve the hostnames of the newly created nodes. Sorry for the noise.

@bjhaid
Contributor

bjhaid commented May 11, 2017

I was too quick to report things being fine; the problem still exists. I'll keep looking and report back if I find anything.

@gbergere

I guess this issue is related to weave-kube. I had the same issue, and this time, in order to solve it without recreating the cluster, I had to remove Weave and re-apply it (with a reboot of the node to propagate the removal)... And it's back.

So I have no clue why or how, but I'm pretty sure it's due to weave-kube-1.6.

@bjhaid
Contributor

bjhaid commented May 19, 2017

Forgot to return here: my problem was due to the Weave interface not coming up, so the containers didn't have networking. That in turn was caused by our firewall blocking the Weave data and VXLAN ports; once I opened those ports, things were fine.

@deitch
Contributor Author

deitch commented May 19, 2017

There were two sets of issues I had, possibly related.

  1. PLEG issues. I believe they have gone away, but I have not recreated enough clusters to be completely confident. I do not believe I changed much (i.e. anything) directly to make that happen.
  2. Weave issues wherein containers were unable to connect to anything.

Suspiciously, all of the issues with PLEG happened at exactly the same time as the Weave network issues.

Bryan from Weaveworks pointed me to the CoreOS issues. CoreOS has a rather aggressive tendency to try to manage bridges, veths, basically everything. Once I stopped CoreOS from doing that except on lo and the actual physical interfaces on the host, all of my problems went away.

Are the people still seeing problems running CoreOS?

@hollowimage

We've been plagued by these issues for the last month or so (I want to say since upgrading the clusters from 1.5.x to 1.6.x) and it's just as mysterious.

We're running Weave on Debian Jessie AMIs in AWS, and every once in a while a cluster will decide that PLEG is not healthy.

Weave seems okay in this case, because pods come up fine up until a point.
One thing we noted is that if we scale ALL our replicas down, the issue seems to go away; but as we start scaling deployments and statefulsets back up, around a certain number of containers this happens (at least this time).

docker ps and docker info seem fine on the node.
Resource utilization is nominal: 5% CPU, 1.5/8 GB of RAM used (according to htop as root); total node resource provisioning sits around 30% with everything that's supposed to be scheduled on it, scheduled.

We cannot get our head around this at all.

I really wish the PLEG check were a little more verbose, and that we had some actual detailed documentation about what the beep it's doing, because there seem to be a HUGE number of issues open about it, no one really knows what it is, and for such a critical module I would love to be able to reproduce the checks that it sees as failing.
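
In the meantime, about the closest thing to a verbose view is the kubelet's own log; a minimal sketch of pulling the PLEG-related lines on a node (assumes systemd/journald, as in the original report):

journalctl -u kubelet --since "2 hours ago" | grep -iE 'pleg|NotReady'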

@deitch
Contributor Author

deitch commented May 26, 2017

I second the thoughts on PLEG mysteriousness. On my end though, after much work for my client, stabilizing CoreOS and its misbehaviour with networks helped a lot.

@yujuhong
Contributor

The PLEG health check does very little. In every iteration, it calls docker ps to detect container state changes, and then calls docker ps and docker inspect to get the details of those containers.
After finishing each iteration, it updates a timestamp. If the timestamp hasn't been updated for a while (i.e., 3 minutes), the health check fails.

Unless your node is loaded with such a huge number of pods that PLEG can't finish all of this within 3 minutes (which should not happen), the most probable cause is that docker is slow. You may not observe that in your occasional docker ps checks, but that doesn't mean the slowness isn't there.
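
One rough way to catch that intermittent slowness (not the kubelet's actual code path, just an approximation of the calls PLEG depends on) is to time docker ps -a in a loop on the node and look for multi-second outliers:

# assumes GNU date for the nanosecond timestamps
while true; do
  start=$(date +%s%N)
  docker ps -a --no-trunc > /dev/null
  end=$(date +%s%N)
  echo "$(date -Is) docker ps -a took $(( (end - start) / 1000000 )) ms"
  sleep 5
done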

If we didn't expose the "unhealthy" status, it would hide many problems from users and potentially cause more issues. For example, the kubelet would silently stop reacting to changes in a timely manner and cause even more confusion.

Suggestions on how to make this more debuggable are welcome...

@anurag

anurag commented May 27, 2017

Running into PLEG unhealthy warnings and flapping node health status: k8s 1.6.4 with Weave. It only appears on a subset of (otherwise identical) nodes.

@agabert

agabert commented Jun 1, 2017

Just a quick heads-up: in our case the flapping workers and pods stuck in ContainerCreating were caused by the security groups of our EC2 instances not allowing Weave traffic between the master and workers and among the workers. Therefore the nodes could not properly come up and got stuck in NotReady.

Kubernetes 1.6.4

With proper security groups it works now.

@wirehead

wirehead commented Jun 1, 2017

I am experiencing something like this issue with this config...

Kubernetes version (use kubectl version): 1.6.4

Environment:
Cloud provider or hardware configuration: single System76 server
OS (e.g. from /etc/os-release): Ubuntu 16.04.2 LTS
Kernel (e.g. uname -a): Linux system76-server 4.4.0-78-generic #99-Ubuntu SMP Thu Apr 27 15:29:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Install tools: kubeadm + weave.works

Since this is a single-node cluster, I don't think my version of this issue is related to security groups or firewalls.

@hollowimage

The issue with security groups would make sense if you're just starting up the cluster. But these issues we're seeing are on clusters that have been running for months, with security groups in place.

@zoltrain

zoltrain commented Jun 2, 2017

I had something similar just happen to me running kubelet version 1.6.2 on GKE.

One of our nodes shifted into a NotReady state. The kubelet logs on that node had two complaints: one, that the PLEG status check failed, and two, interestingly, that image listing operations failed.

Some examples of the image function calls that failed:
image_gc_manager.go:176
kuberuntime_image.go:106
remote_image.go:61

Which I'm assuming are calls to the docker daemon.

As this was happening I saw disk I/O spike a lot, especially read operations: from the ~50 KB/s mark to the ~8 MB/s mark.

It corrected itself after about 30-45 minutes, but maybe it was an image GC sweep causing the increased I/O?

As has been said, PLEG monitors the pods through the docker daemon; if the daemon is doing a lot of operations at once, could the PLEG checks be getting queued?
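
One way to test that theory is to watch disk utilisation while timing the image-listing call in a loop; a sketch, assuming the sysstat package is installed for iostat:

iostat -dx 5 &                 # watch %util and await while the test runs
TIMEFORMAT='docker images took %R s'
while true; do
  time docker images > /dev/null
  sleep 10
done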

@bergman

bergman commented Jun 26, 2017

I'm seeing this problem in 1.6.4 and 1.6.6 (on GKE) with flapping NotReady as the result. Since this is the latest version available on GKE I'd love to have any fixes backported to the next 1.6 release.

One interesting thing is that the time PLEG was last seen active doesn't change and is always a huge number (it appears to be at the limit of whatever type it's stored in: 2562047h47m16.854775807s is the maximum of a signed 64-bit nanosecond duration).

[container runtime is down PLEG is not healthy: pleg was last seen active 2562047h47m16.854775807s ago; threshold is 3m0s]

@danielzhanghl

The problem can also be caused by an old version of systemd; try upgrading systemd.

Refs:
https://my.oschina.net/yunqi/blog/3041189 (Chinese only)
lnykryn/systemd-rhel#322

That PR is already in v219-65; if your systemd is newer than that, look for another cause.
redhat-plumbers/systemd-rhel7@ac46d01

@lingwooc

lingwooc commented Mar 3, 2021

I found that a trigger for this, for me, has been when the quay.io/kubernetes_incubator/nfs-provisioner:latest based ReadWriteMany provisioner that Longhorn suggests dies. The NFS pod being gone upsets the kernel, so things like df hang, but so does docker inspect on the container that mounted the NFS volume. Now, what kills the NFS pod... I have no idea. FWIW I'm using Canal, not Weave.

@Moumouls

Moumouls commented Mar 20, 2021

IMPORTANT EDIT: After removing components of my cluster one by one, it seems the issue may come from a version mismatch (OS or K8s version). After a downgrade from K8s 1.20.4 to 1.18.16 and from Ubuntu 20.04 to 18.04, the node flapping is gone and the cluster passes my load test (many StatefulSets with PVCs). The node flapping is reproducible with a Rancher RKE cluster (1.20.4 / Ubuntu 20.04 / 3 nodes) and many StatefulSets (20+). Also, I'm not sure, but CronJobs seem to trigger the PLEG error.

In my case docker rm -f <container hanging in docker inspect> does not work; the solution I found is to delete the namespace, restart docker, or reboot the node.

Issue reported on Rancher repo also: rancher/rancher#31793

OLD ANSWER:
On my side I have many apps with a cron job every minute (* * * * *) with a 60s active deadline.

When I reach about 10 cron jobs (10 pods created each minute), after 5-6 minutes the PLEG error is raised (theoretically 60 containers, 120 counting pause containers).
After some investigation with docker ps -a | tr -s " " | cut -d " " -f1 | xargs -Iarg sh -c 'echo arg; docker inspect arg > /dev/null' I found that some "pause" containers (in my case rancher/pause containers) hang on docker inspect; a timeout-guarded variant of this scan is sketched below, after the cluster details.

My cluster is:

  • Ubuntu 20
  • Docker 20.10.5
  • RKE
  • Kubernetes : v1.20.4-rancher1-1
  • Nodes: 3 VPS, 24 Cores, 96GB RAM,
  • Core charts installed: Istio, Longhorn
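
A variant of that scan which will not itself get stuck on a hung container (a sketch; the 10-second timeout is an arbitrary choice):

for id in $(docker ps -aq); do
  printf '%s ' "$id"
  if timeout 10 docker inspect "$id" > /dev/null 2>&1; then
    echo "ok"
  else
    echo "slow or hung (docker inspect did not return within 10s)"
  fi
done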

@joshimoo

@lingwooc in Longhorn v1.1 we natively support rwx via a custom ganesha image, with NFS soft mount, so you shouldn't have any hangs in the case where the nfs server is gone.

@Moumouls

@lingwooc in Longhorn v1.1 we natively support rwx via a custom ganesha image, with NFS soft mount, so you shouldn't have any hangs in the case where the nfs server is gone.

Thanks @joshimoo, I was wrong, and it took me a while to find the real source. Longhorn and Istio work perfectly; the issue seems to be deeper.

@MaesterZ

MaesterZ commented Mar 23, 2021

From personal experience since 1.6 in production, PLEG issues usually show up when a node is drowning:
* Load is sky-high, a process is looping/consuming all the CPU resources
* Disk I/O is maxed out (logging?)
* Global overload (CPU+disk+network) => the CPU is interrupted all the time
Result => the Docker daemon is not responsive

Quoting myself for visibility (2019): this issue is four years old (2017) and a dozen versions have been released since, so I think this is no longer a good place to discuss PLEG issues. The root cause may be completely different depending on your setup/environment.

I just wonder whether removing the Docker daemon from the equation helps, given the recent container-runtime changes.

@grosser

grosser commented Mar 26, 2021

FYI, we had this happen because of a containerd update from 1.4.3 to 1.4.4 (you can see what docker was built with via docker version)... still not sure why.
See containerd/containerd#5274

@yogeek

yogeek commented Mar 30, 2021

Thank you @grosser !

After your message, we deployed a test cluster (we are currently on K8s 1.19.8) with half the nodes on containerd 1.4.4 and the other half on containerd 1.4.3, and we managed to reproduce the issue by frequently changing the number of pods on specific nodes (scaling a simple deployment up and down with a node selector to overload the targeted nodes). This caused the PLEG duration to go up to 10s quite quickly on the targeted nodes.

And we confirm that the nodes with containerd 1.4.4 changed to "NotReady" after 10 minutes of PLEG duration alerts, whereas the nodes with containerd 1.4.3, even with the PLEG duration alerts, managed to stay Ready the whole time.

If it is useful, here is the test we used to reproduce the problem:

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-enforced-az
spec:
  selector:
    matchLabels:
      app: nginx-enforced-az
  template:
    metadata:
      labels:
        app: nginx-enforced-az
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: eu-central-1a
      containers:
      - name: nginx
        # to avoid docker rate limiting, we used another registry than dockerhub
        image: quay.io/yobasystems/alpine-nginx:x86_64
        resources:
          requests:
            memory: "32Mi"
            cpu: "50m"
          limits:
            memory: "100Mi"
            cpu: "100m"
        ports:
        - containerPort: 8080
EOF

while true; do
  nb_pods_up=$(( RANDOM % 200 ))
  echo "Scaling to $nb_pods_up..."
  kubectl scale deploy nginx-enforced-az --replicas $nb_pods_up
  sleep 20
  nb_pods_down=$(( RANDOM % 50 ))
  echo "Scaling to $nb_pods_down..."
  kubectl scale deploy nginx-enforced-az --replicas $nb_pods_down
  sleep 20
done


@bbroniewski

bbroniewski commented Apr 14, 2021

Hi all, be aware that the runc component (1.0.0-rc93) of the containerd.io package, which is used by docker, will give you PLEG issues and node flapping between Ready and NotReady. I hope no one else will lose a ton of hours finding this out 🙂 Use another version of it, for example 1.0.0-rc92. You can also downgrade containerd.io to version 1.4.3-1, which contains a working version of runc.

@jonathanheilmann

Hi all, be aware that the runc component (1.0.0-rc93) of the containerd.io package, which is used by docker, will give you PLEG issues and node flapping between Ready and NotReady. I hope no one else will lose a ton of hours finding this out 🙂 Use another version of it, for example 1.0.0-rc92.

I'm not able to downgrade runc. Can you post a small how-to, please?

@bbroniewski

bbroniewski commented Apr 20, 2021

Hi all, be aware that the runc component (1.0.0-rc93) of the containerd.io package, which is used by docker, will give you PLEG issues and node flapping between Ready and NotReady. I hope no one else will lose a ton of hours finding this out 🙂 Use another version of it, for example 1.0.0-rc92.

I'm not able to downgrade runc. Can you post a small how-to, please?

What I did:

Before doing it, you should stop docker.

Check if it was installed via apt (should appear on the list):
apt list --installed

If yes, then remove it:
sudo apt-get purge runc

If not listed, then run:
which runc

It will show you where the runc binary is installed; then remove that binary:
rm <path_to_runc>

Then I installed a specific version following this blog post:
https://dev.bitolog.com/upgrade-runc-on-ubuntu/

Installing the Go language toolchain is required.

Potentially it can also be installed via apt-get, but I only see two versions under apt:

  • command: apt-cache madison runc
  • my output:
      runc | 1.0.0~rc93-0ubuntu1~20.04.1 | http://pl.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
      runc | 1.0.0~rc10-0ubuntu1 | http://pl.archive.ubuntu.com/ubuntu focal/main amd64 Packages

Maybe someone else can let us know whether it is possible to see more versions under apt, or whether they were simply never distributed and cannot easily be installed using apt-get.

@renan
Contributor

renan commented Apr 20, 2021

We are running some nodes on version 1.0.1-dev (nodes were created ~24 hours ago) and others on 1.0.2-dev (nodes were created ~6 hours ago).

The former don't seem to have any issues, while the latter are experiencing the problems highlighted in this issue.

I've installed the docker.io package which then installs containerd and runc (1.0.0~rc93-0ubuntu1~20.04.1). Running on Ubuntu 20.04.2 LTS.

root@ip-10-203-0-12:~# dpkg -l | grep runc
ii  runc                              1.0.0~rc93-0ubuntu1~20.04.1       amd64        Open Container Project - runtime

root@ip-10-203-0-12:~# runc --version
runc version spec: 1.0.2-dev
go: go1.13.8
libseccomp: 2.4.3

How is this version mismatch even possible?

@giskou

giskou commented Apr 20, 2021

If you are using Ubuntu and the Docker apt repository, you need to downgrade containerd.io to version 1.4.3-1.
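
For reference, a hedged sketch of doing that downgrade with apt (the version string is the one used elsewhere in this thread; restarting docker can bounce every container on the node, so drain it first if you can):

apt-get update
apt-get install -y --allow-downgrades --allow-change-held-packages 'containerd.io=1.4.3-1*'
apt-mark hold containerd.io     # stop apt from pulling the broken version back in
systemctl restart docker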

@wolfleave

Hi all, be aware that the runc component (1.0.0-rc93) of the containerd.io package, which is used by docker, will give you PLEG issues and node flapping between Ready and NotReady. I hope no one else will lose a ton of hours finding this out 🙂 Use another version of it, for example 1.0.0-rc92.

Why does the runc component (1.0.0-rc93) of containerd.io cause nodes to flap between Ready and NotReady?

@bbroniewski

bbroniewski commented Apr 21, 2021

Hi all, be aware that the runc component (1.0.0-rc93) of the containerd.io package, which is used by docker, will give you PLEG issues and node flapping between Ready and NotReady. I hope no one else will lose a ton of hours finding this out 🙂 Use another version of it, for example 1.0.0-rc92.

Why does the runc component (1.0.0-rc93) of containerd.io cause nodes to flap between Ready and NotReady?

There is a bug in the code that causes containers to get stuck in "Created" status, and commands like docker inspect on those stuck containers hang for a long time. PLEG uses that call to record container status, and because of the very delayed response it crosses the timeout, which is 3 minutes, and the node becomes unhealthy. If you are looking for a code-level explanation, go to the runc git repo. It is already fixed in the master branch; I am using rc93+dev, i.e. the latest from master, and the issue is not there anymore.
Issue in the runc repo: opencontainers/runc#2828
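
If you want to check whether a node is on the affected combination and showing that symptom, a quick sketch:

docker version                           # recent releases list the bundled containerd/runc builds in the Server section
runc --version                           # prints the runc build plus the OCI spec version it implements
docker ps -a --filter status=created     # containers stuck in "Created" are the symptom described above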

@Moumouls

Okay, so it seems that here it's "just" a version-combination issue.
On our production system we have many Ubuntu 18.04 nodes with runc 1.0.0-rc93, and everything works fine.

Runc version:

runc version 1.0.0-rc93
commit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
spec: 1.0.2-dev
go: go1.13.15
libseccomp: 2.4.3

So it seems that two solutions exist:

  • Downgrade/upgrade the runc version and avoid the rc93 release
  • Use the older Ubuntu 18.04 with runc rc93

@pehlert

pehlert commented Apr 26, 2021

We had runc 1.0.0-rc93 on Ubuntu 18.04 (first), then on Ubuntu 20.04, and it did cause issues. Downgrading via Docker's own apt repositories to containerd=1.4.3-1*, which comes with runc rc92, apparently solved it for us.

We initially had a strong suspicion that it was caused by CSI drivers, as we had recently upgraded those, so I disabled CSI entirely. When the cluster was running without CSI volumes (fewer containers, as some were in ContainerCreating state waiting for their volumes), the issue did not appear.

So don't be discouraged from trying a downgrade of runc/containerd if this hits you. The issue will probably not occur for everyone (see @Moumouls's comment above, although he has the affected version running).

@superbiche

A downgrade of runc/containerd fixed this issue on three clusters we maintain, all deployed this year. If you're experiencing this issue, trying a downgrade is definitely worth it.

@nmajin

nmajin commented May 15, 2021

I am seeing a similar issue with runc version spec: 1.0.1-dev. I'm not sure whether it is completely related, but it appears the Twistlock DaemonSet could also be causing this issue. We are still digging; I'm just curious whether any of this is related to the node Ready/NotReady issue I am seeing.

@toschneck

As suggested as a workaround in #45419 (comment), downgrading containerd.io to the 1.4.3-1* version, which holds runc at 1.0.0-rc92, fixed the issue. This small automated script, which SSHes into all nodes, fixed it for now:

ATTENTION: this could cause restarts of potentially all pods in your cluster!

#!/bin/bash
cd $(dirname $(realpath $0))
FOLDER=$(pwd)
user='root'

set -euo pipefail
kubectl get nodes --no-headers -o wide | awk '{print $7}' > $FOLDER/hosts.txt
cat $FOLDER/hosts.txt

### exclude '#' lines
grep -v '#' $FOLDER/hosts.txt| while read -r host; do
  echo "$user@$host"
  ssh "$user@$host" -oStrictHostKeyChecking=no -t 'bash -s' <<EOF
echo '------------------'
runc -version
echo '>>>>> apt install containerd.io=1.4.3-1*'
apt-get install --allow-downgrades --allow-change-held-packages -y containerd.io=1.4.3-1*
runc -version
EOF
done

@avestuk

avestuk commented Jun 10, 2021

An upgrade to containerd 1.4.6-1 can also fix this issue.

@bschofield

I just experienced this issue with Kubernetes 1.22.0, running on Ubuntu 20.04.3, when using the Ubuntu-provided docker.io package (docker 20.10.7, containerd 1.5.2, runc 1.0.0~rc95).

For me, it appears to have been fixed by switching to the official Docker repo (docker 20.10.8, containerd 1.4.9, runc 1.0.1).

@chinmaya-n

chinmaya-n commented Nov 29, 2022

FYI: here is a resource I found that summarizes this issue. It includes the following shell script to check whether docker is slow and to show which containers are the culprits.

For Docker

TIMEFORMAT=%R; time docker ps --format "{{.ID}}\t{{.Names}}" | while read id name; do echo -e "\nChecking Container: $name : $id"; RESP=$(time docker inspect $id 2>&1  > /dev/null); echo -e "Took$RESP above secs for $name ID: $id \n"; done; echo -e "Total Time"
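
For nodes running containerd or CRI-O without the Docker shim, a roughly equivalent check can be done with crictl (a sketch; crictl must be pointed at your runtime's CRI socket):

TIMEFORMAT=%R
for id in $(crictl ps -aq); do
  echo "Checking container: $id"
  time crictl inspect "$id" > /dev/null
done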
