root cause kernel soft lockups 

We're seeing an abnormal number of soft lockups, pegged cpus and unusable nodes recently. The last repros were from @bowei and @freehan. 

**Symptoms**
Kernel logs showed ebtables related traces, CPU was pegged at 100% and remained there, no ssh, basically no clarity till the node was reset. Looking back through older test logs, there are several failures that had NotReady nodes that MIGHT be the same bug. 

**Actions**
Minhan's working on a repro and @dchen1107 is trying to figure out when this spike happened by spelunking test logs. 

We've been syncing iptables rules from kube-proxy more often than we need to for a couple of releases (https://github.com/kubernetes/kubernetes/issues/26637), this is the only thing that springs to mind that might cause cpu spikes. 

We should probably try to mitigate for 1.5, marking as release-blocker till we have a better handle. 
@saad-ali @kubernetes/sig-network @kubernetes/sig-node 


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

root cause kernel soft lockups #37853

140 remaining items

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

root cause kernel soft lockups #37853

Description

Activity

saad-ali commented on Dec 2, 2016

robin-anil commented on Dec 3, 2016

140 remaining items

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions