
DaemonsetController can't feel it when node has more resources, e.g. other Pod exits #46935

Closed
k82cn opened this issue Jun 4, 2017 · 13 comments · Fixed by #49488
Labels
area/workload-api/daemonset, kind/bug, sig/apps, sig/scheduling

Comments

k82cn (Member) commented Jun 4, 2017

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT

Kubernetes version (use kubectl version):
master branch

What happened:
This is a follow-up to #45628. There is a case where a node does not have enough CPU or memory to create a DaemonSet's pod. Later, I delete other pods on that node, so the request can now be satisfied, but the DaemonSet controller does not notice.

k8s-github-robot commented

@k82cn There are no sig labels on this issue. Please add a sig label by:
(1) mentioning a sig: @kubernetes/sig-<team-name>-misc
(2) specifying the label manually: /sig <label>

Note: method (1) will trigger a notification to the team. You can find the team list here.

k8s-github-robot added the needs-sig label on Jun 4, 2017
k82cn (Member, Author) commented Jun 4, 2017

/sig apps

k8s-ci-robot added the sig/apps label on Jun 4, 2017
k8s-github-robot removed the needs-sig label on Jun 4, 2017
k82cn (Member, Author) commented Jun 4, 2017

One option is similar to addNode: get all DaemonSets and check them. Performance is a concern, though, since pod deletions are far more frequent than node additions.

I'm thinking of handling it via #42002: let the scheduler place the pod once there are enough resources.
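
Below is a minimal sketch of that naive option, assuming a pared-down stand-in for the controller and current client-go/apps API groups rather than the real DaemonSet controller code: on any pod deletion, requeue every DaemonSet so the next sync re-runs its node-fit checks.

```go
package daemonsketch

import (
	appsv1 "k8s.io/api/apps/v1"
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	appslisters "k8s.io/client-go/listers/apps/v1"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// daemonSetsController is a pared-down stand-in for the real controller,
// holding only what this sketch needs.
type daemonSetsController struct {
	dsLister appslisters.DaemonSetLister
	queue    workqueue.RateLimitingInterface
}

// enqueueDaemonSet adds the DaemonSet's namespace/name key to the work queue.
func (dsc *daemonSetsController) enqueueDaemonSet(ds *appsv1.DaemonSet) {
	key, err := cache.MetaNamespaceKeyFunc(ds)
	if err != nil {
		return
	}
	dsc.queue.Add(key)
}

// deletePod is the naive option discussed above: any pod deletion may have
// freed resources on its node, so requeue all DaemonSets.
func (dsc *daemonSetsController) deletePod(obj interface{}) {
	pod, ok := obj.(*v1.Pod)
	if !ok {
		return // tombstone handling omitted for brevity
	}
	if pod.Spec.NodeName == "" {
		return // the pod never ran on a node, so nothing was freed
	}
	dsList, err := dsc.dsLister.List(labels.Everything())
	if err != nil {
		return
	}
	// This runs on every pod deletion, which is the performance concern
	// above: deletions are far more frequent than node additions.
	for _, ds := range dsList {
		dsc.enqueueDaemonSet(ds)
	}
}
```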

0xmichalis (Contributor) commented

@kubernetes/sig-apps-bugs @kubernetes/sig-scheduling-bugs

k8s-ci-robot added the sig/scheduling and kind/bug labels on Jun 4, 2017
k82cn (Member, Author) commented Jun 4, 2017

/assign

janetkuo (Member) commented

We'd love to see this fixed in 1.8. It's unlikely we'll have #42002 done in 1.8, so @k82cn, would you be willing to fix it with the naive solution first?

k82cn (Member, Author) commented Jul 20, 2017

@janetkuo, sure, no problem. Let's fix it before #42002.

lukaszo (Contributor) commented Jul 20, 2017

I've always thought that there is some kind of periodic resync that checks all DaemonSets and should handle this issue; it's even mentioned in some code comments.
Does it actually work? And if so, how does it work?

enisoc (Member) commented Jul 20, 2017

@lukaszo wrote:

I've always thought that there is some kind of periodic resync [...]

I thought so too, but as far as I can tell, the default period is 12 hours and the randomized value could be up to 2x that, so we can't really rely on it for this kind of problem.

MinResyncPeriod: metav1.Duration{Duration: 12 * time.Hour},

k82cn (Member, Author) commented Jul 21, 2017

@lukaszo, @enisoc, I think that's for the reflector resync. In the DaemonSet controller, runWorkers keeps workers running to sync DaemonSets, but syncs are triggered by events, e.g. a node being added or a DaemonSet's pod being deleted. If I've misunderstood anything, please correct me :).

enisoc (Member) commented Jul 21, 2017

As far as I can tell, the min-resync-period flag (default 12hrs) ends up being used as the defaultEventHandlerResyncPeriod, which is the resync period you get if you use AddEventHandler() (like DaemonSet does) instead of AddEventHandlerWithResyncPeriod() (like StatefulSet does).

So I think this explains why DaemonSet does not appear to resync: it's using the default period, which is a random value between 12h and 24h.
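
For reference, a minimal sketch of the two registration paths being compared, using client-go's shared informer API. The 12-hour factory period mirrors the flag default mentioned above; the 30-second value and the handler bodies are purely illustrative, not the controllers' actual wiring.

```go
package resyncsketch

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

func registerHandlers(client kubernetes.Interface) {
	// kube-controller-manager derives the factory's default resync from
	// --min-resync-period, randomized between 1x and 2x the flag value;
	// the flag default of 12h is used here for illustration.
	factory := informers.NewSharedInformerFactory(client, 12*time.Hour)
	podInformer := factory.Core().V1().Pods().Informer()

	// Path the DaemonSet controller takes: AddEventHandler inherits the
	// factory default, so the effective resync lands somewhere in 12h-24h
	// and cannot act as a timely backstop for missed events.
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		DeleteFunc: func(obj interface{}) { /* react to pod deletions */ },
	})

	// Path the StatefulSet controller takes: ask for an explicit, tighter
	// resync period for this particular handler.
	podInformer.AddEventHandlerWithResyncPeriod(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) { /* resyncs arrive as updates */ },
	}, 30*time.Second)
}
```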

0xmichalis (Contributor) commented

Eventually, StatefulSets should also be pushed to use AddEventHandler. We don't want controller loops to depend on tight resync intervals. The resource handlers on secondary caches are paramount for complementing the resync logic of the main controllers, e.g. once a Node has more resources (status change on the Node?), a watch event in the secondary (node) cache of the DS controller should trigger a resync of all the DaemonSets that want to run on that node.
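
A rough sketch of that secondary-cache pattern, continuing the daemonSetsController stand-in from the earlier sketch; the node-targeting filter is left as a comment, and as the next comment points out, freed pod resources are not reflected in Node status, so a node-update trigger alone would not cover this issue.

```go
// updateNode is a handler on the DS controller's secondary (node) cache:
// when a node object changes, requeue the DaemonSets so each one
// re-evaluates whether it should now run a pod on that node.
func (dsc *daemonSetsController) updateNode(old, cur interface{}) {
	oldNode, okOld := old.(*v1.Node)
	curNode, okCur := cur.(*v1.Node)
	if !okOld || !okCur {
		return
	}
	if oldNode.ResourceVersion == curNode.ResourceVersion {
		return // periodic resyncs deliver unchanged objects; skip them
	}
	// A real controller would also check which DaemonSets actually target
	// curNode (node selector, taints, fit) instead of requeueing all of them.
	dsList, err := dsc.dsLister.List(labels.Everything())
	if err != nil {
		return
	}
	for _, ds := range dsList {
		dsc.enqueueDaemonSet(ds)
	}
}
```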

k82cn (Member, Author) commented Jul 22, 2017

once a Node has more resources (status change on the Node?)

Nope, it depends on pod events (e.g. add, update, delete) :(. That avoids putting too much info into the Node object. The scheduler caches this info in its schedulercache and uses it in the PodFitsResources predicate.

In the DaemonSet controller's deletePod handler, we only care about DS pods and enqueue their DaemonSet; if some other pod on the node is deleted, the DS controller does nothing.
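
A rough sketch of how that handler could be extended in the direction the eventual fix (#49488) took, again continuing the stand-in above and replacing its naive deletePod. This approximates the shape of the change, not the merged code, and it needs one extra import, metav1 "k8s.io/apimachinery/pkg/apis/meta/v1".

```go
// deletePod (revised): requeue the pod's own DaemonSet if a daemon pod was
// deleted; otherwise treat the deletion as "this node may have free
// resources now" and requeue the DaemonSets so they re-check that node.
func (dsc *daemonSetsController) deletePod(obj interface{}) {
	pod, ok := obj.(*v1.Pod)
	if !ok {
		return // tombstone handling omitted for brevity
	}
	if controllerRef := metav1.GetControllerOf(pod); controllerRef != nil &&
		controllerRef.Kind == "DaemonSet" {
		// Existing behaviour: a daemon pod was deleted, so requeue only
		// its owning DaemonSet.
		if ds, err := dsc.dsLister.DaemonSets(pod.Namespace).Get(controllerRef.Name); err == nil {
			dsc.enqueueDaemonSet(ds)
		}
		return
	}
	// New behaviour: a non-daemon pod was deleted, so the node it ran on may
	// now fit daemon pods that previously failed the resource check.
	if pod.Spec.NodeName == "" {
		return
	}
	dsList, err := dsc.dsLister.List(labels.Everything())
	if err != nil {
		return
	}
	for _, ds := range dsList {
		dsc.enqueueDaemonSet(ds)
	}
}
```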

k8s-github-robot pushed a commit that referenced this issue Aug 11, 2017
Automatic merge from submit-queue (batch tested with PRs 49488, 50407, 46105, 50456, 50258)

Requeue DaemonSets if non-daemon pods were deleted.

**What this PR does / why we need it**:
Requeue DaemonSets if non-daemon pods were deleted.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #46935

**Release note**:

```release-note
None
```