
DaemonsetController can't feel it when node has more resources, e.g. other Pod exits #46935

Closed
k82cn opened this issue Jun 4, 2017 · 13 comments · Fixed by #49488
Labels
area/workload-api/daemonset, kind/bug, sig/apps, sig/scheduling

Comments

k82cn (Member) commented Jun 4, 2017

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT

Kubernetes version (use kubectl version):
master branch

What happened:
This is a follow-up to #45628. There is a case where a node does not have enough CPU or memory to create a DaemonSet's pod. Later, I delete other pods on that node, so the request can now be satisfied, but the DaemonSet controller does not notice.

k8s-github-robot commented

@k82cn There are no sig labels on this issue. Please add a sig label by:
(1) mentioning a sig: @kubernetes/sig-<team-name>-misc
(2) specifying the label manually: /sig <label>

Note: method (1) will trigger a notification to the team. You can find the team list here.

k8s-github-robot added the needs-sig label on Jun 4, 2017
k82cn (Member, Author) commented Jun 4, 2017

/sig apps

k8s-ci-robot added the sig/apps label on Jun 4, 2017
k8s-github-robot removed the needs-sig label on Jun 4, 2017
k82cn (Member, Author) commented Jun 4, 2017

One option is similar to addNode: get all DaemonSets and check them. Performance is a concern, though, since pod deletions are far more frequent than node additions.

I'm thinking of handling it via #42002: let the scheduler place the pod once there are enough resources.
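
Below is a minimal sketch of that naive option, assuming a pared-down stand-in for the controller and current client-go/apps API groups rather than the real DaemonSet controller code: on any pod deletion, requeue every DaemonSet so the next sync re-runs its node-fit checks.

```go
package daemonsketch

import (
	appsv1 "k8s.io/api/apps/v1"
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	appslisters "k8s.io/client-go/listers/apps/v1"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// daemonSetsController is a pared-down stand-in for the real controller,
// holding only what this sketch needs.
type daemonSetsController struct {
	dsLister appslisters.DaemonSetLister
	queue    workqueue.RateLimitingInterface
}

// enqueueDaemonSet adds the DaemonSet's namespace/name key to the work queue.
func (dsc *daemonSetsController) enqueueDaemonSet(ds *appsv1.DaemonSet) {
	key, err := cache.MetaNamespaceKeyFunc(ds)
	if err != nil {
		return
	}
	dsc.queue.Add(key)
}

// deletePod is the naive option discussed above: any pod deletion may have
// freed resources on its node, so requeue all DaemonSets.
func (dsc *daemonSetsController) deletePod(obj interface{}) {
	pod, ok := obj.(*v1.Pod)
	if !ok {
		return // tombstone handling omitted for brevity
	}
	if pod.Spec.NodeName == "" {
		return // the pod never ran on a node, so nothing was freed
	}
	dsList, err := dsc.dsLister.List(labels.Everything())
	if err != nil {
		return
	}
	// This runs on every pod deletion, which is the performance concern
	// above: deletions are far more frequent than node additions.
	for _, ds := range dsList {
		dsc.enqueueDaemonSet(ds)
	}
}
```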

0xmichalis (Contributor) commented

@kubernetes/sig-apps-bugs @kubernetes/sig-scheduling-bugs

k8s-ci-robot added the sig/scheduling and kind/bug labels on Jun 4, 2017
k82cn (Member, Author) commented Jun 4, 2017

/assign

janetkuo (Member) commented

We'd love to see this fixed in 1.8. It's unlikely we'll have #42002 done in 1.8, so @k82cn, would you be willing to fix it with the naive solution first?

k82cn (Member, Author) commented Jul 20, 2017

@janetkuo, sure, no problem. Let's fix it before #42002.

lukaszo (Contributor) commented Jul 20, 2017

I've always thought that there is some kind of periodic resync that checks all DaemonSets and should handle this issue; it's even mentioned in some code comments.
Does it actually work? And if so, how does it work?

enisoc (Member) commented Jul 20, 2017

@lukaszo wrote:

I've always thought that there is some kind of periodic resync [...]

I thought so too, but as far as I can tell, the default period is 12 hours and the randomized value could be up to 2x that, so we can't really rely on it for this kind of problem.

MinResyncPeriod: metav1.Duration{Duration: 12 * time.Hour},

k82cn (Member, Author) commented Jul 21, 2017

@lukaszo, @enisoc, I think that's for the reflector resync. In the DaemonSet controller, runWorkers keeps workers running to sync DaemonSets, but syncs are triggered by events, e.g. a node being added or a DaemonSet's pod being deleted. If I've misunderstood anything, please correct me :).

enisoc (Member) commented Jul 21, 2017

As far as I can tell, the min-resync-period flag (default 12hrs) ends up being used as the defaultEventHandlerResyncPeriod, which is the resync period you get if you use AddEventHandler() (like DaemonSet does) instead of AddEventHandlerWithResyncPeriod() (like StatefulSet does).

So I think this explains why DaemonSet does not appear to resync: it's using the default period, which is a random value between 12h and 24h.
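
For reference, a minimal sketch of the two registration paths being compared, using client-go's shared informer API. The 12-hour factory period mirrors the flag default mentioned above; the 30-second value and the handler bodies are purely illustrative, not the controllers' actual wiring.

```go
package resyncsketch

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

func registerHandlers(client kubernetes.Interface) {
	// kube-controller-manager derives the factory's default resync from
	// --min-resync-period, randomized between 1x and 2x the flag value;
	// the flag default of 12h is used here for illustration.
	factory := informers.NewSharedInformerFactory(client, 12*time.Hour)
	podInformer := factory.Core().V1().Pods().Informer()

	// Path the DaemonSet controller takes: AddEventHandler inherits the
	// factory default, so the effective resync lands somewhere in 12h-24h
	// and cannot act as a timely backstop for missed events.
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		DeleteFunc: func(obj interface{}) { /* react to pod deletions */ },
	})

	// Path the StatefulSet controller takes: ask for an explicit, tighter
	// resync period for this particular handler.
	podInformer.AddEventHandlerWithResyncPeriod(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) { /* resyncs arrive as updates */ },
	}, 30*time.Second)
}
```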

0xmichalis (Contributor) commented

Eventually, StatefulSets should also be pushed to use AddEventHandler. We don't want controller loops to depend on tight resync intervals. The resource handlers on secondary caches are paramount for complementing the resync logic of the main controllers, e.g. once a Node has more resources (status change on the Node?), a watch event in the secondary (node) cache of the DS controller should trigger a resync of all the DaemonSets that want to run on that node.
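
A rough sketch of that secondary-cache pattern, continuing the daemonSetsController stand-in from the earlier sketch; the node-targeting filter is left as a comment, and as the next comment points out, freed pod resources are not reflected in Node status, so a node-update trigger alone would not cover this issue.

```go
// updateNode is a handler on the DS controller's secondary (node) cache:
// when a node object changes, requeue the DaemonSets so each one
// re-evaluates whether it should now run a pod on that node.
func (dsc *daemonSetsController) updateNode(old, cur interface{}) {
	oldNode, okOld := old.(*v1.Node)
	curNode, okCur := cur.(*v1.Node)
	if !okOld || !okCur {
		return
	}
	if oldNode.ResourceVersion == curNode.ResourceVersion {
		return // periodic resyncs deliver unchanged objects; skip them
	}
	// A real controller would also check which DaemonSets actually target
	// curNode (node selector, taints, fit) instead of requeueing all of them.
	dsList, err := dsc.dsLister.List(labels.Everything())
	if err != nil {
		return
	}
	for _, ds := range dsList {
		dsc.enqueueDaemonSet(ds)
	}
}
```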

k82cn (Member, Author) commented Jul 22, 2017

once a Node has more resources (status change on the Node?)

Nope, it depends on pod events (e.g. add, update, delete) :(. That avoids putting too much info into the Node object. The scheduler caches this info in its schedulercache and uses it in the PodFitsResources predicate.

In the DaemonSet controller's deletePod handler, we only care about DS pods and enqueue their DaemonSet; if some other pod on the node is deleted, the DS controller does nothing.
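
A rough sketch of how that handler could be extended in the direction the eventual fix (#49488) took, again continuing the stand-in above and replacing its naive deletePod. This approximates the shape of the change, not the merged code, and it needs one extra import, metav1 "k8s.io/apimachinery/pkg/apis/meta/v1".

```go
// deletePod (revised): requeue the pod's own DaemonSet if a daemon pod was
// deleted; otherwise treat the deletion as "this node may have free
// resources now" and requeue the DaemonSets so they re-check that node.
func (dsc *daemonSetsController) deletePod(obj interface{}) {
	pod, ok := obj.(*v1.Pod)
	if !ok {
		return // tombstone handling omitted for brevity
	}
	if controllerRef := metav1.GetControllerOf(pod); controllerRef != nil &&
		controllerRef.Kind == "DaemonSet" {
		// Existing behaviour: a daemon pod was deleted, so requeue only
		// its owning DaemonSet.
		if ds, err := dsc.dsLister.DaemonSets(pod.Namespace).Get(controllerRef.Name); err == nil {
			dsc.enqueueDaemonSet(ds)
		}
		return
	}
	// New behaviour: a non-daemon pod was deleted, so the node it ran on may
	// now fit daemon pods that previously failed the resource check.
	if pod.Spec.NodeName == "" {
		return
	}
	dsList, err := dsc.dsLister.List(labels.Everything())
	if err != nil {
		return
	}
	for _, ds := range dsList {
		dsc.enqueueDaemonSet(ds)
	}
}
```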

k8s-github-robot pushed a commit that referenced this issue Aug 11, 2017
Automatic merge from submit-queue (batch tested with PRs 49488, 50407, 46105, 50456, 50258)

Requeue DaemonSets if non-daemon pods were deleted.

**What this PR does / why we need it**:
Requeue DaemonSets if non-daemon pods were deleted.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #46935

**Release note**:

```release-note
None
```