Make periodic NodeStatus updates cheaper #14733

Closed
@gmarek

Description

@gmarek
Contributor

We're currently sending whole NodeStatuses only to update a Node heartbeat. This is a big source of traffic in the cluster, and one of the possible causes of 1000-node cluster failures. We need to extract the heartbeat into a new API 'Heartbeat' object with a timestamp and object reference only, and make Kubelet/NodeController use this object instead of NodeStatus to determine the health of the Node.

cc: @wojtek-t @lavalamp @bgrant0607 @brendandburns @smarterclayton @timothysc @davidopp @fgrzadkowski

Activity

self-assigned this
on Sep 29, 2015
wojtek-t

wojtek-t commented on Sep 29, 2015

@wojtek-t
Member
smarterclayton

smarterclayton commented on Sep 29, 2015

@smarterclayton
Contributor

Why not just add a heartbeat subresource to the node that performs the operation?

EDIT: to do the update, but continue to just retrieve nodes as is. I don't see the need to retrieve a Heartbeat, just to set it. We don't have a separate pod binding object; we just have the "bind" verb on the pod that mutates nodeName.

gmarek

gmarek commented on Sep 29, 2015

@gmarek
Contributor, Author

That's the plan for the implementation, but we still need to store it somewhere, hence the need for an API object.

smarterclayton

smarterclayton commented on Sep 29, 2015

@smarterclayton
Contributor

Why wouldn't the store continue to be on the node? Virtual resources like scale / bind / etc already handle this. Maybe that's what you're suggesting, just want to be sure.

smarterclayton

smarterclayton commented on Sep 29, 2015

@smarterclayton
Contributor

I.e. PUT /nodes/foo/heartbeat simply sets lastProbeTime and accepts the "Heartbeat" virtual sub resource you described.

wojtek-t

wojtek-t commented on Sep 29, 2015

@wojtek-t
Member

But this would require changing how subresources are implemented. The problem is that currently we just send the whole object in that case (e.g. the whole pod in the case of binding). We would need an efficient PATCH operation that sends only the data that is changing (not the whole object).

smarterclayton

smarterclayton commented on Sep 29, 2015

@smarterclayton
Contributor

Are you looking to reduce UPDATE cost, or WATCH of updated node cost?

Binding is a minimal resource - it's just the object reference for the target node and the name of the pod. For the former, you would POST/PUT heartbeat "lastProbeTime" to the apiserver, which would run a guaranteed update on node that only alters the lastProbeTime. The kubelets would only send a minimal object, but anyone watching would still get the update.

For the latter, if the resources are truly split, wouldn't that require the node controller to fetch and watch two objects in order to stitch that data together? I didn't think watch on node was the problem, which is why I assume you meant the former.

wojtek-t

wojtek-t commented on Sep 29, 2015

@wojtek-t
Member

Yes - I mostly meant the former.

For the watch - with the new watch in apiserver - we will read the data from etcd only once and then send it only to interested watchers (and there should be a constant number of those).

So yes - I'm mostly worried about the write path (not the read path).
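A rough Go sketch of the read-once, fan-out idea described above, under the assumption that each watcher registers a filter. The `Broadcaster`, `Event`, and `Dispatch` names are hypothetical and do not describe the actual apiserver watch implementation.

```go
package main

import "fmt"

// Event is a hypothetical change notification that was read once from storage.
type Event struct {
	Kind string
	Name string
}

// watcher pairs a filter with the channel its events are delivered on.
type watcher struct {
	filter func(Event) bool
	ch     chan Event
}

// Broadcaster fans each event out to the watchers whose filter accepts it.
type Broadcaster struct {
	watchers []watcher
}

// Watch registers a watcher and returns its receive-only event channel.
func (b *Broadcaster) Watch(filter func(Event) bool, buffer int) <-chan Event {
	ch := make(chan Event, buffer)
	b.watchers = append(b.watchers, watcher{filter: filter, ch: ch})
	return ch
}

// Dispatch delivers one already-read event to every interested watcher and
// returns how many received it. It blocks if a watcher's buffer is full.
func (b *Broadcaster) Dispatch(e Event) int {
	delivered := 0
	for _, w := range b.watchers {
		if w.filter(e) {
			w.ch <- e
			delivered++
		}
	}
	return delivered
}

func main() {
	b := &Broadcaster{}
	nodes := b.Watch(func(e Event) bool { return e.Kind == "Node" }, 1)
	pods := b.Watch(func(e Event) bool { return e.Kind == "Pod" }, 1)

	delivered := b.Dispatch(Event{Kind: "Node", Name: "node1"})
	fmt.Println("delivered to", delivered, "watcher(s)")
	fmt.Println("node watcher got:", (<-nodes).Name)
	fmt.Println("pod watcher pending events:", len(pods))
}
```

With a constant number of watchers, the per-heartbeat read cost is one storage read plus a bounded number of sends, which is why the write path is the remaining concern.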

smarterclayton

smarterclayton commented on Sep 29, 2015

@smarterclayton
Contributor

Binding is modeled today as:

POST /v1/namespaces/<namespace>/pods/<pod>/binding {"kind":"Binding", "metadata": {"name": "<pod>"}, "to": {"name": "node1"}}

The result of that is persisted into the pod in etcd via the GuaranteedUpdate loop.

PUT /v1/namespaces/<namespace>/nodes/<node>/heartbeat {"kind":"Heartbeat", "metadata": {"name": "<node>"}, "timestamp": "2015-10-09 12:03:00Z"}

Could be very similar.


Metadata

Labels

area/api: Indicates an issue on api area.
area/kubelet
area/nodecontroller
kind/feature: Categorizes issue or PR as related to a new feature.
lifecycle/active: Indicates that an issue or PR is actively being worked on by a contributor.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
sig/node: Categorizes an issue or PR as relevant to SIG Node.
sig/scalability: Categorizes an issue or PR as relevant to SIG Scalability.

      Participants

      @spiffxp @timothysc @lavalamp @mtaufen @liggitt

        Make periodic NodeStatus updates cheaper · Issue #14733 · kubernetes/kubernetes