CRI: add methods for container stats #45614

yujuhong · 2017-05-10T21:32:13Z

What this PR does / why we need it:
Define methods in CRI to get container stats.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged):
Part of kubernetes/enhancements#290; addresses #27097

Special notes for your reviewer:
This PR defines the minimum required container metrics for the existing components to function, loosely based on the previous discussion on core metrics as well as the existing cadvisor/summary APIs.

Two new RPC calls are added to the RuntimeService: ContainerStats and ListContainerStats. The former retrieves stats for a given container, while the latter gets stats for all containers in one call.

The stats gathering time of each subsystem can vary substantially (e.g., cpu vs. disk), so even though the on-demand model preferred due to its simplicity, we’d rather give the container runtime more flexibility to determine the collection frequency for each subsystem*. As a trade-off, each piece of stats for the subsystem must contain a timestamp to let kubelet know how fresh/recent the stats are. In the future, we should also recommend a guideline for how recent the stats should be in order to ensure the reliability (e.g., eviction) and the responsiveness (e.g., autoscaling) of the kubernetes cluster.

The next step is to plumb this through kubelet so that kubelet can choose consume container stats from CRI or cadvisor.

*Alternatively, we can add calls to get stats of individual subsystems. However, kubelet does not have the complete knowledge of the runtime environment, so this would only lead to unnecessary complexity in kubelet.

Release note:

Augment CRI to support retrieving container stats from the runtime.

k8s-reviewable · 2017-05-10T21:32:19Z

This change is

yujuhong · 2017-05-10T21:37:37Z

Feel free to review and leave comments: @dashpole @vishh @timstclair @Random-Liu @feiskyer @mrunalp

/cc @michmike based on #27097 (comment). Note that CRI didn't start with the windows use case, so the API will need quite a lot of changes to support windows. This PR alone should not affect the current status of support of windows containers.

dashpole

just one nit. Otherwise looks good

dashpole · 2017-05-10T21:46:03Z

pkg/kubelet/api/v1alpha1/runtime/api.proto

+    UInt64Value UsageCoreNanoSeocnds = 2;
+}
+
+// CpuUsage provides the memory usage information.


s/CpuUsage/MemoryUsage

timstclair

I don't really like the async + timestamp model of these stats (I know it's what we use now with cAdvisor), and I'm wondering whether we can get away from it.

What if:

(List)ContainerStats is expected to be slow, and on-demand.
Requests are for specific stats (maybe they should even be separate requests?)
Kubelet handles caching

I guess I'm mostly wondering how much flexibility the current model adds over the above. Can you think of an example of where the async scan method we currently use would be preferred?

I'm just brainstorming here - Interested in hearing everyone's thoughts.

Also, are we going to need any "core" monitoring of non-standard resources (e.g. GPU), or will the Kubelet not need usage information for those?

timstclair · 2017-05-10T22:55:20Z

pkg/kubelet/api/v1alpha1/runtime/api.proto

+message CpuUsage {
+    // Timestamp in nanoseconds at which the information were collected. Must be > 0.
+    int64 timestamp = 1;
+    // Cumulative CPU usage (sum of all cores) since object creation.


s/object/container/ ? Do you expect these messages to be reused with other object types?

Since the message doesn't have "Container" in the name, I thought we could keep the term neutral.

nit: sum across all cores

timstclair · 2017-05-10T22:57:59Z

pkg/kubelet/api/v1alpha1/runtime/api.proto

@@ -71,6 +71,12 @@ service RuntimeService {
    // PortForward prepares a streaming endpoint to forward ports from a PodSandbox.
    rpc PortForward(PortForwardRequest) returns (PortForwardResponse) {}

+    // ContainerStats returns stats of the container. If the container does not
+    // exist, the call returns an error.


Should we add any expectations about how long this call is allowed to take? E.g. if it can take a few seconds to get disk stats, can this do so on demand, or is it expected to return the last known value?

yujuhong · 2017-05-11T00:03:33Z

I don't really like the async + timestamp model of these stats (I know it's what we use now with cAdvisor), and I'm wondering whether we can get away from it.

What if:

(List)ContainerStats is expected to be slow, and on-demand.
Requests are for specific stats (maybe they should even be separate requests?)
Kubelet handles caching
I guess I'm mostly wondering how much flexibility the current model adds over the above. Can you think of an example of where the async scan method we currently use would be preferred?

Good question. I agree that the on-demand model is simpler and gives kubelet more control (if it caches), but I have a few reservations.

Runtimes could gather the stats in different ways, and some of them may already rely on cached data from cadvisor or Prometheus (e.g., containerd). I am not certain whether we can enforce all containers runtimes to move to the on-demand model right now. Another concern is the scalability and performance since kubelet would be making N calls for each container, where N number of categories of stats.

Also, runtimes should have better ideas about how long it takes to gather specific stats, so (I assume) they can adjust the frequency better than kubelet if needed.

I'm just brainstorming here - Interested in hearing everyone's thoughts.

Me too.

Also, are we going to need any "core" monitoring of non-standard resources (e.g. GPU), or will the Kubelet not need usage information for those?

We haven't got there yet, but it's definitely something to think about. For now, GPU is just a device we pass to the runtime. Maybe monitoring doesn't have to go through CRI?

yujuhong · 2017-05-11T00:45:30Z

BTW, it's still uncertain whether containerd can support stats with timestamps. See discussions in containerd/containerd#678
/cc @stevvooe

k8s-ci-robot · 2017-05-11T00:45:31Z

@yujuhong: GitHub didn't allow me to request PR reviews from the following users: stevvooe.

Note that only people with write access to kubernetes/kubernetes can review this PR.
.

In response to this:

BTW, it's still uncertain whether containerd can support stats with timestamps. See discussions in containerd/containerd#678
/cc @stevvooe

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

stevvooe · 2017-05-12T00:24:13Z

BTW, it's still uncertain whether containerd can support stats with timestamps.

There are two possibilities for supporting this:

Allow subsets of metrics to be queried over /metrics API, allowing certain metrics to be queried at varying sample rates.
Cache certain expensive metrics and export timestamps along with the metrics that are cached. (How does cadvisor do this?).

I do share @timstclair's concern that the sample rate should be set by the client, rather than an internal cache, so we'd have to figure out 1. Last time I spoke with @thockin and @juliusv, we thought it best to get the metrics out the door on the containerd and address the sampling problems as they come. In that vein, I think both 1 and 2 are option which could be implemented in the current approach.

To clarify on containerd/containerd#678, the concern was over the solution of solving the split sample rate problem for unrelated metrics on a single endpoint. Our approach right now ends up having containerd export something very close to cadvisor in breadth and cardinality. This approach means the sample rate of metrics will defer to the lowest common denominator, which is probably disk usage. Option 1 would allow one to query certain metrics, like CPU, at a much higher sample rate.

vishh · 2017-05-12T05:49:42Z

pkg/kubelet/api/v1alpha1/runtime/api.proto

+// ContainerStats provides the resource usage statistics for a container.
+message ContainerStats {
+    // Information of the container.
+    Container container = 1;


Why is the Container object embedded here?

Maybe container id and metadata is enough?

That would not be enough. Kubelet relies on labels and annotations heavily and it'd need those to reconstruct the pod. If we group all of those (id, metdata, labels, annotations, ...) into a separate metadada object like the k8s API, it would help, but that's a bigger change.

vishh · 2017-05-12T05:50:13Z

pkg/kubelet/api/v1alpha1/runtime/api.proto

+    // Memory usage gathered from the container.
+    MemoryUsage memory_usage = 3;
+    // Root filesystem usage gathered from the container.
+    FilesystemUsage root_filesystem_usage = 4;


Suggest using writable_layer_usage instead.

And do we need the _usage suffix?

Done and done.

vishh · 2017-05-12T05:51:34Z

pkg/kubelet/api/v1alpha1/runtime/api.proto

+message CpuUsage {
+    // Timestamp in nanoseconds at which the information were collected. Must be > 0.
+    int64 timestamp = 1;
+    // Cumulative CPU usage (sum of all cores) since object creation.


nit: sum across all cores

vishh · 2017-05-12T05:53:06Z

pkg/kubelet/api/v1alpha1/runtime/api.proto

+    // Timestamp in nanoseconds at which the information were collected. Must be > 0.
+    int64 timestamp = 1;
+    // The amount of working set memory in bytes.
+    UInt64Value WorkingSetBytes  = 2;


WorkingSet isn't well defined outside of cAdvisor currently. At the CRI level can we instead get more detailed metrics - essentially the output of memory.stat from cgroups?

I agree that working set may not be the best field, but I'm hesitant to include the output of the cgroups memory.stat as they are very detailed. Can we include only a subgroup of those and expand later if needed? Do you have any suggestion on what to included? @vishh @dchen1107

yujuhong · 2017-05-17T23:21:51Z

Here is a simple (and incomplete) comparison:
1. On-demand

Pros:
- Tighter control: Kubelet determines when and how often to collect stats.
- Simpler flow: Caching may be needed in kubelet, but in the lower layers (shim, runtime).
Cons:
- Hard to retrieve stats for all containers (which is the main feature kubelet needs).
  - If kubelet gets stats of each container individually, it may need to issue hundreds of calls since we support 110 pods on one node. This is not scalable.
  - If kubelet gets stats for all containers in one call, and some of the stats are expensive to collect (e.g., disk usage), some stats will be significantly more stale than others. This makes the on-demand model less useful.
- The runtime could be overwhelmed by excessive requests due to bugs or from users (who has node access).

2. Cached w/ timestamps

Pros:
- Flexibility: Hides the implementation detail of the runtimes.
- Less change in kubelet: This is similar to what cadvisor provides today.
Cons:
- Needs coordination: We may need to add recommendations on the update frequencies of the stats to ensure k8s system components function properly.

If we rule out a hybrid model (where some stats are on-demand and some cached), I think the "cached model" is more reasonable given that the on-demand model is not suitable for slow stats.
Another possibility is using a streaming connection and lets runtime pushes new stats whenever they are available. I consider that an optimization to the cached model, and we can add that later if needed.

Thoughts and/or objections?

/cc @dchen1107 @vishh @dashpole @Random-Liu @timstclair

This commit also changes the image-filesystem-related types.

yujuhong · 2017-05-25T00:02:34Z

@dchen1107 PTAL!

k8s-ci-robot · 2017-05-25T00:07:02Z

@yujuhong: The following test(s) failed:

Test name	Commit	Details	Rerun command
Jenkins verification	2145088f70697ec72f00681d3e7cb4365eb89180	link	`@k8s-bot verify test this`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

dchen1107 · 2017-05-26T17:13:58Z

@k8s-bot verify test this

yujuhong · 2017-05-26T17:15:39Z

@k8s-bot pull-kubernetes-federation-e2e-gce test this

yujuhong · 2017-05-26T17:17:26Z

@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

dchen1107 · 2017-05-26T17:19:01Z

/lgtm

We can introduce on-demand API in the future.

k8s-github-robot · 2017-05-26T17:21:29Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, yujuhong

Needs approval from an approver in each of these OWNERS Files:

~~pkg/kubelet/OWNERS~~ [dchen1107,yujuhong]

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

k8s-github-robot · 2017-05-26T23:59:07Z

Automatic merge from submit-queue (batch tested with PRs 45809, 46515, 46484, 46516, 45614)

@yujuhong

Automatic merge from submit-queue (batch tested with PRs 50555, 51152). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Implement CRI stats in Docker Shim **What this PR does / why we need it**: This PR implements CRI Stats in the Docker Shim. It is needed to enable CRI stats for Docker and ongoing /stats/summary API changes in moving to use CRI. Related issues: kubernetes#46984 (CRI: instruct kubelet to (optionally) consume container stats from CRI) kubernetes#45614 (CRI: add methods for container stats) This PR is also a followup to my original PR (kubernetes#50396) to implement Windows Container Stats. The plan is that Windows Stats will use a hybrid model: pod and container level stats will come from CRI (via dockershim) and that node level stats will come from a "winstats" package that exports cadvisor like datastructures using windows specific perf counters from the node. I will update that PR to only export node level stats. @yujuhong @yguo0905 @dchen1107 @jdumars @anhowe @michmike **Special notes for your reviewer**: **Release note**: ```release-note ```

yujuhong assigned dchen1107 May 10, 2017

k8s-ci-robot added cncf-cla: yes labels May 10, 2017

k8s-github-robot added approved size/XXL release-note labels May 10, 2017

dashpole reviewed May 10, 2017

View reviewed changes

timstclair self-requested a review May 10, 2017 22:41

timstclair reviewed May 10, 2017

View reviewed changes

vishh reviewed May 12, 2017

View reviewed changes

feiskyer mentioned this pull request May 15, 2017

Container metrics hyperhq/hyperd#632

Open

k8s-github-robot added the needs-rebase label May 15, 2017

yujuhong force-pushed the container-metrics branch from 294de25 to 8a0db2e Compare May 17, 2017 22:20

k8s-github-robot removed the needs-rebase label May 17, 2017

yujuhong force-pushed the container-metrics branch from 8a0db2e to 2145088 Compare May 17, 2017 22:39

k8s-github-robot added size/M and removed size/XXL labels May 17, 2017

yujuhong changed the title ~~CRI: add methods for container stats~~ [WIP] CRI: add methods for container stats May 17, 2017

k8s-ci-robot requested review from dashpole, timstclair, Random-Liu, vishh and dchen1107 May 17, 2017 23:21

k8s-github-robot added the size/XXL label May 24, 2017

yujuhong added 2 commits May 24, 2017 14:45

CRI: add methods for getting container stats

893a7b2

This commit also changes the image-filesystem-related types.

Run hack/update-generated-runtime.sh

aff2be9

yujuhong force-pushed the container-metrics branch from 3bc86c4 to 946af0e Compare May 24, 2017 21:45

k8s-github-robot added approved and removed needs-rebase kind/api-change labels May 24, 2017

yujuhong force-pushed the container-metrics branch 2 times, most recently from 59f9407 to d41f10c Compare May 24, 2017 22:07

Update all relevant interfaces and create stubs

417e9c8

yujuhong force-pushed the container-metrics branch from d41f10c to 417e9c8 Compare May 24, 2017 22:21

yujuhong changed the title ~~[WIP] CRI: add methods for container stats~~ CRI: add methods for container stats May 25, 2017

k8s-ci-robot added the lgtm label May 26, 2017

k8s-github-robot merged commit e903c58 into kubernetes:master May 26, 2017

Random-Liu mentioned this pull request Jun 2, 2017

Update CRI version containerd/cri#63

Closed

This was referenced Jun 5, 2017

A better story of container metrics for runtime integration #27097

Closed

CRI: instruct kubelet to (optionally) consume container stats from CRI #46984

Closed

calebamiles mentioned this pull request Jun 6, 2017

Enhance the Container Runtime Interface kubernetes/enhancements#290

Closed

dchen1107 mentioned this pull request Aug 16, 2017

Add Windows Server Containers Stats and Metrics to Kubelet #50396

Merged

bobbypage mentioned this pull request Aug 22, 2017

Implement CRI stats in Docker Shim #51152

Merged

yujuhong deleted the container-metrics branch September 11, 2017 20:33

CRI: add methods for container stats #45614

CRI: add methods for container stats #45614

Conversation

yujuhong commented May 10, 2017

Uh oh!

k8s-reviewable commented May 10, 2017

Uh oh!

yujuhong commented May 10, 2017

Uh oh!

dashpole left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

timstclair left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yujuhong commented May 11, 2017

Uh oh!

yujuhong commented May 11, 2017

Uh oh!

k8s-ci-robot commented May 11, 2017

Uh oh!

stevvooe commented May 12, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yujuhong commented May 17, 2017

Uh oh!

yujuhong commented May 25, 2017

Uh oh!

k8s-ci-robot commented May 25, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dchen1107 commented May 26, 2017

Uh oh!

yujuhong commented May 26, 2017

Uh oh!

yujuhong commented May 26, 2017

Uh oh!

dchen1107 commented May 26, 2017

Uh oh!

k8s-github-robot commented May 26, 2017

Uh oh!

k8s-github-robot commented May 26, 2017

Uh oh!

k8s-ci-robot commented May 25, 2017 •

edited

Loading