Closed
Description
How to read the metrics created by kubectl top nodes and kubectl top pods --all-namespaces?
Environment
I am running an AWS EKS cluster on a single EC2 t3.large instance (VM). The instance has 2 vCPUs and 8 GiB of memory. vCPUs are virtual CPUs; T3 instances can burst above their baseline for some time, meaning they use a credit-based scheduler and get more physical CPU time when needed.
kubectl top nodes
I ran the kubectl top nodes command three times in a row and received three different outputs:
$ kubectl top nodes
NAME                          CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-10-10-1-247.ec2.internal   38m          1%     410Mi           5%
$ kubectl top nodes
NAME                          CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-10-10-1-247.ec2.internal   40m          2%     411Mi           5%
$ kubectl top nodes
NAME                          CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-10-10-1-247.ec2.internal   39m          1%     411Mi           5%
- What do the CPU(cores) values 38m, 40m, and 39m actually refer to? What is the unit m? The part I do not understand: I am apparently using 1-2% of CPU, but percent is a relative unit. What is 100% CPU?
- The same question can be asked for MEMORY%: what is 100% memory?
kubectl top pods --all-namespaces
Unlike kubectl top nodes, kubectl top pods --all-namespaces produces the same output on repeated runs. However, there is not much going on in my Kubernetes cluster. The following is a sample output:
$ kubectl top pods --all-namespaces
NAMESPACE     NAME                                                               CPU(cores)   MEMORY(bytes)
kube-system   alb-ingress-controller-aws-alb-ingress-controller-67d7cf85lwdg2   3m           10Mi
kube-system   aws-node-9nmnw                                                     2m           20Mi
kube-system   coredns-7bcbfc4774-q4pjj                                           2m           7Mi
kube-system   coredns-7bcbfc4774-wwlcr                                           2m           7Mi
kube-system   external-dns-54df666786-2ld9w                                      1m           12Mi
kube-system   kube-proxy-ss87v                                                   2m           10Mi
kube-system   kubernetes-dashboard-5478c45897-fcm48                              1m           12Mi
kube-system   metrics-server-5f64dbfb9d-fnk5r                                    1m           12Mi
kube-system   tiller-deploy-85744d9bfb-64pcr                                     1m           29Mi
- What does the CPU(cores) column represent here? What does the unit m stand for?
- What does MEMORY(bytes) actually mean? Is this how much memory the pod's containers use? However, kubectl top nodes shows a MEMORY(bytes) usage of approx. 410Mi, and if I add all the MEMORY(bytes) of the pods together I end up at 10+20+7+7+12+10+13+12+29 = 120Mi. Where are the other 410-120 = 290Mi used?
Activity
DirectXMan12 commented on Jan 3, 2019
ok, so on the subject of milliunits: https://github.com/DirectXMan12/k8s-prometheus-adapter/blob/master/docs/walkthrough.md#quantity-values or https://github.com/kubernetes-incubator/custom-metrics-apiserver/blob/master/docs/getting-started.md#quantities explains.
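For illustration, here is a minimal Go sketch (assuming the k8s.io/apimachinery module; this is not part of the linked docs) that parses the same quantity strings kubectl top prints:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// "38m" is 38 millicores: one core divided into 1000 milliunits.
	cpu := resource.MustParse("38m")
	fmt.Println(cpu.MilliValue())           // 38 (millicores)
	fmt.Println(cpu.AsApproximateFloat64()) // 0.038 (cores)

	// "410Mi" is a binary-SI quantity: 410 * 2^20 bytes.
	mem := resource.MustParse("410Mi")
	fmt.Println(mem.Value()) // 429916160 (bytes)
}
```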
on the subject of what the values actually mean:
cores are a measurement of CPU time over time (which sounds weird, but it actually isn't). Normal CPU usage from the OS is measured as cumulative CPU time used. As your process runs actively, it accumulates CPU time used; when it's not running, or is waiting on something else, it doesn't use CPU time. We take the rate of change of that cumulative usage and call it "cores", because 1 second of CPU time used over 1 second of real time means the CPU dedicated an entire core to your process for that 1-second interval.
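To make that rate-of-change idea concrete, a small self-contained Go sketch with made-up sample values (an illustration, not metrics-server's actual code):

```go
package main

import (
	"fmt"
	"time"
)

// coresFromSamples turns two cumulative CPU-time readings into a rate:
// CPU seconds consumed divided by wall-clock seconds elapsed.
func coresFromSamples(cpu1, cpu2 time.Duration, t1, t2 time.Time) float64 {
	return (cpu2 - cpu1).Seconds() / t2.Sub(t1).Seconds()
}

func main() {
	// Hypothetical samples: 570ms of CPU time consumed over a 15s window.
	t1 := time.Now()
	t2 := t1.Add(15 * time.Second)
	cores := coresFromSamples(0, 570*time.Millisecond, t1, t2)
	fmt.Printf("%.3f cores = %.0fm\n", cores, cores*1000) // 0.038 cores = 38m
}
```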
memory, for pods, is the sum of the memory usage of the containers in the pod. For nodes, I believe it's the memory usage as reported by the system-wide node cgroup, so it may include things that aren't in a pod, IIRC. We just report the information given to us by cAdvisor on the node.
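And a sketch of that pod-side summing, assuming the PodMetrics types from k8s.io/metrics (the podMemory helper is hypothetical, for illustration only):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metricsv1beta1 "k8s.io/metrics/pkg/apis/metrics/v1beta1"
)

// podMemory adds up per-container memory usage, which is the figure
// `kubectl top pods` prints in the MEMORY(bytes) column.
func podMemory(pm *metricsv1beta1.PodMetrics) *resource.Quantity {
	total := resource.NewQuantity(0, resource.BinarySI)
	for _, c := range pm.Containers {
		mem := c.Usage[corev1.ResourceMemory]
		total.Add(mem)
	}
	return total
}

func main() {
	// Hypothetical two-container pod: 7Mi + 3Mi = 10Mi.
	pm := &metricsv1beta1.PodMetrics{
		Containers: []metricsv1beta1.ContainerMetrics{
			{Name: "app", Usage: corev1.ResourceList{corev1.ResourceMemory: resource.MustParse("7Mi")}},
			{Name: "sidecar", Usage: corev1.ResourceList{corev1.ResourceMemory: resource.MustParse("3Mi")}},
		},
	}
	fmt.Println(podMemory(pm).String()) // 10Mi
}
```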
Jeeppler commented on Jan 4, 2019
@DirectXMan12 thanks for the explanation, that makes more sense now. However, I would be happy if the metrics-server documentation contained a description of the units.
DirectXMan12 commented on Jan 15, 2019
I'd be happy to accept a PR to update the docs :-).
kubectl top pod does not show any utilization % canonical/microk8s#356
fejta-bot commented on Apr 28, 2019
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
fejta-bot commented on May 28, 2019
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
fejta-bot commented on Jun 27, 2019
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
k8s-ci-robot commented on Jun 27, 2019
@fejta-bot: Closing this issue.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
blaise-sumo commented on Nov 27, 2019
@DirectXMan12, I would like to update the docs. I looked in this repo and also in https://github.com/DirectXMan12/k8s-prometheus-adapter/tree/master/docs, but I'm not sure if that is the correct place.
blaisep commented on Nov 27, 2019
Since this thread appears in search results, the docs seem to be updated at:
https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/#cpu-units
wu105 commented on Feb 23, 2020
I can understand the concepts of units, requests, limits, and actuals, but have a hard time reconciling the 'kubectl top' output with that of traditional Linux commands, e.g., uptime (load averages over the past 1, 5, and 15 minutes), free (memory used, free, shared, buff/cache, available), and top ('uptime' plus 'free -k', then VIRT/RES/SHR memory and %CPU/%MEM per process).
One question on the kubectl top commands: for what time period are they reporting? When are samples taken, and which samples are rolled up into the output? Linux commands, on the other hand, always report the state at the moment the command is executed, and they include non-Kubernetes processes.
The answer has an impact on managing resource utilization and the stability of Kubernetes. We have random node crashes and suspect a resource crunch, but the utilization numbers are usually low. Kubernetes requires us to turn off swapping, which can be a contributing factor, because no swapping means no wiggle room in memory overruns, and wasted inactive content occupies memory.
7 remaining items