The monitoring system consists of two parts: prometheus-server and node-exporter.
Replace the spec.dnsConfig.nameservers value in prometheus-server.yaml with the ClusterIP of tunnel-coredns, then deploy both components:
kubectl -n edge-system get svc tunnel-coredns -o=jsonpath='{.spec.clusterIP}'
kubectl apply -f prometheus-server.yaml
kubectl apply -f https://raw.githubusercontent.com/superedge/superedge/main/deployment/prometheus-node-exporter.yaml
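The nameserver substitution can also be scripted. The following is a minimal sketch, not part of SuperEdge: set_nameserver is a hypothetical helper, and because the layout of prometheus-server.yaml is not shown here, it assumes that nameservers is written as a block-style YAML list with one "- value" entry per line.

```shell
# set_nameserver FILE IP   (hypothetical helper, not shipped with SuperEdge)
# Rewrites the first list item found after a "nameservers:" key in FILE
# so that it holds IP. All other lines are copied through unchanged.
# Assumption: nameservers is a block-style YAML list ("- value" per line).
set_nameserver() {
  file=$1
  ip=$2
  awk -v ip="$ip" '
    # Remember that the nameservers key was seen; the next list item is the target.
    /^[ \t]*nameservers:[ \t]*$/ { print; pending = 1; next }
    # Replace the value of the first list item, keeping its indentation.
    pending && /^[ \t]*-[ \t]/ { sub(/-[ \t].*$/, "- " ip); pending = 0 }
    { print }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage (run from the directory that holds prometheus-server.yaml):
#   ip=$(kubectl -n edge-system get svc tunnel-coredns -o=jsonpath='{.spec.clusterIP}')
#   set_nameserver prometheus-server.yaml "$ip"
#   kubectl apply -f prometheus-server.yaml
```

Reviewing the edited file before running kubectl apply is advisable, since the helper only looks at the first list item that follows the nameservers key.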
Check whether kubelet metrics are being collected:
curl -G http://<prometheus-server ClusterIP>/api/v1/series? --data-urlencode 'match[]=container_processes{job="node-cadvisor"}'
{
"status": "success",
"data": [
{
"__name__": "container_processes",
"id": "/system.slice/docker.service",
"instance": "edge-7x94bd",
"job": "node-cadvisor",
"unInstanceId": "none"
},
{
"__name__": "container_processes",
"id": "/system.slice/kubelet.service",
"instance": "edge-7x94bd",
"job": "node-cadvisor",
"unInstanceId": "none"
}
]
}
Check whether node metrics are being collected:
curl -G http://<prometheus-server ClusterIP>/api/v1/series? --data-urlencode 'match[]=node_cpu_guest_seconds_total{job="node-exporter"}'
{
"status": "success",
"data": [
{
"__name__": "node_cpu_guest_seconds_total",
"cpu": "0",
"instance": "edge-7x94bd",
"job": "node-exporter",
"mode": "nice",
"unInstanceId": "none"
},
{
"__name__": "node_cpu_guest_seconds_total",
"cpu": "0",
"instance": "edge-7x94bd",
"job": "node-exporter",
"mode": "user",
"unInstanceId": "none"
},
{
"__name__": "node_cpu_guest_seconds_total",
"cpu": "1",
"instance": "edge-7x94bd",
"job": "node-exporter",
"mode": "nice",
"unInstanceId": "none"
},
{
"__name__": "node_cpu_guest_seconds_total",
"cpu": "1",
"instance": "edge-7x94bd",
"job": "node-exporter",
"mode": "user",
"unInstanceId": "none"
}
]
}
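Both checks can be turned into a pass/fail test instead of reading the JSON by eye. The sketch below is an illustration only: has_series is a hypothetical helper that treats a response as valid when it reports "status": "success" and lists at least one series (an entry with a "__name__" label). It uses plain grep rather than a JSON parser, so it is a rough check, not a strict validation.

```shell
# has_series   (hypothetical helper)
# Reads a /api/v1/series response on stdin. Succeeds (exit status 0) when the
# response reports "status": "success" and contains at least one series,
# detected by the presence of a "__name__" label.
has_series() {
  body=$(cat)
  printf '%s' "$body" | grep -q '"status"[[:space:]]*:[[:space:]]*"success"' &&
    printf '%s' "$body" | grep -q '"__name__"'
}

# Usage (substitute the real ClusterIP of prometheus-server for the placeholder):
#   curl -sG 'http://<prometheus-server ClusterIP>/api/v1/series?' \
#     --data-urlencode 'match[]=container_processes{job="node-cadvisor"}' \
#     | has_series && echo "kubelet metrics found"
```

An empty "data" list makes the helper fail, which is the case where Prometheus is reachable but the job has not scraped any matching series yet.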