metrics-server is unavailable after deployment #417

Closed
liyongzhezz opened this issue Mar 19, 2019 · 6 comments

@liyongzhezz

Problem

metrics-server is unavailable after deployment

Symptoms

# kubectl top node 
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
# kubectl logs metrics-server-v0.3.1-78b6dbdcc-jprvr -n kube-system -c metrics-server
I0319 11:15:27.868355       1 serving.go:273] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
[restful] 2019/03/19 11:15:52 log.go:33: [restful/swagger] listing is available at https://:443/swaggerapi
[restful] 2019/03/19 11:15:52 log.go:33: [restful/swagger] https://:443/swaggerui/ is mapped to folder /swagger-ui/
I0319 11:15:54.063190       1 serve.go:96] Serving securely on [::]:443
E0319 11:16:22.787769       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:10.10.99.53: unable to fetch metrics from Kubelet 10.10.99.53 (10.10.99.53): Get https://10.10.99.53:10255/stats/summary/: http: server gave HTTP response to HTTPS client, unable to fully scrape metrics from source kubelet_summary:10.10.99.52: unable to fetch metrics from Kubelet 10.10.99.52 (10.10.99.52): Get https://10.10.99.52:10255/stats/summary/: http: server gave HTTP response to HTTPS client, unable to fully scrape metrics from source kubelet_summary:10.10.99.51: unable to fetch metrics from Kubelet 10.10.99.51 (10.10.99.51): Get https://10.10.99.51:10255/stats/summary/: http: server gave HTTP response to HTTPS client]
# kubectl get pod -n kube-system -o wide | grep metrics
metrics-server-v0.3.1-78b6dbdcc-jprvr      2/2     Running   0          4m24s   172.21.36.73     10.10.99.51   <none>           <none>
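
One reading of the E0319 line above: metrics-server is sending HTTPS requests to port 10255 on each node, and the kubelets answer in plain HTTP. Port 10255 is normally the kubelet's read-only HTTP port, while its authenticated HTTPS endpoint usually listens on 10250, so the port configured for metrics-server (for example through its --kubelet-port flag) may be worth checking. As a minimal sketch, the shell function below pulls the affected endpoints out of such log text. The name failing_kubelets is made up for illustration and is not part of kubectl or metrics-server.

```shell
# Hypothetical helper, not part of kubectl or metrics-server.
# Reads metrics-server log text on stdin and prints each kubelet
# endpoint (IP:port) that answered an HTTPS request with plain HTTP.
failing_kubelets() {
  grep -oE 'Get https://[0-9.]+:[0-9]+/stats/summary/: http: server gave HTTP response to HTTPS client' \
    | sed -E 's#^Get https://([0-9.]+:[0-9]+)/.*#\1#' \
    | sort -u
}

# Usage, with the pod name from the output above:
# kubectl logs metrics-server-v0.3.1-78b6dbdcc-jprvr -n kube-system -c metrics-server | failing_kubelets
```

If every node shows up with port 10255, the problem is more likely the port or scheme metrics-server was told to use than any single kubelet.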

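Current configuration

· Kubernetes version: 1.13.2
· The metrics-server certificates have been created
· kube-apiserver has had the relevant flags added as described in the documentation and has been restarted
· metrics-server version 0.3.1, using the YAML files from kubernetes/cluster/addons/metrics-server/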

@astrisk

astrisk commented Apr 9, 2019

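@liyongzhezz I spent two days on this and finally found the secret. The parameter in 0.3.1 is not written the same way as in 0.2.1:
--requestheader-extra-headers-prefix="X-Remote-Extra-" has to be changed to
--requestheader-extra-headers-prefix=X-Remote-Extra-. What a trap.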
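
Whether that pair of quotes matters depends on what launches the process. A shell treats the quotes as syntax and strips them. A launcher that passes arguments verbatim, such as a container command/args list in a pod manifest, hands them to the program as part of the value. The thread does not say where this flag was set, so the following is only a sketch of the general mechanism, with printf standing in for the real binary.

```shell
# Case 1: the argument goes through a shell, which removes the quotes.
via_shell=$(sh -c 'printf "%s" --requestheader-extra-headers-prefix="X-Remote-Extra-"')

# Case 2: the argument is passed verbatim (nothing parses the inner
# quotes), so the quote characters end up inside the value.
verbatim=$(printf '%s' '--requestheader-extra-headers-prefix="X-Remote-Extra-"')

printf 'via shell: %s\n' "$via_shell"
printf 'verbatim:  %s\n' "$verbatim"
```

In the second case the program would see a header prefix that begins and ends with a literal quote character, which is a different value from the unquoted one.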

@liyongzhezz
Author

@astrisk It looks like the only difference is a pair of quotes. Did the service become available after you changed that parameter?

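I found my problem: the apiserver needs to connect to the metrics-server pod, but the network between my master and the nodes was not connected. After I installed flanneld on the master to connect the network, it worked.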
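
The aggregation layer in kube-apiserver proxies metrics.k8s.io requests to the metrics-server backend, so the master has to be able to open a connection to it. A minimal sketch for checking this from the master node follows. The function name check_pod_reachable is hypothetical, the IP and port come from the kubectl get pod output and the "Serving securely on [::]:443" log line in the first comment, and the /healthz path is an assumption. Any HTTP response, even an error status, counts as reachable here.

```shell
# Hypothetical helper: run it on the master node.
# $1 = pod IP, $2 = port the metrics-server container serves on.
# curl exits 0 for any HTTP response (no -f), so this only tests
# whether a TLS connection to the pod can be established at all.
check_pod_reachable() {
  if curl -sk --max-time 5 -o /dev/null "https://$1:$2/healthz"; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

# Usage, with the pod IP from the first comment:
# check_pod_reachable 172.21.36.73 443
```

A result of "unreachable" on the master alongside "reachable" on a worker node would point to the overlay network gap described above.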

@zuihou

zuihou commented Apr 18, 2019

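> @astrisk It looks like the only difference is a pair of quotes. Did the service become available after you changed that parameter?
>
> I found my problem: the apiserver needs to connect to the metrics-server pod, but the network between my master and the nodes was not connected. After I installed flanneld on the master to connect the network, it worked.

I have the same problem:
I deployed 1 master (etcd, apiserver, controller-manager, scheduler, flannel) and 7 nodes (flannel, docker, kubelet, kube-proxy).

I installed flanneld on the master as you described, but it still does not work.

Do you have contact details so we can talk it through?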

@liyongzhezz
Author

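> > @astrisk It looks like the only difference is a pair of quotes. Did the service become available after you changed that parameter?
> > I found my problem: the apiserver needs to connect to the metrics-server pod, but the network between my master and the nodes was not connected. After I installed flanneld on the master to connect the network, it worked.
>
> I have the same problem:
> I deployed 1 master (etcd, apiserver, controller-manager, scheduler, flannel) and 7 nodes (flannel, docker, kubelet, kube-proxy).
>
> I installed flanneld on the master as you described, but it still does not work.
>
> Do you have contact details so we can talk it through?

You can add me on QQ so we can discuss it: 961829889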

@zuihou

zuihou commented Apr 24, 2019

In the end, after discussing with @liyongzhezz, I changed a few things, recorded here.
1. In /etc/systemd/system/kube-apiserver.service, set
--requestheader-allowed-names=aggregator (the article has --requestheader-allowed-names="")
2. In /etc/systemd/system/kube-apiserver.service, added --enable-aggregator-routing=true
3. Restarted the apiserver on the master
4. Restarted kubelet on the nodes

After that, both kubectl get --raw "/apis/metrics.k8s.io/v1beta1" | jq . and kubectl top nodes work normally.
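
On the second change: the Kubernetes documentation on configuring the aggregation layer notes that --enable-aggregator-routing=true is needed when kube-proxy is not running on the host that runs the API server, which appears to match the master described earlier in the thread (no kube-proxy in its component list). As a rough sketch, a helper like the one below reports which of the two flags from the list above are absent from a unit file. The name missing_aggregator_flags is invented here.

```shell
# Hypothetical helper: $1 is the path to the kube-apiserver unit file.
# Prints each of the two flags from this comment that the file lacks.
missing_aggregator_flags() {
  for flag in \
    --requestheader-allowed-names=aggregator \
    --enable-aggregator-routing=true
  do
    # -F matches the flag as a fixed string; -e protects the leading "--".
    grep -qF -e "$flag" "$1" || printf '%s\n' "$flag"
  done
}

# Usage:
# missing_aggregator_flags /etc/systemd/system/kube-apiserver.service
```

Empty output means both flags are present in the file. It does not prove the running process was restarted with them.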
However, a few log lines still look abnormal, and I do not know why:

[root@server033 metrics-server]# kubectl logs -f metrics-server-v0.3.1-65484b96f7-p2h8p -n kube-system -c metrics-server-nanny

ERROR: logging before flag.Parse: I0424 03:46:40.240355       1 pod_nanny.go:65] Invoked by [/pod_nanny --config-dir=/etc/config --cpu=80m --extra-cpu=0.5m --memory=80Mi --extra-memory=8Mi --threshold=5 --deployment=metrics-server-v0.3.1 --container=metrics-server --poll-period=300000 --estimator=exponential]
ERROR: logging before flag.Parse: I0424 03:46:40.240427       1 pod_nanny.go:81] Watching namespace: kube-system, pod: metrics-server-v0.3.1-65484b96f7-p2h8p, container: metrics-server.
ERROR: logging before flag.Parse: I0424 03:46:40.240438       1 pod_nanny.go:82] storage: MISSING, extra_storage: 0Gi
ERROR: logging before flag.Parse: I0424 03:46:40.241501       1 pod_nanny.go:109] cpu: 80m, extra_cpu: 0.5m, memory: 80Mi, extra_memory: 8Mi
ERROR: logging before flag.Parse: I0424 03:46:40.241529       1 pod_nanny.go:138] Resources: [{Base:{i:{value:80 scale:-3} d:{Dec:<nil>} s:80m Format:DecimalSI} ExtraPerNode:{i:{value:5 scale:-4} d:{Dec:<nil>} s: Format:DecimalSI} Name:cpu} {Base:{i:{value:83886080 scale:0} d:{Dec:<nil>} s: Format:BinarySI} ExtraPerNode:{i:{value:8388608 scale:0} d:{Dec:<nil>} s: Format:BinarySI} Name:memory}]

[root@server033 metrics-server]# systemctl status kube-apiserver
● kube-apiserver.service - Kubernetes API Server
   Loaded: loaded (/etc/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
   Active: active (running) since 三 2019-04-24 11:40:46 CST; 12min ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 18207 (kube-apiserver)
    Tasks: 14
   Memory: 437.7M
   CGroup: /system.slice/kube-apiserver.service
           └─18207 /opt/k8s/bin/kube-apiserver --enable-admission-plugins=Initializers,NamespaceLifecycle,NodeRestriction,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota --anonymous-auth...

4月 24 11:52:49 server033 kube-apiserver[18207]: W0424 11:52:49.356619   18207 x509.go:171] x509: subject with cn=system:kube-controller-manager is not in the allowed list: [aggregator]
4月 24 11:52:49 server033 kube-apiserver[18207]: W0424 11:52:49.358433   18207 x509.go:171] x509: subject with cn=system:kube-controller-manager is not in the allowed list: [aggregator]
4月 24 11:52:49 server033 kube-apiserver[18207]: W0424 11:52:49.515302   18207 x509.go:171] x509: subject with cn=system:kube-scheduler is not in the allowed list: [aggregator]
4月 24 11:52:49 server033 kube-apiserver[18207]: W0424 11:52:49.517136   18207 x509.go:171] x509: subject with cn=system:kube-scheduler is not in the allowed list: [aggregator]
4月 24 11:52:50 server033 kube-apiserver[18207]: I0424 11:52:50.958808   18207 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
4月 24 11:52:50 server033 kube-apiserver[18207]: I0424 11:52:50.963179   18207 controller.go:116] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Requeue.
4月 24 11:52:51 server033 kube-apiserver[18207]: W0424 11:52:51.360938   18207 x509.go:171] x509: subject with cn=system:kube-controller-manager is not in the allowed list: [aggregator]
4月 24 11:52:51 server033 kube-apiserver[18207]: W0424 11:52:51.362866   18207 x509.go:171] x509: subject with cn=system:kube-controller-manager is not in the allowed list: [aggregator]
4月 24 11:52:51 server033 kube-apiserver[18207]: W0424 11:52:51.519525   18207 x509.go:171] x509: subject with cn=system:kube-scheduler is not in the allowed list: [aggregator]
4月 24 11:52:51 server033 kube-apiserver[18207]: W0424 11:52:51.521517   18207 x509.go:171] x509: subject with cn=system:kube-scheduler is not in the allowed list: [aggregator]

I also do not know whether these log messages affect anything else.
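
A note on reading these lines: "ERROR: logging before flag.Parse:" is a prefix that the glog library prints when a program logs before calling flag.Parse, and the letter in front of the date gives the actual severity (I = info, W = warning, E = error, F = fatal). The pod_nanny lines above are therefore info-level messages, and the x509 lines are warnings. A small sketch for tallying severities in pasted log text follows, with the helper name glog_severity_counts made up for this example.

```shell
# Hypothetical helper: reads log text on stdin and prints one
# "<severity letter>=<count>" line per glog severity found.
glog_severity_counts() {
  grep -oE '[IWEF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{6}' \
    | cut -c1 \
    | sort \
    | uniq -c \
    | awk '{ print $2 "=" $1 }'
}

# Usage, with the pod name from the output above:
# kubectl logs metrics-server-v0.3.1-65484b96f7-p2h8p -n kube-system -c metrics-server-nanny 2>&1 | glog_severity_counts
```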

@xujingbin

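@astrisk Could you tell me where the parameter you mentioned needs to be changed?
@zuihou Hi, I get the same error as you: ERROR: logging before flag.Parse: I0424 03:46:40.241501 1 pod_nanny.go:109] cpu:. This is probably what makes the dashboard unusable. Have you solved it?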
