Description
Currently, auto scaling is based on requests/sec. Can we trigger auto scaling by CPU/memory usage instead? Currently, if there are fewer than 5 requests/sec but the CPU/memory usage exceeds the limit, the function is terminated.
Expected Behavior
Auto scaling is triggered not only by requests/sec but also by the replica set's CPU and memory usage.
Current Behaviour
Currently, if there are fewer than 5 requests/sec but the CPU/memory usage exceeds the limit, the function is terminated.
Possible Solution
Based on Alex's demo video: maybe add an auto-scaling rule in deploy.go and have Prometheus alert when resources are exhausted.
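As a rough sketch, the kind of Prometheus alerting rule this solution envisions might look like the following. The rule name, metric, threshold, and labels are illustrative assumptions based on this thread, not taken from the OpenFaaS repo:

```yaml
# Hypothetical alerting rule: fire when a function pod in openfaas-fn
# sustains high CPU usage. Metric comes from cAdvisor; the 0.8 threshold
# (80% of one core) and label values are illustrative.
groups:
  - name: openfaas.resources
    rules:
      - alert: APIHighCPUUsage
        expr: rate(container_cpu_usage_seconds_total{namespace="openfaas-fn"}[1m]) > 0.8
        for: 1m
        labels:
          service: gateway
          severity: major
        annotations:
          description: "Function pod CPU usage is above 80% of one core."
```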
Steps to Reproduce (for bugs)
- Deploy a function with high resource requirements on OpenFaaS
- Trigger jobs at fewer than 5 requests/second, but enough to exceed the pod's CPU/memory limits
- The function is terminated and auto scaling is not triggered
Context
I want to dynamically scale my replica sets to handle varying request volumes, but my bottleneck is not requests/sec: my function needs a lot of CPU/memory. I can manually create pods so my replicas can handle more function requests, but when the replica set runs out of resources, function requests are lost.
Your Environment
- Docker version (output of docker version, e.g. Docker 17.0.05):
Client:
Version: 17.09.0-ce
API version: 1.32
Go version: go1.8.3
Git commit: afdb6d4
Built: Tue Sep 26 22:40:09 2017
OS/Arch: darwin/amd64
Server:
Version: 17.09.0-ce
API version: 1.32 (minimum version 1.12)
Go version: go1.8.3
Git commit: afdb6d4
Built: Tue Sep 26 22:45:38 2017
OS/Arch: linux/amd64
Experimental: true
- Are you using Docker Swarm or Kubernetes (FaaS-netes)? Kubernetes (FaaS-netes)
- Operating System and version (e.g. Linux, Windows, MacOS): Local: MacOS 10.12.5, remote: GCP Kubernetes
- Link to your project or a code example to reproduce issue:
Activity
alexellis commented on Nov 11, 2017
Entirely valid scenario, here are two ideas:
1. Change alerts/metrics
2. Use K8s built-in scaling
There may be other suggestions worth exploring too. Option 1 is not going to work with Swarm, Nomad, Hyper, etc.
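Option 2 can be sketched with a standard Kubernetes HorizontalPodAutoscaler targeting a function deployment; the deployment name and thresholds below are illustrative, not from the thread:

```yaml
# Sketch of K8s built-in scaling (option 2): an HPA that scales a function
# deployment in openfaas-fn on CPU utilization. Names/thresholds are examples.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: find-prime
  namespace: openfaas-fn
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: find-prime
  minReplicas: 1
  maxReplicas: 100
  targetCPUUtilizationPercentage: 20
```

This is equivalent to the `kubectl autoscale deployment ... --cpu-percent=20 --min=1 --max=100` command used later in this thread, expressed as a manifest.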
rayhero commented on Nov 13, 2017
@alexellis thanks for the suggestion! I'll try to integrate the NodeExporter into Prometheus.
rayhero commented on Nov 13, 2017
@alexellis: I tried to use kubectl create -f node-exporter.yaml to add node-exporter to Kubernetes, but Prometheus cannot find node-exporter on the targets page. The yaml I am using: https://github.com/cedriclam/example-Prometheus-with-Kubernetes-and-Grafana/blob/master/node-exporter.yaml
I tried setting the namespace, but it still doesn't work. How do I add node-exporter in OpenFaaS?
rayhero commented on Nov 13, 2017
I tried the sample (https://github.com/cedriclam/example-Prometheus-with-Kubernetes-and-Grafana) to install nodeExporter on Kubernetes, and I added the scrape setting to faas-netes's configmaps.yaml:
Prometheus now shows the nodeExporter on the targets page, but its status is down. How do I make Prometheus able to get info from nodeExporter? Am I still missing some settings?
rayhero commented on Nov 13, 2017
I fixed the bug; it was a network issue. The fix is to install nodeExporter first, then set nodeExporter's IP in the scrape settings. I can now get the node info from nodeExporter lol.
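The fix described above amounts to pointing a static scrape target at nodeExporter's address. A minimal sketch of such a scrape job; the address is an illustrative placeholder, not the actual cluster IP:

```yaml
# Minimal Prometheus scrape job for node-exporter. The target below is a
# placeholder; substitute the IP the exporter is actually reachable at.
scrape_configs:
  - job_name: "node-exporter"
    scrape_interval: 15s
    static_configs:
      - targets: ["10.0.0.5:9100"]
```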
rayhero commented on Nov 15, 2017
I can successfully fire the alert on Prometheus, but I cannot see the pod count increase. How is auto scaling controlled in OpenFaaS? I checked the code and found the control flow is in the configmaps,
so I added the node resource monitoring:
then I noticed that I need to set an alert rule to observe the resource:
I used the same label that the autoscaling example uses. I can see the alert firing on the Prometheus alerts page, but I don't see the pods increase like in the echo example.
If I use kubectl autoscale deployment auto-hotspot --cpu-percent=20 --min=1 --max=100 --namespace=openfaas-fn,
it successfully auto-scales the pod count. Can someone show me where auto scaling is controlled in faas-netes?
kaikai-liu commented on May 17, 2018
@rayhero Hi Rayhero, I have some clues for your concerns.
I configured Prometheus to scrape the cAdvisor metrics in the kubelet for every pod and created related CPU usage alerts. I can see the alert firing when the CPU usage of a pod in the openfaas-fn namespace exceeds the alert threshold, but that does not trigger Kubernetes or the OpenFaaS gateway to create more pods. I think the reason is the following.
The gateway alert comes from the gateway invocation rate, and its labels look like {alertname="APIHighInvocationRate", function_name="find-prime", service="gateway", severity="major", value="18"}; the CPU usage alert comes from container_cpu_usage_seconds_total, and its labels look like {alertname="APIHighCPUUsage", pod_name="find-prime-79659f57bd-cxkkj", service="gateway", severity="major", value="0.47"}. Firing alerts send notifications to the OpenFaaS gateway so that the gateway may scale up the pods, but the gateway may not recognize the latter alert, which only carries a pod_name="find-prime-79659f57bd-cxkkj" label, and therefore it does not scale up the pods.
And it turned out that my guess was right: after I added a function_name="find-prime" label to the CPU usage alert, the gateway started to scale up the pods!
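A sketch of such a CPU usage alert carrying the function_name label the gateway matches on; the expression, threshold, and function name are assumptions based on the discussion above, not the commenter's actual rule:

```yaml
# Hypothetical CPU-usage alert that adds the function_name label the
# OpenFaaS gateway expects. Pod-name regex and threshold are illustrative.
- alert: APIHighCPUUsage
  expr: rate(container_cpu_usage_seconds_total{namespace="openfaas-fn", pod_name=~"find-prime-.*"}[1m]) > 0.4
  for: 1m
  labels:
    service: gateway
    severity: major
    function_name: find-prime   # key addition: lets the gateway map the alert to a function
  annotations:
    description: "High CPU usage on find-prime pods"
```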
alexellis commented on Aug 7, 2018
Derek close: inactive
alexellis commented on Aug 7, 2018
That sounds really useful @kaikai-liu - would you be willing to write up the steps here?
kaikai-liu commented on Aug 7, 2018
@alexellis I added more scrape settings for the kubelet cAdvisor and more alert rule settings in the Prometheus config file, and then I can see the pods scale up when there is a firing alert in Prometheus. The following is the revised version of the yaml file prometheus_config.yml.
The job_name: "cadvisor" part is responsible for obtaining the cAdvisor metrics of every Kubernetes node, and the metric_relabel_configs part adds a function_name label to the pod named find-prime-79c698cd7b-4f67x (a running OpenFaaS function pod). Then the alerting config expressions focus on the function_name label, which enables OpenFaaS scaling when there is a firing alert.
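The attached prometheus_config.yml itself is not reproduced in the thread; below is a hedged sketch of the two pieces described (the cadvisor scrape job and the relabeling step). The discovery role, TLS settings, and regex are assumptions, not the commenter's exact file:

```yaml
# Sketch of a kubelet cAdvisor scrape job with a metric_relabel_configs step
# that derives a function_name label from the pod name. Auth/TLS details and
# the pod-name regex are illustrative assumptions.
- job_name: "cadvisor"
  scheme: https
  tls_config:
    insecure_skip_verify: true
  kubernetes_sd_configs:
    - role: node
  metrics_path: /metrics/cadvisor
  metric_relabel_configs:
    # e.g. pod_name "find-prime-79c698cd7b-4f67x" -> function_name "find-prime"
    - source_labels: [pod_name]
      regex: "(find-prime)-.*"
      target_label: function_name
      replacement: "$1"
```

With the function_name label present, the alert rules can then match on it, and a firing alert carries the label the OpenFaaS gateway needs to scale the right function.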