Support scaling HPA to/from zero pods for object/external metrics #74526


Merged
merged 2 commits on Jul 16, 2019

Conversation

DXist
Contributor

@DXist DXist commented Feb 25, 2019

What type of PR is this?

/kind feature

What this PR does / why we need it:
This PR targets worker deployments that consume from queues, where scaling is based on an object or external metric that depends on queue size. When workers are idle, the corresponding deployment can be scaled to zero replicas to save resources.

This technique is especially useful when workers request GPU resources and the number of distinct idle worker types exceeds the number of available GPUs.
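For illustration, a minimal HPA using this feature might look like the sketch below (autoscaling/v2beta2); the metric name, selector, and target value are hypothetical, assuming a queue-length metric exposed through an external metrics adapter:

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: queue-worker
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: queue-worker
      minReplicas: 0        # allowed only when the HPAScaleToZero feature gate is enabled
      maxReplicas: 10
      metrics:
        # Hypothetical external metric: ready messages in the worker's queue.
        - type: External
          external:
            metric:
              name: queue_messages_ready
              selector:
                matchLabels:
                  queue: worker_tasks
            target:
              type: AverageValue
              averageValue: "10"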

Which issue(s) this PR fixes:

Fixes #69687

Special notes for your reviewer:
The PR is based on changes made in #61423

  1. Scale to/from zero changes are made.
  2. Applied the changes from #61423 (In Horizontal Pod Autoscaler Controller, when scaling on multiple metrics, handle invalid metrics).
  3. HPA continues to scale as long as there is at least one metric value available.
     The conservative scale-down behaviour introduced in #61423 is not applied here: scaling down works even if we have just one metric value.
  4. The HPA tolerance set through the --horizontal-pod-autoscaler-tolerance flag is ignored when scaling up from zero pods.

Does this PR introduce a user-facing change?:

When the HPAScaleToZero feature gate is enabled, HPA supports scaling to zero pods based on object or external metrics. HPA remains active as long as at least one metric value is available.

To downgrade the cluster to a version that does not support the scale-to-zero feature:
1. Make sure there are no HPA objects with minReplicas=0. Here is a one-liner to update them to 1:
    $ kubectl get hpa --all-namespaces  --no-headers=true | awk  '{if($6==0) printf "kubectl patch hpa/%s --namespace=%s -p \"{\\\"spec\\\":{\\\"minReplicas\\\":1}}\"\n", $2, $1 }' | sh
2. Disable the HPAScaleToZero feature gate.
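For reference, a sketch of how the gate might be toggled on a kubeadm-managed cluster; this ClusterConfiguration fragment is illustrative and not part of this PR (setting the gate on both the API server and the controller manager is an assumption on the safe side):

    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        # Allows minReplicas: 0 on HPA objects; set to "false" (or drop) to disable.
        feature-gates: "HPAScaleToZero=true"
    controllerManager:
      extraArgs:
        feature-gates: "HPAScaleToZero=true"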

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 25, 2019
@k8s-ci-robot
Contributor

Hi @DXist. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 25, 2019
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 25, 2019
@spiffxp
Member

spiffxp commented Feb 25, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 25, 2019
@Rajat-0

Rajat-0 commented Feb 25, 2019

@DXist I am also working on this issue. I have discussed the proposal below in the SIG weekly meeting; per that discussion, @mwielgus will discuss it with the networking team, and based on that we can modify the PR. I'll be happy to coordinate with you on this.

https://docs.google.com/document/d/1p_Xlk8e5V32WOBVeRbJqbxhEKcJy_mDu9JgVp5ZNDxs/

@DXist
Contributor Author

DXist commented Feb 26, 2019

@Rajat-0

@DXist I am also working on this issue. I have discussed the proposal below in the SIG weekly meeting; per that discussion, the SIG owner will discuss it with the networking team, and based on that we can modify the PR. I'll be happy to coordinate with you on this.

https://docs.google.com/document/d/1p_Xlk8e5V32WOBVeRbJqbxhEKcJy_mDu9JgVp5ZNDxs/

This PR handles a more general case and is not specific to HTTP workloads.

If an HTTP load balancer/Ingress could buffer requests and export the number of queued requests for a given service as a custom metric, then HPA could use this metric for scaling. There could also be some signaling about metric changes, or a push-based model of metric delivery to HPA, to speed up the scaling process.
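Sketched as an HPA manifest, that idea might look like the following; the queued_requests metric, the Ingress object, and the target value are all hypothetical, since no such metric exporter exists in this PR:

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: frontend
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: frontend
      minReplicas: 0
      maxReplicas: 10
      metrics:
        # Hypothetical metric exported by the load balancer/Ingress controller.
        - type: Object
          object:
            metric:
              name: queued_requests
            describedObject:
              apiVersion: networking.k8s.io/v1beta1
              kind: Ingress
              name: frontend
            target:
              type: Value
              value: "1"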

Contributor

@mwielgus mwielgus left a comment


A couple of high-level items:

  1. How was it tested?
  2. Why didn't you handle: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/podautoscaler/horizontal.go#L567
  3. Did you consider explicitly validating that there is an Object/External metric configured?
  4. Maybe we should have a comment update in the API.

@mwielgus mwielgus self-assigned this Feb 28, 2019
@DXist
Contributor Author

DXist commented Feb 28, 2019

A couple of high-level items:

  1. How was it tested?

There are unit tests that cover scaling up from zero pods and scaling down to zero for custom object and external metrics.

For end-to-end testing I:

  • built K8S binaries
  • ran them using https://github.com/kubernetes-sigs/kubeadm-dind-cluster
  • set up RabbitMQ, Prometheus Operator, and the Prometheus adapter for custom metrics
  • set up the service I wanted to scale and configured Prometheus, the Prometheus adapter, and HPA.
    HPA used two Object metrics: worker utilization and queue ingress/egress ratio
  • ran scripts to generate test load: a single message, or a series of messages at a rate of ~5X the processing throughput of a single pod

Only one metric value (worker utilization) was available when there were no pods.
Metrics used a 1m window to calculate the averaged value, so I delayed the start of pod readiness probes via initialDelaySeconds: 60. Due to this delay, the next scaling decision was based on a full metric window since the previous rescale. The delay worked in the same way as the replica calculator code for resource metrics.
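As a rough sketch of that test setup (the metric names, target values, and the Service used as the describedObject are assumptions, not the exact manifests from the test):

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: worker
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: worker
      minReplicas: 0
      maxReplicas: 5
      metrics:
        # Hypothetical utilization metric; stays available even at zero pods.
        - type: Object
          object:
            metric:
              name: worker_utilization
            describedObject:
              apiVersion: v1
              kind: Service
              name: worker
            target:
              type: Value
              value: "500m"
        # Hypothetical queue ingress/egress ratio.
        - type: Object
          object:
            metric:
              name: queue_ingress_egress_ratio
            describedObject:
              apiVersion: v1
              kind: Service
              name: worker
            target:
              type: Value
              value: "1"

The delayed readiness probe on the worker pods would look roughly like this (path and port are assumptions):

    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 60   # one full metric window since the previous rescale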

  2. Why didn't you handle: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/podautoscaler/horizontal.go#L567

GitHub collapsed the diff for this module. I've added an extra condition to enable scaling when minReplicas==0.

  3. Did you consider explicitly validating that there is an Object/External metric configured?

No, I didn't. It seems like a good protection against HPA misconfiguration.

  4. Maybe we should have a comment update in the API.

Should the comment update go near MinReplicas?

@k8s-ci-robot k8s-ci-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Mar 1, 2019
@DXist
Contributor Author

DXist commented Mar 1, 2019

Added metrics validation for the minReplicas=0 case and a comment to the API.

@fejta-bot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@thockin
Member

thockin commented Mar 1, 2019

API LGTM

@mwielgus ping me when LGTM and I will approve

@DXist DXist mentioned this pull request Mar 5, 2019
@liggitt
Member

liggitt commented Jul 16, 2019

/lgtm
/retest

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 16, 2019
@DXist
Contributor Author

DXist commented Jul 16, 2019

/retest

@DXist
Contributor Author

DXist commented Jan 21, 2020

@liggitt , @mwielgus What is the next step to make the feature GA?

@numbsafari

Just stopping by after six months to see what, if anything, can be done to move this otherwise perfectly reasonable and working feature from Alpha to GA.

This would be great to have on hosted K8S environments like GKE or AKS or EKS or... basically anywhere.

@numbsafari

Was just reading the release notes for 1.19 and noticed the section on avoiding permanent beta. It would be great if there was some clarity on how to foster this work through the process.

@jeffreybrowning

@liggitt , @mwielgus What is the next step to make the feature GA?

Would love an answer to this. It's been half a year or so since it was asked.

Very useful feature for shutting down workers processing jobs from queues when the queues have no tasks.

@lavalamp
Member

Sounds like a good topic for the SIG Autoscaling meeting :)

@jeffreybrowning

jeffreybrowning commented Sep 5, 2020

@lavalamp were there results from the meeting?

@lavalamp
Member

lavalamp commented Sep 8, 2020

Apologies, that was super unclear of me-- I realize now it reads like I was going to take it to the SIG meeting, but I was actually only trying to suggest that it was the logical next thing to do. I don't personally have time to contribute to this other than the occasional review.

@jeffreybrowning

jeffreybrowning commented Sep 8, 2020

@lavalamp no worries. Is there someone more central to the planning we can poke to move things along? I think this should be a pretty innocent bump to GA. I'll go ahead and tag others who were on this ticket for the Alpha release.

@liggitt @mwielgus

@johanneswuerbach
Contributor

We are also using this feature heavily and I would be happy to contribute time driving this towards GA.

While I have contributed some changes to k8s, I've never been part of an API graduation, so I would need some pointers on how to do this if help is needed.

@liggitt
Member

liggitt commented Sep 10, 2020

I think this should be a pretty innocent bump to GA. I'll go ahead and tag others who were in this ticket for Alpha release.

@liggitt @mwielgus

API changes need an associated doc describing the change and graduation requirements, called a KEP (Kubernetes Enhancement Proposal). The key things we'd be looking for are evidence of sufficient test coverage, any performance implications of the change, and upgrade/downgrade/skew compatibility implications.

It looks like #69687 (comment) might have been the original proposal associated with the alpha change, but before graduating to beta and being enabled by default, it should be ported to a KEP, making sure the test/performance/upgrade questions are answered. A template is available at https://github.com/kubernetes/enhancements/tree/master/keps/NNNN-kep-template and it would be placed in https://github.com/kubernetes/enhancements/tree/master/keps/sig-autoscaling

@jeffreybrowning

@johanneswuerbach @DXist is this something one of you can lead? This would be my first contribution, and it's probably not a great thing to take a first bite out of.

If not, I can attempt a first pass at the proposal.

@DXist
Contributor Author

DXist commented Sep 16, 2020

@jeffreybrowning go ahead. I've changed projects and now work in a different context.

@johanneswuerbach
Contributor

I planned to get something drafted on Friday, but I've also never worked on a KEP before. If you have more time, feel free to pick it up, @jeffreybrowning :-)

@jeffreybrowning

I am embroiled in an update to another package right now -- it would likely be a sizeable delay until I get to this.

@johanneswuerbach
Contributor

I started the official proposal here: kubernetes/enhancements#2021. Looking forward to any of your comments or feedback on the KEP itself here: kubernetes/enhancements#2022.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Status: API review completed, 1.16
Development

Successfully merging this pull request may close these issues.

Allow HPA to scale to 0