Improve consistent hashing load balancing with a new algorithm (patch for 3.0) #8948


Merged
1 commit merged on Sep 30, 2021

Conversation

laddcn
Contributor

@laddcn laddcn commented Sep 29, 2021

The same PR has been merged into the master branch; this is a patch for 3.0.

What is the purpose of the change

Improve consistent hashing load balancing with a new algorithm that resolves the problems described in #4103.

Brief changelog

A new algorithm, "Consistent Hashing with Bounded Loads", introduced by Vahab Mirrokni (Google Research) in 2018, can resolve this problem.

Here is a brief introduction, quoted from a blog post describing a production deployment of this algorithm (https://medium.com/vimeo-engineering-blog/improving-load-balancing-with-a-new-consistent-hashing-algorithm-9f1bd75709ed):

Here is a simplified sketch of the algorithm. Some details are left out, and if you intend to implement it yourself, you should definitely go to the original paper for information.
First, define a balancing factor, c, which is greater than 1. c controls how much imbalance is allowed between the servers. For example, if c = 1.25, no server should get more than 125% of the average load. In the limit as c increases to ∞, the algorithm becomes equivalent to plain consistent hashing, without balancing; as c decreases to near 1 it becomes more like a least-connection policy and the hash becomes less important. In my experience, values between 1.25 and 2 are good for practical use.
When a request arrives, compute the average load (the number of outstanding requests, m, including the one that just arrived, divided by the number of available servers, n). Multiply the average load by c to get a “target load”, t. In the original paper, capacities are assigned to servers so that each server gets a capacity of either ⌊t⌋ or ⌈t⌉, and the total capacity is ⌈cm⌉. Therefore the maximum capacity of a server is ⌈cm/n⌉, which is greater than c times the average load by less than 1 request. To support giving servers different “weights”, as HAProxy does, the algorithm has to change slightly, but the spirit is the same — no server can exceed its fair share of the load by more than 1 request.
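The capacity rule above can be condensed into a few lines of Java (an illustrative sketch, not code from this patch; the class and method names are invented):

```java
// Illustrative sketch of the capacity rule described above.
// m = outstanding requests (including the one that just arrived),
// n = available servers, c = balancing factor (> 1).
public final class BoundedLoad {
    // Maximum number of requests any single server may hold: ceil(c * m / n).
    static long capacity(long m, int n, double c) {
        return (long) Math.ceil(c * m / n);
    }

    public static void main(String[] args) {
        // With c = 1.25, 100 in-flight requests and 8 servers:
        // ceil(1.25 * 100 / 8) = ceil(15.625) = 16 requests per server at most.
        System.out.println(capacity(100, 8, 1.25)); // prints 16
    }
}
```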
To dispatch a request, compute its hash and the nearest server, as usual. If that server is below its capacity, then assign the request to that server. Otherwise, go to the next server in the hash ring and check its capacity, continuing until you find a server that has capacity remaining. There has to be one, since the highest capacity is above the average load, and it’s impossible for every server’s load to be above average. This guarantees some nice things:
No server is allowed to get overloaded by more than a factor of c plus 1 request.
The distribution of requests is the same as consistent hashing as long as servers aren’t overloaded.
If a server is overloaded, the list of fallback servers chosen will be the same for the same request hash — i.e. the same server will consistently be the “second choice” for a popular piece of content. This is good for caching.
If a server is overloaded, the list of fallback servers will usually be different for different request hashes — i.e. the overloaded server’s spillover load will be distributed among the available servers, instead of all landing on a single server. This depends on each server being assigned multiple points in the consistent hash ring.
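Putting the pieces together, the dispatch walk can be sketched over a `TreeMap`-based hash ring (a hypothetical sketch under the assumptions quoted above; this is not the Dubbo implementation, and all class and field names are invented):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of bounded-load selection on a consistent hash ring.
// Not the Dubbo implementation; names are invented for illustration.
public class BoundedLoadRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // hash point -> server
    private final Map<String, Integer> load;                    // server -> in-flight requests
    private final double c;                                     // balancing factor, > 1

    BoundedLoadRing(TreeMap<Long, String> ring, Map<String, Integer> load, double c) {
        this.ring.putAll(ring);
        this.load = load;
        this.c = c;
    }

    String select(long requestHash, int totalRequests, int serverCount) {
        long capacity = (long) Math.ceil(c * totalRequests / serverCount);
        // Start at the nearest ring point clockwise from the request hash,
        // exactly as in plain consistent hashing.
        Long start = ring.ceilingKey(requestHash);
        if (start == null) {
            start = ring.firstKey(); // wrap around the ring
        }
        Long key = start;
        do {
            String server = ring.get(key);
            if (load.getOrDefault(server, 0) < capacity) {
                return server; // below capacity: take it
            }
            key = ring.higherKey(key); // otherwise walk to the next ring point
            if (key == null) {
                key = ring.firstKey();
            }
        } while (!key.equals(start));
        // With c > 1 and totalRequests counting the arriving request, at least
        // one server is below ceil(c * m / n), so this line is a safety net only.
        return ring.get(start);
    }
}
```

Because the walk always starts at the hash's nearest ring point and proceeds clockwise, the fallback order for a given request hash is deterministic, which is what makes the "second choice" server stable for a popular piece of content.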

Verifying this change

The existing and newly added test cases pass.

Checklist

  • Make sure there is a GitHub issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a GitHub issue. Your pull request should address just this issue, without pulling in other changes - one PR resolves one issue.
  • Each commit in the pull request should have a meaningful subject line and body.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Check whether it is necessary to patch Dubbo 3 if you are working on Dubbo 2.7.
  • Write the necessary unit tests to verify your logic; prefer mocks where cross-module dependencies exist. If a new feature or significant change is committed, remember to add a sample in the dubbo-samples project.
  • Add some description to dubbo-website project if you are requesting to add a feature.
  • GitHub Actions works fine on your own branch.
  • If this contribution is large, please follow the Software Donation Guide.


@codecov-commenter

codecov-commenter commented Sep 29, 2021

Codecov Report

Merging #8948 (73d252f) into 3.0 (30d0fac) will increase coverage by 0.02%.
The diff coverage is 90.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##                3.0    #8948      +/-   ##
============================================
+ Coverage     63.44%   63.47%   +0.02%     
- Complexity      312      315       +3     
============================================
  Files          1182     1182              
  Lines         50261    50280      +19     
  Branches       7522     7526       +4     
============================================
+ Hits          31890    31916      +26     
+ Misses        14901    14896       -5     
+ Partials       3470     3468       -2     
Impacted Files Coverage Δ
...cluster/loadbalance/ConsistentHashLoadBalance.java 91.66% <90.00%> (-1.02%) ⬇️
...in/java/org/apache/dubbo/common/utils/JVMUtil.java 81.13% <0.00%> (-11.33%) ⬇️
...ng/exchange/support/header/HeartbeatTimerTask.java 68.42% <0.00%> (-5.27%) ⬇️
...pache/dubbo/remoting/transport/AbstractClient.java 62.14% <0.00%> (-1.43%) ⬇️
...etadata/report/support/AbstractMetadataReport.java 60.76% <0.00%> (-0.96%) ⬇️
...ng/zookeeper/curator5/Curator5ZookeeperClient.java 67.96% <0.00%> (-0.49%) ⬇️
...vent/listener/ServiceInstancesChangedListener.java 78.81% <0.00%> (+1.69%) ⬆️
...exchange/support/header/HeaderExchangeHandler.java 70.79% <0.00%> (+1.76%) ⬆️
.../exchange/support/header/HeaderExchangeServer.java 68.86% <0.00%> (+1.88%) ⬆️
.../rpc/protocol/dubbo/LazyConnectExchangeClient.java 58.06% <0.00%> (+2.15%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 30d0fac...73d252f. Read the comment docs.

* or
* 2. Not have overloaded (request count already accept < thread (average request count * overloadRatioAllowed ))
*/
while (serverRequestCountMap.containsKey(serverAddress)


This may cause an infinite loop.
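A common guard for this kind of ring-walk loop is to cap the number of probes at the number of ring points, with an explicit fallback when every server appears overloaded (a hypothetical sketch, not the fix that actually landed; the names are invented):

```java
import java.util.Map;

// Hypothetical guard for a ring-walk loop like the one quoted above:
// cap the number of probes so the loop cannot spin forever even when
// every server is (or appears) overloaded. Names are illustrative.
public class GuardedWalk {
    static String selectWithGuard(String[] serversOnRing, Map<String, Integer> inflight,
                                  long capacity, int startIndex) {
        int probes = 0;
        int i = startIndex;
        // Visit each ring point at most once; fall back to the starting
        // server if no one has spare capacity, instead of looping forever.
        while (probes < serversOnRing.length) {
            String server = serversOnRing[i];
            if (inflight.getOrDefault(server, 0) < capacity) {
                return server;
            }
            i = (i + 1) % serversOnRing.length;
            probes++;
        }
        return serversOnRing[startIndex];
    }
}
```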

@AlbumenJ AlbumenJ added this to the 3.0.9 milestone Jun 20, 2022
AlbumenJ added a commit to AlbumenJ/dubbo that referenced this pull request Jun 29, 2022
AlbumenJ added a commit that referenced this pull request Jun 30, 2022

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

4 participants