Improve consistent hashing load balancing with a new algorithm (patch for 3.0) #8948
The same PR has already been merged into the master branch; this is a patch for 3.0.
What is the purpose of the change
Improve consistent hashing load balancing with a new algorithm that resolves the problems described in #4103.
Brief changelog
A new algorithm, "Consistent Hashing with Bounded Loads", introduced by Vahab Mirrokni (Google Research) in 2018, can resolve this problem.
A brief introduction is quoted below from a blog post describing a production system that uses this algorithm and reports good results (https://medium.com/vimeo-engineering-blog/improving-load-balancing-with-a-new-consistent-hashing-algorithm-9f1bd75709ed):
Here is a simplified sketch of the algorithm. Some details are left out, and if you intend to implement it yourself, you should definitely go to the original paper for information.
First, define a balancing factor, c, which is greater than 1. c controls how much imbalance is allowed between the servers. For example, if c = 1.25, no server should get more than 125% of the average load. In the limit as c increases to ∞, the algorithm becomes equivalent to plain consistent hashing, without balancing; as c decreases to near 1 it becomes more like a least-connection policy and the hash becomes less important. In my experience, values between 1.25 and 2 are good for practical use.
When a request arrives, compute the average load (the number of outstanding requests, m, including the one that just arrived, divided by the number of available servers, n). Multiply the average load by c to get a “target load”, t. In the original paper, capacities are assigned to servers so that each server gets a capacity of either ⌊t⌋ or ⌈t⌉, and the total capacity is ⌈cm⌉. Therefore the maximum capacity of a server is ⌈cm/n⌉, which is greater than c times the average load by less than 1 request. To support giving servers different “weights”, as HAProxy does, the algorithm has to change slightly, but the spirit is the same — no server can exceed its fair share of the load by more than 1 request.
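As a rough illustration only (this is not code from this patch), the per-server capacity described above can be computed as follows, where `c` is the balancing factor, `m` the number of outstanding requests, and `n` the number of servers:

```java
// Hypothetical helper, not part of this patch: upper bound on load per server.
// Example: with c = 1.25, m = 100 outstanding requests and n = 8 servers,
// the average load is 12.5 and no server may hold more than ceil(1.25 * 100 / 8) = 16 requests.
static int capacity(double c, int m, int n) {
    return (int) Math.ceil(c * m / n);
}
```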
To dispatch a request, compute its hash and the nearest server, as usual. If that server is below its capacity, then assign the request to that server. Otherwise, go to the next server in the hash ring and check its capacity, continuing until you find a server that has capacity remaining. There has to be one, since the highest capacity is above the average load, and it’s impossible for every server’s load to be above average. This guarantees some nice things:
No server is allowed to get overloaded by more than a factor of c plus 1 request.
The distribution of requests is the same as consistent hashing as long as servers aren’t overloaded.
If a server is overloaded, the list of fallback servers chosen will be the same for the same request hash — i.e. the same server will consistently be the “second choice” for a popular piece of content. This is good for caching.
If a server is overloaded, the list of fallback servers will usually be different for different request hashes — i.e. the overloaded server’s spillover load will be distributed among the available servers, instead of all landing on a single server. This depends on each server being assigned multiple points in the consistent hash ring.
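For reference, here is a minimal, self-contained sketch of the selection step under these rules. It is not the code in this PR; names such as `ServerNode`, `activeRequests`, and `BoundedLoadRing` are made up for illustration. It uses a `TreeMap` as the hash ring and walks clockwise until it finds a node below its capacity:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of "consistent hashing with bounded loads" selection.
// The ring maps virtual-node hashes to servers; activeRequests tracks outstanding calls per server.
class BoundedLoadRing {
    static class ServerNode {
        final String address;
        final AtomicInteger activeRequests = new AtomicInteger();
        ServerNode(String address) { this.address = address; }
    }

    private final TreeMap<Long, ServerNode> ring = new TreeMap<>();

    void addVirtualNode(long hash, ServerNode node) {
        ring.put(hash, node);
    }

    // Select the first node at or after the request hash whose load is below `capacity`
    // (capacity = ceil(c * totalOutstanding / serverCount), computed by the caller).
    ServerNode select(long requestHash, int capacity) {
        Map.Entry<Long, ServerNode> entry = ring.ceilingEntry(requestHash);
        if (entry == null) {
            entry = ring.firstEntry();          // wrap around the ring
        }
        int checked = 0;
        while (entry.getValue().activeRequests.get() >= capacity && checked < ring.size()) {
            entry = ring.higherEntry(entry.getKey());
            if (entry == null) {
                entry = ring.firstEntry();      // wrap around again
            }
            checked++;                          // guard against looping forever over virtual nodes
        }
        // Under the capacity rule some server must have spare room,
        // since not every server can be above the average load.
        return entry.getValue();
    }
}
```

The caller would increment `activeRequests` on the chosen node when the request starts and decrement it when the request completes; the fallback walk over the ring is what keeps the "second choice" stable for a given request hash.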
Verifying this change
Test cases pass.
Checklist