Enhance RoundRobinLoadBalancer position #747


Closed
NeatGuyCoding opened this issue Apr 30, 2020 · 8 comments · Fixed by #834
Comments

@NeatGuyCoding
Contributor

NeatGuyCoding commented Apr 30, 2020

Is your feature request related to a problem? Please describe.

Assume there are retries for an RPC call to a remote service, and the retry scheme allows two retries per call. Assume there are two instances of this service (instance A and instance B). We want each retry to go to a different instance than the previous attempt.

But there is a problem when using the RoundRobinLoadBalancer: the position is an atomic field shared across threads, so a retry can end up hitting the same instance as the previous attempt (see the sketch after the list below):

  • Thread 1: position = 0, call A;
  • Thread 2: position = 1, call B;
  • Thread 1: call A failed, position = 2, still retry A;
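
For reference, here is a simplified sketch (my own illustration, not the actual Spring Cloud source) of how a single shared atomic position produces that interleaving:

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified sketch of round-robin selection with one shared counter.
class SharedPositionRoundRobin {

    // One counter shared by every thread that picks an instance.
    private final AtomicInteger position = new AtomicInteger(0);

    <T> T choose(List<T> instances) {
        // Every call, from any thread, advances the same counter, so a retry
        // on thread 1 can wrap around to the instance it just failed on when
        // other threads advanced the counter in between.
        int pos = Math.abs(position.getAndIncrement());
        return instances.get(pos % instances.size());
    }
}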

Describe the solution you'd like

Make the position a thread-local field, each one initialized with a random value:

ThreadLocal<Integer> position = ThreadLocal.withInitial(() -> ThreadLocalRandom.current().nextInt(1000));

@NeatGuyCoding
Contributor Author

Well, I found that the result is retrieved through:

supplier.get().next().map(this::getInstanceResponse);

This is assigned to another thread by Reactor, therefore the thread-local position is not good enough.

Consider using the Reactor Context to keep the original call context.
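
A rough sketch of that direction (the context key and wiring here are hypothetical, not an existing Spring Cloud API): carry a per-call position in the Reactor Context so it survives thread hops inside the reactive chain.

import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import reactor.core.publisher.Mono;
import reactor.util.context.Context;

// Hypothetical sketch: keep the load-balancer position in the Reactor Context.
class ContextPositionExample {

    static final String POSITION_KEY = "lb-position"; // hypothetical context key

    <T> Mono<T> choose(Mono<List<T>> instances) {
        return Mono.deferContextual(ctx -> {
            // Read the position seeded by the caller; fall back to a random one.
            int position = ctx.getOrDefault(POSITION_KEY,
                    ThreadLocalRandom.current().nextInt(1000));
            return instances.map(list -> list.get(position % list.size()));
        });
    }

    <T> Mono<T> callWithOneRetry(Mono<List<T>> instances) {
        int seed = ThreadLocalRandom.current().nextInt(1000);
        // Each attempt writes a different position into the context,
        // so the retry is routed to the next instance, not the failed one.
        return choose(instances)
                .contextWrite(Context.of(POSITION_KEY, seed))
                .onErrorResume(e -> choose(instances)
                        .contextWrite(Context.of(POSITION_KEY, seed + 1)));
    }
}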

@NeatGuyCoding
Contributor Author

One solution would be the following:

private final ConcurrentHashMap<Long, Integer> position = new ConcurrentHashMap<>();

public Mono<Response<ServiceInstance>> choose(Request request) {
    // TODO: move supplier to Request?
    // Temporary conditional logic till deprecated members are removed.
    long threadId = Thread.currentThread().getId();
    if (serviceInstanceListSupplierProvider != null) {
        ServiceInstanceListSupplier supplier = serviceInstanceListSupplierProvider
                .getIfAvailable(NoopServiceInstanceListSupplier::new);
        return supplier.get().next().map(serviceInstances -> getInstanceResponse(serviceInstances, threadId));
    }
    ServiceInstanceSupplier supplier = this.serviceInstanceSupplier
            .getIfAvailable(NoopServiceInstanceSupplier::new);
    return supplier.get().collectList().map(serviceInstances -> getInstanceResponse(serviceInstances, threadId));
}

private Response<ServiceInstance> getInstanceResponse(
        List<ServiceInstance> instances, long threadId) {
    if (instances.isEmpty()) {
        log.warn("No servers available for service: " + this.serviceId);
        return new EmptyResponse();
    }
    // TODO: enforce order?
    // Each calling thread gets its own counter, seeded with a random value.
    int pos = Math.abs(this.position.computeIfAbsent(threadId,
            k -> ThreadLocalRandom.current().nextInt(1000)));
    this.position.put(threadId, pos + 1);
    ServiceInstance instance = instances.get(pos % instances.size());
    return new DefaultResponse(instance);
}

@spencergibb
Member

A few issues: the number of instances is low, and these algorithms work much better with higher numbers of instances. Can you increase the retry count?

@NeatGuyCoding
Contributor Author

@spencergibb Increasing the retry count will cause the service to trip the circuit breaker more often and increases the possibility of an avalanche, which is not preferred. Moreover, the more threads there are, the higher the probability of retrying the same instance.

@NeatGuyCoding
Contributor Author

A further solution is to create a random position for each request id (when Sleuth is included, it should be the traceId). This keeps the positions separate per request id, so a retry will not be executed on the same instance as a previous attempt.
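
A hedged sketch of that idea (the requestId parameter and the cleanup hook are hypothetical, just to illustrate the shape):

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one counter per request id (traceId when Sleuth is present),
// so every attempt of the same logical call walks to a different instance,
// regardless of which thread executes it.
class PerRequestRoundRobin {

    private final Map<String, AtomicInteger> positions = new ConcurrentHashMap<>();

    <T> T choose(String requestId, List<T> instances) {
        AtomicInteger position = positions.computeIfAbsent(requestId,
                id -> new AtomicInteger(ThreadLocalRandom.current().nextInt(1000)));
        int pos = position.getAndIncrement();
        return instances.get(pos % instances.size());
    }

    // Remove the entry once the call and all of its retries have finished,
    // otherwise the map grows with every new request id.
    void release(String requestId) {
        positions.remove(requestId);
    }
}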

@OlgaMaciaszek
Collaborator

@HashZhang That's a good point. We are now working on adding retry support and we'll probably be adding a similar solution there.

@OlgaMaciaszek
Collaborator

Closing in favour of #659

@OlgaMaciaszek
Collaborator

Actually, this will be useful for both blocking and non-blocking retries, so we will keep it as a separate issue after all.
