Enhance RoundRobinLoadBalancer position #747


Closed
NeatGuyCoding opened this issue Apr 30, 2020 · 8 comments · Fixed by #834
Comments

@NeatGuyCoding
Contributor

NeatGuyCoding commented Apr 30, 2020

Is your feature request related to a problem? Please describe.

Assume there are retries for an RPC call to a remote service, and the retry scheme allows two retries per call. Assume there are two instances of this service (instance A and instance B). We want each retry to go to a different instance than the previous attempt.

But there is a problem when using the RoundRobinLoadBalancer: the position is an atomic field shared across threads, so a retry can end up hitting the same instance as the previous attempt (see the sketch after the list below):

  • Thread 1: position = 0, call A;
  • Thread 2: position = 1, call B;
  • Thread 1: call A failed, position = 2, still retry A;
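
For reference, here is a simplified sketch (my own illustration, not the actual Spring Cloud source) of how a single shared atomic position produces that interleaving:

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified sketch of round-robin selection with one shared counter.
class SharedPositionRoundRobin {

    // One counter shared by every thread that picks an instance.
    private final AtomicInteger position = new AtomicInteger(0);

    <T> T choose(List<T> instances) {
        // Every call, from any thread, advances the same counter, so a retry
        // on thread 1 can wrap around to the instance it just failed on when
        // other threads advanced the counter in between.
        int pos = Math.abs(position.getAndIncrement());
        return instances.get(pos % instances.size());
    }
}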

Describe the solution you'd like

Make the position a thread-local field, each one initialized with a random value:

ThreadLocal<Integer> position = ThreadLocal.withInitial(() -> ThreadLocalRandom.current().nextInt(1000));

@NeatGuyCoding
Contributor Author

Well, I found that the result is retrieved through:

supplier.get().next().map(this::getInstanceResponse);

This is assigned to another thread by Reactor, therefore the thread-local position is not good enough.

Consider using the Reactor Context to keep the original call context.
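
A rough sketch of that direction (the context key and wiring here are hypothetical, not an existing Spring Cloud API): carry a per-call position in the Reactor Context so it survives thread hops inside the reactive chain.

import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import reactor.core.publisher.Mono;
import reactor.util.context.Context;

// Hypothetical sketch: keep the load-balancer position in the Reactor Context.
class ContextPositionExample {

    static final String POSITION_KEY = "lb-position"; // hypothetical context key

    <T> Mono<T> choose(Mono<List<T>> instances) {
        return Mono.deferContextual(ctx -> {
            // Read the position seeded by the caller; fall back to a random one.
            int position = ctx.getOrDefault(POSITION_KEY,
                    ThreadLocalRandom.current().nextInt(1000));
            return instances.map(list -> list.get(position % list.size()));
        });
    }

    <T> Mono<T> callWithOneRetry(Mono<List<T>> instances) {
        int seed = ThreadLocalRandom.current().nextInt(1000);
        // Each attempt writes a different position into the context,
        // so the retry is routed to the next instance, not the failed one.
        return choose(instances)
                .contextWrite(Context.of(POSITION_KEY, seed))
                .onErrorResume(e -> choose(instances)
                        .contextWrite(Context.of(POSITION_KEY, seed + 1)));
    }
}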

@NeatGuyCoding
Contributor Author

One solution would be the following:

private final ConcurrentHashMap<Long, Integer> position = new ConcurrentHashMap<>();

public Mono<Response<ServiceInstance>> choose(Request request) {
    // TODO: move supplier to Request?
    // Temporary conditional logic till deprecated members are removed.
    long threadId = Thread.currentThread().getId();
    if (serviceInstanceListSupplierProvider != null) {
        ServiceInstanceListSupplier supplier = serviceInstanceListSupplierProvider
                .getIfAvailable(NoopServiceInstanceListSupplier::new);
        return supplier.get().next().map(serviceInstances -> getInstanceResponse(serviceInstances, threadId));
    }
    ServiceInstanceSupplier supplier = this.serviceInstanceSupplier
            .getIfAvailable(NoopServiceInstanceSupplier::new);
    return supplier.get().collectList().map(serviceInstances -> getInstanceResponse(serviceInstances, threadId));
}

private Response<ServiceInstance> getInstanceResponse(
        List<ServiceInstance> instances, long threadId) {
    if (instances.isEmpty()) {
        log.warn("No servers available for service: " + this.serviceId);
        return new EmptyResponse();
    }
    // TODO: enforce order?
    // Each calling thread gets its own counter, seeded with a random value.
    int pos = Math.abs(this.position.computeIfAbsent(threadId,
            k -> ThreadLocalRandom.current().nextInt(1000)));
    this.position.put(threadId, pos + 1);
    ServiceInstance instance = instances.get(pos % instances.size());
    return new DefaultResponse(instance);
}

@spencergibb
Member

A few issues: the number of instances is low, and these algorithms work much better with higher numbers of instances. Can you increase the retry count?

@NeatGuyCoding
Contributor Author

@spencergibb Increasing the retry count will cause the service to trip the circuit breaker more often and increases the possibility of an avalanche, which is not preferred. Moreover, the more threads there are, the higher the probability of retrying the same instance.

@NeatGuyCoding
Contributor Author

A further solution is to create a random position for each request id (when Sleuth is included, it should be the traceId). This keeps the positions separate per request id, so a retry will not be executed on the same instance as a previous attempt.
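
A hedged sketch of that idea (the requestId parameter and the cleanup hook are hypothetical, just to illustrate the shape):

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one counter per request id (traceId when Sleuth is present),
// so every attempt of the same logical call walks to a different instance,
// regardless of which thread executes it.
class PerRequestRoundRobin {

    private final Map<String, AtomicInteger> positions = new ConcurrentHashMap<>();

    <T> T choose(String requestId, List<T> instances) {
        AtomicInteger position = positions.computeIfAbsent(requestId,
                id -> new AtomicInteger(ThreadLocalRandom.current().nextInt(1000)));
        int pos = position.getAndIncrement();
        return instances.get(pos % instances.size());
    }

    // Remove the entry once the call and all of its retries have finished,
    // otherwise the map grows with every new request id.
    void release(String requestId) {
        positions.remove(requestId);
    }
}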

@OlgaMaciaszek
Collaborator

@HashZhang That's a good point. We are now working on adding retry support and we'll probably be adding a similar solution there.

@OlgaMaciaszek
Collaborator

Closing in favour of #659

@OlgaMaciaszek
Collaborator

Actually, this will be useful for both blocking and non-blocking retries, so we will keep it as a separate issue after all.
