Skip to content

Eureka seems to do not recalculate numberOfRenewsPerMinThreshold during evicting expired leases #2407

Closed
@bakomchik

Description

@bakomchik

Hi
in my test environment services are restarts frequently by mesos-marathon without unregistration(SIGKILL)
so i noticed that eureka instance is in Self-preservation-mode frequently.
eureka has 65 registered services ,but numberOfRenewsPerMinThreshold is 368
that seems to be incorrect.
i found that numberOfRenewsPerMinThreshold decreased only via REST-API

com.netflix.eureka.registry.PeerAwareInstanceRegistryImpl#cancel

but when lease expired

com.netflix.eureka.registry.AbstractInstanceRegistry#evict(long)

numberOfRenewsPerMinThreshold does not decreases,
so numberOfRenewsPerMinThreshold does not correlate with real instance count and expectedNumberOfRenewsPerMin

some greps:

REGISTERD

grep "Registered instance" eureka.log| wc -l
----
286

EXPIRED

grep "Registry: expired lease for" eureka.log| wc -l
----
213

RENEWAL

grep "Current renewal" eureka.log
----
929790926 2017-10-30 10:08:11,991 INFO  [ ReplicaAwareInstanceRegistry - RenewalThresholdUpdater ] c.n.e.r.PeerAwareInstanceRegistryImpl                   | Current renewal threshold is : 368

Activity

p-zalejko

p-zalejko commented on Nov 21, 2017

@p-zalejko

Hi,
I also ran into the same problem and have been investigating it for a while. Here is what I found:

The "Renews threshold" is decreased if a service is shutdown gracefully. Then, before going down it calls com.netflix.eureka.resources.InstanceResource#cancelLease endpoint. It executes PeerAwareInstanceRegistryImpl#cancel method that updates "Renews threshold" value.

If, for instance, a service was killed by SIGKILL then the "cancelLease" is not called. At some point the "evict" method gets triggered and eventually it executes
org.springframework.cloud.netflix.eureka.server.InstanceRegistry#internalCancel method. And there is an issue: it cancels the instance but does not call the com.netflix.eureka.registry.PeerAwareInstanceRegistryImpl#cancel method (which updates "Renews threshold" value).

In my case, I tested this issue in the following way:

  1. For testing purposes, I updated my configuration to get services unregistered quickly:
eureka:
  client:
    registry-fetch-interval-seconds: 15
  server:
    evictionIntervalTimerInMs: 10000
    renewalThresholdUpdateIntervalMs: 15000
  instance:
    lease-renewal-interval-in-seconds: 15
    lease-expiration-duration-in-seconds: 35
    registry-fetch-interval-seconds: 15
  1. Launched an eureka server
  2. Launched 3 services and registered them within the eureka server.
  3. Started killing one of them and restarting again.

After repeating the step 3 many times I got a big "Renews threshold" value, even greater than Renews (last min) . If the self-preservation mode is enabled it can lead to problems because if the "Renews threshold" is never decreased then even after launching back all services, all the renews sent by them might not be enough to escape from the activated "self-preservation" mode (if it has been already activated).

holy12345

holy12345 commented on Nov 24, 2017

@holy12345
Contributor

@p-zalejko I've been following this discussion, is this problem solved?

p-zalejko

p-zalejko commented on Nov 27, 2017

@p-zalejko

Hi, @holy12345 . I think it isn't. I tested a milestone version (2.x) and it behaves the same.

holy12345

holy12345 commented on Nov 29, 2017

@holy12345
Contributor

@p-zalejko Hi,
This will not happen if Eureka Server self-preservation is turned off. The value of Renews threshold is correct.

PeerAwareInstanceRegistryImpl#updateRenewalThreshold()

    private void updateRenewalThreshold() {
        try {
            Applications apps = eurekaClient.getApplications();
            int count = 0;
            for (Application app : apps.getRegisteredApplications()) {
                for (InstanceInfo instance : app.getInstances()) {
                    if (this.isRegisterable(instance)) {
                        ++count;
                    }
                }
            }
            synchronized (lock) {
                // Update threshold only if the threshold is greater than the
                // current expected threshold of if the self preservation is disabled.
                if ((count * 2) > (serverConfig.getRenewalPercentThreshold() * numberOfRenewsPerMinThreshold)
                        || (!this.isSelfPreservationModeEnabled())) {
                    this.expectedNumberOfRenewsPerMin = count * 2;
                    this.numberOfRenewsPerMinThreshold = (int) ((count * 2) * serverConfig.getRenewalPercentThreshold());
                }
            }
            logger.info("Current renewal threshold is : {}", numberOfRenewsPerMinThreshold);
        } catch (Throwable e) {
            logger.error("Cannot update renewal threshold", e);
        }
    }

First of all, this scheduled task defaults to every 15 minutes. if self-preservation is off, the Renews threshold value is recalculated. if self-preservation is turned on, there is no problem as long as the first if statement is true,but i do not understand how to make this condition .apps.getRegisteredApplications() is to get all registered client information?

Do I understand it correctly? Best wishes.

p-zalejko

p-zalejko commented on Nov 29, 2017

@p-zalejko

Hi @holy12345, I think you are right:

If the self-preservation is turned off (server.enableSelfPreservation: false) then we do not have a problem because we do not have self-preservation ;)

But if it is enabled I would expect that thePeerAwareInstanceRegistryImpl#updateRenewalThreshold()updated the 'expectedNumberOfRenewsPerMin' periodically.

Unfortunately, in terms of the PeerAwareInstanceRegistryImpl#updateRenewalThreshold() I do not know why the Applications apps = eurekaClient.getApplications() returns an empty list of apps in my case. As a result, count is 0 so it never gets into the if statement and update the expectedNumberOfRenewsPerMin. It might be a different issue, that could also solve the problem we are discussing on.

holy12345

holy12345 commented on Nov 29, 2017

@holy12345
Contributor

@p-zalejko First of all thank you for your reply :) You're right, count is 0 too in my test environment.What i am trying to do next is to understand why the count is zero, maybe solve this problem, and the problem you ask is solved.If there is progress, I will inform you first, best wishes!

holy12345

holy12345 commented on Nov 30, 2017

@holy12345
Contributor

@p-zalejko Hello, is your eureka server one? I think it should not be a cluster. If it is not a cluster, the value actually obtained is an empty list. best wishes

yolenw

yolenw commented on Apr 26, 2018

@yolenw

i have to follow up on this one, does this mean we are not expected to turn on "self-preservation mode" for eureka server in a none-cluster mode? thanks. @holy12345

spencergibb

spencergibb commented on Jan 28, 2019

@spencergibb
Member

Closing this due to inactivity. Please re-open if there's more to discuss.

Jeffrey-Hassan

Jeffrey-Hassan commented on Aug 7, 2019

@Jeffrey-Hassan

@spencergibb This issue seems to persist as of latest version (1.9.12).

I came across this as well and did some digging on the source, it seems to go to the difference between a graceful shutdown and a non-graceful shutdown (e.g. the EvictTask is removing the instances, rather than a call to cancel a lease).

Ultimately, the AbstractInstanceRegistry.evict method makes a call to "internalCancel" which does not modify the "expectedNumberOfClientsSendingRenews" count. Now we have the "count" of total instances from the update task, which is calculated in PeerAwareInstanceRegistryImpl.updateRenewalThreshold by cycling through all apps and then instances of each app and keeping a tally, with one less for the evicted instance, but the "expectedNumberOfClientsSendingRenews" does not account for that evicted instance.

For a graceful shutdown of a service, PeerAwareInstanceRegistryImpl.cancel seems to be called, which decrements "expectedNumberOfClientsSendingRenews" properly.

I would've expected the variable to be decremented regardless of whether it's an eviction from the timer task or a graceful shutdown. The best place seems to be before internalCancel is called in AbstractInstanceRegistry.evict. Thoughts?

troshko111

troshko111 commented on Jan 27, 2020

@troshko111

@Jeffrey-Hassan this has been fixed in v1.9.17.

1 remaining item

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @spencergibb@Jeffrey-Hassan@troshko111@bakomchik@p-zalejko

        Issue actions

          Eureka seems to do not recalculate numberOfRenewsPerMinThreshold during evicting expired leases · Issue #2407 · spring-cloud/spring-cloud-netflix