
Distributed rate limiter? #350

Closed
javadevmtl opened this issue Feb 27, 2019 · 15 comments

@javadevmtl

Hi, unless I missed it: will there be support for backing the rate limiter with a distributed cache? For fault tolerance, performance, etc., we deploy at least 2 API instances side by side, so the rate limiter would somehow have to share the count between the instances.

RobWin commented Mar 13, 2019

No, we don't have a distributed cache for rate limiters.

@javadevmtl

Is it something worth looking at? Thanks

RobWin commented Mar 14, 2019

@storozhukBM Correct me if I'm wrong, but my opinion is that rate limiters should be fast and should not decrease response times or throughput.
A distributed cache introduces a lot of complexity and latency when caches must be synchronized or replicated.
I think the disadvantages outweigh the advantages.

Maybe you need a centralized rate limiter inside a load balancer or proxy instead.

@jwcarman

In the absence of some form of shared state for the rate limiters, what is the suggested approach for implementing rate limiting in an elastic/cloud environment? Using a single load balancer or proxy would create a single point of failure, so I don't think that's going to work for a lot of use cases.

RobWin commented Mar 21, 2019

There is no recommended approach for that with the current rate limiter implementation.
But a better solution than rate limiters with shared state could be adaptive capacity management.
See #201 and please have a look at the awesome talk linked there.

@storozhukBM has started on an implementation.

javadevmtl commented Mar 23, 2019

One way I guess it could be done: pre-generate a batch of counters, give each node a range, and when a node has exhausted its range, it goes back to the cache to get more...

Something like this: https://apacheignite.readme.io/docs/id-generator
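
A minimal sketch of that range-claiming idea, assuming a cluster-wide atomic counter (an `AtomicLong` stands in for the distributed cache here; the class name `RangeClaimingLimiter` and the batch size are made up for illustration):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the range-claiming idea: each node reserves a batch of permits
// from a shared counter and only touches shared state when its batch runs out.
// The AtomicLong stands in for the distributed counter (e.g. Ignite's ID generator).
public class RangeClaimingLimiter {

    private static final long BATCH_SIZE = 1_000;  // hypothetical batch size

    private final AtomicLong sharedCounter;        // stand-in for the cluster-wide counter
    private long localNext;                        // next permit in the claimed range
    private long localEnd;                         // exclusive end of the claimed range

    public RangeClaimingLimiter(AtomicLong sharedCounter) {
        this.sharedCounter = sharedCounter;
        claimNextRange();
    }

    // Returns the next permit id, claiming a fresh range from the
    // shared counter only when the local range is exhausted.
    public synchronized long acquirePermit() {
        if (localNext >= localEnd) {
            claimNextRange();
        }
        return localNext++;
    }

    private void claimNextRange() {
        localEnd = sharedCounter.addAndGet(BATCH_SIZE);  // one shared round-trip per batch
        localNext = localEnd - BATCH_SIZE;
    }
}
```

Each node then pays for one round-trip to shared state per batch rather than per request, which is the same trade-off the Ignite ID generator makes.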

RobWin commented Mar 23, 2019

Do you want to implement rate limiting on the client side (source) or the server side (sink)?

If you want rate limiting on the server side with a rate limit per client, you could use an API gateway like Kong.

RobWin commented Mar 23, 2019

Resilience4j-ratelimiter is better suited to the client side. A client can also be another server.
If you just want to protect the sink server from overload, I still think that adaptive capacity management (or congestion control at the application layer) is a better choice than a distributed rate limiter. That's why we are working on an initial implementation of an adaptive bulkhead.

RobWin commented Mar 23, 2019

I don't think you should use Resilience4j to reimplement an API gateway on your own.

RobWin commented Mar 23, 2019

Netflix already implemented an adaptive bulkhead: https://github.com/Netflix/concurrency-limits
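
For context, here is a conceptual sketch of the AIMD (additive increase, multiplicative decrease) idea behind such adaptive limiters. This is not the concurrency-limits API; all names and numbers in it are assumptions:

```java
import java.util.concurrent.Semaphore;

// Conceptual sketch of an adaptive (AIMD-style) concurrency limit -- the idea
// behind Netflix/concurrency-limits, not that library's actual API.
// The limit grows additively while calls succeed and halves when the
// protected call signals overload (timeout, rejection, ...).
public class AimdConcurrencyLimiter {

    // Semaphore.reducePermits is protected, so expose it via a tiny subclass.
    private static final class AdjustableSemaphore extends Semaphore {
        AdjustableSemaphore(int permits) { super(permits); }
        void reduce(int n) { reducePermits(n); }
    }

    private final int maxLimit = 200;  // assumed upper bound
    private int limit = 20;            // assumed starting limit
    private final AdjustableSemaphore inFlight = new AdjustableSemaphore(limit);

    public boolean tryAcquire() {
        return inFlight.tryAcquire();
    }

    public synchronized void onSuccess() {
        inFlight.release();
        if (limit < maxLimit) {
            limit++;                   // additive increase
            inFlight.release();        // one extra permit widens the in-flight budget
        }
    }

    public synchronized void onOverload() {
        inFlight.release();
        int newLimit = Math.max(1, limit / 2);  // multiplicative decrease
        inFlight.reduce(limit - newLimit);      // shrink the budget without blocking
        limit = newLimit;
    }
}
```

The caller wraps each outbound request with `tryAcquire()` and reports back via `onSuccess()` or `onOverload()`, so the in-flight budget settles near what the downstream system can actually sustain.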

storozhukBM commented Mar 25, 2019

Totally agree with @RobWin's thoughts.
It is better for your "elastic/cloud" distributed application to avoid any type of shared state by all means, especially state shared across multiple nodes in your cluster.

To me, adaptive capacity management looks like the ideal solution in your case.

If that is not feasible or suitable for some reason, I'd recommend pre-calculating a target throughput per node and configuring it statically.

If your cluster is truly elastic and shrinks and grows under load, that automatically means you have some coordination solution (service discovery like Consul or Eureka, or some other form of coordination). In that case you already have some unavoidable shared state, so it would be convenient to use it to dynamically reconfigure the rate limiters on each node. A cluster configuration change is then a relatively rare disturbance, and the rest of the time everything works without unnecessary state sharing.
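
A sketch of that per-node approach using Resilience4j's actual `RateLimiterConfig` API; the global budget of 1000 requests/second, the limiter name, and the cluster-size callback are assumptions for illustration:

```java
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;

import java.time.Duration;

public class PerNodeRateLimiting {

    // Assumed global budget: 1000 requests/second across the whole cluster.
    private static final int GLOBAL_LIMIT_PER_SECOND = 1_000;

    // Each node statically takes its share of the global budget.
    public static RateLimiter createForCluster(int clusterSize) {
        RateLimiterConfig config = RateLimiterConfig.custom()
                .limitRefreshPeriod(Duration.ofSeconds(1))
                .limitForPeriod(Math.max(1, GLOBAL_LIMIT_PER_SECOND / clusterSize))
                .timeoutDuration(Duration.ZERO)  // fail fast instead of queueing callers
                .build();
        return RateLimiter.of("backendCalls", config);
    }

    // When service discovery (Consul, Eureka, ...) reports a new cluster size,
    // adjust the limiter in place; changeLimitForPeriod is part of the
    // Resilience4j RateLimiter interface.
    public static void onClusterSizeChange(RateLimiter rateLimiter, int newClusterSize) {
        rateLimiter.changeLimitForPeriod(Math.max(1, GLOBAL_LIMIT_PER_SECOND / newClusterSize));
    }
}
```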

RobWin closed this as completed Apr 3, 2019
astubbs commented Nov 3, 2020

I think there are more scenarios where shared-state rate limiting is a good option: for example, where you don't control the target system, where your source system scales dynamically for other reasons, or where rate-limit usage is not the same across all clients in the cluster.
I've found these interesting: https://github.com/vladimir-bukhtoyarov/bucket4j/blob/master/doc-pages/jcache-usage.md and https://github.com/mokies/ratelimitj
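
For anyone comparing options, a conceptual sketch of the shared-state token-bucket pattern those libraries apply over a distributed cache; this is not bucket4j's API, and a `ConcurrentMap` stands in for the JCache here, with capacity and refill rate assumed:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the shared-state token-bucket pattern: the bucket state is an
// immutable value that is replaced atomically on every consume attempt.
public class SharedTokenBucket {

    record BucketState(long tokens, long lastRefillNanos) {}

    private static final long CAPACITY = 100;           // assumed bucket capacity
    private static final long REFILL_PER_SECOND = 100;  // assumed refill rate

    // Stand-in for the distributed cache (JCache, Redis, ...).
    private final ConcurrentMap<String, BucketState> cache = new ConcurrentHashMap<>();

    // Try to take one token for the given key; returns false when rate-limited.
    public boolean tryConsume(String key) {
        long now = System.nanoTime();
        boolean[] consumed = new boolean[1];
        cache.compute(key, (k, state) -> {
            long tokens = CAPACITY;
            long last = now;
            if (state != null) {
                // Refill proportionally to the time elapsed since the last refill,
                // advancing the refill mark only by the time the granted tokens cover.
                long refilled = (now - state.lastRefillNanos()) * REFILL_PER_SECOND / 1_000_000_000L;
                tokens = Math.min(CAPACITY, state.tokens() + refilled);
                last = state.lastRefillNanos() + refilled * 1_000_000_000L / REFILL_PER_SECOND;
            }
            consumed[0] = tokens > 0;
            return new BucketState(consumed[0] ? tokens - 1 : tokens, last);
        });
        return consumed[0];
    }
}
```

A real deployment would push the same compare-and-swap through the cache's atomic entry operations, which is where the extra network latency @RobWin mentioned comes in.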

RobWin commented Nov 3, 2020

Yes, but this would require a complete overhaul of our metric calculation and storage components.
We are open to contributions for Resilience4j 2.0.

astubbs commented Nov 3, 2020

Haha, yes, understood, but that's not what was being pointed out previously. I'm just on the hunt for such a solution and came across all of this, so I wanted to link it together for other people to find :)

@jonathannaguin

@RobWin do you have a general idea of which components we would need to change to support this? I also came across this thread, and although there seem to be alternatives, they don't seem as good as Resilience4j.
