Rate Limiter System Design

Is a highly efficient, scalable, responsive and simple rate limiter design possible?

There are a couple of rate limiter designs available. In this article I will show you the simplest one. Its advantage is not only that it is the simplest, but also that it is highly efficient, scalable and responsive.

Problem to solve: design a highly efficient rate limiter for API requests.


The first parameter that I'm adding to the class is allowedBucketSize: the number of requests allowed in one time unit. The refillAfterMs parameter is that time unit in milliseconds, i.e. the time frame in which the allowed number of requests can run. Any request over that number within the time unit will be rejected.

The currentBucketSize parameter tracks how many requests have been counted against the limit. This is called the bucket: initially the bucket is empty and requests are allowed. Every request makes the bucket bigger. If the bucket is full, requests will be rejected. The bucket is refilled at least once per refillAfterMs.

To calculate how much the bucket needs to be refilled, the lastRefillTime parameter is required. Its data type is Long, as it stores the timestamp of the last refill.

RateLimiter class code listing:

Rate limiter system design class part 1

Rate limiter system design - refill() method

Rate limiter system design class part 2

Rate limiter system design - allowRequest() method

In the refill() method I check whether lastRefillTime is empty. If it is null, the rate limiter has just started and we need to refill the bucket: set currentBucketSize to 0. Otherwise, I check whether currentBucketSize > 0 and then calculate how many requests I should refill the bucket with. Please pay attention to that.
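For readers who cannot see the listings, here is a minimal sketch of how such a class could look, following the description above. The field and method names come from the article; the exact refill formula (freeing slots in proportion to elapsed time), the injectable clock and the synchronized method are my assumptions for illustration and may differ from the original listing.

```java
import java.util.function.LongSupplier;

class RateLimiter {
    private final int allowedBucketSize;   // requests allowed per time unit
    private final long refillAfterMs;      // length of the time unit in ms
    private final LongSupplier clock;      // assumption: injectable clock, handy for tests
    private int currentBucketSize;         // requests currently counted in the bucket
    private Long lastRefillTime;           // null until the first request arrives

    RateLimiter(int allowedBucketSize, long refillAfterMs, LongSupplier clock) {
        if (allowedBucketSize <= 0 || refillAfterMs <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.allowedBucketSize = allowedBucketSize;
        this.refillAfterMs = refillAfterMs;
        this.clock = clock;
    }

    RateLimiter(int allowedBucketSize, long refillAfterMs) {
        this(allowedBucketSize, refillAfterMs, System::currentTimeMillis);
    }

    // Returns true if the request fits into the bucket, false if it is rejected.
    synchronized boolean allowRequest() {
        refill();
        if (currentBucketSize >= allowedBucketSize) {
            return false;
        }
        currentBucketSize++;
        return true;
    }

    private void refill() {
        long now = clock.getAsLong();
        // First call, or nothing to free: start from an empty bucket.
        if (lastRefillTime == null || currentBucketSize == 0) {
            currentBucketSize = 0;
            lastRefillTime = now;
            return;
        }
        long elapsed = now - lastRefillTime;
        // A full time unit has passed: the whole bucket is freed.
        if (elapsed >= refillAfterMs) {
            currentBucketSize = 0;
            lastRefillTime = now;
            return;
        }
        // Assumption: free slots in proportion to the elapsed part of the time unit.
        long freed = elapsed * allowedBucketSize / refillAfterMs;
        if (freed > 0) {
            currentBucketSize = (int) Math.max(0, currentBucketSize - freed);
            lastRefillTime = now;
        }
    }
}
```

With allowedBucketSize = 2 and refillAfterMs = 1000, two requests in a row are allowed and the third is rejected; half a second later one slot is free again.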

How can it be scalable?

Imagine a situation where we need to limit requests per token. Every token can have its own threshold. In this case we should use a HashMap to control every single token with its own limit.

If we have many API servers, how do we synchronize them all? We should use a shared Redis instance to store the RateLimiter parameters, so that they are synchronized between all the API servers while still giving very good response times.
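The per-token idea could be sketched roughly as follows. PerTokenRateLimiter and SimpleLimiter are hypothetical names introduced only for this example; SimpleLimiter is a simplified stand-in (it resets the whole bucket once a time unit has passed) so that the snippet runs on its own. I also use ConcurrentHashMap rather than a plain HashMap, on the assumption that requests arrive on several threads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for the article's RateLimiter (hypothetical, for illustration only).
class SimpleLimiter {
    private final int allowedBucketSize;
    private final long refillAfterMs;
    private int currentBucketSize;
    private Long lastRefillTime;

    SimpleLimiter(int allowedBucketSize, long refillAfterMs) {
        this.allowedBucketSize = allowedBucketSize;
        this.refillAfterMs = refillAfterMs;
    }

    synchronized boolean allowRequest(long nowMs) {
        // Empty the bucket on first use or once a full time unit has passed.
        if (lastRefillTime == null || nowMs - lastRefillTime >= refillAfterMs) {
            currentBucketSize = 0;
            lastRefillTime = nowMs;
        }
        if (currentBucketSize >= allowedBucketSize) {
            return false;
        }
        currentBucketSize++;
        return true;
    }
}

// One limiter per token; tokens without an explicit threshold get the default one.
class PerTokenRateLimiter {
    private final Map<String, SimpleLimiter> limiters = new ConcurrentHashMap<>();
    private final int defaultAllowed;
    private final long defaultRefillAfterMs;

    PerTokenRateLimiter(int defaultAllowed, long defaultRefillAfterMs) {
        this.defaultAllowed = defaultAllowed;
        this.defaultRefillAfterMs = defaultRefillAfterMs;
    }

    // Give a token its own threshold.
    void register(String token, int allowed, long refillAfterMs) {
        limiters.put(token, new SimpleLimiter(allowed, refillAfterMs));
    }

    boolean allowRequest(String token, long nowMs) {
        return limiters
                .computeIfAbsent(token, t -> new SimpleLimiter(defaultAllowed, defaultRefillAfterMs))
                .allowRequest(nowMs);
    }
}
```

Each token is limited independently: exhausting the bucket of one token does not affect requests made with another.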
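To illustrate the shared-state idea without depending on a particular Redis client, the sketch below hides the storage behind a small interface. SharedCounterStore, InMemoryCounterStore and SharedRateLimiter are hypothetical names, and the in-memory store only simulates what a shared store would do. It also simplifies the algorithm to one counter per token that expires after refillAfterMs. In a real setup the same operation could be built on the Redis INCR and PEXPIRE commands, executed atomically (for example in a Lua script), but that part is an assumption and not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical abstraction over the shared storage (Redis in the article).
interface SharedCounterStore {
    // Increments the counter for key and returns the new value.
    // A missing or expired counter starts again from zero and lives for ttlMs.
    long incrementWithTtl(String key, long ttlMs);
}

// In-memory stand-in so the sketch runs without a Redis server.
class InMemoryCounterStore implements SharedCounterStore {
    private final Map<String, long[]> entries = new HashMap<>(); // {count, expiresAtMs}
    private final LongSupplier clock;

    InMemoryCounterStore(LongSupplier clock) {
        this.clock = clock;
    }

    @Override
    public synchronized long incrementWithTtl(String key, long ttlMs) {
        long now = clock.getAsLong();
        long[] entry = entries.get(key);
        if (entry == null || now >= entry[1]) {
            entry = new long[] {0L, now + ttlMs};
            entries.put(key, entry);
        }
        return ++entry[0];
    }
}

// Each API server holds its own instance, but all instances share one store.
class SharedRateLimiter {
    private final SharedCounterStore store;
    private final int allowedBucketSize;
    private final long refillAfterMs;

    SharedRateLimiter(SharedCounterStore store, int allowedBucketSize, long refillAfterMs) {
        this.store = store;
        this.allowedBucketSize = allowedBucketSize;
        this.refillAfterMs = refillAfterMs;
    }

    boolean allowRequest(String token) {
        long count = store.incrementWithTtl("rate:" + token, refillAfterMs);
        return count <= allowedBucketSize;
    }
}
```

Because the counter lives in the store and not in the API server, two servers handling the same token see the same bucket.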

Where is the source code from the article?

You can find the source code from this article in my GitLab repo; just follow the link below:


What about code tests?

I invite you to watch the code discussion and testing in the video that I've prepared on my YouTube channel. You can find the link to the video at the bottom of this article.

Welcome to my YouTube channel: watch, comment, like and subscribe.