Rate Limiting is something that most projects get as a feature late, but that the earlier it comes the better for everyone. Any non-trivial service you use will have rate limits, whenever is a soft limit ("your XXXXX service can only handle Y operations per second on your current pricing tier") or a hard limit ("any table on the database cannot have more than 12k columns per table"), and this is good, because unbounded resources are points of failure. Without restrictions on an HTTP API, you're not only allowing abusive clients to DOS the platform, you're also risking any internal developer mistake to take it down, any big process (like a batch update or a yearly report).
So basically we can agree that every system should have resource limits. There are many different ways to put them in place, but commonly either the software you build (e.g. Python services) will use some component(s) (or implement their own), or use features of the web servers to limit certain type of actions based on some criteria.
Recently at work we wanted to build an internal REST API that would perform small tasks like Google Cloud Tasks (where you queue a task and when dequeued it calls an HTTPS endpoint and you're the one in charge of executing the task on one of the instances behind that hopefully load-balanced URL). To simplify the scenario, let's say we wanted to perform lots of jobs that individually should execute quickly but if massively batched could hurt a database. The best way to avoid problems is making hard for them to happen, so I wanted to put a limit to the amount of requests that the endpoint can receive, so the resources "breathe" enough to never reach too high values.
A good summary of choices regarding rate limit algorithms can be found in the following article: https://konghq.com/blog/how-to-design-a-scalable-rate-limiting-algorithm/
For example, to us didn't mattered much if we implemented a fixed or sliding window algorithm, we don't need that much precision, but one aspect was important: It had to be distributed, because the hosts are load balanced and sometimes there are few instances, but other times there are around a dozen, so leaving 2 tasks per second with 12 instances consumes more resources than when having 3 and could cause system instability. We prefered to be more accurate with load/usage predictions, so that ruled out alternatives like implementing rate limiting at Nginx.
Checking Python libraries the main requisites were for the chosen one to be distributed and easy to use (a decorator being the best option). After some digging, the winner was django-rate-limit, which offered:
- very easy setup as a django 1.11 middleware
- a fixed window distributed rate limit (using Redis for the distributed storage)
- a simple yet configurable decorator to mark http endpoints at the django views. As an added bonus it automatically returns 429 HTTP responses when the rate is exceeded, so no manual handling of exceptions!
- request-path rate-limit key, which while not perfect (no way to rate limit by ip or other custom mechanism like user_id or cookie), was good enough as a staring point and could be implemented in the future without much effort
The library implements one of the two official Redis recommendations of building a rate limiter pattern with INCR so it was good enough and race conditions small enough to not pose an issue.
We already have it working and I even did some quick django tests and confirmed everything works as expected.
As a fun fact, as the library requires Python 3, this was the main reason that I decided to give a try to migrating ticketea.com to Pyton 3.