Rate Limiting: Why Your API Is Capping Your Users

The Distributed Denial of Intent

Sometimes, a user isn’t trying to attack you; they just wrote a bad loop in their script. Without Rate Limiting, one runaway script can eat all your server resources and lock out everyone else.

The Two Main Contenders

1. Fixed Window Counter

The simplest approach: you allow 100 requests per hour, and on the stroke of the hour the counter resets to zero (a minimal sketch follows this list).

  • The Problem: Users can blast 100 requests at 1:59 and another 100 at 2:00, effectively doubling their limit in one minute.
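
Here is a minimal in-memory sketch of a fixed window counter in Python. The class and method names are illustrative, not from any particular library, and a production system would normally keep these counters in a shared store such as Redis rather than in process memory.

import time
from collections import defaultdict

class FixedWindowLimiter:
    def __init__(self, limit=100, window_seconds=3600):
        self.limit = limit
        self.window_seconds = window_seconds
        # (user_id, window_start) -> number of requests seen in that window
        self.counters = defaultdict(int)

    def allow(self, user_id: str) -> bool:
        # Every request in the same hour maps to the same window_start value.
        window_start = int(time.time()) // self.window_seconds
        key = (user_id, window_start)
        if self.counters[key] >= self.limit:
            return False  # over the limit for this window
        self.counters[key] += 1
        return True

Note how the moment window_start ticks over, the count effectively starts from zero again; that reset is exactly the boundary-burst weakness described above.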

2. Sliding Window Log/Counter

Instead of resetting at the hour, this algorithm looks at the last 60 minutes relative to the current time.

  • The Benefit: It is much smoother, and there is no reset boundary to exploit, so a user cannot double their limit by straddling two windows (a sketch follows this list).
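
Below is a minimal sketch of the sliding window log variant in Python, which stores one timestamp per request and counts only those inside the last hour. The names are illustrative; real deployments often use the cheaper sliding window counter approximation instead of a full log.

import time
from collections import defaultdict, deque

class SlidingWindowLogLimiter:
    def __init__(self, limit=100, window_seconds=3600):
        self.limit = limit
        self.window_seconds = window_seconds
        # user_id -> timestamps of that user's recent requests
        self.logs = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = time.time()
        log = self.logs[user_id]
        # Drop timestamps that have slid out of the last 60 minutes.
        while log and log[0] <= now - self.window_seconds:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True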

Token Bucket: The Gold Standard

Used by Amazon and Google, the Token Bucket algorithm allows for “bursts.” Imagine a bucket that fills with tokens at a steady rate. Every request costs one token. If the bucket currently holds 10 tokens, you can burst 10 requests at once. But once it’s empty, you are limited to the rate at which new tokens are added.
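
As a rough illustration of the idea (not the specific implementation any provider uses), here is a small token bucket in Python; the class name, capacity, and refill rate are all assumptions chosen for the example.

import time

class TokenBucketLimiter:
    def __init__(self, capacity=10, refill_rate=1.0):
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start with a full bucket
        self.last_refill = time.time()

    def allow(self) -> bool:
        now = time.time()
        # Add the tokens earned since the last check, capped at the bucket size.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request costs one token
            return True
        return False

Capacity controls how large a burst can be, while the refill rate sets the sustained request rate once the burst is spent.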

Conclusion

Rate limiting is the first line of defense for any production API. It’s not just about “capping” users; it’s about ensuring Fairness and Availability for everyone.


References & Further Reading

Last updated on