Round Robin Load Balancing: Why It’s Not Enough Anymore


The Simplicity of Round Robin

Round Robin is the “Hello World” of load balancing. It’s fair, simple, and remarkably easy to implement. At its core, the algorithm works like a dealer in a card game: send Request 1 to Server A, Request 2 to Server B, Request 3 to Server C, and then circle back to Server A for Request 4.
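
To make the mechanics concrete, here is a minimal Python sketch of that rotation (the server names are purely illustrative):

```python
from itertools import cycle

# A hypothetical pool of identical backends.
servers = ["server-a", "server-b", "server-c"]
rotation = cycle(servers)  # endless A -> B -> C -> A -> ...

def route() -> str:
    """Return the next backend in the rotation, ignoring load entirely."""
    return next(rotation)

for i in range(1, 5):
    print(f"Request {i} -> {route()}")
# Request 1 -> server-a, 2 -> server-b, 3 -> server-c, 4 -> server-a
```

Note what the function does not look at: CPU, queue depth, request size. That blindness is the whole story of this article.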

In the early days of the web, when we had clusters of identical beige boxes running the same simple scripts, Round Robin was perfect. But in modern distributed systems, simplicity without context can be a dangerous trap.

Why Classic Round Robin Is Failing in 2025

When we move from toy examples to production-grade traffic with millions of users, the flaws of simple, blind Round Robin start to cause cascading failures:

1. The “Heterogeneous Infrastructure” Problem

In 2025, your “servers” aren’t just boxes. You might have some AWS t3.medium instances and some m5.xlarge instances in the same pool during a scaling event. Round Robin treats them as equals. If you send 100 requests to a “potato” server and 100 to a “beast” server, the potato will eventually saturate, latency will spike, and your global user experience will suffer.

2. The Weight of a Request

Not all requests are created equal. A “Ping” or “Health Check” request takes microseconds and almost no CPU. A “Generate PDF” or “Run Complex Analytics” request might hog a core for 5 seconds. Round Robin is context-blind; it can easily pile ten “heavy” requests onto one server while another handles ten “light” ones, leading to massive resource imbalance.

3. Statefulness and Sticky Sessions

Modern applications often require state. While we strive for statelessness, the reality is that many systems rely on session affinity. Simple Round Robin breaks session persistence because it distributes based on index, not on client identity. Without an explicit sticky-session mechanism, a user might find themselves logged out or losing their cart between requests.

Better Alternatives: The Dynamic Era

If you are graduating from simple Round Robin, here are the patterns you should be looking at:

Weighted Round Robin (The First Step)

This is Round Robin with a brain. You assign a capacity weight to each server.

  • Server A (Weight 5): receives 5 requests per cycle.
  • Server B (Weight 1): receives 1 request per cycle.

This acknowledges the reality of heterogeneous hardware and prevents smaller instances from being crushed.
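
A minimal sketch of the idea, assuming hypothetical hostnames and weights roughly proportional to instance capacity:

```python
from itertools import cycle

# Hypothetical pool: weights chosen to reflect relative capacity.
weights = {"m5-xlarge.internal": 5, "t3-medium.internal": 1}

# Naive expansion: a weight-5 server appears five times per scheduling cycle.
schedule = cycle([host for host, w in weights.items() for _ in range(w)])

def route() -> str:
    """Return the next backend according to its weight."""
    return next(schedule)

print([route() for _ in range(6)])
# Five requests land on the xlarge for every one on the t3.medium.
```

Production balancers such as NGINX use a “smooth” weighted variant that interleaves the picks rather than sending them in a burst, but the per-cycle proportions come out the same.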

Least Connections (The Pragmatic Choice)

This is often the default for modern load balancers (like NGINX or AWS ALB). The load balancer tracks how many active connections each server is currently handling and sends the next request to the one with the fewest active sessions. This naturally handles the “heavy request” problem: a server busy with a long task keeps its connection open longer, so the balancer skips it.
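
A minimal sketch of the selection logic, assuming the balancer maintains a per-backend counter of in-flight connections (the backend names and counters here are hypothetical):

```python
# In-flight connection counts, updated as requests start and finish.
active = {"app-1": 0, "app-2": 0, "app-3": 0}

def pick_backend() -> str:
    """Choose the backend with the fewest in-flight connections."""
    return min(active, key=active.get)

def start_request() -> str:
    backend = pick_backend()
    active[backend] += 1   # connection opened
    return backend

def finish_request(backend: str) -> None:
    active[backend] -= 1   # connection closed

b = start_request()   # goes to whichever backend is least busy right now
# ... proxy the request ...
finish_request(b)
```

The trade-off is visible in the code: the balancer must track state for every connection, which is why this column in the comparison table below carries the “requires state tracking” caveat.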

IP Hashing / Consistent Hashing

Instead of a circular list, you use the client’s IP or a Session ID to calculate a hash. This ensures that the same client always hits the same backend server. This is the gold standard for stateful apps and caching layers, as it increases your cache hit rate by keeping user sessions local. The catch is that a naive hash-modulo mapping reshuffles most clients whenever the pool grows or shrinks; consistent hashing limits that churn by remapping only the keys adjacent to the added or removed node.
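
Here is a minimal sketch of the simple (non-consistent) variant, assuming a hypothetical cache pool and using a stable hash rather than Python’s salted built-in hash():

```python
import hashlib

backends = ["cache-1", "cache-2", "cache-3"]  # hypothetical pool

def route_by_ip(client_ip: str) -> str:
    """Map a client IP to a backend deterministically.

    SHA-256 keeps the mapping stable across balancer restarts,
    unlike Python's process-salted hash().
    """
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return backends[int(digest, 16) % len(backends)]

print(route_by_ip("203.0.113.7"))  # always the same backend for this client
```

Because the modulo depends on len(backends), adding or removing a server changes the result for most clients; a consistent-hashing ring is the usual fix when the pool scales frequently.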

Dynamic Health-Based Balancing

Advanced balancers now incorporate “passive” and “active” health checks. If a server starts returning 5xx errors or its latency crosses a threshold, the balancer doesn’t keep sending it its usual share of the traffic. It temporarily ejects it from the pool or reduces its weight dynamically.
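
A minimal sketch of the passive side, assuming hypothetical backend names, a three-strike threshold, and a 30-second cool-down (all of which real balancers let you tune):

```python
import time

# After FAILURE_THRESHOLD consecutive 5xx responses, eject the backend
# from the pool for EJECT_SECONDS.
FAILURE_THRESHOLD = 3
EJECT_SECONDS = 30

failures = {"app-1": 0, "app-2": 0, "app-3": 0}
ejected_until = {"app-1": 0.0, "app-2": 0.0, "app-3": 0.0}

def record_response(backend: str, status: int) -> None:
    """Update the failure streak; eject the backend once it crosses the threshold."""
    if status >= 500:
        failures[backend] += 1
        if failures[backend] >= FAILURE_THRESHOLD:
            ejected_until[backend] = time.time() + EJECT_SECONDS
    else:
        failures[backend] = 0  # a healthy response resets the streak

def healthy_backends() -> list[str]:
    """Backends currently eligible to receive traffic."""
    now = time.time()
    return [b for b in failures if ejected_until[b] <= now]

record_response("app-2", 502)
record_response("app-2", 503)
record_response("app-2", 500)   # third strike: app-2 is ejected for 30 seconds
print(healthy_backends())        # ['app-1', 'app-3']
```

Active checks work the other way around: the balancer probes each backend on a schedule instead of waiting for real traffic to fail.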

Comparison: Which Strategy Should You Use?

Strategy      | Complexity | Best For                     | Caveats
Round Robin   | Low        | Static, identical clusters   | Uneven loads kill it
Weighted RR   | Medium     | Managed hardware mixes       | Hard to maintain manually
Least Conn    | Medium     | General long-lived requests  | Requires state tracking
Hash-Based    | High       | Caching, sticky sessions     | Horizontal scaling triggers re-hashes

Conclusion: When to Keep It Simple

Is Round Robin dead? No. It still has a place in static, homogeneous environments where every unit of compute is identical (like ephemeral Docker containers in a local dev environment or simple k8s pods with no state).

But for anything that faces the public internet or handles multi-tenant enterprise data, simple Round Robin is no longer enough. It’s time to embrace the “Weight” and the “Context” of your traffic.


References & Further Reading

Last updated on