The Green Dashboard That Lied
A client called us on a Friday afternoon because their e-commerce site had been dropping orders all week. Their NGINX load balancer showed all three backend servers as healthy. Traffic was flowing. No errors in the logs at first glance. But one server had a memory leak that made it respond in 8 seconds instead of 200 milliseconds, and round robin kept feeding it a third of all traffic. Three days of degraded service because nobody questioned the default NGINX load balancing method.
That engagement changed how we deploy every load balancer going forward. The method you choose determines who suffers when something goes wrong.
Round Robin: Simple Until It Isn’t
NGINX uses round robin by default. You don’t even have to declare it. Requests go to each backend server in sequential order, one after another. Server A, server B, server C, back to A.
upstream backend {
    server 10.0.0.101;
    server 10.0.0.102;
    server 10.0.0.103;
}
This works perfectly when all your backends are identical in hardware, load, and response time. The moment one server slows down, round robin doesn’t care. It keeps sending the same share of traffic to the struggling node. That is exactly what happened to our client.
You can add weights to compensate for unequal hardware. A server with weight=3 receives three times the requests of a server with weight=1. But weights are static. They don’t adapt to real-time conditions.
upstream backend {
    server 10.0.0.101 weight=3;
    server 10.0.0.102 weight=2;
    server 10.0.0.103 weight=1;
}
My honest take: round robin is fine for stateless APIs behind identical containers. For anything else, you are gambling that all backends stay equally healthy forever. They won’t.
Least Connections: Let the Numbers Decide
The least_conn directive sends each new request to whichever backend currently has the fewest active connections. If server A has 12 connections and server B has 4, the next request goes to B.
upstream backend {
    least_conn;
    server 10.0.0.101;
    server 10.0.0.102;
    server 10.0.0.103;
}
This is the method we switched to for the e-commerce client. Within minutes the slow server naturally received fewer requests because its connections stayed open longer. Traffic self-corrected without anyone touching a config file at 2 AM.
Least connections shines when request processing times vary. Long database queries, file uploads, report generation—any workload where some requests take 50 milliseconds and others take 5 seconds. The algorithm adapts in real time.
One caveat: least connections still doesn’t account for server capacity. A server with 2 CPUs holding 10 connections is more stressed than a 16-core machine holding 10 connections. Combine least_conn with weights if your backends aren’t identical.
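The two mechanisms compose cleanly when capacity differs. A sketch, with illustrative weights for the 16-core and 2-CPU machines described above:

```nginx
upstream backend {
    least_conn;
    # Weights bias selection toward the larger machine when
    # active connection counts are tied or close.
    server 10.0.0.101 weight=8;   # 16-core box
    server 10.0.0.102 weight=1;   # 2-CPU box
}
```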
IP Hash: When Sessions Must Stay Put
The ip_hash directive calculates a hash from the first three octets of the client’s IPv4 address (or the entire IPv6 address) and pins that client to a specific backend. Same IP, same server, every time.
upstream backend {
    ip_hash;
    server 10.0.0.101 weight=3;
    server 10.0.0.102 weight=2;
    server 10.0.0.103;
}
We implemented this for a healthcare client running a legacy application that stored session data locally on each web server. The app had no shared session store and no plans to add one before their compliance audit. IP hash gave us session persistence without rewriting their application.
But here is the problem nobody mentions in tutorials. Many ISPs assign dynamic IP addresses that rotate every 24 hours. Corporate users often share a single public IP through NAT, so hundreds of employees land on the same backend. Mobile users switch between Wi-Fi and cellular and get a new IP mid-session. IP hash is not a substitute for proper session management.
For custom distribution keys, NGINX supports the generic hash directive, where you define your own key: a cookie, a header, or any variable. One restriction worth knowing: the backup parameter cannot be combined with the hash or ip_hash methods. Generic hash is useful for TCP load balancing as well, for example distributing MySQL connections across replicas:
stream {
    upstream mysql_cluster {
        hash $remote_addr;
        server 10.0.0.201 weight=2;
        server 10.0.0.202;
        server 10.0.0.203;
    }
    server {
        listen 3306;
        proxy_pass mysql_cluster;
    }
}
Picking the Right Method for Production
Match the Algorithm to the Workload
Round robin for identical, stateless backends behind uniform containers. Least connections for variable-duration requests or mixed backend capacity. IP hash only when you have no other option for session persistence and you understand the limitations.
Don’t Forget Health Checks
None of these methods help if NGINX keeps sending traffic to a dead server. Configure max_fails and fail_timeout on every backend. NGINX marks a server as unavailable after the specified number of failed attempts within the fail_timeout window and retries it after the timeout expires; the defaults are max_fails=1 and fail_timeout=10s. We set max_fails=3 and fail_timeout=30s as a baseline across client environments and adjust from there.
upstream backend {
    least_conn;
    server 10.0.0.101 max_fails=3 fail_timeout=30s;
    server 10.0.0.102 max_fails=3 fail_timeout=30s;
    server 10.0.0.103 backup;
}
Notice the backup directive on the third server. It receives traffic only when both primary servers are down. This is your safety net, and it should sit on different hardware, or at least in a different failure domain.
Test the Failure, Not Just the Config
After every deployment, we pull one backend out of rotation and verify that traffic redistributes correctly. Then we bring it back and confirm it rejoins the pool. If you manage backup monitoring for your infrastructure, apply the same discipline here. A load balancer you haven’t tested under failure is just a single point of hope.
The algorithm selection also interacts with your concurrency strategy upstream. If your application layer throttles connections per server, least connections will naturally respect those limits better than round robin.
Verify Before You Walk Away
After configuring your upstream block, test the config and watch the access logs on each backend:
nginx -t
nginx -s reload
# On each backend, watch requests arrive:
tail -f /var/log/nginx/access.log
Send test traffic with curl in a loop and confirm distribution matches your expectations. Check that failover works by stopping one backend and verifying requests shift to the survivors. The Ubuntu server documentation covers NGINX installation and service management if you need the basics.
The right NGINX load balancing method is the one that fails gracefully when a backend misbehaves—not the one that looks cleanest in the config file. If you need help designing a load balancing strategy that actually survives production failures, reach out to our team.

