Load Balancing for Beginners: Understanding Sticky Sessions Simply
Breaking down load balancers, sticky sessions, and the algorithms that keep the internet running
Hello guys, imagine this: your web app works perfectly in staging, but the moment it scales to multiple servers in production, users start complaining — “I keep getting logged out!” or “My session disappears randomly!”
You check the authentication logic. It’s fine. You inspect your cookies. Also fine. Then it hits you — the problem isn’t in your code. It’s in how your load balancer is routing traffic.
Modern web applications are rarely served from a single machine. We use multiple servers to handle scale, speed, and reliability. But when a user’s requests jump between different servers without preserving session data, they lose authentication state — and boom, they’re logged out.
That’s where sticky sessions (or session affinity) come in. It’s a simple yet crucial concept that ensures users consistently hit the same backend server — preventing session chaos and login headaches.
In this post, we’ll break down why this happens, how load balancers manage user sessions, and the pros and cons of enabling sticky sessions. Whether you’re debugging an existing issue or designing a scalable authentication flow, this guide will help you truly understand what’s going on behind the scenes.
For this article, I have teamed up with Sahil Sarwar, a passionate Software Engineer and we’ll dive into details about load balancers and sticky sessions.
By the way, if you are preparing for System design interviews and want to learn System Design in a limited time, then you can also check out sites like Codemia.io, ByteByteGo, and Design Guru, which have many great System design courses and questions for FAANG interview prep.
With that, over to Sahil to take you through the rest of the article.

You have probably heard about load balancers if you have explored some system design topics. We will see WHAT they are, their types, and HOW they work.
In this blog, we will focus on the following:
Load balancers, WHAT they are, their types, and HOW they work
Sticky Sessions and their use
Load Balancers
As the name suggests, it balances the load. What kind of load, you ask? The requests that come to the servers. Let me explain it better.
How will you distribute the load when you do horizontal scaling for your servers? What decides which requests should go to which server?
Well, that’s what a load balancer does.
Types of Load Balancers
Layer 4 Load Balancer (Transport Layer)
Layer 4 (L4) refers to the Transport Layer of the OSI model, which deals with TCP and UDP protocols.
An L4 load balancer doesn’t look into the HTTP headers or payload. It just sees:
Source IP/Port
Destination IP/Port
Transport protocol (TCP/UDP)
It uses this metadata to decide which backend server should handle the request.
How it works
When a client sends a packet:
The L4 load balancer intercepts it by sitting in the middle as a reverse proxy (a reverse proxy sits between clients and backend servers, forwarding requests on their behalf).
It uses the IP + port and protocol to apply a routing algorithm (round robin, least connections; we will look at these later).
It forwards the packet to the chosen backend server, possibly rewriting the destination IP.
The backend responds to the load balancer; it can:
Relay the response back (proxy mode).
Or let the client talk directly to the backend once the connection is established (similar to direct server return, where responses bypass the LB).
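To make this concrete, here is a minimal sketch of an L4-style proxy in Python (the backend pool is hypothetical): it picks a backend by round robin and relays raw bytes in both directions, never parsing the application payload.

```python
import itertools
import socket
import threading

# Hypothetical backend pool; any reachable host:port pairs would do.
BACKENDS = [("10.0.0.1", 8080), ("10.0.0.2", 8080)]
next_backend = itertools.cycle(BACKENDS)

def pipe(src, dst):
    # Copy raw bytes one way until the connection closes; the payload
    # is never parsed, which is exactly why L4 balancing is cheap.
    try:
        while chunk := src.recv(4096):
            dst.sendall(chunk)
    except OSError:
        pass
    finally:
        dst.close()

def handle(client):
    backend = socket.create_connection(next(next_backend))
    # Relay in both directions (proxy mode).
    threading.Thread(target=pipe, args=(client, backend), daemon=True).start()
    threading.Thread(target=pipe, args=(backend, client), daemon=True).start()

listener = socket.socket()
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("0.0.0.0", 9000))
listener.listen()
while True:
    conn, _ = listener.accept()
    handle(conn)
```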
Why it’s fast and efficient
No parsing of HTTP headers or payloads.
No TLS termination (unless configured).
Minimal CPU/memory overhead.
Works well for raw TCP (e.g., SSH, FTP) and UDP (e.g., DNS, gaming) traffic.
While looking into these, I stumbled across TLS termination. Let’s understand it in more detail.
TLS Termination
TLS termination is when the load balancer handles the TLS (or SSL) decryption of incoming HTTPS traffic before forwarding it to backend servers.
TLS termination breaks end-to-end encryption unless you re-encrypt before forwarding to the backend (called TLS re-encryption or TLS bridging).
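To see where the decryption happens, here is a stripped-down sketch using Python’s ssl module; the certificate files and backend address are placeholders:

```python
import socket
import ssl

# The LB terminates TLS with its own certificate; file names are made up.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain(certfile="lb.crt", keyfile="lb.key")

listener = socket.socket()
listener.bind(("0.0.0.0", 8443))
listener.listen()

with ctx.wrap_socket(listener, server_side=True) as tls_listener:
    conn, _ = tls_listener.accept()   # TLS handshake happens here
    request = conn.recv(4096)         # bytes are already decrypted
    # Forwarded as plaintext: end-to-end encryption is broken unless we
    # open a second TLS connection to the backend (TLS re-encryption).
    backend = socket.create_connection(("10.0.0.1", 8080))
    backend.sendall(request)
```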
Layer 7 Load Balancer (Application Layer)
L7 load balancers operate at the application layer of the OSI model, which means they understand and manipulate the content of the actual HTTP (or gRPC, WebSocket, etc.) requests, not just IPs and ports like L4.
How it works
1. Parses HTTP Headers, Cookies, URLs, etc.
Since L7 sees full HTTP requests, it can inspect:
Host header - useful for virtual hosting
Path, Method, Query Params
Cookies (for sticky sessions)
JWT tokens, Auth headers - useful for auth-aware routing
This enables very fine-grained control of request routing.
```http
GET /api/v1/orders HTTP/1.1
Host: shop.example.com
Authorization: Bearer <token>
```

An L7 LB can route:

/api/v1/orders → the orders microservice
/auth/v1/carts → the carts microservice
2. Can Route Traffic Based on Content
Because it sees the full request, it can:
Do path-based routing: /api/, /admin/, /static/
Do host-based routing: user1.myapp.com → tenant1, etc.
Rate-limit or redirect specific clients
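A simplified sketch of those routing decisions; the route table, tenant naming convention, and pool names are all invented for illustration:

```python
# Longest-prefix, path-based route table (illustrative service names).
ROUTES = {"/api/": "api-service", "/admin/": "admin-service", "/static/": "static-service"}

def pick_backend(host: str, path: str) -> str:
    # Host-based routing: user1.myapp.com -> tenant1's pool, and so on.
    if host.endswith(".myapp.com"):
        tenant = host.split(".")[0]
        return f"{tenant}-pool"
    # Path-based routing: the longest matching prefix wins.
    for prefix, service in sorted(ROUTES.items(), key=lambda kv: len(kv[0]), reverse=True):
        if path.startswith(prefix):
            return service
    return "default-pool"

print(pick_backend("user1.myapp.com", "/"))         # user1-pool
print(pick_backend("shop.example.com", "/api/v1"))  # api-service
```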
3. Slightly Slower Due to Deep Packet Inspection
Incoming packets must be reassembled into the full stream before the complete HTTP request can be parsed.
TLS termination (if done) adds CPU cost.
More memory usage for tracking sessions, cookies, and header parsing.
Still, modern L7 proxies (like NGINX) are optimized and blazing fast, often handling thousands of RPS.
But compared to L4 load balancers (which forward packets without even reading HTTP), they are a bit heavier.
Load Balancing Algorithms
Round Robin
Each request goes to the next server in the list.
Simple, doesn’t consider server load.
Good for equally capable backend nodes.
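In code, round robin is just cycling through the pool; a tiny sketch with placeholder backend names:

```python
import itertools

backends = ["app-1", "app-2", "app-3"]
rr = itertools.cycle(backends)  # endlessly walks the list in order

for _ in range(4):
    print(next(rr))  # app-1, app-2, app-3, app-1
```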
Least Connections
Route new traffic to the server with the fewest active connections.
More dynamic; useful when backend servers handle variable traffic sizes.
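A sketch of the idea, assuming we track an active-connection count per backend (the numbers are made up):

```python
# Active-connection counts per backend (illustrative numbers).
active = {"app-1": 12, "app-2": 3, "app-3": 7}

def least_connections() -> str:
    return min(active, key=active.get)

target = least_connections()  # "app-2"
active[target] += 1           # decrement again when the connection closes
```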
IP Hashing
The LB applies a hash function, like hash(client_ip) % N, where N is the number of backend servers. The result maps to a backend index, and the request is forwarded there.
This means every future request from the same client IP goes to the same backend, as long as the backend pool doesn’t change.
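A sketch of that mapping; MD5 here simply stands in for whatever deterministic hash a real LB uses:

```python
import hashlib

backends = ["app-1", "app-2", "app-3"]

def pick_by_ip(client_ip: str) -> str:
    # Python's built-in hash() is randomized per process, so a
    # deterministic hash is needed for stable routing.
    digest = hashlib.md5(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]

print(pick_by_ip("203.0.113.7"))  # same IP -> same backend, until the pool resizes
```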
Sticky Sessions
When we deploy an app behind a load balancer, it distributes user requests across multiple backend servers. But in many apps, user sessions, like login status or cart items, are stored in memory.
If one request goes to server A and the next to server B, the session is lost.
Sticky sessions solve this by ensuring requests from the same user always go to the same server. It’s a quick fix, but not always the best long-term solution.
This is usually done by identifying users via their IP address, a cookie, or a custom header, so all future requests are directed to the correct backend.
How Sticky Sessions Work
Via Cookies (L7 Load Balancers)
The load balancer inserts a special cookie (X-Backend-Id, AWSALB, etc.) in the response. On the next request, the client sends that cookie back.
The LB reads the cookie and routes the request to the same backend instance that originally set it.
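Here is a rough sketch of that decision, assuming the proxy has already parsed the request’s cookies; the X-Backend-Id name comes from above, everything else is illustrative:

```python
import random

BACKENDS = ["app-1", "app-2", "app-3"]

def route(cookies: dict) -> tuple[str, dict]:
    """Return (chosen backend, cookies to set on the response)."""
    backend = cookies.get("X-Backend-Id")
    if backend in BACKENDS:
        return backend, {}                     # returning client: stay put
    backend = random.choice(BACKENDS)          # first visit: pick any server
    return backend, {"X-Backend-Id": backend}  # pin the client to it

print(route({}))                         # e.g. ("app-2", {"X-Backend-Id": "app-2"})
print(route({"X-Backend-Id": "app-2"}))  # ("app-2", {})
```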
Via IP Hashing (L4 or L7)
The LB hashes the client’s IP (hash(ip) % backend_count) to pick a server. This ensures the same client IP always goes to the same server.
Downside: fails if IP changes or when servers scale up/down.
Via Session Tokens (Application Level)
The app sends a session token in headers or cookies.
The LB reads that token (in L7 mode) and forwards traffic to the correct server.
Sticky sessions are handy when you need quick fixes or are dealing with legacy systems.
Sticky sessions don’t handle failover well; if the assigned backend crashes, the session is lost unless it is backed up in some external store.
Modern systems often avoid sticky sessions by storing session data outside the app:
Use centralized stores like Redis or Memcached for session data.
Use client-side JWT tokens to keep auth data in cookies.
Make app servers stateless, so any backend can handle any request.
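For example, a minimal sketch using the redis Python client (the key naming and one-hour TTL are arbitrary choices for illustration):

```python
import json
import uuid
import redis

r = redis.Redis(host="localhost", port=6379)

def create_session(user_id: str) -> str:
    session_id = str(uuid.uuid4())
    # Stored centrally, so ANY backend can serve the next request.
    r.setex(f"session:{session_id}", 3600, json.dumps({"user_id": user_id}))
    return session_id

def load_session(session_id: str):
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```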
In Short
Load Balancers distribute incoming traffic across multiple servers to ensure reliability and scalability.
Types of Load Balancers:
Layer 4 (Transport Layer): Routes traffic based on IP, port, and protocol (TCP/UDP); fast and efficient, used for raw protocols.
Layer 7 (Application Layer): Routes based on HTTP headers, cookies, paths, etc.; allows fine-grained control like host/path-based routing.
TLS Termination is when the load balancer decrypts HTTPS traffic before sending it to the backend. Saves backend work but breaks end-to-end encryption unless re-encrypted.
Load Balancing Algorithms:
Round Robin: Equal distribution, ignores current load.
Least Connections: Chooses the server with the fewest active connections.
IP Hashing: Uses the client IP to always send them to the same backend.
Sticky Sessions ensure all requests from the same client go to the same backend server.
Useful when session state is stored in memory.
Sticky Session Techniques:
Cookies: Load balancer inserts a cookie to identify the backend.
IP Hashing: Same client IP always maps to the same server.
Session Tokens: App issues a token, and LB routes accordingly.
Modern Best Practice:
Avoid sticky sessions by using stateless app servers and storing session data in Redis, Memcached, or JWT tokens.
Conclusion
That’s all for this week’s blog. In the next part, we will try building a load balancer from scratch. We will implement the algorithms we came across here and also look at how to configure L4 and L7 load balancers on our machines.
And, if you like this article, don’t forget to subscribe to Sahil’s newsletter, “Brain, Bytes and Binary”, where he shares his thoughts on system design and programming.