Networking
medium
mid

Why use WebSockets instead of HTTP polling for real-time features?

HTTP is request/response: the server cannot speak until the client asks. Polling every N seconds wastes bandwidth, CPU, and battery, and adds up to N seconds of latency. A WebSocket is a persistent, full-duplex connection over TCP, so the server can push the moment something changes: latency drops to roughly one network trip, there is no per-tick request overhead, and a single TCP+TLS handshake is amortized across the session. Use WS for chat, presence, collab, live prices, notifications. Use SSE for one-way server-to-client streams, and HTTP long polling only as a fallback.


The polling problem

Polling is "ask the server every N seconds: anything new?"

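```js
// Parse the JSON body first; fetch resolves to a Response, not the data
setInterval(() => fetch('/api/messages').then(r => r.json()).then(applyDelta).catch(console.error), 3000);
```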

Costs per poll, even when nothing changed:

  • Full HTTP request: headers (~500B–2KB with auth/cookies), TCP/TLS overhead on cold connections, server-side auth + DB query.
  • Battery drain on mobile (radio wake-up per poll).
  • Worst-case latency = poll interval (3s means up to 3s lag).
  • At scale: 10k users × 1 req/3s = 3,333 req/s sustained for zero new data.

Aggressive polling (1s or faster) multiplies all of that. Engineers reach for it because it's easy; ops pays the bill in CPU, bandwidth, and DB load.
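
The arithmetic above can be reproduced with a small calculator. This is a rough sketch, not a capacity model: `pollingLoad` is a hypothetical helper, and the 1,000-byte figure per poll is an assumed average picked from the header range quoted above.

```javascript
// Hypothetical helper: estimates the steady-state cost of fixed-interval polling.
// bytesPerPoll is an assumed average request + response size, not a measured value.
function pollingLoad(users, intervalSec, bytesPerPoll) {
  const requestsPerSec = users / intervalSec;
  return {
    requestsPerSec,
    bytesPerSec: requestsPerSec * bytesPerPoll,
    worstCaseLatencySec: intervalSec, // a change just after a poll waits a full interval
  };
}

const load = pollingLoad(10_000, 3, 1_000);
console.log(Math.round(load.requestsPerSec));               // 3333
console.log((load.bytesPerSec / 1e6).toFixed(1) + ' MB/s'); // 3.3 MB/s
```

Dropping the interval to 1s triples both numbers, which is why the interval is the first thing to question when a polling feature gets expensive.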

WebSockets

A WS connection is established once via an HTTP Upgrade handshake, then becomes a persistent TCP socket carrying framed messages in both directions:

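```js
const ws = new WebSocket('wss://example.com/feed');
// send() throws while the socket is still CONNECTING, so subscribe once it is open
ws.onopen = () => ws.send(JSON.stringify({ type: 'subscribe', room: 'channel-42' }));
ws.onmessage = e => applyMessage(JSON.parse(e.data));
```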

Properties:

  • Server push: the server sends a frame the instant something changes. Latency ≈ one network trip (about half an RTT), not the poll interval.
  • Tiny frames: ~2–14 bytes of framing overhead per message vs ~500B–2KB of HTTP headers.
  • Full-duplex: client and server can send independently.
  • One handshake: TCP + TLS + Upgrade happen once; after that, messages carry no per-request connection or header cost.
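
The 2–14 byte range follows from the frame header layout in RFC 6455. A minimal sketch of that calculation (the function name `frameOverhead` is hypothetical; the byte counts come from the specification):

```javascript
// Header size of a single WebSocket frame per RFC 6455, section 5.2.
// frameOverhead is a hypothetical helper name used only for illustration.
function frameOverhead(payloadBytes, masked) {
  let bytes = 2;                            // FIN/opcode byte + mask bit and 7-bit length
  if (payloadBytes > 65535) bytes += 8;     // 64-bit extended payload length
  else if (payloadBytes > 125) bytes += 2;  // 16-bit extended payload length
  if (masked) bytes += 4;                   // masking key, client-to-server frames only
  return bytes;
}

console.log(frameOverhead(20, false));    // 2  (small server-to-client message)
console.log(frameOverhead(20, true));     // 6  (same message sent by a browser)
console.log(frameOverhead(70_000, true)); // 14 (largest possible header)
```

Server-to-client frames are never masked, so pushed updates sit at the low end of the range.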

When to pick which

Need | Best fit | Why
Chat / DMs | WebSocket | Bidirectional, push-driven
Multiplayer / collab cursors | WebSocket | Low-latency bidirectional, frequent small messages
Live price ticker / sports scores | SSE | Server → client only; no need for full-duplex
Server-initiated notifications | SSE or WS | SSE simpler; WS if you also need client→server
Slowly-changing data (every 1–5 min) | HTTP polling | Don't pay for a persistent connection
One-off API call | HTTP | No reason to keep a socket
AI streaming (token-by-token) | SSE or HTTP stream | One-way server → client, fits HTTP semantics well

Server-Sent Events (SSE)

The often-forgotten middle ground. Standard HTTP, one-way server → client, auto-reconnect built in:

```js
const es = new EventSource('/api/stream');
es.onmessage = e => applyEvent(JSON.parse(e.data));
```

Cheaper to operate than WS (it's just HTTP) and works through most corporate proxies. Pick SSE when you don't need client→server messages.
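
On the server side, "just HTTP" means writing plain text in the `text/event-stream` format to a response that stays open. A minimal sketch of the serialization, assuming a hypothetical helper named `formatSseEvent` (the field names `id`, `event`, and `data` are the ones defined by the format itself):

```javascript
// Serializes one event in the text/event-stream format that EventSource parses.
// formatSseEvent is a hypothetical server-side helper, not part of any library.
function formatSseEvent({ id, event, data }) {
  const lines = [];
  if (id !== undefined) lines.push(`id: ${id}`);          // echoed back as Last-Event-ID on reconnect
  if (event !== undefined) lines.push(`event: ${event}`); // named event type
  // Each line of a multi-line payload needs its own "data:" field.
  for (const line of String(data).split('\n')) lines.push(`data: ${line}`);
  return lines.join('\n') + '\n\n';                       // a blank line terminates the event
}

// Show the raw bytes (with escapes) that would be written to the response.
console.log(JSON.stringify(formatSseEvent({ id: 7, data: JSON.stringify({ price: 101.5 }) })));
```

Events written without an `event:` field are the ones delivered to `onmessage`; named events need `es.addEventListener(name, handler)` on the client.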

Long polling — the fallback

Client opens a request; server holds it open until it has data or a timeout (~30s), then responds; client immediately reopens. Mimics push over HTTP. Useful where WS is blocked (some enterprise proxies), but stateful, hard to scale, and largely obsolete.
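
The client half of that loop can be sketched as follows. Everything specific here is an assumption for illustration: the `?since=` query parameter, the `204` status for "timed out with no data", and the `{ events, cursor }` response body. `fetchFn` is injected so the loop can run against a stub; in a browser it would be the global `fetch`.

```javascript
// Long-poll loop sketch. The URL shape, the 204-on-timeout convention, and the
// { events, cursor } body are hypothetical; adapt them to the real API.
async function longPoll(fetchFn, url, onEvents, shouldStop) {
  let cursor = 0;
  while (!shouldStop()) {
    try {
      // The server is assumed to hold this request open until data arrives or it times out.
      const res = await fetchFn(`${url}?since=${cursor}`);
      if (res.status === 204) continue;             // timed out with no data: reopen immediately
      const body = await res.json();
      cursor = body.cursor;                         // remember the position for the next request
      onEvents(body.events);
    } catch (err) {
      await new Promise(r => setTimeout(r, 1000));  // brief pause so failures do not spin
    }
  }
  return cursor;
}
```

A browser call might look like `longPoll(fetch, '/api/poll', applyEvents, () => false)`, where `/api/poll` and `applyEvents` are again placeholders.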

Operating WebSockets in production

The hard parts aren't the API — they're operations:

  • Reconnection: connections die (mobile network changes, proxy timeouts). Implement exp-backoff reconnect + a resume token / last-event-id for missed messages.
  • Backpressure: a slow client can fill the server's send buffer. Drop or throttle.
  • Horizontal scaling: WS is stateful (the user is "on" server N). You need a pub/sub bus (Redis, NATS) so a message published on server A reaches a subscriber on server B.
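  • Idle timeouts: load balancers (AWS ALB default 60s, Cloudflare ~100s) close idle sockets. Send a ping every ~30s: protocol-level ping frames from the server, or an application-level heartbeat message from the browser, since the browser WebSocket API cannot send ping frames itself.
  • Auth: the initial handshake is HTTP, so cookies work. The browser WebSocket constructor cannot set custom headers, so a JWT has to travel in a cookie, the URL query string, or the first message after connect. Reauth on token expiry is application-level.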
  • Cost: persistent connections cost memory. ~10k connections per modern node is comfortable; tune ulimit + kernel.
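
The reconnection bullet can be sketched as exponential backoff with jitter plus a resume request. This is one common shape, not the only one: the function names, the default delays, and the `{ type: 'resume', after }` message are all hypothetical, and the server is assumed to stamp each message with a `seq` number.

```javascript
// Exponential backoff with full jitter: wait a random time in [0, min(cap, base * 2^attempt)).
// Names and default values are illustrative assumptions.
function backoffDelayMs(attempt, baseMs = 500, capMs = 30_000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Browser-side usage sketch: reconnect after a close and ask the server to replay
// everything after the last sequence number seen. The "resume" message is hypothetical.
function connect(url, state = { attempt: 0, lastSeq: 0 }) {
  const ws = new WebSocket(url);
  ws.onopen = () => {
    state.attempt = 0; // a successful connection resets the backoff
    ws.send(JSON.stringify({ type: 'resume', after: state.lastSeq }));
  };
  ws.onmessage = e => {
    state.lastSeq = JSON.parse(e.data).seq ?? state.lastSeq;
  };
  ws.onclose = () => setTimeout(() => connect(url, state), backoffDelayMs(state.attempt++));
  return ws;
}
```

The jitter matters after a deploy or an LB restart: without it, every disconnected client retries on the same schedule and the reconnect wave arrives all at once.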

When polling IS the right answer

  • Data changes slowly and infrequently (status pages refreshed every minute).
  • The client only cares "the next time I look," not "the moment it changes."
  • The infrastructure can't host WS (cheap static hosting + plain HTTP API).
  • Simplicity matters more than latency.

The case against is aggressive polling — 1s/2s intervals against APIs serving thousands of users. That's the antipattern. Once-a-minute polling for a low-stakes signal is fine.

Follow-up questions

  • When would you pick SSE over WebSockets?
  • How do you scale WebSocket fanout across multiple servers?
  • How do you handle WS reconnection and missed-message catch-up?
  • What's the difference between long polling and SSE?

Common mistakes

  • Polling every 1s without considering server cost — works locally, crushes prod at 10k users.
  • Building WS without a reconnect strategy — first network blip and the feature dies.
  • Keeping each user's state on a single WS server and forgetting horizontal scaling: the load balancer round-robins connections across servers, and users lose state on deploys.
  • No idle pings — LB closes the socket every 60s, users see flapping.
  • Reaching for WS for one-way data when SSE is simpler.
  • Holding application state in-memory on a WS server — lost on restart.
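
The horizontal-scaling mistake is usually avoided by publishing every message to a shared bus and letting each server deliver only to its own sockets. The sketch below uses an in-memory `Bus` as a stand-in for Redis or NATS so it runs without a broker; every class and method name is hypothetical, and a real client library will have a different API.

```javascript
// In-memory stand-in for a pub/sub bus such as Redis or NATS (an assumption made so
// the sketch runs without a broker).
class Bus {
  constructor() { this.handlers = new Map(); }
  subscribe(channel, handler) {
    if (!this.handlers.has(channel)) this.handlers.set(channel, new Set());
    this.handlers.get(channel).add(handler);
  }
  publish(channel, message) {
    for (const handler of this.handlers.get(channel) ?? []) handler(message);
  }
}

// One WebSocket server process. It tracks only its own sockets; the bus carries
// messages between processes.
class WsNode {
  constructor(bus) { this.bus = bus; this.rooms = new Map(); }
  join(room, socket) {
    if (!this.rooms.has(room)) {
      this.rooms.set(room, new Set());
      // First local member: start listening for this room on the shared bus.
      this.bus.subscribe(room, msg => {
        for (const s of this.rooms.get(room)) s.send(msg);
      });
    }
    this.rooms.get(room).add(socket);
  }
  // Called when a local client sends a message: publish instead of delivering directly.
  broadcast(room, msg) { this.bus.publish(room, msg); }
}

// Two nodes sharing one bus: a message sent via node A reaches a socket on node B.
const bus = new Bus();
const nodeA = new WsNode(bus);
const nodeB = new WsNode(bus);
nodeB.join('channel-42', { send: m => console.log('B delivered:', m) });
nodeA.broadcast('channel-42', 'hello'); // prints "B delivered: hello"
```

Because each node holds only socket handles and everything else goes through the bus, a restart loses connections (which clients re-establish) rather than application state.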

Performance considerations

  • A single WS message can be <20 bytes total (frame overhead + small payload). The same notification over HTTP polling is hundreds of bytes per poll *whether or not* anything changed. At scale, WS reduces server CPU (no per-poll auth/DB hit), bandwidth (compact frames), and client battery (no radio wake-ups). Memory is the trade-off: persistent connections cost ~50KB each in a typical Node setup.

Edge cases

  • HTTP/2 multiplexing + SSE behaves better than HTTP/1.1 SSE (no 6-connection limit per origin).
  • WebSockets through corporate proxies sometimes fail; SSE or long-poll may be the only option.
  • Browser tab throttling: background tabs throttle setInterval, but WS messages still arrive — important for chat-style apps.
  • When TLS terminates at the LB, the LB must also support the WebSocket Upgrade handshake and pass it through to the backend (most modern LBs handle it, but check).
  • Message ordering across reconnects: design with sequence numbers / cursors so the client can resume.
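
The sequence-number idea from the last bullet can be sketched on the client side. It assumes the server stamps every message with a monotonically increasing integer `seq` starting at 1; that field name and the `SequenceTracker` class are hypothetical.

```javascript
// Tracks the last applied sequence number so that replays after a reconnect are
// idempotent and missed messages are detected. Assumes a server-assigned integer `seq`.
class SequenceTracker {
  constructor() { this.lastSeq = 0; }
  // Returns 'apply', 'duplicate', or 'gap'.
  accept(msg) {
    if (msg.seq <= this.lastSeq) return 'duplicate'; // already seen, e.g. replayed on resume
    if (msg.seq > this.lastSeq + 1) return 'gap';    // something was missed: request a replay
    this.lastSeq = msg.seq;
    return 'apply';
  }
}

const tracker = new SequenceTracker();
console.log(tracker.accept({ seq: 1 })); // apply
console.log(tracker.accept({ seq: 1 })); // duplicate
console.log(tracker.accept({ seq: 3 })); // gap
```

On a `'gap'` result the client would ask the server for everything after `tracker.lastSeq` rather than applying the out-of-order message.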

Real-world examples

  • Slack, Discord, WhatsApp Web: WS for messages + presence.
  • Google Docs, Figma, Notion: WS for collaborative editing.
  • Stock tickers, sports apps: SSE or WS for live updates.
  • ChatGPT-style UIs: SSE for token-by-token streaming.
  • Status pages, dashboards updated every minute: HTTP polling is fine.

Senior engineer discussion

Seniors frame this as a *cost model* conversation, not a 'cool tech' choice. They quantify: 'at 10k DAU and 1s polling, that's X req/s, $Y/mo CDN+origin, P95 latency = poll interval. WS halves the bill and gives sub-second latency, but adds reconnect logic, pub/sub fanout, and stateful operations cost.' They also know SSE exists and is the right answer for many cases where teams reach for WS by default.
