System Design
medium
mid

How would you build a real time order tracker using WebSockets?

Open a WebSocket scoped to the order, render status from a server-pushed event stream, handle reconnection with backoff and state resync, fall back to SSE/polling, and keep the UI optimistic but reconcilable. Clean up the socket on unmount.

7 min read·~20 min to think through

An order tracker is a server-pushed state machine rendered in the UI. WebSockets give the low-latency push; the hard parts are connection lifecycle, resync, and graceful degradation.

Architecture

text
Order service → events → WS gateway → client subscribes to order:{id}
Client renders a status timeline from the event stream

Connection lifecycle

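  1. Connect on mount, scoped to the order: wss://api/orders/{id}/stream, or a generic socket with a subscribe message. Authenticate during the connection handshake with a token; the browser WebSocket API cannot set custom headers, so the token typically travels in a cookie, a query parameter, or the subprotocol field.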
  2. Render from events — the server pushes { status, location, eta, updatedAt }. UI is a function of the latest known state.
  3. Clean up on unmount — close the socket / unsubscribe in the useEffect cleanup, or you leak connections.
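The event shape in step 2 can be pinned down with a type and a defensive parser, so a malformed frame never reaches the UI. A minimal sketch, assuming JSON text frames and ISO 8601 timestamps; the status names, `OrderEvent`, and `parseOrderEvent` are hypothetical, since the real set comes from the order service:

```typescript
// Hypothetical status names; the real list is defined by the order service.
const ORDER_STATUSES = ["placed", "preparing", "picked_up", "delivered", "cancelled"] as const;
type OrderStatus = (typeof ORDER_STATUSES)[number];

interface OrderEvent {
  status: OrderStatus;
  location: { lat: number; lng: number } | null;
  eta: string | null; // assumed ISO 8601
  updatedAt: string;  // assumed ISO 8601
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function toOrderStatus(value: unknown): OrderStatus | null {
  for (const s of ORDER_STATUSES) {
    if (s === value) return s;
  }
  return null;
}

// Returns a typed event, or null when the frame is not a valid order event.
function parseOrderEvent(raw: string): OrderEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (!isRecord(data)) return null;

  const status = toOrderStatus(data["status"]);
  const updatedAt = data["updatedAt"];
  if (status === null) return null;
  if (typeof updatedAt !== "string" || isNaN(Date.parse(updatedAt))) return null;

  // Location is optional: anything that is not { lat, lng } becomes null.
  const rawLocation = data["location"];
  let location: OrderEvent["location"] = null;
  if (isRecord(rawLocation)) {
    const lat = rawLocation["lat"];
    const lng = rawLocation["lng"];
    if (typeof lat === "number" && typeof lng === "number") location = { lat, lng };
  }

  const eta = data["eta"];
  return { status, location, eta: typeof eta === "string" ? eta : null, updatedAt };
}
```

In the socket's message handler, a null result would be logged and dropped rather than dispatched.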

Reliability — the part interviewers care about

  • Reconnect with exponential backoff + jitter — networks drop constantly on mobile. Cap the delay; don't hammer the server.
  • Resync on reconnect. A socket can miss events while down. On reconnect, fetch the current state via REST (or send a "last event id" so the server replays). Never assume the stream is gap-free.
  • Heartbeat / ping-pong to detect dead connections the OS hasn't noticed.
  • Fallback chain: WebSocket → SSE → long-poll → plain polling. Some networks/proxies block WS entirely.
  • Idempotency & ordering: include a sequence number or timestamp; drop stale/out-of-order events so a late packet can't move the status backwards.
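The backoff bullet above can be reduced to one pure function. This is a sketch of the "full jitter" variant (a uniform random delay below an exponentially growing, capped ceiling); `backoffDelay` and `BackoffOptions` are illustrative names, and the 500 ms base and 30 s cap in the usage note are assumed values, not recommendations from the source:

```typescript
interface BackoffOptions {
  baseMs: number; // ceiling for the first retry
  capMs: number;  // upper bound on the ceiling for any retry
}

// Full jitter: uniform random delay in [0, min(capMs, baseMs * 2^attempt)).
// `random` is injectable so the function can be tested deterministically.
function backoffDelay(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(opts.capMs, opts.baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```

A reconnect loop would call `backoffDelay(attempt, { baseMs: 500, capMs: 30000 })` before each retry and reset `attempt` to 0 once the socket opens. The randomness matters because clients that dropped together during an outage would otherwise all retry at the same instants.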

UI concerns

  • Connection status indicator — "Live" vs "Reconnecting…" so the user trusts the screen.
  • Optimistic but reconcilable — show the latest known state immediately; reconcile when fresh data arrives.
  • Status timeline — render past steps as done, current as active, future as pending.
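  • Background tab: browsers throttle timers in hidden tabs, so don't depend on client-side intervals there. Rely on push, and resync when visibilitychange reports the tab is visible again.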
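The status timeline bullet maps directly onto a small pure function. A minimal sketch, assuming a fixed linear sequence of steps; the step names and `buildTimeline` are hypothetical:

```typescript
type StepState = "done" | "active" | "pending";

// Hypothetical happy-path steps, in display order.
const TIMELINE_STEPS = ["placed", "preparing", "picked_up", "delivered"] as const;
type TimelineStep = (typeof TIMELINE_STEPS)[number];

interface TimelineEntry {
  step: TimelineStep;
  state: StepState;
}

// Steps before the current one are done, the current one is active,
// and everything after it is pending.
function buildTimeline(current: TimelineStep): TimelineEntry[] {
  const currentIndex = TIMELINE_STEPS.indexOf(current);
  return TIMELINE_STEPS.map((step, i) => {
    const state: StepState =
      i < currentIndex ? "done" : i === currentIndex ? "active" : "pending";
    return { step, state };
  });
}
```

Because the timeline is derived from the current status alone, a resync that replaces the status re-renders the whole timeline consistently. Branching outcomes such as cancellation would need a separate rendering rule.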

React shape

js
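import { useEffect } from "react";

// connectWithBackoff, applyOrderEvent and refetchOrderState are app-level helpers, not library APIs.
function useOrderStream(id, dispatch) {
  useEffect(() => {
    const ws = connectWithBackoff(`/orders/${id}/stream`, {
      onMessage: (evt) => dispatch(applyOrderEvent(evt)),
      onReopen: () => refetchOrderState(id), // resync the gap after a reconnect
    });
    return () => ws.close(); // runs on unmount and whenever id changes
  }, [id, dispatch]);
}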

Scale notes

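Each WebSocket is pinned to one gateway node, so an order event must reach whichever node holds that subscriber. A pub/sub backplane (e.g. Redis) solves this: every node subscribes to the channels its clients care about, so any node can serve any client. Sticky sessions matter mainly for the HTTP fallback transports (long-polling), where successive requests must land on the same node. For very high fan-out, consider a managed service (Ably, Pusher) instead of hand-rolling.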
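The backplane idea can be shown with an in-memory stand-in. This is a single-process sketch, not a Redis client: `InMemoryBackplane` and `GatewayNode` are hypothetical classes, and a real gateway would usually subscribe once per channel and fan out to its local sockets rather than subscribing once per client as done here for brevity:

```typescript
type Deliver = (payload: string) => void;

// Stand-in for a pub/sub service shared by all gateway nodes.
class InMemoryBackplane {
  private subscribers: Record<string, Deliver[]> = {};

  // Registers a listener and returns an unsubscribe function.
  subscribe(channel: string, deliver: Deliver): () => void {
    const list = this.subscribers[channel] || (this.subscribers[channel] = []);
    list.push(deliver);
    return () => {
      const i = list.indexOf(deliver);
      if (i !== -1) list.splice(i, 1);
    };
  }

  // Delivers to every current subscriber; returns how many received it.
  publish(channel: string, payload: string): number {
    const snapshot = (this.subscribers[channel] || []).slice();
    snapshot.forEach((deliver) => deliver(payload));
    return snapshot.length;
  }
}

// One gateway process. Clients connect to any node; the shared
// backplane routes order events to whichever node holds them.
class GatewayNode {
  constructor(private backplane: InMemoryBackplane) {}

  attachClient(orderId: string, send: Deliver): () => void {
    return this.backplane.subscribe(`order:${orderId}`, send);
  }
}
```

The order service publishes to `order:{id}` without knowing which node holds the subscriber, which is what lets the gateway tier scale horizontally.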

Follow-up questions

  • How do you handle events missed while the socket was disconnected?
  • Why exponential backoff with jitter instead of a fixed retry interval?
  • When would SSE be a better choice than WebSockets here?
  • How do you scale the WebSocket gateway horizontally?

Common mistakes

  • Not cleaning up the socket on unmount, leaking connections.
  • Assuming the event stream is gap-free — no resync after reconnect.
  • Fixed-interval reconnect that hammers the server during an outage.
  • No fallback when WS is blocked by a proxy or corporate network.

Performance considerations

  • WebSockets avoid polling overhead and can deliver updates with sub-second latency, but each open socket consumes server memory. Scope connections to the order being viewed, close them when the view unmounts or the order reaches a terminal state, and use a pub/sub backplane for fan-out. Batch rapid location updates to avoid re-render storms.
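Batching location updates can be as simple as keeping only the newest point between renders. A minimal sketch, assuming intermediate courier positions are safe to drop; `LatestLocationBuffer` is a hypothetical helper:

```typescript
interface Coords {
  lat: number;
  lng: number;
}

// Keeps only the newest location between flushes, so a burst of
// updates results in a single state change.
class LatestLocationBuffer {
  private pending: Coords | null = null;

  // Called for every incoming location event.
  push(coords: Coords): void {
    this.pending = coords; // overwrite: intermediate points are dropped
  }

  // Called on a fixed cadence (a timer or animation frame).
  // Returns null when nothing arrived since the last flush.
  flush(): Coords | null {
    const next = this.pending;
    this.pending = null;
    return next;
  }
}
```

The message handler calls `push`, and the render loop calls `flush` and dispatches only when the result is non-null. Status changes should bypass the buffer, since they are rare and the user should see them immediately.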

Edge cases

  • Out-of-order or duplicate events moving the status backwards.
  • Tab backgrounded for a long time, then resumed.
  • Token expiry mid-connection.
  • Order completed/cancelled while the user is watching — terminal states.
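The first and last edge cases can both be handled in the reducer that applies events. A minimal sketch, assuming the server attaches a per-order, monotonically increasing `seq` to each event (the source suggests a sequence number or timestamp); the status names, `TrackerState`, and `reduceOrderEvent` are hypothetical:

```typescript
type TrackedStatus = "placed" | "preparing" | "picked_up" | "delivered" | "cancelled";

interface SequencedEvent {
  seq: number; // assumed: assigned by the server, increasing per order
  status: TrackedStatus;
}

interface TrackerState {
  lastSeq: number; // -1 before the first event
  status: TrackedStatus | null;
}

const TERMINAL_STATUSES: readonly TrackedStatus[] = ["delivered", "cancelled"];

function isTerminal(status: TrackedStatus | null): boolean {
  return status !== null && TERMINAL_STATUSES.indexOf(status) !== -1;
}

// Returns the same state object when the event is ignored, so callers
// (and React) can skip re-rendering by reference comparison.
function reduceOrderEvent(state: TrackerState, evt: SequencedEvent): TrackerState {
  if (evt.seq <= state.lastSeq) return state;  // duplicate or stale event
  if (isTerminal(state.status)) return state;  // nothing follows a terminal status
  return { lastSeq: evt.seq, status: evt.status };
}
```

Once `isTerminal` is true, the client can also close the socket, since no further updates are expected for that order.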

Real-world examples

  • Food delivery apps (Uber Eats, Swiggy) showing live courier location and status.
  • E-commerce shipment tracking, where carrier webhooks are relayed to clients as WebSocket pushes.

Senior engineer discussion

The senior signal is treating reliability as the core problem, not the happy path: reconnection with backoff+jitter, resync-on-reconnect to close event gaps, sequence numbers for ordering, and a full degradation chain to SSE/polling. They also raise horizontal scaling (sticky sessions, Redis backplane) and the build-vs-buy call (managed real-time services for high fan-out).
