
How would you design a chat application like Facebook Messenger?

WebSocket for real-time messaging; optimistic send with idempotency keys and per-message status (sending/sent/delivered/read); virtualized message list with cursor pagination (load older on scroll up); typing indicators via throttled events; presence via heartbeat; attachments uploaded separately and referenced by id; service worker for offline send queue; push notifications; e2e encryption optional but increasingly expected.


A chat app touches real-time, optimistic UX, persistence, presence, and offline — most patterns in one product. Cover the architecture first, then 2–3 deep dives.

1. Transport — WebSocket

  • WebSocket (with auth on the upgrade) is the right answer for bidirectional, low-latency messaging.
  • Fallback: long-polling for environments that block WS (rare today).
  • Reconnect with exponential backoff; resume from last received message id.
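
The reconnect bullet can be sketched as a pure delay function plus a resume frame. This is a minimal sketch, not a definitive implementation: `reconnectDelayMs`, `resumeFrame`, the default timings, and the frame shape are all hypothetical names and values chosen for illustration, and random jitter is an added assumption to stop many clients from reconnecting at the same instant.

```typescript
// Hypothetical helper: how long to wait before reconnect attempt `attempt` (0-based).
// baseMs and maxMs are illustrative defaults, not values from any particular product.
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 500,
  maxMs: number = 30_000,
  random: () => number = Math.random,
): number {
  // Double the ceiling on every attempt, but cap it so long outages stay bounded.
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  // Full jitter: pick uniformly in [0, ceiling) so clients do not reconnect in lockstep.
  return Math.floor(random() * ceiling);
}

// Hypothetical resume frame sent right after the socket reopens, asking the
// server for everything after the last message this client received.
function resumeFrame(lastReceivedId: string | null): { type: "resume"; afterId: string | null } {
  return { type: "resume", afterId: lastReceivedId };
}
```

Injecting `random` keeps the function deterministic in tests; in production code the default `Math.random` would be used.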

2. Message lifecycle & status

typing → sending → sent → delivered → read
sending → failed → sending (on retry)

Each message has a client-side state independent of the server. Render the status icon (check, double-check, "...") from it, but don't rely on it for correctness: the server is the source of truth.
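
One way to keep the client-side status consistent is an explicit transition table. The sketch below is an assumption about which transitions are legal (for example, that "read" may follow "sent" directly if the delivery receipt is lost); `MessageStatus`, `NEXT_STATUSES`, and `advanceStatus` are hypothetical names.

```typescript
type MessageStatus = "sending" | "sent" | "delivered" | "read" | "failed";

// Allowed transitions (an assumed table). "failed" can only return to
// "sending" through an explicit retry; "read" is terminal.
const NEXT_STATUSES: Record<MessageStatus, readonly MessageStatus[]> = {
  sending: ["sent", "failed"],
  sent: ["delivered", "read"],
  delivered: ["read"],
  read: [],
  failed: ["sending"],
};

// Applies a status update only if it is a legal step forward; stale or
// out-of-order updates (e.g. "delivered" arriving after "read") are ignored.
function advanceStatus(current: MessageStatus, next: MessageStatus): MessageStatus {
  return NEXT_STATUSES[current].includes(next) ? next : current;
}
```

Ignoring illegal transitions rather than throwing keeps the UI stable when receipts arrive late or twice.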

3. Optimistic send

1. User hits Send.
2. Insert message into local thread with state="sending" and a clientId (UUID).
3. POST to server (or WS event) with clientId as Idempotency-Key.
4. Server returns a server-side message id; update local message {state: "sent", serverId}.
5. On delivery/read receipts from recipient(s), update state.

The Idempotency-Key prevents duplicate messages on retry / reconnect.
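
To illustrate why the key prevents duplicates, here is a minimal in-memory sketch of the server side. `MessageStore` and its `m1`, `m2` id format are hypothetical; a real service would presumably scope keys per sender or conversation and back this with a database uniqueness constraint.

```typescript
interface StoredMessage {
  serverId: string;
  clientId: string;
  text: string;
}

// Hypothetical in-memory store keyed by the client-generated id.
class MessageStore {
  private byClientId = new Map<string, StoredMessage>();
  private nextSeq = 1;

  // Returns the already-stored message when the same clientId is sent again,
  // so a retry after a dropped ack never creates a second message.
  accept(clientId: string, text: string): StoredMessage {
    const existing = this.byClientId.get(clientId);
    if (existing) return existing;
    const stored: StoredMessage = { serverId: `m${this.nextSeq++}`, clientId, text };
    this.byClientId.set(clientId, stored);
    return stored;
  }

  get size(): number {
    return this.byClientId.size;
  }
}
```

Because a retry gets the original `serverId` back, the client can reconcile its optimistic message exactly as it would after a first successful send.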

4. Message list — virtualized infinite scroll

Threads can have thousands of messages.

  • Virtualize (e.g. react-window), keeping the list anchored to the bottom.
  • Cursor pagination — load older messages on scroll up; preserve scroll position when prepending.
  • Anchor at the bottom by default; "scroll to bottom" button when scrolled up and a new message arrives.

The "preserve scroll on prepend" detail is tricky:

  • Record scrollHeight before fetch.
  • After prepending older items, set scrollTop += (newScrollHeight - oldScrollHeight).
  • Or use overflow-anchor CSS.
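
The scrollHeight bookkeeping above can be sketched as one small helper. `Scrollable` and `prependPreservingScroll` are hypothetical names; the interface is a structural subset of `HTMLElement`, so a real scroll container can be passed in. The sketch assumes the prepend callback updates the DOM synchronously; in React that typically means doing the adjustment in a layout effect, and a virtualized list would adjust its own scroll offset instead.

```typescript
// Structural subset of HTMLElement, so the helper also works with a plain object in tests.
interface Scrollable {
  scrollTop: number;
  scrollHeight: number;
}

// Runs `prepend` (which must synchronously add the older messages) and then
// shifts scrollTop by the height that was added above the viewport.
function prependPreservingScroll(container: Scrollable, prepend: () => void): void {
  const heightBefore = container.scrollHeight;
  prepend();
  container.scrollTop += container.scrollHeight - heightBefore;
}
```

The message the user was looking at stays at the same on-screen position, because everything that was inserted sits above it.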

5. Typing indicators

When the user types:

  • Throttle events to ~one per 2–3 seconds (don't spam).
  • Send "typing" event with TTL.
  • Recipient shows "Anya is typing..." until TTL or a "stopped typing" event.
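
A minimal sketch of both sides of the typing indicator, assuming a 3-second throttle and a caller-chosen TTL. `createTypingNotifier` and `isTyping` are hypothetical names, and the clock is injected so the behavior can be checked without timers.

```typescript
// Sender side: call the returned function on every keystroke; `send` fires
// at most once per `intervalMs` (leading-edge throttle).
function createTypingNotifier(
  send: () => void,
  intervalMs: number = 3000,
  now: () => number = Date.now,
): () => void {
  let lastSentAt = -Infinity;
  return function onKeystroke(): void {
    const t = now();
    if (t - lastSentAt >= intervalMs) {
      lastSentAt = t;
      send();
    }
  };
}

// Recipient side: the indicator is shown only while the last "typing" event
// is younger than its TTL, so a lost "stopped typing" event cannot leave it stuck.
function isTyping(lastEventAt: number, ttlMs: number, nowMs: number): boolean {
  return nowMs - lastEventAt < ttlMs;
}
```

Choosing a TTL somewhat longer than the throttle interval keeps the indicator steady while the sender is typing continuously.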

6. Presence

  • Heartbeat every 30s while WS is open → online.
  • Last-seen timestamp persists after the user goes offline.
  • Show presence sparingly (privacy + accuracy concerns).
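
A server-side view of the heartbeat rule might look like the sketch below. `PresenceTracker` is a hypothetical name, and the 75-second timeout (2.5 heartbeat intervals, tolerating one or two missed beats) is an assumption rather than a value from the text.

```typescript
// Hypothetical in-memory presence table driven by heartbeats.
class PresenceTracker {
  private lastSeen = new Map<string, number>();
  private timeoutMs: number;

  constructor(timeoutMs: number = 75_000) {
    this.timeoutMs = timeoutMs;
  }

  // Called whenever a heartbeat arrives on the user's open socket.
  heartbeat(userId: string, nowMs: number): void {
    this.lastSeen.set(userId, nowMs);
  }

  // Online means "a heartbeat arrived recently enough".
  isOnline(userId: string, nowMs: number): boolean {
    const seen = this.lastSeen.get(userId);
    return seen !== undefined && nowMs - seen <= this.timeoutMs;
  }

  // The stored timestamp doubles as "last seen" once the user is offline.
  lastSeenAt(userId: string): number | undefined {
    return this.lastSeen.get(userId);
  }
}
```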

7. Attachments

  • Upload to object storage (S3) via pre-signed URL — separate from the chat WS.
  • Attachment in message references the uploaded id; recipient fetches lazily.
  • Progress indicator during upload; failed uploads retry separately.
  • Thumbnails generated server-side.
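
The upload flow can be sketched with the network calls injected, so the control flow is visible without tying it to a specific backend. `AttachmentApi`, `createUpload`, `put`, and `uploadAttachment` are hypothetical names; the sketch also assumes the pre-signed URL stays valid across retries, which a real client would need to check.

```typescript
interface UploadTicket {
  attachmentId: string;
  uploadUrl: string; // pre-signed URL pointing at object storage
}

// Hypothetical transport interface; in a browser these would wrap HTTP calls.
interface AttachmentApi {
  createUpload(fileName: string, byteLength: number): Promise<UploadTicket>;
  put(uploadUrl: string, bytes: Uint8Array): Promise<void>;
}

// Uploads the bytes out of band and returns the id that the chat message
// will reference. Upload retries are independent of message sending.
async function uploadAttachment(
  api: AttachmentApi,
  fileName: string,
  bytes: Uint8Array,
  maxAttempts: number = 3,
): Promise<string> {
  const ticket = await api.createUpload(fileName, bytes.byteLength);
  for (let attempt = 1; ; attempt++) {
    try {
      await api.put(ticket.uploadUrl, bytes);
      return ticket.attachmentId;
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // give up and surface the failure
    }
  }
}
```

Only the returned `attachmentId` travels over the chat channel, which keeps large payloads off the WebSocket.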

8. Offline & background

  • Service worker for offline shell + outbox.
  • Messages composed offline → outbox in IndexedDB → replayed on reconnect.
  • Push notifications for new messages while app is closed.
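
The outbox replay can be sketched against a small storage interface. `OutboxStore` and `flushOutbox` are hypothetical; in a browser the store would presumably wrap IndexedDB, which is left out here so the replay logic stays readable and testable.

```typescript
interface OutboxEntry {
  clientId: string; // doubles as the idempotency key on replay
  text: string;
}

// Minimal storage interface (assumption): entries come back in the order they were queued.
interface OutboxStore {
  all(): Promise<OutboxEntry[]>;
  remove(clientId: string): Promise<void>;
}

// Replays queued messages in order and returns how many were sent.
// Stops at the first failure so later messages never overtake earlier ones.
async function flushOutbox(
  store: OutboxStore,
  send: (entry: OutboxEntry) => Promise<void>,
): Promise<number> {
  let sentCount = 0;
  for (const entry of await store.all()) {
    try {
      await send(entry);
    } catch {
      break; // still offline or server error: keep the rest queued
    }
    await store.remove(entry.clientId); // remove only after the send succeeded
    sentCount++;
  }
  return sentCount;
}
```

If the app dies between a successful send and the remove, the entry is replayed later; the idempotency key makes that replay harmless.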

9. Read receipts & delivery

  • "Delivered" when recipient's device acks.
  • "Read" when message enters the recipient's viewport (debounced).
  • Privacy: opt-in for read receipts.
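
Read receipts are cheap to coalesce if messages carry a per-conversation sequence number, which is an assumption of this sketch: the client then reports a single "read up to" cursor instead of one receipt per message. `ReadCursor` is a hypothetical name, and `flush` is meant to be called from the debounce timer.

```typescript
// Tracks the highest sequence number that has entered the viewport and
// reports it at most once.
class ReadCursor {
  private highestSeen = 0;
  private lastReported = 0;

  // Called when a message scrolls into view (e.g. from a visibility observer).
  markVisible(seq: number): void {
    if (seq > this.highestSeen) this.highestSeen = seq;
  }

  // Returns the cursor to send to the server, or null if nothing new was read.
  flush(): number | null {
    if (this.highestSeen <= this.lastReported) return null;
    this.lastReported = this.highestSeen;
    return this.lastReported;
  }
}
```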

10. Threading / replies

  • Reply-to references the parent message id; UI shows a quote/jump.
  • Threads (separate sub-conversation) — separate channel with its own message list.

11. End-to-end encryption (if asked)

  • Keys per user; messages encrypted client-side with recipient's public key (Signal Protocol / similar).
  • Server stores ciphertext only.
  • Significant complexity (key distribution, group chats, key rotation) — flag the depth, don't dive unless asked.

12. Performance

  • Memoize messages so one new message doesn't re-render the thread.
  • Lazy-load attachments and media.
  • Code-split less-used surfaces (settings, integrations).
  • Coalesce typing/presence events on the wire.

13. Accessibility

  • Message list is a log (role="log", aria-live="polite"); announce new messages.
  • Each message is an article with author and timestamp.
  • Keyboard navigation; Esc to close threads; / to focus compose.
  • Screen reader–friendly status announcements.

Interview framing

"WebSocket transport with reconnection and resume from the last message id. Each sent message has a clientId used as the Idempotency-Key, so sends are duplicate-safe under retries. Message states drive the UI status icons (sending / sent / delivered / read). The thread is a virtualized list anchored at the bottom; older messages load on scroll up with cursor pagination, preserving scroll position on prepend. Typing indicators are throttled with TTLs. Attachments upload via S3 pre-signed URLs, separate from the chat channel. Offline support: an outbox in IndexedDB, managed via the service worker, replays on reconnect. Push for closed-app notifications. The usual deep dives are the message-state machine, scroll-on-prepend, and how the WS reconnect resumes without duplicates."

Follow-up questions

  • Why use an idempotency key on send?
  • How do you preserve scroll position when prepending older messages?
  • How does the WebSocket resume without dropping or duplicating messages?
  • What's the right transport for typing indicators?

Common mistakes

  • Re-fetching the whole thread on each new message.
  • Auto-scrolling to bottom even when user has scrolled up.
  • Sending typing events on every keystroke.
  • Sending attachments over the same WS channel, which saturates the message channel.
  • No idempotency — duplicate messages on reconnect.

Performance considerations

  • Virtualize; memoize messages; lazy-load attachments; throttle typing/presence; coalesce reads.

Edge cases

  • WS drops mid-send.
  • Same user on two devices receiving the same message.
  • Very old thread with 50k messages.
  • Time-zone display for timestamps.
  • Out-of-order message arrival.
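
Duplicate delivery and out-of-order arrival can both be handled in one merge step, assuming every message has a stable server id and a per-conversation sequence number (both are assumptions of this sketch, as are the names `ThreadMessage` and `mergeIncoming`).

```typescript
interface ThreadMessage {
  serverId: string;
  seq: number; // assumed server-assigned, increasing within a conversation
  text: string;
}

// Merges newly received messages into the thread: a redelivered message
// overwrites its earlier copy instead of duplicating it, and the result is
// ordered by sequence number regardless of arrival order.
function mergeIncoming(
  thread: readonly ThreadMessage[],
  incoming: readonly ThreadMessage[],
): ThreadMessage[] {
  const byId = new Map<string, ThreadMessage>();
  for (const m of thread) byId.set(m.serverId, m);
  for (const m of incoming) byId.set(m.serverId, m);
  return Array.from(byId.values()).sort((a, b) => a.seq - b.seq);
}
```

Re-sorting the whole thread is fine for a sketch; a very long thread would more likely insert into the already-sorted loaded window.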

Real-world examples

  • Messenger, WhatsApp, Slack, Discord, Signal.

Senior engineer discussion

Seniors separate concerns (transport, attachment upload, presence), nail the message state machine and idempotency, get scroll-on-prepend right, and treat offline as a first-class feature.
