Frontend system design: design WhatsApp Web (HLD)
Real-time messaging UI: WebSocket connection for live messages, a normalized client store of chats/messages, optimistic sending with delivery states, virtualized message lists, offline queue + local persistence (IndexedDB), pagination of history, presence/typing, and reconnection handling.
WhatsApp Web is a real-time messaging client — the high-level design centers on the live connection, the client data model, and message lifecycle.
1. Real-time transport
- WebSocket as the primary channel — bidirectional, low-latency, for sending/receiving messages, presence, typing, read receipts.
- Reconnection logic — exponential backoff, and on reconnect sync missed messages (the server replays what you missed since your last-seen cursor).
- Consider a fallback transport (long-polling) for environments where the WebSocket is unavailable.
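The reconnection bullet can be sketched as a pure delay calculation plus a resync request. This is a minimal sketch under stated assumptions: `backoffDelayMs`, `buildResyncRequest`, the 1 s base, the 30 s cap, and the `resync` message shape are all hypothetical names and values, not part of any real WhatsApp protocol.

```typescript
// Hypothetical helper: how long to wait before reconnect attempt N (0-based).
function backoffDelayMs(
  attempt: number,
  baseMs = 1000, // assumed base delay
  maxMs = 30000, // assumed cap
  random: () => number = Math.random,
): number {
  // Exponential growth, capped so the wait never exceeds maxMs.
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  // Jitter: pick a point in [0, ceiling) so many clients do not retry in lockstep.
  return Math.floor(random() * ceiling);
}

// Hypothetical message sent right after the socket reopens, asking the
// server to replay everything after the last cursor this client saw.
interface ResyncRequest {
  type: "resync";
  sinceCursor: string | null;
}

function buildResyncRequest(lastSeenCursor: string | null): ResyncRequest {
  return { type: "resync", sinceCursor: lastSeenCursor };
}
```

Injecting `random` keeps the delay deterministic in tests; the attempt counter would typically reset to 0 after a successful connection.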
2. Client data model — normalized store
A normalized store (not nested blobs):
chats: { [chatId]: { id, lastMessageId, unreadCount, participants } }
messages: { [messageId]: { id, chatId, body, senderId, status, timestamp } }
chatMessageIds: { [chatId]: [orderedMessageIds] }
Normalization avoids duplication and makes updates (a status change on one message) cheap and consistent.
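The store shape above could be typed roughly as follows. The field names come from the outline; the concrete types (string ids, numeric timestamps) and the `setMessageStatus` helper are assumptions made for illustration.

```typescript
type MessageStatus = "sending" | "sent" | "delivered" | "read" | "failed";

interface Chat {
  id: string;
  lastMessageId: string | null;
  unreadCount: number;
  participants: string[];
}

interface Message {
  id: string;
  chatId: string;
  body: string;
  senderId: string;
  status: MessageStatus;
  timestamp: number; // assumed: epoch milliseconds
}

interface StoreState {
  chats: Record<string, Chat>;
  messages: Record<string, Message>;
  chatMessageIds: Record<string, string[]>;
}

// Hypothetical helper: change one message's status by id.
// Only the touched message record is replaced; `chats` and
// `chatMessageIds` keep their identity.
function setMessageStatus(
  state: StoreState,
  messageId: string,
  status: MessageStatus,
): StoreState {
  const existing = state.messages[messageId];
  if (!existing) return state; // unknown id: nothing to update
  return {
    ...state,
    messages: { ...state.messages, [messageId]: { ...existing, status } },
  };
}
```

Because each message lives in exactly one place, a status change never has to hunt through nested chat objects for copies to patch.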
3. Message lifecycle — optimistic send + delivery states
When the user sends:
- Immediately add the message to the store with a temp id and status sending, so it renders instantly (optimistic UI).
- Send over the WebSocket.
- On ack, reconcile: replace temp id with the real id, status → sent.
- Status progresses: sent → delivered → read (the checkmarks), driven by server events.
- On failure: status → failed, offer retry.
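The lifecycle above can be sketched as a handful of pure state transitions. This is an illustrative sketch, not a known implementation: the function names, the trimmed-down `Msg` shape, and the rule that receipts only move forward are assumptions.

```typescript
type Status = "sending" | "sent" | "delivered" | "read" | "failed";

interface Msg { id: string; chatId: string; body: string; status: Status }

interface State {
  messages: Record<string, Msg>;
  chatMessageIds: Record<string, string[]>;
}

// Assumed ordering: receipts only move forward, so a late "sent"
// event cannot overwrite "read".
const RANK: Record<Status, number> = { sending: 0, failed: 0, sent: 1, delivered: 2, read: 3 };

// Step 1: add the message under a temp id with status "sending".
function addOptimistic(state: State, tempId: string, chatId: string, body: string): State {
  const ids = state.chatMessageIds[chatId] ?? [];
  return {
    messages: { ...state.messages, [tempId]: { id: tempId, chatId, body, status: "sending" } },
    chatMessageIds: { ...state.chatMessageIds, [chatId]: [...ids, tempId] },
  };
}

// Step 2: on ack, swap the temp id for the server id and mark "sent".
function reconcileAck(state: State, tempId: string, realId: string): State {
  const pending = state.messages[tempId];
  if (!pending) return state;
  const messages = { ...state.messages };
  delete messages[tempId];
  messages[realId] = { ...pending, id: realId, status: "sent" };
  const ids = (state.chatMessageIds[pending.chatId] ?? []).map((id) =>
    id === tempId ? realId : id,
  );
  return { messages, chatMessageIds: { ...state.chatMessageIds, [pending.chatId]: ids } };
}

// Step 3: apply server receipt events, ignoring stale ones.
function applyReceipt(state: State, id: string, next: "sent" | "delivered" | "read"): State {
  const msg = state.messages[id];
  if (!msg || RANK[next] <= RANK[msg.status]) return state;
  return { ...state, messages: { ...state.messages, [id]: { ...msg, status: next } } };
}

// Failure path: only a message still in "sending" can fail.
function markFailed(state: State, tempId: string): State {
  const msg = state.messages[tempId];
  if (!msg || msg.status !== "sending") return state;
  return { ...state, messages: { ...state.messages, [tempId]: { ...msg, status: "failed" } } };
}
```

Keeping the temp id's position in `chatMessageIds` during reconciliation means the row does not jump in the list when the ack arrives.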
4. Rendering performance
- Virtualize the message list: chats can have thousands of messages, so render only the visible window (variable-height, stick-to-bottom, prepend-on-scroll-up, react-virtuoso-style).
- Virtualize the chat list too if large.
- Memoized message rows, stable keys.
- Lazy-load media (images/video) in messages.
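The core of virtualization is computing which rows fall inside the viewport. The sketch below deliberately simplifies to fixed-height rows, which is not what a chat needs in practice (real message lists are variable-height and would rely on measured sizes or a library); `visibleRange` and its parameters are hypothetical.

```typescript
interface VisibleRange {
  start: number; // first index to render (inclusive)
  end: number;   // last index to render (exclusive)
}

// Simplified windowing for fixed-height rows.
// `overscan` renders a few extra rows on each side to hide blank flashes
// during fast scrolling.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  itemCount: number,
  overscan = 5,
): VisibleRange {
  const first = Math.floor(scrollTop / rowHeight);
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(itemCount, last + overscan),
  };
}
```

With 10,000 messages and a 600 px viewport of 50 px rows, only a few dozen rows are mounted at any time instead of all 10,000.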
5. History & pagination
- Load the most recent N messages per chat; paginate older history on scroll-up (cursor-based).
- Cache loaded history.
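Cursor-based pagination on scroll-up amounts to prepending an older page and remembering the next cursor. A minimal sketch, assuming a hypothetical page shape (`messageIds` oldest-first, `nextCursor` of `null` when history is exhausted):

```typescript
// Hypothetical server response for one page of older history.
interface HistoryPage {
  messageIds: string[];      // oldest-first within the page
  nextCursor: string | null; // null means no older history remains
}

// Cached history for one chat.
interface ChatHistory {
  ids: string[];              // oldest-first
  olderCursor: string | null; // cursor to request the next older page
}

// Prepend an older page, skipping ids that are already cached
// (pages can overlap after a reconnect or a retried request).
function prependPage(history: ChatHistory, page: HistoryPage): ChatHistory {
  const known = new Set(history.ids);
  const fresh = page.messageIds.filter((id) => !known.has(id));
  return { ids: [...fresh, ...history.ids], olderCursor: page.nextCursor };
}
```

A cursor avoids the drift that offset-based paging suffers when new messages arrive while the user is scrolling back.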
6. Offline & persistence
- Persist to IndexedDB — chats and messages so the app loads instantly and works offline.
- Offline send queue — messages composed offline are queued and flushed on reconnect.
- Sync/merge local and server state on reconnect.
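The offline send queue can be sketched as a FIFO that is drained in order on reconnect and stops at the first failure. The sketch below is in-memory only to stay runnable; a real client would presumably back it with IndexedDB so queued messages survive a reload. `OfflineQueue` and the `trySend` callback are hypothetical names.

```typescript
interface QueuedMessage {
  tempId: string;
  chatId: string;
  body: string;
}

class OfflineQueue {
  private items: QueuedMessage[] = [];

  enqueue(item: QueuedMessage): void {
    this.items.push(item);
  }

  get size(): number {
    return this.items.length;
  }

  // Drain in send order. `trySend` returns false when the transport is
  // not ready; the failed item and everything after it stay queued so
  // ordering is preserved for the next flush.
  flush(trySend: (item: QueuedMessage) => boolean): number {
    let sent = 0;
    while (this.items.length > 0) {
      const next = this.items[0];
      if (next === undefined || !trySend(next)) break;
      this.items.shift();
      sent++;
    }
    return sent;
  }
}
```

Each queued item keeps its temp id, so the normal ack path can reconcile it once the server responds.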
7. Presence & typing
- Online/last-seen presence and "typing…" indicators — ephemeral, high-frequency, synced over the WS but not persisted.
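Because typing state is ephemeral, it can live in a small in-memory tracker that expires entries on its own rather than in the persisted store. A minimal sketch, assuming a hypothetical 5-second timeout and caller-supplied clock values:

```typescript
const TYPING_TTL_MS = 5000; // assumed: hide "typing…" after 5 s of silence

class TypingTracker {
  // chatId -> (userId -> time of last typing event)
  private lastEventAt = new Map<string, Map<string, number>>();

  markTyping(chatId: string, userId: string, now: number): void {
    let users = this.lastEventAt.get(chatId);
    if (!users) {
      users = new Map<string, number>();
      this.lastEventAt.set(chatId, users);
    }
    users.set(userId, now);
  }

  // Users whose last typing event is still within the TTL.
  // Expired entries are dropped as a side effect.
  typingUsers(chatId: string, now: number): string[] {
    const users = this.lastEventAt.get(chatId);
    if (!users) return [];
    const active: string[] = [];
    users.forEach((at, userId) => {
      if (now - at < TYPING_TTL_MS) active.push(userId);
      else users.delete(userId);
    });
    return active;
  }
}
```

The timeout matters because a "stopped typing" event can be lost when the other side disconnects, and the indicator should not stay on forever.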
8. Other concerns
- Notifications (new message while tab inactive), unread counts, search, media upload with progress, end-to-end encryption boundary (crypto is its own layer), multi-device sync, accessibility.
The framing
"It's a real-time messaging client, so: a WebSocket for live messages/presence/receipts with reconnection-and-resync logic; a normalized client store of chats and messages so updates are cheap and consistent; and an optimistic send lifecycle — render immediately with a temp id and sending status, then reconcile through sent/delivered/read. Performance comes from virtualizing the message and chat lists. History is paginated on scroll-up. IndexedDB persists everything for instant loads and offline use, with an offline send queue flushed on reconnect. Presence and typing are ephemeral WS state, not persisted."
Follow-up questions
- Why normalize the client store?
- Walk through the optimistic-send message lifecycle.
- Why is virtualization essential here?
- How do you handle reconnection and missed messages?
Common mistakes
- Polling instead of using a WebSocket for real-time.
- A denormalized/nested store causing update inconsistencies.
- Waiting for the server before showing the user's own message.
- Rendering all messages with no virtualization.
- Persisting ephemeral presence/typing state.
Performance considerations
- Virtualization caps DOM size for long chats and chat lists; a normalized store makes status updates O(1); IndexedDB gives instant cold loads; lazy media loading and batched WS updates keep the main thread free.
Edge cases
- Sending while offline: queue and flush on reconnect.
- Reconnect after missing messages: server replay/sync.
- Out-of-order message delivery.
- Multi-device: same account open in multiple places.
- Very large chats / media-heavy messages.
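For the out-of-order case, one option is to insert each incoming message at its sorted position instead of appending blindly. This sketch assumes each message carries a server-assigned `timestamp` (or sequence number) and uses the id as a tie-breaker; both the field and the helper names are hypothetical.

```typescript
interface Stamped {
  id: string;
  timestamp: number; // assumed: server-assigned, comparable across devices
}

// True when `a` should appear after `b` in the list.
function comesAfter(a: Stamped, b: Stamped): boolean {
  return a.timestamp !== b.timestamp ? a.timestamp > b.timestamp : a.id > b.id;
}

// Insert into an oldest-first list. Scans from the end because late
// messages usually belong near the bottom. Duplicates (for example from
// a replay after reconnect) are ignored.
function insertOrdered(list: Stamped[], msg: Stamped): Stamped[] {
  if (list.some((m) => m.id === msg.id)) return list;
  let i = list.length;
  while (i > 0 && comesAfter(list[i - 1], msg)) i--;
  return [...list.slice(0, i), msg, ...list.slice(i)];
}
```

The duplicate check also covers the multi-device and replay cases, where the same message can arrive through more than one path.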
Real-world examples
- WhatsApp Web, Telegram Web, Slack: all follow this architecture.
- IndexedDB-backed offline-first messaging clients.