
How would you design a highly performant web app that renders large data tables with real time updates?

Virtualize rows (and columns if wide), paginate/window the data, push heavy sort/filter/aggregation to the server or a Web Worker, apply real-time updates as targeted patches with batching/throttling, and keep the UI responsive with memoization and stable references.

A large data table with real-time updates stresses both rendering (too many cells) and update throughput (constant changes). The design has to handle both without jank.

1. Rendering — never render the whole table

  • Row virtualization — render only the visible window + overscan (@tanstack/react-virtual). The DOM holds roughly viewport height ÷ row height rows plus overscan (typically a few dozen), regardless of dataset size.
  • Column virtualization too if the table is very wide.
  • Sticky headers/columns layered on top of the virtualized body.
  • Pagination or windowed fetching — don't even load all rows; fetch pages/cursors on demand or load incrementally.
  • Variable row heights → measurement cache + position index.
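
The windowing arithmetic behind row virtualization is small enough to sketch. This is a minimal illustration for fixed-height rows, not how @tanstack/react-virtual is implemented; `computeWindow`, `VisibleWindow`, and the default overscan of 5 are hypothetical names and values chosen for the example.

```typescript
// Hypothetical helper for fixed-height rows; names are illustrative only.
interface VisibleWindow {
  start: number;   // index of the first row to render (inclusive)
  end: number;     // index just past the last row to render (exclusive)
  offsetY: number; // pixels by which the rendered slice is pushed down
}

function computeWindow(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan = 5,
): VisibleWindow {
  // First row whose box intersects the top edge of the viewport.
  const firstVisible = Math.floor(scrollTop / rowHeight);
  // One past the last row whose box intersects the bottom edge.
  const lastVisibleExclusive = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  // Extend by the overscan on both sides and clamp to the dataset bounds.
  const start = Math.max(0, firstVisible - overscan);
  const end = Math.min(rowCount, lastVisibleExclusive + overscan);
  return { start, end, offsetY: start * rowHeight };
}
```

With scrollTop = 40000, a 600 px viewport, and 40 px rows, this yields start 995 and end 1020, so 25 rows are rendered out of 1,000,000.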

2. Data operations — keep heavy work off the main thread

Sorting/filtering/grouping/aggregating large datasets synchronously freezes the UI.

  • Server-side sort/filter/paginate when the dataset is large or shared — the server returns just the page you need. Best default for "millions of rows."
  • Web Worker for client-side ops on big in-memory datasets.
  • If on the main thread, chunk and yield, and debounce filter inputs.
  • Memoize derived data; use useDeferredValue so typing in a filter stays responsive.
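
"Chunk and yield" can be sketched as a generator that filters one slice at a time, plus a driver that hands control back to the event loop between slices. This is a sketch under assumptions: `filterInChunks`, `runChunked`, and `isStale` are hypothetical names, the chunk size of 5000 is arbitrary, and `setTimeout(0)` is just one possible way to yield.

```typescript
// Filters `rows` one slice at a time. After each slice it yields the number
// of rows examined so far, so the caller decides when to continue.
function* filterInChunks<T>(
  rows: readonly T[],
  predicate: (row: T) => boolean,
  chunkSize = 5000,
): Generator<number, T[], void> {
  const matches: T[] = [];
  for (let start = 0; start < rows.length; start += chunkSize) {
    for (const row of rows.slice(start, start + chunkSize)) {
      if (predicate(row)) matches.push(row);
    }
    yield Math.min(start + chunkSize, rows.length);
  }
  return matches;
}

// Drives the generator, yielding to the event loop between slices so input
// and paint can run. `isStale` lets a newer filter value abandon this run;
// a stale run resolves to null.
async function runChunked<T>(
  work: Generator<number, T[], void>,
  isStale: () => boolean = () => false,
): Promise<T[] | null> {
  while (true) {
    const step = work.next();
    if (step.done) return step.value;
    if (isStale()) return null;
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
}
```

A debounced filter input would start a new `runChunked` call per settled keystroke and flip `isStale` for the previous one, so abandoned work stops after at most one slice.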

3. Real-time updates — the hard part

Updates arriving constantly can cause a re-render storm.

  • Targeted patches — update only the changed rows/cells, keyed by id; never replace the whole dataset (which would re-render everything).
  • Batch & throttle — coalesce bursts of updates into one render per frame/interval instead of one render per message.
  • Normalize the data ({ [id]: row }) so a patch is an O(1) lookup, not an array scan.
  • Off-screen updates are cheap — virtualization means only visible changed rows actually re-render; off-screen patches just update the store.
  • Subtle UX — flash/highlight changed cells, but don't reorder rows out from under the user mid-scroll unless they asked to sort.
  • WebSocket/SSE for the stream, with reconnection + resync (refetch current page on reconnect).
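
The patching and batching bullets can be sketched without any framework. This is a minimal sketch, not a library API: `Row`, `Patch`, `PatchBuffer`, and `applyPatches` are hypothetical names, the row fields are invented for illustration, and the scheduler is injected so the flush cadence (one per frame, one per interval) stays a caller decision.

```typescript
// Hypothetical row shape; the fields are illustrative only.
interface Row { id: string; price: number; volume: number }
type Patch = Partial<Row> & { id: string };

// Coalesces incoming patches by row id and applies them in one flush.
class PatchBuffer {
  private pending = new Map<string, Patch>();
  private scheduled = false;
  private readonly apply: (patches: Patch[]) => void;
  private readonly schedule: (flush: () => void) => void;

  constructor(
    apply: (patches: Patch[]) => void,
    schedule: (flush: () => void) => void,
  ) {
    this.apply = apply;
    this.schedule = schedule;
  }

  // Called once per incoming message. Later fields win for the same id.
  push(patch: Patch): void {
    const previous = this.pending.get(patch.id);
    this.pending.set(patch.id, previous ? { ...previous, ...patch } : patch);
    if (!this.scheduled) {
      this.scheduled = true;
      this.schedule(() => this.flush());
    }
  }

  flush(): void {
    this.scheduled = false;
    if (this.pending.size === 0) return;
    const patches = [...this.pending.values()];
    this.pending.clear();
    this.apply(patches);
  }
}

// Applies patches to a normalized { [id]: row } map. Only patched rows get a
// new object; untouched rows keep their identity, so memoized rows can skip
// re-rendering. Patches for ids that are not loaded are ignored.
function applyPatches(
  rowsById: Readonly<Record<string, Row>>,
  patches: readonly Patch[],
): Record<string, Row> {
  const next: Record<string, Row> = { ...rowsById };
  for (const patch of patches) {
    const current = next[patch.id];
    if (current) next[patch.id] = { ...current, ...patch };
  }
  return next;
}
```

In a browser the scheduler could be `(flush) => requestAnimationFrame(flush)`, which bounds the table to one store update per frame regardless of message rate. The shallow copy of the map happens once per flush rather than once per message; a store that mutates in place and tracks changed ids would avoid even that.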

4. React-level performance

  • Memoize rows (React.memo) keyed by id so unchanged rows skip re-render.
  • Stable references for callbacks/props passed into rows.
  • Selector-based store (Zustand/Redux) so a cell update re-renders only the components whose selected slice changed, i.e. that one row.
  • Avoid inline functions/objects in the row render path.
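
One way to get per-row isolation is to key subscriptions by row id, so a patch wakes only the listeners for that row. The store below is a hand-rolled stand-in for illustration, not Zustand's or Redux's API; `RowStore` and its method names are hypothetical.

```typescript
type Listener = () => void;

// Minimal store with per-row subscriptions (illustrative, framework-free).
class RowStore<R extends { id: string }> {
  private rows = new Map<string, R>();
  private listeners = new Map<string, Set<Listener>>();

  constructor(initial: Iterable<R> = []) {
    for (const row of initial) this.rows.set(row.id, row);
  }

  // Returns the same object until the row is patched, so it is safe to use
  // as a snapshot for memoized consumers.
  getRow(id: string): R | undefined {
    return this.rows.get(id);
  }

  // Registers a listener for one row id and returns an unsubscribe function.
  subscribe(id: string, listener: Listener): () => void {
    const forRow = this.listeners.get(id) ?? new Set<Listener>();
    this.listeners.set(id, forRow);
    forRow.add(listener);
    return () => {
      forRow.delete(listener);
    };
  }

  // Replaces one row with a patched copy and notifies only that row's listeners.
  patch(id: string, changes: Partial<R>): void {
    const current = this.rows.get(id);
    if (!current) return; // row not loaded: nothing to update
    this.rows.set(id, { ...current, ...changes });
    this.listeners.get(id)?.forEach((listener) => listener());
  }
}
```

In React 18+, a memoized row component could read from such a store with `useSyncExternalStore((cb) => store.subscribe(id, cb), () => store.getRow(id))`, so a patch to one id re-renders only that row.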

5. Putting it together

Server: paginated + sorted + filtered query ──┐
WebSocket: row patches ───────────────────────┤→ normalized store { [id]: row }
useVirtualizer → renders visible window ──────┘
  └ memoized <Row> subscribes to its own id's slice
  └ batched/throttled update flush

The framing

"Two pressures: rendering and update throughput. Rendering — virtualize rows/columns and page the data so DOM size is constant. Throughput — apply real-time changes as targeted, batched patches against a normalized store, so only visible changed rows re-render. Heavy sort/filter goes server-side or to a Web Worker. Plus row memoization and stable references. The principle is: bound the DOM, bound the work per frame, and patch surgically."

Follow-up questions

  • How do you stop a stream of real-time updates from causing a re-render storm?
  • When do you push sort/filter to the server vs a Web Worker vs the main thread?
  • Why normalize the row data, and how does it help with patches?
  • How do real-time row updates interact with virtualization?

Common mistakes

  • Rendering all rows/cells and freezing the browser.
  • Replacing the whole dataset on every update instead of patching by id.
  • One re-render per incoming message instead of batching.
  • Sorting/filtering huge datasets synchronously on the main thread.
  • Not memoizing rows, so any update re-renders every visible row.

Performance considerations

  • Virtualization caps DOM and reconciliation cost.
  • Normalized store + targeted patches make each update an O(1) lookup and limit the re-render blast radius.
  • Batching/throttling bounds renders per second.
  • Server-side or worker-side data ops keep the main thread free.

Edge cases

  • An update for a row that's currently off-screen (cheap — just patch the store).
  • A real-time update that changes sort order while the user is scrolling.
  • Reconnection — resync the current page after missed updates.
  • Variable row heights with live content changes.
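
For the variable-height case, the "measurement cache + position index" idea can be sketched as a heights array plus lazily rebuilt prefix sums. This is an illustrative sketch, not the algorithm of any particular virtualization library; `HeightIndex` and its methods are hypothetical names, and unmeasured rows fall back to an estimated height.

```typescript
// Measurement cache (heights) plus position index (prefix-sum offsets).
class HeightIndex {
  private readonly heights: number[];
  private readonly offsets: number[]; // offsets[i] = top of row i; last entry = total height
  private validUpTo = 0;              // offsets[0..validUpTo] are up to date

  constructor(rowCount: number, estimatedHeight: number) {
    this.heights = new Array<number>(rowCount).fill(estimatedHeight);
    this.offsets = new Array<number>(rowCount + 1).fill(0);
  }

  // Records a measured height, e.g. after live content changed a row's size.
  // Only offsets after `index` are invalidated.
  setHeight(index: number, height: number): void {
    if (this.heights[index] === height) return;
    this.heights[index] = height;
    this.validUpTo = Math.min(this.validUpTo, index);
  }

  // Top position of a row, rebuilding stale offsets up to it on demand.
  offsetOf(index: number): number {
    for (let i = this.validUpTo + 1; i <= index; i++) {
      this.offsets[i] = this.offsets[i - 1] + this.heights[i - 1];
    }
    this.validUpTo = Math.max(this.validUpTo, index);
    return this.offsets[index];
  }

  totalHeight(): number {
    return this.offsetOf(this.heights.length);
  }

  // Index of the row containing `scrollTop`, found by binary search.
  indexAt(scrollTop: number): number {
    this.totalHeight(); // make sure every offset is current
    let lo = 0;
    let hi = this.heights.length - 1;
    while (lo < hi) {
      const mid = (lo + hi + 1) >> 1;
      if (this.offsets[mid] <= scrollTop) lo = mid;
      else hi = mid - 1;
    }
    return lo;
  }
}
```

A patch that changes a visible row's content would trigger a re-measure and a `setHeight` call; off-screen rows keep their cached or estimated height until they scroll into view.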

Real-world examples

  • Trading dashboards, observability tables, analytics grids with live-updating rows.
  • @tanstack/react-virtual + a normalized Zustand store + batched WebSocket patches.

Senior engineer discussion

Seniors decompose it into rendering pressure (virtualize + page) and update throughput (normalized store, targeted batched patches, throttling), and place heavy data ops server-side or in a worker. They explain how virtualization makes off-screen updates nearly free and use memoized rows + selector stores to bound the re-render blast radius.