Back to Machine Coding
Machine Coding
easy
mid

How would you build a virtualized list for very large datasets?

Render only the rows visible in the viewport + a small overscan buffer. Compute visible window from scrollTop / itemHeight. For variable heights, measure rows + use a positions cache. Total scroll height is a spacer div equal to total content height. Adopt TanStack Virtual or react-window — rolling your own is a rabbit hole.

4 min read·~25 min to think through

Native lists fall over around 5–10k rows: too many DOM nodes, scroll jank, big memory. Virtualization keeps DOM ~constant by rendering only what's on screen.

Core idea

ts
[ spacer top   ]   ← height = (firstVisible * rowHeight)
[ visible rows ]   ← actual DOM nodes
[ spacer bottom]   ← height = ((total - lastVisible) * rowHeight)

Or absolute-positioned rows inside a fixed-height scroller — same effect.

Fixed-height implementation

tsx
function VList({ items, rowHeight, height }) {
  const [scrollTop, setScrollTop] = useState(0);

  const totalHeight = items.length * rowHeight;
  const first = Math.floor(scrollTop / rowHeight);
  const visibleCount = Math.ceil(height / rowHeight);
  const overscan = 5;
  const start = Math.max(0, first - overscan);
  const end = Math.min(items.length, first + visibleCount + overscan);

  return (
    <div
      style={{ height, overflow: "auto" }}
      onScroll={(e) => setScrollTop(e.currentTarget.scrollTop)}
    >
      <div style={{ height: totalHeight, position: "relative" }}>
        {items.slice(start, end).map((item, i) => (
          <div
            key={item.id}
            style={{
              position: "absolute",
              top: (start + i) * rowHeight,
              height: rowHeight,
              width: "100%",
            }}
          >
            {item.label}
          </div>
        ))}
      </div>
    </div>
  );
}

O(visible) DOM nodes regardless of item count.

Variable height

Three strategies:

  1. Estimated heights — assume an average; measure rows as they render; correct on the fly. (TanStack Virtual approach.)
  2. Pre-measured — known heights up front (rare).
  3. Lazy measured — measure on mount; store; recompute total height.

Maintain a positions cache: positions[i] = positions[i-1] + heights[i-1].

Overscan

Render a few rows above/below the viewport so fast scrolling doesn't flash empty space. 5–10 is typical.

Scroll perf

  • Avoid re-rendering all visible rows on each scroll tick — memoize row component.
  • Throttle scroll events? Usually not needed — React 18 batching + memoization is enough.
  • Avoid layout in row renderers (no offsetHeight reads).

Sticky headers / grouping

Compute the active group by scrollTop; render a sticky header overlay above the absolute-positioned rows.

Edge cases

  • Scroll restoration — when user navigates back, restore scrollTop or item.
  • Resizing viewport — recompute visible window on resize.
  • Items load asynchronously — show placeholders; replace in place.
  • Anchored insertion — when items are prepended (chat), preserve scroll position by adjusting scrollTop.

Accessibility

  • Native scrollbar still works.
  • For grids, use role="grid" only if you implement full keyboard navigation; otherwise stick with <ul> / <table>.
  • Screen readers see only rendered rows — provide an "items 1-20 of 10000" indicator + load-more or proper aria-rowcount on grid.

When NOT to virtualize

  • Lists under ~200 rows. Plain rendering is simpler.
  • Lists where users print or Ctrl-F. Browser search misses unrendered rows.

Interview framing

"Keep DOM nodes ~constant by rendering only the visible slice + overscan. Compute visible range from scrollTop / itemHeight; position rows absolutely within a tall spacer. Variable heights add a measurement cache. Memoize the row component, avoid layout reads in renderers. Adopt TanStack Virtual or react-window in real code — rolling your own gets you to v1 fast and breaks on edge cases (variable height, sticky headers, scroll restoration, RTL). Don't virtualize lists under ~200 rows."

Follow-up questions

  • How would you handle variable row heights?
  • Why not throttle scroll events?
  • How would you implement infinite scroll on top of this?

Common mistakes

  • Re-rendering all visible rows per scroll tick.
  • Reading layout in row renderers (forces sync layout).
  • Forgetting to set scroll container height.
  • Virtualizing small lists.

Performance considerations

  • O(visible) DOM nodes. Memoize row components. Avoid layout in renderers. Use Web Worker for very heavy row computation.

Edge cases

  • Scroll restoration.
  • Anchored prepend (chat scrolling up).
  • Variable height + Ctrl-F.
  • Right-to-left languages.

Real-world examples

  • TanStack Virtual, react-window, react-virtuoso, AG Grid, Slack's message list.

Senior engineer discussion

Seniors adopt libraries and focus on the integration concerns (sticky headers, scroll restoration, anchored prepends) rather than reinventing the kernel.

Related questions