How do you optimize frontend performance for an application at scale?

At scale, performance work is layered: network (CDN, HTTP/2, compression, image formats), bundle (code splitting, tree shaking, dependency audits), render (avoiding expensive re-renders, memoizing where it matters, virtualizing long lists), runtime (debouncing inputs, moving CPU work to web workers), and data (caching, request deduplication, pagination). Measure first (RUM + Core Web Vitals), find the bottleneck, fix the biggest one, repeat. Don't memoize prematurely, and don't micro-optimize what isn't slow.

Performance "at scale" usually means high request volume, large bundles, and many users on slow networks or devices. The approach is to measure, isolate the bottleneck, and fix it layer by layer, rather than micro-optimizing prematurely.

Step 0: measure first

You can't optimize what you don't measure. Two sources:

  • Lab data: Lighthouse, WebPageTest, local profiling. Reproducible, but synthetic.
  • Field data (RUM): real-user metrics collected with the web-vitals library and sent to analytics. Report P75/P95 from actual users on actual devices and networks. This is the data that matters most.

Track Core Web Vitals: LCP (Largest Contentful Paint, loading speed), INP (Interaction to Next Paint, responsiveness; it replaced FID in March 2024), and CLS (Cumulative Layout Shift, visual stability). Add custom marks for app-specific flows (time-to-search-results, time-to-checkout-load).
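
To make the P75 idea concrete, here is a minimal sketch that aggregates raw RUM samples and rates the result against the published Core Web Vitals thresholds. The function names and the sample values are illustrative assumptions; a real pipeline would usually do this aggregation in the analytics backend.

```typescript
// Core Web Vitals boundaries as published by Google.
// LCP and INP are in milliseconds; CLS is unitless.
type Metric = "LCP" | "INP" | "CLS";
type Rating = "good" | "needs-improvement" | "poor";

const THRESHOLDS: Record<Metric, { good: number; poor: number }> = {
  LCP: { good: 2500, poor: 4000 },
  INP: { good: 200, poor: 500 },
  CLS: { good: 0.1, poor: 0.25 },
};

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Classify a metric value using the thresholds above.
function rate(metric: Metric, value: number): Rating {
  const { good, poor } = THRESHOLDS[metric];
  if (value <= good) return "good";
  if (value <= poor) return "needs-improvement";
  return "poor";
}

// Hypothetical LCP samples (ms) from eight page views.
const lcpSamples = [1200, 1800, 2100, 2600, 3900, 900, 1500, 4800];
const lcpP75 = percentile(lcpSamples, 75);
console.log(lcpP75, rate("LCP", lcpP75));
```

The median of these samples looks healthy, but the P75 does not, which is why field data is usually reported at a high percentile rather than as an average.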

Network layer

  • CDN: serve static assets and ideally HTML from edge.
  • HTTP/2 or HTTP/3: multiplexing sends many requests over one connection, so HTTP/1.1's limit of about six connections per origin stops being a bottleneck; header compression cuts overhead.
  • Compression: Brotli for text (often roughly 15–20% smaller than gzip).
  • Image formats: AVIF / WebP (often 25–50% smaller than JPEG at comparable quality).
  • Responsive images: srcset + sizes so mobile doesn't download 1600w when 400w fits.
  • Caching: long max-age on hashed assets; ETag/Last-Modified for HTML; service worker for offline.
  • Resource hints: preload LCP image + critical fonts, preconnect to third-party origins.
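
The caching bullet above can be sketched as a small policy function. This is an illustration under assumptions: the fingerprint pattern (a dot, eight or more hex characters, then the extension) and the one-hour TTL for unhashed assets are hypothetical choices, not a standard.

```typescript
// Assumption: the build emits fingerprinted files such as "app.3f9a1c2b.js".
const HASHED_ASSET = /\.[0-9a-f]{8,}\.(?:js|css|woff2|png|jpg|avif|webp|svg)$/i;
const HTML_DOCUMENT = /(?:\.html|\/)$/;

// Pick a Cache-Control header value for a given URL path.
function cacheControlFor(path: string): string {
  if (HASHED_ASSET.test(path)) {
    // The URL changes whenever the bytes change, so cache for a year.
    return "public, max-age=31536000, immutable";
  }
  if (HTML_DOCUMENT.test(path)) {
    // Store, but revalidate (ETag / Last-Modified) on every use so
    // new deploys are picked up immediately.
    return "no-cache";
  }
  // Unhashed assets: a short TTL as a middle ground (assumed value).
  return "public, max-age=3600";
}

console.log(cacheControlFor("/assets/app.3f9a1c2b.js"));
console.log(cacheControlFor("/index.html"));
```

The split works because hashed assets never change under the same URL, while HTML must stay fresh so it can point at the newest hashed files.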

Bundle layer

  • Code split per route and per heavy widget. Initial bundle target: <200KB gzipped for typical SaaS.
  • Tree-shaking: import only what you use (import { debounce } from 'lodash-es', not the whole lodash).
  • Dependency audit: bundle-analyzer once a quarter; rip out duplicates, replace heavy deps (moment → date-fns or dayjs).
  • Modern JS for modern browsers: ship type=module bundles with a nomodule fallback, and skip polyfills for old browsers your audience doesn't use.
  • Defer non-critical CSS: inline above-the-fold, async-load the rest.
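
One way to keep the initial-bundle target honest is a budget check in CI. The sketch below assumes you already have per-asset gzipped sizes from your bundler or analyzer; the Asset shape, the function name, and the sample sizes are hypothetical.

```typescript
interface Asset {
  name: string;
  gzipBytes: number; // size after gzip, as reported by your build tooling
  initial: boolean;  // true if needed for first load, false if lazy-loaded
}

interface BudgetReport {
  totalInitialBytes: number;
  overBudget: boolean;
  largest: string[]; // initial assets, biggest first
}

// Sum the assets needed for first load and compare against a budget.
// The default mirrors the 200KB gzipped target mentioned above.
function checkInitialBudget(
  assets: readonly Asset[],
  budgetBytes: number = 200 * 1024,
): BudgetReport {
  const initial = assets.filter((a) => a.initial);
  const totalInitialBytes = initial.reduce((sum, a) => sum + a.gzipBytes, 0);
  const largest = initial
    .slice()
    .sort((a, b) => b.gzipBytes - a.gzipBytes)
    .map((a) => a.name);
  return { totalInitialBytes, overBudget: totalInitialBytes > budgetBytes, largest };
}

// Hypothetical build output: the chart widget is already code-split.
const buildAssets: Asset[] = [
  { name: "vendor.js", gzipBytes: 95 * 1024, initial: true },
  { name: "main.js", gzipBytes: 120 * 1024, initial: true },
  { name: "chart.js", gzipBytes: 300 * 1024, initial: false },
];
const report = checkInitialBudget(buildAssets);
console.log(report);
```

Failing the build when overBudget is true turns the quarterly audit into a continuous check, and the largest list shows where to look first.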

Render layer (React-specific)

  • Avoid unnecessary re-renders: stable callbacks (useCallback) and memoized values (useMemo) where they actually save work. Don't blanket-memoize — useMemo itself has cost.
  • List virtualization: react-window / TanStack Virtual for >100 rows.
  • Suspend heavy children: React.lazy + Suspense for modals, drawers.
  • Concurrent features: useTransition for non-urgent updates (filter input → big list re-sort).
  • Avoid layout thrash: don't interleave reads (offsetWidth) with writes (style.x) in a tight loop.
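
The core of list virtualization is a small piece of arithmetic: given the scroll position, work out which rows are on screen and render only those. The sketch below assumes fixed-height rows; it illustrates the idea and is not how react-window or TanStack Virtual are implemented internally.

```typescript
interface VisibleRange {
  start: number; // index of the first row to render (inclusive)
  end: number;   // index of the last row to render (inclusive)
}

// Compute which rows intersect the viewport, plus a few extra rows
// (overscan) on each side so fast scrolling does not flash blank space.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan: number = 3,
): VisibleRange {
  if (rowCount === 0) return { start: 0, end: -1 };
  const first = Math.floor(scrollTop / rowHeight);
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1;
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(rowCount - 1, last + overscan),
  };
}

// 10,000 rows of 40px in a 600px viewport, scrolled to 4000px.
const range = visibleRange(4000, 600, 40, 10000);
console.log(range, "rows rendered:", range.end - range.start + 1);
```

With these numbers only 21 of the 10,000 rows are mounted at any time, which is where the rendering savings come from.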

Runtime / interaction layer

  • Debounce / throttle rapid events (resize, scroll, input).
  • Web workers for CPU-heavy work (parsing, image manipulation, compression) — keeps main thread responsive for INP.
  • requestIdleCallback for low-priority background work (analytics flush, log shipping).
  • Avoid blocking the main thread: split long tasks (>50ms) with scheduler.yield() where the browser supports it, or with chunked setTimeout otherwise.
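
As an illustration of throttling rapid events, here is a minimal leading-edge throttle. The injectable clock parameter is an assumption added so the behavior can be checked deterministically; in a browser you would leave it at its default.

```typescript
// Leading-edge throttle: run fn immediately, then ignore further calls
// until intervalMs has elapsed. Calls inside the window are dropped.
function throttle<A extends unknown[]>(
  fn: (...args: A) => void,
  intervalMs: number,
  now: () => number = () => Date.now(),
): (...args: A) => void {
  let lastRun = -Infinity;
  return (...args: A) => {
    const t = now();
    if (t - lastRun >= intervalMs) {
      lastRun = t;
      fn(...args);
    }
  };
}

// Simulate a burst of scroll events with a fake clock.
let fakeClock = 0;
const handled: number[] = [];
const onScroll = throttle((y: number) => { handled.push(y); }, 100, () => fakeClock);

onScroll(1);                  // t=0: runs
fakeClock = 50; onScroll(2);  // inside the window: dropped
fakeClock = 100; onScroll(3); // window elapsed: runs
fakeClock = 199; onScroll(4); // dropped
fakeClock = 200; onScroll(5); // runs
console.log(handled);
```

Because this variant drops trailing calls, the last event of a burst may never be handled. When the final value matters (for example the final text of a search input), a debounce or a throttle with a trailing call is the better fit.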

Data layer

  • Cache at HTTP level (Cache-Control, ETag) and app level (React Query / RTKQ).
  • Dedup identical concurrent requests.
  • Paginate / cursor: don't return 10k rows when 50 will do.
  • Stream large responses (SSE for live updates, ReadableStream for large payloads).
  • Optimistic UI: update locally on mutation, reconcile with server.
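
Deduplicating identical concurrent requests comes down to sharing one promise per key while the request is in flight. The class below is a minimal sketch with hypothetical names; it shares in-flight requests only and does not cache results, which is the part an app-level cache adds on top.

```typescript
// Shares one in-flight promise per key so that concurrent callers
// asking for the same resource trigger a single request.
class InFlight<T> {
  private pending = new Map<string, Promise<T>>();

  // Returns the pending promise for key if there is one; otherwise
  // calls fetcher and tracks its promise until it settles.
  get(key: string, fetcher: () => Promise<T>): Promise<T> {
    const existing = this.pending.get(key);
    if (existing) return existing;

    const promise = fetcher();
    const clear = () => {
      // Remove the entry on success or failure so later calls refetch.
      if (this.pending.get(key) === promise) this.pending.delete(key);
    };
    promise.then(clear, clear);
    this.pending.set(key, promise);
    return promise;
  }

  get size(): number {
    return this.pending.size;
  }
}

// Hypothetical fetcher that counts how often it is actually invoked.
let fetchCalls = 0;
const fetchUser = (): Promise<string> => {
  fetchCalls++;
  return Promise.resolve("ada");
};

const inflight = new InFlight<string>();
const first = inflight.get("/api/user/1", fetchUser);
const second = inflight.get("/api/user/1", fetchUser); // reuses first
const other = inflight.get("/api/user/2", fetchUser);  // different key
console.log(first === second, fetchCalls);
```

Two components mounting at the same time and asking for the same user would now produce one network request instead of two.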

Process

  1. Run RUM. Find the worst metric.
  2. Open DevTools → Performance → record the slow flow.
  3. Identify the bottleneck (LCP image too big? Long task on render? Waterfall fetch?).
  4. Fix the biggest one. Re-measure.
  5. Repeat.

What NOT to do

  • Don't useMemo every value: for cheap operations, the dependency comparison and cache bookkeeping can cost more than simply recomputing.
  • Don't reach for web workers for tiny tasks — postMessage overhead can eat the win.
  • Don't optimize what isn't profiled-slow.
  • Don't replace clear code with cryptic micro-optimizations for ~1% wins.
  • Don't rely on synthetic benchmarks alone: lab runs are useful for reproducing a problem, but confirm every win against real-user metrics.

Mental model

Most performance wins at scale come from a handful of changes:

  1. Right image format + size.
  2. Smaller initial JS bundle.
  3. Cache.
  4. Defer offscreen.

The exotic stuff (web workers, virtualization, concurrent React) helps in specific cases but is rarely the biggest lever.

Follow-up questions

  • How do you decide what to memoize vs leave alone?
  • When does virtualization start to pay off?
  • What's the difference between INP and FID?
  • How do you decide between SSR, SSG, and CSR for a slow page?

Common mistakes

  • Optimizing without measuring — random improvements that don't move metrics.
  • Blanket useMemo/useCallback — adds overhead without saving work.
  • Ignoring real-user metrics; only looking at Lighthouse scores.
  • Lazy-loading the LCP image (tanks LCP).
  • Putting heavy deps in the main bundle 'just in case'.
  • Big-O thinking for tiny n — micro-optimizing a 10-item loop.

Performance considerations

  • The fastest code is no code. The fastest request is no request. Cache aggressively; defer aggressively; ship less. Compounding wins: smaller bundle → faster TTI → less main-thread blocking → better INP → happier users → better business metrics.

Edge cases

  • Low-end Android devices can be 5-10x slower than dev hardware — always test on a throttled CPU.
  • Network: don't assume fast WiFi; test on 'Slow 3G' DevTools throttling.
  • Battery / Save-Data signals — opt out of expensive prefetches.
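  • First-time vs repeat visitors have very different load profiles: repeat visitors may already have most of the bundle cached, first-time visitors download everything.
  • INP reports roughly the slowest interaction of a visit (a few outliers are discounted only on pages with many interactions), so one slow handler can tank the metric for the whole visit.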
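
The Save-Data bullet above can be sketched as a small guard around prefetching. The fields come from the Network Information API (navigator.connection), which is not available in every browser; the policy itself, including prefetching when no signal exists, is an assumption you may want to tune.

```typescript
// Subset of the Network Information API. Every field is optional
// because browser support varies.
interface ConnectionLike {
  saveData?: boolean;
  effectiveType?: "slow-2g" | "2g" | "3g" | "4g";
}

// Decide whether speculative prefetching is acceptable for this user.
function shouldPrefetch(connection: ConnectionLike | undefined): boolean {
  // No signal available: fall back to the default behavior (assumed policy).
  if (!connection) return true;
  // The user explicitly asked for reduced data usage.
  if (connection.saveData) return false;
  // Very slow links: prefetching competes with the content the user wants now.
  return connection.effectiveType !== "slow-2g" && connection.effectiveType !== "2g";
}

console.log(shouldPrefetch({ saveData: true, effectiveType: "4g" }));
console.log(shouldPrefetch({ effectiveType: "4g" }));
```

In a browser the argument would be navigator.connection, read defensively because the property may be missing from both the runtime and the TypeScript DOM typings.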

Real-world examples

  • Pinterest reported that cutting perceived wait time by 40% raised sign-up conversion and search-engine traffic by about 15% each.
  • BBC moved to lazy-load below-fold images and dropped data transfer by 25%.
  • Etsy uses RUM (mPulse) to find slow real-user pages, not just slow synthetic ones.

Senior engineer discussion

Seniors lead with measurement (RUM + lab), then prioritize by biggest-lever-first. They distinguish 'feels slow' (interaction latency, INP) from 'loads slow' (LCP, FCP) — they're different fixes. They know which optimizations actually move metrics on real users and which are vanity tuning. They also tie perf to business outcomes (LCP -1s → +X% conversion) to get buy-in for the work.
