How do you optimize frontend performance for an application at scale?
At scale, perf is layered: network (CDN, HTTP/2, compression, image formats), bundle (code splitting, tree-shake, dependency audit), render (avoid expensive re-renders, memoization where it matters, virtualization for long lists), runtime (debounce inputs, web workers for CPU work), and data (caching, dedup, pagination). Measure first (RUM + Core Web Vitals), find the bottleneck, fix the biggest one, repeat. Don't memoize prematurely; don't micro-optimize what isn't slow.
Performance "at scale" usually means: high request volume, large bundles, lots of users on slow networks/devices. The approach is to measure, isolate the bottleneck, and fix in layers — not premature micro-optimization.
Step 0: measure first
You can't optimize what you don't measure. Two sources:
- Lab data: Lighthouse, WebPageTest, local profiling. Reproducible, but synthetic.
- Field data (RUM): Real-user metrics via web-vitals.js → analytics. P75/P95 from actual users on actual devices/networks. This is what matters.
Track Core Web Vitals: LCP (Largest Contentful Paint, loading speed), INP (Interaction to Next Paint, responsiveness; replaced FID in 2024), CLS (Cumulative Layout Shift, visual stability). Add custom marks for app-specific flows (time-to-search-results, time-to-checkout-load).
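As a rough sketch of what "P75 from actual users" means in practice, the helper below aggregates field samples and rates the result against the published Core Web Vitals thresholds. The function names (percentile, rate, summarize) are illustrative, and the samples are assumed to have already been collected from the field, for instance through the web-vitals library.

```typescript
type Rating = "good" | "needs-improvement" | "poor";

// Published Core Web Vitals thresholds: [good upper bound, poor lower bound].
// LCP and INP are in milliseconds; CLS is unitless.
const THRESHOLDS = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
} as const;

type MetricName = keyof typeof THRESHOLDS;

// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

function rate(name: MetricName, value: number): Rating {
  const [good, poor] = THRESHOLDS[name];
  if (value <= good) return "good";
  return value <= poor ? "needs-improvement" : "poor";
}

// Summarize one metric the way field dashboards usually do: by its P75.
function summarize(name: MetricName, samples: number[]) {
  const p75 = percentile(samples, 75);
  return { name, p75, rating: rate(name, p75) };
}
```

Rating the P75 rather than the mean keeps a fast majority from hiding a slow tail of users.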
Network layer
- CDN: serve static assets and ideally HTML from edge.
- HTTP/2 or HTTP/3: multiplexing sidesteps the browser's limit of roughly six HTTP/1.1 connections per origin; header compression cuts overhead.
- Compression: Brotli for text (~20% better than gzip).
- Image formats: AVIF / WebP (often 25–50% smaller than JPEG at comparable quality).
- Responsive images: srcset + sizes so mobile doesn't download 1600w when 400w fits.
- Caching: long max-age on hashed assets; ETag/Last-Modified for HTML; service worker for offline.
- Resource hints: preload LCP image + critical fonts, preconnect to third-party origins.
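The caching bullet above can be sketched as a small policy function. This is a minimal illustration, assuming the build emits content-hashed filenames such as app.3f9a1c2b.js; the regex, the function name cacheControlFor, and the one-hour fallback are assumptions, not a standard.

```typescript
// Assumption: hashed assets look like name.<8+ hex chars>.<ext>.
const HASHED_ASSET = /\.[0-9a-f]{8,}\.(js|css|woff2|png|jpg|webp|avif|svg)$/i;

function cacheControlFor(path: string): string {
  // The URL changes whenever the content changes, so the response can be
  // cached for a year and never revalidated.
  if (HASHED_ASSET.test(path)) return "public, max-age=31536000, immutable";

  // HTML must pick up new asset URLs quickly: "no-cache" still allows
  // storing the response but forces revalidation (ETag / Last-Modified).
  if (path.endsWith(".html") || path.endsWith("/")) return "no-cache";

  // Unhashed static files: a short lifetime as a compromise (assumed value).
  return "public, max-age=3600";
}
```

The split matters because a long max-age on HTML would pin users to stale asset URLs, while revalidating hashed assets wastes round trips.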
Bundle layer
- Code split per route and per heavy widget. Initial bundle target: <200KB gzipped for typical SaaS.
- Tree-shaking: import only what you use (import { debounce } from 'lodash-es', not the whole lodash).
- Dependency audit: bundle-analyzer once a quarter; rip out duplicates, replace heavy deps (moment → date-fns or dayjs).
- Modern JS for modern browsers: type=module + nomodule fallback. Skip polyfills for old browsers your audience doesn't use.
- Defer non-critical CSS: inline above-the-fold, async-load the rest.
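Code splitting per heavy widget usually comes down to a dynamic import() that the bundler turns into a separate chunk. The sketch below shows one way to load such a chunk at most once and share it between callers, for example to prefetch on hover before the widget is opened. The helper name lazyOnce and the ./HeavyChart module are hypothetical.

```typescript
// Wraps a loader so the underlying chunk is requested at most once.
// All callers share the same promise; a failed load clears the cache
// so a later call can retry.
function lazyOnce<T>(loader: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = loader().catch((err) => {
        pending = undefined; // allow a retry after a failed chunk load
        throw err;
      });
    }
    return pending;
  };
}

// Hypothetical usage with a bundler that splits on dynamic import():
//   const loadChart = lazyOnce(() => import("./HeavyChart"));
//   button.addEventListener("mouseenter", () => { void loadChart(); });
//   button.addEventListener("click", async () => (await loadChart()).render());
```

In React, React.lazy already caches the loaded module; a helper like this is mainly useful when the load should start before the component renders.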
Render layer (React-specific)
- Avoid unnecessary re-renders: stable callbacks (useCallback) and memoized values (useMemo) where they actually save work. Don't blanket-memoize — useMemo itself has cost.
- List virtualization: react-window / TanStack Virtual for >100 rows.
- Suspend heavy children: React.lazy + Suspense for modals, drawers.
- Concurrent features: useTransition for non-urgent updates (filter input → big list re-sort).
- Avoid layout thrash: don't interleave reads (offsetWidth) with writes (style.x) in a tight loop.
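To make the virtualization bullet concrete, here is a minimal sketch of the core calculation that libraries such as react-window perform for fixed-height rows: given the scroll position, decide which rows to mount. The names (visibleRange, overscan, offsetY) are illustrative and not taken from any library's API.

```typescript
interface WindowInput {
  scrollTop: number;      // pixels scrolled from the top of the list
  viewportHeight: number; // visible height of the scroll container
  rowHeight: number;      // fixed height of every row
  rowCount: number;       // total number of rows in the data set
  overscan?: number;      // extra rows rendered above and below the viewport
}

interface WindowRange {
  start: number;   // first row index to render (inclusive)
  end: number;     // last row index to render (exclusive)
  offsetY: number; // vertical offset at which the first rendered row sits
}

function visibleRange(input: WindowInput): WindowRange {
  const { scrollTop, viewportHeight, rowHeight, rowCount, overscan = 3 } = input;
  // First row that intersects the top edge of the viewport.
  const first = Math.floor(scrollTop / rowHeight);
  // One past the last row that intersects the bottom edge.
  const lastExclusive = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  const start = Math.max(0, first - overscan);
  const end = Math.min(rowCount, lastExclusive + overscan);
  return { start, end, offsetY: start * rowHeight };
}
```

With 10,000 rows of 40px in a 600px viewport, this mounts about 21 rows instead of 10,000, which is where the rendering win comes from.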
Runtime / interaction layer
- Debounce / throttle rapid events (resize, scroll, input).
- Web workers for CPU-heavy work (parsing, image manipulation, compression) — keeps main thread responsive for INP.
- requestIdleCallback for low-priority background work (analytics flush, log shipping).
- Avoid blocking the main thread: split long tasks (>50ms) with scheduler.yield() where it is supported, or with chunked setTimeout.
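Splitting a long task can be sketched as processing a large array in slices and giving control back to the event loop between slices. This version uses setTimeout as the yield point; the helper names and the default slice size of 500 are assumptions and should be tuned so each slice stays well under 50ms on target devices.

```typescript
// Split an array into consecutive slices of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const parts: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    parts.push(items.slice(i, i + size));
  }
  return parts;
}

// Give the event loop a turn so pending input and rendering can run.
const yieldToMain = (): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, 0));

// Map over `items` without holding the main thread for the whole run.
async function mapChunked<T, R>(
  items: readonly T[],
  fn: (item: T) => R,
  size = 500,
): Promise<R[]> {
  const out: R[] = [];
  for (const part of chunk(items, size)) {
    for (const item of part) out.push(fn(item));
    await yieldToMain(); // yield after each slice
  }
  return out;
}
```

Total work is unchanged or slightly higher; the gain is that input handlers can run between slices, which is what INP measures.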
Data layer
- Cache at HTTP level (Cache-Control, ETag) and app level (React Query / RTKQ).
- Dedup identical concurrent requests.
- Paginate / cursor: don't return 10k rows when 50 will do.
- Stream large responses (SSE for live, ReadableStream for big JSON).
- Optimistic UI: update locally on mutation, reconcile with server.
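Request deduplication, which libraries like React Query handle internally, can be illustrated with a map of in-flight promises. This is a simplified sketch, not how any particular library implements it; dedupe and its fetcher parameter are hypothetical names.

```typescript
// Wraps a fetcher so that concurrent calls with the same key share one
// in-flight promise instead of issuing duplicate requests.
function dedupe<T>(
  fetcher: (key: string) => Promise<T>,
): (key: string) => Promise<T> {
  const inFlight = new Map<string, Promise<T>>();
  return (key) => {
    const existing = inFlight.get(key);
    if (existing) return existing;

    const request = fetcher(key);
    inFlight.set(key, request);

    // Forget the entry once the request settles, whether it succeeded or
    // failed, so later calls fetch fresh data.
    const clear = () => {
      inFlight.delete(key);
    };
    request.then(clear, clear);
    return request;
  };
}

// Hypothetical usage:
//   const getUser = dedupe((id) => fetch(`/api/users/${id}`).then((r) => r.json()));
//   Two components calling getUser("42") in the same tick trigger one request.
```

This only collapses concurrent calls; keeping results around after they settle is the job of the cache layer.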
Process
- Run RUM. Find the worst metric.
- Open DevTools → Performance → record the slow flow.
- Identify the bottleneck (LCP image too big? Long task on render? Waterfall fetch?).
- Fix the biggest one. Re-measure.
- Repeat.
What NOT to do
- Don't useMemo every value — the comparison cost can exceed the recompute cost for cheap operations.
- Don't reach for web workers for tiny tasks — postMessage overhead can eat the win.
- Don't optimize what isn't profiled-slow.
- Don't replace clear code with cryptic micro-optimizations for ~1% wins.
- Don't rely on synthetic benchmarks alone; confirm every win against real-user metrics.
Mental model
Most performance wins at scale come from a handful of changes:
- Right image format + size.
- Smaller initial JS bundle.
- Cache.
- Defer offscreen.
The exotic stuff (web workers, virtualization, concurrent React) helps in specific cases but is rarely the biggest lever.
Follow-up questions
- How do you decide what to memoize vs leave alone?
- When does virtualization start to pay off?
- What's the difference between INP and FID?
- How do you decide between SSR, SSG, and CSR for a slow page?
Common mistakes
- Optimizing without measuring: random improvements that don't move metrics.
- Blanket useMemo/useCallback: adds overhead without saving work.
- Ignoring real-user metrics; only looking at Lighthouse scores.
- Lazy-loading the LCP image (tanks LCP).
- Putting heavy deps in the main bundle 'just in case'.
- Big-O thinking for tiny n: micro-optimizing a 10-item loop.
Performance considerations
- The fastest code is no code. The fastest request is no request. Cache aggressively; defer aggressively; ship less. Compounding wins: smaller bundle → faster TTI → less main-thread blocking → better INP → happier users → better business metrics.
Edge cases
- Low-end Android devices can be 5-10x slower than dev hardware, so always test on a throttled CPU.
- Network: don't assume fast WiFi; test on 'Slow 3G' DevTools throttling.
- Battery / Save-Data signals: opt out of expensive prefetches.
- First-time vs repeat visitors have very different bundles (cache state).
- INP reports roughly the slowest interaction of a page visit (a few outliers are discounted on pages with many interactions), so one slow handler can tank the metric for the whole visit.
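The Save-Data point can be sketched as a small policy function. The ConnectionHints shape mirrors the saveData and effectiveType fields of navigator.connection (Network Information API), which is not available in every browser, so the function treats a missing signal as "unknown". The name shouldPrefetch and the decision to prefetch when nothing is known are assumptions.

```typescript
// Hypothetical input shape mirroring navigator.connection fields.
interface ConnectionHints {
  saveData?: boolean;
  effectiveType?: "slow-2g" | "2g" | "3g" | "4g";
}

function shouldPrefetch(hints: ConnectionHints | undefined): boolean {
  // No signal at all (API unsupported): default to prefetching.
  // This default is a policy choice, not a rule.
  if (!hints) return true;

  // The user explicitly asked for reduced data usage.
  if (hints.saveData) return false;

  // Skip speculative downloads on very slow connections.
  return hints.effectiveType !== "slow-2g" && hints.effectiveType !== "2g";
}

// Hypothetical browser usage:
//   const hints = (navigator as { connection?: ConnectionHints }).connection;
//   if (shouldPrefetch(hints)) prefetchNextRoute();
```

Keeping the decision in a pure function makes the policy easy to test without a browser.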
Real-world examples
- Pinterest cut perceived wait time by 40% and reported a 15% increase in both search traffic and sign-ups.
- BBC moved to lazy-load below-fold images and dropped data transfer by 25%.
- Etsy uses RUM (mPulse) to find slow real-user pages, not just slow synthetic ones.