Performance

How do you optimize frontend performance for an application at scale?

At scale, perf is layered: network (CDN, HTTP/2, compression, image formats), bundle (code splitting, tree-shake, dependency audit), render (avoid expensive re-renders, memoization where it matters, virtualization for long lists), runtime (debounce inputs, web workers for CPU work), and data (caching, dedup, pagination). Measure first (RUM + Core Web Vitals), find the bottleneck, fix the biggest one, repeat. Don't memoize prematurely; don't micro-optimize what isn't slow.

Performance "at scale" usually means high request volume, large bundles, and many users on slow networks or devices. The approach is to measure, isolate the bottleneck, and fix in layers rather than micro-optimizing prematurely.

Step 0: measure first

You can't optimize what you don't measure. Two sources:

Lab data: Lighthouse, WebPageTest, local profiling. Reproducible, but synthetic.
Field data (RUM): Real-user metrics collected with the web-vitals library and sent to analytics. P75/P95 from actual users on actual devices/networks. This is what matters.

Track Core Web Vitals: LCP (Largest Contentful Paint, loading speed), INP (Interaction to Next Paint, responsiveness; replaced FID in March 2024), CLS (Cumulative Layout Shift, visual stability). Add custom marks for app-specific flows (time-to-search-results, time-to-checkout-load).
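
As a rough sketch of how field samples might be summarized, the snippet below computes a nearest-rank percentile and rates it against the published Core Web Vitals thresholds. The function names and the sample values are illustrative assumptions, not part of any library.

```typescript
type Metric = "LCP" | "INP" | "CLS";
type Rating = "good" | "needs-improvement" | "poor";

// Published Core Web Vitals thresholds: [upper bound of "good", upper bound of "needs-improvement"].
const THRESHOLDS: Record<Metric, [number, number]> = {
  LCP: [2500, 4000], // milliseconds
  INP: [200, 500],   // milliseconds
  CLS: [0.1, 0.25],  // unitless score
};

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1] as number;
}

function rate(metric: Metric, value: number): Rating {
  const [good, needsImprovement] = THRESHOLDS[metric];
  if (value <= good) return "good";
  return value <= needsImprovement ? "needs-improvement" : "poor";
}

// Hypothetical LCP samples (ms) reported by real sessions.
const lcpSamples = [1800, 2100, 2400, 3900, 5200, 1500, 2600, 3100];
const p75 = percentile(lcpSamples, 75);
console.log(`LCP p75 = ${p75}ms (${rate("LCP", p75)})`);
```

Rating the P75 rather than the mean keeps a handful of very fast sessions from hiding a slow tail.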

Network layer

CDN: serve static assets and ideally HTML from edge.
HTTP/2 or HTTP/3: multiplexing many requests over one connection sidesteps HTTP/1.1's limit of about six connections per origin; header compression cuts overhead.
Compression: Brotli for text (~20% better than gzip).
Image formats: AVIF / WebP (often 25–50% smaller than JPEG at comparable quality).
Responsive images: srcset + sizes so mobile doesn't download 1600w when 400w fits.
Caching: long max-age on hashed assets; ETag/Last-Modified for HTML; service worker for offline.
Resource hints: preload LCP image + critical fonts, preconnect to third-party origins.
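
The caching rule above (long max-age on hashed assets, revalidation for HTML) could be expressed roughly as follows. This is a minimal sketch: the file-naming pattern and the one-hour fallback TTL are assumptions to adapt to your own build output and server.

```typescript
// Assumes the bundler emits a content hash of 8+ hex characters before the
// extension, e.g. app.3f9a1c2b.js. Adjust the pattern to your build output.
const HASHED_ASSET = /\.[0-9a-f]{8,}\.(js|css|woff2|png|jpg|webp|avif|svg)$/i;

function cacheControlFor(path: string): string {
  if (HASHED_ASSET.test(path)) {
    // The URL changes whenever the content changes, so cache for a year.
    return "public, max-age=31536000, immutable";
  }
  if (path.endsWith(".html") || path.endsWith("/")) {
    // HTML must pick up new asset URLs quickly: always revalidate
    // (a cheap 304 when ETag / Last-Modified still match).
    return "no-cache";
  }
  // Unhashed assets: a short TTL chosen here purely for illustration.
  return "public, max-age=3600";
}

console.log(cacheControlFor("/static/app.3f9a1c2b.js"));
console.log(cacheControlFor("/index.html"));
```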

Bundle layer

Code split per route and per heavy widget. Initial bundle target: <200KB gzipped for typical SaaS.
Tree-shaking: import only what you use (import { debounce } from 'lodash-es', not the whole lodash).
Dependency audit: bundle-analyzer once a quarter; rip out duplicates, replace heavy deps (moment → date-fns or dayjs).
Modern JS for modern browsers: type=module + nomodule fallback. Skip polyfills for old browsers your audience doesn't use.
Defer non-critical CSS: inline above-the-fold, async-load the rest.
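
One way to keep the initial-bundle target honest is a budget check in CI. The sketch below assumes you can extract per-chunk gzipped sizes from your bundler's stats; the Chunk shape, the function name, and the sample numbers are all hypothetical.

```typescript
interface Chunk {
  name: string;
  gzipBytes: number;
  initial: boolean; // true if needed for first render, false if lazy-loaded
}

const BUDGET_BYTES = 200 * 1024; // the <200KB gzipped target mentioned above

function checkInitialBudget(chunks: Chunk[], budget: number = BUDGET_BYTES) {
  const initial = chunks.filter((c) => c.initial);
  const total = initial.reduce((sum, c) => sum + c.gzipBytes, 0);
  // Largest first, so the report points at the best candidates for splitting.
  const largestFirst = [...initial]
    .sort((a, b) => b.gzipBytes - a.gzipBytes)
    .map((c) => c.name);
  return { total, overBy: Math.max(0, total - budget), largestFirst };
}

// Hypothetical build output: "charts" is code-split, so it does not count.
const report = checkInitialBudget([
  { name: "vendor", gzipBytes: 110 * 1024, initial: true },
  { name: "main", gzipBytes: 120 * 1024, initial: true },
  { name: "charts", gzipBytes: 90 * 1024, initial: false },
]);
console.log(report);
```

A non-zero overBy could fail the build, which turns the quarterly audit into a continuous check.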

Render layer (React-specific)

Avoid unnecessary re-renders: stable callbacks (useCallback) and memoized values (useMemo) where they actually save work. Don't blanket-memoize — useMemo itself has cost.
List virtualization: react-window / TanStack Virtual for >100 rows.
Suspend heavy children: React.lazy + Suspense for modals, drawers.
Concurrent features: useTransition for non-urgent updates (filter input → big list re-sort).
Avoid layout thrash: don't interleave reads (offsetWidth) with writes (style.x) in a tight loop.
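
The core of list virtualization is a small piece of arithmetic: given the scroll position, work out which rows need to exist in the DOM. The sketch below assumes fixed-height rows and is not how react-window or TanStack Virtual are implemented internally; the function name and the overscan default are illustrative.

```typescript
interface VisibleRange {
  start: number; // first row index to render (inclusive)
  end: number;   // last row index to render (exclusive)
}

function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan = 3, // extra rows above and below to reduce blank flashes while scrolling
): VisibleRange {
  const first = Math.floor(scrollTop / rowHeight);
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight); // exclusive
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(rowCount, last + overscan),
  };
}

// 10,000 rows of 40px in a 600px viewport, scrolled to 4000px.
const range = visibleRange(4000, 600, 40, 10_000);
console.log(`render rows ${range.start}..${range.end - 1} of 10000`);
```

Only about twenty rows are mounted at a time regardless of list length, which is why the technique pays off as row counts grow.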

Runtime / interaction layer

Debounce / throttle rapid events (resize, scroll, input).
Web workers for CPU-heavy work (parsing, image manipulation, compression) — keeps main thread responsive for INP.
requestIdleCallback for low-priority background work (analytics flush, log shipping); fall back to setTimeout where it is unsupported.
Avoid blocking the main thread: split long tasks (>50ms) with scheduler.yield() where available, or chunked setTimeout.
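
For rapid events such as scroll or resize, a throttle caps how often the handler runs. Below is a minimal leading-edge sketch; the injectable clock parameter is an assumption added to make the behavior easy to test, and production code would more likely use a library helper such as the lodash-es throttle.

```typescript
// Leading-edge throttle: the first call runs immediately, and later calls are
// ignored until intervalMs has elapsed since the last run.
function throttle<A extends unknown[]>(
  fn: (...args: A) => void,
  intervalMs: number,
  now: () => number = Date.now, // injectable clock, for testing
): (...args: A) => void {
  let lastRun = -Infinity;
  return (...args: A) => {
    const t = now();
    if (t - lastRun >= intervalMs) {
      lastRun = t;
      fn(...args);
    }
  };
}

// Usage: handle scroll positions at most once every 100ms.
const logScroll = throttle((y: number) => console.log("scrollY", y), 100);
logScroll(0);  // runs
logScroll(12); // ignored: less than 100ms since the last run
```

Note that this variant drops trailing calls; if the final value matters (for example the last scroll position), a trailing-edge call or a debounce is the better fit.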

Data layer

Cache at HTTP level (Cache-Control, ETag) and app level (React Query / RTKQ).
Dedup identical concurrent requests.
Paginate / cursor: don't return 10k rows when 50 will do.
Stream large responses (SSE for live, ReadableStream for big JSON).
Optimistic UI: update locally on mutation, reconcile with server.
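
Deduplicating identical concurrent requests can be sketched with a map of in-flight promises. Libraries such as React Query do this for you; the version below is only an illustration of the idea, and the dedupe name, the key scheme, and the fake fetcher are assumptions.

```typescript
// Requests currently in flight, keyed by a caller-chosen string (e.g. the URL).
const inFlight = new Map<string, Promise<unknown>>();

function dedupe<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>; // join the request already in flight

  const promise = fetcher().finally(() => {
    // Clear the entry once settled so later calls fetch fresh data.
    inFlight.delete(key);
  });
  inFlight.set(key, promise);
  return promise;
}

// Hypothetical fetcher standing in for a real network call.
let networkCalls = 0;
const loadUser = () => {
  networkCalls++;
  return Promise.resolve({ id: 1, name: "Ada" });
};

const first = dedupe("/api/user/1", loadUser);
const second = dedupe("/api/user/1", loadUser); // same promise, no second call
console.log(first === second, networkCalls);
```

This only collapses requests that overlap in time; keeping results around after they settle is the job of the cache layer.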

Process

Run RUM. Find the worst metric.
Open DevTools → Performance → record the slow flow.
Identify the bottleneck (LCP image too big? Long task on render? Waterfall fetch?).
Fix the biggest one. Re-measure.
Repeat.

What NOT to do

Don't useMemo every value — the comparison cost can exceed the recompute cost for cheap operations.
Don't reach for web workers for tiny tasks — postMessage overhead can eat the win.
Don't optimize what isn't profiled-slow.
Don't replace clear code with cryptic micro-optimizations for ~1% wins.
Don't rely on synthetic benchmarks alone; confirm with real-user metrics.

Mental model

Most performance wins at scale come from a handful of changes:

Right image format + size.
Smaller initial JS bundle.
Cache.
Defer offscreen.

The exotic stuff (web workers, virtualization, concurrent React) helps in specific cases but is rarely the biggest lever.

Follow-up questions

•How do you decide what to memoize vs leave alone?
•When does virtualization start to pay off?
•What's the difference between INP and FID?
•How do you decide between SSR, SSG, and CSR for a slow page?

Common mistakes

•Optimizing without measuring — random improvements that don't move metrics.
•Blanket useMemo/useCallback — adds overhead without saving work.
•Ignoring real-user metrics; only looking at Lighthouse scores.
•Lazy-loading the LCP image (tanks LCP).
•Putting heavy deps in the main bundle 'just in case'.
•Big-O thinking for tiny n — micro-optimizing a 10-item loop.

Performance considerations

•The fastest code is no code. The fastest request is no request. Cache aggressively; defer aggressively; ship less. Compounding wins: smaller bundle → faster TTI → less main-thread blocking → better INP → happier users → better business metrics.

Edge cases

•Low-end Android devices can be 5-10x slower than dev hardware — always test on a throttled CPU.
•Network: don't assume fast WiFi; test on 'Slow 3G' DevTools throttling.
•Battery / Save-Data signals — opt out of expensive prefetches.
•First-time and repeat visitors pay very different load costs because of cache state; measure both.
•INP reports the slowest interaction in a visit (a near-worst one on pages with many interactions), so one slow handler can tank the metric for the whole visit.
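
The Save-Data point above could be handled with a small guard before any speculative prefetch. This is a sketch under assumptions: ConnectionLike mirrors a subset of the Network Information API (navigator.connection), which is not available in every browser, and treating a missing signal as "go ahead" is a policy choice rather than a rule.

```typescript
// Subset of the Network Information API; every field is optional because
// browser support varies.
interface ConnectionLike {
  saveData?: boolean;
  effectiveType?: string; // "slow-2g" | "2g" | "3g" | "4g"
}

function shouldPrefetch(conn?: ConnectionLike): boolean {
  if (!conn) return true;           // no signal available: assumed OK (policy choice)
  if (conn.saveData) return false;  // the user asked for reduced data usage
  return conn.effectiveType !== "slow-2g" && conn.effectiveType !== "2g";
}

// In a browser you might pass (navigator as any).connection here.
console.log(shouldPrefetch({ saveData: true, effectiveType: "4g" }));
console.log(shouldPrefetch({ effectiveType: "4g" }));
```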

Real-world examples

•Pinterest cut perceived wait time by 40% and reported a 15% increase in both SEO traffic and sign-up conversion.
•BBC moved to lazy-load below-fold images and dropped data transfer by 25%.
•Etsy uses RUM (mPulse) to find slow real-user pages, not just slow synthetic ones.

Senior engineer discussion

Seniors lead with measurement (RUM + lab), then prioritize by biggest-lever-first. They distinguish 'feels slow' (interaction latency, INP) from 'loads slow' (LCP, FCP) — they're different fixes. They know which optimizations actually move metrics on real users and which are vanity tuning. They also tie perf to business outcomes (LCP -1s → +X% conversion) to get buy-in for the work.