How do you measure performance in real world projects?

Lab + field together. Lab: Lighthouse CI in PRs (catches regressions before deploy), WebPageTest for deep dives. Field: the web-vitals library running in real users' browsers, reporting via sendBeacon to your analytics; segment by device/network/geography/route. Watch p75 and p95, not averages. Pair with custom marks for product-critical flows. Tie metrics to business KPIs (conversion, bounce) so perf work gets prioritized. Alert on rate of change, not absolute thresholds.

Real-world measurement combines synthetic tests for catching regressions before deploy with field data for prioritizing investments based on actual users.

Lab measurement (synthetic)

Lighthouse CI runs Lighthouse on every PR:

yaml
- name: Lighthouse CI
  uses: treosh/lighthouse-ci-action@v10
  with:
    urls: |
      https://staging.example.com/
      https://staging.example.com/checkout
    budgetPath: ./lighthouse-budget.json
    uploadArtifacts: true

lighthouse-budget.json:

json
[
  {
    "path": "/*",
    "resourceSizes": [
      { "resourceType": "script", "budget": 200 },
      { "resourceType": "image", "budget": 300 }
    ],
    "resourceCounts": [{ "resourceType": "third-party", "budget": 10 }],
    "timings": [{ "metric": "interactive", "budget": 3000 }]
  }
]

Budget sizes are in kilobytes and timings are in milliseconds. The action fails the PR check if a budget is exceeded. Pin Lighthouse to a specific version so scores are comparable across runs.

WebPageTest: deeper analysis — filmstrip, waterfall, request blocking. Great for one-off investigations; too slow for CI.

Field measurement (RUM)

Use the web-vitals package and ship each metric to your analytics endpoint:

js
import { onCLS, onINP, onLCP, onFCP, onTTFB } from 'web-vitals';

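// Report one metric, tagged with the dimensions used later for segmentation.
function send(metric) {
  const body = JSON.stringify({
    name: metric.name,
    value: metric.value,
    rating: metric.rating, // 'good' | 'needs-improvement' | 'poor'
    id: metric.id, // unique per metric instance; lets the backend dedupe repeat reports
    route: location.pathname,
    deviceMemory: navigator.deviceMemory, // not available in every browser
    hardwareConcurrency: navigator.hardwareConcurrency,
    connection: navigator.connection?.effectiveType, // not available in every browser
    saveData: navigator.connection?.saveData,
  });
  // Prefer sendBeacon; fall back to a keepalive fetch if it is missing or refuses the payload.
  (navigator.sendBeacon && navigator.sendBeacon('/api/rum', body)) ||
    fetch('/api/rum', { body, method: 'POST', keepalive: true });
}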

onCLS(send); onINP(send); onLCP(send); onFCP(send); onTTFB(send);

sendBeacon queues the request so it survives page unload. If the browser cannot queue it, sendBeacon returns false and the code falls back to a keepalive fetch.

Custom marks for product flows

Web Vitals are generic. Add marks for user-visible flows:

js
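performance.mark('checkout:start');
// user fills form, submits, gets confirmation
performance.mark('checkout:done');
performance.measure('checkout', 'checkout:start', 'checkout:done');
// Read back the measure just recorded (the last entry with that name)
const dur = performance.getEntriesByName('checkout').pop().duration;
// Report through the same endpoint as the Web Vitals beacon
navigator.sendBeacon('/api/rum', JSON.stringify({ name: 'flow.checkout', value: dur }));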

Segmentation

Aggregated metrics across all users hide real issues. Always segment by:

  • Device class (low/mid/high via deviceMemory, hardwareConcurrency).
  • Network (4g/3g/2g via effectiveType).
  • Country / region.
  • Route (/checkout vs /home have different bars).
  • First visit vs repeat (cache state matters).
  • Logged in vs anonymous.

A "global" p75 LCP of 2s can sit alongside a p95 of 5s, and that tail is usually users in slow markets on slow phones.
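
One way to turn the raw hints in the beacon into a segment label is sketched below. The thresholds (2 GB / 2 cores for "low", 4 GB / 4 cores for "mid") and the function names `deviceClass` and `segmentKey` are assumptions for illustration, not a standard; tune them against your own traffic.

```javascript
// Bucket a RUM sample into a coarse device class.
// Thresholds are hypothetical; adjust them to your own user base.
function deviceClass({ deviceMemory, hardwareConcurrency }) {
  // Either hint can be missing, depending on the browser.
  if (deviceMemory == null && hardwareConcurrency == null) return 'unknown';
  const memory = deviceMemory ?? Infinity;
  const cores = hardwareConcurrency ?? Infinity;
  if (memory <= 2 || cores <= 2) return 'low';
  if (memory <= 4 || cores <= 4) return 'mid';
  return 'high';
}

// Build a grouping key from route, device class, and network type.
function segmentKey(sample) {
  return [sample.route, deviceClass(sample), sample.connection ?? 'unknown'].join('|');
}

console.log(segmentKey({ route: '/checkout', deviceMemory: 1, connection: '3g' }));
// → /checkout|low|3g
```

Computing the class on the server, from the raw values, keeps the option of changing the thresholds later without redeploying the client.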

Percentiles

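p50 (the median) hides the tail. p75 is the percentile Google uses to assess Core Web Vitals: a page counts as "good" when its p75 is within the threshold. p95 catches the worst-affected users. Track both p75 and p95.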
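
To make the difference concrete, here is a small sketch using the nearest-rank percentile definition (one of several in common use; analytics backends may interpolate instead). The sample values are invented for illustration.

```javascript
// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values, p) {
  if (values.length === 0) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Invented LCP samples in milliseconds, with a slow tail.
const lcpSamples = [1200, 1400, 1500, 1600, 1800, 2100, 2400, 3900, 5200, 9800];
const mean = lcpSamples.reduce((sum, v) => sum + v, 0) / lcpSamples.length;

console.log(mean);                       // → 3090
console.log(percentile(lcpSamples, 50)); // → 1800
console.log(percentile(lcpSamples, 75)); // → 3900
console.log(percentile(lcpSamples, 95)); // → 9800
```

The median looks healthy at 1.8s, while p75 and p95 expose the users who are actually waiting.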

Alerting

Alert on rate of change, not absolute thresholds:

  • "p75 LCP for /checkout regressed >10% week-over-week" ← real signal.
  • "LCP > 3s" ← fires constantly during traffic events, so it gets ignored.
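
A rate-of-change check like the first bullet can be sketched as a pure function over two weekly p75 values. The name `checkRegression`, the 10% default threshold, and the minimum sample count of 500 are assumptions chosen for illustration.

```javascript
// Compare this week's p75 with last week's for one segment.
// Returns an alert object on regression, or null when there is nothing to report.
// Defaults are hypothetical: 10% threshold, 500 samples minimum to limit noise.
function checkRegression(
  { segment, previousP75, currentP75, sampleCount },
  { threshold = 0.10, minSamples = 500 } = {}
) {
  if (sampleCount < minSamples || previousP75 <= 0) return null;
  const change = (currentP75 - previousP75) / previousP75;
  if (change <= threshold) return null;
  return {
    segment,
    change,
    message: `p75 for ${segment} regressed ${(change * 100).toFixed(1)}% week-over-week`,
  };
}

const alert = checkRegression({
  segment: 'LCP /checkout',
  previousP75: 2000,
  currentP75: 2300,
  sampleCount: 1200,
});
console.log(alert.message);
// → p75 for LCP /checkout regressed 15.0% week-over-week
```

The minimum sample count matters as much as the threshold: a low-traffic segment can swing by 10% from noise alone.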

CrUX — free real-user data for any URL

CrUX is the Chrome UX Report. The field data section of PageSpeed Insights shows CrUX p75 LCP/INP/CLS for any public URL that gets enough Chrome traffic. Useful for:

  • Benchmarking against competitors.
  • Tracking your own progress over months.
  • Confirming RUM matches independent measurement.
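
The same data is also available programmatically through the CrUX API. The sketch below assumes you have a Google API key and reads only the p75 values; `queryCruxP75` and `extractP75` are hypothetical helper names, and the injectable `fetchImpl` parameter exists only to make the function easy to test.

```javascript
const CRUX_ENDPOINT = 'https://chromeuxreport.googleapis.com/v1/records:queryRecord';

// Pull the p75 values out of a CrUX API response body.
function extractP75(cruxResponse) {
  const metrics = cruxResponse?.record?.metrics ?? {};
  const p75 = (name) => {
    const value = metrics[name]?.percentiles?.p75;
    return value == null ? null : Number(value); // CLS is returned as a string
  };
  return {
    lcp: p75('largest_contentful_paint'),
    inp: p75('interaction_to_next_paint'),
    cls: p75('cumulative_layout_shift'),
  };
}

// Query CrUX for one URL on phones. Returns null when the request fails,
// for example when CrUX has too little data for the URL.
async function queryCruxP75(url, apiKey, fetchImpl = fetch) {
  const response = await fetchImpl(`${CRUX_ENDPOINT}?key=${encodeURIComponent(apiKey)}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url, formFactor: 'PHONE' }),
  });
  if (!response.ok) return null;
  return extractP75(await response.json());
}
```

Running this on a schedule for your own pages and a few competitors gives a trend line that is independent of your own RUM pipeline.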

Tools

| Need | Tool |
| --- | --- |
| PR regression catch | Lighthouse CI |
| Deep one-off investigation | WebPageTest |
| Field data, dashboards | Datadog RUM, New Relic, Sentry Performance |
| Public CrUX | PageSpeed Insights |
| Privacy-friendly RUM | Cloudflare Web Analytics, Plausible |
| Inline measurement | web-vitals + your analytics pipeline |
| Vercel-deployed Next.js apps | Vercel Speed Insights |

Tying to business

Get buy-in by tying metrics to revenue:

  • "p75 LCP -800ms → projected +X% conversion (per Akamai/BBC studies)."
  • "INP regression coincides with -Y% engagement on /checkout."
  • "Mobile India p95 LCP improvement → +Z signups."

Without dollar attribution, perf work loses to feature work.

Process

  1. Ship RUM. Wait a week for baseline.
  2. Find the worst (route × device × geography) p75 metric.
  3. Open DevTools → Performance, reproduce on a throttled CPU.
  4. Identify the dominant cost (image, JS, third-party, server).
  5. Fix. Validate metric movement on real users (not just lab).
  6. Set a CI budget to prevent regression.
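
Step 2 can be sketched as a group-by over collected samples. This assumes each stored sample already carries `deviceClass` and `country` fields (hypothetical names, typically added server-side) and uses a nearest-rank p75; the `minSamples` cutoff is also an assumption, there to keep tiny segments from topping the list.

```javascript
// Nearest-rank 75th percentile.
function p75(values) {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.max(Math.ceil(0.75 * sorted.length), 1) - 1];
}

// Group samples by route × device class × country and rank segments by p75, worst first.
function worstSegments(samples, { metric = 'LCP', minSamples = 3 } = {}) {
  const groups = new Map();
  for (const s of samples) {
    if (s.name !== metric) continue;
    const key = `${s.route} | ${s.deviceClass} | ${s.country}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(s.value);
  }
  return [...groups]
    .filter(([, values]) => values.length >= minSamples) // skip segments too small to trust
    .map(([segment, values]) => ({ segment, count: values.length, p75: p75(values) }))
    .sort((a, b) => b.p75 - a.p75);
}

// Invented samples for illustration.
const lcp = (route, deviceClass, country, value) =>
  ({ name: 'LCP', route, deviceClass, country, value });
const ranked = worstSegments([
  lcp('/checkout', 'low', 'IN', 4000), lcp('/checkout', 'low', 'IN', 5200),
  lcp('/checkout', 'low', 'IN', 6100), lcp('/checkout', 'low', 'IN', 4800),
  lcp('/home', 'high', 'US', 1200), lcp('/home', 'high', 'US', 1500),
  lcp('/home', 'high', 'US', 1300), lcp('/home', 'low', 'IN', 9000),
]);
console.log(ranked[0]);
// → { segment: '/checkout | low | IN', count: 4, p75: 5200 }
```

In production this is usually a query in the analytics backend rather than application code, but the shape of the computation is the same.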

Mental model

Lab = leading indicator (catches regressions). Field = trailing indicator (proves real-user impact). Use both. Trust field for prioritization; trust lab for "did this change break anything."

Follow-up questions

  • Why use sendBeacon over fetch for RUM?
  • How do you avoid alert fatigue on perf metrics?
  • What's the difference between Lighthouse and CrUX?
  • How would you measure performance of a flow that spans multiple pages?

Common mistakes

  • Measuring only in the lab — synthetic perf can pass while real users suffer.
  • Looking at averages instead of p75/p95.
  • Not segmenting — global average hides the slow market.
  • Alerting on absolute thresholds that fire all the time.
  • Forgetting custom marks for product-critical flows.
  • Letting RUM data collect dust — measure to act, not just to measure.

Performance considerations

  • RUM itself has tiny perf cost (beacon, listener). Don't ship redundant analytics SDKs. Sample low-priority data. The biggest cost is misreading the data — chasing the wrong metric wastes weeks of engineering time.

Edge cases

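  • INP is only reported when the user actually interacts (click, tap, or key press); page views with no interaction produce no INP value.
  • SPAs: web-vitals reports per page load, not per route transition, so client-side navigations need explicit instrumentation.
  • Cross-origin iframes have their own perf timing, invisible to the parent page.
  • Bfcache (back/forward cache) restores behave differently from full loads: web-vitals reports a fresh set of metrics after a restore, tagged with navigationType 'back-forward-cache', so segment them separately.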
  • Sampling: at high traffic, 1-10% RUM sample is enough; full sample burns analytics quota.
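
For the sampling point, one common approach is to decide per session rather than per event, so that a sampled session reports all of its metrics. The sketch below hashes a session ID with FNV-1a (a swapped-in choice; any stable hash works) and compares it with the sample rate. `isSampled` and the session ID format are assumptions.

```javascript
// FNV-1a 32-bit hash of a string, returned as an unsigned integer.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Deterministic sampling: the same session ID always gets the same answer.
// rate is a fraction between 0 and 1, e.g. 0.05 for a 5% sample.
function isSampled(sessionId, rate) {
  return fnv1a(sessionId) / 0x100000000 < rate;
}

// Hypothetical usage: only report metrics for sampled sessions.
const sessionId = 'session-abc123';
if (isSampled(sessionId, 0.05)) {
  // onLCP(send); onINP(send); onCLS(send);
}
```

Include the sample rate in the beacon payload so the backend can scale counts back up when it aggregates.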

Real-world examples

  • Pinterest, Etsy, Shopify all publish their RUM-driven perf wins.
  • Vercel Speed Insights is built on web-vitals + per-deploy comparison.
  • Sentry Performance ties RUM to error context.
  • Cloudflare's free web analytics surfaces Core Web Vitals without tracking cookies.

Senior engineer discussion

Seniors lead with field data, use lab for regression catch, segment thoughtfully, and tie metrics to business outcomes. They don't chase Lighthouse scores in isolation; they instrument product flows because Core Web Vitals don't measure those.
