
How would you implement A/B testing on the frontend without affecting current users?

Gate variants behind feature flags / an experimentation platform, assign users to stable buckets via consistent hashing, evaluate server-side (or pre-paint) to avoid flicker, keep the control path unchanged, instrument metrics, and ensure a clean kill switch and exposure logging.


"Without affecting current users" means: the control group's experience is unchanged, there's no flicker, assignment is stable, and you can kill it instantly. A/B testing is feature flags + bucketing + measurement.

1. Assignment — stable and consistent

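  • Consistent (deterministic) hashing: hash a stable id (the user id, or a persistent anonymous id) together with the experiment key into a bucket. The same user always gets the same variant across sessions. They also get it across devices when the id is a logged-in user id. Never assign randomly per render or per session.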
  • Define the population — who's eligible (new users? a region? logged-in?). Everyone else stays on control, untouched.
  • Percentage rollout — start small (1–5%), ramp up. The other 95–99% are unaffected by definition.
  • Respect holdout groups: users reserved as a long-term control must never receive the variant, or the measurement baseline is contaminated.
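
Below is a minimal TypeScript sketch of the assignment rules above. It assumes a hand-rolled 32-bit FNV-1a hash and a hypothetical experiment key; experimentation platforms use their own hashing schemes. Strictly, this is deterministic hashing rather than the ring-based consistent hashing used for distributed caches. It still gives the property that matters here: the same id always maps to the same bucket, on the server and in the browser alike. The names `assignVariant` and `bucketFor`, and the 10,000-bucket granularity, are illustrative choices.

```typescript
type Variant = "control" | "treatment";

interface Assignment {
  inExperiment: boolean; // false: user is outside the experiment, sees control, is not logged
  variant: Variant;
}

const BUCKETS = 10_000; // 0.01% rollout granularity

// 32-bit FNV-1a over UTF-16 code units: fast, non-cryptographic, and
// deterministic in every JS runtime, so server and browser agree.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // 32-bit multiply by the FNV prime
  }
  return hash >>> 0; // reinterpret as unsigned
}

// Salting with the experiment key makes buckets independent across experiments.
function bucketFor(experimentKey: string, unitId: string): number {
  return fnv1a32(`${experimentKey}:${unitId}`) % BUCKETS;
}

function assignVariant(
  experimentKey: string,
  unitId: string,
  rolloutPercent: number,
  eligible: boolean,
): Assignment {
  // Ineligible users never enter the experiment and keep the control experience.
  if (!eligible) return { inExperiment: false, variant: "control" };
  const bucket = bucketFor(experimentKey, unitId);
  const cutoff = Math.round((rolloutPercent * BUCKETS) / 100);
  if (bucket >= cutoff) return { inExperiment: false, variant: "control" };
  // 50/50 split inside the rollout. The bucket is fixed per user, so raising
  // rolloutPercent only adds users and never flips anyone already assigned.
  return { inExperiment: true, variant: bucket % 2 === 0 ? "control" : "treatment" };
}
```

A user's bucket never changes. Raising `rolloutPercent` from 5 to 20 therefore only adds users, and nobody already in the experiment switches variant. Salting with the experiment key matters too: without it, every experiment's first 5% would be exactly the same users.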

2. Delivery — no flicker, control untouched

  • Evaluate server-side / at the edge where possible — the user gets the right variant in the initial HTML, no flash of the control then a snap to variant (the classic A/B "flicker"/FOUC).
  • If client-side, assign before first paint (a blocking head script or bootstrapped flag values) and gate rendering on flag readiness.
  • Don't fork the control path — variant code is additive and behind a flag; if the flag is off or evaluation fails, you fall through to the exact existing code path. Control users literally run the same code as before.
  • Code-split variant code so control users don't even download large variant bundles unnecessarily (or accept a small shared cost).
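
Here is a sketch of the delivery rules. It assumes server-side rendering to HTML strings and a hypothetical flag evaluator passed in as a function. Any error or unexpected value falls through to control. The variant is chosen before the HTML is sent, and the assignment is bootstrapped into the page so client-side hydration agrees with the server. The markup, the `checkout-v2` key, and the `window.__FLAGS__` global are illustrative.

```typescript
type Variant = "control" | "treatment";

// Hypothetical evaluator: returns the variant name for a user, or throws
// if the flag service or config is unavailable.
type Evaluator = (userId: string) => string;

// Any failure or unknown value falls through to control.
function safeVariant(evaluate: Evaluator, userId: string): Variant {
  try {
    return evaluate(userId) === "treatment" ? "treatment" : "control";
  } catch {
    return "control";
  }
}

// Existing control markup: the unchanged code path.
function renderCheckoutControl(): string {
  return `<button class="checkout">Buy now</button>`;
}

// Additive variant markup, reached only when the flag says "treatment".
function renderCheckoutTreatment(): string {
  return `<button class="checkout checkout--express">Express checkout</button>`;
}

// The variant is decided before any HTML is sent, and the assignment is
// bootstrapped so the client hydrates the same variant (no flicker).
function renderPage(evaluate: Evaluator, userId: string): string {
  const variant = safeVariant(evaluate, userId);
  const body = variant === "treatment" ? renderCheckoutTreatment() : renderCheckoutControl();
  // Values come from a fixed union, so no "</script>" escaping is needed here;
  // escape "<" if you ever bootstrap arbitrary strings this way.
  const bootstrap = JSON.stringify({ "checkout-v2": variant });
  return `<!doctype html><html><head><script>window.__FLAGS__ = ${bootstrap};</script></head><body>${body}</body></html>`;
}
```

In a client bundle, `renderCheckoutTreatment` would typically sit behind a dynamic `import()`. Bundlers then split it into its own chunk, so control users never download it.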

3. Use an experimentation platform

LaunchDarkly, Statsig, Optimizely, GrowthBook, Split, or in-house. They provide: bucketing, targeting, gradual rollout, exposure logging (who saw what, when), metric association, and statistical analysis. Don't hand-roll the stats.

4. Measurement — the whole point

  • Exposure events: log when a user actually reaches the point where the variant changes what they see, not merely when they're eligible. That way the analysis compares only users who could have been affected.
  • Define metrics upfront — primary success metric + guardrail metrics (don't improve conversion while tanking performance or error rate).
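  • Tie into analytics and let the platform run significance testing. Don't peek at results and stop as soon as they look significant. Repeated peeking inflates the false-positive rate unless the platform uses a sequential testing method designed for it.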
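
Below is a minimal sketch of exposure logging. It assumes a hypothetical `send` callback standing in for the platform's SDK. The exposure is recorded where the UI actually diverges, and only once per user per experiment. The in-memory `Set` dedupes only within one page session, so a production pipeline would also need server-side deduplication. `ExposureLogger` and `renderCheckoutButton` are illustrative names.

```typescript
interface ExposureEvent {
  experimentKey: string;
  unitId: string;
  variant: string;
  timestamp: number;
}

class ExposureLogger {
  private readonly seen = new Set<string>();
  private readonly send: (event: ExposureEvent) => void;
  private readonly now: () => number;

  constructor(send: (event: ExposureEvent) => void, now: () => number = Date.now) {
    this.send = send;
    this.now = now;
  }

  // Call this where the variant changes what the user sees,
  // not at eligibility checks or app start-up.
  logExposure(experimentKey: string, unitId: string, variant: string): void {
    const dedupeKey = `${experimentKey}:${unitId}`;
    if (this.seen.has(dedupeKey)) return; // one exposure per unit per experiment
    this.seen.add(dedupeKey);
    this.send({ experimentKey, unitId, variant, timestamp: this.now() });
  }
}

// An eligible user who never opens checkout is never logged as exposed.
function renderCheckoutButton(
  logger: ExposureLogger,
  userId: string,
  variant: "control" | "treatment",
): string {
  logger.logExposure("checkout-v2", userId, variant);
  return variant === "treatment" ? "Express checkout" : "Buy now";
}
```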

5. Safety

  • Kill switch — turn the experiment off instantly without a deploy if a variant misbehaves. This is the core "without affecting users" guarantee for variant users too.
  • Monitor error rates and performance per variant, with alerts, so a broken variant is caught and killed before it is ramped further.
  • Clean teardown — when the experiment concludes, ship the winner as the default and delete the flag and the losing branch (experiment flags are tech debt if they linger).
  • Avoid overlapping experiments that confound each other (mutual exclusion groups).
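
This sketch shows the kill switch and mutual exclusion together. It assumes experiment config is fetched at runtime (for example from the experimentation platform or a config endpoint) rather than baked into the bundle, so flipping `killed` takes effect without a deploy. Experiments that share a hypothetical `layer` own disjoint slot ranges, so a user can be in at most one of them. The field names and the 100-slot layer are assumptions.

```typescript
type Variant = "control" | "treatment";

interface ExperimentConfig {
  key: string;
  killed: boolean; // kill switch: when true, everyone gets control
  layer: string; // experiments in the same layer are mutually exclusive
  slotStart: number; // this experiment owns slots [slotStart, slotEnd)
  slotEnd: number;
}

interface Resolution {
  inExperiment: boolean;
  variant: Variant;
}

const SLOTS_PER_LAYER = 100;

// 32-bit FNV-1a: deterministic, so every caller computes the same slot.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

function resolveExperiment(config: ExperimentConfig, unitId: string): Resolution {
  // A killed experiment falls straight through to the unchanged control path.
  if (config.killed) return { inExperiment: false, variant: "control" };
  // The slot is salted with the layer, so all experiments in the layer
  // see the same slot for a user and their ranges cannot overlap.
  const slot = fnv1a32(`${config.layer}:${unitId}`) % SLOTS_PER_LAYER;
  if (slot < config.slotStart || slot >= config.slotEnd) {
    return { inExperiment: false, variant: "control" };
  }
  // Inside its slot range, each experiment splits users with its own salt.
  const variant: Variant = fnv1a32(`${config.key}:${unitId}`) % 2 === 0 ? "control" : "treatment";
  return { inExperiment: true, variant };
}
```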

The framing

"A/B testing is feature flags + consistent bucketing + measurement. Control users are unaffected because variant code is additive behind a flag — flag off means the original code path runs untouched. I'd assign via consistent hashing on a stable id, evaluate server-side to avoid flicker, start at a small percentage, log exposure events, define success + guardrail metrics, keep a kill switch, and clean up the flag when it's done."

Follow-up questions

  • Why is consistent hashing important for bucket assignment?
  • How do you avoid the A/B test 'flicker' of control showing before the variant?
  • What are guardrail metrics and why do they matter?
  • Why must you log exposure events, not just eligibility?

Common mistakes

  • Random per-session assignment, so a user flips between variants.
  • Client-side evaluation causing a flash of the control before the variant.
  • Forking the control path, so a bug in the experiment affects control users.
  • No kill switch; no cleanup of finished experiment flags.
  • Peeking at results and stopping early; no guardrail metrics.

Performance considerations

  • Server-side/edge evaluation removes a client request and the flicker. Code-split variant bundles so control users aren't penalized. Exposure logging must be lightweight. Bucketing should be a cheap local hash, not a network call per render.
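
A minimal sketch of lightweight exposure delivery follows. It assumes a hypothetical `transport` callback and buffers events so they are sent in batches rather than one request each. In a browser, `transport` could call `navigator.sendBeacon(url, JSON.stringify(batch))`. `flush` could run on `visibilitychange` when `document.visibilityState` is `"hidden"`, so the last batch survives the tab closing. The URL and batch size are assumptions.

```typescript
class BatchQueue<T> {
  private buffer: T[] = [];
  private readonly transport: (batch: T[]) => void;
  private readonly maxBatch: number;

  constructor(transport: (batch: T[]) => void, maxBatch = 20) {
    this.transport = transport;
    this.maxBatch = maxBatch;
  }

  // Cheap in-memory append on the hot path; network work happens once per batch.
  enqueue(item: T): void {
    this.buffer.push(item);
    if (this.buffer.length >= this.maxBatch) this.flush();
  }

  // Hand the current buffer to the transport and start a fresh one.
  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.transport(batch);
  }
}
```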

Edge cases

  • Anonymous users with no stable id (need a persistent client id).
  • A user crossing devices mid-experiment.
  • Overlapping experiments confounding each other.
  • A variant that breaks — must be killable instantly.
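
For anonymous users, here is a minimal sketch of a persistent client id. It assumes a hypothetical `KeyValueStore` wrapper and an injected id generator such as `crypto.randomUUID`. In a browser, the wrapper could sit over `localStorage`, whose `getItem` returns `string | null`, or over a first-party cookie set by the server. The storage key name is illustrative.

```typescript
interface KeyValueStore {
  get(key: string): string | null;
  set(key: string, value: string): void;
}

const ANON_ID_KEY = "exp_anon_id"; // hypothetical storage key

// Returns the stored anonymous id, creating and persisting one on first visit,
// so the same browser keeps hashing to the same bucket across sessions.
function getOrCreateAnonId(store: KeyValueStore, generateId: () => string): string {
  const existing = store.get(ANON_ID_KEY);
  if (existing) return existing;
  const id = generateId();
  store.set(ANON_ID_KEY, id);
  return id;
}
```

This id is per device and per browser, so it cannot follow a user across devices. Pick one bucketing unit per experiment and keep it. Bucket pre-login flows on the anonymous id and logged-in flows on the user id, because switching units mid-experiment can flip a user's variant.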

Real-world examples

  • Statsig/LaunchDarkly/GrowthBook running a gradual rollout with consistent bucketing and exposure logging.
  • SSR-evaluated experiments so users never see the control flash before their variant.

Senior engineer discussion

Seniors frame A/B testing as feature flags + consistent bucketing + measurement, and explain 'no impact on current users' precisely: variant code is additive behind a flag so control falls through to the unchanged path. They cover consistent hashing on a stable id, server-side evaluation to kill flicker, exposure logging, guardrail metrics, kill switches, and flag cleanup as the lifecycle discipline.
