
How do you run A/B tests in React applications?

Use a feature-flag/experiment SDK (GrowthBook, Statsig, LaunchDarkly, Optimizely, Vercel Flags, PostHog). Initialize it on app start with a stable user id, then read the variant via a hook, e.g. `const variant = useExperiment('checkout-cta')` (the exact hook name varies by SDK). Render conditionally. Send exposure and conversion events to the same analytics pipeline. With SSR, resolve the variant on the server to avoid hydration flicker. Always have a kill switch.


A/B tests in React: a feature-flag SDK plus discipline around exposure tracking and SSR.

Pick an SDK

| Tool | Note |
| --- | --- |
| GrowthBook | Open source, self-hostable. |
| Statsig | Generous free tier, good DX. |
| LaunchDarkly | Enterprise standard, expensive. |
| Optimizely | Marketer-friendly UI. |
| PostHog | Bundled with product analytics. |
| Vercel Edge Config | Lightweight, edge-native. |

Don't roll your own — bucketing math and analytics integration are non-trivial.

Basic shape

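```tsx
// providers.tsx
import { GrowthBook, GrowthBookProvider } from '@growthbook/growthbook-react';

// `user`, `analytics`, and `App` are assumed to be in scope.
const gb = new GrowthBook({
  apiHost: '...',
  clientKey: '...',
  enableDevMode: import.meta.env.DEV, // Vite env flag
  attributes: { id: user.id, country: user.country }, // stable id => consistent bucket
  // Called when a user is bucketed into an experiment: forward it as an exposure.
  trackingCallback: (experiment, result) => {
    analytics.track('experiment_exposure', { name: experiment.key, variant: result.key });
  },
});
gb.init(); // fetch feature definitions (older SDK versions: gb.loadFeatures())

export const Root = () => (
  <GrowthBookProvider growthbook={gb}>
    <App />
  </GrowthBookProvider>
);
```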
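```tsx
// at usage site
import { useFeatureValue } from '@growthbook/growthbook-react';

function Checkout() {
  // Falls back to 'control' if the SDK hasn't loaded or the flag is missing.
  const variant = useFeatureValue('checkout-cta', 'control');
  return variant === 'b' ? <NewCTA /> : <OldCTA />;
}
```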

SSR / hydration safety

The biggest pitfall: server renders the control, client hydrates with the variant → flicker + mismatch.

Solutions:

  • Resolve the variant on the server (cookie-based user id) and pass it through.
  • Edge middleware (Next.js middleware.ts) rewrites the request based on bucketing.
  • For static pages, pre-render both variants and let the edge serve the one that matches the user's bucket.
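The first option can be sketched framework-agnostically: read the bucket cookie from the request's `Cookie` header on the server and pass the result down as a prop, so the server HTML and the client's first render agree. This is a minimal sketch; the cookie name `exp-1` and the values `a`/`b` mirror the middleware example, and treating `a` as control is an assumption.

```typescript
// Assumption: 'a' is control and 'b' the treatment, matching the middleware's cookie values.
const KNOWN_VARIANTS = ['a', 'b'] as const;
type Variant = (typeof KNOWN_VARIANTS)[number];
const CONTROL: Variant = 'a';

// Parse a raw Cookie header ("k1=v1; k2=v2") into a Map. Values are kept as-is.
function parseCookieHeader(header: string | null | undefined): Map<string, string> {
  const cookies = new Map<string, string>();
  if (!header) return cookies;
  for (const part of header.split(';')) {
    const eq = part.indexOf('=');
    if (eq === -1) continue; // skip malformed fragments
    const key = part.slice(0, eq).trim();
    if (key) cookies.set(key, part.slice(eq + 1).trim());
  }
  return cookies;
}

// Server-side resolution: read the bucket cookie, fall back to control if absent or unknown.
function resolveServerVariant(cookieHeader: string | null | undefined, cookieName = 'exp-1'): Variant {
  const raw = parseCookieHeader(cookieHeader).get(cookieName);
  const known: readonly string[] = KNOWN_VARIANTS;
  return raw !== undefined && known.includes(raw) ? (raw as Variant) : CONTROL;
}
```

On the server you would call something like `resolveServerVariant(request.headers.get('cookie'))` and hand the value to the page component; the client then hydrates with that value instead of deciding again.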
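```ts
// middleware.ts (Next.js)
import { NextResponse, type NextRequest } from 'next/server';
import { hash } from './lib/hash'; // hypothetical helper: deterministic (id, experiment) -> [0, 1)

export function middleware(req: NextRequest) {
  // Reuse the visitor's id so they stay in the same bucket; mint one on the first visit.
  const id = req.cookies.get('uid')?.value ?? crypto.randomUUID();
  const variant = hash(id, 'experiment-1') < 0.5 ? 'a' : 'b';
  const res = NextResponse.next();
  // Persist the id for a year; the variant is recomputable from it.
  res.cookies.set('uid', id, { maxAge: 60 * 60 * 24 * 365, path: '/' });
  res.cookies.set('exp-1', variant, { path: '/' });
  return res;
}
```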
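The middleware relies on a `hash(id, experiment)` helper that it never defines. A minimal sketch of one, assuming FNV-1a followed by the MurmurHash3 finalizer (a choice made here for illustration; production SDKs use their own vendor-specific hashing, so this will not reproduce their assignments). `pickVariant` is a hypothetical extension to uneven splits.

```typescript
// Hypothetical implementation of `hash(id, experiment)`: returns a number in [0, 1).
// FNV-1a (32-bit) followed by the MurmurHash3 finalizer for better bit mixing.
function hash(id: string, experiment: string): number {
  const key = `${experiment}:${id}`; // salting with the experiment decorrelates buckets across tests
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime, 32-bit multiply
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return (h >>> 0) / 4294967296; // unsigned 32-bit -> [0, 1)
}

// Generalizes the 50/50 split to arbitrary weights, e.g. [0.9, 0.1] for a 10% treatment.
function pickVariant<V extends string>(
  id: string,
  experiment: string,
  variants: readonly V[],
  weights: readonly number[],
): V {
  const n = hash(id, experiment);
  let cumulative = 0;
  for (let i = 0; i < variants.length; i++) {
    cumulative += weights[i];
    if (n < cumulative) return variants[i];
  }
  return variants[0]; // weights summing below 1: leftover traffic gets the first variant (control)
}
```

Because the output depends only on the id and the experiment name, no assignment table is needed: every server, edge node, or client that knows the id computes the same bucket.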

Exposure tracking

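Log exposure when a user actually sees a variant (the experimental component renders), not when they are bucketed or when the experiment is configured. Otherwise users who never reach the page dilute the comparison.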

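```tsx
// Manual exposure logging; `analytics` is your tracking client.
useEffect(() => {
  analytics.track('experiment_exposure', {
    name: 'checkout-cta',
    variant,
  });
}, [variant]); // re-fires when the variant changes and on every remount
```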

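Most SDKs fire exposure automatically when the variant is read (GrowthBook, for example, calls its `trackingCallback`). If yours does, skip the manual effect so exposures aren't double-counted.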
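If you do log exposure yourself, the effect fires on every remount, and twice on mount under React StrictMode in development. A minimal per-page-load dedupe keeps the counts clean; `send` is a placeholder for whatever analytics client you use, and this is a sketch rather than any SDK's API.

```typescript
type ExposureEvent = { name: string; variant: string };

// Wraps a send function so each (experiment, variant) pair is logged at most once per tracker.
function createExposureTracker(send: (event: ExposureEvent) => void) {
  const seen = new Set<string>();
  return (name: string, variant: string): boolean => {
    const key = `${name}\u0000${variant}`; // NUL separator avoids accidental key collisions
    if (seen.has(key)) return false; // already logged: remounts become no-ops
    seen.add(key);
    send({ name, variant });
    return true;
  };
}
```

Create the tracker once at module scope and call it from the effect; a later variant change (e.g. after a flag update) still logs, because the pair is new.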

Metric design

  • Primary metric: the thing you predicted would change (conversion rate).
  • Guardrail metrics: things that shouldn't get worse (load time, error rate).
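  • Sample size + duration: calculate before launch, and don't stop the test the first time a fixed-horizon p-value dips below the threshold; repeated peeking inflates false positives.
  • Significance: most experimentation platforms report p-values or Bayesian probabilities directly, and several offer sequential testing, which stays valid under continuous monitoring.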
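The "calculate before launch" step can be sketched with the standard normal-approximation formula for a two-sided, two-proportion z-test. The defaults (alpha = 0.05, 80% power) and the even two-arm split in `durationDays` are assumptions; your platform's calculator may use a different test or correction.

```typescript
// Per-arm sample size for a two-sided two-proportion z-test (normal approximation):
// n = (zAlpha + zBeta)^2 * (p1(1 - p1) + p2(1 - p2)) / (p2 - p1)^2
// zAlpha = 1.959964 (alpha = 0.05, two-sided), zBeta = 0.841621 (80% power).
function sampleSizePerArm(baseline: number, expected: number, zAlpha = 1.959964, zBeta = 0.841621): number {
  const variance = baseline * (1 - baseline) + expected * (1 - expected);
  const effect = expected - baseline;
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / effect ** 2);
}

// Days needed if `dailyUsers` eligible users are split evenly across two arms.
function durationDays(perArm: number, dailyUsers: number): number {
  return Math.ceil((2 * perArm) / dailyUsers);
}
```

For a 10% baseline conversion and a hoped-for lift to 11%, this gives roughly 14,750 users per arm; at 2,000 eligible users a day that is about 15 days. Halving the detectable effect roughly quadruples the sample, which is why small lifts need long tests.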

Engineering hygiene

  • Default to control: when SDK fails to load, render control.
  • Kill switch: keep an on/off flag separate from the rollout percentage so you can send everyone to control instantly, without a deploy.
  • Cleanup: remove flag once shipped. Don't let dead branches accumulate.
  • Don't experiment by default (or bring A/B tests up in an interview unless asked): they slow shipping, especially for small teams.
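The "default to control" and "kill switch" rules combine into one render decision. A minimal sketch, assuming the SDK reports `undefined` before it loads and that the kill switch is a separate boolean flag; both are illustrative conventions, not a specific SDK's behavior.

```typescript
type Assignment = { variant: string | undefined; killSwitch: boolean };

// Decide what to render. Anything unexpected degrades to control:
// kill switch on, SDK not loaded (undefined), or a variant this build doesn't know about.
function variantToRender(a: Assignment, known: readonly string[], control = 'control'): string {
  if (a.killSwitch) return control;
  if (a.variant === undefined) return control;
  return known.includes(a.variant) ? a.variant : control;
}
```

Checking against `known` matters when the experiment config gains a new variant before the code that renders it has shipped.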

Component pattern

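```tsx
import type { ReactNode } from 'react';
import { useFeatureValue } from '@growthbook/growthbook-react';

// Renders the branch for the user's variant; unknown or missing variants fall back to control.
function ExperimentSwitch({
  name,
  variants,
}: { name: string; variants: { control: ReactNode } & Record<string, ReactNode> }) {
  const variant = useFeatureValue<string>(name, 'control');
  return <>{variants[variant] ?? variants.control}</>;
}

// usage
<ExperimentSwitch
  name="checkout-cta"
  variants={{ control: <OldCTA />, b: <NewCTA /> }}
/>
```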

Anti-patterns

  • Bucketing client-side without SSR awareness → flicker.
  • Bucketing on browser fingerprint that changes → users flip variants.
  • Running 10 experiments at once on the same page → interactions confound.
  • Forgetting to clean up shipped experiments → forever-flags.
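One common remedy for overlapping experiments is a mutually exclusive namespace: each experiment on the page owns a disjoint slice of a shared hash range, so no user is in two of them at once (GrowthBook, for instance, calls this feature namespaces). A minimal sketch; the namespace and experiment names are hypothetical, and the hash is an illustrative FNV-1a plus MurmurHash3 finalizer, not any SDK's scheme.

```typescript
// A slice of a shared namespace: users hashed into [start, end) join `experiment`.
type Slice = { experiment: string; start: number; end: number };

// FNV-1a + MurmurHash3 finalizer, scaled to [0, 1). Deterministic per key.
function unitHash(key: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return (h >>> 0) / 4294967296;
}

// Returns the single experiment this user belongs to in the namespace, or null.
// Because slices don't overlap, no user sees two experiments from the same namespace.
function experimentFor(userId: string, namespace: string, slices: readonly Slice[]): string | null {
  const n = unitHash(`${namespace}:${userId}`);
  const hit = slices.find((s) => n >= s.start && n < s.end);
  return hit ? hit.experiment : null;
}
```

The trade-off is traffic: two experiments sharing a namespace each get only their slice, so both take longer to reach their sample size. Within a slice, variant assignment would still use a separate, experiment-salted hash.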

Follow-up questions

  • How do you prevent variant flicker during SSR?
  • Why must exposure events be fired, not just feature reads?
  • What does sequential testing solve compared to fixed-horizon p-values?

Common mistakes

  • Bucketing client-only — produces flicker and breaks SEO.
  • Forgetting to clean up shipped experiments — flag rot.
  • Running overlapping experiments without interaction analysis.

Performance considerations

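  • Client SDKs typically add on the order of 5–30 KB to the bundle.
  • Resolving variants at the edge avoids an extra client round-trip before render.
  • Lazy-load the SDK after the initial paint when no experiment touches above-the-fold content (the LCP element).
  • Cache variant assignments in a cookie so they aren't recomputed or refetched on every request.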

Edge cases

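  • Users with multiple devices: bucketing on an anonymous device or session id can put the same person in different variants on each device; bucket on the logged-in user id when one exists.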
  • Bots / web scrapers can skew metrics — filter at ingest.
  • Privacy regulations (GDPR/CCPA) may require consent before setting some experiment cookies.
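Filtering bots at ingest can be as simple as dropping events whose user agent looks automated before they reach the experiment analysis. A crude sketch: the regex is an illustrative assumption, real bot lists are much longer and change over time, and a substring match like `bot` can occasionally catch a legitimate browser string.

```typescript
// Illustrative pattern only; production filters usually rely on maintained bot lists.
const BOT_UA = /bot|crawler|spider|crawling|headless/i;

function isLikelyBot(userAgent: string): boolean {
  return userAgent === '' || BOT_UA.test(userAgent); // empty UA is also treated as automated
}

// Keep only events that look like they came from real browsers.
function filterForAnalysis<E extends { userAgent: string }>(events: readonly E[]): E[] {
  return events.filter((e) => !isLikelyBot(e.userAgent));
}
```

Apply the same filter to exposure and conversion events; filtering only one side skews the conversion rate.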

Real-world examples

  • Booking.com is famous for running very large numbers of concurrent experiments.
  • Vercel's open-source Flags SDK can read flag configuration from Edge Config for low-latency resolution.
  • A common SaaS stack pairs LaunchDarkly (flags and experiments) with Segment (event pipeline) for end-to-end tracking.

Senior engineer discussion

Senior framing: experiments are organizational tools as much as technical ones. Discuss metric design, sample-size math, guardrails, and cleanup discipline alongside the React mechanics. Lack of process around experiments produces lots of code but no learning.