System Design
hard
senior

How would you design a frontend feature flag system?

Use two flag types: release flags (kill switch, gradual rollout; short-lived) and experiment flags (A/B tests with variants and metrics; bounded life). Evaluate server-side when possible to avoid a flash of the wrong UI on load; deliver the resolved values to the client via a context provider, and assign users with a stable hash of the user or session id. Bake in targeting rules, percentage rollouts, sticky assignment, defaults that fail safe, dashboards, and a process to remove dead flags.


Feature flags are easy to add and impossible to remove. Designing the system means designing the lifecycle, not just the runtime check.

Two flag types — keep them separate

Release flags. Ship code dark; toggle on for some / all users; remove the flag once stable. Short-lived (days/weeks). Used for: gradual rollout, kill-switch, ops control.

Experiment flags. A/B test variants, measure metrics, pick a winner. Bounded life (1–4 weeks typical), then promote winner and remove.

Mixing them — "is this user in the new flow?" being both a rollout gate AND an experiment — is the path to flag chaos. Use different namespaces, dashboards, and ownership.
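
One way to enforce the separation in code is a discriminated union, so each flag type carries its own required fields and its own key namespace. This is a minimal sketch; the field names and the "release." / "exp." prefixes are assumptions for illustration, not a convention from any particular vendor.

```typescript
// Hypothetical flag definitions: the `kind` field keeps release and
// experiment flags in separate shapes with separate required fields.
interface ReleaseFlag {
  kind: "release";
  key: string; // e.g. "release.new-checkout"
  rolloutPercent: number; // 0-100
}

interface ExperimentFlag {
  kind: "experiment";
  key: string; // e.g. "exp.checkout-cta-copy"
  variants: string[]; // e.g. ["control", "treatment"]
  primaryMetric: string; // what the experiment is judged on
}

type FlagDefinition = ReleaseFlag | ExperimentFlag;

// Reject keys that sit in the wrong namespace for their kind.
function validateNamespace(flag: FlagDefinition): boolean {
  const prefix = flag.kind === "release" ? "release." : "exp.";
  return flag.key.indexOf(prefix) === 0;
}
```

A check like this could run in CI or in the flag-creation tool, so a rollout gate cannot quietly become an experiment.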

The evaluation question: server, edge, or client?

Location | Pros | Cons
Server (during SSR) | No flash, encoded in HTML | Need user id at request time
Edge (CDN with vary key) | Fast, no SSR cost | Vary keys multiply cache entries
Client (after first paint) | Simple, low infra | UI flicker, blocked first-paint logic

Default to server-side evaluation. Resolve flags during SSR, embed the resolved values in the server-rendered HTML, and hydrate from them. This avoids the "user briefly sees the old UI then it flips" experience.

If you must evaluate client-side, bootstrap with defaults that are safe (the conservative path) so first paint isn't blocked.
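
The embed-and-hydrate step might look like the sketch below. The global name `__FLAGS__` and both function names are assumptions for illustration; the caller is expected to pass `window.__FLAGS__` into the client-side reader.

```typescript
type ResolvedFlags = Record<string, boolean>;

// Server side: serialize the resolved flags into an inline script.
// Escaping "<" keeps a key or value from closing the script tag early.
function renderFlagScript(flags: ResolvedFlags): string {
  const json = JSON.stringify(flags).replace(/</g, "\\u003c");
  return `<script>window.__FLAGS__=${json}</script>`;
}

// Client side: merge the inlined values over safe defaults, so any flag
// the server did not send resolves to the conservative path.
function readBootstrappedFlags(
  inlined: ResolvedFlags | undefined,
  safeDefaults: ResolvedFlags
): ResolvedFlags {
  return { ...safeDefaults, ...(inlined ?? {}) };
}
```

Because the defaults are applied synchronously, first paint never waits on the flag service.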

Data flow

text
flag service (LaunchDarkly / GrowthBook / Statsig / self-hosted)
     ↓ rule definitions (push or pull)
SSR server   →    HTML with resolved flags inlined
                       ↓
                Browser (FlagProvider context)
                       ↓
                useFlag("new-checkout")

Sticky assignment. A user must get the same variant across sessions, devices, and re-fetches. Hash a stable key (user id, or anonymous id from a long-lived cookie) into the bucket:

ts
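// murmur3 is assumed to be a 32-bit MurmurHash3 helper shared by server and client.
declare function murmur3(key: string): number;

function bucket(userId: string, flag: string, pct: number): boolean {
  // >>> 0 coerces to unsigned so the modulo is never negative.
  const hash = murmur3(`${flag}:${userId}`) >>> 0;
  return (hash % 100) < pct;
}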

The hash function must produce identical output on server and client. Otherwise SSR renders one variant, the client computes another, and the UI flips during hydration.
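
For a version that runs without any dependency, the sketch below swaps MurmurHash3 for 32-bit FNV-1a, which is short enough to inline and uses only integer arithmetic. The substitution and the names `fnv1a` and `inRollout` are assumptions for illustration; any hash works as long as server and client ship the same one.

```typescript
// 32-bit FNV-1a over UTF-16 code units. Pure arithmetic, so the same
// source runs unchanged in Node and in the browser.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned
}

// True when the user falls inside the first `pct` of 100 buckets.
// Including the flag name in the key gives each flag its own shuffle.
function inRollout(userId: string, flag: string, pct: number): boolean {
  return fnv1a(`${flag}:${userId}`) % 100 < pct;
}
```

Raising `pct` from 5 to 25 only adds users: everyone already in the first 5 buckets stays in, which is what makes a gradual rollout sticky.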

The API surface

tsx
// hook
const showNewCheckout = useFlag("new-checkout", { default: false });

// component (pair with lazy-loaded children to get code splitting)
<Flag name="new-checkout" fallback={<OldCheckout />}>
  <NewCheckout />
</Flag>

// imperative (for non-React, e.g., analytics middleware)
flags.isEnabled("new-checkout", { userId });

Defaults that fail safe. The default should be the boring, known-good path. If the flag service is down or the value is missing, render the legacy experience.
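
A fail-safe lookup can be sketched as below. `createFlagClient` and its `load` callback are hypothetical stand-ins for whatever fetches values from the flag service; the load is synchronous here only to keep the sketch short, and a real client would fetch asynchronously.

```typescript
type FlagValues = Record<string, boolean>;

// Hypothetical client: `load` stands in for the call to the flag service.
function createFlagClient(load: () => FlagValues) {
  let values: FlagValues = {};
  return {
    // Refresh values; on failure keep the last known-good snapshot.
    refresh(): void {
      try {
        values = load();
      } catch {
        // Service down: keep previous values, do not throw into the UI.
      }
    },
    // Missing or non-boolean values resolve to the caller's default.
    isEnabled(name: string, fallback: boolean): boolean {
      const value = values[name];
      return typeof value === "boolean" ? value : fallback;
    },
  };
}
```

The call site always supplies the fallback, so the legacy path is chosen at the point of use rather than buried in configuration.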

Targeting rules

The runtime needs to evaluate against attributes:

  • User id (for sticky pct rollout).
  • User attributes (plan tier, region, beta opt-in).
  • Session attributes (device, locale, app version).
  • A per-flag salt mixed into the hash, so control vs treatment assignment in one flag is independent of every other flag.

Rule examples: "Enable for 5% of pro users in US"; "Enable for users in this allow-list".
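
The two rule examples can be expressed with a small evaluator like the sketch below. The rule shape and the name `matchesRule` are assumptions for illustration; commercial flag services have richer rule languages. The bucketing function is passed in so server and client can share one hash.

```typescript
interface UserContext {
  userId: string;
  attributes: Record<string, string>; // e.g. { plan: "pro", region: "US" }
}

interface TargetingRule {
  allowList?: string[]; // user ids that always match
  match?: Record<string, string>; // attributes that must all be equal
  percent?: number; // 0-100, applied after `match`
}

// `bucketOf` must return a stable integer in [0, 100) for a given key.
function matchesRule(
  rule: TargetingRule,
  user: UserContext,
  flagKey: string,
  bucketOf: (key: string) => number
): boolean {
  if (rule.allowList && rule.allowList.indexOf(user.userId) !== -1) return true;
  const required = rule.match ?? {};
  for (const attr of Object.keys(required)) {
    if (user.attributes[attr] !== required[attr]) return false;
  }
  // With no percentage, a rule matches only if it named attributes.
  if (rule.percent === undefined) return rule.match !== undefined;
  return bucketOf(`${flagKey}:${user.userId}`) < rule.percent;
}
```

"Enable for 5% of pro users in US" becomes `{ match: { plan: "pro", region: "US" }, percent: 5 }`, and the allow-list rule becomes `{ allowList: [...] }`.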

Lifecycle — the part everyone skips

A flag system without a cleanup process accumulates flags forever. After 18 months: 200 flags, 30% dead code paths, nobody knows what's safe to remove.

Process to enforce.

  1. Required metadata per flag: owner, created date, expected removal date, purpose (release / experiment / ops).
  2. Stale-flag alerts: any flag older than 90 days that sits at 100% or 0% rollout is dead; a bot opens a removal PR.
  3. Two-step removal: first remove the flag's check in code (always-on or always-off); deploy; then remove the definition in the flag service. This avoids a race during deploy, where running code still asks for a flag that no longer exists.
  4. Code review rule: every new flag PR must include a removal plan.
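
The metadata and the stale-flag rule above can be sketched as a type plus a filter. Field names are assumptions for illustration; the 90-day threshold and the 0% / 100% condition come from the process described here.

```typescript
interface FlagMetadata {
  key: string;
  owner: string;
  purpose: "release" | "experiment" | "ops";
  createdAt: Date;
  expectedRemoval: Date;
  rolloutPercent: number; // 0-100
}

const STALE_AFTER_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// A flag is stale when it is older than the threshold and its rollout
// is pinned at 0% or 100%, so the check no longer changes behavior.
function findStaleFlags(flags: FlagMetadata[], now: Date): FlagMetadata[] {
  return flags.filter((flag) => {
    const ageDays = (now.getTime() - flag.createdAt.getTime()) / MS_PER_DAY;
    const pinned = flag.rolloutPercent === 0 || flag.rolloutPercent === 100;
    return ageDays > STALE_AFTER_DAYS && pinned;
  });
}
```

A scheduled job could run this over the flag inventory and hand each result to its `owner`.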

The release-flag workflow

  1. PR adds new code behind flag, default off.
  2. Merge to main; deploy. Code is dark.
  3. Enable for internal users (employee email allow-list).
  4. Enable for 1% → 5% → 25% → 50% → 100% over days.
  5. Monitor error rates and product metrics per cohort.
  6. At 100% stable for a week, schedule removal.
  7. PR removes the flag check; code becomes default.
  8. Delete the flag definition.
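
The ramp and the monitoring step of the release workflow can be tied together with a small guard. This is a sketch under assumptions: the `CohortHealth` shape, the error-rate comparison, and the `maxErrorDelta` threshold are hypothetical, and a real rollout would look at more than one signal.

```typescript
const RAMP_STAGES = [1, 5, 25, 50, 100];

interface CohortHealth {
  flagOnErrorRate: number; // e.g. 0.012 means 1.2% of requests failed
  flagOffErrorRate: number;
}

// Returns the next rollout percentage, or 0 (kill switch) when the
// flagged cohort is noticeably less healthy than the baseline.
function nextRolloutPercent(
  current: number,
  health: CohortHealth,
  maxErrorDelta: number
): number {
  if (health.flagOnErrorRate - health.flagOffErrorRate > maxErrorDelta) return 0;
  for (const stage of RAMP_STAGES) {
    if (stage > current) return stage;
  }
  return 100; // already fully rolled out
}
```

Comparing the flag-on cohort against the flag-off cohort, rather than against a fixed number, keeps an unrelated site-wide incident from killing the rollout.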

The experiment-flag workflow

  1. Define hypothesis + primary metric.
  2. Set variant split (e.g., 50/50).
  3. Run for N users / N days for statistical power.
  4. Analyze; pick winner.
  5. Ship the winner; delete the loser path.

You need a stats layer (confidence intervals, p-values, sequential testing). Don't roll your own; use GrowthBook / Statsig / Optimizely.
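
The statistics belong to a platform, but the assignment step is simple enough to sketch. The code below maps a stable hash onto weighted variants; `assignVariant` and the `hashToUnit` parameter are hypothetical names, and the hash is injected so the same one can run on server and client.

```typescript
interface Variant {
  name: string;
  weight: number; // relative share, e.g. 50 and 50
}

// `hashToUnit` must map a key to a stable number in [0, 1).
function assignVariant(
  userId: string,
  experimentKey: string,
  variants: Variant[],
  hashToUnit: (key: string) => number
): string {
  const total = variants.reduce((sum, v) => sum + v.weight, 0);
  const point = hashToUnit(`${experimentKey}:${userId}`) * total;
  let cumulative = 0;
  let chosen = "control"; // fallback name when the list is empty
  for (const variant of variants) {
    cumulative += variant.weight;
    chosen = variant.name;
    if (point < cumulative) break; // first range that contains the point
  }
  return chosen;
}
```

Changing the weights mid-experiment moves the range boundaries and reassigns some users, which is one reason the split is fixed before the run starts.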

Pitfalls

  1. Flag depth. Code paths nested 3 flags deep are untestable. Limit nesting; refactor into composable branches when needed.
  2. Flag-coupled state. Storing data in a shape that differs per variant is dangerous: on rollback, each branch must be able to read data the other one wrote.
  3. Hash drift. Changing the hash function or seed reshuffles users; an experiment in flight gets invalidated.
  4. Client-side only. Easy mode; opens the door to flicker, bundle-bloat, and tampering (user toggles flag in DevTools). Server-side at minimum for sensitive flows.
  5. Default-on flags. The new path goes live with the deploy itself, so the only rollback for users already on it is another deploy. Keep defaults off until tested.
  6. No removal owner. "I'll clean up later" never happens. Owner + expiry up front.

Tooling — build vs buy

Need | Build | Buy
10 flags, 10 engineers | YAML config in repo, deploy on change | overkill
100 flags, 50 engineers | Self-host (Unleash, GrowthBook OSS) | LaunchDarkly / Statsig
Experiment statistics | Hard; use a library | GrowthBook / Statsig built-in

Build when flags are simple, infrequent, or compliance-restricted. Buy when you need targeting rules, gradual rollouts with monitoring, and experimentation in one place.

Senior framing. A candidate who can describe (1) separation of flag types, (2) server-side evaluation to avoid flicker, (3) sticky hashing for consistent assignment, (4) a lifecycle / cleanup process, and (5) build-vs-buy with reasons is answering at a senior level. The "we use process.env.FEATURE_X" answer is junior.

Follow-up questions

  • Why is server-side evaluation usually preferred?
  • What metadata does every flag need to avoid permanent accumulation?
  • How would you handle a long-running experiment fairly across new users joining mid-experiment?
  • What's the trade-off between LaunchDarkly and self-hosted Unleash?

Common mistakes

  • Mixing release flags and experiment flags in one system.
  • Client-only evaluation causing flicker.
  • No cleanup process — flag debt accumulates.
  • Defaults that fail-open into unfinished code paths.

Performance considerations

  • Edge evaluation needs cache vary keys per flag combination — can balloon CDN entries.
  • Client SDKs have a small initial payload + a streaming connection for live updates — budget the kB.
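
The cache blow-up at the edge is easy to see in a sketch. The key format and both function names below are assumptions for illustration; the point is that only flags that actually change the response should enter the key.

```typescript
// Build a canonical cache-key segment from resolved flag values.
// Sorting makes the key independent of the order flags were listed in.
function flagVaryKey(
  flags: Record<string, boolean>,
  cacheRelevant: string[]
): string {
  return cacheRelevant
    .slice()
    .sort()
    .map((name) => `${name}=${flags[name] === true ? 1 : 0}`)
    .join("&");
}

// n boolean flags in the key can produce up to 2^n cache entries per URL.
function maxCacheVariants(flagCount: number): number {
  return Math.pow(2, flagCount);
}
```

Ten boolean flags in the key allow up to 1024 entries per URL, so keeping the `cacheRelevant` list short matters more than any other tuning.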

Edge cases

  • Anonymous → logged-in user assignment continuity.
  • Flag rollout during a deploy — code expects flag to exist before the value is published.
  • Multi-region: flag changes propagate asynchronously; brief inconsistency between regions.
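
For the anonymous → logged-in case, one possible approach is to keep bucketing on the anonymous id the user was first seen with, so the variant does not change at login. This is a sketch of that single design choice, not the only option; the `firstSeenAnonymousId` field is hypothetical and would have to be stored on the account at first login.

```typescript
interface Identity {
  anonymousId: string; // from a long-lived cookie
  userId?: string; // present once logged in
  firstSeenAnonymousId?: string; // hypothetical: saved on the account at first login
}

// Pick the key that gets hashed into a bucket.
function bucketingKey(identity: Identity): string {
  // Logged out: the cookie id is the only stable key available.
  if (identity.userId === undefined) return identity.anonymousId;
  // Logged in: prefer the id the user was originally bucketed with,
  // so assignment carries over to other devices after login.
  return identity.firstSeenAnonymousId ?? identity.userId;
}
```

The trade-off is that a user who first logs in on a second device may see one variant before login and another after it, since that device's cookie id differs from the stored one.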

Real-world examples

  • LaunchDarkly, GrowthBook, Statsig, Optimizely, Vercel Toolbar flags.
  • GitHub's `feature.enabled` system, Shopify's stages of rollout.
