Feature flags at scale

Use a flag service (LaunchDarkly/Unleash/Flagsmith or in-house) with typed flag definitions, evaluate flags with sensible defaults, support targeting/gradual rollout/kill switches, evaluate server-side where possible to avoid flicker, and enforce flag lifecycle hygiene so they don't become permanent tech debt.

7 min read·~12 min to think through

Feature flags decouple deploy from release. At scale, the hard parts aren't toggling a boolean — they're consistency, performance, and lifecycle hygiene.

Flag types (they're not all the same)

Release flags — short-lived, gate an in-progress feature. Delete after launch.
Ops / kill switches — long-lived, instantly disable a feature in an incident.
Experiment flags — A/B tests, tied to analytics.
Permission / entitlement flags — per-plan or per-tenant; effectively config, long-lived by design.

Treating all four the same is a common mistake — their lifecycles differ.

Architecture

A flag service — LaunchDarkly / Unleash / Flagsmith / Statsig, or in-house. It owns flag definitions, targeting rules, and rollout state.
SDK in the app — fetches the flag ruleset, evaluates locally (fast), updates via streaming/polling.
Typed flag registry — flags defined in code with types and explicit defaults, so a missing/failed flag fetch falls back safely.

Evaluation: server-side where you can

Server-side / SSR evaluation avoids the client-side flicker where the UI renders the "off" state, then snaps to "on" after the flag loads.
If evaluating client-side, gate rendering on flag readiness or use sane defaults to avoid flash.
Bootstrapping — ship initial flag values with the HTML/SSR payload.

Targeting & rollout

Percentage rollouts, ring deployments (internal → beta → GA), targeting by user/tenant/region/plan.
Consistent bucketing — hash the user id so a given user always gets the same variant (no flickering between sessions).
Kill switch — every risky feature ships behind a flag you can flip off without a deploy.

Performance & reliability

Evaluate locally from a cached ruleset — never a network call per flag check.
Fail safe — if the flag service is down, fall back to defaults; the app must not break.
Bound the ruleset size; lazy-load experiment configs.

The discipline that actually matters: lifecycle hygiene

Flags rot. Without process you get hundreds of stale flags, dead code paths, and untestable combinatorial state.

Every release flag gets an owner and an expiry/cleanup ticket at creation.
Audit and remove stale flags regularly; lint for flags older than N days.
Test the flag-on and flag-off paths.
Don't nest flags into combinatorial explosions.

Summary

A flag service + typed registry with safe defaults + server-side evaluation + consistent bucketing gets you correctness and performance. Lifecycle hygiene is what keeps flags from becoming the worst tech debt in the codebase.

Follow-up questions

•How do you avoid the client-side flag flicker (FOUC of the wrong variant)?
•What happens if the flag service is unreachable?
•Why does consistent bucketing matter?
•How do you stop feature flags from becoming permanent tech debt?

Common mistakes

•No defaults — a failed flag fetch breaks or flickers the UI.
•Client-side-only evaluation causing a flash of the wrong variant.
•Never cleaning up flags — hundreds of stale flags and dead code paths.
•Treating release flags, kill switches, and entitlements as the same thing.

Performance considerations

•Flags must evaluate locally from a cached ruleset — a network round-trip per check is unacceptable. Server-side evaluation removes client flicker and an extra request. Keep the ruleset payload bounded; stream updates rather than poll aggressively.

Edge cases

•Flag service outage — must fail safe to defaults.
•A user's variant changing between sessions (inconsistent bucketing).
•Nested/dependent flags creating untested combinations.
•Flags evaluated differently on server vs client.

Real-world examples

•LaunchDarkly/Unleash gating a gradual rollout with a one-click kill switch.
•SSR-evaluated flags so logged-in users never see the wrong variant flash.

Senior engineer discussion

Seniors classify flags by lifecycle (release / ops / experiment / entitlement), insist on typed registries with safe defaults and fail-safe behavior, and prefer server-side evaluation to kill flicker. The senior signal is treating lifecycle hygiene — owners, expiry, cleanup, testing both paths — as the central long-term problem, not the toggling mechanism.