How would you implement feature flags safely?

Evaluate flags server-side when possible, ship a typed flag client, default to OFF, kill-switch every change, and prune flags aggressively after launch.

Feature flags are how mature teams decouple deploy from release: you can ship code dark, enable it for 1% of users, watch metrics, and roll forward or kill the change without redeploying. Done badly, they create runtime hazards (a stale flag fires in production six months later), security issues (flag names leak product plans), and a codebase littered with dead branches no one dares delete. A safe implementation has roughly seven pieces.

1. A single source of truth. Use a flag service (LaunchDarkly, Statsig, GrowthBook, ConfigCat, Optimizely, or a homegrown service backed by a database with auditing). Never hardcode flag conditions in components, and do not use environment variables as flags: they cannot be changed at runtime, and they couple flagging to deploy. The service should expose a typed config, an audit log, and a rollback button.

2. Server-side evaluation when possible. Evaluate flags on the server (in Next.js: RSC, route handlers, edge middleware) and embed the resolved booleans into the rendered HTML. This avoids client flicker (the visible "v1 → v2 swap"), keeps unreleased flag names out of the JS bundle (otherwise a name like christmas_promo_2026 shows up in the browser's Sources tab), and keeps the SSR cache key consistent with the variant. For Next.js, libraries like Vercel's @vercel/flags and GrowthBook's SDK handle this idiomatically.

3. A typed flag client. Codegen a TS object from the flag definitions so flags.checkoutV2 is type-safe and flags.get("checkuot-v2") (typo) is a compile error. When a flag is removed from the config, every consumer becomes a type error, surfacing the dead code automatically. Without this, flag references drift and you can't tell which ones are still live.
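A minimal sketch of what such a generated client might look like. The flag names, the FLAG_DEFAULTS object, and the createFlagClient helper are all hypothetical; a real codegen step would emit the definitions from the flag service's config.

```typescript
// Hypothetical codegen output: one entry per flag defined in the flag service,
// each with its safe default (OFF).
const FLAG_DEFAULTS = {
  checkoutV2: false,
  newAddressForm: false,
} as const;

// The set of valid flag names is derived from the definitions, so it cannot drift.
type FlagName = keyof typeof FLAG_DEFAULTS;
type FlagValues = Record<FlagName, boolean>;

// Build a typed client from whatever the flag service resolved for this request.
// Unknown keys are ignored; missing or non-boolean values keep the default.
function createFlagClient(resolved: Record<string, unknown>): FlagValues {
  const values: FlagValues = { ...FLAG_DEFAULTS };
  for (const name of Object.keys(FLAG_DEFAULTS) as FlagName[]) {
    const raw = resolved[name];
    if (typeof raw === "boolean") values[name] = raw;
  }
  return values;
}

const flags = createFlagClient({ checkoutV2: true, staleFlag: true });
// flags.checkoutV2  -> typed as boolean
// flags.checkuotV2  -> compile error: property does not exist on FlagValues
```

Deleting checkoutV2 from the definitions turns every remaining flags.checkoutV2 reference into a compile error, which is the cleanup signal described above.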

4. Default OFF + kill switch. Every flag must have a "kill" value that resolves to the previous, known-good behavior. In a call like isEnabled('newCheckout', { fallback: false }), the fallback path is the legacy code, and it stays unconditionally tested. Test the kill path in CI by running the suite once with all flags off. The kill switch should be reachable without a deploy and should ideally take effect in under 60 seconds end to end.
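One way this could look in code, as a sketch only: makeIsEnabled, the FlagSource type, and the forceAllOff switch are hypothetical names, and the source function stands in for the real flag service lookup.

```typescript
// Stand-in for the flag service lookup; may return undefined or throw.
type FlagSource = (name: string) => boolean | undefined;

// Hypothetical evaluator factory. `forceAllOff` models the "all flags off"
// run in CI (or a global incident switch).
function makeIsEnabled(source: FlagSource, forceAllOff = false) {
  return function isEnabled(
    name: string,
    opts: { fallback: boolean } = { fallback: false },
  ): boolean {
    if (forceAllOff) return false;
    try {
      const value = source(name);
      // Anything other than an explicit boolean resolves to the known-good fallback.
      return typeof value === "boolean" ? value : opts.fallback;
    } catch {
      // A failing flag lookup must not take the request down with it.
      return opts.fallback;
    }
  };
}

// The legacy branch is the fallback path and stays in the test suite.
function checkout(isEnabled: (name: string) => boolean): string {
  return isEnabled("newCheckout") ? "checkout-v2" : "checkout-legacy";
}

const live = makeIsEnabled((name) => name === "newCheckout");
const killed = makeIsEnabled((name) => name === "newCheckout", true);
const broken = makeIsEnabled(() => {
  throw new Error("flag service down");
});
```

Here checkout(live) takes the new path, while checkout(killed) and checkout(broken) both land on the legacy path.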

5. Lifecycle hygiene. Every flag gets an owner, an expiry date, and a removal ticket filed at creation time. A weekly Slack reminder nags the owners of expired flags, or an automated PR removes them. Long-lived "permanent" flags (kill switches for infrastructure dependencies, premium tier gates) are tagged differently so they don't get pruned. Without this, a 3-year-old codebase has 400 dormant flags and the cyclomatic complexity of a fractal.
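The expiry check behind such a reminder can be small. The sketch below assumes a hypothetical FlagMeta record per flag; the field names and sample data are illustrative, not any particular service's schema.

```typescript
// Hypothetical metadata stored alongside each flag at creation time.
interface FlagMeta {
  name: string;
  owner: string;
  expiresOn: string; // ISO date, e.g. "2025-03-01"
  removalTicket: string;
  permanent?: boolean; // long-lived kill switches and tier gates
}

// Returns the flags that are past their expiry date and not tagged permanent.
function findExpiredFlags(flags: FlagMeta[], today: Date): FlagMeta[] {
  return flags.filter(
    (f) => !f.permanent && new Date(f.expiresOn).getTime() < today.getTime(),
  );
}

const registry: FlagMeta[] = [
  { name: "checkoutV2", owner: "payments", expiresOn: "2025-03-01", removalTicket: "TICKET-1" },
  { name: "betaSearch", owner: "search", expiresOn: "2025-12-01", removalTicket: "TICKET-2" },
  { name: "paymentsKillSwitch", owner: "infra", expiresOn: "2024-01-01", removalTicket: "TICKET-3", permanent: true },
];

const expired = findExpiredFlags(registry, new Date("2025-06-01T00:00:00Z"));
```

A scheduled job could feed the result into a Slack message or a cleanup PR, grouped by owner.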

6. Targeting & rollout. Support multiple rules: user id, percentage rollout, environment, geography, plan tier, and a query-param override (?ff=checkoutV2) for QA. Always hash on a stable user id (or anonymous session id) so a user gets the same variant across visits; sticky bucketing matters for honest A/B tests. Percentage rollouts should be deterministic: hash(userId + flagName) % 100 < pct. Including the flag name in the hash input keeps the same users from landing in the early cohort of every rollout.
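The deterministic bucketing rule can be sketched as follows. The text does not name a hash function, so FNV-1a is swapped in here purely as an assumption because it needs no dependencies; any stable hash would do. The function names are hypothetical.

```typescript
// FNV-1a (32-bit), applied to UTF-16 code units rather than bytes.
// Good enough for bucketing; not a cryptographic hash.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// True when the user falls inside the first `pct` of 100 buckets for this flag.
function inRollout(userId: string, flagName: string, pct: number): boolean {
  // The separator keeps ("ab", "c") and ("a", "bc") from producing the same input.
  return fnv1a(`${flagName}:${userId}`) % 100 < pct;
}
```

Because the bucket depends only on the user id and flag name, raising pct from 10 to 50 keeps every user who was already in the rollout and only adds new ones.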

7. Observability. Emit a structured event whenever a flag is evaluated (with user id, flag, variant, timestamp), not when it's defined. Track per-flag conversion / error metrics. The first sign of a bad rollout is usually an error rate that diverges between variants — you should see it within minutes, not days. Hook the flag service to a metrics platform so anyone can plot variant impact.
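A sketch of evaluation-time exposure events, assuming a hypothetical sink callback that forwards to whatever metrics pipeline is in use. The event shape mirrors the fields listed above; the wrapper name and types are illustrative.

```typescript
interface ExposureEvent {
  userId: string;
  flag: string;
  variant: "on" | "off";
  timestamp: string; // ISO 8601
}

type Evaluate = (flag: string, userId: string) => boolean;
type Sink = (event: ExposureEvent) => void;

// Wraps an evaluator so that every evaluation emits one structured event.
// `now` is injectable so tests can pin the timestamp.
function withExposureEvents(
  evaluate: Evaluate,
  sink: Sink,
  now: () => Date = () => new Date(),
): Evaluate {
  return (flag, userId) => {
    const enabled = evaluate(flag, userId);
    sink({
      userId,
      flag,
      variant: enabled ? "on" : "off",
      timestamp: now().toISOString(),
    });
    return enabled;
  };
}

const events: ExposureEvent[] = [];
const isEnabled = withExposureEvents(
  (flag) => flag === "checkoutV2",
  (event) => events.push(event),
  () => new Date("2025-01-01T00:00:00.000Z"),
);
isEnabled("checkoutV2", "user-42");
```

Joining these events with error and conversion metrics by user id and variant is what makes per-variant divergence visible.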

SSR / CSR consistency. A flag evaluated on the server must be passed to the client (via initial props, cookies, or a context) so the first client render matches. Otherwise the client renders a different variant than the server did, and the framework reports a hydration mismatch.
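One framework-agnostic way to hand server-resolved values to the client is to serialize them into the HTML. This is a sketch under assumptions: the __flags element id and both helper names are hypothetical, and frameworks such as Next.js offer their own mechanisms (props, context) for the same job.

```typescript
type ResolvedFlags = Record<string, boolean>;

// Server side: embed the resolved values as inert JSON in the page.
// "<" is escaped so the payload can never close the script tag early.
function renderFlagBootstrap(flags: ResolvedFlags): string {
  const json = JSON.stringify(flags).replace(/</g, "\\u003c");
  return `<script id="__flags" type="application/json">${json}</script>`;
}

// Client side: parse the text content of that element before the first render,
// e.g. document.getElementById("__flags")?.textContent in a browser.
function parseFlagBootstrap(scriptText: string): ResolvedFlags {
  return JSON.parse(scriptText) as ResolvedFlags;
}

const html = renderFlagBootstrap({ checkoutV2: true, "a</script>": false });
```

The client then seeds its flag context from the parsed object instead of asking the flag service again, so the first render uses exactly the values the server used.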

Anti-patterns:

•Boolean flags for what is really configuration (use a separate config system).
•Nesting flags (if (newCheckout && newAddressForm && betaSearch)): this creates combinatorial paths nobody tests.
•Putting business logic behind flags that survive shipping (the flag becomes architectural; refactor instead).
•Reading flags inside hot loops instead of resolving once per request.

Treat flags like locks: cheap to add, expensive to leave around.

Follow-up questions

•How do you handle hydration when a flag flips between SSR and CSR?
•What's your strategy for flag cleanup at scale?
•How would you A/B test an SSR-rendered page?

Common mistakes

•Ad-hoc booleans in env vars masquerading as flags.
•Forgetting to remove a flag after rollout — old branches rot.
•Evaluating client-side only → user sees flicker as flag arrives.

Performance considerations

•Server-evaluated flags add no SDK download or evaluation round-trip on the client; only the resolved values ship in the HTML.
•Watch for cache-key explosion if you key SSR cache on flag combinations.
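One way to contain the cache-key growth is to key each page only on the flags that actually affect it. In the sketch below, the cacheKey helper and the per-page list of relevant flags are hypothetical.

```typescript
// Builds an SSR cache key from the path plus only the flags that affect this page.
// Sorting makes the key independent of the order in which flags are listed.
function cacheKey(
  path: string,
  flags: Record<string, boolean>,
  relevant: string[],
): string {
  const parts = [...relevant]
    .sort()
    .map((name) => `${name}=${flags[name] ? 1 : 0}`);
  return parts.length > 0 ? `${path}?${parts.join("&")}` : path;
}

const resolved = { checkoutV2: true, newAddressForm: false, betaSearch: true };
const checkoutKey = cacheKey("/checkout", resolved, ["newAddressForm", "checkoutV2"]);
const aboutKey = cacheKey("/about", resolved, []);
```

With n relevant boolean flags a page has at most 2^n cached variants, so a page that depends on two flags has four variants no matter how many flags exist globally.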

Edge cases

•Flag service outage — must fall back to safe defaults, not block the app.
•Sticky bucketing across logged-out → logged-in transitions requires merging anonymous and user IDs.
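For the logged-out to logged-in transition, one possible approach is to keep bucketing on the first anonymous id ever linked to the account. The BucketingIds class below is a hypothetical in-memory sketch; a real system would persist the mapping and might choose a different merge rule.

```typescript
// Maps each user id to the anonymous id it was first seen with, so the
// bucketing key (and therefore the variant) survives login.
class BucketingIds {
  private readonly byUser = new Map<string, string>();

  // Returns the id that should be fed into the rollout hash.
  resolve(anonymousId: string, userId?: string): string {
    if (!userId) return anonymousId; // logged out: bucket on the anonymous id
    const linked = this.byUser.get(userId);
    if (linked) return linked; // already linked: reuse the original id
    this.byUser.set(userId, anonymousId); // first login: link and keep the same id
    return anonymousId;
  }
}

const ids = new BucketingIds();
const beforeLogin = ids.resolve("anon-1");
const afterLogin = ids.resolve("anon-1", "user-7");
const otherDevice = ids.resolve("anon-2", "user-7");
```

All three calls resolve to "anon-1", so the user keeps the variant they saw before logging in, including on a second device once they sign in there.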

Real-world examples

•GitHub uses flipper, Shopify uses Beta Flags, Vercel uses Edge Config + flag SDK for instant evaluation.