How would you design a system to handle client side caching, API retries, and error boundaries gracefully?
Three layers. Cache: a data-fetching lib (React Query / RTKQ / SWR) with TTL, stale-while-revalidate, tag-based invalidation, and dedup. Retries: exponential backoff with jitter, only for idempotent methods, capped at 2-3 attempts; circuit-break after consecutive failures. Error boundaries: route-level and feature-level boundaries with fallback UIs, plus a global handler for unhandled rejections, plus log to monitoring (Sentry). Tie them together with a single fetch wrapper.
Treat caching, retries, and error handling as one system. They overlap (retried request still caches; cached response avoids retry) and the boundaries between them define the resilience story.
Layer 1: cache + dedup + invalidation
Don't roll this layer yourself. Use React Query, RTK Query, or SWR. All three give:
- Cache keyed by query args.
- Dedup in-flight requests with the same key.
- Stale-while-revalidate: return cache instantly, fetch in background, update on success.
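- TTL (staleTime/cacheTime in React Query; cacheTime is called gcTime from v5).
- Invalidation after mutations: mutate X → automatically refetch queries tagged X (tags in RTK Query, query keys in React Query, mutate(key) in SWR).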
- Refetch on focus / reconnect / interval.
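const { data } = useQuery({
  queryKey: ['user', id],
  queryFn: async ({ signal }) => {
    const res = await fetch(`/users/${id}`, { signal });
    // fetch resolves on 4xx/5xx, so throw to make the failure visible to retry
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return res.json();
  },
  staleTime: 60_000, // treat data as fresh for 60 s
  retry: 2,
  retryDelay: attempt => Math.min(1000 * 2 ** attempt, 30_000),
});
Two components asking for the same user issue one request. The cache survives unmounts. When the user returns to the tab, the query refetches in the background with no spinner.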
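To make the dedup and stale-while-revalidate behaviour concrete, here is a minimal sketch of the idea. It is not how React Query, RTK Query, or SWR are implemented; `TinyQueryCache`, `fetch`, and `read` are hypothetical names, and a real library also handles subscriptions, garbage collection, and cancellation.

```typescript
type Entry<T> = { data: T; fetchedAt: number };

// Hypothetical minimal cache: in-flight dedup plus stale-while-revalidate.
class TinyQueryCache {
  private entries = new Map<string, Entry<unknown>>();
  private inFlight = new Map<string, Promise<unknown>>();
  private staleTimeMs: number;
  private now: () => number;

  constructor(staleTimeMs: number, now: () => number = Date.now) {
    this.staleTimeMs = staleTimeMs;
    this.now = now; // injectable clock, useful in tests
  }

  // Callers with the same key share one pending promise instead of issuing duplicates.
  fetch<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
    const pending = this.inFlight.get(key);
    if (pending) return pending as Promise<T>;

    const request = fetcher()
      .then(data => {
        this.entries.set(key, { data, fetchedAt: this.now() });
        return data;
      })
      .finally(() => this.inFlight.delete(key));

    this.inFlight.set(key, request);
    return request;
  }

  // Returns cached data at once (even if stale) and revalidates in the background when stale.
  read<T>(key: string, fetcher: () => Promise<T>): { data: T | undefined; isStale: boolean } {
    const entry = this.entries.get(key) as Entry<T> | undefined;
    const isStale = !entry || this.now() - entry.fetchedAt > this.staleTimeMs;
    if (isStale) void this.fetch(key, fetcher).catch(() => {}); // keep old data on failure
    return { data: entry?.data, isStale };
  }
}
```

The sketch only shows why two callers with the same key produce one request; in production the library should still own this layer.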
Layer 2: retry policy
Retry only idempotent methods (GET, HEAD, PUT with idempotency key, DELETE on a tombstone). Never auto-retry POST without an idempotency-key header — duplicate charges, duplicate emails.
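The rule above can be sketched as a small predicate. This is an illustration under assumptions: `canAutoRetry` and `withIdempotencyKey` are hypothetical helpers, the header name `Idempotency-Key` follows a common convention, and the server is assumed to deduplicate on it.

```typescript
// Methods that HTTP semantics (RFC 9110) define as idempotent.
const IDEMPOTENT_METHODS = new Set(['GET', 'HEAD', 'OPTIONS', 'PUT', 'DELETE']);

// Hypothetical helper: may this request be retried automatically?
function canAutoRetry(method: string = 'GET', headers: Record<string, string> = {}): boolean {
  if (IDEMPOTENT_METHODS.has(method.toUpperCase())) return true;
  // POST/PATCH are only safe when the server deduplicates on an idempotency key.
  return Object.keys(headers).some(name => name.toLowerCase() === 'idempotency-key');
}

// Hypothetical helper: attach one key per logical operation and reuse it on every retry.
function withIdempotencyKey(
  headers: Record<string, string>,
  key: string,
): Record<string, string> {
  return { ...headers, 'Idempotency-Key': key };
}
```

The key should be generated once per user action (for example with `crypto.randomUUID()`), not once per attempt; otherwise the server sees each retry as a new operation.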
const IDEMPOTENT = new Set(['GET', 'HEAD', 'OPTIONS', 'PUT', 'DELETE']);
const isIdempotent = (method = 'GET') => IDEMPOTENT.has(method.toUpperCase());

async function fetchWithRetry(input, init = {}, retries = 2) {
  const canRetry = isIdempotent(init.method);
  for (let attempt = 0; ; attempt++) {
    const lastAttempt = !canRetry || attempt >= retries;
    try {
      const res = await fetch(input, init);
      // Return anything that is not a 5xx, or the final 5xx once attempts run out.
      if (res.status < 500 || lastAttempt) return res;
    } catch (err) {
      // Never retry a caller-initiated abort; rethrow once attempts run out.
      if (err.name === 'AbortError' || lastAttempt) throw err;
    }
    const delay = 200 * 2 ** attempt + Math.random() * 100; // exp backoff + jitter
    await new Promise(r => setTimeout(r, delay));
  }
}
Jitter is critical: without it, a downed service comes back online and gets dog-piled by every client retrying in sync.
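The delay calculation can be pulled into a pure function so it is easy to test. The sketch below uses "full jitter" (the delay is drawn uniformly between zero and the exponential ceiling), which is a different variant from the small additive jitter in the snippet above; `backoffDelay` and its defaults are assumptions for illustration.

```typescript
// Hypothetical helper: exponential backoff with full jitter.
function backoffDelay(
  attempt: number,                    // 0 for the first retry
  baseMs: number = 200,
  maxMs: number = 30_000,
  random: () => number = Math.random, // injectable so tests are deterministic
): number {
  // The ceiling doubles each attempt and is capped at maxMs.
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  // Spread clients uniformly over [0, ceiling) so they do not retry in lockstep.
  return Math.floor(random() * ceiling);
}
```

Full jitter trades a slightly less predictable delay for a much wider spread of retry times across clients.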
Circuit breaker for tougher resilience: after N consecutive failures to a host, stop retrying for M seconds. Prevents retry storms.
class CircuitBreaker {
  failures = 0;
  openedAt = 0;
  isOpen() { return this.failures >= 5 && Date.now() - this.openedAt < 30_000; }
  record(ok) { if (ok) this.failures = 0; else { this.failures++; this.openedAt = Date.now(); } }
}
Layer 3: error boundaries
Two scopes:
Route-level: catches anything that breaks a whole page. Shows a fallback with retry + report.
<ErrorBoundary fallback={<RouteError />}>
  <RouteContent />
</ErrorBoundary>
Feature-level: catches errors inside a widget so the rest of the page survives.
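<ErrorBoundary fallback={<WidgetError />}>
  <Chart />
</ErrorBoundary>
ErrorBoundary here is a component you write yourself (a class with getDerivedStateFromError / componentDidCatch) or take from a package such as react-error-boundary. React error boundaries only catch errors thrown during rendering, in lifecycle methods, and in constructors of the tree below them; they don't catch errors in event handlers or asynchronous code. For those, use onError callbacks from your data lib + a global handler: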
window.addEventListener('unhandledrejection', e => log(e.reason));
window.addEventListener('error', e => log(e.error));
Wire all of it into Sentry or similar so production errors are visible.
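One detail the global handlers need to cope with: a promise can reject with any value, not only an Error. A small normalizer keeps the log shape consistent. `toErrorReport` and `ErrorReport` are hypothetical names, and the report shape is an assumption rather than any monitoring vendor's format.

```typescript
type ErrorReport = { name: string; message: string; stack?: string };

// Hypothetical helper: turn whatever was thrown or rejected into one report shape.
function toErrorReport(reason: unknown): ErrorReport {
  if (reason instanceof Error) {
    return { name: reason.name, message: reason.message, stack: reason.stack };
  }
  if (typeof reason === 'string') {
    return { name: 'NonErrorRejection', message: reason };
  }
  let message: string;
  try {
    // JSON.stringify returns undefined for values like undefined, so fall back to String().
    message = JSON.stringify(reason) ?? String(reason);
  } catch {
    message = String(reason); // circular structures, BigInt, and similar
  }
  return { name: 'NonErrorRejection', message };
}
```

In the handlers above this would be used as `log(toErrorReport(e.reason))` and `log(toErrorReport(e.error))`.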
Putting it together: a single fetch wrapper
type Options = RequestInit & { timeoutMs?: number; retries?: number };
export async function api<T>(path: string, opts: Options = {}): Promise<T> {
  const { timeoutMs = 10_000, retries = 2, ...rest } = opts;
  const url = `${BASE_URL}${path}`;
  for (let attempt = 0; attempt <= retries; attempt++) {
    const ctrl = new AbortController();
    const timer = setTimeout(() => ctrl.abort(), timeoutMs);
    try {
      const res = await fetch(url, { ...rest, signal: ctrl.signal });
      if (res.status === 401) handle401();
      if (res.status >= 500 && attempt < retries && isIdempotent(opts.method ?? 'GET')) {
        await sleep(200 * 2 ** attempt + Math.random() * 100);
        continue;
      }
      if (!res.ok) {
        const body = await res.json().catch(() => ({}));
        throw new ApiError(res.status, body.message ?? res.statusText, body);
      }
      return res.headers.get('content-type')?.includes('json') ? res.json() : (await res.text() as unknown as T);
    } catch (err: any) {
      if (err.name === 'AbortError' && attempt < retries && isIdempotent(opts.method ?? 'GET')) {
        await sleep(200 * 2 ** attempt + Math.random() * 100);
        continue;
      }
      throw err;
    } finally {
      clearTimeout(timer);
    }
  }
  throw new Error('unreachable');
}
Then plug api into React Query's queryFn:
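useQuery({ queryKey: ['user', id], queryFn: () => api<User>(`/users/${id}`) });
React Query handles caching, dedup, refetch. The wrapper handles auth, timeout, server-error retry. Error boundaries catch what propagates up. Optimistic updates handle UX for mutations. Because the wrapper already retries, set retry: false (or a low count) on the query; React Query retries failed queries three times by default, and the two layers would otherwise multiply attempts.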
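The wrapper references a few helpers it does not define. A minimal sketch of two of them follows; the shapes are assumptions inferred from how the wrapper calls them (`new ApiError(status, message, body)` and `sleep(ms)`), and the `isTransient` getter is an added illustration. `BASE_URL`, `handle401`, and `isIdempotent` remain app-specific.

```typescript
// Hypothetical error type matching the wrapper's `new ApiError(status, message, body)` call.
class ApiError extends Error {
  readonly status: number;
  readonly body: unknown;

  constructor(status: number, message: string, body?: unknown) {
    super(message);
    // Keeps instanceof and prototype getters working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = 'ApiError';
    this.status = status;
    this.body = body;
  }

  // 5xx and 429 are usually worth retrying later; other 4xx need a different request.
  get isTransient(): boolean {
    return this.status >= 500 || this.status === 429;
  }
}

// Promise-based delay used between retry attempts.
const sleep = (ms: number): Promise<void> =>
  new Promise(resolve => setTimeout(resolve, ms));
```

Carrying the status and parsed body on the error lets callers and error boundaries branch on the failure type instead of parsing messages.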
UX surface
- Optimistic UI for mutations — instant feedback, rollback on rejection.
- Stale data while revalidating — never show a spinner if you have a cached value.
- Retry banner on errors with a button — don't auto-retry forever.
- Offline detection via navigator.onLine + queue mutations.
- Toast for transient errors, inline for form-field errors, page for catastrophic.
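The optimistic-UI item at the top of this list comes down to "apply now, remember how to undo". Here is a framework-free sketch; `OptimisticStore` and `mutateOptimistically` are hypothetical names, and data libraries offer their own hooks for the same pattern.

```typescript
// Hypothetical in-memory store used only to illustrate optimistic updates.
class OptimisticStore<T> {
  private value: T;

  constructor(initial: T) {
    this.value = initial;
  }

  get(): T {
    return this.value;
  }

  // Applies the change immediately and returns a function that restores the previous value.
  apply(update: (current: T) => T): () => void {
    const previous = this.value;
    const next = update(previous);
    this.value = next;
    return () => {
      // Only roll back if nothing else has replaced the optimistic value since.
      if (this.value === next) this.value = previous;
    };
  }
}

// The UI sees the new value at once; it is undone if the server rejects the change.
async function mutateOptimistically<T>(
  store: OptimisticStore<T>,
  update: (current: T) => T,
  commit: () => Promise<unknown>, // the real network call
): Promise<void> {
  const rollback = store.apply(update);
  try {
    await commit();
  } catch (err) {
    rollback();
    throw err; // let the caller show a toast or retry banner
  }
}
```

The rollback guard is deliberately simple; overlapping optimistic updates to the same value need a proper queue or a refetch after settle.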
Things to avoid
- Auto-retrying mutations without idempotency keys.
- Infinite retries — pick a cap.
- No jitter — synchronized retry storms.
- Treating all errors the same — 401/403/404/5xx have different UX.
- Eating errors silently — they should reach logging.
- Hand-rolling cache when React Query exists.
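For the point about not treating all errors the same, one option is a single function that maps a status code to the place the error is shown. The categories and the mapping below are illustrative assumptions, not a standard; `surfaceFor` and `ErrorSurface` are hypothetical names.

```typescript
type ErrorSurface =
  | 'reauthenticate'    // 401: session expired, refresh or send to login
  | 'forbidden-page'    // 403: signed in but not allowed
  | 'not-found-page'    // 404: resource does not exist
  | 'inline'            // validation problems shown next to the form field
  | 'toast-with-retry'  // transient failures the user can retry
  | 'error-page';       // anything unexpected

// Hypothetical mapping from HTTP status to UX surface.
function surfaceFor(status: number): ErrorSurface {
  if (status === 401) return 'reauthenticate';
  if (status === 403) return 'forbidden-page';
  if (status === 404) return 'not-found-page';
  if (status === 400 || status === 422) return 'inline';
  if (status === 429 || status >= 500) return 'toast-with-retry';
  return 'error-page';
}
```

Keeping the mapping in one place makes it harder for individual screens to drift into their own error conventions.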
Follow-up questions
- When is it safe to auto-retry a POST?
- What's an idempotency key and how do you implement one?
- How does React Query's stale-while-revalidate work under the hood?
- What's a circuit breaker and when do you need one client-side?
Common mistakes
- Auto-retrying POST without idempotency keys: duplicate side effects.
- Retry without jitter: retry storms when service recovers.
- Catching errors and not reporting them: invisible production failures.
- Rolling your own cache instead of using React Query / RTKQ.
- Error boundaries only at the root: one error nukes the whole app.
- Optimistic updates without rollback: UI shows success when the server actually failed.
Performance considerations
- Caching often removes a large share of requests; the 50-90% figure is a rough rule of thumb, not a measurement, so verify it against your own traffic. Dedup collapses duplicate in-flight requests from the same client. Retries with backoff smooth over transient failures; circuit breakers prevent client-side amplification of server outages. Error boundaries contain blast radius so one widget's failure doesn't take down the page.
Edge cases
- 401 mid-session: refresh token, replay original request, or redirect to login; pick one and be consistent.
- Offline → queue mutations → replay on reconnect with conflict resolution.
- Background tab: pause polling, refetch on focus.
- WebSocket reconnect: needs its own retry strategy + missed-message catch-up.
- Server-Sent Events: built-in retry, but tune retry interval and add resume tokens.
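The offline case can be sketched as a FIFO queue that holds mutations while offline and replays them in order on reconnect. This is a simplified illustration: `MutationQueue` is a hypothetical name, the queue is not persisted, and conflict resolution is left out entirely.

```typescript
type QueuedMutation = { id: string; run: () => Promise<void> };

// Hypothetical offline queue: mutations wait while offline and replay in order later.
class MutationQueue {
  private pending: QueuedMutation[] = [];
  private flushing = false;
  private isOnline: () => boolean;

  constructor(isOnline: () => boolean) {
    this.isOnline = isOnline; // e.g. () => navigator.onLine in a browser
  }

  get size(): number {
    return this.pending.length;
  }

  // Enqueue first, then try to send; while offline the mutation simply stays queued.
  async submit(mutation: QueuedMutation): Promise<void> {
    this.pending.push(mutation);
    await this.flush();
  }

  // Replays in FIFO order and stops at the first failure so ordering is preserved.
  async flush(): Promise<void> {
    if (this.flushing || !this.isOnline()) return;
    this.flushing = true;
    try {
      while (this.pending.length > 0 && this.isOnline()) {
        await this.pending[0].run();
        this.pending.shift(); // remove only after the mutation succeeded
      }
    } finally {
      this.flushing = false;
    }
  }
}
```

In a browser, `flush()` would typically be called from the `online` event listener. Since `navigator.onLine` can report true without real connectivity, a failed replay should leave the mutation queued, as it does here.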
Real-world examples
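- GitHub's REST API supports conditional requests: with ETags, a request answered with 304 Not Modified does not count against the primary rate limit, so client-side caching pays off directly.
- Linear pre-fetches related data so navigation feels instant.
- Sentry / Datadog / Honeybadger capture uncaught errors via window.onerror and unhandled promise rejections via the unhandledrejection event.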