Design a rate limiter for API calls in a SPA.
A client-side limiter that throttles outgoing requests: a token-bucket or sliding-window counter to cap request rate, a queue for excess requests, request deduplication and caching to reduce volume, plus respecting server 429s with backoff. The client limiter is UX/protection, not security.
A client-side rate limiter caps the rate at which a SPA sends requests — to respect a known API limit, protect a fragile backend, and smooth bursts. Important framing: a client limiter is a cooperation/UX mechanism, not a security control — the server must still enforce its own limits.
Why you'd want one
- The API has a documented limit (e.g. 10 req/s) and you want to stay under it rather than getting 429s.
- Prevent self-inflicted bursts — a component that fires on every keystroke/scroll.
- Be a good citizen to a shared or fragile backend.
Core algorithm — token bucket (most common)
- A bucket holds up to N tokens; tokens refill at a steady rate (e.g. 10/sec).
- Each request consumes a token. Token available → send immediately. No token → queue or reject.
- Allows short bursts (up to bucket size) while bounding the sustained rate. Simple and flexible.
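A minimal sketch of the bucket itself, assuming a createTokenBucket factory like the one used in the implementation section below (the name and API are illustrative, not a standard library):

```js
// Token bucket: refills lazily based on elapsed time; acquire() resolves
// immediately if a token is free, otherwise when the refill catches up.
function createTokenBucket({ capacity, refillPerSec }) {
  let tokens = capacity;
  let last = Date.now();
  const waiters = []; // resolvers for callers waiting on a token

  function refill() {
    const now = Date.now();
    tokens = Math.min(capacity, tokens + ((now - last) / 1000) * refillPerSec);
    last = now;
  }

  function drain() {
    refill();
    while (tokens >= 1 && waiters.length > 0) {
      tokens -= 1;
      waiters.shift()(); // wake the next queued caller, FIFO
    }
    if (waiters.length > 0) setTimeout(drain, 1000 / refillPerSec);
  }

  return {
    acquire() {
      refill();
      if (tokens >= 1 && waiters.length === 0) {
        tokens -= 1;
        return Promise.resolve();
      }
      return new Promise((resolve) => {
        waiters.push(resolve);
        if (waiters.length === 1) setTimeout(drain, 1000 / refillPerSec);
      });
    },
  };
}
```

Refilling lazily from timestamps, rather than on a ticking interval, keeps the token count accurate even when the browser throttles background-tab timers.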
Alternatives: sliding-window counter (count requests in the trailing window, block once over the limit — more precise, no bursts), fixed window (simplest, but allows boundary bursts), leaky bucket (perfectly smooth output).
The request queue
When over the limit, you usually want to queue, not drop:
request() → token available? ── yes ──▶ send
                  │
                  no ──▶ enqueue ──▶ drained as tokens refill

- Add priority — user-initiated requests jump ahead of background prefetches (see the queue sketch after this list).
- Add a queue cap and timeout so it can't grow unbounded.
- Optionally add concurrency limiting too — at most M requests in flight at once.
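Layering priority, a cap, and timeouts on top of the bucket might look like this (a sketch; the two-level queue and the default limits are assumptions, not a fixed design):

```js
// Two-level queue in front of the limiter: "high" (user-initiated) drains
// before "low" (background); overflow is rejected, stale entries time out.
function createRequestQueue(limiter, { maxQueued = 100, timeoutMs = 10_000 } = {}) {
  const queues = { high: [], low: [] };

  async function drainOne() {
    await limiter.acquire(); // one token per queued entry
    const next = queues.high.shift() ?? queues.low.shift();
    next?.release(); // no-op if that entry already timed out
  }

  return {
    // Resolves when it's this request's turn; rejects on overflow or timeout.
    acquire(priority = "low") {
      if (queues.high.length + queues.low.length >= maxQueued) {
        return Promise.reject(new Error("rate-limit queue full"));
      }
      return new Promise((resolve, reject) => {
        const timer = setTimeout(() => reject(new Error("queued too long")), timeoutMs);
        queues[priority].push({ release: () => { clearTimeout(timer); resolve(); } });
        drainOne();
      });
    },
  };
}
```

Usage: await queue.acquire("high") before a user-initiated fetch; background prefetches take the default "low" and drain only when the high queue is empty.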
Reduce volume first (the best rate limiting is fewer requests)
- Debounce/throttle the events that trigger requests.
- Deduplicate identical in-flight requests; cache responses (React Query/SWR). A minimal dedup sketch follows this list.
- Batch multiple calls into one where the API supports it.
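In-flight deduplication can be a few lines when requests are keyed by URL (a sketch; libraries like React Query key on more than the URL and add cache invalidation):

```js
// Share one in-flight promise per URL: N identical concurrent calls, 1 request.
// Parse the body once, since a fetch Response body can only be read once.
const inFlight = new Map();

function dedupedGetJson(url) {
  if (inFlight.has(url)) return inFlight.get(url);
  const promise = fetch(url)
    .then((res) => res.json())
    .finally(() => inFlight.delete(url));
  inFlight.set(url, promise);
  return promise;
}
```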
Respect the server
- Honor 429 + Retry-After; use exponential backoff + jitter on retries (sketch after this list).
- Read X-RateLimit-Remaining / X-RateLimit-Reset if exposed and proactively slow down.
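A 429 handler with Retry-After support and jittered backoff might look like this (a sketch; only the seconds form of Retry-After is parsed, and retries are capped):

```js
// Retry on 429: honor a numeric Retry-After when present, otherwise use
// exponential backoff with full jitter; give up after maxRetries attempts.
async function fetchWithBackoff(url, opts, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, opts);
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const retryAfterSec = Number(res.headers.get("Retry-After")); // NaN/0 if absent or a date
    const delayMs = retryAfterSec > 0
      ? retryAfterSec * 1000                                   // server-specified wait
      : Math.random() * Math.min(30_000, 1000 * 2 ** attempt); // full jitter, 30s cap
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```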
Implementation shape
Wrap fetch/axios in a limiter module (or an axios interceptor): limitedFetch(url) checks the bucket, sends or enqueues, and the queue drains on the refill timer. Centralized — every call site inherits it.
```js
const limiter = createTokenBucket({ capacity: 10, refillPerSec: 10 });

async function apiCall(url, opts) {
  await limiter.acquire(); // resolves when a token is free
  return fetch(url, opts);
}
```
UI
Don't fail silently — if requests are queued/delayed, reflect it ("syncing…"), and disable spammy actions.
The framing
"A token bucket caps the outgoing rate while allowing small bursts; excess requests go into a priority queue that drains as tokens refill, with a queue cap to stay bounded. But I'd reduce volume first — debounce, dedup, cache, batch — and respect server 429s with backoff. Crucially, the client limiter is for UX and being a good citizen; the server still must enforce limits itself, since the client can't be trusted."
Follow-up questions
- Token bucket vs sliding window vs leaky bucket — tradeoffs?
- Why is a client-side rate limiter not a security control?
- How do you handle priority in the request queue?
- How does this interact with server-side 429 responses?
Common mistakes
- Treating a client limiter as a security mechanism (it's bypassable).
- Dropping requests instead of queueing them.
- An unbounded queue that grows forever under load.
- Not reducing volume first (debounce/dedup/cache) — limiting symptoms, not the cause.
- Ignoring server 429s and Retry-After.
Performance considerations
- Reducing request volume (debounce, dedup, cache, batch) is more effective than limiting.
- Token bucket allows useful bursts; sliding window is smoother but stricter.
- A bounded priority queue keeps memory and latency predictable.
Edge cases
- Queue growing unbounded — needs a cap and timeouts.
- High-priority request stuck behind low-priority ones.
- Server limit differs from the client's assumed limit.
- Page navigation/unmount with requests still queued (see the AbortController sketch below).
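For that last item, one approach is to scope requests to an AbortController and abort on unmount (a sketch; scopedApiCall and the single shared controller are illustrative, reusing the apiCall wrapper from above):

```js
// Scope requests to the current view: pass one controller's signal to every
// call, then abort when the view goes away. A fetch whose signal is already
// aborted rejects immediately, so queued entries never hit the network.
const controller = new AbortController();

function scopedApiCall(url, opts = {}) {
  return apiCall(url, { ...opts, signal: controller.signal });
}

// On unmount/navigation (e.g. in a React useEffect cleanup):
// controller.abort();
```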
Real-world examples
- A client wrapper enforcing a third-party API's documented rate limit.
- p-limit / bottleneck-style concurrency + rate limiting around fetch; an axios interceptor that queues requests.