How would you design a real-time, scalable gaming dashboard?
A realtime gaming dashboard needs sub-second latency for game state, leaderboards, and player stats. Architecture: WebSocket for live state, SSE or polling for slower updates, edge pub/sub fan-out (Redis / NATS), CDN-cached snapshots for cold loads, virtualized leaderboard rendering. Critical concerns: backpressure, reconnect, last-write-wins ordering, fairness during partial outages, anti-cheat read paths. Trade-off: WebSocket fan-out scales horizontally but adds infra; SSE is simpler but one-way.
What 'realtime gaming dashboard' implies
A surface showing:
- Live game state (scores, positions, events) — sub-second.
- Leaderboards updating as events occur.
- Player stats and activity.
- Possibly chat, notifications, friend status.
Scale: tens of thousands to millions of concurrent viewers.
Latency budget
| Surface | Target latency |
|---|---|
| Live match state | <500ms tick-to-pixel |
| Leaderboard | 1-3s |
| Player stats | seconds-to-minutes |
| Static metadata | minutes (CDN-cached) |
The architecture splits along this gradient.
High-level architecture
                ┌───────────────────────┐
                │  Game servers         │
                │  (events: kills, hits)│
                └───────────┬───────────┘
                            │ pub
                   ┌────────▼────────┐
                   │  Pub/sub bus    │
                   │  Redis / NATS   │
                   │  Kafka          │
                   └─┬──────┬────────┘
                     │      │
       ┌─────────────┘      └──────────────┐
       ▼                                   ▼
┌───────────────────┐               ┌────────────────────┐
│ WS edge gateway   │               │ Aggregator         │
│ (fan-out per      │               │ (leaderboard       │
│  match channel)   │               │  rollups)          │
└──────┬────────────┘               └──────┬─────────────┘
       │ WebSocket                         │ writes
       ▼                                   ▼
┌────────────┐                       ┌────────────────┐
│  Browser   │ ◀──────── HTTP ────── │ Cache (Redis)  │
└────────────┘     snapshots/SSE     └────────────────┘
                                             │
                                             ▼
                                     ┌────────────────┐
                                     │ Postgres       │
                                     │ (history)      │
                                     └────────────────┘
Connection strategy
WebSocket for live game state. Bidirectional messaging isn't strictly required (clients mostly read), but WebSocket gateways scale horizontally and most realtime tooling is built around the protocol.
SSE is a good alternative for read-only streams: it runs over plain HTTP, the browser's EventSource reconnects automatically, and there is no upgrade handshake to manage. Good for leaderboard ticks.
HTTP snapshots for cold-load: client first fetches "current state at T", then opens the live stream from T forward. This avoids needing the full history over WS.
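The cold-load handoff can be sketched as follows. This is a minimal illustration, not a definitive client: the `Snapshot` and `Delta` shapes and the `ColdLoader` class are hypothetical names, and it assumes the live stream may start delivering deltas before the HTTP snapshot arrives, so early deltas are buffered and replayed.

```typescript
// Hypothetical message shapes; field names are assumptions for illustration.
interface Snapshot { version: number; scores: Record<string, number>; }
interface Delta { version: number; team: string; value: number; }

// Buffers deltas that arrive while the HTTP snapshot is still in flight,
// then replays only the ones newer than the snapshot's version.
class ColdLoader {
  private buffered: Delta[] = [];
  private state: Snapshot | null = null;

  onDelta(delta: Delta): void {
    if (this.state === null) {
      this.buffered.push(delta); // snapshot not here yet: hold the delta
      return;
    }
    this.apply(delta);
  }

  onSnapshot(snapshot: Snapshot): void {
    this.state = { version: snapshot.version, scores: { ...snapshot.scores } };
    const pending = this.buffered.sort((a, b) => a.version - b.version);
    this.buffered = [];
    pending.forEach((delta) => this.apply(delta));
  }

  current(): Snapshot | null {
    return this.state;
  }

  private apply(delta: Delta): void {
    // Deltas at or below the snapshot version are already reflected in it.
    if (this.state === null || delta.version <= this.state.version) return;
    this.state.scores[delta.team] = delta.value;
    this.state.version = delta.version;
  }
}
```

This sketch deliberately ignores version gaps; it only shows the "snapshot at T, stream from T forward" ordering.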
Fan-out at scale
A single match can have 100k+ viewers. Having one process write to every connection directly is infeasible.
- Edge gateways subscribe to a Redis (or NATS / Kafka) channel per match.
- Gateways fan out to connected sockets.
- Scale by adding gateways; partition by match id.
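A rough sketch of the gateway side of this pattern, assuming an in-memory stand-in for the bus. `FakeBus` and `Gateway` are hypothetical names and do not mirror any real Redis or NATS client API; the point is only that each gateway subscribes once per match and fans out locally.

```typescript
type Send = (payload: string) => void;

// In-memory stand-in for a pub/sub bus; NOT a real Redis or NATS client API.
class FakeBus {
  private handlers = new Map<string, Array<(msg: string) => void>>();

  subscribe(channel: string, handler: (msg: string) => void): void {
    const list = this.handlers.get(channel) ?? [];
    list.push(handler);
    this.handlers.set(channel, list);
  }

  publish(channel: string, msg: string): void {
    (this.handlers.get(channel) ?? []).forEach((handler) => handler(msg));
  }

  subscriberCount(channel: string): number {
    return (this.handlers.get(channel) ?? []).length;
  }
}

// Hypothetical gateway: keeps local sockets per match and subscribes to the
// bus once per match, no matter how many sockets join.
class Gateway {
  private bus: FakeBus;
  private socketsByMatch = new Map<string, Send[]>();

  constructor(bus: FakeBus) {
    this.bus = bus;
  }

  attach(matchId: string, send: Send): void {
    let sockets = this.socketsByMatch.get(matchId);
    if (sockets === undefined) {
      const created: Send[] = [];
      this.socketsByMatch.set(matchId, created);
      // First viewer of this match on this gateway: one bus subscription.
      this.bus.subscribe(`match:${matchId}`, (msg) => created.forEach((s) => s(msg)));
      sockets = created;
    }
    sockets.push(send);
  }
}
```

With this shape the bus sees one subscriber per gateway per match, so publish cost grows with the number of gateways rather than the number of viewers.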
match_id=42 ─▶ Redis channel "match:42"
               ├─▶ Gateway A ─▶ 10k sockets
               ├─▶ Gateway B ─▶ 10k sockets
               └─▶ Gateway C ─▶ 10k sockets
Leaderboard
Hot path. Patterns:
- Maintain in a Redis sorted set (ZADD on score change, ZREVRANGE or ZRANGE ... REV for top-N, since plain ZRANGE returns lowest scores first).
- Aggregator subscribes to events, updates sorted set.
- Read path: HTTP cached for N seconds OR SSE stream per leaderboard.
- Pagination: top-100 + viewer's neighborhood (10 above / 10 below their rank).
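The read shape described above (top-N plus the viewer's neighborhood) can be illustrated in memory. This sketch sorts a plain `Map` rather than calling Redis; `leaderboardView` and its parameters are hypothetical, and the tie-break by player id is an assumption added to make ordering deterministic.

```typescript
interface Entry { player: string; score: number; }

interface LeaderboardView {
  top: Entry[];
  viewerRank: number | null; // 1-based, null if the viewer has no score
  neighborhood: Entry[];
}

// Highest score first; ties broken by player id (an assumption, not a rule
// from the source) so that ranks are stable between reads.
function rankAll(scores: Map<string, number>): Entry[] {
  return Array.from(scores.entries())
    .map(([player, score]) => ({ player, score }))
    .sort((a, b) => b.score - a.score || (a.player < b.player ? -1 : a.player > b.player ? 1 : 0));
}

function leaderboardView(
  scores: Map<string, number>,
  viewer: string,
  topN = 100,
  radius = 10,
): LeaderboardView {
  const ranked = rankAll(scores);
  const idx = ranked.findIndex((e) => e.player === viewer);
  return {
    top: ranked.slice(0, topN),
    viewerRank: idx < 0 ? null : idx + 1,
    // `radius` rows above and below the viewer, clamped at the list edges.
    neighborhood: idx < 0 ? [] : ranked.slice(Math.max(0, idx - radius), idx + radius + 1),
  };
}
```

In a Redis-backed version the same two slices would come from rank-range reads on the sorted set instead of an in-process sort.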
Snapshot + delta protocol
To avoid retransmitting full state:
- Client connects, server sends full snapshot { matchId, state, version: 42 }.
- Subsequent messages are deltas: { version: 43, op: 'score', team: 'red', value: 5 }.
- If client misses messages (gap in versions), it requests a fresh snapshot.
Many multiplayer game protocols use the same snapshot-plus-delta approach, usually with a binary wire format rather than JSON.
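The version check on the client can be sketched as a pure function. The delta shape follows the example message above; `MatchState`, `ApplyResult`, and `applyDelta` are hypothetical names, and the state is reduced to a score map for brevity.

```typescript
interface MatchState { version: number; scores: Record<string, number>; }
interface ScoreDelta { version: number; op: "score"; team: string; value: number; }

type ApplyResult =
  | { kind: "applied"; state: MatchState }
  | { kind: "stale"; state: MatchState }  // duplicate or replayed message
  | { kind: "gap"; state: MatchState };   // caller should request a snapshot

function applyDelta(state: MatchState, delta: ScoreDelta): ApplyResult {
  if (delta.version <= state.version) return { kind: "stale", state };
  if (delta.version !== state.version + 1) return { kind: "gap", state };
  return {
    kind: "applied",
    state: {
      version: delta.version,
      scores: { ...state.scores, [delta.team]: delta.value },
    },
  };
}
```

Returning a tagged result instead of throwing keeps the "gap means re-snapshot" decision in the caller, which is usually where the network code lives.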
Reconnection
WebSockets die. Plan for it:
- Exponential backoff reconnect (1s, 2s, 4s, max 30s).
- Resume from last seen version.
- If gap > threshold, re-snapshot.
- Show subtle "reconnecting" indicator, not a modal.
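A small sketch of the backoff schedule and the resume-or-snapshot decision. The jitter term and the `maxReplay` threshold are assumptions not stated above: jitter is a common addition so that many clients do not reconnect in lockstep, and the threshold value is arbitrary.

```typescript
// Delay before reconnect attempt `attempt` (0-based): 1s, 2s, 4s, ... capped
// at maxMs. `jitterRatio` shaves off up to that fraction at random
// (an assumption; the schedule in the text has no jitter).
function reconnectDelayMs(
  attempt: number,
  random: () => number = Math.random,
  baseMs = 1000,
  maxMs = 30000,
  jitterRatio = 0.2,
): number {
  const exponential = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.round(exponential * (1 - jitterRatio * random()));
}

// After reconnecting: replay missed deltas if the gap is small, otherwise
// ask for a fresh snapshot. `maxReplay` is a hypothetical threshold.
function resumePlan(
  lastSeenVersion: number,
  serverVersion: number,
  maxReplay = 500,
): "resume" | "snapshot" {
  const gap = serverVersion - lastSeenVersion;
  return gap >= 0 && gap <= maxReplay ? "resume" : "snapshot";
}
```

Passing `random` as a parameter keeps the function deterministic in tests while defaulting to `Math.random` in production code.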
Virtualized rendering
Leaderboard with 10k rows → virtualize. Top-100 + your row pinned. Don't render off-screen.
Live event feed: append-only, also virtualized.
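The core of list virtualization is a window calculation. The sketch below assumes fixed-height rows and a hypothetical `visibleRange` helper; libraries such as react-virtual handle variable heights and measurement, which this does not attempt.

```typescript
interface RowRange { start: number; end: number; } // end is exclusive

// Which rows to mount for a scroll position, assuming every row has the
// same height. `overscan` adds a few rows on each side to hide pop-in.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan = 5,
): RowRange {
  const first = Math.floor(scrollTop / rowHeight);
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(rowCount, last + overscan),
  };
}
```

For a 10k-row leaderboard with 30px rows in a 600px viewport, this mounts about 30 rows at a time instead of 10,000.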
State on the client
- React + Zustand / Jotai / Valtio for live state slice.
- TanStack Query for HTTP-fetched snapshots and history.
- Memoize aggressively — at 10 events/sec a naive React tree re-renders the world.
Backpressure
What if events arrive faster than the client can render?
- Coalesce: drop intermediate snapshots, only render the latest per frame.
- requestAnimationFrame batching — N events → 1 paint.
- Throttle leaderboard to 1Hz on the wire.
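Coalescing can be sketched as a keyed buffer that flushes once per scheduled frame. `Coalescer` is a hypothetical name, and the scheduler is injected so the same class works with `requestAnimationFrame` in a browser or a manual queue in tests.

```typescript
type Schedule = (callback: () => void) => void;

// Keeps only the latest value per key and hands them to `render` in one
// batch per scheduled frame, so N events between frames cost one paint.
class Coalescer<T> {
  private latest = new Map<string, T>();
  private scheduled = false;
  private schedule: Schedule;
  private render: (batch: Map<string, T>) => void;

  constructor(schedule: Schedule, render: (batch: Map<string, T>) => void) {
    this.schedule = schedule;
    this.render = render;
  }

  push(key: string, value: T): void {
    this.latest.set(key, value); // overwrites any intermediate value
    if (!this.scheduled) {
      this.scheduled = true;
      this.schedule(() => this.flush());
    }
  }

  private flush(): void {
    this.scheduled = false;
    const batch = this.latest;
    this.latest = new Map();
    this.render(batch);
  }
}
```

In a browser the scheduler would typically be `(cb) => requestAnimationFrame(cb)`; the key might be a team id or a leaderboard row id, depending on what the UI paints.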
Failure modes & UX
- WS dies → reconnect silently, resume from the last seen version, and re-snapshot if the gap is too large.
- Backend lag → show "behind by Xs" badge.
- Leaderboard stale → cached value with timestamp.
- Total outage → graceful "live updates paused" banner; show last known state.
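One way to keep these states consistent in the UI is to derive the banner from connection health in a single function. The `LinkHealth` shape, the `bannerFor` name, and both thresholds are assumptions for illustration only.

```typescript
type Banner = "live" | "reconnecting" | "behind" | "paused";

interface LinkHealth {
  connected: boolean;
  lastEventAtMs: number;           // when the last live event was applied
  disconnectedAtMs: number | null; // when the socket dropped, if it did
}

// Thresholds are hypothetical defaults, not recommendations.
function bannerFor(
  health: LinkHealth,
  nowMs: number,
  behindAfterMs = 3000,
  pausedAfterMs = 30000,
): Banner {
  if (!health.connected) {
    const downForMs = nowMs - (health.disconnectedAtMs ?? nowMs);
    // Short drops get the subtle indicator; long ones get the banner.
    return downForMs >= pausedAfterMs ? "paused" : "reconnecting";
  }
  return nowMs - health.lastEventAtMs >= behindAfterMs ? "behind" : "live";
}
```

A quiet match also produces no events, so a real client would likely base "behind" on server heartbeats rather than on game events alone.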
Anti-abuse / fairness
For competitive scenarios:
- Don't expose raw event timestamps that could leak server-side info.
- Throttle per-user open WS connections.
- Authenticate WS handshake with same token as REST.
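Per-user connection throttling can be as simple as a counter checked during the handshake. This is a single-process sketch with a hypothetical `ConnectionLimiter` class; with several gateways the count would have to live in shared storage, which is not shown.

```typescript
// Tracks open sockets per user within one gateway process.
class ConnectionLimiter {
  private open = new Map<string, number>();
  private maxPerUser: number;

  constructor(maxPerUser: number) {
    this.maxPerUser = maxPerUser;
  }

  // Call after the handshake token is verified; false means reject.
  tryOpen(userId: string): boolean {
    const current = this.open.get(userId) ?? 0;
    if (current >= this.maxPerUser) return false;
    this.open.set(userId, current + 1);
    return true;
  }

  // Call from the socket's close handler so slots are released.
  close(userId: string): void {
    const current = this.open.get(userId) ?? 0;
    if (current <= 1) this.open.delete(userId);
    else this.open.set(userId, current - 1);
  }
}
```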
Observability
- Connections: count, churn rate, geographic distribution.
- Events: rate, lag (publish → delivered).
- Render: FPS, dropped frames on the dashboard.
- Snapshot frequency: spikes = something wrong upstream.
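The publish-to-delivered lag metric might be tracked along these lines. `LagTracker` is a hypothetical name, the percentile uses the nearest-rank method, and the sketch assumes publisher and receiver clocks are close enough to compare, which is not guaranteed in practice.

```typescript
// Collects per-message lag samples and reports percentiles.
class LagTracker {
  private lagsMs: number[] = [];

  record(publishedAtMs: number, deliveredAtMs: number): void {
    // Clamp at zero so small clock skew does not produce negative lag.
    this.lagsMs.push(Math.max(0, deliveredAtMs - publishedAtMs));
  }

  // Nearest-rank percentile; returns 0 when there are no samples.
  percentile(p: number): number {
    if (this.lagsMs.length === 0) return 0;
    const sorted = this.lagsMs.slice().sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length);
    const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
    return sorted[index] ?? 0;
  }
}
```

A production tracker would bound memory with a rolling window or a histogram; this one keeps every sample for clarity.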
Cost levers
- WS connections cost memory at the gateway — pick instances with enough sockets/RAM.
- Fan-out via Redis is cheap; per-message DB writes are expensive — batch.
- Use a CDN for snapshot-cacheable views (top leaderboards, schedules).
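Batching history writes can be sketched with a small buffer that flushes when full. `WriteBatcher` and its `write` callback are hypothetical; the callback stands in for whatever bulk insert the database layer provides, and a timer-based flush is left out.

```typescript
// Accumulates rows and hands them to `write` in groups of `maxBatch`.
class WriteBatcher<T> {
  private pending: T[] = [];
  private maxBatch: number;
  private write: (rows: T[]) => void;

  constructor(maxBatch: number, write: (rows: T[]) => void) {
    this.maxBatch = maxBatch;
    this.write = write;
  }

  add(row: T): void {
    this.pending.push(row);
    if (this.pending.length >= this.maxBatch) this.flush();
  }

  // Also call on an interval and at shutdown so a partial batch is not lost.
  flush(): void {
    if (this.pending.length === 0) return;
    const rows = this.pending;
    this.pending = [];
    this.write(rows);
  }
}
```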
Recommended stack
- Gateway: Node + uWebSockets.js, or Go.
- Pub/sub: Redis Streams or NATS JetStream.
- Cache: Redis sorted sets.
- Snapshot store: Postgres + materialized views.
- Edge: CloudFront / Fastly for static + cold snapshots.
- Frontend: React + TanStack Query + react-virtual + a thin WS client.
Mental model
The architecture splits by latency requirement: WS for live game state, SSE/polling for leaderboards, HTTP+cache for static. Fan-out via a pub/sub bus, gateways subscribe per channel, clients connect to the nearest gateway. Snapshot + delta protocol keeps wire volume sane. Plan for reconnect, backpressure, and partial outages from day one — at gaming scale they happen constantly.
Follow-up questions
- How would you scale the WebSocket layer past 1M concurrent viewers?
- How do you handle backpressure when events outpace render?
- What does the snapshot+delta protocol look like in detail?
- How do you keep the leaderboard fair under partial outages?
Common mistakes
- Naive per-connection broadcast: does not scale past a few thousand.
- Full-state retransmits on every event: saturates client and bandwidth.
- No snapshot fallback: clients with reconnect gaps see broken state.
- Re-rendering React on every event without coalescing: UI jank.
- No backpressure plan: event storms freeze the dashboard.
Performance considerations
- WS gateways handle 50-100k sockets per process with proper tuning. Redis sorted set ZADD/ZRANGE is sub-ms even with millions of members. Client render is the usual bottleneck: coalesce events to ≤60Hz, virtualize lists, memoize.
Edge cases
- Match server crashes mid-game: what state survives?
- User on flaky mobile network: reconnects every 10s.
- Leaderboard tie-breakers under partial event loss.
- Region failover: viewers re-pin to new gateway.
Real-world examples
- Twitch live stats use a similar fan-out architecture.
- ESPN live scoreboards combine WS push with HTTP fallback.
- Riot Games' live esports dashboards use pub/sub fan-out at edge.
- Sports betting dashboards (DraftKings, FanDuel) need sub-second updates.