
How would you design a real-time, scalable gaming dashboard?

A realtime gaming dashboard needs sub-second latency for game state, leaderboards, and player stats. Architecture: WebSocket for live state, SSE or polling for slower updates, edge pub/sub fan-out (Redis / NATS), CDN-cached snapshots for cold loads, virtualized leaderboard rendering. Critical concerns: backpressure, reconnect, last-write-wins ordering, fairness during partial outages, anti-cheat read paths. Trade-off: WebSocket fan-out scales horizontally but adds infra; SSE is simpler but one-way.


What 'realtime gaming dashboard' implies

A surface showing:

  • Live game state (scores, positions, events) — sub-second.
  • Leaderboards updating as events occur.
  • Player stats and activity.
  • Possibly chat, notifications, friend status.

Scale: tens of thousands to millions of concurrent viewers.

Latency budget

Surface            Target latency
Live match state   <500ms tick-to-pixel
Leaderboard        1-3s
Player stats       seconds to minutes
Static metadata    minutes (CDN-cached)

The architecture splits along this gradient.
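One way to make the gradient concrete is a small lookup from latency budget to transport. This is a minimal sketch: the function name `pickTransport` and the exact cut-offs are assumptions read off the table above, not fixed rules.

```typescript
// Hypothetical mapping from a surface's latency budget to a transport.
// The cut-offs mirror the latency table; tune them for your own product.
type Transport = "websocket" | "sse" | "http-poll" | "cdn";

function pickTransport(maxLatencyMs: number): Transport {
  if (maxLatencyMs <= 500) return "websocket";   // live match state
  if (maxLatencyMs <= 3000) return "sse";        // leaderboard ticks
  if (maxLatencyMs <= 60000) return "http-poll"; // player stats
  return "cdn";                                  // static metadata
}

const surfacePlan = {
  liveMatchState: pickTransport(500),
  leaderboard: pickTransport(3000),
  playerStats: pickTransport(30000),
  staticMetadata: pickTransport(600000),
};
```

Writing the plan down per surface keeps the protocol choice tied to a number rather than to habit.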

High-level architecture

                       ┌───────────────────────┐
                       │  Game servers         │
                       │  (events: kills, hits)│
                       └───────────┬───────────┘
                                   │ pub
                          ┌────────▼────────┐
                          │ Pub/sub bus     │
                          │ Redis / NATS    │
                          │ Kafka           │
                          └─┬──────┬────────┘
                            │      │
              ┌─────────────┘      └──────────────┐
              ▼                                   ▼
   ┌───────────────────┐                ┌────────────────────┐
   │ WS edge gateway   │                │ Aggregator         │
   │ (fan-out per      │                │ (leaderboard       │
   │  match channel)   │                │  rollups)          │
   └──────┬────────────┘                └──────┬─────────────┘
          │ WebSocket                          │ writes
          ▼                                    ▼
   ┌────────────┐                       ┌────────────────┐
   │ Browser    │ ◀──────── HTTP ────── │ Cache (Redis)  │
   └────────────┘   snapshots/SSE       └────────────────┘


                                        ┌────────────────┐
                                        │  Postgres      │
                                        │  (history)     │
                                        └────────────────┘

Connection strategy

WebSocket for live game state. Bidirectional traffic isn't strictly required (clients mostly read), but WS gateways scale horizontally behind a load balancer, binary frames keep messages small, and most realtime tooling is built around it.

SSE is a good alternative for read-only streams: it runs over plain HTTP, and the browser's EventSource reconnects automatically. Good for leaderboard ticks.

HTTP snapshots for cold-load: client first fetches "current state at T", then opens the live stream from T forward. This avoids needing the full history over WS.

Fan-out at scale

A single match can have 100k+ viewers. Having the game server write to every viewer connection itself is infeasible.

  • Edge gateways subscribe to a Redis (or NATS / Kafka) channel per match.
  • Gateways fan out to connected sockets.
  • Scale by adding gateways; partition by match id.
match_id=42 ─▶ Redis channel "match:42"
                ├─▶ Gateway A ─▶ 10k sockets
                ├─▶ Gateway B ─▶ 10k sockets
                └─▶ Gateway C ─▶ 10k sockets
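
The gateway side of this fan-out can be sketched as a map from channel to local sockets, with exactly one upstream subscription per channel no matter how many viewers are attached. This is an illustrative sketch: `ViewerSocket`, `MatchFanout`, and the injected subscribe/unsubscribe callbacks are hypothetical names standing in for a real WebSocket library and a real Redis or NATS client.

```typescript
// Hypothetical minimal socket shape; a real gateway would wrap its WS library here.
interface ViewerSocket {
  send(payload: string): void;
}

class MatchFanout {
  private readonly rooms = new Map<string, Set<ViewerSocket>>();
  private readonly subscribeUpstream: (channel: string) => void;
  private readonly unsubscribeUpstream: (channel: string) => void;

  constructor(
    subscribeUpstream: (channel: string) => void,
    unsubscribeUpstream: (channel: string) => void,
  ) {
    this.subscribeUpstream = subscribeUpstream;
    this.unsubscribeUpstream = unsubscribeUpstream;
  }

  join(matchId: string, socket: ViewerSocket): void {
    const channel = `match:${matchId}`;
    let room = this.rooms.get(channel);
    if (!room) {
      room = new Set<ViewerSocket>();
      this.rooms.set(channel, room);
      this.subscribeUpstream(channel); // first local viewer: subscribe once
    }
    room.add(socket);
  }

  leave(matchId: string, socket: ViewerSocket): void {
    const channel = `match:${matchId}`;
    const room = this.rooms.get(channel);
    if (!room) return;
    room.delete(socket);
    if (room.size === 0) {
      this.rooms.delete(channel);
      this.unsubscribeUpstream(channel); // last local viewer left
    }
  }

  // Called once per upstream message; returns how many sockets received it.
  publishLocal(channel: string, payload: string): number {
    const room = this.rooms.get(channel);
    if (!room) return 0;
    room.forEach((socket) => socket.send(payload));
    return room.size;
  }
}
```

The bus delivers one message per gateway, and each gateway multiplies it locally, so bus load grows with the number of gateways rather than the number of viewers.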

Leaderboard

Hot path. Patterns:

  • Maintain in a Redis sorted set (ZADD or ZINCRBY on score change; ZREVRANGE, or ZRANGE with REV on Redis 6.2+, for top-N by highest score).
  • Aggregator subscribes to events, updates sorted set.
  • Read path: HTTP cached for N seconds OR SSE stream per leaderboard.
  • Pagination: top-100 + viewer's neighborhood (10 above / 10 below their rank).
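
The pagination shape (top-N plus the viewer's neighborhood) can be sketched in memory as below. This is an illustration of the response shape only, not of the Redis calls: in Redis the same result would typically come from ZREVRANGE for the top rows and ZREVRANK followed by another ZREVRANGE for the window around the viewer. `LeaderboardEntry`, `LeaderboardView`, and `buildLeaderboardView` are hypothetical names, and the tie-break by player id is an assumption.

```typescript
interface LeaderboardEntry {
  playerId: string;
  score: number;
}

interface LeaderboardView {
  topRows: LeaderboardEntry[];
  viewerRank: number | null; // 0-based rank, null if the viewer is unranked
  neighbors: LeaderboardEntry[]; // viewer plus `radius` rows on each side
}

function buildLeaderboardView(
  entries: LeaderboardEntry[],
  viewerId: string,
  topN: number = 100,
  radius: number = 10,
): LeaderboardView {
  // Highest score first; ties broken by playerId so the order is deterministic.
  const ranked = [...entries].sort(
    (a, b) => b.score - a.score || a.playerId.localeCompare(b.playerId),
  );
  const topRows = ranked.slice(0, topN);
  const index = ranked.findIndex((e) => e.playerId === viewerId);
  if (index === -1) return { topRows, viewerRank: null, neighbors: [] };

  const start = Math.max(0, index - radius);
  const end = Math.min(ranked.length, index + radius + 1);
  return { topRows, viewerRank: index, neighbors: ranked.slice(start, end) };
}
```

Returning both slices in one response lets the client render the pinned top rows and the viewer's own position without fetching the whole board.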

Snapshot + delta protocol

To avoid retransmitting full state:

  1. Client connects, server sends full snapshot { matchId, state, version: 42 }.
  2. Subsequent messages are deltas: { version: 43, op: 'score', team: 'red', value: 5 }.
  3. If client misses messages (gap in versions), it requests a fresh snapshot.

Many multiplayer game protocols work this way (usually with a binary wire format instead of JSON).
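
The three steps above can be sketched as a small client-side reducer. It assumes, for illustration, that `value` in a score delta is the team's new absolute score and that versions increase by exactly one per message; `MatchSnapshot`, `ScoreDelta`, and `applyDelta` are hypothetical names.

```typescript
interface MatchSnapshot {
  matchId: string;
  version: number;
  scores: Record<string, number>;
}

interface ScoreDelta {
  version: number;
  op: "score";
  team: string;
  value: number; // assumed to be the new absolute score, not an increment
}

type ApplyResult =
  | { kind: "applied"; state: MatchSnapshot }
  | { kind: "ignored"; state: MatchSnapshot } // duplicate or stale delta
  | { kind: "resync"; state: MatchSnapshot }; // gap detected: fetch a snapshot

function applyDelta(state: MatchSnapshot, delta: ScoreDelta): ApplyResult {
  if (delta.version <= state.version) return { kind: "ignored", state };
  if (delta.version !== state.version + 1) return { kind: "resync", state };
  return {
    kind: "applied",
    state: {
      ...state,
      version: delta.version,
      scores: { ...state.scores, [delta.team]: delta.value },
    },
  };
}
```

On `resync` the caller would request a fresh snapshot and replace its state wholesale; the reducer itself never guesses at missing updates.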

Reconnection

WebSockets die. Plan for it:

  • Exponential backoff reconnect (1s, 2s, 4s, max 30s), with random jitter so a gateway restart doesn't cause a synchronized reconnect storm.
  • Resume from last seen version.
  • If gap > threshold, re-snapshot.
  • Show subtle "reconnecting" indicator, not a modal.
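
A minimal sketch of the backoff schedule and the resume-or-snapshot decision follows. With the default `jitterRatio` of 0 it reproduces the 1s, 2s, 4s, max 30s schedule exactly; the jitter parameter, the function names, and the `maxReplayGap` threshold are assumptions added for illustration.

```typescript
// Delay before reconnect attempt `attempt` (0-based).
// jitterRatio = 0.5 means "subtract up to 50% of the delay at random".
function reconnectDelayMs(
  attempt: number,
  jitterRatio: number = 0,
  random: () => number = Math.random,
): number {
  const baseMs = 1000;
  const maxMs = 30000;
  const capped = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.round(capped * (1 - jitterRatio * random()));
}

// After reconnecting: replay deltas if the gap is small, otherwise re-snapshot.
function resumePlan(
  lastSeenVersion: number,
  serverVersion: number,
  maxReplayGap: number,
): "resume" | "snapshot" {
  return serverVersion - lastSeenVersion > maxReplayGap ? "snapshot" : "resume";
}
```

Injecting `random` keeps the function deterministic under test while still spreading real clients out in production.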

Virtualized rendering

Leaderboard with 10k rows → virtualize. Top-100 + your row pinned. Don't render off-screen.

Live event feed: append-only, also virtualized.
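
The core of virtualization is computing which rows intersect the viewport. A minimal sketch for fixed-height rows is below; in practice a library handles this, and `visibleWindow`, `VisibleWindow`, and the `overscan` default are hypothetical names and values chosen for illustration.

```typescript
interface VisibleWindow {
  startIndex: number; // first row to render (inclusive)
  endIndex: number;   // last row to render (exclusive)
  offsetPx: number;   // translateY for the rendered slice
}

function visibleWindow(
  scrollTopPx: number,
  viewportPx: number,
  rowPx: number,
  rowCount: number,
  overscan: number = 3, // extra rows above and below to hide scroll pop-in
): VisibleWindow {
  const first = Math.floor(scrollTopPx / rowPx);
  const last = Math.floor((scrollTopPx + viewportPx - 1) / rowPx);
  const startIndex = Math.max(0, first - overscan);
  const endIndex = Math.min(rowCount, last + 1 + overscan);
  return { startIndex, endIndex, offsetPx: startIndex * rowPx };
}
```

For a 10k-row leaderboard with 40px rows in a 400px viewport, this renders 16 rows instead of 10,000.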

State on the client

  • React + Zustand / Jotai / Valtio for live state slice.
  • TanStack Query for HTTP-fetched snapshots and history.
  • Memoize aggressively — at 10 events/sec a naive React tree re-renders the world.

Backpressure

What if events arrive faster than the client can render?

  • Coalesce: apply every delta to state, but drop intermediate renders and paint only the latest state per frame.
  • requestAnimationFrame batching — N events → 1 paint.
  • Throttle leaderboard to 1Hz on the wire.
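
The "N events → 1 paint" batching can be sketched as a small coalescer with an injected scheduler. `FrameCoalescer` is a hypothetical name; in the browser the scheduler would be a thin wrapper around `requestAnimationFrame`, and it is injected here so the sketch also runs outside a browser.

```typescript
class FrameCoalescer<T> {
  private pending: T[] = [];
  private scheduled = false;
  private readonly schedule: (run: () => void) => void;
  private readonly flush: (batch: T[]) => void;

  constructor(schedule: (run: () => void) => void, flush: (batch: T[]) => void) {
    this.schedule = schedule;
    this.flush = flush;
  }

  // Queue an event; at most one flush is scheduled per frame.
  push(item: T): void {
    this.pending.push(item);
    if (this.scheduled) return;
    this.scheduled = true;
    this.schedule(() => {
      const batch = this.pending;
      this.pending = [];
      this.scheduled = false;
      this.flush(batch); // apply all events, then trigger a single render
    });
  }
}

// Browser usage (not executed here):
//   const coalescer = new FrameCoalescer<ScoreEvent>(
//     (run) => requestAnimationFrame(() => run()),
//     (batch) => applyAndRender(batch),
//   );
```

The flush callback receives every event in arrival order, so no delta is lost; only the intermediate paints are skipped.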

Failure modes & UX

  • WS dies → reconnect silently, resume from the last seen version, and re-snapshot if the gap is too large.
  • Backend lag → show "behind by Xs" badge.
  • Leaderboard stale → cached value with timestamp.
  • Total outage → graceful "live updates paused" banner; show last known state.

Anti-abuse / fairness

For competitive scenarios:

  • Don't expose raw event timestamps that could leak server-side info.
  • Throttle per-user open WS connections.
  • Authenticate WS handshake with same token as REST.
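
Per-user connection throttling can be as simple as a counter checked during the handshake. This is a single-process sketch with a hypothetical `ConnectionLimiter` class; across multiple gateways the counter would have to live in a shared store, which is not shown here.

```typescript
class ConnectionLimiter {
  private readonly openCounts = new Map<string, number>();
  private readonly maxPerUser: number;

  constructor(maxPerUser: number) {
    this.maxPerUser = maxPerUser;
  }

  // Call after authenticating the handshake; reject the upgrade on false.
  tryAcquire(userId: string): boolean {
    const count = this.openCounts.get(userId) ?? 0;
    if (count >= this.maxPerUser) return false;
    this.openCounts.set(userId, count + 1);
    return true;
  }

  // Call from the socket's close handler.
  release(userId: string): void {
    const count = this.openCounts.get(userId) ?? 0;
    if (count <= 1) this.openCounts.delete(userId);
    else this.openCounts.set(userId, count - 1);
  }
}
```

Releasing in the close handler matters as much as acquiring: a leaked count locks a legitimate user out after a few flaky reconnects.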

Observability

  • Connections: count, churn rate, geographic distribution.
  • Events: rate, lag (publish → delivered).
  • Render: FPS, dropped frames on the dashboard.
  • Snapshot frequency: spikes = something wrong upstream.
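
For the publish → delivered lag metric, percentiles are more useful than averages because a few slow deliveries are what viewers notice. A minimal nearest-rank sketch follows; `lagPercentileMs` is a hypothetical helper, and it assumes the lag samples were already computed with reasonably synchronized clocks.

```typescript
// Nearest-rank percentile over a window of lag samples (milliseconds).
function lagPercentileMs(lagsMs: number[], percentile: number): number {
  if (lagsMs.length === 0) return 0;
  const sorted = [...lagsMs].sort((a, b) => a - b);
  const rank = Math.ceil((percentile / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] ?? 0;
}
```

A healthy p50 next to a bad p99 usually points at a slow gateway or region rather than at the bus itself.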

Cost levers

  • WS connections cost memory at the gateway: size instances for RAM and file-descriptor limits.
  • Fan-out via Redis is cheap; per-message DB writes are expensive — batch.
  • Use a CDN for snapshot-cacheable views (top leaderboards, schedules).

Recommended stack

  • Gateway: Node + uWebSockets or Go.
  • Pub/sub: Redis Streams or NATS JetStream.
  • Cache: Redis sorted sets.
  • Snapshot store: Postgres + materialized views.
  • Edge: CloudFront / Fastly for static + cold snapshots.
  • Frontend: React + TanStack Query + TanStack Virtual (formerly react-virtual) + a thin WS client.

Mental model

The architecture splits by latency requirement: WS for live game state, SSE/polling for leaderboards, HTTP+cache for static. Fan-out via a pub/sub bus, gateways subscribe per channel, clients connect to the nearest gateway. Snapshot + delta protocol keeps wire volume sane. Plan for reconnect, backpressure, and partial outages from day one — at gaming scale they happen constantly.

Follow-up questions

  • How would you scale the WebSocket layer past 1M concurrent viewers?
  • How do you handle backpressure when events outpace render?
  • What does the snapshot+delta protocol look like in detail?
  • How do you keep the leaderboard fair under partial outages?

Common mistakes

  • Naive per-connection broadcast — does not scale past a few thousand.
  • Full-state retransmits on every event — saturates client and bandwidth.
  • No snapshot fallback — clients with reconnect gaps see broken state.
  • Re-rendering React on every event without coalescing — UI jank.
  • No backpressure plan — event storms freeze the dashboard.

Performance considerations

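  • WS gateways can hold roughly 50-100k sockets per process with proper tuning (file-descriptor limits, per-connection buffers).
  • Redis sorted-set ZADD and top-N range reads are O(log N) plus the number of rows returned, typically sub-millisecond even with millions of members.
  • Client render is the usual bottleneck: coalesce events to ≤60Hz, virtualize lists, memoize.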

Edge cases

  • Match server crashes mid-game — what state survives?
  • User on flaky mobile network — reconnects every 10s.
  • Leaderboard tie-breakers under partial event loss.
  • Region failover — viewers re-pin to new gateway.

Real-world examples

  • Twitch live stats use a similar fan-out architecture.
  • ESPN live scoreboards combine WS push with HTTP fallback.
  • Riot Games' live esports dashboards use pub/sub fan-out at edge.
  • Sports betting dashboards (DraftKings, FanDuel) need sub-second updates.

Senior engineer discussion

Seniors size the latency budget first and use it to drive the protocol choice per surface. They design the snapshot+delta protocol explicitly, plan reconnect/resume from day one, and add coalescing/backpressure throughout. They distinguish the WS connection problem (infra) from the leaderboard problem (data structure) from the render problem (React) and design each separately.
