System Design
medium
mid

How would you build a maps application like Google Maps with tile loading and a responsive UI?

Tile-based map: the world is divided into z/x/y tiles per zoom level (typically 256×256 px). Load only the visible tiles plus a small overscan. Use HTTP/2 multiplexing for parallel tile fetches, and serve versioned tile URLs with long cache lifetimes from the CDN edge. Decode tiles off the main thread (createImageBitmap in a web worker). Render via Canvas/WebGL; the DOM doesn't scale. Pan/zoom via CSS transforms + RAF, not per-event DOM mutations. Predict the next likely tiles from the pan direction and prefetch them. Use vector tiles + WebGL for richer interactions.


Maps are a classic case where naïve "render everything" doesn't work. You can't put a billion DOM elements on a page; you can't fetch a billion tiles. The solution is a layered system: tile pyramid → viewport-aware loading → off-main-thread decode → Canvas/WebGL render → smooth pan/zoom.

Tile pyramid

The world is divided into a quadtree:

  • Zoom 0: 1 tile covers the whole world (256×256).
  • Zoom 1: 2×2 = 4 tiles.
  • Zoom 2: 4×4 = 16 tiles.
  • Zoom N: 2^N × 2^N tiles.

Each tile has a deterministic URL: /tiles/z/x/y.png (or .webp / .pbf for vector).

At any moment, only the visible tiles at the current zoom need to be loaded.
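To make the addressing concrete, here is a minimal sketch that maps a longitude/latitude to its tile and builds the tile URL. It assumes the common Web Mercator "XYZ" scheme (y = 0 at the top), which the text above does not name explicitly; `lonLatToTile`, `tileUrl`, and `tileCount` are illustrative names, not part of any library.

```javascript
// Sketch only: assumes Web Mercator XYZ tiling; valid for |lat| up to about 85.05 degrees.
function lonLatToTile(lon, lat, z) {
  const n = 2 ** z; // tiles per axis at this zoom
  const latRad = (lat * Math.PI) / 180;
  const x = Math.floor(((lon + 180) / 360) * n);
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  // Clamp so lon = 180 and very high latitudes stay inside the grid.
  const clamp = (v) => Math.min(n - 1, Math.max(0, v));
  return { z, x: clamp(x), y: clamp(y) };
}

// Hypothetical URL layout following the /tiles/z/x/y pattern.
function tileUrl({ z, x, y }, ext = 'png') {
  return `/tiles/${z}/${x}/${y}.${ext}`;
}

// Total number of tiles at a zoom level: 2^z × 2^z.
function tileCount(z) {
  return 4 ** z;
}
```

For example, `tileUrl(lonLatToTile(0, 0, 1))` yields `/tiles/1/1/1.png`, and `tileCount(2)` is 16, matching the table above.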

Viewport-aware loading

js
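function visibleTiles(center, zoom, viewport) {
  // center is in world pixel coordinates at this zoom level
  const tileSize = 256;
  const overscan = 1; // extra ring of tiles around the viewport
  const minTileX = Math.floor((center.x - viewport.width / 2) / tileSize);
  const maxTileX = Math.ceil((center.x + viewport.width / 2) / tileSize);
  const minTileY = Math.floor((center.y - viewport.height / 2) / tileSize);
  const maxTileY = Math.ceil((center.y + viewport.height / 2) / tileSize);
  const tiles = [];
  for (let x = minTileX - overscan; x <= maxTileX + overscan; x++) {
    for (let y = minTileY - overscan; y <= maxTileY + overscan; y++) {
      tiles.push({ z: zoom, x, y });
    }
  }
  return tiles;  // a few dozen tiles at a typical viewport
}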

The +1 / -1 padding (overscan) preloads adjacent tiles for smooth pan.
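The overscan loop can produce indices outside the grid (negative x, or y past the last row). A small sketch of one way to handle that, assuming the map wraps horizontally and has nothing above or below the grid; `normalizeTiles` is a hypothetical helper:

```javascript
// Sketch: wrap x around the antimeridian and drop rows outside the tile grid.
function normalizeTiles(tiles) {
  const result = [];
  for (const { z, x, y } of tiles) {
    const n = 2 ** z; // tiles per axis at this zoom
    if (y < 0 || y >= n) continue; // no tiles exist above or below the grid
    const wrappedX = ((x % n) + n) % n; // also handles negative x
    result.push({ z, x: wrappedX, y });
  }
  return result;
}
```

The wrapped index is what goes into the tile URL; the original, unwrapped x is still the one to use when positioning the tile on screen.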

Caching

  • Browser HTTP cache: tile URLs are versioned; Cache-Control: public, max-age=31536000, immutable.
  • CDN edge: tiles are perfect CDN content — static, high hit rate, globally requested.
  • In-memory tile cache (a JS Map): keep recently rendered tiles as decoded bitmaps or textures. Evict the least recently used entries when the cache exceeds N tiles.
  • Service worker: optional, for offline maps.
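The in-memory layer can be sketched as a small LRU built on `Map`, which iterates in insertion order. `TileCache` and its `onEvict` callback are illustrative names; in a browser the callback would typically call `bitmap.close()` on an evicted `ImageBitmap`.

```javascript
// Minimal LRU sketch: the first Map entry is always the least recently used.
class TileCache {
  constructor(maxTiles, onEvict = () => {}) {
    this.maxTiles = maxTiles;
    this.onEvict = onEvict; // called with (key, value) when an entry is dropped
    this.tiles = new Map();
  }

  get(key) {
    if (!this.tiles.has(key)) return undefined;
    const value = this.tiles.get(key);
    this.tiles.delete(key); // re-insert to mark as most recently used
    this.tiles.set(key, value);
    return value;
  }

  set(key, value) {
    const previous = this.tiles.get(key);
    if (previous !== undefined && previous !== value) this.onEvict(key, previous);
    this.tiles.delete(key);
    this.tiles.set(key, value);
    while (this.tiles.size > this.maxTiles) {
      const [oldestKey, oldestValue] = this.tiles.entries().next().value;
      this.tiles.delete(oldestKey);
      this.onEvict(oldestKey, oldestValue);
    }
  }
}
```

Usage might look like `new TileCache(256, (key, bitmap) => bitmap.close())`, with the capacity chosen from the memory budget.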

Fetching at scale

  • HTTP/2 multiplexing: 20 tiles can load in parallel over one connection, avoiding the roughly 6-connections-per-origin limit that browsers apply to HTTP/1.1.
  • AbortController: cancel in-flight requests for tiles no longer visible (rapid pan).
  • Priority queue: tiles at the center of the viewport get higher priority than edge tiles.
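Cancellation and prioritisation can be combined in one reconcile step that runs whenever the visible tile set changes. This is a sketch under assumptions: `inflight` is a `Map` of tile key to `AbortController`, and `startRequest(tile, signal)` is a hypothetical callback that would call `fetch(url, { signal })` and remove the key from `inflight` when the request settles.

```javascript
const tileKey = (t) => `${t.z}/${t.x}/${t.y}`;

function reconcileRequests(inflight, wantedTiles, centerTile, startRequest) {
  const wantedKeys = new Set(wantedTiles.map(tileKey));

  // 1. Abort requests for tiles that are no longer wanted.
  for (const [key, controller] of inflight) {
    if (!wantedKeys.has(key)) {
      controller.abort();
      inflight.delete(key);
    }
  }

  // 2. Start missing requests, nearest to the center tile first.
  const dist = (t) => Math.hypot(t.x - centerTile.x, t.y - centerTile.y);
  const missing = wantedTiles
    .filter((t) => !inflight.has(tileKey(t)))
    .sort((a, b) => dist(a) - dist(b));

  for (const tile of missing) {
    const controller = new AbortController();
    inflight.set(tileKey(tile), controller);
    startRequest(tile, controller.signal);
  }
}
```

Sorting by distance is a simple stand-in for a real priority queue; it is enough when a few dozen tiles are requested per update.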

Decode

Decoding a PNG/WebP for each tile is CPU work. Off the main thread:

js
// Worker
self.onmessage = async (e) => {
  const blob = await fetch(e.data.url).then(r => r.blob());
  const bitmap = await createImageBitmap(blob);  // off-main-thread decode
  self.postMessage({ tileKey: e.data.tileKey, bitmap }, [bitmap]);
};

Decoding with createImageBitmap in a worker (optionally drawing into an OffscreenCanvas there as well) keeps the main thread free for pan/zoom interaction.
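On the main thread, the worker's replies need to be matched back to the tiles that asked for them. A minimal sketch, assuming the worker echoes `tileKey` as in the snippet above; `createTileDecoder` is a hypothetical helper and `worker` is anything with `postMessage`/`onmessage` (for example `new Worker('tile-worker.js')`, where the file name is made up).

```javascript
// Sketch: promise-based wrapper around the decode worker's message protocol.
function createTileDecoder(worker) {
  const pending = new Map(); // tileKey -> resolve callback

  worker.onmessage = (e) => {
    const { tileKey, bitmap } = e.data;
    const resolve = pending.get(tileKey);
    if (resolve) {
      pending.delete(tileKey);
      resolve(bitmap);
    }
  };

  return {
    pendingCount: () => pending.size,
    decode(tileKey, url) {
      return new Promise((resolve) => {
        pending.set(tileKey, resolve);
        worker.postMessage({ tileKey, url });
      });
    },
  };
}
```

This sketch has no error path because the worker above never reports failures; a fuller version would also post and handle an error message per tile.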

Render: Canvas/WebGL

DOM with thousands of <img> elements is a non-starter. Choices:

Canvas 2D: simpler, fine for raster tiles. Compose tiles + overlays per frame.

WebGL (or WebGPU): higher throughput, GPU shader pipeline, can render vector tiles (Mapbox GL, MapLibre style). Smooth zoom, rotation, 3D terrain.
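For the Canvas 2D path, composing a frame is mostly coordinate arithmetic. The sketch below assumes `center` is in world pixels at the current zoom, `cache` is any object with a `get(key)` method returning a decoded bitmap, and `ctx` is a `CanvasRenderingContext2D`; `drawTiles` is an illustrative name.

```javascript
// Sketch: draw every cached tile at its screen position for the current center.
function drawTiles(ctx, tiles, cache, center, viewport, tileSize = 256) {
  // World-pixel coordinate of the viewport's top-left corner.
  const originX = center.x - viewport.width / 2;
  const originY = center.y - viewport.height / 2;

  ctx.clearRect(0, 0, viewport.width, viewport.height);
  for (const tile of tiles) {
    const bitmap = cache.get(`${tile.z}/${tile.x}/${tile.y}`);
    if (!bitmap) continue; // not loaded yet: leave blank or draw a placeholder
    // Round to whole pixels to avoid seams between neighbouring tiles.
    const dx = Math.round(tile.x * tileSize - originX);
    const dy = Math.round(tile.y * tileSize - originY);
    ctx.drawImage(bitmap, dx, dy, tileSize, tileSize);
  }
}
```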

Pan/zoom

The handler must be cheap:

js
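let panX = 0, panY = 0, raf = 0;
let dragging = false, startX = 0, startY = 0;
canvas.addEventListener('pointerdown', e => {
  dragging = true;
  startX = e.clientX;
  startY = e.clientY;
  canvas.setPointerCapture(e.pointerId);  // keep receiving moves outside the canvas
});
canvas.addEventListener('pointermove', e => {
  if (dragging) {
    // Only record the latest offset here; the DOM write happens once per frame.
    panX = e.clientX - startX;
    panY = e.clientY - startY;
    if (!raf) raf = requestAnimationFrame(applyPan);
  }
});
canvas.addEventListener('pointerup', () => { dragging = false; });
function applyPan() {
  raf = 0;
  canvas.style.transform = `translate3d(${panX}px, ${panY}px, 0)`;  // compositor-only
}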

During the pan, use a CSS transform on the entire canvas → compositor-only, no per-frame redraw. On pointerup, commit the new center, reset the transform, recalculate the visible tiles, and redraw. This decouples pan smoothness from tile rendering cost.

For zoom, the same: scale-transform the canvas during pinch/wheel, then on settle, re-render at the new zoom level using the next zoom's tiles.
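The "on settle" step has to pick the new zoom level and a new center that keeps the point under the cursor (or pinch midpoint) fixed. A sketch of that arithmetic, assuming `center` is in world pixels at the old zoom, `anchor` is in screen pixels, and `scale` is the accumulated gesture scale; `commitZoom` is a hypothetical name, and snapping to whole zoom levels is a simplification suited to raster tiles.

```javascript
// Sketch: convert a gesture scale into a new integer zoom and re-centered viewport.
function commitZoom(center, viewport, anchor, zoom, scale) {
  const dz = Math.round(Math.log2(scale)); // snap to a whole zoom level
  const factor = 2 ** dz; // world pixel space doubles per zoom level

  // Offset of the anchor from the viewport center, in screen pixels.
  const offX = anchor.x - viewport.width / 2;
  const offY = anchor.y - viewport.height / 2;

  // World point under the anchor, rescaled to the new zoom's pixel space.
  const worldX = (center.x + offX) * factor;
  const worldY = (center.y + offY) * factor;

  return {
    zoom: zoom + dz,
    center: { x: worldX - offX, y: worldY - offY },
  };
}
```

Zooming in by one level with the anchor at the viewport center simply doubles the center coordinates; an off-center anchor shifts the result so the anchored point stays put.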

Predictive prefetch

When the user is panning in a direction, prefetch tiles in that direction:

js
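// panVelocity: velocity of the viewport center in world pixels; look 2 tiles (256 px each) ahead
const direction = { x: Math.sign(panVelocity.x), y: Math.sign(panVelocity.y) };
const ahead = { x: center.x + 2 * 256 * direction.x, y: center.y + 2 * 256 * direction.y };
const predictedTiles = visibleTiles(ahead, zoom, viewport);
predictedTiles.forEach(prefetchTile);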

For zoom, prefetch the target zoom level's tiles as soon as the gesture starts: the next level down the pyramid when zooming in, the parent level when zooming out.
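Because the pyramid is a quadtree, the tiles to prefetch for a zoom change follow directly from the current tile's indices. A minimal sketch with illustrative helper names:

```javascript
// The four tiles at zoom z + 1 that cover the same area as the given tile.
function childTiles({ z, x, y }) {
  return [
    { z: z + 1, x: 2 * x, y: 2 * y },
    { z: z + 1, x: 2 * x + 1, y: 2 * y },
    { z: z + 1, x: 2 * x, y: 2 * y + 1 },
    { z: z + 1, x: 2 * x + 1, y: 2 * y + 1 },
  ];
}

// The single tile at zoom z - 1 that contains the given tile (null at zoom 0).
function parentTile({ z, x, y }) {
  if (z === 0) return null;
  return { z: z - 1, x: Math.floor(x / 2), y: Math.floor(y / 2) };
}
```

A parent tile that is already cached can also be drawn scaled up as a blurry placeholder while its children load.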

Vector tiles + WebGL

Modern maps (Mapbox, MapLibre, Google Maps Vector) ship vector tiles (.pbf) that the client renders via WebGL:

  • Smaller payloads (compact binary-encoded geometry, typically Protocol Buffers, vs pixel data).
  • Smooth zoom (geometry scales without resampling).
  • Dynamic styling (light/dark mode without re-fetching).
  • Rotation, 3D buildings, terrain.

Cost: more client CPU/GPU work. Worth it for desktop and modern mobile.

UI responsiveness

  • All decode work on workers.
  • Render on Canvas/WebGL, not DOM.
  • Pan/zoom via CSS transform during gesture, full redraw on settle.
  • RAF-coalesce all updates.
  • Throttle tile-fetch requests (don't spam on every micro-move).
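One simple way to throttle tile requests is to do nothing until the set of visible tile indices actually changes. The sketch below assumes `center` is in world pixels at the current zoom; `createTileRangeWatcher` and its `onChange` callback are hypothetical names.

```javascript
// Sketch: invoke onChange only when the visible tile range differs from the last one.
function createTileRangeWatcher(onChange, tileSize = 256) {
  let lastKey = null;
  return function update(center, zoom, viewport) {
    const minX = Math.floor((center.x - viewport.width / 2) / tileSize);
    const maxX = Math.floor((center.x + viewport.width / 2) / tileSize);
    const minY = Math.floor((center.y - viewport.height / 2) / tileSize);
    const maxY = Math.floor((center.y + viewport.height / 2) / tileSize);
    const key = `${zoom}:${minX},${minY},${maxX},${maxY}`;
    if (key === lastKey) return false; // same tiles visible: nothing new to fetch
    lastKey = key;
    onChange({ zoom, minX, minY, maxX, maxY });
    return true;
  };
}
```

Calling `update` on every pan frame is cheap; the fetch logic behind `onChange` only runs when the viewport crosses a tile boundary or the zoom changes.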

Memory management

  • Bounded tile cache (e.g., 256 tiles ≈ 64MB at 256×256 RGBA).
  • LRU eviction.
  • Release decoded ImageBitmaps when evicted.
  • For mobile, lower the budget.
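The 64MB figure follows from 256 × 256 pixels × 4 bytes = 256 KiB per decoded tile. A small sketch for turning a byte budget into a tile count; `maxTilesForBudget` is an illustrative name and the budgets in the examples are assumptions, not recommendations.

```javascript
// Sketch: how many decoded RGBA tiles fit in a given memory budget.
function maxTilesForBudget(budgetBytes, tileSize = 256, bytesPerPixel = 4) {
  const bytesPerTile = tileSize * tileSize * bytesPerPixel;
  return Math.floor(budgetBytes / bytesPerTile);
}

const MiB = 1024 * 1024;
const desktopTiles = maxTilesForBudget(64 * MiB);      // 256 tiles at 256×256
const retinaTiles = maxTilesForBudget(64 * MiB, 512);  // 64 tiles at 512×512
const mobileTiles = maxTilesForBudget(16 * MiB);       // 64 tiles at 256×256
```

Note how 512 px tiles for high-DPR screens cut the tile count by four for the same budget.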

Pitfalls

  • DOM-based map → unscalable.
  • Per-event DOM mutation → janky pan.
  • Synchronous decode → main thread blocked → INP regression.
  • Unbounded cache → OOM after long browsing.
  • No abort on rapid pan → wasted bandwidth.
  • No prefetch on continued pan → tiles pop in late.

Mental model

Maps are a streaming system: tiles are content fetched and decoded on demand, rendered via GPU, with predictive prefetch hiding latency. Every layer (network, decode, render, interaction) must be tuned independently because each can be the bottleneck.

Follow-up questions

  • How does vector tile rendering differ from raster?
  • How do you handle offline maps?
  • What's the right tile cache size for mobile?
  • How do you prevent decode-bound jank?

Common mistakes

  • DOM-based map elements — doesn't scale.
  • Per-pointer event redraw — janky.
  • Sync image decode on main thread.
  • Unbounded tile cache — eventual OOM.
  • No prefetch — tiles pop in.
  • Not cancelling stale tile requests on fast pan.

Performance considerations

  • Done right: 60fps pan/zoom on mid-range mobile, sub-second tile load over 4G, bounded memory. Done wrong: 10fps drag, multi-second tile loads, OOM. The architectural choices (Canvas vs DOM, worker decode, predictive prefetch) determine the perf ceiling.

Edge cases

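  • Pinch zoom arrives differently per platform: touch/pointer events on mobile, wheel events with ctrlKey set for trackpad pinch on desktop. Normalize both into one zoom handler and coalesce updates.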
  • Retina (DPR 2-3) needs different tile sizes or supersampled rendering.
  • Slow network: progressive tile fade-in beats blank tiles.
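  • Memory pressure: browsers expose no standard memory-pressure event, so keep budgets conservative (for example, scale them by navigator.deviceMemory where available) and evict aggressively when the page is hidden.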
  • Anti-aliasing artifacts at tile boundaries — render with overscan.

Real-world examples

  • Google Maps: WebGL, vector tiles, predictive prefetch.
  • Mapbox GL / MapLibre: WebGL, vector tile pipeline, open source.
  • Leaflet: lightweight DOM-based for simple use cases; doesn't scale to richer interactions.
  • Apple Maps web: WebGL + vector.

Senior engineer discussion

Seniors design maps as a streaming, off-main-thread, GPU-rendered system. They pick the right tile format (raster vs vector) for the use case, design memory bounds, and treat pan/zoom as a separate concern from tile loading. They also leverage existing libraries (MapLibre) rather than building from scratch.
