System Design
hard
senior

How would you design an offline first frontend application?

Treat the local store as the source of truth; the network is an eventual-consistency partner. Cache shell assets via a Service Worker, persist data in IndexedDB (Dexie), queue mutations for replay on reconnect, and resolve merge conflicts via versioning or CRDTs. Surface offline state in the UI; never block the user on a request. Plan for partial connectivity (slow, flaky, captive portal), not just on/off.


Offline-first reframes the architecture: the network is unreliable; the local cache is authoritative. Every read hits local first, every write happens locally first and syncs in the background.

The four layers

text
┌──────────────────────────────────────────────────┐
│ UI: reads from local store, never blocks         │
│     on network. Shows sync status.               │
├──────────────────────────────────────────────────┤
│ Local store: IndexedDB (Dexie / idb)             │
│   - structured data, mutations queue             │
├──────────────────────────────────────────────────┤
│ Sync engine: background fetch + reconciliation   │
│   - retry with backoff                           │
│   - conflict resolution                          │
├──────────────────────────────────────────────────┤
│ Service Worker: app shell + asset cache          │
│   - cache-first for static assets,               │
│     stale-while-revalidate for HTML              │
└──────────────────────────────────────────────────┘

Layer 1: app shell via Service Worker

Cache the HTML, JS, CSS, and fonts so the app loads with no network at all.

js
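// sw.js
const SHELL_CACHE = "shell-v1";

self.addEventListener("install", e => {
  e.waitUntil(caches.open(SHELL_CACHE).then(c => c.addAll(["/", "/app.js", "/app.css"])));
});

// Serve the cached copy immediately and refresh the cache in the background
async function staleWhileRevalidate(request) {
  const cache = await caches.open(SHELL_CACHE);
  const cached = await cache.match(request);
  const network = fetch(request)
    .then(res => { if (res.ok) cache.put(request, res.clone()); return res; })
    .catch(() => cache.match("/")); // offline and nothing cached: fall back to the shell
  return cached || network;
}

self.addEventListener("fetch", e => {
  if (e.request.destination === "document") {
    // stale-while-revalidate for the HTML
    e.respondWith(staleWhileRevalidate(e.request));
  } else if (e.request.destination === "script" || e.request.destination === "style") {
    // cache-first for hashed assets
    e.respondWith(caches.match(e.request).then(r => r || fetch(e.request)));
  }
});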

Use Workbox in production: it handles cache versioning, navigation preload, and expiration.

Layer 2: data in IndexedDB

localStorage is synchronous (it blocks the main thread), capped at roughly 5 MB per origin, and stores strings only. Use IndexedDB through a wrapper such as Dexie or idb:

ts
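// Dexie
import Dexie, { type Table } from "dexie";

interface Task { id: string; projectId: string; title: string; updatedAt: number; }
interface Mutation { id?: number; op: string; payload: unknown; createdAt: number; }

// Typed database: task rows plus the outgoing mutation queue
const db = new Dexie("app") as Dexie & {
  tasks: Table<Task, string>;
  mutations: Table<Mutation, number>;
};
db.version(1).stores({
  tasks: "id, projectId, updatedAt", // primary key first, then indexed fields
  mutations: "++id, createdAt",      // auto-increment key keeps enqueue order
});

await db.tasks.add({ id: "1", projectId: "p1", title: "buy milk", updatedAt: Date.now() });
const myTasks = await db.tasks.where({ projectId: "p1" }).toArray();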

The UI reads through the local store and subscribes to it (e.g. Dexie's liveQuery), so every write, whether made locally or applied by the sync engine, propagates to the UI automatically.
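Dexie's liveQuery provides this against IndexedDB. The mechanism itself can be sketched without a browser as a small in-memory store; `ObservableStore`, `subscribe`, and `put` are hypothetical names for illustration, not Dexie's API. Local edits and sync-applied remote writes go through the same write path, so every subscriber sees both.

```typescript
// Hypothetical in-memory stand-in for a local store that supports live queries.
type Listener<T> = (rows: T[]) => void;

class ObservableStore<T extends { id: string }> {
  private rows = new Map<string, T>();
  private listeners = new Set<Listener<T>>();

  // Fires immediately with the current rows, then again after every write.
  subscribe(listener: Listener<T>): () => void {
    this.listeners.add(listener);
    listener(this.snapshot());
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Single write path shared by local edits and changes pulled in by the sync engine.
  put(row: T): void {
    this.rows.set(row.id, row);
    const rows = this.snapshot();
    this.listeners.forEach(listener => listener(rows));
  }

  private snapshot(): T[] {
    return Array.from(this.rows.values());
  }
}

// Usage: a "component" that records how many rows it rendered each time.
const store = new ObservableStore<{ id: string; title: string }>();
const renders: number[] = [];
const unsubscribe = store.subscribe(rows => renders.push(rows.length));
store.put({ id: "1", title: "buy milk" });     // local edit
store.put({ id: "2", title: "from server" });  // remote change applied by sync
unsubscribe();
store.put({ id: "3", title: "not rendered" }); // no longer subscribed
```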

Layer 3: mutation queue

Every user action becomes a mutation that's stored locally and applied to the local DB optimistically:

ts
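async function createTask(t: Task) {
  // Row and queued mutation commit atomically: both are written or neither is
  await db.transaction("rw", db.tasks, db.mutations, async () => {
    await db.tasks.add(t);
    await db.mutations.add({ op: "create-task", payload: t, createdAt: Date.now() });
  });
  flushQueue().catch(() => {}); // fire-and-forget; a failed flush is retried on the next trigger
}

async function flushQueue() {
  // Replay in enqueue order (auto-increment id)
  const pending = await db.mutations.orderBy("id").toArray();
  for (const m of pending) {
    try {
      await api.send(m); // api: the app's own transport
      await db.mutations.delete(m.id!);
    } catch {
      break; // stop at the first failure to preserve order; retry on the next trigger
    }
  }
}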

Trigger flushQueue on:

  • App start.
  • Network back online (window.addEventListener("online", flushQueue)).
  • After each user action.
  • Periodically (every 30s) for partial-connectivity resilience.
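Several of these triggers can fire at the same moment, so it helps to make the flush single-flight within a tab and to space out retries with exponential backoff (the "retry with backoff" item in the layer diagram). A minimal sketch; `singleFlight`, `backoffDelay`, and their default parameters are hypothetical, and the wrapped function stands in for `flushQueue`.

```typescript
// Collapse overlapping calls into one in-flight run; concurrent callers share its promise.
function singleFlight<T>(fn: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (!inFlight) {
      inFlight = fn().finally(() => {
        inFlight = null; // allow the next trigger to start a fresh run
      });
    }
    return inFlight;
  };
}

// Exponential backoff with "full jitter": a random delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelay(
  attempt: number,
  baseMs = 1000,
  capMs = 30000,
  rand: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * ceiling);
}

// Usage: an "online" event, a user action, and the 30s timer all fire at once.
let flushRuns = 0;
const flush = singleFlight(async () => {
  flushRuns++; // stands in for the real flushQueue()
  return true;
});
const overlapping = Promise.all([flush(), flush(), flush()]);
```

After a failed run, the next attempt could be scheduled with `setTimeout(flush, backoffDelay(attempt))`, resetting `attempt` once the queue drains.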

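The Background Sync API (registration.sync.register("sync-mutations")) asks the browser to fire a sync event in the Service Worker once connectivity returns, even if the tab has been closed in the meantime. This is most valuable on mobile, but it is only available in Chromium-based browsers.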
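Because Background Sync is not universally available, registration is worth feature-detecting, with an in-page flush as the fallback. A sketch under stated assumptions: `requestFlush` and `flushNow` are hypothetical names, and the registration is typed loosely so the snippet runs outside a browser (in a page you would pass the registration from `navigator.serviceWorker.ready`). The worker side then listens for the `sync` event and, when `event.tag === "sync-mutations"`, wraps its own flush in `event.waitUntil(...)`.

```typescript
// Minimal shape of registration.sync, which only exists where Background Sync is supported.
interface SyncManagerLike {
  register(tag: string): Promise<void>;
}

// Prefer Background Sync; otherwise (or if registration is refused) flush from the page now.
async function requestFlush(
  registration: object | undefined,
  flushNow: () => Promise<void>,
): Promise<"background-sync" | "in-page"> {
  const sync = registration ? (registration as { sync?: SyncManagerLike }).sync : undefined;
  if (sync) {
    try {
      await sync.register("sync-mutations");
      return "background-sync";
    } catch {
      // The browser refused the registration; fall back to flushing from the page.
    }
  }
  await flushNow();
  return "in-page";
}
```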

Layer 4: conflict resolution

When two clients edit the same record while offline, who wins?

1. Last-write-wins (LWW).

Each record has an updatedAt timestamp. On sync, the server keeps the newer version. Simple, but the losing edit is discarded silently, and clock drift between clients can make an older edit win.
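A minimal LWW merge, assuming each record carries `updatedAt` in milliseconds plus a hypothetical `clientId` used only to break ties so that every replica picks the same winner. The example makes the cost concrete: the losing edit is simply dropped.

```typescript
interface Versioned<T> {
  value: T;
  updatedAt: number; // client wall-clock ms of the last edit (subject to clock skew)
  clientId: string;  // deterministic tie-breaker so every replica picks the same winner
}

// Last-write-wins: keep the newer version; on a timestamp tie, the larger clientId wins.
function lwwMerge<T>(a: Versioned<T>, b: Versioned<T>): Versioned<T> {
  if (a.updatedAt !== b.updatedAt) return a.updatedAt > b.updatedAt ? a : b;
  return a.clientId > b.clientId ? a : b;
}

// Both devices edited the same task while offline.
const fromPhone: Versioned<string> = { value: "buy oat milk", updatedAt: 1000, clientId: "phone" };
const fromLaptop: Versioned<string> = { value: "buy milk x2", updatedAt: 1005, clientId: "laptop" };
const winner = lwwMerge(fromPhone, fromLaptop); // the laptop wins; "buy oat milk" is gone
```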

2. Version vectors / Lamport timestamps.

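Each record carries one counter per client (a version vector). Two versions conflict when neither vector dominates the other, i.e. each side has advanced a counter the other hasn't seen. Surface these conflicts to the user for manual resolution. (A Lamport timestamp alone orders edits but cannot tell you whether they were concurrent.)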
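Detecting the conflict comes down to comparing two vectors. A sketch; the map-of-counters representation and the names `bump` and `compareVersions` are assumptions for illustration, and only the `concurrent` outcome needs the manual-resolve UI.

```typescript
type VersionVector = Record<string, number>; // clientId -> count of that client's edits seen

type Relation = "equal" | "before" | "after" | "concurrent";

// A client increments its own counter each time it edits the record.
function bump(v: VersionVector, clientId: string): VersionVector {
  return { ...v, [clientId]: (v[clientId] ?? 0) + 1 };
}

// a is "before" b if no counter in a exceeds b's and at least one is smaller.
function compareVersions(a: VersionVector, b: VersionVector): Relation {
  let aBehind = false; // b has seen an edit that a has not
  let bBehind = false; // a has seen an edit that b has not
  for (const client of Object.keys({ ...a, ...b })) {
    const x = a[client] ?? 0;
    const y = b[client] ?? 0;
    if (x < y) aBehind = true;
    if (y < x) bBehind = true;
  }
  if (aBehind && bBehind) return "concurrent"; // a real conflict: ask the user
  if (aBehind) return "before";
  if (bBehind) return "after";
  return "equal";
}

const base: VersionVector = { phone: 1, laptop: 1 };
const phoneEdit = bump(base, "phone");   // { phone: 2, laptop: 1 }
const laptopEdit = bump(base, "laptop"); // { phone: 1, laptop: 2 }
```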

3. CRDTs.

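The principled solution. Operations (or state merges) commute: replicas that receive the same set of changes converge to the same state, whatever the order. Use Yjs or Automerge for structured documents. Higher up-front cost, but it largely removes the need for a manual conflict UI.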
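Yjs and Automerge implement far richer data types, but the convergence property already shows up in the simplest state-based CRDT, a grow-only counter (G-Counter). This toy is for illustration only; the type and function names are not from either library.

```typescript
type GCounter = Record<string, number>; // replicaId -> total increments made by that replica

// Each replica only ever increments its own slot (amounts must be non-negative).
function increment(c: GCounter, replicaId: string, by = 1): GCounter {
  return { ...c, [replicaId]: (c[replicaId] ?? 0) + by };
}

// Merge takes the per-replica maximum: commutative, associative, and idempotent.
function mergeCounters(a: GCounter, b: GCounter): GCounter {
  const out: GCounter = { ...a };
  for (const id of Object.keys(b)) {
    out[id] = Math.max(out[id] ?? 0, b[id] ?? 0);
  }
  return out;
}

function counterValue(c: GCounter): number {
  return Object.keys(c).reduce((sum, id) => sum + (c[id] ?? 0), 0);
}

// Two replicas edit while offline, then exchange state in opposite orders.
const phoneReplica = increment({}, "phone", 2);
const laptopReplica = increment({}, "laptop", 3);
const onPhone = mergeCounters(phoneReplica, laptopReplica);
const onLaptop = mergeCounters(laptopReplica, phoneReplica);
```

With plain last-write-wins on a single number, the result would be 2 or 3 depending on which replica synced last; the merged counter keeps both edits.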

4. Optimistic with revert.

Apply locally, then send to the server. If the server rejects the change (validation failure, conflict), roll back the local change and show a toast.
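The apply / send / roll-back cycle can be sketched against an in-memory map; `Row`, `sendToServer`, and `notify` are hypothetical stand-ins for the local table, the transport, and the toast. The key details are capturing the previous value before the optimistic write, and reverting only if nothing newer has been written to the row in the meantime.

```typescript
interface Row { id: string; title: string; }

// Apply an edit locally, send it, and restore the previous state if the server rejects it.
async function optimisticUpdate(
  table: Map<string, Row>,
  next: Row,
  sendToServer: (row: Row) => Promise<void>,
  notify: (msg: string) => void,
): Promise<boolean> {
  const previous = table.get(next.id); // undefined means the row is new
  table.set(next.id, next);            // optimistic: the UI shows the edit immediately
  try {
    await sendToServer(next);
    return true;
  } catch (err) {
    if (table.get(next.id) === next) {
      // Only revert if no newer edit has replaced this one while the request was in flight
      if (previous) table.set(next.id, previous);
      else table.delete(next.id); // roll back a create by removing the row
    }
    notify(`Couldn't save "${next.title}": ${err instanceof Error ? err.message : String(err)}`);
    return false;
  }
}
```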

UI: surface state

  • Online / offline indicator in the app shell.
  • Per-item sync state — pending, synced, error.
  • Disable destructive actions offline only when truly required (most can queue).
  • Optimistic counts ("3 unsynced changes").
  • Conflict banner when manual merge is needed.

Don't lie. Don't show "Saved" when it's just in the local queue — say "Saved locally, syncing…"
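One way to keep the wording honest is to derive it from queue state rather than from the action just performed. A sketch; the fields of the hypothetical `SyncState` are assumptions about what the sync engine exposes.

```typescript
interface SyncState {
  pending: number;      // mutations still in the local queue
  failed: number;       // mutations the server rejected or that exhausted their retries
  online: boolean;      // result of a reachability probe, not navigator.onLine alone
  needsReauth: boolean; // the queue is blocked on an expired session
}

// Derive the status text from queue state so the UI never claims more than it knows.
function syncLabel(s: SyncState): string {
  if (s.needsReauth) return "Sign in again to sync your changes";
  if (s.failed > 0) return `${s.failed} change${s.failed === 1 ? "" : "s"} failed to sync`;
  if (s.pending === 0) return "All changes synced";
  if (!s.online) return `Saved locally (${s.pending} unsynced)`;
  return "Saved locally, syncing…";
}
```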

The catches

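1. Storage quotas. IndexedDB quotas are per origin and differ by browser (generally a fraction of the device's disk, so far less absolute space on mobile), and by default stored data is best-effort: the browser may evict it under storage pressure. Use navigator.storage.estimate() to monitor usage and navigator.storage.persist() to request persistent storage, which is not evicted under pressure (the browser may deny the request).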
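A monitoring helper might look like this sketch. The `estimate` function is injected so it runs outside a browser (in a page, pass `() => navigator.storage.estimate()`), and the 80% warning threshold and the `QuotaReport` shape are arbitrary assumptions.

```typescript
interface QuotaReport {
  usedBytes: number;
  quotaBytes: number;
  ratio: number;      // usedBytes / quotaBytes, or 0 when the quota is unknown
  nearLimit: boolean; // true once usage crosses the warning threshold
}

// Summarize storage usage; callers can pause prefetching or warn the user when nearLimit is true.
async function checkQuota(
  estimate: () => Promise<{ usage?: number; quota?: number }>,
  warnAt = 0.8, // hypothetical threshold: 80% of the origin's quota
): Promise<QuotaReport> {
  const { usage = 0, quota = 0 } = await estimate(); // both fields are optional in the estimate
  const ratio = quota > 0 ? usage / quota : 0;
  return { usedBytes: usage, quotaBytes: quota, ratio, nearLimit: ratio >= warnAt };
}
```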

2. Cross-tab coordination. Two tabs both flushing the queue → duplicate mutations. Use a single-writer lock (navigator.locks.request()) so only one tab flushes at a time, and BroadcastChannel to tell the other tabs when the queue or data has changed.
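With the Web Locks API, `navigator.locks.request(name, { ifAvailable: true }, callback)` invokes the callback with `null` when another tab already holds the lock, so the losing tab can simply skip its flush. The sketch types the lock manager structurally so it runs outside a browser; `flushIfLeader` and the single-process `LocalLocks` stand-in are hypothetical and exist only to exercise the logic (the real cross-tab guarantee comes from `navigator.locks`).

```typescript
// Structural subset of the Web Locks API used here.
interface LockManagerLike {
  request<T>(
    name: string,
    options: { ifAvailable: boolean },
    callback: (lock: object | null) => Promise<T>,
  ): Promise<T>;
}

// Only the tab holding "mutation-queue" flushes; the others return false immediately.
async function flushIfLeader(locks: LockManagerLike, flush: () => Promise<void>): Promise<boolean> {
  return locks.request("mutation-queue", { ifAvailable: true }, async lock => {
    if (!lock) return false; // another tab holds the lock and is already flushing
    await flush();
    return true;
  });
}

// Hypothetical single-process stand-in for navigator.locks, for demonstration only.
class LocalLocks implements LockManagerLike {
  private held = new Set<string>();

  async request<T>(
    name: string,
    options: { ifAvailable: boolean },
    callback: (lock: object | null) => Promise<T>,
  ): Promise<T> {
    if (this.held.has(name)) {
      if (options.ifAvailable) return callback(null);
      throw new Error("waiting for a held lock is not implemented in this stand-in");
    }
    this.held.add(name);
    try {
      return await callback({ name });
    } finally {
      this.held.delete(name); // release when the callback settles
    }
  }
}
```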

3. Auth. What if the refresh token expires while offline? The mutation queue holds work that requires a new login on reconnect. UX: explicit re-login banner; never silently drop work.
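In the flush loop, this means telling an expired session apart from an ordinary network failure: both stop the flush and leave the queue intact, but only the first should raise the re-login banner. A sketch, assuming a hypothetical transport that rejects with an error carrying an HTTP `status` field; the queue is an in-memory array standing in for the IndexedDB table.

```typescript
interface QueuedMutation { id: number; op: string; }

type FlushResult = "drained" | "retry-later" | "needs-reauth";

// Send queued mutations in order, removing each only after the server accepts it.
async function flushWithAuth(
  queue: QueuedMutation[],
  send: (m: QueuedMutation) => Promise<void>,
): Promise<FlushResult> {
  for (const m of [...queue]) {
    try {
      await send(m);
      queue.shift(); // acknowledged, so it is safe to drop
    } catch (err) {
      // Either way the mutation stays queued; nothing is discarded here.
      const httpStatus = (err as { status?: number } | null)?.status;
      return httpStatus === 401 ? "needs-reauth" : "retry-later";
    }
  }
  return "drained";
}
```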

4. Schema migrations. A user opens the app after months — local DB is on schema v1, app expects v3. Migration scripts must run before any data is read.
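Dexie expresses this with `db.version(n).stores(...).upgrade(tx => ...)`, and raw IndexedDB runs upgrades in the `onupgradeneeded` handler. The underlying idea, a chain of per-version steps applied in order from the stored version up to the current one, is sketched here over plain records; the schema history (`projectId` added in v2, `done` replaced by `status` in v3) is invented for illustration.

```typescript
type Rec = Record<string, unknown>;

// Step N upgrades a record from schema version N to N + 1 (hypothetical schema history).
const migrations: Record<number, (r: Rec) => Rec> = {
  1: r => ({ ...r, projectId: r.projectId ?? "inbox" }),                    // v1 -> v2: add projectId
  2: ({ done, ...rest }) => ({ ...rest, status: done ? "done" : "open" }), // v2 -> v3: boolean -> enum
};

const CURRENT_VERSION = 3;

// Apply every step between the stored version and the current one, in order, before any read.
function migrate(records: Rec[], fromVersion: number): Rec[] {
  let out = records;
  for (let v = fromVersion; v < CURRENT_VERSION; v++) {
    const step = migrations[v];
    if (!step) throw new Error(`No migration from v${v} to v${v + 1}`);
    out = out.map(step);
  }
  return out;
}
```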

5. Privacy / multi-user devices. Offline data persists across logouts. Clear IndexedDB on logout.
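Clearing on logout collides with the mutation queue: wiping the database also wipes unsynced work. One possible policy, sketched with hypothetical injected functions (`pendingCount`, `tryFlush`, `wipeLocalData`, `askUser`): attempt a final flush, and if anything is still queued, ask before discarding it. In a Dexie app, `wipeLocalData` could call `db.delete()`.

```typescript
interface LogoutDeps {
  pendingCount: () => Promise<number>;             // unsynced mutations in the local queue
  tryFlush: () => Promise<void>;                   // best-effort final sync
  wipeLocalData: () => Promise<void>;              // e.g. delete IndexedDB and user-specific caches
  askUser: (unsynced: number) => Promise<boolean>; // true = discard and log out anyway
}

// Returns true if the user was logged out and local data was wiped.
async function logout(deps: LogoutDeps): Promise<boolean> {
  try {
    await deps.tryFlush();
  } catch {
    // Offline or rejected: fall through and count what is left.
  }
  const unsynced = await deps.pendingCount();
  if (unsynced > 0 && !(await deps.askUser(unsynced))) {
    return false; // keep the session and the queue; nothing is dropped silently
  }
  await deps.wipeLocalData();
  return true;
}
```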

6. Captive portals & flaky networks. navigator.onLine === true doesn't mean you have internet — it means you have a link. The real test is whether a small ping to your API succeeds. Build a reachability probe, don't trust the flag alone.
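A minimal reachability probe, assuming a hypothetical `/api/health` endpoint on your own origin that returns a 2xx. `fetchFn` is injected so the logic runs without a network; in a page you would pass `fetch`. Make sure your Service Worker does not answer this route from a cache, or the probe will report "online" regardless.

```typescript
type FetchLike = (
  url: string,
  init: { method: string; cache: "no-store"; signal: AbortSignal },
) => Promise<{ ok: boolean }>;

// True only if the health endpoint answers with a 2xx within timeoutMs.
async function isReachable(fetchFn: FetchLike, url = "/api/health", timeoutMs = 3000): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // no-store bypasses the HTTP cache so a stale response can't pass for connectivity
    const res = await fetchFn(url, { method: "GET", cache: "no-store", signal: controller.signal });
    return res.ok;
  } catch {
    // Offline, DNS failure, timeout, or (over HTTPS) a captive portal that can't present your certificate
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```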

When NOT to go offline-first

  • Read-only consumer products where users always have data (news, video streaming) — Service Worker for the shell is enough.
  • Highly collaborative real-time apps where the source of truth is server state — offline is a degraded mode, not the default.
  • Cost: offline-first roughly doubles the implementation effort. Adopt when the product requires it (field workers, mobile-flaky markets, "works on a plane" UX).

Senior framing. Mention: (1) Service Worker for shell, (2) IndexedDB + queue, (3) optimistic UI, (4) conflict strategy chosen consciously (LWW / CRDT), (5) cross-tab coordination, (6) auth-while-offline, (7) UI honesty about sync state. The "we use a Service Worker" answer is junior; the architecture above is senior.

Follow-up questions

  • Why is `navigator.onLine` unreliable?
  • When would you choose CRDTs over LWW for conflict resolution?
  • How do you coordinate sync across multiple tabs?
  • What happens to the mutation queue when the user logs out?

Common mistakes

  • Trusting `navigator.onLine` as a connectivity check.
  • Using localStorage for app data — sync, capped, string-only.
  • Letting two tabs flush the queue concurrently.
  • Silent LWW conflict resolution that drops user changes.

Performance considerations

  • IndexedDB writes are async but can be slow on mobile — batch in transactions.
  • Service Worker cache strategies — stale-while-revalidate balances freshness and speed.
  • Don't pre-cache the entire dataset; sync on demand by entity.

Edge cases

  • Schema migrations from old offline data.
  • Storage quotas hit mid-write.
  • Background Sync is not supported in Safari (including iOS) or Firefox; fall back to flushing on app open, on the online event, and on a timer.

Real-world examples

  • Notion, Linear, Things — local-first with sync engines.
  • Google Docs — online-first with offline as degraded mode.
  • Replicache, Liveblocks, Triplit — commercial sync engines.

Related questions