System Design
hard
mid

How would you design a real time collaborative text editor like Google Docs?

The core problem is conflict resolution on concurrent edits: OT (Operational Transformation) or CRDTs. Plus: a sync server (WebSockets), local-first optimistic edits, presence/cursors, document model + persistence, offline support, and undo/redo per-user. CRDTs (e.g. Yjs) are the modern go-to.


A collaborative editor's hard problem isn't the UI — it's conflict resolution: two people editing the same spot at the same time must converge to the same document.

The core: OT vs CRDT

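Naively syncing raw text positions breaks immediately. Suppose I insert a character at index 5 while you concurrently delete the character at index 2. Your delete has already shifted everything after it left by one on your copy, so when my "index 5" arrives there it lands one character too far right. Two solutions: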

Operational Transformation (OT) — represent edits as operations (insert, delete at position). When a remote op arrives that was made concurrently with local ops, you transform it against them so it applies correctly. Powerful but complex to implement correctly; usually needs a central server to order operations. (This is what Google Docs historically used.)
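To make the transform step concrete, here is a toy TypeScript sketch for single-character edits. It replays the index-5/index-2 scenario from the paragraph above, first naively and then with a transform. The `Op`, `apply`, and `transform` names are hypothetical, and the sketch only transforms one op against one concurrent op. Production OT also has to transform against sequences of concurrent ops, and that is where most of the implementation difficulty (and the usual reliance on a central server) comes from.

```typescript
// Single-character edits addressed by raw index. `site` identifies the replica
// that made the edit and is used only to break ties deterministically.
type Op =
  | { kind: "insert"; index: number; ch: string; site: string }
  | { kind: "delete"; index: number; site: string }
  | { kind: "noop"; site: string };

function apply(doc: string, op: Op): string {
  if (op.kind === "insert") return doc.slice(0, op.index) + op.ch + doc.slice(op.index);
  if (op.kind === "delete") return doc.slice(0, op.index) + doc.slice(op.index + 1);
  return doc; // noop
}

// Rewrite `a` so it keeps its intent when applied after concurrent op `b`.
function transform(a: Op, b: Op): Op {
  if (a.kind === "noop" || b.kind === "noop") return a;
  if (a.kind === "insert") {
    if (b.kind === "insert") {
      // Same index: order by site id so every replica makes the same choice.
      const bGoesFirst = b.index < a.index || (b.index === a.index && b.site < a.site);
      return bGoesFirst ? { ...a, index: a.index + 1 } : a;
    }
    // b is a delete: a character before a's position disappeared.
    return b.index < a.index ? { ...a, index: a.index - 1 } : a;
  }
  // a is a delete.
  if (b.kind === "insert") return b.index <= a.index ? { ...a, index: a.index + 1 } : a;
  if (b.index === a.index) return { kind: "noop", site: a.site }; // both deleted the same char
  return b.index < a.index ? { ...a, index: a.index - 1 } : a;
}

const start = "abcdefg";
const mine: Op = { kind: "insert", index: 5, ch: "X", site: "me" }; // X before "f"
const yours: Op = { kind: "delete", index: 2, site: "you" };       // delete "c"

// Naive: each replica applies the remote op with its original, now stale, index.
const naiveMine = apply(apply(start, mine), yours);  // "abdeXfg"
const naiveYours = apply(apply(start, yours), mine); // "abdefXg"

// OT: each replica transforms the remote op against its own concurrent op first.
const otMine = apply(apply(start, mine), transform(yours, mine));   // "abdeXfg"
const otYours = apply(apply(start, yours), transform(mine, yours)); // "abdeXfg"
console.log(naiveMine, naiveYours, otMine, otYours);
```

The site-id tie-break matters: without it, two inserts at the same index could be ordered differently on each replica.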

CRDTs (Conflict-free Replicated Data Types) — data structures designed so that concurrent edits always merge deterministically regardless of order, without a central coordinator. Each character gets a unique, stable identity, so positions don't shift out from under you. Yjs and Automerge are mature CRDT libraries — the modern default because they're more robust, support offline/peer-to-peer, and you don't hand-roll the transform logic.
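A toy sequence CRDT makes the "unique, stable identity" idea concrete. This TypeScript sketch is RGA-style. Each character records the id of the character it was inserted after, and characters sharing the same predecessor are ordered by a Lamport counter, with the site id as a tie-break. It is a simplified teaching sketch, not the actual data structures or API of Yjs or Automerge, and every name in it is hypothetical. It assumes causal delivery (an op never arrives before the insert it refers to) and never garbage-collects tombstones.

```typescript
type Id = { counter: number; site: string };
type CharNode = { id: Id; parent: string | null; ch: string; deleted: boolean };
type CrdtOp =
  | { type: "insert"; id: Id; parent: string | null; ch: string }
  | { type: "delete"; target: string };

const keyOf = (id: Id): string => `${id.counter}@${id.site}`;

// Deterministic sibling order: higher Lamport counter first, then site id.
function siblingOrder(a: CharNode, b: CharNode): number {
  if (a.id.counter !== b.id.counter) return b.id.counter - a.id.counter;
  return a.id.site < b.id.site ? 1 : a.id.site > b.id.site ? -1 : 0;
}

class Replica {
  private nodes = new Map<string, CharNode>();
  private clock = 0; // Lamport clock: highest counter seen so far
  private readonly site: string;

  constructor(site: string) { this.site = site; }

  // Local edit: insert `ch` so it appears at visible position `pos`.
  insertAt(pos: number, ch: string): CrdtOp {
    const visible = this.visible();
    const parent = pos === 0 ? null : keyOf(visible[pos - 1].id);
    const op: CrdtOp = { type: "insert", id: { counter: this.clock + 1, site: this.site }, parent, ch };
    this.apply(op);
    return op;
  }

  deleteAt(pos: number): CrdtOp {
    const op: CrdtOp = { type: "delete", target: keyOf(this.visible()[pos].id) };
    this.apply(op);
    return op;
  }

  // Applies local or remote ops; assumes causal delivery.
  apply(op: CrdtOp): void {
    if (op.type === "insert") {
      if (this.nodes.has(keyOf(op.id))) return; // re-delivered ops are ignored
      this.clock = Math.max(this.clock, op.id.counter);
      this.nodes.set(keyOf(op.id), { id: op.id, parent: op.parent, ch: op.ch, deleted: false });
    } else {
      const node = this.nodes.get(op.target);
      if (node) node.deleted = true; // tombstone: the node keeps its place in the tree
    }
  }

  text(): string {
    return this.visible().map((n) => n.ch).join("");
  }

  private visible(): CharNode[] {
    return this.ordered().filter((n) => !n.deleted);
  }

  // Depth-first walk of the parent/child tree; depends only on the set of nodes.
  private ordered(): CharNode[] {
    const children = new Map<string | null, CharNode[]>();
    this.nodes.forEach((n) => {
      const list = children.get(n.parent) ?? [];
      list.push(n);
      children.set(n.parent, list);
    });
    const out: CharNode[] = [];
    const visit = (parent: string | null): void => {
      for (const child of (children.get(parent) ?? []).sort(siblingOrder)) {
        out.push(child);
        visit(keyOf(child.id));
      }
    };
    visit(null);
    return out;
  }
}

const r1 = new Replica("A");
const r2 = new Replica("B");
const seed = [r1.insertAt(0, "a"), r1.insertAt(1, "b")]; // r1 types "ab"
seed.forEach((op) => r2.apply(op));

const x = r1.insertAt(1, "X"); // concurrent inserts at the same spot
const y = r2.insertAt(1, "Y");
const d = r2.deleteAt(2);      // r2 sees "aYb", so this deletes "b"

[y, d].forEach((op) => r1.apply(op));
r2.apply(x);
console.log(r1.text(), r2.text()); // "aYX aYX"
```

The rendered order is a function of the set of nodes, not of arrival order, which is why the replicas converge without a coordinator.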

The rest of the architecture

  • Transport: WebSockets for low-latency bidirectional sync of ops/updates. A sync server relays updates between clients (and with CRDTs can be a "dumb" relay).
  • Local-first / optimistic edits — apply the user's edit to their local document immediately (the editor must feel instant), then sync in the background. Never wait for the server round-trip.
  • Document model: a structured rich-text representation rather than a raw string, handling formatting, blocks, etc. (ProseMirror/Slate as the editor layer, often paired with Yjs).
  • Persistence — the server persists the document (snapshots + recent ops/updates) so it survives and new joiners can load it.
  • Presence — show who's online and their cursors/selections in real time. This is ephemeral state, synced separately from the document (Yjs has an "awareness" channel) — you don't persist it.
  • Undo/redo — must be per-user: undoing should revert my changes, not my collaborator's. CRDT libs provide scoped undo managers.
  • Offline support — local-first + CRDT means edits made offline merge cleanly on reconnect.
  • Access control & versioning — permissions, document history/snapshots.
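Per-user undo works well when edits target stable identities rather than positions. This hypothetical TypeScript sketch uses the illustrative names `SharedDoc` and `UserUndoManager`, which are not a library API. It keeps one undo stack per user and inverts an edit by flipping the tombstone flag on the item that edit created or deleted, so a collaborator's later edits are left alone. For brevity, both users share one in-memory document. In a real CRDT setup, the inverse would be broadcast as an ordinary op, and consecutive keystrokes would usually be grouped into one undo step.

```typescript
// Toy model: characters keep stable ids; deletes are tombstones.
type Item = { id: string; ch: string; deleted: boolean };
type Edit = { kind: "insert" | "delete"; itemId: string };

class SharedDoc {
  private items: Item[] = [];
  private nextId = 0;

  insert(author: string, pos: number, ch: string): Edit {
    const visible = this.items.filter((i) => !i.deleted);
    // Place the new item right after the visible item at pos - 1 (or at the start).
    const at = pos === 0 ? 0 : this.items.indexOf(visible[pos - 1]) + 1;
    const item: Item = { id: `${author}:${this.nextId++}`, ch, deleted: false };
    this.items.splice(at, 0, item);
    return { kind: "insert", itemId: item.id };
  }

  remove(pos: number): Edit {
    const item = this.items.filter((i) => !i.deleted)[pos];
    item.deleted = true;
    return { kind: "delete", itemId: item.id };
  }

  setDeleted(itemId: string, deleted: boolean): void {
    const item = this.items.find((i) => i.id === itemId);
    if (item) item.deleted = deleted;
  }

  text(): string {
    return this.items.filter((i) => !i.deleted).map((i) => i.ch).join("");
  }
}

// One manager per user: it only ever records that user's own edits.
class UserUndoManager {
  private undoStack: Edit[] = [];
  private redoStack: Edit[] = [];
  private readonly doc: SharedDoc;

  constructor(doc: SharedDoc) { this.doc = doc; }

  track(edit: Edit): void {
    this.undoStack.push(edit);
    this.redoStack = []; // a new edit invalidates the redo history
  }

  undo(): void {
    const edit = this.undoStack.pop();
    if (!edit) return;
    // Invert by identity, not position: undoing an insert tombstones that item,
    // undoing a delete revives it. Other users' items are never touched.
    this.doc.setDeleted(edit.itemId, edit.kind === "insert");
    this.redoStack.push(edit);
  }

  redo(): void {
    const edit = this.redoStack.pop();
    if (!edit) return;
    this.doc.setDeleted(edit.itemId, edit.kind === "delete");
    this.undoStack.push(edit);
  }
}

const doc = new SharedDoc();
const alice = new UserUndoManager(doc);
const bob = new UserUndoManager(doc);
alice.track(doc.insert("alice", 0, "H"));
alice.track(doc.insert("alice", 1, "i"));
bob.track(doc.insert("bob", 2, "!")); // "Hi!"
alice.undo();                          // "H!": Bob's "!" survives
const afterAliceUndo = doc.text();
alice.redo();                          // "Hi!"
bob.undo();                            // "Hi"
console.log(afterAliceUndo, doc.text());
```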

The framing

"The hard part is conflict resolution on concurrent edits — raw position syncing breaks immediately. Two approaches: Operational Transformation, which transforms concurrent ops against each other but is complex and server-coordinated; or CRDTs, data structures where concurrent edits merge deterministically regardless of order. CRDTs — Yjs, Automerge — are the modern default: robust, offline-capable, no hand-rolled transform logic. Around that: WebSockets for sync, local-first optimistic edits so typing feels instant, a structured rich-text document model, server persistence with snapshots, a separate ephemeral channel for presence and cursors, and per-user scoped undo/redo."

Follow-up questions

  • Why does naive position-based syncing break?
  • OT vs CRDT — what are the trade-offs?
  • Why must undo/redo be per-user?
  • Why is presence synced separately from the document?

Common mistakes

  • Syncing raw text/positions without OT or CRDT.
  • Waiting for the server before applying the user's own edit (laggy typing).
  • Persisting ephemeral presence/cursor state with the document.
  • Global undo that reverts other people's changes.
  • Underestimating OT's implementation complexity.

Performance considerations

  • Local-first edits keep typing free of network latency. CRDT metadata (e.g. tombstones for deleted characters) grows with edit count, so periodic snapshotting/garbage collection keeps document size bounded. Batch and debounce updates over the wire; presence is high-frequency but ephemeral, so it can be lossy.
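Snapshotting can be sketched as a server-side store that keeps a snapshot plus the ops since it was taken. Every N ops, it folds that tail into a new snapshot, so a late joiner loads the snapshot and replays only the short tail instead of the full history. The names `SpliceOp`, `applyOp`, and `DocStore` are hypothetical. The sketch assumes the server already holds ops in a single agreed order as plain splice edits on a string. It does not show CRDT tombstone garbage collection, which depends on the library.

```typescript
// A server-ordered edit: delete `deleteCount` chars at `index`, then insert `insert`.
type SpliceOp = { index: number; deleteCount: number; insert: string };

function applyOp(text: string, op: SpliceOp): string {
  return text.slice(0, op.index) + op.insert + text.slice(op.index + op.deleteCount);
}

class DocStore {
  private snapshot = "";
  private snapshotVersion = 0;   // number of ops folded into the snapshot
  private tail: SpliceOp[] = []; // ops appended since the snapshot was taken
  private readonly compactEvery: number;

  constructor(compactEvery: number) { this.compactEvery = compactEvery; }

  append(op: SpliceOp): void {
    this.tail.push(op);
    if (this.tail.length >= this.compactEvery) this.compact();
  }

  // Fold the tail into a new snapshot so the stored log stays bounded.
  private compact(): void {
    this.snapshot = this.tail.reduce(applyOp, this.snapshot);
    this.snapshotVersion += this.tail.length;
    this.tail = [];
  }

  // What a late joiner receives: snapshot plus the short tail, replayed.
  load(): { text: string; version: number; replayed: number } {
    return {
      text: this.tail.reduce(applyOp, this.snapshot),
      version: this.snapshotVersion + this.tail.length,
      replayed: this.tail.length,
    };
  }
}

const store = new DocStore(3);
"hello".split("").forEach((ch, i) => store.append({ index: i, deleteCount: 0, insert: ch }));
const loaded = store.load();
console.log(loaded); // text "hello" at version 5, replaying only 2 ops
```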

Edge cases

  • Two users editing the exact same character simultaneously.
  • A user editing offline for a long time, then reconnecting.
  • Network partition / reconnection and catching up on missed updates.
  • A late joiner needing the full current document state.
  • Large documents — snapshotting vs replaying all ops.
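For reconnection and catch-up, one common approach is for each side to send a version vector (the highest sequence number it has seen from each site) and receive only the ops it is missing. This is similar in spirit to the state vectors Yjs exchanges when syncing. `OpLog` and `sync` are hypothetical names. The sketch assumes each site's ops arrive in order (a gap throws) and tracks only per-site order. A CRDT that needs causal delivery across sites would also have to buffer ops until their dependencies arrive.

```typescript
type LoggedOp = { site: string; seq: number; payload: string };
type VersionVector = Map<string, number>;

class OpLog {
  private ops: LoggedOp[] = [];
  private vv: VersionVector = new Map();

  // Returns false for duplicates; assumes per-site ops arrive in sequence order.
  add(op: LoggedOp): boolean {
    const seen = this.vv.get(op.site) ?? 0;
    if (op.seq <= seen) return false;
    if (op.seq !== seen + 1) throw new Error(`gap in ops from ${op.site}`);
    this.ops.push(op);
    this.vv.set(op.site, op.seq);
    return true;
  }

  versionVector(): VersionVector {
    return new Map(this.vv);
  }

  // Everything this log holds that a peer with vector `theirs` has not seen.
  missingFor(theirs: VersionVector): LoggedOp[] {
    return this.ops.filter((op) => op.seq > (theirs.get(op.site) ?? 0));
  }
}

// Each side computes the other's delta from its vector; returns ops exchanged.
function sync(a: OpLog, b: OpLog): number {
  const forB = a.missingFor(b.versionVector());
  const forA = b.missingFor(a.versionVector());
  forB.forEach((op) => b.add(op));
  forA.forEach((op) => a.add(op));
  return forB.length + forA.length;
}

const server = new OpLog();
const client = new OpLog();
const first = { site: "alice", seq: 1, payload: "a" };
server.add(first);
client.add(first);
// Alice edits offline while Bob's edits reach the server.
client.add({ site: "alice", seq: 2, payload: "b" });
client.add({ site: "alice", seq: 3, payload: "c" });
server.add({ site: "bob", seq: 1, payload: "x" });
server.add({ site: "bob", seq: 2, payload: "y" });
const exchanged = sync(client, server);
console.log(exchanged); // 4: two ops in each direction
```

Because only the delta travels, a client returning from a long offline session exchanges just the ops the two sides lack, while a brand-new joiner (empty vector) receives everything, or, combined with snapshotting, a snapshot plus the tail.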

Real-world examples

  • Google Docs (OT historically); Figma (a server-coordinated system inspired by CRDTs rather than a pure CRDT); Linear and many other apps using CRDTs.
  • Yjs + ProseMirror/Slate + a WebSocket provider as a common production stack.

Senior engineer discussion

Seniors center the design on OT-vs-CRDT conflict resolution, favor CRDTs for robustness and offline support, insist on local-first optimistic edits, separate ephemeral presence from the persisted document, and handle per-user undo, reconnection, and snapshotting.
