How do you handle large data from an API efficiently?
Don't fetch it all — paginate or stream, request only the fields you need, and push filtering/sorting/aggregation to the server. On the client: virtualize rendering, normalize and cache, and offload heavy processing to a Web Worker. Push as much data work to the server as possible.
"Large data from an API" is a problem at three layers — transport, processing, and rendering — and the best fixes push work away from the browser.
1. Don't transport it all — the server should send less
- Pagination — cursor-based (`?after=...&limit=50`), not offset, for stable, fast deep pages. Or infinite scroll.
- Field selection — GraphQL or REST endpoints that return only the fields the UI needs. Don't pull whole fat objects.
- Server-side filtering / sorting / aggregation — let the database do it and return just the slice/summary. This is the biggest lever — the server is built for this.
- Streaming — for genuinely large responses, stream and process chunks as they arrive instead of waiting for the whole payload (streaming fetch, NDJSON; see the sketch after this list).
- Compression (Brotli/gzip) on the response.
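A minimal sketch of the streaming approach, using only the standard `fetch` / `ReadableStream` / `TextDecoder` APIs; the `/api/rows/export` endpoint and the `Row` shape are hypothetical:

```ts
// Consume an NDJSON response incrementally instead of buffering the whole payload.
type Row = { id: string } & Record<string, unknown>;

async function* streamRows(url: string): AsyncGenerator<Row> {
  const response = await fetch(url);
  if (!response.ok || !response.body) {
    throw new Error(`Request failed: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Every complete line is one JSON record; keep the trailing partial line buffered.
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";
    for (const line of lines) {
      if (line.trim()) yield JSON.parse(line) as Row;
    }
  }
  if (buffer.trim()) yield JSON.parse(buffer) as Row;
}

// Usage: handle records as they arrive, so memory stays bounded by what you keep.
// for await (const row of streamRows("/api/rows/export")) { handle(row); }
```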
2. Cache and reuse — don't re-fetch
- React Query / SWR — caching, dedup, background refetch, stale-while-revalidate. The same query across components hits the cache (see the sketch after this list).
- HTTP caching (`ETag`, `Cache-Control`) for cacheable endpoints.
- For very large client-side datasets, IndexedDB as a local store.
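A minimal caching sketch with React Query (assuming the v5 object-style API; the `/api/users` cursor endpoint and `UsersPage` shape are hypothetical, and SWR would look similar):

```ts
import { useQuery } from "@tanstack/react-query";

type UsersPage = {
  items: { id: string; name: string }[];
  nextCursor: string | null;
};

// Hypothetical cursor-paginated fetcher for the kind of endpoint from section 1.
async function fetchUsersPage(cursor: string | null): Promise<UsersPage> {
  const params = new URLSearchParams({ limit: "50" });
  if (cursor) params.set("after", cursor);
  const res = await fetch(`/api/users?${params}`);
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  return res.json();
}

export function useUsersPage(cursor: string | null) {
  return useQuery({
    // Components sharing this key share one cache entry: no duplicate requests.
    queryKey: ["users", cursor],
    queryFn: () => fetchUsersPage(cursor),
    staleTime: 30_000, // serve cached data for 30s before a background refetch
  });
}
```

Revisiting a page serves the cached result immediately and refetches in the background once it goes stale.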
3. Process off the main thread
- Heavy client-side transforms (parsing, sorting, aggregating 100k+ rows) → Web Worker, so the UI doesn't freeze (see the sketch after this list).
- Or chunk-and-yield across frames.
- Normalize the data (`{ [id]: item }`) so lookups and updates are O(1).
- Memoize derived computations.
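A minimal sketch of the worker hand-off, assuming a bundler that supports `new URL(..., import.meta.url)` workers; the `Row` shape and message contract are illustrative:

```ts
// worker.ts — the heavy transform runs here, off the main thread.
type Row = { id: string; value: number };

self.onmessage = (event: MessageEvent<Row[]>) => {
  const rows = event.data;
  // Expensive work: sort 100k+ rows without blocking the UI thread.
  const sorted = [...rows].sort((a, b) => b.value - a.value);
  // Normalized { [id]: item } map for O(1) lookups and updates.
  const byId: Record<string, Row> = Object.fromEntries(sorted.map((r) => [r.id, r]));
  self.postMessage({ sorted, byId });
};

// main.ts — hand the dataset to the worker and await the processed result.
const worker = new Worker(new URL("./worker.ts", import.meta.url), { type: "module" });

function processRows(rows: Row[]): Promise<{ sorted: Row[]; byId: Record<string, Row> }> {
  return new Promise((resolve) => {
    worker.onmessage = (event) => resolve(event.data);
    worker.postMessage(rows);
  });
}
```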
4. Render only what's visible
- Virtualization — render the visible window only; DOM stays small no matter the dataset size.
- This applies even after pagination — a "page" of 1,000 rows still shouldn't all hit the DOM (the sketch below shows the windowing math).
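The core of virtualization is just a window calculation. A minimal sketch assuming fixed row heights (in practice a library such as react-window or TanStack Virtual handles this, plus measurement and scroll handling):

```ts
// Compute which rows should exist in the DOM for the current scroll position.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  totalRows: number,
  overscan = 5 // a few extra rows above/below to avoid blank flashes while scrolling
): { start: number; end: number } {
  const start = Math.max(0, Math.floor(scrollTop / rowHeight) - overscan);
  const end = Math.min(totalRows, Math.ceil((scrollTop + viewportHeight) / rowHeight) + overscan);
  return { start, end }; // render only rows[start, end): a few dozen DOM nodes, not 10,000
}

// The scroll container gets a spacer of height totalRows * rowHeight so the scrollbar stays
// correct, and each rendered row is absolutely positioned at index * rowHeight.
```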
5. Choose the right shape for the use case
- User browsing a list → paginate + virtualize.
- User searching/filtering → server-side filter, return matches.
- Need an aggregate/summary → compute server-side, send the summary.
- Bulk export/processing → stream, or do it server-side entirely.
The framing
"Three layers. Transport: don't send it all — cursor pagination, field selection, and especially server-side filtering/sorting/aggregation, plus streaming for huge payloads. Caching: React Query so we don't re-fetch. Processing: normalize, and move heavy transforms to a Web Worker. Rendering: virtualize so the DOM only holds the visible window. The guiding principle — push data work to the server and the network, and only ever hold/render what the user actually needs."
Follow-up questions
- Why is cursor-based pagination better than offset?
- When should data processing happen server-side vs in a Web Worker?
- How does virtualization help even after you've paginated?
- When would you use IndexedDB on the client?
Common mistakes
- Fetching the entire dataset in one request.
- Over-fetching fat objects when only a few fields are needed.
- Doing heavy sort/filter/aggregation on the client main thread.
- Rendering all rows without virtualization.
- Re-fetching the same data repeatedly with no caching.
Performance considerations
- Server-side filtering/aggregation is the highest-leverage fix — the DB is optimized for it. Pagination bounds the payload; virtualization bounds the DOM; Web Workers free the main thread; caching cuts request count. Each layer attacks a different bottleneck.
Edge cases
- Data too large for memory even after pagination.
- Real-time updates layered on top of paginated data.
- Streaming responses that need incremental rendering.
- Offline access requiring a local store.
Real-world examples
- A dashboard: server-side aggregated summaries + cursor-paginated detail tables, virtualized rendering, React Query caching.
- Parsing a large uploaded file in a Web Worker.