Monitoring + error tracking strategy
Layer it: error tracking (Sentry) for exceptions + source maps, RUM for real-user performance (Core Web Vitals), product analytics for behavior, and synthetic/uptime checks. Add error boundaries, global handlers, alerting with thresholds, release tracking, and PII scrubbing. The goal: know it broke before users tell you.
A monitoring strategy answers one question: when something breaks or slows down in production, do you find out — and can you diagnose it — before users complain? It's layered.
1. Error tracking
Catch and report exceptions — Sentry, Datadog, etc.:
- Global handlers — window.onerror and the unhandledrejection event for uncaught errors and promise rejections.
- React error boundaries — catch render-time crashes, report them, show a fallback instead of a white screen.
- Source maps uploaded to the service (not served publicly) so stack traces map to original code.
- Context — user id (non-PII), release version, browser, route, breadcrumbs (recent actions) — so an error is debuggable.
- Release tracking — tag errors with the deploy version to spot regressions and know which release introduced what.
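The context and release points above can be sketched as a reporting payload. This is a minimal illustration of the shape, not any real SDK's API — names like buildErrorEvent and addBreadcrumb are invented for the example:

```typescript
// Sketch: an error event carrying the context that makes it debuggable.
// All names here are illustrative, not Sentry's or Datadog's actual API.
type Breadcrumb = { timestamp: number; message: string };

interface ReportedError {
  message: string;
  stack?: string;
  release: string;        // deploy version — enables release tracking
  userId?: string;        // opaque id, never PII
  route: string;
  breadcrumbs: Breadcrumb[];
}

const breadcrumbs: Breadcrumb[] = [];
const MAX_BREADCRUMBS = 20;

// Record a recent user action; keep only the last N to bound memory.
function addBreadcrumb(message: string): void {
  breadcrumbs.push({ timestamp: Date.now(), message });
  if (breadcrumbs.length > MAX_BREADCRUMBS) breadcrumbs.shift();
}

// Build the payload a global handler or error boundary would send.
function buildErrorEvent(
  err: Error,
  release: string,
  route: string,
  userId?: string
): ReportedError {
  return {
    message: err.message,
    stack: err.stack,
    release,
    userId,
    route,
    breadcrumbs: [...breadcrumbs], // snapshot of recent actions
  };
}
```

In a real setup the SDK assembles this for you; the point is that an error report without release, route, and breadcrumbs is much harder to act on.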
2. Real User Monitoring (RUM) — performance
Measure what actual users experience:
- Core Web Vitals — LCP, CLS, INP — collected via PerformanceObserver or the web-vitals library.
- Navigation/resource timing, API latency from the client.
- Segmented by device, geography, connection — averages hide the bad tail; watch p75/p95.
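"Watch p75/p95" is a simple computation over collected samples. A minimal sketch (the nearest-rank method; the percentile function is written here for illustration):

```typescript
// Percentile over RUM samples: the p75 LCP is the value that 75% of
// page loads beat. Averages hide the slow tail this surfaces.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank: smallest value with at least p% of samples at or below it.
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}

// e.g. percentile(lcpSamplesMs, 75) vs. the mean of the same array —
// a few 10-second loads barely move the mean but show up at p95.
```

Vendors compute this server-side from your beacons; the value of knowing the mechanics is explaining why one user in a tight error loop or one slow region shows up at p95 long before it moves the average.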
3. Product analytics — behavior
What users do — funnels, drop-off, feature usage (Amplitude, PostHog, GA). Distinct from error/perf monitoring but part of "is the app healthy."
4. Synthetic monitoring / uptime
Scripted checks hitting critical flows (login, checkout) on a schedule from outside — catches outages even when no real user has hit the bug yet.
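A synthetic check is just a scripted request plus a pass/fail judgment. A hedged sketch — the fetch-like function is injected as a parameter so the check is testable; a real setup runs this on a schedule from an external location (Pingdom, Checkly, a cron job):

```typescript
// Sketch of one synthetic check: hit a critical URL, record whether
// it responded acceptably and how long it took.
type Fetcher = (url: string) => Promise<{ status: number }>;

interface CheckResult {
  url: string;
  ok: boolean;
  latencyMs: number;
}

async function runCheck(url: string, fetchFn: Fetcher): Promise<CheckResult> {
  const start = Date.now();
  try {
    const res = await fetchFn(url);
    // Treat 2xx/3xx as healthy; anything else (or a thrown error) fails.
    return { url, ok: res.status >= 200 && res.status < 400, latencyMs: Date.now() - start };
  } catch {
    return { url, ok: false, latencyMs: Date.now() - start };
  }
}
```

Real synthetic flows go further — scripted multi-step journeys (log in, add to cart, check out) in a headless browser — but the shape is the same: run from outside, on a clock, and alert on failure.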
5. Alerting — the part that makes it useful
Monitoring without alerting is just dashboards nobody looks at:
- Threshold + anomaly alerts — error rate spike, Web Vitals regression, a new error type, checkout funnel drop.
- Routed to the right people (Slack/PagerDuty), tuned to avoid noise/fatigue.
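The threshold logic behind "error rate spike, tuned against noise" can be made concrete. A minimal sketch, with illustrative defaults — requiring both an absolute floor and a relative spike factor is one common way to damp alerts on tiny baselines:

```typescript
// Sketch: fire an alert only when the current window's error rate
// clears BOTH an absolute minimum and a multiple of the baseline.
// The floor stops one error in ten requests at 3 a.m. from paging anyone.
interface RateWindow {
  errors: number;
  requests: number;
}

function shouldAlert(
  current: RateWindow,
  baselineRate: number,
  minRate = 0.01,   // absolute floor: ignore rates under 1%
  spikeFactor = 3   // relative: must be 3x the baseline
): boolean {
  if (current.requests === 0) return false;
  const rate = current.errors / current.requests;
  return rate >= minRate && rate >= baselineRate * spikeFactor;
}
```

Anomaly detection in commercial tools is fancier (seasonality, moving baselines), but the tuning trade-off is the same: every parameter here is a knob between missed incidents and alert fatigue.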
6. Cross-cutting
- PII scrubbing — never send passwords, tokens, personal data to third-party monitoring; scrub before send.
- Sampling — RUM/breadcrumbs sampled to control cost and volume.
- Privacy/consent — respect Do Not Track / consent where required.
- Dashboards — error rate, Web Vitals, uptime in one place.
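PII scrubbing usually runs as a hook on the event just before it leaves the client (Sentry exposes this as beforeSend). A minimal sketch — the key list is illustrative, and real scrubbers also pattern-match values like emails and card numbers:

```typescript
// Sketch: recursively redact sensitive keys from an event payload
// before it is sent to a third-party monitoring service.
const SENSITIVE_KEYS = ["password", "token", "authorization", "email", "ssn"];

function scrub(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(scrub);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      out[k] = SENSITIVE_KEYS.includes(k.toLowerCase()) ? "[redacted]" : scrub(v);
    }
    return out;
  }
  return value; // primitives pass through
}
```

Redacting rather than deleting keys keeps the event's shape intact, so you can still see that a password field was involved without ever seeing its value.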
The framing
"I'd layer it. Error tracking — Sentry with global handlers, React error boundaries, uploaded source maps, and release tagging so I can see which deploy broke what. RUM for real-user performance — Core Web Vitals via PerformanceObserver, watching p75/p95 not averages. Product analytics for behavior and funnels. Synthetic checks on critical flows so I catch outages before users do. The piece that makes it real is alerting — threshold and anomaly alerts on error rate and Web Vitals, routed to people, tuned against fatigue. And throughout: scrub PII before anything leaves the client. The goal is finding out it broke before users tell me."
Follow-up questions
- Why upload source maps to your error tracker?
- Why look at p75/p95 instead of average performance?
- What's the difference between RUM and synthetic monitoring?
- How do you avoid sending PII to third-party monitoring?
Common mistakes
- Monitoring without alerting — dashboards nobody watches.
- No source maps — unreadable minified stack traces.
- Tracking averages, missing the bad tail (p95).
- Sending PII/tokens to third-party services.
- No error boundaries — render crashes white-screen silently.
- Alert fatigue from noisy, untuned alerts.
Performance considerations
- The monitoring SDKs themselves add weight and run code — load them async, sample RUM/breadcrumbs to control overhead and cost, and don't let instrumentation block the main thread.
Edge cases
- Errors from browser extensions / third-party scripts polluting the data.
- A spike that's actually one user in a loop.
- Errors only in old cached client versions.
- Ad blockers blocking the monitoring script itself.
Real-world examples
- Sentry for errors + source maps + releases; web-vitals reporting to analytics for RUM.
- Synthetic checks on the checkout flow alerting before customers report an outage.