Reliability

Singleflight cache fill to prevent thundering herd

When a cache key expires, it’s easy for a burst of requests to stampede the database. I use singleflight.Group to ensure only one goroutine performs the expensive fill per key while others wait for the shared result. This doesn’t replace proper TTLs or cache warming; it just means a hot key’s expiry costs one database query instead of hundreds.
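Here is a minimal sketch of the pattern, with an in-process map standing in for the real cache; Cache and loadFromDB are illustrative names, not from any library:

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"

	"golang.org/x/sync/singleflight"
)

// Cache is an illustrative stand-in for whatever cache you actually use.
type Cache struct {
	mu    sync.RWMutex
	data  map[string]string
	group singleflight.Group
}

func NewCache() *Cache {
	return &Cache{data: make(map[string]string)}
}

// loadFromDB stands in for the expensive query we want to deduplicate.
func loadFromDB(ctx context.Context, key string) (string, error) {
	time.Sleep(100 * time.Millisecond) // simulate latency
	return "value-for-" + key, nil
}

func (c *Cache) Get(ctx context.Context, key string) (string, error) {
	c.mu.RLock()
	v, ok := c.data[key]
	c.mu.RUnlock()
	if ok {
		return v, nil
	}
	// Only one goroutine per key performs the fill; concurrent callers
	// block inside Do and receive the same result (or the same error).
	res, err, _ := c.group.Do(key, func() (interface{}, error) {
		val, err := loadFromDB(ctx, key)
		if err != nil {
			return nil, err
		}
		c.mu.Lock()
		c.data[key] = val
		c.mu.Unlock()
		return val, nil
	})
	if err != nil {
		return "", err
	}
	return res.(string), nil
}

func main() {
	c := NewCache()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			v, _ := c.Get(context.Background(), "user:42")
			fmt.Println(v) // all ten goroutines share one DB fill
		}()
	}
	wg.Wait()
}
```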

Email delivery via HTTP provider with context, timeout, and idempotency

For transactional email, the reliability problem is usually latency and retries, not MIME formatting. I prefer an HTTP email provider because requests are easy to bound with context.WithTimeout and easier to observe than raw SMTP. The code below builds a time-bounded request and attaches an idempotency key so a retried send can’t deliver the same email twice.
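A sketch of what that client can look like; the endpoint, payload shape, and header names are assumptions here, since every provider differs:

```go
package mail

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// Client wraps a hypothetical HTTP email provider.
type Client struct {
	httpc  *http.Client
	apiKey string
	base   string // e.g. "https://api.example-mailer.com" (illustrative)
}

func NewClient(apiKey, base string) *Client {
	return &Client{
		httpc:  &http.Client{}, // the per-request deadline comes from ctx
		apiKey: apiKey,
		base:   base,
	}
}

type Message struct {
	To      string `json:"to"`
	Subject string `json:"subject"`
	Body    string `json:"body"`
}

// Send bounds the request with a timeout and sets an idempotency key so
// a retried request cannot produce a duplicate email (assuming the
// provider supports idempotency keys).
func (c *Client) Send(ctx context.Context, msg Message, idempotencyKey string) error {
	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()

	payload, err := json.Marshal(msg)
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		c.base+"/v1/send", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+c.apiKey)
	req.Header.Set("Idempotency-Key", idempotencyKey) // provider-side dedupe

	resp, err := c.httpc.Do(req)
	if err != nil {
		return err // includes context.DeadlineExceeded on timeout
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("send email: unexpected status %d", resp.StatusCode)
	}
	return nil
}
```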

HTTP handler with timeouts, request IDs, and context cancellation

I treat context.Context as the contract between the edge and everything downstream. The pattern here starts by creating a per-request timeout using context.WithTimeout, then storing a requestID in the context so logs and traces can correlate without plumbing an extra parameter through every function signature.
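A sketch of that middleware; the X-Request-ID header and the 3-second budget are illustrative choices:

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"log"
	"net/http"
	"time"
)

type ctxKey string

const requestIDKey ctxKey = "requestID"

// withRequestContext attaches a deadline and a request ID to every
// request's context before the handler runs.
func withRequestContext(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 3*time.Second)
		defer cancel()

		id := r.Header.Get("X-Request-ID")
		if id == "" {
			b := make([]byte, 8)
			rand.Read(b) // crypto/rand; errors are not expected in practice
			id = hex.EncodeToString(b)
		}
		ctx = context.WithValue(ctx, requestIDKey, id)
		w.Header().Set("X-Request-ID", id)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

func handle(w http.ResponseWriter, r *http.Request) {
	id, _ := r.Context().Value(requestIDKey).(string)
	select {
	case <-time.After(50 * time.Millisecond): // stand-in for real work
		log.Printf("request_id=%s ok", id)
		w.Write([]byte("ok\n"))
	case <-r.Context().Done(): // client gone or deadline hit
		log.Printf("request_id=%s canceled: %v", id, r.Context().Err())
	}
}

func main() {
	log.Fatal(http.ListenAndServe(":8080",
		withRequestContext(http.HandlerFunc(handle))))
}
```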

Schema-Backed Enums (DB Constraint + Rails enum)

Rails enums are nice, but the DB should enforce allowed values. Use a CHECK constraint (or native enum type) plus the Rails enum mapping. It prevents bad writes from console scripts and future migrations.
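Something like the following, with the table and status values as illustrative placeholders (add_check_constraint needs Rails 6.1+):

```ruby
# Migration: enforce allowed values at the database level, so bad writes
# fail no matter where they come from.
class AddStatusCheckToOrders < ActiveRecord::Migration[7.0]
  def change
    add_check_constraint :orders,
      "status IN ('pending', 'paid', 'shipped', 'cancelled')",
      name: "orders_status_check"
  end
end

# Model: the Rails enum mapping mirrors the constraint exactly.
class Order < ApplicationRecord
  enum :status, {
    pending:   "pending",
    paid:      "paid",
    shipped:   "shipped",
    cancelled: "cancelled"
  }
end
```

String-backed values keep the constraint readable; if you later add a status, the migration and the enum hash change together in one commit.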

Counter Cache Repair Job (Consistency Tooling)

Counter caches drift (deleted records, backfills, manual SQL). A repair job that recomputes counts safely is invaluable. It’s the kind of operational code you’re glad you wrote the first time a dashboard is wrong.
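A sketch of such a job, assuming ActiveJob and an illustrative Author/books counter cache; reset_counters recomputes from a fresh COUNT:

```ruby
# Recomputes drifted counter caches in batches. Model names are
# illustrative; adapt to whichever counters you maintain.
class CounterCacheRepairJob < ApplicationJob
  queue_as :low

  def perform
    Author.find_each do |author|
      actual = author.books.count
      next if author.books_count == actual

      Rails.logger.info(
        "repairing books_count author=#{author.id} " \
        "cached=#{author.books_count} actual=#{actual}"
      )
      # reset_counters issues a fresh COUNT and writes the column
      # without running callbacks or validations.
      Author.reset_counters(author.id, :books)
    end
  end
end
```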

Backend: normalize errors with a single Express handler

Without a centralized error handler, you end up with a mix of thrown errors, ad-hoc res.status(500) blocks, and inconsistent JSON shapes. I use one Express error middleware that maps known errors to stable codes and logs unknown errors with request context before returning an opaque 500.
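A sketch of that middleware; AppError and the JSON shape are conventions I’m assuming here, not an Express API:

```js
const express = require('express');

// Known, intentional errors carry a status and a stable machine code.
class AppError extends Error {
  constructor(status, code, message) {
    super(message);
    this.status = status;
    this.code = code;
  }
}

// Express recognizes error middleware by its four-argument signature.
function errorHandler(err, req, res, next) {
  if (err instanceof AppError) {
    return res
      .status(err.status)
      .json({ error: { code: err.code, message: err.message } });
  }
  // Unknown errors: log with request context, return an opaque 500 so
  // internals never leak to the client.
  console.error({
    msg: err.message,
    stack: err.stack,
    method: req.method,
    path: req.path,
  });
  return res
    .status(500)
    .json({ error: { code: 'internal_error', message: 'Internal server error' } });
}

const app = express();
app.get('/widgets/:id', (req, res, next) => {
  next(new AppError(404, 'widget_not_found', `No widget ${req.params.id}`));
});
app.use(errorHandler); // registered after all routes
app.listen(3000);
```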

Circuit breaker wrapper for flaky third-party APIs

When a dependency starts timing out, naive retries can amplify the outage by piling on more work. A circuit breaker gives the system a chance to breathe: after enough failures, it opens and returns a fast error, then it half-opens to probe recovery. I keep the state machine small enough to hold in your head: closed, open, half-open.
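A minimal sketch of that state machine in plain JavaScript; the threshold and cooldown values are illustrative:

```js
// closed -> open after N consecutive failures; open -> half-open after a
// cooldown; one successful probe closes it again.
class CircuitBreaker {
  constructor(fn, { failureThreshold = 5, cooldownMs = 10_000 } = {}) {
    this.fn = fn;
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.state = 'closed';
    this.openedAt = 0;
  }

  async call(...args) {
    if (this.state === 'open') {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        // Fast failure: no extra load lands on the struggling dependency.
        throw new Error('circuit open');
      }
      this.state = 'half-open'; // let one probe request through
    }
    try {
      const result = await this.fn(...args);
      this.failures = 0;
      this.state = 'closed';
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}

// Usage: wrap the flaky call once, reuse the breaker everywhere.
// const breaker = new CircuitBreaker((id) => fetchUpstream(id));
// const data = await breaker.call('42');
```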

Health checks with readiness + liveness

One /health endpoint is ambiguous: is the process alive, or is it actually ready to serve traffic? I split them. Liveness answers ‘should the orchestrator restart me?’ and is usually just ‘the event loop is alive’. Readiness answers ‘can I accept traffic right now?’ and checks the dependencies a real request would touch, like the database.
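A sketch with Express; /livez and /readyz are naming conventions, and checkDatabase is a stand-in for a real dependency ping:

```js
const express = require('express');
const app = express();

// Liveness: if this handler runs at all, the event loop is alive. Keep
// it trivial so the orchestrator only restarts genuinely wedged processes.
app.get('/livez', (req, res) => res.status(200).send('ok'));

// Illustrative stand-in for e.g. a `SELECT 1` against the real database.
async function checkDatabase() {
  return true;
}

// Readiness: can this instance actually serve a request right now?
app.get('/readyz', async (req, res) => {
  try {
    const dbOk = await checkDatabase();
    if (!dbOk) {
      return res.status(503).json({ ready: false, reason: 'database' });
    }
    return res.status(200).json({ ready: true });
  } catch (err) {
    return res.status(503).json({ ready: false, reason: err.message });
  }
});

app.listen(3000);
```

A failing readiness check takes the instance out of rotation without restarting it, which is exactly what you want while a dependency recovers.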

Background Job Backpressure with Queue Depth Guard

When downstream systems degrade, jobs pile up and amplify outages. Add a simple “queue depth guard” so non-critical jobs skip or reschedule instead of making the backlog worse.
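A sketch assuming Sidekiq; the queue name, depth threshold, and job are all illustrative:

```ruby
require "sidekiq"
require "sidekiq/api"

# A non-critical job that checks backlog depth before doing work.
class DigestEmailJob
  include Sidekiq::Job
  sidekiq_options queue: :low

  MAX_QUEUE_DEPTH = 5_000 # illustrative threshold

  def perform(user_id)
    # If the backlog is already deep, reschedule instead of piling on.
    if Sidekiq::Queue.new("low").size > MAX_QUEUE_DEPTH
      self.class.perform_in(10 * 60, user_id) # retry in ten minutes
      return
    end

    # ... do the non-critical work here
  end
end
```

Critical jobs skip the guard entirely; the point is that optional work yields first when the system is under pressure.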

HTTP server timeouts that prevent slowloris and stuck connections

The default http.Server will happily keep connections open longer than you intended, which is how you end up with “mysterious” goroutine growth during partial outages. I set ReadHeaderTimeout to protect against slowloris-style attacks, keep IdleTimeout short enough to reclaim idle keep-alive connections, and add ReadTimeout and WriteTimeout so a slow client can’t pin a goroutine indefinitely.
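Roughly what that configuration looks like; the specific durations are illustrative and should be tuned to your traffic:

```go
package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	})

	srv := &http.Server{
		Addr:              ":8080",
		Handler:           mux,
		ReadHeaderTimeout: 5 * time.Second,  // bounds slowloris-style trickled headers
		ReadTimeout:       10 * time.Second, // full request read, including body
		WriteTimeout:      20 * time.Second, // bounds slow or stalled readers
		IdleTimeout:       60 * time.Second, // reclaims idle keep-alive connections
	}
	log.Fatal(srv.ListenAndServe())
}
```

None of these exist by default: a zero value means no timeout, which is why the stock server accumulates connections during partial outages.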

Store unstructured JSON safely with json.RawMessage

Not every integration payload maps cleanly to a static struct. When I need to accept “mostly known” JSON but preserve unknown fields for auditing or forward compatibility, I use json.RawMessage. That allows me to decode the envelope (event name, version) while carrying the payload through byte-for-byte, so nothing is silently dropped.
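A minimal sketch; the envelope fields and event name are illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope decodes only the fields we route on; Payload stays as raw
// bytes, so unknown fields survive untouched.
type Envelope struct {
	Event   string          `json:"event"`
	Version int             `json:"version"`
	Payload json.RawMessage `json:"payload"`
}

type UserCreated struct {
	ID    string `json:"id"`
	Email string `json:"email"`
}

func main() {
	raw := []byte(`{"event":"user.created","version":2,
		"payload":{"id":"u_1","email":"a@example.com","plan":"pro"}}`)

	var env Envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		panic(err)
	}

	// Route on the envelope; decode the payload only once the type is known.
	switch env.Event {
	case "user.created":
		var uc UserCreated
		if err := json.Unmarshal(env.Payload, &uc); err != nil {
			panic(err)
		}
		fmt.Println("created:", uc.ID, uc.Email)
	}

	// The raw payload, including the unknown "plan" field, can be stored
	// verbatim for auditing or later reprocessing.
	fmt.Println(string(env.Payload))
}
```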

Panic recovery middleware for HTTP servers

Even in Go, panics happen: a nil pointer in an edge case, a bad slice index, or a library bug. I don't want a single request to take down the whole process, so I wrap handlers with a recovery middleware that captures panics, logs them with request context and a stack trace, and returns a 500 instead of crashing the server.
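A sketch of that middleware; the logged fields are illustrative:

```go
package main

import (
	"log"
	"net/http"
	"runtime/debug"
)

// recoverMiddleware turns a panicking request into a logged 500
// instead of a crashed process.
func recoverMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if rec := recover(); rec != nil {
				log.Printf("panic: %v method=%s path=%s\n%s",
					rec, r.Method, r.URL.Path, debug.Stack())
				http.Error(w, "internal server error", http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/boom", func(w http.ResponseWriter, r *http.Request) {
		var m map[string]string
		m["x"] = "y" // deliberate nil-map panic for demonstration
	})
	log.Fatal(http.ListenAndServe(":8080", recoverMiddleware(mux)))
}
```

One caveat worth knowing: recover only works on the goroutine that panicked, so any goroutines a handler spawns need their own recovery.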