Reliability

Transactional Email “Send Once” with Delivered Marker

Email sending should be idempotent: store a delivered marker (or unique key) so retries don’t spam users with duplicates. This pattern is especially useful for receipts and password reset flows.
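A minimal sketch of the marker, assuming a Rails app with a sent_emails table that has a unique index on delivery_key; the model, mailer, and key format are illustrative:

```ruby
# Assumes a sent_emails table with a delivery_key column and
# add_index :sent_emails, :delivery_key, unique: true
class ReceiptSender
  def self.deliver(order)
    delivery_key = "receipt-#{order.id}"

    # The unique index makes the "claim" atomic: a retried or concurrent
    # send raises RecordNotUnique instead of emailing the user twice.
    SentEmail.create!(delivery_key: delivery_key)
    ReceiptMailer.receipt(order).deliver_later
  rescue ActiveRecord::RecordNotUnique
    # Already sent (or another worker is sending it) -- skip quietly.
  end
end
```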

HTTP Timeouts + Retries Wrapper (Faraday)

I wrapped external HTTP calls once I realized most “flaky APIs” were actually my fault: no timeouts, unclear retries, and logs that didn’t tell a story. The wrapper centralizes one Faraday connection with explicit open_timeout and timeout values and a bounded retry policy, so every external call shares the same limits and logging.
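A sketch of that connection, assuming Faraday 2 with the faraday-retry gem; the URL, timeouts, and retry settings are placeholders to tune per API:

```ruby
require "faraday"
require "faraday/retry"

module ExternalApi
  # One shared connection: explicit timeouts, bounded retries, errors raised.
  def self.connection
    @connection ||= Faraday.new(
      url: "https://api.example.com",
      request: { open_timeout: 2, timeout: 5 } # seconds
    ) do |f|
      f.request :retry,
                max: 2,
                interval: 0.2,
                backoff_factor: 2,
                retry_statuses: [429, 502, 503]
      f.response :raise_error # turn 4xx/5xx into exceptions handled in one place
      f.adapter Faraday.default_adapter
    end
  end
end

# ExternalApi.connection.get("/v1/items")
```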

Graceful Degradation: Feature-Based Rescue

Not every failure should be a 500. If a non-critical dependency fails (e.g., recommendations), rescue narrowly, emit a metric/log, and serve a baseline response.
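A sketch of the narrow rescue, with RecommendationService standing in for the non-critical dependency and an empty list as the baseline response:

```ruby
# RecommendationService::Error and the StatsD call are illustrative;
# rescue the specific errors your client actually raises, never Exception.
def recommendations_for(user)
  RecommendationService.fetch(user.id)
rescue RecommendationService::Error, Faraday::TimeoutError => e
  Rails.logger.warn("recommendations degraded: #{e.class}: #{e.message}")
  StatsD.increment("recommendations.fallback") if defined?(StatsD)
  [] # baseline: an empty shelf, not a 500 for the whole page
end
```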

Safer Background Job Arguments (Serialize IDs only)

Jobs should accept simple primitives (IDs, strings), not full objects. It avoids serialization surprises and makes jobs resilient across deploys. This also reduces job payload size.
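A sketch with ActiveJob; the job, model, and mailer names are illustrative:

```ruby
class SendInvoiceJob < ApplicationJob
  queue_as :default

  def perform(invoice_id)
    invoice = Invoice.find_by(id: invoice_id)
    return if invoice.nil? # deleted between enqueue and perform: nothing to do

    InvoiceMailer.send_invoice(invoice).deliver_now
  end
end

# Enqueue with a primitive, not the record:
# SendInvoiceJob.perform_later(invoice.id)
```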

Schema-Backed Enums (DB Constraint + Rails enum)

Rails enums are nice, but the DB should enforce allowed values. Use a CHECK constraint (or native enum type) plus the Rails enum mapping. It prevents bad writes from console scripts and future migrations.
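A sketch for a string status column on orders (add_check_constraint needs Rails 6.1+, the enum syntax shown is Rails 7+; the allowed values are illustrative):

```ruby
# Migration: the database refuses anything outside the allowed set.
class AddStatusCheckToOrders < ActiveRecord::Migration[7.0]
  def change
    add_check_constraint :orders,
                         "status IN ('pending', 'paid', 'refunded')",
                         name: "orders_status_allowed"
  end
end

# Model: the Rails enum adds the predicates and scopes on top.
class Order < ApplicationRecord
  enum :status, { pending: "pending", paid: "paid", refunded: "refunded" }
end
```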

Counter Cache Repair Job (Consistency Tooling)

Counter caches drift (deleted records, backfills, manual SQL). A repair job that recomputes counts safely is invaluable. It’s the kind of operational code you’re glad you wrote the first time a dashboard is wrong.
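A sketch of the repair job, assuming the standard counter cache on Post#comments (comments_count); the batch size and queue name are illustrative:

```ruby
class RepairCommentsCountJob < ApplicationJob
  queue_as :maintenance

  def perform
    Post.in_batches(of: 1_000) do |batch|
      batch.ids.each do |post_id|
        # reset_counters recounts the association and writes the true value.
        Post.reset_counters(post_id, :comments)
      end
    end
  end
end
```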

Backend: normalize errors with a single Express handler

Without a centralized error handler, you end up with a mix of thrown errors, ad-hoc res.status(500) blocks, and inconsistent JSON shapes. I use one Express error middleware that maps known errors to stable codes and logs unknown errors with request context.
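A sketch of that middleware; AppError and the JSON shape are illustrative, and the handler must be registered after all routes:

```js
class AppError extends Error {
  constructor(status, code, message) {
    super(message);
    this.status = status;
    this.code = code;
  }
}

// Express recognizes error middleware by its four-argument signature.
function errorHandler(err, req, res, next) {
  if (err instanceof AppError) {
    return res.status(err.status).json({ error: { code: err.code, message: err.message } });
  }
  // Unknown error: log with request context, return one stable generic shape.
  console.error({ msg: "unhandled_error", method: req.method, path: req.path, err });
  return res.status(500).json({ error: { code: "internal_error", message: "Something went wrong" } });
}

module.exports = { AppError, errorHandler };

// app.use(errorHandler); // last, after every route
```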

Circuit breaker wrapper for flaky third-party APIs

When a dependency starts timing out, naive retries can amplify the outage by piling on more work. A circuit breaker gives the system a chance to breathe: after enough failures, it opens and returns a fast error, then it half-opens to probe recovery. I keep the wrapper small: count consecutive failures, fail fast while open, and let a single probe request decide whether to close again.
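A minimal in-memory breaker as a sketch; the threshold and reset timeout are illustrative and would normally come from config:

```js
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetTimeoutMs = 30000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed
  }

  async call(fn) {
    if (this.openedAt !== null && Date.now() - this.openedAt < this.resetTimeoutMs) {
      throw new Error("circuit_open"); // fail fast, no call made
    }
    // Closed, or past the reset timeout (half-open): let this request probe.
    try {
      const result = await fn();
      this.failures = 0;
      this.openedAt = null; // success closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) {
        this.openedAt = Date.now(); // open (or re-open after a failed probe)
      }
      throw err;
    }
  }
}

// const breaker = new CircuitBreaker();
// const data = await breaker.call(() => fetchFromVendor()); // fetchFromVendor is illustrative
```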

Health checks with readiness + liveness

One /health endpoint is ambiguous: is the process alive, or is it actually ready to serve traffic? I split them. Liveness answers ‘should the orchestrator restart me?’ and is usually just ‘the event loop is alive’. Readiness answers ‘can I accept traffic right now?’ and checks the dependencies the service needs to serve a real request.
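A sketch with Express; the endpoint names and db.ping() are illustrative stand-ins for whatever readiness actually has to verify:

```js
const express = require("express");
const db = require("./db"); // hypothetical module exposing ping()

const app = express();

// Liveness: can the process respond at all? Restart me if not.
app.get("/livez", (req, res) => res.status(200).send("ok"));

// Readiness: can I serve real traffic? Route it elsewhere if not.
app.get("/readyz", async (req, res) => {
  try {
    await db.ping();
    res.status(200).send("ready");
  } catch (err) {
    res.status(503).send("not ready");
  }
});

app.listen(3000);
```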

Background Job Backpressure with Queue Depth Guard

When downstream systems degrade, jobs pile up and amplify outages. Add a simple “queue depth guard” so non-critical jobs skip or reschedule instead of making the backlog worse.
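A sketch assuming Sidekiq; the queue name, depth threshold, delay, and domain call are illustrative:

```ruby
require "sidekiq"
require "sidekiq/api"

class RefreshRecommendationsJob
  include Sidekiq::Job
  sidekiq_options queue: "low"

  MAX_QUEUE_DEPTH = 5_000

  def perform(user_id)
    if Sidekiq::Queue.new("low").size > MAX_QUEUE_DEPTH
      # The backlog is already deep: push this non-critical work out
      # instead of making the pile bigger right now.
      self.class.perform_in(10 * 60, user_id) # try again in 10 minutes
      return
    end

    Recommendations.refresh!(user_id) # illustrative domain call
  end
end
```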

Safer Time-Based Deletes with “mark then sweep”

Direct deletes can be risky and slow. Mark records for deletion, then sweep in batches in a maintenance job. This gives you observability and a rollback window.
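A sketch using a marked_for_deletion_at column; the model, retention windows, and batch size are illustrative:

```ruby
# Step 1: mark (fast, and reversible by clearing the column)
Export.where("created_at < ?", 90.days.ago)
      .update_all(marked_for_deletion_at: Time.current)

# Step 2: sweep later, in small batches, from a maintenance job
class SweepMarkedExportsJob < ApplicationJob
  queue_as :maintenance

  def perform
    Export.where("marked_for_deletion_at < ?", 7.days.ago)
          .in_batches(of: 500) do |batch|
      batch.delete_all
      sleep(0.1) # keep lock pressure and replication lag low
    end
  end
end
```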

Idempotency keys for “create” endpoints

Retries are inevitable: mobile clients, flaky networks, and load balancers will resend POST requests. Without idempotency you end up double-charging or double-creating records. I store an Idempotency-Key with a sha256 hash of the request body and the response it produced, so a retried request returns the original result instead of creating a duplicate.
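A sketch of the flow as Express middleware with an in-memory Map; in practice the key, body hash, and saved response live in the database behind a unique index on the key:

```js
const crypto = require("crypto");

const store = new Map(); // key -> { bodyHash, response }

function idempotency(req, res, next) {
  const key = req.get("Idempotency-Key");
  if (!key) return next(); // no key: behave like a normal request

  const bodyHash = crypto
    .createHash("sha256")
    .update(JSON.stringify(req.body))
    .digest("hex");

  const existing = store.get(key);
  if (existing) {
    if (existing.bodyHash !== bodyHash) {
      return res.status(409).json({ error: "idempotency_key_reuse" });
    }
    return res.status(existing.response.status).json(existing.response.body); // replay
  }

  // Capture the JSON response so a retry with the same key replays it.
  const originalJson = res.json.bind(res);
  res.json = (body) => {
    store.set(key, { bodyHash, response: { status: res.statusCode, body } });
    return originalJson(body);
  };
  next();
}

// app.post("/charges", express.json(), idempotency, createCharge);
```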