Retry Postgres serialization failures with bounded attempts

14815
0

For workloads that run at SERIALIZABLE isolation (or that hit serialization conflicts under load), retries are part of the contract. The important part is to retry only the safe errors (typically SQLSTATE 40001) and to keep the loop bounded so you don’t create runaway contention. I like to wrap the transaction function in a small retry helper that re-runs the unit of work on serialization failures only. In production, I also add jittered backoff to reduce thundering herds when many transactions collide. This approach keeps call sites clean and ensures you don’t accidentally “retry everything”, which can hide real bugs. When you combine this with good observability (attempt count, conflict rate), you can tune the system rather than guessing. It’s one of those boring patterns that saves you during peak load.