Rate limits are normal in production; what matters is how clients behave when they hit them. Instead of hammering an upstream with immediate retries, I parse Retry-After (seconds) and sleep before retrying, while keeping an upper bound so one request doesn't stall forever. The helper takes a context.Context so cancellation stops the wait, which matters during shutdown or when the caller times out. I also treat 429 differently from 5xx: 429 means "you're too fast," so the right response is to slow down, not to fan out more retries. In practice I pair this with a token bucket limiter so the client stays under the rate most of the time. It's one of those small details that make integrations stable.
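
A minimal sketch of that helper, assuming a GET-style request with no body; the names (getWithRetry, sleepCtx, maxAttempts, maxWait) and the specific backoff values are illustrative, not any particular library's API:

```go
package upstream

import (
	"context"
	"fmt"
	"net/http"
	"strconv"
	"time"
)

const (
	maxAttempts = 4
	maxWait     = 30 * time.Second // upper bound so a single request never stalls forever
)

// getWithRetry issues a GET and retries on 429 and 5xx. A 429 is honored by
// sleeping for the Retry-After value; a 5xx gets a short exponential backoff.
// Cancelling ctx aborts any in-progress wait.
func getWithRetry(ctx context.Context, client *http.Client, url string) (*http.Response, error) {
	var lastStatus int
	for attempt := 0; attempt < maxAttempts; attempt++ {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return nil, err
		}
		resp, err := client.Do(req)
		if err != nil {
			return nil, err
		}

		switch {
		case resp.StatusCode == http.StatusTooManyRequests:
			// "You're too fast": slow down for as long as the server asks.
			wait := retryAfter(resp, time.Second)
			resp.Body.Close()
			if err := sleepCtx(ctx, wait); err != nil {
				return nil, err
			}
		case resp.StatusCode >= 500:
			// Transient server error: plain exponential backoff instead.
			resp.Body.Close()
			if err := sleepCtx(ctx, time.Duration(1<<attempt)*time.Second); err != nil {
				return nil, err
			}
		default:
			return resp, nil
		}
		lastStatus = resp.StatusCode
	}
	return nil, fmt.Errorf("giving up after %d attempts, last status %d", maxAttempts, lastStatus)
}

// retryAfter parses Retry-After as seconds, falling back to def when the
// header is missing or uses the HTTP-date form, and clamping to maxWait.
func retryAfter(resp *http.Response, def time.Duration) time.Duration {
	secs, err := strconv.Atoi(resp.Header.Get("Retry-After"))
	if err != nil || secs < 0 {
		return def
	}
	wait := time.Duration(secs) * time.Second
	if wait > maxWait {
		wait = maxWait
	}
	return wait
}

// sleepCtx waits for d or until ctx is cancelled, whichever happens first.
func sleepCtx(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-timer.C:
		return nil
	}
}
```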
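
For the token bucket side, one way to wire it in is a RoundTripper wrapper around golang.org/x/time/rate, so every outbound request waits for a token before it leaves the client. This is a sketch of that pairing; the type and constructor names here are mine, only rate.NewLimiter and Limiter.Wait are the real library API:

```go
package upstream

import (
	"net/http"

	"golang.org/x/time/rate"
)

// rateLimitedTransport waits for a token before every outbound request, so
// the client stays under the upstream's rate most of the time and the
// Retry-After path above becomes the exception rather than the norm.
type rateLimitedTransport struct {
	limiter *rate.Limiter
	next    http.RoundTripper
}

func (t *rateLimitedTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// Blocks until a token is available or the request's context is cancelled.
	if err := t.limiter.Wait(req.Context()); err != nil {
		return nil, err
	}
	return t.next.RoundTrip(req)
}

// newRateLimitedClient allows roughly rps requests per second with a small burst.
func newRateLimitedClient(rps float64, burst int) *http.Client {
	return &http.Client{
		Transport: &rateLimitedTransport{
			limiter: rate.NewLimiter(rate.Limit(rps), burst),
			next:    http.DefaultTransport,
		},
	}
}
```

Passing the client built by newRateLimitedClient into getWithRetry gives both behaviors at once: the limiter keeps the request rate under the ceiling, and the retry helper handles the occasional 429 that slips through.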