429, retry with exponential backoff and jitter, then lower burst traffic if repeated retries occur.
Limit concurrency
Cap concurrent chat requests with an async semaphore. Requests beyond the limit then wait for a free slot instead of reaching the API at the same time.
Retry rate limits
Retry 429 responses with exponential backoff plus random jitter, so that clients throttled at the same moment do not all retry in lockstep.
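Here is a minimal Python sketch of this retry policy. It assumes a hypothetical `HTTPStatusError` exception that carries the response status. A real client library raises its own exception type. The sketch uses "full jitter": each delay is drawn uniformly between zero and an exponential bound, and a ceiling (`cap`) limits that bound. The retryable set (429 plus 500, 502, 503, 504) is also an assumption, so adjust it to the failures your API documents as temporary:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class HTTPStatusError(Exception):
    """Hypothetical error carrying the HTTP status of a failed request."""
    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status

# Assumed retryable set: 429 plus common transient server errors.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    # Full jitter: uniform between 0 and the capped exponential bound.
    return random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_retries(fn: Callable[[], T], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except HTTPStatusError as err:
            # Give up on non-retryable errors, or once retries are exhausted.
            if err.status not in RETRYABLE_STATUSES or attempt == max_retries:
                raise
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter lets tests run the backoff without waiting. In async code, you would `await asyncio.sleep(...)` instead of calling `time.sleep`.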
Common errors
| Error | Fix |
|---|---|
| Unlimited parallel requests | Add a semaphore, queue, or worker pool. |
| Retrying all failures | Retry only 429 and temporary server failures. |
| No per-model metrics | Log route, model ID, status, and latency for each request. |
| Retry storm | Add jitter and cap the maximum retry delay. |
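For the "No per-model metrics" row, here is a minimal Python sketch that uses the standard `logging` module to write one line per request. The route and model ID values are hypothetical placeholders. The sketch also assumes that `fn` performs the request and returns its HTTP status:

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("chat_client")

def timed_call(route: str, model_id: str, fn: Callable[[], int]) -> int:
    """Run fn (which returns an HTTP status) and log one metrics line for it."""
    start = time.perf_counter()
    status = 0  # 0 marks "no response" if fn raised before returning a status
    try:
        status = fn()
        return status
    finally:
        latency_ms = (time.perf_counter() - start) * 1000
        # One line per request lets you group 429s and latency by route and model.
        logger.info("route=%s model=%s status=%d latency_ms=%.1f",
                    route, model_id, status, latency_ms)

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    timed_call("/v1/chat", "example-model", lambda: 200)
```

The `finally` clause runs for failed requests as well, so exceptions still produce a metrics line.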